Environmental Proteomics Reveals Taxonomic and Functional Changes in an Enriched Aquatic Ecosystem


Citation Northrop, Amanda C., Rachel K. Brooks, Aaron M. Ellison, Nicholas J. Gotelli, and Bryan A. Ballif. 2017. “Environmental Proteomics Reveals Taxonomic and Functional Changes in an Enriched Aquatic Ecosystem.” Ecosphere 8 (10) (October): e01954. doi:10.1002/ecs2.1954.

Published Version 10.1002/ecs2.1954

Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:34389684

Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http://nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of-use#LAA

Authors: Amanda C. Northrop¹, Rachel Brooks¹, Aaron M. Ellison², Nicholas J. Gotelli¹*, and Bryan A. Ballif¹*

Affiliations:

¹ Department of Biology, University of Vermont, Burlington, VT 05405, USA.

² Harvard Forest, Harvard University, Petersham, MA 01366, USA.

*Corresponding authors. E-mail: [email protected] (B.A.B.); [email protected] (N.J.G.)

Conflict of Interest

The authors declare no conflict of interest.

Abstract

Aquatic ecosystem enrichment can lead to distinct and irreversible changes to undesirable states. Understanding changes in active microbial community function and composition following organic-matter loading in enriched ecosystems can help identify biomarkers of such state changes. In a field experiment, we enriched replicate aquatic ecosystems in the pitchers of the northern pitcher plant, Sarracenia purpurea. Shotgun metaproteomics using a custom metagenomic database identified proteins, molecular pathways, and contributing microbial taxa that differentiated control ecosystems from those that were enriched. The number of microbial taxa contributing to protein expression was comparable between treatments; however, taxonomic evenness was higher in controls. Functionally active bacterial composition differed significantly between treatments and was more divergent in control pitchers than in enriched pitchers. Aerobic and facultative anaerobic bacteria contributed most to identified proteins in control and enriched ecosystems, respectively. The molecular pathways and contributing taxa in enriched pitcher ecosystems were similar to those found in larger enriched aquatic ecosystems and are consistent with microbial processes occurring at the base of detrital food webs. Detectable differences between protein profiles of enriched and control ecosystems suggest that a time series of environmental proteomics data may identify protein biomarkers of impending state changes to enriched states.

Key words: aquatic ecosystems; bacterial communities; environmental proteomics; model ecosystem; organic matter enrichment; Sarracenia purpurea.

Introduction

Chronic and directional environmental drivers such as nutrient and organic matter enrichment are causing state changes in many ecosystems (Rabalais et al. 2009, Scheffer 2009). Mitigating or preventing these state changes requires predicting them with sufficient lead-time (Biggs et al. 2009). Current prediction methods rely on the statistical signature of “critical slowing down” (Scheffer et al. 2009), that is, an increase in the variance or temporal autocorrelation of a state variable (Dakos et al. 2015). However, such indicators usually require long time series of data with frequent sampling of an appropriate state variable (Bestelmeyer et al. 2011, Levin and Mollmann 2015). Even when such data are available, the signature of critical slowing down may not provide enough lead-time for intervention (Biggs et al. 2009, Contamin and Ellison 2009).

In aquatic systems, water quality indicators such as total suspended solids (Hargeby et al. 2007), submersed macrophyte vegetation cover (Dennison et al. 1993, Sondergaard et al. 2010), diatom composition (Pan et al. 1996), and phytoplankton biomass (Carpenter et al. 2008) often are used as state variables. However, whether top-down or bottom-up forces initiate the change, the proximate cause of eutrophication in many freshwater aquatic ecosystems is microbial processes associated with the breakdown of detritus (Chrost and Siuda 2006). A primary reason that it has been difficult to forecast shifts with sufficient lead-time may be that changes in monitored variables lag behind the microbial processes that underlie state changes. We hypothesize that biomarkers linked closely to microbial function, such as proteins, may serve as better early warning signals of impending state changes than traditional aquatic ecosystem biomarkers.

One of the challenges to studying aquatic ecosystem state changes is the lack of replicable natural ecosystems that can be ethically manipulated. Recently, we have identified the aquatic ecosystem that assembles in the cup-shaped leaves of the northern pitcher plant Sarracenia purpurea as a model system for identifying whole-ecosystem microbial processes associated with detrital enrichment. Each leaf functions as an independent ecosystem that can be experimentally enriched and monitored through time in the field or lab (Srivastava et al. 2004). Arthropod prey, mostly ants and flies, form the base of a “brown” food web that includes dipteran larvae, protozoa, mites, rotifers, and a diverse assemblage of bacteria that decompose and mineralize nearly all the captured prey biomass (Ellison et al. 2003, Butler et al. 2008, Koopman and Carstens 2011, Gray et al. 2012). Even in the absence of macroinvertebrates, the dominant transfer of nutrients to the plant occurs via microbial activity (Butler et al. 2008). With excess organic matter loading, microbial activity increases, pitcher fluid becomes turbid, and oxygen levels collapse to hypoxic conditions even during daytime photosynthesis (Sirota et al. 2013). Such consequences are similar to those seen in larger aquatic ecosystems that have switched from a green to a brown food web dominated by detritivores, as an initial increase in primary production leads to internal organic-matter loading and increasing biological oxygen demand as primary producers decompose (Correll 1998).

In the last decade, environmental proteomics has emerged as a powerful tool to measure microbial community function in a variety of aquatic habitats, including contaminated groundwater (Benndorf et al. 2007), coastal upwelling systems (Sowell et al. 2011), estuaries (Colatriano et al. 2015), and meromictic lakes (Lauro et al. 2011). Additionally, environmental proteomics has promise as a tool for identifying biomarkers of changing environmental conditions, including aquatic pollution (Campos et al. 2012, Ullrich et al. 2016). Environmental proteomics examines the complete set of proteins expressed in an ecosystem at a single time point and gives insight into the function of a community. Although metatranscriptomics also serves as an important tool for understanding community function, mRNA and protein levels are generally not strongly correlated (Vogel and Marcotte 2012); this is especially true for bacteria in perturbed systems (Jayapal et al. 2008). Therefore, metaproteomics may provide a more accurate picture of bacterial community function in enriched aquatic habitats.

As a first step toward determining the utility of microbial protein biomarkers as early warning signals of state changes, we conducted an environmental proteomics screen of the aquatic ecosystem in S. purpurea pitchers enriched with organic matter. The aim was to determine whether there are detectable differences between the proteins, associated molecular pathways, and taxa contributing to expressed proteins in microbial (nonviral organisms <30 µm) communities in enriched vs. control ecosystems. We hypothesized that an environmental proteomics survey would reveal detectable differences in taxa contributing to protein expression, proteins, and functional pathways between enriched and control ecosystems. We expected to find differences between control and enriched pitchers in pathways related to respiration and decomposition, changes in the oxygen requirement of microbes contributing to expressed proteins, and shifts in the taxonomic composition of microbes contributing to protein expression. Specifically, we predicted an abundance of contributing anaerobic bacteria in enriched pitchers relative to controls. We found detectable differences in taxa, proteins, and pathways common to a wide range of aquatic ecosystems. Our results suggest that environmental proteomics can be a useful tool for detecting alternative enriched and unenriched states in aquatic ecosystems and may serve as a means to identify protein biomarkers of impending shifts between such states.

Methods

Enrichment Experiment

The field experiment was conducted in Tom Swamp, a nutrient-poor fen located at the northern end of Harvard Pond (42.51 N, −72.21 W) at Harvard Forest, Worcester County, Massachusetts. Newly opened pitchers were identified and randomly assigned to an ambient control or detritus-enriched treatment (Appendix S1). Previous work by Peterson et al. (2008) using culture-independent methods revealed that newly opened pitchers are sterile and impermeable to bacteria, so we are reasonably sure that our experimental pitchers did not harbor diverse bacterial communities prior to the start of the experiment. Detritus-enriched pitchers received 1 mg ml⁻¹ d⁻¹ of oven-dried, finely ground wasps (Dolichovespula maculata) (Appendix S1), which have elemental ratios (C:N, 5.99:1; N:P:K, 10.7:1.75:1.01) similar to those of Sarracenia’s natural ant prey (C:N, 5.9:1; N:P:K, 12.1:1.52:0.93) (Farnsworth and Ellison 2008). Proteomic analysis of the ground wasp (not reported here) failed to identify microbial proteins, so we are confident that microbial contribution to enriched pitchers from the wasps was minimal. Enrichment treatments were applied for 14 consecutive days; all pitchers were otherwise unmanipulated. Pitcher fluid was sampled on the first and last days of the experiment, filtered to remove microbes > 30 µm, pelleted, and stored at −80 °C until processed (Appendix S1).

Protein Extraction, SDS-PAGE, and Mass Spectrometry

Six of ten replicate microbial pellets from each treatment yielded enough protein for analysis via tandem mass spectrometry. All replicates were analyzed separately using SDS-PAGE and Coomassie staining (Fig. 1, Appendix 1: Fig. S1a, and Appendix 1: Fig. S1b). All six of the enriched pitchers and five of the six control pitchers had visible protein staining levels and were chosen for mass spectrometry. Proteins were subjected to a tryptic digest (Appendix S1) and to LC-MS/MS as previously described (Cheerathodi and Ballif 2011) using a linear ion trap mass spectrometer (Thermo Electron, Waltham, MA, USA). MS/MS spectra were matched to peptides in a custom protein database using SEQUEST software as described below.

Custom Metagenomic Databases

We generated a custom protein database from a six-frame forward and reverse translation of a metagenomic database constructed from microbial communities of three previously collected pitchers that had captured diverse amounts of prey (Appendix 1: Fig. S2). Pitchers were collected from Molly Bog, an ombrotrophic bog located in Morristown, VT (44.50 N, -72.64 W) on 18 August 2008 and transported in a cooler directly from the field to the University of Vermont. Microbial pellets were obtained immediately as described above. DNA was extracted, prepared, and sent to Génome Québec (Montréal, QC, Canada) for library construction, sequencing, and assembly with the 454 GS-FLX Titanium Sequencing System (Roche) (Appendix S1). Contigs were assembled de novo with Roche’s Newbler assembler v2.3 (release 091027_1459) using default parameters (minimum Read Length = 20; overlap Seed Step = 12; overlap Seed Length = 16; overlap Min Seed Count = 1; overlap Seed Hit Limit = 70; overlap Min Match Length = 40; overlap Min Match Identity = 90; overlap Match Ident Score = 2; overlap Match Diff Score = -3; overlap Match Unique Thresh = 12; map Min Contig Depth = 1; all Contig Thresh = 100), with the exception of minimum read length (20 bp) and overlap Hit Position Limit (1,000,000). The assembled contigs were imported into MG-RAST 4.0.2 to assess functional and taxonomic potential (Meyer et al. 2008). Taxonomic assignments were visualized using the Krona plugin, and the following cutoffs were applied to both taxonomic and subsystem functional category assignments: minimum identity = 60%, e-value of 1 × 10⁻⁵ or less, and a minimum alignment length of 15 bp (Appendix 1: Fig. S3). We calculated Hurlbert’s probability of an interspecific encounter (PIE) to estimate the evenness of bacterial classes in the metagenome (Hurlbert 1971) (Appendix S1). KEGG pathways (level 2 and level 3) were assigned to contigs using the KEGG database via MG-RAST (we report only the top 73 level 3 pathways here) (Appendix 1: Fig. S4).

A metaproteomic database was created with a six-frame forward and reverse translation of the assembled metagenome using open-source Ruby software. Sequences longer than 100 amino acids (n = 184,128) were retained. A decoy database was constructed by reversing the retained sequences and concatenating them to the forward database to allow for an estimation of the false discovery rate, as has been described (Elias and Gygi 2007).
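
The database construction described above can be sketched in a few lines. This is an illustration only, not the Ruby software used in the study: the function names, the choice to split translations at stop codons, and the `target_`/`decoy_` labels are assumptions made for the example.

```python
# Illustrative sketch; not the Ruby tool used in the study.
# Standard genetic code, enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate one reading frame; codons with ambiguous bases become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_orfs(contig, min_length=101):
    """Yield stop-free stretches of at least min_length residues.

    Assumption: translations are split at stop codons. min_length=101
    mirrors the 'longer than 100 amino acids' filter in the text.
    """
    contig = contig.upper()
    for strand in (contig, reverse_complement(contig)):
        for offset in range(3):
            for segment in translate(strand[offset:]).split("*"):
                if len(segment) >= min_length:
                    yield segment


def with_decoys(proteins):
    """Return (name, sequence) pairs: forward targets, then reversed decoys."""
    targets = [(f"target_{i}", seq) for i, seq in enumerate(proteins)]
    decoys = [(f"decoy_{i}", seq[::-1]) for i, seq in enumerate(proteins)]
    return targets + decoys
```

Reversing each retained sequence keeps amino acid composition and length identical between target and decoy entries, which is what makes decoy hits a usable estimate of random matches.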

Protein Orthologue Identification

Peptide and protein identifications were made via a SEQUEST search of the tandem mass spectral data against the custom pitcher-plant microbial community protein database described above (Appendix S1). The number of protein hits varied substantially among replicates, so to have enough proteins for treatment comparisons, peptides and proteins from the five control samples and six enriched samples were pooled after LC-MS/MS and the SEQUEST search into a single control and a single enriched sample dataset. The doubly and triply charged peptide ions were considered further, and each dataset was filtered by first adjusting the cutoffs for XCorr and ΔCn until the false discovery rate was < 10%. The final filters were: XCorr ≥ 3.0 for doubly charged ions, XCorr ≥ 3.3 for triply charged ions, and unique ΔCorr ≥ 0.15. The resulting list of protein hits for each treatment was then ranked by unique number of peptides, and the top 220 proteins from each treatment were selected so that the false discovery rates for control and enriched treatments were 6.6% and 0%, respectively. These top 220 proteins and their associated peptides are found in Data Supplement S1.
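
As a rough illustration of the filtering step, the sketch below applies the score thresholds reported above and estimates a false discovery rate from decoy hits. The estimator shown (twice the decoy count divided by all accepted hits) is one common convention for concatenated target-decoy searches; whether the rates reported here were computed in exactly this way is an assumption, and the record fields (`charge`, `xcorr`, `delta`, `is_decoy`) are hypothetical names.

```python
# Thresholds reported in the text, keyed by peptide ion charge state.
XCORR_MIN = {2: 3.0, 3: 3.3}
DELTA_MIN = 0.15


def target_decoy_fdr(n_target, n_decoy):
    """Estimate FDR for a concatenated target-decoy search.

    Assumes each decoy hit implies one equally likely false hit among
    the targets, so false positives are estimated as 2 * n_decoy.
    """
    total = n_target + n_decoy
    if total == 0:
        return 0.0
    return 2 * n_decoy / total


def passes_filters(psm):
    """Keep doubly and triply charged ions that clear both score cutoffs."""
    threshold = XCORR_MIN.get(psm["charge"])
    return (
        threshold is not None
        and psm["xcorr"] >= threshold
        and psm["delta"] >= DELTA_MIN
    )


def filtered_fdr(psms):
    """Apply the filters, then estimate FDR among the retained matches."""
    kept = [p for p in psms if passes_filters(p)]
    n_decoy = sum(1 for p in kept if p["is_decoy"])
    return target_decoy_fdr(len(kept) - n_decoy, n_decoy)
```

In practice the cutoffs would be tightened step by step until the estimate falls below the chosen ceiling (10% in the text).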

In the list of control peptides, a protein hit from the decoy database was represented by 25 total peptides; therefore, we suspected that this hit was a true positive not represented in our target database. However, a BLAST search of the full amino acid sequence did not yield an identical match, so we cannot definitively claim it is a true positive; therefore, we removed this hit from our top 220 list for the control treatment. With this hit removed, the false discovery rate for the control treatment was 4.3%.

All peptide hits were pooled within treatments and mapped back to their source sequences in the custom protein database. Those source sequences were imported in fasta file format into blast2go v.2.8.0 (Conesa et al. 2005) for identification and annotation using the following configuration settings: blastp program, Blast Expect Value of 1.0E-3, 10 Blast Hits, Annotation CutOff >55, GO Weight >5.

Analysis of the Top Proteins Shared Between Treatments

A randomization test was done using RStudio (v. 0.98.1059) to test the hypothesis that there was a single common protein pool for both the control and enriched treatments and that the number of observed shared proteins between treatments reflects chance effects resulting from random draws from this single protein pool (Appendix S1). We conducted an additional simulation in R to determine the likelihood of a Type I error in our randomization test (Appendix S1).
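
The logic of such a randomization test can be sketched as follows. The actual procedure is given in Appendix S1; the version here assumes, purely for illustration, that both treatments draw the same number of proteins uniformly and without replacement from a common pool of a chosen size. The function names and parameters are hypothetical.

```python
import random


def null_shared_counts(pool_size, n_top, n_iter=1000, seed=0):
    """Simulate the number of shared proteins under a single common pool.

    Each iteration draws two sets of n_top proteins from the same pool
    (uniformly, without replacement) and records the size of their overlap.
    """
    rng = random.Random(seed)
    pool = range(pool_size)
    counts = []
    for _ in range(n_iter):
        first = set(rng.sample(pool, n_top))
        second = set(rng.sample(pool, n_top))
        counts.append(len(first & second))
    return counts


def lower_tail_p(null_counts, observed):
    """Fraction of simulated overlaps at or below the observed overlap."""
    return sum(1 for c in null_counts if c <= observed) / len(null_counts)
```

A small lower-tail fraction for the observed overlap would indicate fewer shared proteins than expected from random draws, which is the direction of the effect tested here.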

Comparison of the Top 20 Proteins from Each Treatment

We downloaded the sequence by annotation file from the blast2go search for each treatment to get the protein names associated with each protein hit (sequence description in blast2go). Each of the top 220 identified proteins in each treatment, ordered by the number of total peptides associated with the protein hit, was matched to a protein name using R software. If multiple protein hits within a treatment matched a single protein name, the protein names were merged in silico and the total peptides representing them were summed. Protein names were ranked in order of the abundance of total peptides for each treatment.
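
The merging and ranking step amounts to a grouped sum followed by a sort. A minimal sketch, written in Python rather than the R used in the study, with a hypothetical function name and input layout:

```python
from collections import Counter


def rank_protein_names(hits):
    """Merge hits that share a protein name and rank names by peptide total.

    hits: iterable of (protein_name, total_peptides) pairs for one treatment.
    """
    totals = Counter()
    for name, n_peptides in hits:
        totals[name] += n_peptides
    # Sort by descending peptide total, then by name so ties are stable.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))
```

Taking the first 20 entries of the returned list for each treatment gives the kind of top-20 comparison described above.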

Taxonomic Analysis

To determine the taxonomic composition of the microbes contributing to identified proteins in our treatments, we conducted a BLAST homology search of the metagenomic sequence data for protein hits. All peptides from the top 220 identified proteins in each treatment were mapped back to their contigs of origin to obtain nucleotide sequences. Because contigs were at least 500 base pairs in length, we felt confident that a BLAST search of the nucleotide sequences would yield correct taxonomic identifications at coarse taxonomic levels, although we acknowledge that ambiguity can remain in the taxonomic identification from a metacommunity at genus and species levels. The top BLAST hit was retained for each nucleotide sequence associated with an identified protein and linked to a bacterial class (Appendix S1). For each bacterial class identified, a 2×2 contingency table was created with treatments as columns and the number of peptides associated and not associated with the taxon as rows. A chi-square test was then used to determine if the abundance of the bacterial class was significantly different between treatments. All P values were adjusted using the Benjamini-Hochberg method (Benjamini and Hochberg 1995) (Table 1). Species composition was visualized using Krona (Ondov et al. 2011) (Appendix 1: Fig. S5). In addition to the BLAST homology search, we used Unipept (Mesuere 2016) to map tryptic peptides to the UniProtKB database and retrieve the least common taxonomic ancestor (= most derived shared taxonomic node) associated with each peptide for pooled replicates (Appendix 1: Fig. S6). We calculated Hurlbert’s PIE to estimate the evenness of bacterial classes contributing to expressed proteins in control and enriched pitchers (Hurlbert 1971) (Appendix S1).
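
For reference, Hurlbert's PIE for a sample of N individuals with class proportions p_i is (N / (N − 1)) × (1 − Σ p_i²). The sketch below computes it from a list of counts; treating per-class peptide counts as the "individuals" is an assumption made for illustration (the calculation actually used is described in Appendix S1), and the function name is hypothetical.

```python
def hurlbert_pie(counts):
    """Hurlbert's probability of an interspecific encounter.

    counts: number of observations assigned to each class.
    Returns a value between 0 (one class only) and 1 (maximally even).
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("PIE needs at least two observations")
    sum_sq = sum((c / n) ** 2 for c in counts)
    return (n / (n - 1)) * (1.0 - sum_sq)
```

A community dominated by a single class gives a value near 0, whereas evenly distributed classes push the value toward 1.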

Functional Analysis

Functional pathways (two levels) associated with each identified protein from each treatment were retrieved using the KEGG (Kyoto Encyclopedia of Genes and Genomes) (Kanehisa et al. 2014) mapping function of blast2go v.2.8.0. Each pathway was weighted by the total number of peptides associated with protein hits, or the number of spectral counts, mapping to that pathway (Appendix 1: Fig. S7). For each pathway identified, a 2×2 contingency table was created with treatments as columns and the number of peptides associated and not associated with the pathway as rows. A chi-square test was used to determine if each pathway was significantly over- or under-represented in enriched pitchers relative to controls. All P values were adjusted using the Benjamini-Hochberg method (Benjamini and Hochberg 1995) (Appendix 1: Table S1).
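
The per-pathway (and per-class) comparison can be illustrated with a short, self-contained sketch. It computes a Pearson chi-square statistic for a 2×2 table without a continuity correction and then applies the Benjamini-Hochberg adjustment; whether a continuity correction was used in the original analysis is not stated, so that choice is an assumption, as are the function names.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Rows: peptides associated / not associated with the pathway or taxon.
    Columns: the two treatments.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each pathway or class contributes one raw P value, and the adjustment is applied across the full set of tests.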

To determine whether bacteria contributing to expressed proteins in control and enriched ecosystems differed in their O2 requirements, we mapped each bacterial species identified in our BLAST search to its O2 requirement using data from the Integrated Microbial Genomes database (IMG) (Timinskas et al. 2014, Reddy et al. 2015) (Appendix S1, Fig. ). The IMG database contains six classes of O2 requirements: aerobe, anaerobe, facultative, microaerophilic, obligate aerobe, and obligate anaerobe. The latter three categories make up less than 7% of the database. We merged any species classified as obligate aerobes or obligate anaerobes into the aerobe and anaerobe classes, respectively.
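
The merging of O2 classes can be expressed as a simple lookup followed by a weighted tally. In this sketch the species keys, the `unknown` fallback label, and the use of peptide counts as weights are all illustrative assumptions rather than details taken from the study.

```python
from collections import Counter

# Obligate classes are folded into their broader counterparts, as in the text.
MERGE = {"obligate aerobe": "aerobe", "obligate anaerobe": "anaerobe"}


def oxygen_profile(species_peptides, species_o2):
    """Sum peptide counts per merged O2 requirement class.

    species_peptides: {species: peptide count} for one treatment.
    species_o2: {species: O2 requirement label} from a reference database.
    """
    totals = Counter()
    for species, n_peptides in species_peptides.items():
        label = species_o2.get(species, "unknown").lower()
        totals[MERGE.get(label, label)] += n_peptides
    return dict(totals)
```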

Analysis of Unpooled Data

In addition to analyzing pooled data, we used ordination and permutation analyses to determine the effect of enrichment on microbial community protein expression, taxonomic contribution to expressed proteins at the class and family levels, and KEGG pathways. We tested the similarity within and among replicates of control and enriched microbial communities using ADONIS, a nonparametric permutation test in the ‘vegan’ package (v. 2.4.1) in R (Oksanen et al. 2016). We used a multivariate homogeneity of group dispersions test (betadisper function in the ‘vegan’ package) to determine if the composition of contributing microbial taxa was more divergent in control replicates than in enriched replicates. The permutation tests used 999 permutations and were done using total peptide counts associated with protein identifications, microbial classes, microbial families, and KEGG pathways (Table 2). To visualize the similarities among replicate ecosystems, we used the ‘vegan’ package function metaMDS to perform non-metric multidimensional scaling (NMDS) ordination using Bray-Curtis distances. Data were square-root transformed and standardized using Wisconsin double standardization. To determine which taxa contributed the most to Bray-Curtis dissimilarity of taxa contributing to protein expression between the treatments, we did a similarity percentages test using the simper function in the ‘vegan’ package.
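
To make the distance calculation concrete, the sketch below reproduces the preprocessing and the Bray-Curtis dissimilarity in plain Python. It is not the ‘vegan’ implementation: the function names are hypothetical, and the input is assumed to be a samples-by-taxa matrix of peptide counts.

```python
import math


def sqrt_transform(matrix):
    """Square-root transform every entry of a samples-by-taxa matrix."""
    return [[math.sqrt(x) for x in row] for row in matrix]


def wisconsin(matrix):
    """Wisconsin double standardization.

    Divide each column (taxon) by its maximum, then each row (sample)
    by its total.
    """
    col_max = [max(col) for col in zip(*matrix)]
    scaled = [
        [x / m if m > 0 else 0.0 for x, m in zip(row, col_max)]
        for row in matrix
    ]
    result = []
    for row in scaled:
        total = sum(row)
        result.append([x / total if total > 0 else 0.0 for x in row])
    return result


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples (0 = identical)."""
    denom = sum(a + b for a, b in zip(u, v))
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom
```

Computing `bray_curtis` for every pair of rows of `wisconsin(sqrt_transform(counts))` yields the dissimilarity matrix on which an NMDS ordination would operate.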

Results

From 243 Mb of DNA sequence information, roughly 54% of 567,549 filtered reads (median read length = 482 bp) were assembled into 26,713 contigs ranging from 500 to 43,200 bp (N50 = 1135) (Appendix 1: Fig. S2b, Appendix 1: Fig. S2c). All the contigs passed MG-RAST quality control. The metagenome was dominated by bacteria (99.11%) at the domain level. The top five bacterial classes were Betaproteobacteria (31.99%), Alphaproteobacteria (19.42%), Sphingobacteria (13.32%), Gammaproteobacteria (10.10%), and Acidobacteria (7.04%). The top five genera in the metagenome were Burkholderia (8.87%), Variovorax (6.50%), Pedobacter (5.24%), Mucilaginibacter (4.04%), and Lutiella (3.91%). Within the metagenome, 23% of aligned contigs were mapped to the order Burkholderiales while only 7% mapped to Neisseriales (Appendix 1: Fig. S3). Taxonomic evenness of the metagenome, calculated using Hurlbert’s PIE, was equal to 0.79.
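
For readers unfamiliar with the N50 statistic reported above, it is the contig length at which the cumulative length of contigs, taken from longest to shortest, first reaches half of the total assembly length. A minimal sketch with a hypothetical function name:

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2.0
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no contig lengths given")
```

With lengths 2, 2, 2, 3, 3, 4, 8, and 8 (total 32), the two longest contigs already cover half the total, so the N50 is 8.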

Representation of the contigs mapping to functional pathways was dominated by amino acid metabolism (20.6%), followed by membrane transport (12.9%), carbohydrate metabolism (11.9%), translation (7.2%), and metabolism of cofactors and vitamins (6.4%). Within amino acid metabolism, pathways were represented primarily by glycine, serine, and threonine metabolism (17.1%), alanine, aspartate, and glutamate metabolism (13.8%), and valine, leucine, and isoleucine degradation (12.7%). Membrane transport was represented by ABC transporters (78.2%), bacterial secretion system (19.4%), and phosphotransferase system (PTS) (2.4%). Carbohydrate metabolism was dominated by pyruvate metabolism (13.9%), glycolysis/gluconeogenesis (12.6%), and pentose phosphate pathway (11.6%). Overall, the top five level 3 KEGG categories were ABC transporters (10.1%), two-component system (4.8%), aminoacyl-tRNA biosynthesis (3.8%), glycine, serine, and threonine metabolism (3.5%), and ribosome (3.3%) (Appendix 1: Fig. S4).

We identified a total of 986 proteins in the enriched treatment and 616 proteins in the control treatment. Of the 220 most abundant protein identifications for each treatment, 65 were shared between treatments, leaving 155 unique to each treatment (Fig. 2a). The randomization test revealed significantly fewer protein hits shared between the treatments than expected by chance (Fig. 2b). The top three of the 20 most abundant proteins, as measured by the total number of matched peptides (spectral counts), were the same in the control and enriched treatments. However, the relative abundances of the remaining 17 proteins in this top list differed strongly between treatments, although only seven of the 20 proteins were unique to each treatment (Fig. 2c).

The majority of identified proteins were associated with bacteria. The most common microbial class contributing to identified proteins in both treatments was Betaproteobacteria, but the contribution was higher in enriched (84.4%) than in control (50.3%) treatments (Table 1, Fig. 3a, Appendix 1: Fig. S5, Appendix 1: Fig. S6). This difference was driven by a higher abundance of Alphaproteobacteria in multiple families, including Sphingobacteriaceae, Phyllobacteriaceae, Xanthomonadaceae, and Rhizobiaceae, in control ecosystems relative to the enriched ecosystems. The similarity percentages test identified Betaproteobacteria (38.8%) and Alphaproteobacteria (9.9%) as the main contributors to dissimilarity of active microbial class composition between treatments and Neisseriaceae (23.8%) and Comamonadaceae (9.7%) as the main contributors to active microbial family dissimilarity between treatments. Although both treatments yielded similar numbers of identified microbial classes (control = 12, enriched = 11), taxonomic evenness of microbial classes contributing to identified proteins was substantially higher in the controls (PIE = 0.71) than in the enriched pitchers (PIE = 0.31). Similar taxonomic profiles were obtained using Unipept’s search for the least common taxonomic ancestors of the pooled data (Appendix 1: Fig. S6). For the unpooled data, taxonomic and functional variability between treatments was greater than variability among replicate ecosystems within treatments (Fig. 3, Fig. 4). Multivariate analysis of group dispersion revealed that composition of microbes contributing to protein expression was significantly more variable in control replicates than in enriched replicates at both the family (P = 0.003) and class (P = 0.023) levels.

The BLAST search yielded taxonomic assignments for 191 and 173 of the 220 sequences in enriched and control treatments, respectively, and all E-values were less than 10⁻⁵. Of the top species hits identified in the BLAST search, Variovorax paradoxus and Chromobacterium violaceum were the only two of the six most abundant “species” contributing to identified proteins that were common to both treatments. Novosphingobium aromaticivorans, Starkeya novella, Sphingomonas wittichii, and Sphingomonas sp. were among the six most abundant contributors in control pitchers. Pseudogulbenkiania sp., Rhodanobacter denitrificans, Janthinobacterium sp., and Dechlorosoma suillum were among the six most abundant contributors in enriched pitchers (Appendix 1: Table S2). Obligate aerobic bacteria contributed the most to identified proteins in the control pitchers, while facultative anaerobic bacteria contributed the most in enriched pitchers (Fig. 5b).

Functional pathways represented by the top 220 expressed microbial proteins also differed between control and enriched pitchers. We detected significant differences in metabolic pathways, including those involved in the metabolism of amino acids, carbohydrates, lipids, secondary metabolites, cofactors & vitamins, and terpenoids & polyketides (Appendix 1: Table S1, Appendix 1: Fig. S7, Appendix 1: Fig. S8a) and, at coarser pathway levels, energy metabolism, nucleotide metabolism, and amino acid metabolism (Fig. 5a). In the control treatment, 161 of the top 220 protein hits were not assigned to a KEGG pathway (represented by 906 total peptides). Of the 220 top protein hits in the enriched treatment, 129 were not assigned to a pathway (represented by 2,375 total peptides).

Discussion

We hypothesized that there would be detectable differences in the taxonomic composition of microbes contributing to expressed proteins. Indeed, we observed striking differences between unenriched and enriched ecosystems in the taxonomic composition of the microbes contributing to identified proteins (Fig. 3). The taxonomic composition of bacteria contributing to protein expression in our study, and in our metagenome, is consistent with findings of previous studies of bacterial communities in Sarracenia species. S. purpurea pitchers contain more than 1,000 species of bacteria and a negligible amount of archaea (Paisie et al. 2014). One genomic study of S. alata pitcher bacterial communities revealed an abundance of Proteobacteria (primarily Gammaproteobacteria). Taxonomic groups within the Betaproteobacteria had relative abundances similar to our metagenome and to control pitcher communities in our experiment, with a high percentage of sequences derived from Burkholderiales and a lower proportion from the Neisseriales (Koopman et al. 2010). A study of sub-habitats in S. purpurea revealed an abundance of Betaproteobacteria (primarily Burkholderiales) on the pitcher walls and in the sediment, co-dominance in pitcher liquid by Beta- and Alphaproteobacteria, and the presence of Bacteroidetes and Firmicutes, though in a low proportion, in the sediment, fluid, and pitcher walls (Krieger and Kourtev 2012). This finding is fairly consistent with the taxonomic potential revealed by our metagenome, in which 35%, 23%, 14%, and 1% of identified contigs were mapped to Betaproteobacteria, Alphaproteobacteria, Bacteroidetes, and Firmicutes, respectively. Gray et al. (2012) found that bacterial communities in S. purpurea pitchers were composed primarily of Proteobacteria and Bacteroidetes, with Gammaproteobacteria, Alphaproteobacteria, or Betaproteobacteria dominating within the Proteobacteria, but that taxonomic composition varied from pitcher to pitcher within and across geographic regions.

The composition of bacteria contributing to protein expression in our experiment varied between control replicates, much more so than between enriched pitcher communities. This pattern is likely the result of a combination of factors. First, pitchers contain distinct sub-habitats that vary in light availability and concentration of dissolved oxygen and organic matter and therefore provide multiple habitats for a diverse set of microbes (Krieger and Kourtev 2012). As organic matter enrichment increases biological oxygen demand, the subsequent decline in dissolved oxygen may create a more homogeneous oxygen environment such that microbes sensitive to oxygen conditions can no longer compete against low-oxygen tolerant bacteria, decreasing bacterial diversity.

Low bacterial diversity in enriched pitchers echoes findings in larger enriched aquatic ecosystems. Analysis of the 16S rRNA gene product of bacterial communities in nutrient-enriched salt marsh sediments revealed that the diversity of active bacteria decreased relative to that of communities in unenriched sediments (Kearns et al. 2016). Similarly, enrichment of heterotrophic stream biofilm communities yielded lowered diversity; however, in contrast to our enriched pitcher communities, the stream biofilm communities diverged in composition (Van Horn et al. 2011).

The composition of microbes contributing to protein expression in S. purpurea pitchers was similar to that reported for larger freshwater aquatic ecosystems. Betaproteobacteria dominated the microbes contributing to protein expression in both enriched and control pitchers, though at higher abundance in enriched pitchers than in control pitchers. Betaproteobacteria are generally the most abundant class of bacteria in freshwater lakes (Percent et al. 2008, Newton et al. 2011) and dominate contaminated sediments (Haller et al. 2011) and organic aggregates in eutrophic lakes (Tang et al. 2009). Betaproteobacteria populations associated with the beta II clade have been shown to increase rapidly with the addition of organic carbon in humic lakes (Burkert et al. 2003, Kent et al. 2006). Furthermore, experimental dissolved organic matter additions to microcosms containing alpine lake bacteria cultures led to a near-dominance of Betaproteobacteria, suggesting that these bacteria are good competitors in enriched aquatic ecosystems (Perez and Sommaruga 2006). These results suggest that bacterial communities in S. purpurea pitchers are structured and behave like bacterial communities in larger lakes and ponds in response to enrichment. It is important to note that most existing literature on freshwater bacteria and S. purpurea bacterial communities relies primarily on genomic methods for identification and therefore is likely capturing both functionally active and inactive bacteria, whereas our methods capture only the functionally active bacteria. As a result, we use caution when directly comparing the results of our study to those in larger aquatic ecosystems. However, the Unipept search of our identified tryptic peptides and the NCBI BLAST search of their contigs of origin yielded remarkably similar results (Fig. 3a, Appendix 1: Fig. S6), suggesting that tryptic peptides could be used to correctly identify microbes contributing to identified proteins, though at coarser taxonomic levels than can be achieved by nucleic acid analysis.

We hypothesized that there would be detectable differences in the function of microbial communities in control and enriched pitchers. We measured function in two ways: first, we mapped identified bacterial classes associated with proteins to their oxygen requirements, and second, we mapped peptides to functional KEGG pathways. Oxygen requirements differed significantly between taxa contributing to protein expression in control and enriched microbial communities. Bacteria contributing to protein expression in control pitchers were predominantly aerobic, whereas bacteria contributing to protein expression in enriched pitchers were primarily facultatively anaerobic. The difference in oxygen requirement of contributing bacteria between the two treatments was driven largely by two taxa: the obligate aerobe Variovorax paradoxus (28.4% of total peptides in the control treatment and 7.2% in the enriched treatment) and the facultative anaerobe Chromobacterium violaceum (53.3% of total peptides in the enriched treatment and 6.6% in the control treatment) (Appendix 1: Table S2). Peptides that mapped to C. violaceum in the BLAST search mapped in the Unipept search to Aquitalea magnusonii, a betaproteobacterium most closely related to C. violaceum, isolated from a humic lake in Wisconsin, USA (Lau et al. 2006). Although we did not measure dissolved oxygen during the field experiment, enriched pitchers in a subsequent experiment enriched with the same concentration of organic matter became hypoxic within 48 hours, suggesting that pitchers in the field were likely hypoxic (Sirota et al. 2013). Dissolved oxygen concentration is one of three primary drivers of bacterial community composition in eutrophic, dimictic lakes (Shade et al. 2007) and appears to also drive the composition of functionally active bacteria in enriched S. purpurea pitchers.

We expected to see a high proportion of obligate anaerobic bacteria in enriched pitchers. Bacteroidetes and, to a lesser degree, Firmicutes have been found to inhabit S. purpurea pitchers (Krieger and Kourtev 2012); however, we identified very few proteins associated with these taxa. Of the 3008 and 969 peptides associated with the top 220 proteins in enriched and control treatments, respectively, we found only 17 peptides associated with obligate anaerobes in the enriched pitchers (7 of which were associated with Firmicutes) and 13 associated with obligate anaerobes in the control pitchers (3 of which were associated with Firmicutes). Though we did find a higher number of peptides associated with Bacteroidetes (74 peptides in control pitchers and 89 in enriched pitchers), they were from facultative anaerobes and not strict anaerobes. It is likely that the low numbers of identified peptides associated with these taxa in experimental pitchers are the result of a skewed protein database. Our database was built using metagenomic data from pitchers in the field, the majority of which are oxygen-rich (Adlassnig et al. 2011), and likely contained nucleotide sequences primarily from aerobic and facultative anaerobic bacteria. Additionally, pitchers are generally oxygen-rich due to photosynthetic activity of the plant and therefore primarily harbor aerobic inquilines (Adlassnig et al. 2011). Even when dissolved oxygen is low, there is a constant flux of oxygen into the pitcher fluid, and so the pitchers are rarely truly anoxic. It is not surprising, therefore, that peptides associated with anaerobic bacteria were rare. In the absence of a fully representative database, we feel that the higher number of proteins represented by facultative bacteria in enriched pitchers relative to control pitchers is a good indicator of changing oxygen conditions. These results are consistent with the shift to a hypoxic state when S. purpurea is enriched with additional prey (Sirota et al. 2013).
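The peptide counts quoted in this paragraph can be put on a common scale by expressing them as percentages of each treatment's total. The short Python sketch below does only that arithmetic; the variable and function names are illustrative and are not taken from the authors' archived code.

```python
# Peptide counts quoted in the paragraph above (top 220 proteins per treatment).
TOTAL_PEPTIDES = {"enriched": 3008, "control": 969}
OBLIGATE_ANAEROBE_PEPTIDES = {"enriched": 17, "control": 13}
BACTEROIDETES_PEPTIDES = {"enriched": 89, "control": 74}


def percent_of_total(counts, totals):
    """Return each treatment's count as a percentage of its peptide total."""
    return {treatment: 100.0 * counts[treatment] / totals[treatment]
            for treatment in counts}


anaerobe_pct = percent_of_total(OBLIGATE_ANAEROBE_PEPTIDES, TOTAL_PEPTIDES)
bacteroidetes_pct = percent_of_total(BACTEROIDETES_PEPTIDES, TOTAL_PEPTIDES)

for treatment in ("control", "enriched"):
    print(f"{treatment}: obligate anaerobes {anaerobe_pct[treatment]:.2f}%, "
          f"Bacteroidetes {bacteroidetes_pct[treatment]:.2f}%")
```

On these counts, obligate anaerobes account for well under 2% of peptides in both treatments, which is the sense in which the text calls them rare.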

We assigned KEGG pathways to contigs in the metagenome and to protein identifications in the metaproteomes to compare microbial community function between control and enriched pitchers, and between the metaproteomes and functional potential in the metagenome. Not surprisingly, the functional potential revealed by the metagenome differed from function revealed by the metaproteomes. Amino acid metabolism and carbohydrate metabolism were represented in the top five rank-ordered pathways in both the metaproteomes and the metagenome; however, carbohydrate metabolism was ranked first in the metaproteomes (~34-40% of total peptides) and third in the metagenome (~12% of mapped contigs). Nucleotide metabolism and energy metabolism were represented in the top five in the metaproteomes (~18% of total peptides in controls and ~34% of total peptides in enriched pitchers), but were ranked 9th (~4%) and 7th (~5%) in the metagenome, respectively. Such differences could be a result of not all nucleotide sequences being transcribed and translated to proteins, but may also be an artifact of only including 220 proteins from each treatment in the metaproteome analysis.

We hesitate to hypothesize broader relevance of our functional pathway results for two reasons. First, we are most interested in the identification of proteins that can serve as biomarkers of aquatic ecosystem state changes. Whereas we expect that functional information will be useful for determining the utility and generality of such biomarkers, it is not necessary for finding useful biomarkers. Second, it seems impossible, with our limited data, to identify a complete set of functions. With that caveat, we found that coarse KEGG pathway assignments differed between control and enriched microbial communities. Enriched pitchers contained significantly more microbial biomass, as evidenced by the size of the microbial pellets post-centrifugation. When samples were pooled and total peptide counts were normalized, chi-square analysis revealed an enrichment of peptides associated with energy metabolism in enriched pitchers.

These results are consistent with patterns seen in larger aquatic ecosystems: mineralization of organic matter, an effect of microbial energy metabolism, has been shown to increase along trophic gradients, with bacteria contributing most to mineralization in eutrophic freshwater lakes (Simcic 2005). Not surprisingly, peptides associated with processes requiring oxygen, including oxidative phosphorylation and the citric acid cycle, were enriched in oxygen-rich control pitcher microbial communities. One protein associated with the citric acid cycle, isocitrate lyase, was present in the top 20 rank-ordered protein identifications in the enriched treatment, but not in the control treatment. This protein, which has been found to be upregulated during periods of oxygen depletion in M. tuberculosis (Wayne and Lin 1982), could be a candidate biomarker for an impending tipping point in the S. purpurea microecosystem. Though we did not find a significant difference in lipid metabolism pathways between control and enriched pitcher proteins, there was a trend for increased pathway representation of unsaturated fatty acid biosynthesis and fatty acid elongation in enriched pitchers. Such an increase has been found in bacteria in low-oxygen or anaerobic conditions, primarily resulting from an increase in membrane lipids (Lemmer et al. 2015). While these differences do not immediately reveal a functional explanation, it is promising that there were signatures of detectable differences in the protein profiles between treatments. Such differences imply that there are changes in the expression of the most abundant proteins in the most abundant taxa related to organic matter loading.

In larger aquatic systems, traditional water quality indicators may not provide enough lead-time to forecast a tipping point (Contamin and Ellison 2009), especially if they lag behind changes in the microbial community. We hypothesize that microbial proteins may be more sensitive and timely indicators of impending tipping points than traditional chemical markers of water quality. We argue that even though metatranscriptomic and metagenomic methods have superior throughput, metaproteomic methods can inexpensively, rapidly, and simultaneously characterize the function and (indirectly) composition of the active microbial community members responsible for processes related to aquatic ecosystem state changes. Our study includes a small, semi-quantitative initial sampling at only a single time point and therefore does not yet enable a proteomic analysis comprehensive enough to determine the identity of biomarkers or place them in an ecological context. Future studies using more sensitive instrumentation will allow for the identification of a larger number of proteins. Time series of environmental proteomics data and quantitative analysis of changes in protein abundances prior to state changes will allow for the identification and ecological characterization of tipping point biomarkers.

Acknowledgments

This work was funded by the National Science Foundation (grant numbers 1144055 and 1144056). Proteomic analysis was funded by the Vermont Genetics Network through U.S. National Institutes of Health Grant 8P20GM103449 from the INBRE program of the NIGMS. The authors thank Hailee Tenander for assisting with preparation of samples for mass spectrometry analysis.

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession NMRC01000000. The version described in this paper is version NMRC01000000. The protein database and all code used to analyze the data are freely available on the Harvard Forest Data Archive under ID number HF295.

References

Adlassnig, W., M. Peroutka, and T. Lendl. 2011. Traps of carnivorous pitcher plants as a habitat: composition of the fluid, biodiversity and mutualistic activities. Annals of Botany 107:181-194.

Benjamini, Y., and Y. Hochberg. 1995. Controlling the False Discovery Rate: a Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society Series B-Methodological 57:289-300.

Benndorf, D., G. U. Balcke, H. Harms, and M. von Bergen. 2007. Functional metaproteome analysis of protein extracts from contaminated soil and groundwater. ISME Journal 1:224-234.

Bestelmeyer, B. T., A. M. Ellison, W. R. Fraser, K. B. Gorman, S. J. Holbrook, C. M. Laney, M. D. Ohman, D. P. C. Peters, F. C. Pillsbury, A. Rassweiler, R. J. Schmitt, and S. Sharma. 2011. Analysis of abrupt transitions in ecological systems. Ecosphere 2.

Biggs, R., S. R. Carpenter, and W. A. Brock. 2009. Turning back from the brink: Detecting an impending regime shift in time to avert it. Proceedings of the National Academy of Sciences of the United States of America 106:826-831.

Burkert, U., F. Warnecke, D. Babenzien, E. Zwirnmann, and J. Pernthaler. 2003. Members of a readily enriched beta-proteobacterial clade are common in surface waters of a humic lake. Applied and Environmental Microbiology 69:6550-6559.

Butler, J. L., N. J. Gotelli, and A. M. Ellison. 2008. Linking the brown and green: Nutrient transformation and fate in the Sarracenia microecosystem. Ecology 89:898-904.

Campos, A., S. Tedesco, V. Vasconcelos, and S. Cristobal. 2012. Proteomic research in bivalves: Towards the identification of molecular markers of aquatic pollution. Journal of Proteomics 75:4346-4359.

Carpenter, S. R., W. A. Brock, J. J. Cole, J. F. Kitchell, and M. L. Pace. 2008. Leading indicators of trophic cascades. Ecology Letters 11:128-138.

Cheerathodi, M., and B. A. Ballif. 2011. Identification of CrkL-SH3 Binding Proteins from Embryonic Murine Brain: Implications for Reelin Signaling during Brain Development. Journal of Proteome Research 10:4453-4462.

Chrost, R. J., and W. Siuda. 2006. Microbial production, utilization, and enzymatic degradation of organic matter in the upper trophogenic layer in the pelagial zone of lakes along a eutrophication gradient. Limnology and Oceanography 51:749-762.

Colatriano, D., A. Ramachandran, E. Yergeau, R. Maranger, Y. Gelinas, and D. A. Walsh. 2015. Metaproteomics of aquatic microbial communities in a deep and stratified estuary. Proteomics 15:3566-3579.

Conesa, A., S. Gotz, J. M. Garcia-Gomez, J. Terol, M. Talon, and M. Robles. 2005. Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics 21:3674-3676.

Contamin, R., and A. M. Ellison. 2009. Indicators of regime shifts in ecological systems: What do we need to know and when do we need to know it? Ecological Applications 19:799-816.

Correll, D. L. 1998. The role of phosphorus in the eutrophication of receiving waters: a review. Journal of Environmental Quality 27:261-266.

Dakos, V., S. R. Carpenter, E. H. van Nes, and M. Scheffer. 2015. Resilience indicators: prospects and limitations for early warnings of regime shifts. Philosophical Transactions of the Royal Society B-Biological Sciences 370.

Dennison, W. C., R. J. Orth, K. A. Moore, J. C. Stevenson, V. Carter, S. Kollar, P. W. Bergstrom, and R. A. Batiuk. 1993. Assessing water-quality with submersed aquatic vegetation. BioScience 43:86-94.

Elias, J. E., and S. P. Gygi. 2007. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nature Methods 4:207-214.

Ellison, A. M., N. J. Gotelli, J. S. Brewer, D. L. Cochran-Stafira, J. M. Kneitel, T. E. Miller, A. C. Worley, and R. Zamora. 2003. The evolutionary ecology of carnivorous plants. Advances in Ecological Research 33:1-74.

Farnsworth, E. J., and A. M. Ellison. 2008. Prey availability directly affects physiology, growth, nutrient allocation and scaling relationships among leaf traits in 10 carnivorous plant species. Journal of Ecology 96:213-221.

Gray, S. M., D. M. Akob, S. J. Green, and J. E. Kostka. 2012. The Bacterial Composition within the Sarracenia purpurea Model System: Local Scale Differences and the Relationship with the Other Members of the Food Web. PLoS ONE 7.

Haller, L., M. Tonolla, J. Zopfi, R. Peduzzi, W. Wildi, and J. Pote. 2011. Composition of bacterial and archaeal communities in freshwater sediments with different contamination levels (Lake Geneva, Switzerland). Water Research 45:1213-1228.

Hargeby, A., I. Blindow, and G. Andersson. 2007. Long-term patterns of shifts between clear and turbid states in Lake Krankesjon and Lake Takern. Ecosystems 10:29-36.

Hurlbert, S. H. 1971. The nonconcept of species diversity: A critique and alternative parameters. Ecology 52:577-586.

Jayapal, K. P., R. J. Philp, Y. J. Kok, M. G. S. Yap, D. H. Sherman, T. J. Griffin, and W. S. Hu. 2008. Uncovering Genes with Divergent mRNA-Protein Dynamics in Streptomyces coelicolor. PLoS ONE 3.

Kanehisa, M., S. Goto, Y. Sato, M. Kawashima, M. Furumichi, and M. Tanabe. 2014. Data, information, knowledge and principle: back to metabolism in KEGG. Nucleic Acids Research 42:D199-D205.

Kearns, P. J., J. H. Angell, E. M. Howard, L. A. Deegan, R. H. R. Stanley, and J. L. Bowen. 2016. Nutrient enrichment induces dormancy and decreases diversity of active bacteria in salt marsh sediments. Nature Communications 7.

Kent, A. D., S. E. Jones, G. H. Lauster, J. M. Graham, R. J. Newton, and K. D. McMahon. 2006. Experimental manipulations of microbial food web interactions in a humic lake: shifting biological drivers of bacterial community structure. Environmental Microbiology 8:1448-1459.

Koopman, M. M., and B. C. Carstens. 2011. The microbial phylogeography of the carnivorous plant Sarracenia alata. Microbial Ecology 61:750-758.

Koopman, M. M., D. M. Fuselier, S. Hird, and B. C. Carstens. 2010. The Carnivorous Pale Pitcher Plant Harbors Diverse, Distinct, and Time-Dependent Bacterial Communities. Applied and Environmental Microbiology 76:1851-1860.

Krieger, J. R., and P. S. Kourtev. 2012. Bacterial diversity in three distinct sub-habitats within the pitchers of the northern pitcher plant, Sarracenia purpurea. FEMS Microbiology Ecology 79:555-567.

Lau, H. T., J. Faryna, and E. W. Triplett. 2006. Aquitalea magnusonii gen. nov., sp. nov., a novel Gram-negative bacterium isolated from a humic lake. International Journal of Systematic and Evolutionary Microbiology 56:867-871.

Lauro, F. M., M. Z. DeMaere, S. Yau, M. V. Brown, C. Ng, D. Wilkins, M. J. Raftery, J. A. E. Gibson, C. Andrews-Pfannkoch, M. Lewis, J. M. Hoffman, T. Thomas, and R. Cavicchioli. 2011. An integrative study of a meromictic lake ecosystem in Antarctica. ISME Journal 5:879-895.

Lemmer, K. C., A. C. Dohnalkova, D. R. Noguera, and T. J. Donohue. 2015. Oxygen-Dependent Regulation of Bacterial Lipid Production. Journal of Bacteriology 197:1649-1658.

Levin, P. S., and C. Mollmann. 2015. Marine ecosystem regime shifts: challenges and opportunities for ecosystem-based management. Philosophical Transactions of the Royal Society B-Biological Sciences 370.

Mesuere, B., T. Williams, F. Van der Jeugt, B. Devreese, P. Vandamme, and P. Dawyndt. 2016. Unipept web services for metaproteomics analysis. Bioinformatics.

Meyer, F., D. Paarmann, M. D’Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards. 2008. The Metagenomics RAST Server – a Public Resource for the Automatic Phylogenetic and Functional Analysis of Metagenomes. BMC Bioinformatics 9:386.

Newton, R. J., S. E. Jones, A. Eiler, K. D. McMahon, and S. Bertilsson. 2011. A Guide to the Natural History of Freshwater Lake Bacteria. Microbiology and Molecular Biology Reviews 75:14-49.

Oksanen, J., F. Blanchet, R. Kindt, P. Legendre, P. R. Minchin, R. B. O'Hara, G. L. Simpson, P. Solymos, M. H. H. Stevens, E. Szoecs, and H. Wagner. 2016. Vegan: community ecology package, R Package Version 2.4-1 edn.

Oksanen, J., F. G. Blanchet, R. Kindt, P. Legendre, P. R. Minchin, R. B. O’Hara, et al. 2012. Vegan: community ecology package, R Package Version 2.1-17 edn.

Ondov, B. D., N. H. Bergman, and A. M. Phillippy. 2011. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics 12.

Pan, Y. D., R. J. Stevenson, B. H. Hill, A. T. Herlihy, and G. B. Collins. 1996. Using diatoms as indicators of ecological conditions in lotic systems: A regional assessment. Journal of the North American Benthological Society 15:481-495.

Percent, S. F., M. E. Frischer, P. A. Vescio, E. B. Duffy, V. Milano, M. McLellan, B. M. Stevens, C. W. Boylen, and S. A. Nierzwicki-Bauer. 2008. Bacterial community structure of acid-impacted lakes: What controls diversity? Applied and Environmental Microbiology 74:1856-1868.

Perez, M. T., and R. Sommaruga. 2006. Differential effect of algal- and soil-derived dissolved organic matter on alpine lake bacterial community composition and activity. Limnology and Oceanography 51:2527-2537.

Peterson, C. N., S. Day, B. E. Wolfe, A. M. Ellison, R. Kolter, and A. Pringle. 2008. A keystone predator controls bacterial diversity in the pitcher-plant (Sarracenia purpurea) microecosystem. Environmental Microbiology 10:2257-2266.

Rabalais, N. N., R. E. Turner, R. J. Diaz, and D. Justic. 2009. Global change and eutrophication of coastal waters. ICES Journal of Marine Science 66:1528-1537.

Reddy, T. B. K., A. D. Thomas, D. Stamatis, J. Bertsch, M. Isbandi, J. Jansson, J. Mallajosyula, I. Pagani, E. A. Lobos, and N. C. Kyrpides. 2015. The Genomes OnLine Database (GOLD) v.5: a metadata management system based on a four level (meta)genome project classification. Nucleic Acids Research 43:D1099-D1106.

Scheffer, M. 2009. Critical Transitions in Nature and Society. Princeton University Press.

Scheffer, M., J. Bascompte, W. A. Brock, V. Brovkin, S. R. Carpenter, V. Dakos, H. Held, E. H. van Nes, M. Rietkerk, and G. Sugihara. 2009. Early-warning signals for critical transitions. Nature 461:53-59.

Shade, A., A. D. Kent, S. E. Jones, R. J. Newton, E. W. Triplett, and K. D. McMahon. 2007. Interannual dynamics and phenology of bacterial communities in a eutrophic lake. Limnology and Oceanography 52:487-494.

Simcic, T. 2005. The role of plankton, zoobenthos, and sediment in organic matter degradation in oligotrophic and eutrophic mountain lakes. Hydrobiologia 532:69-79.

Sirota, J., B. Baiser, N. J. Gotelli, and A. M. Ellison. 2013. Organic-matter loading determines regime shifts and alternative states in an aquatic ecosystem. Proceedings of the National Academy of Sciences of the United States of America 110:7742-7747.

Sondergaard, M., L. S. Johansson, T. L. Lauridsen, T. B. Jorgensen, L. Liboriussen, and E. Jeppesen. 2010. Submerged macrophytes as indicators of the ecological quality of lakes. Freshwater Biology 55:893-908.

Sowell, S. M., P. E. Abraham, M. Shah, N. C. Verberkmoes, D. P. Smith, D. F. Barofsky, and S. J. Giovannoni. 2011. Environmental proteomics of microbial plankton in a highly productive coastal upwelling system. ISME Journal 5:856-865.

Srivastava, D. S., J. Kolasa, J. Bengtsson, A. Gonzalez, S. P. Lawler, T. E. Miller, P. Munguia, T. Romanuk, D. C. Schneider, and M. K. Trzcinski. 2004. Are natural microcosms useful model systems for ecology? Trends in Ecology & Evolution 19:379-384.

Tang, X. M., G. Gao, B. Q. Qin, L. P. Zhu, J. Y. Chao, J. J. Wang, and G. J. Yang. 2009. Characterization of bacterial communities associated with organic aggregates in a large, shallow, eutrophic freshwater lake (Lake Taihu, China). Microbial Ecology 58:307-322.

Timinskas, K., M. Balvociute, A. Timinskas, and C. Venclovas. 2014. Comprehensive analysis of DNA polymerase III alpha subunits and their homologs in bacterial genomes. Nucleic Acids Research 42:1393-1413.

Ullrich, N., P. Casper, A. Otto, and M. O. Gessner. 2016. Proteomic evidence of methanotrophy in methane-enriched hypolimnetic lake water. Limnology and Oceanography 61:S91-S100.

Van Horn, D. J., R. L. Sinsabaugh, C. D. Takacs-Vesbach, K. R. Mitchell, and C. N. Dahm. 2011. Response of heterotrophic stream biofilm communities to a gradient of resources. Aquatic Microbial Ecology 64:149-161.

Vogel, C., and E. M. Marcotte. 2012. Insights into the regulation of protein abundance from proteomic and transcriptomic analyses. Nature Reviews Genetics 13:227-232.

Wayne, L. G., and K. Lin. 1982. Glyoxylate Metabolism and Adaptation of Mycobacterium tuberculosis to Survival under Anaerobic Conditions. Infection and Immunity 37:1042-1049.

Table 1. Results of chi-square analysis of bacterial classes in control and enriched pitchers. Bolded values represent those in which the adjusted P value is <0.05.

Class                  Control Peptides   Enriched Peptides   Adjusted P (chi-square)
Acidobacteria          6                  0                   0.000
Actinobacteria         32                 3                   0.000
Alphaproteobacteria    276                196                 0.000
Bacteroidia            12                 16                  0.059
Betaproteobacteria     469                2448                0.000
Chloroflexi            0                  3                   0.816
Clostridia             3                  7                   0.959
Cytophagia             14                 17                  0.021
Deltaproteobacteria    2                  0                   0.132
Flavobacteriia         0                  3                   0.816
Gammaproteobacteria    50                 146                 0.816
Gloeobacteria          8                  0                   0.000
Sphingobacteriia       48                 53                  0.000
Spirochaetia           11                 9                   0.006
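The exact form of the chi-square test behind Table 1 is not described in this section. As a hedged illustration only, the sketch below assumes a 2x2 Pearson chi-square (focal class versus all other classes, control versus enriched, no continuity correction) followed by a Benjamini-Hochberg adjustment, which the reference list suggests was the correction used. The function names are hypothetical, and the output is not expected to reproduce the table's adjusted values exactly.

```python
import math

# Pooled peptide counts per class from Table 1: (control, enriched).
CLASS_COUNTS = {
    "Acidobacteria": (6, 0), "Actinobacteria": (32, 3),
    "Alphaproteobacteria": (276, 196), "Bacteroidia": (12, 16),
    "Betaproteobacteria": (469, 2448), "Chloroflexi": (0, 3),
    "Clostridia": (3, 7), "Cytophagia": (14, 17),
    "Deltaproteobacteria": (2, 0), "Flavobacteriia": (0, 3),
    "Gammaproteobacteria": (50, 146), "Gloeobacteria": (8, 0),
    "Sphingobacteriia": (48, 53), "Spirochaetia": (11, 9),
}


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (df = 1, no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


total_control = sum(c for c, _ in CLASS_COUNTS.values())
total_enriched = sum(e for _, e in CLASS_COUNTS.values())

names = list(CLASS_COUNTS)
raw_p = []
for name in names:
    control, enriched = CLASS_COUNTS[name]
    # Rows: control, enriched. Columns: focal class, all other classes.
    _, p = chi2_2x2(control, total_control - control,
                    enriched, total_enriched - enriched)
    raw_p.append(p)

adjusted_p = dict(zip(names, benjamini_hochberg(raw_p)))
for name in names:
    print(f"{name:22s} adjusted P = {adjusted_p[name]:.3f}")
```

A class-versus-rest layout is only one reasonable choice; a single test on the full 14x2 table, or tests on normalized counts, would answer a slightly different question.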

Table 2. Effect of treatment on microbial proteins, contributing taxa (class and family), and pathways. Bolded values represent those in which the adjusted P value is <0.05.

             Proteins                     Taxa (Class)                 Taxa (Family)                KEGG Pathways
             df   F       R2      P       df   F       R2      P       df   F       R2      P       df   F       R2      P
Treatment    1    4.217   0.319   0.004   1    3.766   0.295   0.022   1    4.218   0.319   0.003   1    4.753   0.373   0.024
Residuals    9            0.681           9            0.705           9            0.681           9            0.627
Total        10           1.000           10           1.000           10           1.000           10           1.000
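In a one-factor permutational analysis of variance of the kind reported by adonis, R2 is the treatment share of the total sum of squares, so the pseudo-F statistic can be recovered from R2 and the degrees of freedom as F = (R2 / df_treatment) / ((1 - R2) / df_residual). The sketch below assumes this standard partition applies to Table 2; the function name is hypothetical, and the check is shown for the first two column groups only.

```python
def pseudo_f(r2, df_treatment, df_residual):
    """Pseudo-F implied by R2 in a one-factor PERMANOVA partition."""
    return (r2 / df_treatment) / ((1.0 - r2) / df_residual)


# R2 and degrees of freedom as printed in Table 2.
table2_rows = {
    "Proteins": (0.319, 1, 9),
    "Taxa (Class)": (0.295, 1, 9),
}

for response, (r2, df_t, df_r) in table2_rows.items():
    print(f"{response}: F = {pseudo_f(r2, df_t, df_r):.3f}")
```

Small differences from the tabulated F values are expected because R2 is printed to only three decimals.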

[Figure 1: flow diagram. Control (N=5) and enriched (N=6) samples; in-gel digest and LC-MS/MS; protein ID via search of custom metagenome database; protein analysis (blastp ortholog designation, blast2go), taxonomic analysis, and functional analysis.]

Fig. 1. Pipeline for data collection and analysis. Proteins from the microbial communities in experimentally enriched and ambient control pitcher fluid were processed using SDS-PAGE, tryptic digest, LC-MS/MS, and a SEQUEST search of a custom metagenomic database. The composition of microbial communities was determined using a BLAST homology search of metagenomic data associated with identified proteins. Protein identity and annotation were determined via a blastp search to identify orthologs and blast2go.

37 (a) (b)

Shared Proteins 0.10

C E 0.08 0.06 Density 155 65 155 0.04 0.02 0.00 60 80 100 120 140 160 180 Number of shared proteins

(c)

C E

-elongation factor tu - (158) -elongation factor tu - (623) -f0f1 atp synthase subunit beta - (105) -f0f1 atp synthase subunit beta - (192) -molecular chaperone - (100) -molecular chaperone - (162) -branched-chain amino acid abc transporter -porin - (128) substrate-binding protein - (53) -branched-chain amino acid abc transporter -porin - (38) substrate-binding protein - (126) -f0f1 atp synthase subunit alpha - (34) -atp synthase beta subunit - (113) -phosphate abc transporter -dna-directed rna polymerase subunit beta - (84) substrate-binding protein - (30) -abc transporter -heat shock protein 60 - (28) substrate-binding protein - (81) -dna-directed rna polymerase subunit beta - (22) -outer membrane protein - (69) --dependent receptor plug - (19) -isocitrate lyase - (58) -phosphate abc transporter -glutamine synthetase - (19) substrate-binding protein - (54) -malate dehydrogenase - (17) -3-hydroxyacyl- dehydrogenase - (53) -membrane protein - (16) -aldehyde dehydrogenase - (50) -rna polymerase sigma factor - (16) -atp synthase alpha subunit - (46) -outer membrane protein - (13) -glutamine synthetase - (44) -polymerase - (13) -phasin family protein - (44) --dependent receptor - (12) -ribosomal protein l1 - (36) -abc transporter -malate dehydrogenase - (34) substrate-binding protein - (12) -atp synthase beta subunit - (11) -rna polymerase sigma factor - (34) -60 kda partial - (9) -ribosomal l10 family protein - (34) -not present - (369) -not present - (1154) 745

Fig. 2. Protein identifications differed between control and enriched pitchers. (a) Protein hits shared between control and enriched treatments. (b) Results of a randomization test in which 220 protein hits were randomly assigned to each treatment and the number of shared protein hits was calculated. Red line indicates the actual shared number of proteins. Grey probability density function indicates the 95% confidence interval for the simulated shared protein hit values. (c) Top 20 proteins in rank order for each treatment. Proteins are ranked by the number of total peptides associated with them (in parentheses). Identical proteins in both treatments are connected by lines. Blue lines indicate proteins unique to the top 20 in control pitchers (C) and brown lines indicate proteins unique to the top 20 in enriched pitchers (E).


Fig. 3. Distinctly different microbial communities contributed to protein expression in control and enriched pitchers. The proportion of total peptides from the top 220 proteins associated with particular microbial classes (a) and families (b) in all enriched and control replicate pitchers.

[Figure 4. Four NMDS ordination panels (axes NMDS1 and NMDS2), each with Control and Enriched groups: PROTEIN HITS, TAXA (CLASS), TAXA (FAMILY), and KEGG PATHWAYS.]

Fig. 4. Microbial communities in control and enriched pitchers differ in the proteins they produce, taxa that contribute to protein expression, and function. Ordination of Bray-Curtis dissimilarities of total peptides shows clustering of pitcher microbial communities by treatment for protein hits (adonis P = 0.004), microbial classes (adonis P = 0.022), microbial families (adonis P = 0.003) and KEGG pathways (adonis P = 0.003) as a function of treatment (control or enriched).
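As a rough illustration of the dissimilarity measure named in this caption, the sketch below computes the Bray-Curtis dissimilarity between two samples from their peptide counts. It is a minimal example and not the code used in the study: the function name `bray_curtis` and the counts are hypothetical, and the ordination and adonis steps are not reproduced.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length count vectors.

    Returns 0.0 for identical samples and 1.0 for samples with no overlap.
    """
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("at least one count must be non-zero")
    # Sum of absolute differences divided by the sum of all counts.
    return sum(abs(a - b) for a, b in zip(x, y)) / total


# Hypothetical peptide counts for three proteins in two pitchers.
pitcher_a = [10, 0, 5]
pitcher_b = [4, 6, 5]
print(bray_curtis(pitcher_a, pitcher_b))
```

A full analysis would compute this value for every pair of pitchers and pass the resulting matrix to an ordination routine.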

[Figure 5. Panel (a): heat map with a color key from 0 to 0.4 (proportion of total peptide identifications in treatment/replicate). Rows: Carbohydrate metabolism, Amino acid metabolism, Nucleotide metabolism, Energy metabolism, Xenobiotics biodegradation and metabolism, Biosynthesis of other secondary metabolites, Lipid metabolism, Metabolism of other amino acids, Metabolism of terpenoids and polyketides, Metabolism of cofactors and vitamins, Signal transduction, Translation. Columns: C, E, and individual replicates. Panel (b): proportion of all peptides in C and E by oxygen requirement (Aerobe, Anaerobe, Facultative, Unclassified).]

Fig. 5. Microbial function differed between control and enriched pitchers. (a) Heat map of the proportional representation of coarse-level KEGG pathways between control pitchers (C) and enriched pitchers (E) and individual control (H4, H6, E3, C4) and enriched (H1, H2, H3, 2C, 5B, 5A) replicates. Significantly different pathways between pooled control and enriched samples are indicated with “*”. (b) Oxygen requirement of microbial classes contributing to protein expression as a proportion of all peptides in control (C) and enriched (E) pitchers.

Appendix S1

Detailed Methods

Field Experiment

Starting June 10, 2011, we selected newly opened, and therefore sterile (Peterson et al., 2008), S. purpurea pitchers for five days until 20 pitchers were selected. One pitcher from each group was randomly assigned to one of two treatments: ambient control and detritus-enriched. The average pitcher length, measured from the base of the pitcher along the back of the keel to the top of the hood, was 12.4 ± 2.3 cm. Pitcher volume was not measured during the experiment. The final average volume of fluid in the pitchers was 9.6 ± 5.8 ml.

After the first rain, initial samples of 1.5 ml of pitcher fluid were drawn from all pitchers and replaced with 1.5 ml of deionized water. In the detrital enrichment treatment, each pitcher received 1 mg of detritus per day between 7:00 am and 9:00 am. Wasps were ground in a coffee grinder, dried for 48-72 hours in an oven at 70 °C, weighed, and stored in a −20 °C freezer until used.

Sampling

Initial 1.5-ml samples and the entire final contents of each pitcher were drawn independently through the frit of separate Bio-Rad (Hercules, California, USA) Poly-Prep chromatography columns to remove any organisms larger than 30 microns. For each sample, the filtrate was centrifuged in 2 ml aliquots at 13,000 × g to concentrate the microbial assemblage and the supernatant was removed. The resulting microbial pellet was stored at −80 °C. All frozen samples were transported on dry ice from Harvard Forest to the University of Vermont (June 29, 2011), where they were stored at −80 °C until processed.

Metagenome Extraction and Sequencing

We used the DNeasy Blood and Tissue kit (Qiagen) to extract DNA from the microbial pellets of three pitchers using the Purification of Total DNA from Animal Tissues Spin-Column Protocol (pages 28-30 of the handbook dated 07/2006). Samples were pre-treated with proteinase K (as described on page 45 of the booklet). For each pitcher, one pellet was also pre-treated with lysozyme during the extraction. Five percent of genomic DNA preparation was loaded on a 1% agarose gel (Appendix 1: Fig. S2a). Samples from all six preparations were pooled and the DNA was precipitated with 0.1 volume 3 M sodium acetate and two volumes of absolute ethanol. The pooled samples were then centrifuged and the precipitated DNA was washed with 75% ethanol and then resuspended in water. More than 10 µg of total DNA was sent for library construction, sequencing and assembly to Génome Québec (Montréal, QC, Canada) using the 454 GS-FLX Titanium Sequencing System (Roche).

Protein Extraction, SDS-PAGE, and Mass Spectrometry

Microbial pellets were resuspended in 100 µl of bromophenol blue sample buffer (150 mM Tris pH 6.8, 2% SDS, 5% β-mercaptoethanol, 7.8% glycerol) and boiled at 95 °C for five minutes. All samples were diluted proportional to their pellet size to obtain similar staining levels. After centrifugation, samples were loaded into separate lanes of a 10% polyacrylamide (37.5:1 acrylamide:bis-acrylamide) gel and subjected to SDS-PAGE and Coomassie staining (Fig. 1, Appendix 1: Fig. S1a, and Appendix 1: Fig. S1b).

All six of the enriched pitchers and five of the six control pitchers had visible protein staining levels and were chosen for mass spectrometry. These 11 sample lanes were each divided into five regions (Appendix 1: Fig. S1b) and each region was diced into 1 mm³ pieces. Gel cubes were rinsed with HPLC-grade water, incubated at 37 °C for 30 minutes in 1 ml of destain solution (50 mM ammonium bicarbonate, 50% acetonitrile), and dehydrated in 100% acetonitrile for 10 min in order to remove the Coomassie stain. This destain procedure was repeated a second time to ensure complete removal of the stain.

An in-gel tryptic digest was performed by submerging the dehydrated gel pieces in ice-cold sequencing-grade modified trypsin (6 ng/µl) (Promega, Fitchburg, WI, USA) for 15 minutes, adding ice-cold 50 mM ammonium bicarbonate solution, letting the gel pieces swell on ice, and then incubating the pieces overnight at 37 °C. Digests were centrifuged at 13,000 × g for five minutes and the peptide-containing supernatant transferred to a 0.6 ml tube. Peptides were further extracted from the gel pieces by adding 100 µl of 50% acetonitrile and 2.5% formic acid, centrifuging for 15 minutes at 13,000 × g, and dehydrating in 100% acetonitrile. All extracted peptides were pooled, dried in a SpeedVac for 1 hour, and stored at −80 °C.

Custom Metagenomic and Protein Databases

We generated a custom protein database from a six-frame forward and reverse translation of a metagenomic database constructed from microbial communities of three previously collected pitchers that had captured diverse amounts of prey (Appendix 1: Fig. S2). Pitchers were collected from Molly Bog, an ombrotrophic bog located in Morristown, VT (44.50, −72.64) on August 18, 2008 and transported in a cooler directly from the field to the University of Vermont. Microbial pellets were obtained immediately as described above.

We used the DNeasy Blood and Tissue kit (Qiagen) to extract DNA from the microbial pellets of three pitchers using the Purification of Total DNA from Animal Tissues Spin-Column Protocol (pages 28-30 of the handbook dated 07/2006). Samples were pre-treated with proteinase K (as described on page 45 of the booklet). For each pitcher, one pellet was also pre-treated with lysozyme during the extraction. Five percent of genomic DNA preparation was loaded on a 1% agarose gel (Appendix 1: Fig. S2a). Samples from all six preparations were pooled and the DNA was precipitated with 0.1 volume 3 M sodium acetate and two volumes of absolute ethanol. The pooled samples were then centrifuged and the precipitated DNA was washed with 75% ethanol and then resuspended in water. More than 10 µg of total DNA was sent for library construction, sequencing and assembly to Génome Québec (Montréal, QC, Canada) using the 454 GS-FLX Titanium Sequencing System (Roche). From 243 Mb of sequence information, roughly 54% of 567,549 filtered reads (median read length = 482 bp) were assembled into 26,713 contigs of length greater than 500 bp (Appendix 1: Fig. S2b, Appendix 1: Fig. S2c).

A custom metaproteomic database was created from the metagenome database using open-source Ruby programming software. Each contig was translated to an amino acid sequence in all six reading frames. Of the resulting amino acid sequences, only those longer than 100 amino acids were retained. Those 184,128 sequences were written to new fasta files and retained their original description line. If multiple amino acid sequences came from a single contig, the resulting description lines included unique letter identifiers. As such, all amino acid sequences could be mapped back to a single nucleotide sequence greater than 300 bp in length. To create a decoy database, all retained protein sequences were reversed and then concatenated to the forward database. The decoy database allowed for an estimate of the false identification rate during the database search process, as has been described (Elias & Gygi, 2007).
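The database construction described above can be sketched in a few lines. This is a minimal Python illustration, not the Ruby software the authors used: it assumes the standard genetic code, assumes that each frame's translation is split at stop codons before the length filter is applied (the text does not say how stops were handled), and omits the description-line bookkeeping. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of a DNA string; non-ACGT characters pass through."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate from position 0; codons not in the table become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_peptides(contig, min_len=100):
    """Return translated segments longer than min_len from all six frames.

    Assumption: translations are split at stop codons ('*').
    """
    contig = contig.upper()
    kept = []
    for strand in (contig, reverse_complement(contig)):
        for offset in range(3):
            for segment in translate(strand[offset:]).split("*"):
                if len(segment) > min_len:
                    kept.append(segment)
    return kept


def with_decoys(proteins):
    """Append the reversed copy of every sequence to the forward list."""
    return proteins + [p[::-1] for p in proteins]


# Hypothetical contig: a start codon, 120 alanine codons, and a stop codon.
contig = "ATG" + "GCT" * 120 + "TAA"
database = with_decoys(six_frame_peptides(contig))
print(len(database))
```

In a target-decoy search, matches to the reversed sequences give an estimate of how many forward matches are expected to be false.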

SEQUEST Search Parameters

The following search parameters were used during the SEQUEST search: peptides were required to be tryptic; peptide precursor mass tolerance was set at plus or minus 2 Da; and differential oxidation of methionine (15.9949 Da) and differential acrylamidation of cysteine (71.0371 Da) were permitted.

Randomization Test of Shared Proteins

A pool of identified proteins was generated by combining the top 220 protein hits from both treatments. We chose to analyze only the top 220 proteins in each group because the identification status of rarer proteins is less certain and because including many rare proteins in the test was likely to add noise caused by the sampling of rare elements. With enough noise added from rarity, there is a danger that the real signal of differences among the common proteins will be swamped. Two hundred twenty protein hits were randomly drawn and assigned to each of the two treatments, without replacement. Each protein hit in the original pool was weighted by the sum of the total number of peptides associated with that protein hit in the two treatments. For each simulation (N = 1000), the number of shared protein hits between treatments was calculated, yielding a probability distribution of the expected number of shared protein hits. The observed shared number of protein hits was calculated by finding the intersection of the list of top 220 control protein hits and the top 220 enriched protein hits. Whether protein hits were drawn with or without replacement, the number of shared proteins was less than expected by chance, supporting the alternative hypothesis that the protein pools from the two treatments are distinct from one another (Fig. 2b).
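The randomization procedure above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: it assumes that each treatment's list is drawn independently from the pooled proteins, it uses the Efraimidis-Spirakis key method as one possible way to do weighted sampling without replacement, and the pool, weights, and observed value are made up.

```python
import random


def weighted_sample_without_replacement(weights, k, rng):
    """Draw k distinct proteins with probability tied to their weights.

    Efraimidis-Spirakis method: give each item the key u ** (1 / w), with u
    uniform on [0, 1), and keep the k items with the largest keys.
    """
    ranked = sorted(
        weights,
        key=lambda item: rng.random() ** (1.0 / weights[item]),
        reverse=True,
    )
    return set(ranked[:k])


def shared_count_null(weights, k, n_sim, seed=0):
    """Simulated numbers of shared proteins between two random draws."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        control = weighted_sample_without_replacement(weights, k, rng)
        enriched = weighted_sample_without_replacement(weights, k, rng)
        counts.append(len(control & enriched))
    return counts


def lower_tail_p(observed, null_counts):
    """Fraction of simulations with as few or fewer shared proteins."""
    return sum(c <= observed for c in null_counts) / len(null_counts)


# Hypothetical pool: 30 proteins with peptide-count weights from 1 to 5.
weights = {f"protein_{i}": 1 + (i % 5) for i in range(30)}
null_counts = shared_count_null(weights, k=10, n_sim=200)
observed_shared = 1  # hypothetical observed overlap
print(lower_tail_p(observed_shared, null_counts))
```

With the study's settings, `k` would be 220 and `n_sim` 1000, and the weights would be the summed peptide counts for each protein hit.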

We conducted an additional simulation experiment (programmed in R) to test for the possibility of a Type I error (incorrectly rejecting a true null hypothesis) in our randomization test to determine the expected number of shared proteins between enriched and control pitchers. We first simulated a single source protein pool consisting of 10,000 distinct protein types. Next, we created two sets to represent control and treatment groups. For each group, we sampled with replacement from the protein source pool until we had accumulated enough proteins so that there were exactly 200 proteins represented in each group (typically this necessitated sampling somewhere between 200 and 210 individual proteins because there were occasional duplicates observed). As expected, these two samples usually share no proteins or only a small number.

Next, we followed the procedure that we described in our randomization test. Namely, we reshuffled these proteins between the two groups, and calculated the number of shared proteins between them. We used 100 replicates per simulated set of proteins and repeated this procedure for 100 trials (preliminary runs showed that the results were just as precise using only 100 replicates instead of the full 1000 employed in the analysis of the real data). If our algorithm is behaving properly, fewer than 5% of such trial simulations should yield a statistically significant result. We conducted two variants of this test. In the first variant, each of the 10,000 proteins was equally abundant. In the second variant, the protein abundances followed an exponential distribution, in which there are a few relatively abundant proteins and a large number of relatively rare proteins. We simulated this distribution by drawing elements from a beta distribution with parameters shape1 = 0.5, shape2 = 1.0.
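The first step of this simulation, building a source pool under each abundance variant and drawing two groups from it, can be sketched as below. This is a hedged Python illustration rather than the authors' R code: it covers only the pool construction and group sampling, not the reshuffling or significance test, and the function names are hypothetical.

```python
import random
from bisect import bisect
from itertools import accumulate


def make_abundances(n_types, variant, rng):
    """Relative abundances for the source pool under one of two variants."""
    if variant == "equal":
        return [1.0] * n_types
    if variant == "beta":
        # Beta(0.5, 1.0) draws, matching the shape parameters in the text.
        return [rng.betavariate(0.5, 1.0) for _ in range(n_types)]
    raise ValueError("variant must be 'equal' or 'beta'")


def draw_group(cum_weights, n_distinct, rng):
    """Sample with replacement until n_distinct protein types are seen."""
    total = cum_weights[-1]
    last = len(cum_weights) - 1
    seen = set()
    draws = 0
    while len(seen) < n_distinct:
        index = bisect(cum_weights, rng.random() * total)
        seen.add(min(index, last))  # guard against floating-point edge cases
        draws += 1
    return seen, draws


rng = random.Random(42)
for variant in ("equal", "beta"):
    cum = list(accumulate(make_abundances(10_000, variant, rng)))
    control, control_draws = draw_group(cum, 200, rng)
    treatment, treatment_draws = draw_group(cum, 200, rng)
    print(variant, control_draws, treatment_draws, len(control & treatment))
```

The printed overlap is the number of protein types shared by the two simulated groups before any reshuffling.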

Of the 100 trials with equally abundant proteins, there was only 1 simulation in which the null hypothesis was rejected. Of the 100 trials with an exponential distribution of protein abundances, none of the trial data sets rejected the null hypothesis. We conclude from this exercise that the null model test that we used has good Type I error properties, and does not lead to spurious rejection of the null hypothesis when both treatments are sampled from a single protein pool.

Taxonomic Analysis

To determine the taxonomic composition of the microbes contributing to identified proteins in our treatments, we conducted a BLAST homology search of the metagenomic sequence data for protein hits. All peptides from the top 220 identified proteins in each treatment were mapped back to their contigs of origin to obtain nucleotide sequences. Each nucleotide sequence was repeated by the number of associated peptides and searched via BLAST (NCBI), allowing us to obtain a weighted hit table for each treatment. The GI number from the top blast hit was extracted from the hit table for each query sequence for each treatment. The resulting GI numbers were then searched against the NCBI Nucleotide Database via a script that returned organism subfield values (i.e., species name), yielding a list of species names for each treatment, associated with the top blast hit.
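Repeating each query by its peptide count is equivalent to weighting each contig's top hit by that count. The sketch below shows that tally in Python under simple assumptions: the two input dictionaries, the contig identifiers, and the counts are hypothetical stand-ins for the peptide-to-contig mapping and the BLAST hit table, and no BLAST or NCBI call is made.

```python
from collections import Counter


def weighted_species_counts(contig_peptides, top_hit_species):
    """Sum peptide counts per species of each contig's top hit.

    contig_peptides: contig id -> number of associated peptides.
    top_hit_species: contig id -> species name of the top hit.
    Contigs without a hit are skipped.
    """
    counts = Counter()
    for contig_id, n_peptides in contig_peptides.items():
        species = top_hit_species.get(contig_id)
        if species is not None:
            counts[species] += n_peptides
    return counts


# Hypothetical inputs for one treatment.
contig_peptides = {"contig_1": 12, "contig_2": 3, "contig_3": 5}
top_hit_species = {
    "contig_1": "Chromobacterium violaceum",
    "contig_2": "Variovorax paradoxus",
    "contig_3": "Chromobacterium violaceum",
}
print(weighted_species_counts(contig_peptides, top_hit_species).most_common())
```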

Hurlbert’s Probability of an Interspecific Encounter (PIE) was calculated for each treatment using the following equation:

\[ \mathrm{PIE} = \left(\frac{N}{N-1}\right)\left(1 - \sum_{i=1}^{s} p_i^{2}\right) \]

where N is the total number of peptides identified in a treatment, p_i is the proportion of peptides in a treatment represented by bacterial class i, and s is the number of bacterial classes identified in a treatment.
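As a worked illustration of Hurlbert's PIE, the following Python sketch computes the index from a list of peptide counts per bacterial class. The function name and the example counts are hypothetical; the formula is the standard one, with N the total count and p_i each class's share of it.

```python
def hurlbert_pie(counts):
    """Hurlbert's PIE from per-class peptide counts.

    PIE = (N / (N - 1)) * (1 - sum(p_i ** 2)), where N is the total count
    and p_i is the proportion of the total in class i.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("PIE needs at least two peptides in total")
    sum_squares = sum((c / n) ** 2 for c in counts)
    return (n / (n - 1)) * (1 - sum_squares)


# Hypothetical peptide counts for four bacterial classes in one treatment.
print(hurlbert_pie([120, 45, 30, 5]))
```

A treatment dominated by one class gives a value near 0, while an even spread across many classes gives a value near 1.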

O2 Requirements

We mapped each bacterial species identified in our BLAST search to its O2 requirement using data from the Integrated Microbial Genomes database (IMG) (Reddy et al., 2015; Timinskas et al., 2014). The IMG database contains 6 classes of O2 requirements: aerobe, anaerobe, facultative, microaerophilic, obligate aerobe, and obligate anaerobe. The latter three categories make up less than 7% of the database. We merged any species classified as obligate aerobes or obligate anaerobes into the aerobe and anaerobe classes, respectively.
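The merging rule can be expressed as a small lookup followed by a weighted tally. The Python sketch below is illustrative only: the records are made up, labels not named in the merging rule (including microaerophilic) are passed through unchanged because the text does not say how they were handled, and no IMG data are accessed.

```python
from collections import Counter

# Obligate classes are folded into their broader class, as described above.
MERGED_CLASS = {
    "obligate aerobe": "aerobe",
    "obligate anaerobe": "anaerobe",
}


def merge_o2_class(label):
    """Normalize a label and apply the obligate-class merge."""
    key = label.strip().lower()
    return MERGED_CLASS.get(key, key)


def peptide_proportions_by_o2(records):
    """Proportion of peptides per merged O2 class.

    records: iterable of (O2 requirement label, peptide count) pairs.
    """
    totals = Counter()
    for label, n_peptides in records:
        totals[merge_o2_class(label)] += n_peptides
    grand_total = sum(totals.values())
    return {o2_class: n / grand_total for o2_class, n in totals.items()}


# Hypothetical records for one treatment.
records = [("Obligate aerobe", 3), ("Aerobe", 1), ("Facultative", 4)]
print(peptide_proportions_by_o2(records))
```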

Table S1. Results of a chi-square analysis of pathways represented by proteins in control and enriched pitchers. Bolded values represent those in which the adjusted p-value is <0.05. Columns with peptide counts refer to the total number of peptides associated with a pathway.

Pathway | Control Peptides | Enriched Peptides | Adjusted P value
Aflatoxin biosynthesis | 9 | 0 | 0.000
Alanine aspartate and glutamate metabolism | 15 | 196 | 0.000
alpha Linolenic acid metabolism | 0 | 10 | 0.294
Aminoacyl tRNA biosynthesis | 0 | 21 | 0.069
Aminobenzoate degradation | 1 | 23 | 0.120
Arginine and proline metabolism | 21 | 170 | 0.000
Ascorbate and aldarate metabolism | 4 | 10 | 0.824
Benzoate degradation | 7 | 34 | 0.697
beta Alanine metabolism | 4 | 190 | 0.000
Biosynthesis of unsaturated fatty acids | 2 | 37 | 0.048
Biotin metabolism | 6 | 8 | 0.172
Butanoate metabolism | 21 | 118 | 0.120
C5 Branched dibasic acid metabolism | 0 | 20 | 0.077
Caprolactam degradation | 7 | 15 | 0.465
Carbon fixation in photosynthetic organisms | 47 | 53 | 0.000
Carbon fixation pathways in prokaryotes | 51 | 152 | 0.354
Chloroalkane and chloroalkene degradation | 8 | 67 | 0.064
Citrate cycle TCA cycle | 92 | 140 | 0.000
Cyanoamino acid metabolism | 0 | 7 | 0.447
Cysteine and methionine metabolism | 68 | 39 | 0.000
Drug metabolism cytochrome P450 | 2 | 14 | 0.660
Fatty acid biosynthesis | 4 | 6 | 0.400
Fatty acid degradation | 22 | 51 | 0.168
Fatty acid elongation | 2 | 56 | 0.004
Geraniol degradation | 3 | 67 | 0.004
Glutathione metabolism | 14 | 73 | 0.346
Glycerolipid metabolism | 2 | 111 | 0.858
Glycerophospholipid metabolism | 0 | 19 | 0.085
Glycine serine and threonine metabolism | 0 | 45 | 0.004
Glycolysis and glucogenesis | 33 | 169 | 0.120
Glyoxylate and dicarboxylate metabolism | 81 | 239 | 0.186
Histidine metabolism | 5 | 33 | 0.369
Inositol phosphate metabolism | 0 | 70 | 0.000
Limonene and pinene degradation | 21 | 67 | 0.781
Lysine degradation | 21 | 53 | 0.298
Metabolism of xenobiotics by cytochrome P450 | 2 | 6 | 1.000
Methane metabolism | 28 | 76 | 0.331
Naphthalene degradation | 0 | 4 | 0.741
Nitrogen metabolism | 21 | 169 | 0.000
Novobiocin biosynthesis | 39 | 0 | 0.000
One carbon pool by folate | 0 | 16 | 0.120
Oxidative phosphorylation | 18 | 98 | 0.194
Pantothenate and CoA biosynthesis | 14 | 19 | 0.021
Pentose and glucuronate interconversions | 3 | 39 | 0.075
Pentose phosphate pathway | 40 | 2 | 0.000
Phenylalanine tyrosine and tryptophan biosynthesis | 30 | 3 | 0.000
Phenylalanine metabolism | 3 | 49 | 0.026
Phenylpropanoid biosynthesis | 0 | 27 | 0.031
Phosphatidylinositol signaling system | 0 | 5 | 0.637
Porphyrin and chlorophyll metabolism | 3 | 4 | 0.027
Primary bile acid biosynthesis | 3 | 20 | 0.554
Propanoate metabolism | 22 | 144 | 0.027
Purine metabolism | 91 | 346 | 0.767
Pyrimidine metabolism | 94 | 257 | 0.048
Pyruvate metabolism | 62 | 58 | 0.000
Retinol metabolism | 0 | 3 | 0.858
Selenocompound metabolism | 4 | 0 | 0.004
Streptomycin biosynthesis | 0 | 3 | 0.858
Sulfur metabolism | 4 | 0 | 0.004
Synthesis and degradation of ketone bodies | 5 | 5 | 0.120
Taurine and hypotaurine metabolism | 6 | 7 | 0.120
Terpenoid backbone biosynthesis | 2 | 3 | 0.741
Tetracycline biosynthesis | 2 | 0 | 0.120
Thiamine metabolism | 3 | 11 | 1.000
Toluene degradation | 11 | 17 | 0.085
Tryptophan metabolism | 20 | 203 | 0.000
Tyrosine metabolism | 13 | 14 | 0.007
Valine leucine and isoleucine biosynthesis | 10 | 89 | 0.021
Valine leucine and isoleucine degradation | 37 | 222 | 0.013

Table S2. Species, oxygen requirements, and bacterial classes identified in control and enriched pitchers in a BLAST search of nucleotide sequences associated with the top 220 proteins in each treatment, weighted by total peptides. NA values represent species that were non-bacterial.

Species Name | Oxygen Requirement | Class | Control Peptides | Enriched Peptides
Achromobacter xylosoxidans | Aerobe | Betaproteobacteria | 9 | 0
Acidiphilium multivorum | Aerobe | Alphaproteobacteria | 0 | 7
Acidovorax avenae | Aerobe | Betaproteobacteria | 12 | 4
Acidovorax citrulli | Aerobe | Betaproteobacteria | 13 | 32
Agrobacterium radiobacter | Aerobe | Alphaproteobacteria | 3 | 0
Alicycliphilus denitrificans | Facultative | Betaproteobacteria | 9 | 2
Alkalilimnicola ehrlichii | Anaerobe | Gammaproteobacteria | 0 | 2
Azoarcus sp | Facultative | Betaproteobacteria | 4 | 0
Azorhizobium caulinodans | Unclassified | Alphaproteobacteria | 3 | 3
Azospirillum sp | Facultative | Alphaproteobacteria | 0 | 3
Bordetella pertussis | Aerobe | Betaproteobacteria | 0 | 2
Bradyrhizobium sp | Aerobe | Alphaproteobacteria | 10 | 0
Burkholderia cenocepacia | Facultative | Betaproteobacteria | 0 | 6
Burkholderia cepacia | Aerobe | Betaproteobacteria | 2 | 0
Burkholderia fungorum | Aerobe | Betaproteobacteria | 14 | 0
Burkholderia gladioli | Aerobe | Betaproteobacteria | 0 | 24
Chitinophaga pinensis | Aerobe | Sphingobacteriia | 17 | 14
Chromobacterium violaceum | Facultative | Betaproteobacteria | 62 | 1549
Clavibacter michiganensis | Aerobe | Actinobacteria | 2 | 0
Clostridium saccharobutylicum | Anaerobe | Clostridia | 3 | 7
Collimonas fungivorans | Aerobe | Betaproteobacteria | 10 | 0
Corynebacterium halotolerans | Aerobe | Actinobacteria | 3 | 0
Croceicoccus naphthovorans | Unclassified | Alphaproteobacteria | 4 | 0
Cupriavidus taiwanensis | Facultative | Betaproteobacteria | 2 | 0
Dechloromonas aromatica | Facultative | Betaproteobacteria | 2 | 5
Dechlorosoma suillum | Anaerobe | Betaproteobacteria | 17 | 61
Delftia acidovorans | Aerobe | Betaproteobacteria | 3 | 0
Delftia sp | Aerobe | Betaproteobacteria | 0 | 2
Desulfovibrio vulgaris | Anaerobe | Deltaproteobacteria | 2 | 0
Draconibacterium orientale | Facultative | Bacteroidia | 12 | 16
Dyadobacter fermentans | Aerobe | Cytophagia | 0 | 2
Dyella jiangningensis | Aerobe | Gammaproteobacteria | 2 | 17
Emticicia oligotrophica | Aerobe | Cytophagia | 4 | 11
Flavobacteriaceae bacterium | Aerobe | Flavobacteriia | 0 | 3
Gloeobacter violaceus | Aerobe | Gloeobacteria | 8 | 0
Hymenobacter sp | Aerobe | Cytophagia | 8 | 4
Janthinobacterium agaricidamnosum | Unclassified | Betaproteobacteria | 0 | 4
Janthinobacterium sp | Unclassified | Betaproteobacteria | 13 | 82
Laribacter hongkongensis | Anaerobe | Betaproteobacteria | 0 | 8
Leifsonia xyli | Aerobe | Actinobacteria | 14 | 0
Leptospira interrogans | Aerobe | Spirochaetia | 11 | 9
Leptothrix cholodnii | Aerobe | Betaproteobacteria | 2 | 0
Mesorhizobium ciceri | Aerobe | Alphaproteobacteria | 11 | 0
Methylobacterium aquaticum | Aerobe | Alphaproteobacteria | 4 | 5
Methylobacterium populi | Aerobe | Alphaproteobacteria | 2 | 4
Methylobacterium radiotolerans | Aerobe | Alphaproteobacteria | 17 | 0
Methylovorus sp | Facultative | Betaproteobacteria | 4 | 0
Microbacterium testaceum | Aerobe | Actinobacteria | 13 | 3
Niabella soli | Aerobe | Sphingobacteriia | 7 | 6
Novosphingobium aromaticivorans | Aerobe | Alphaproteobacteria | 56 | 40
Oxalis latifolia | NA | NA | 2 | 2
Pedobacter heparinus | Aerobe | Sphingobacteriia | 16 | 27
Polaromonas naphthalenivorans | Aerobe | Betaproteobacteria | 0 | 6
Polymorphum gilvum | Facultative | Alphaproteobacteria | 2 | 0
Pseudogulbenkiania sp | Facultative | Betaproteobacteria | 15 | 445
Pseudomonas denitrificans | Aerobe | Gammaproteobacteria | 0 | 7
Pseudomonas entomophila | Aerobe | Gammaproteobacteria | 0 | 7
Pseudomonas knackmussii | Unclassified | Gammaproteobacteria | 0 | 10
Pseudomonas pseudoalcaligenes | Aerobe | Gammaproteobacteria | 0 | 3
Pseudomonas putida | Aerobe | Gammaproteobacteria | 0 | 12
Pseudomonas rhizosphaerae | Aerobe | Gammaproteobacteria | 21 | 0
Pseudopedobacter saltans | Aerobe | Sphingobacteriia | 8 | 6
Pseudoxanthomonas spadix | Aerobe | Gammaproteobacteria | 2 | 0
Pusillimonas sp | Unclassified | Betaproteobacteria | 2 | 0
Ramlibacter tataouinensis | Aerobe | Betaproteobacteria | 8 | 6
Rhizobium etli | Aerobe | Alphaproteobacteria | 22 | 31
Rhizobium sp | Aerobe | Alphaproteobacteria | 0 | 5
Rhizophagus intraradices | NA | NA | 2 | 0
Rhodanobacter denitrificans | Facultative | Gammaproteobacteria | 22 | 88
Rhodopseudomonas palustris | Facultative | Alphaproteobacteria | 2 | 31
Roseiflexus sp | Facultative | Chloroflexi | 0 | 3
Runella slithyformus | Aerobe | Cytophagia | 2 | 0
Sinorhizobium fredii | Aerobe | Alphaproteobacteria | 8 | 0
Sphingobium chlorophenolicum | Aerobe | Alphaproteobacteria | 0 | 5
Sphingobium japonicum | Aerobe | Alphaproteobacteria | 2 | 0
Sphingobium sp | Unclassified | Alphaproteobacteria | 7 | 0
Sphingomonas sanxanigenens | Aerobe | Alphaproteobacteria | 3 | 0
Sphingomonas sp | Aerobe | Alphaproteobacteria | 38 | 3
Sphingomonas taxi | Aerobe | Alphaproteobacteria | 4 | 0
Sphingomonas wittichii | Aerobe | Alphaproteobacteria | 30 | 11
Sphingopyxis alaskensis | Aerobe | Alphaproteobacteria | 7 | 7
Starkeya novella | Aerobe | Alphaproteobacteria | 41 | 41
Stenotrophomonas rhizophila | Unclassified | Gammaproteobacteria | 3 | 0
Terriglobus roseus | Aerobe | Acidobacteria | 4 | 0
Terriglobus saanensis | Aerobe | Acidobacteria | 2 | 0
Variovorax paradoxus | Aerobe | Betaproteobacteria | 266 | 210


Fig. S1. Microbial proteins in control (C) and enriched (E) pitchers. (a) Three replicate pitchers of each treatment were initially processed in November 2012. (b) The remaining replicates were processed in May 2013. Lanes 4, 5, and 7 represent enriched pitchers. Lanes 9, 11, and 13 represent control pitchers. The replicate in lane 13 was omitted from the study due to a lack of protein. Letters a-e represent the regions that each lane was cut into for MS/MS analysis.

[Figure S2. Panel (a): gel image. Panel (b): histogram of frequency against read length (bp). Panel (c): histogram of frequency against contig length (bp).]

Fig. S2. (a) Agarose gel electrophoresis of metagenomic DNA from three pitcher plant microbial communities. One pellet from each pitcher was treated with lysozyme. All samples were pooled prior to sequencing. (b) Frequency distribution of the read lengths in the sequenced metagenomic data. The median read length was 482 bp. (c) Frequency distribution of assembled contig lengths in the metagenomic database. All contigs were 500 bp or greater in length.

Fig. S3. Taxonomic assignments of the metagenome, as visualized by Krona. The rings, from the center outward, represent Kingdom (Bacteria), Phylum, Class, and Order.

[Figure S4. Two rank-abundance bar charts, panels a) and b), each with the y-axis "Proportion of contigs assigned to pathways" and KEGG pathway names along the x-axis.]

Fig. S4. Functional potential of the metagenome. Rank abundance of the proportion of mapped contigs assigned to a) level 2 KEGG pathways and b) level 3 KEGG pathways.

[Figure omitted: Krona sunburst diagrams, panels (a) and (b).]

Fig. S5. Taxonomic assignments of bacterial proteins, as visualized by Krona, differed between control (a) and enriched (b) pitchers. Sunburst diagrams were constructed using nucleotide sequences from the metagenomic data associated with identified proteins in the custom protein database. Nucleotide sequences were weighted by the total number of peptides associated with each sequence. Replicates were pooled for each treatment. Figures feature only matches to bacteria. The rings, from the center outward, represent Kingdom (Bacteria), Phylum, Class, Order, Family.

[Figure omitted: Unipept sunburst diagrams, panels (a) and (b).]

Fig. S6. Taxonomic assignments of bacterial proteins, as visualized by Unipept, differed between (a) control and (b) enriched pitchers. The rings, from the center outward, represent Kingdom (dark blue = Bacteria), Phylum (white = Proteobacteria), Class (red = Alphaproteobacteria, light blue = Betaproteobacteria), Order (dark blue = Burkholderiales, rose = Neisseriales, light blue = Sphingomonadales).

Fig. S7. Pathway representation of the proportion of total peptides associated with KEGG pathways differed between control (blue) and enriched (brown) pitchers.

[Figure omitted: heat map of KEGG pathways (rows) by treatment and replicate (columns). Color key: 0 to 0.15, proportion of total peptide identifications in treatment/replicate.]

Fig. S8. KEGG pathway assignments differed between control and enriched pitchers. (a) Heat map of the proportional representation of pathways between control pitchers (C) and enriched pitchers (E) and individual control (H4, H6, E3, C4) and enriched (H1, H2, H3, 2C, 5B, 5A) replicates. Significantly different pathways between pooled control and enriched samples are indicated with “*”.

Data S1. Table in .csv file form listing the proteins from the top 220 analyzed in control and enriched treatments and their associated peptides.