bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

1 2 Phage-centric ecological interactions in aquatic ecosystems revealed through ultra-deep metagenomics 3

4

5 Vinicius S. Kavagutti1, Adrian-Ştefan Andrei1, Maliheh Mehrshad1, Michaela M. Salcher1,2, 6 Rohit Ghai1*

7

8 1Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre of the Academy of

9 Sciences of the Czech Republic, České Budějovice, Czech Republic.

10 2Limnological Station, Institute of Plant and Microbial Biology, University of Zurich, Seestrasse 187,

11 8802 Kilchberg, Switzerland.

12 *Corresponding author: Rohit Ghai

13 Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre of the Academy of

14 Sciences, Na Sádkách 7, 370 05, České Budějovice, Czech Republic

15 Phone: +420 387 775 881

16 Fax: +420 385 310 248

17 E-mail: [email protected]

18 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

19 Abstract

20 The persistent inertia in the ability to culture environmentally abundant microbes from aquatic 21 ecosystems represents an obstacle in disentangling the complex web of ecological 22 interactions spun by a diverse assortment of participants (pro- and eukaryotes and their 23 viruses). In aquatic microbial communities, the numerically most abundant actors, the 24 viruses, remain the most elusive, and especially in freshwaters their identities and ecology 25 remain obscure. Here, using ultra-deep metagenomic sequencing from freshwater habitats 26 we recovered complete genomes of >2000 phages, including small “miniphages” and large 27 “megaphages” infecting iconic prokaryotic lineages. We also show that many phages encode 28 genes that likely enhance survival of infected microbes during strong eukaryotic grazing 29 pressure. For instance, we describe genes that afford protection to their host from reactive 30 oxygen species (ROS) in the environment and from the oxidative burst in protist 31 phagolysosomes (phage-mediated ROS defense) or those that directly effect targeted killing 32 of the predators upon ingestion of a phage-infected microbe (Trojan horse). Spatiotemporal 33 abundance analyses of phage genomes revealed evanescence as the primary dynamic in 34 upper water layers, where they displayed short-lived existences. In contrast, persistence was 35 characteristic for the deeper layers where many identical phage genomes were recovered 36 repeatedly. Phage and host abundances corresponded closely, with distinct populations 37 displaying preferential distributions in different seasons and depths, closely mimicking overall 38 stratification and mixis.

39

40 Introduction

41 Freshwater planktonic communities are complex and dynamic, exhibiting distinct, recurrent 42 patterns driven by both biotic and abiotic environmental factors [1]. However, in practice the 43 accurate resolution of recurrence of individual pelagic components is challenging, and from 44 small to large (viruses, , eukaryotes) our discriminative ability to quantify each 2 4 45 participant varies greatly. Freshwaters typically contain 10 -10 unicellular eukaryotes and 5 7 6 46 10 -10 prokaryotes per mL, but viruses are clearly the most abundant entities, with up to 10 - 8 47 10 viruses per mL [2]. Moreover, viruses are extraordinarily diverse, and complete genomic 48 contexts are essential to understand the nature and dynamics of this diversity. The viral 49 collective influences microbial community ecology by increasing carbon and phosphorous 50 transfer to microbes [3–6], modulating individual lifestyles and evolutionary histories of 51 microbial lineages [7,8] and maintaining the diversity of the community at large [9]. Viruses in 52 aquatic habitats are responsible for the mortality of nearly 20-40% prokaryotes every day bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

53 [10], yet freshwater viruses remain largely understudied and untouched by advances in 54 microbial culturing techniques and environmental genomics. Only a handful of isolate phage 55 genomes are available from freshwater habitats [11–14] and only a few metagenomic studies 56 are available [15–19]. However, the host-virus community responses to the establishment of 57 the characteristic vertical zones in the water column of seasonally stratified water bodies, (a 58 relatively warmer, light exposed epilimnion and a deeper and colder hypolimnion) remain 59 uncharacterized. Even more importantly, a representative collection of complete viral 60 genomes from freshwater has so far remained out of reach.

61 Here, we exploit the potential of ultra-deep metagenomic time-series sequencing to 62 simultaneously recover phage (Caudovirales) and host genomic data from two common 63 freshwater habitats (a drinking water reservoir and a humic pond). In doing so, we 64 reconstructed 2034 complete genomes of phages infecting freshwater prokaryotes. These 65 phage genomes are predicted to infect freshwater , Betaproteobacterales, 66 Alphaproteobacteria, , and other phyla for which no phages have 67 been described before. Using the abundant freshwater Actinobacteria and their phages as 68 models, we show that not only do phage genome abundances in deep water bodies mirror 69 the abundance of their hosts, they also reflect the classical patterns in thermal cycles of the 70 water column i.e. stratification and mixis. High abundances for both phages and hosts in the 71 epilimnion are transitory and persistence at lower abundances in the hypolimnion, the far 72 larger niche, is the rule.

73

74 Results and Discussion

75 Metagenomic sequencing, assembly and complete phage genome recovery

76 We chose for our study two sites that serve as models for two distinct freshwater habitat 77 types: meso-eutrophic Římov reservoir, a typical man-made, canyon-shaped reservoir, 78 common to north temperate regions [20], and Jiřická pond, a shallow, humic mountain pond 79 habitat found across the world [21]. The Římov reservoir is dimictic [22] and begins to mix at 80 the onset of spring (March-April). It is stratified in summer when a distinct, warm epilimnion 81 develops (May-October) above a colder hypolimnion. At the onset of winter, the colder 82 waters sink, and the reservoir mixes again when water temperature throughout the water 83 column drops to ca. 4ºC (Supplementary Figure S1). It is ice-covered during winter for at 84 least two months (See methods for more site details).

85 We generated metagenomic time-series from both sites, producing 18 metagenomes from 86 Římov (both epi- and hypolimnion) and 5 from Jiřická (12.97 billion reads, ca. 1.9 Tb, bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

87 Supplementary Table S1). While most samples we sequenced were ca. 54 Gb in size 88 (ranging from 190-482 million reads, average 368 million reads), two Římov samples (epi 89 and hypolimnion) we sequenced ca. 380 Gb each (2.5 billion reads each). An overview of the 90 microbial community using 16S rRNA abundances for both sites is shown in Supplementary 91 Figure S2.

92 We also collected an additional 149 publicly available freshwater metagenomes 93 (Supplementary Table S1, total of 4.04 billion reads, 1.09 Tb data) to search for complete 94 phage genomes. All datasets were assembled independently (no co-assembly). In total, we 95 analyzed ca. 3 Tb of metagenomic sequences from freshwater (ca. 17 billion reads).

96 The number of complete phage genomes recovered from any sample increased with 97 sequencing depth, but with diminishing returns (Figure 1a), with genome recovery 98 maintaining linearity up to 100 Gb (ca. 1 phage genome for every additional Gb) before 99 tapering off at a maximum of 160 genomes from 400 Gb sequence data. While a total of 100 1677 genomes were assembled from the sequence data generated from the two study sites, 101 (Římov and Jiřická), 357 genomes were recovered from all other available freshwater 102 metagenomes. This suggests that the potential of ultra-deep sequencing to recover far more 103 phage genomes has not yet been fully realized. We also recovered a number of 104 metagenome-assembled genomes (MAGs) from the Římov metagenome time-series dataset 105 (see below). We denominate this entire collection of genomes as the Uncultured Freshwater 106 Organisms (UFO) dataset, where the UFOv subset refers to viruses and the UFOp subset to 107 prokaryotic genomes.

108 Phage genome analyses

109 A total of 598 complete phage genomes were recovered from the Římov epilimnion (10 110 samples), 800 from the hypolimnion (8 samples) and 279 from Jiřická (5 samples). Upon 111 dereplication (genomes with >95% identity and >95% coverage treated as one, see 112 methods), these numbers reduced by nearly three-fold for the hypolimnion suggesting 113 repeated capture of nearly identical genomes from multiple samplings. We found only a 114 single instance of a phage that was nearly identical in two habitats (Římov and Jiřická).

115 The comparison of recovered freshwater phage genomes to representative sets of phages 116 from Viral RefSeq (1996 genomes) and the marine habitat (1335 genomes) [23–25] is shown 117 in Fig. 1c. Intriguingly, the genome size distributions of marine and freshwater phage 118 genome sizes appear similar, except for a pronounced peak at small genome size (ca. 15 119 Kb) in the freshwater datasets. We recovered 155 “miniphage” genomes that were <15 Kb in 120 length (minimum length 13.5 Kb).This somewhat bimodal distribution is remarkably 121 reminiscent of cell size distributions of prokaryotes themselves in freshwater [26] and while bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

122 not conclusive in itself, this suggests that phage size distribution mirrors host cell size. That 123 such peaks are not visible in the size distributions from isolate phages (Supplementary 124 Figure S3) also points towards a more ecological explanation for the bimodal distribution in 125 freshwater datasets. On the other hand, we also recovered 27 “megaphage” genomes (>200 126 Kb in length, maximum length 446 Kb) that are similar in genome size to some recently 127 described phages from the human gut microbiome [27].

128 Common to both freshwater and marine habitats, two frequently recovered genome sizes 129 appear to be ca. 40 Kb and 60 Kb, with 40 Kb being the most frequent. Unsurprisingly 130 perhaps, genomic GC% of recovered phage genomes mirrors the GC% of the habitats 131 (Figure 1f and Supplementary Figure S4). As a measure of how many novel proteins are 132 available in our freshwater phage dataset, we clustered all proteins in these datasets at two 133 percentage identity levels (30% and 60%) [28]. The RefSeq dataset has the maximum 134 number of unique protein clusters (not found in the others), followed closely by the UFO 135 dataset (Figure 1g). Additionally, the number of distinct Pfam domains detected in each 136 dataset were 1761, 927 and 932 for RefSeq, marine and freshwater datasets respectively. 137 These statistics suggest the UFO complete phage genome dataset adds significant novelty 138 in phage sequence space. We also applied vContact2 (that uses Viral RefSeq phage 139 genomes as references) to assess the novelty of the recovered phages [29]. Of the 1330 140 freshwater and 1202 marine phage genomes (both dereplicated), 775 freshwater phages 141 could not be assigned to any known RefSeq or marine phage cluster, thus remaining either 142 unclassified or clustering only with freshwater phages. Nearly half of marine phages (n=553) 143 also did not cluster with any freshwater or RefSeq phage suggesting the existence of 144 extremely divergent phage populations in these habitats. This is also seen in an all-vs-all 145 comparison of all phage genomes, where the freshwater phages form large clusters that are 146 only weakly related to other known groups (Supplementary Figure S5).

147 Host predictions and lifestyle strategies

148 Using multiple methods (host genes, similarity of tRNA integration sites and presence of 149 CRISPR spacers) we were able to predict hosts for 404 phage genomes (ca. 20%). The 150 maximum number were predicted to be actinophages, largely owing to the presence of the 151 characteristic whiB gene (sometimes even in multiple copies, similar to their hosts) that are 152 taxonomically restricted to members of the actinobacterial phylum [19]. These actinophage 153 genomes show an extremely broad size distribution, with multiple “miniphages” (n=15, 6 154 dereplicated clusters) and “megaphages” (n=3, 2 dereplicated clusters) (Figure 1h). It 155 appears that the most abundant microbes in the freshwater water-column are infected by the 156 full-size range of tailed phages ranging from as small as 14 Kb to as large as 347 Kb. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

157 While for most actinophages we could not specifically pinpoint the host, it was possible for a 158 few, and we predict at least 36 to infect the most abundant lineage in freshwaters (‘Ca. 159 Nanopelagicales’, acI lineage [30]). In addition, others that possibly infect freshwater 160 Acidiimicrobia (acIV lineage [31]) and Microbacteraceae (Luna cluster [32]) were also found 161 (Supplementary Table S2). In comparison to known actinobacterial phages (from cultured 162 isolates), those from the metagenomes display a wide range of GC content and lengths 163 (Figure 1f), with a somewhat lower genomic GC% corresponding to the relatively lower GC% 164 of freshwater Actinobacteria (esp. ‘Ca. Nanopelagicales’, 42% GC, Neuenschwander et al. 165 2018).

166 We also present the first phages predicted to infect multiple other abundant freshwater 167 groups e.g. Betaproteobacterales, Alpha- and Gammaproteobacteria, Bacteroidetes, 168 , , , etc. (Figure 1h, Supplementary Figure 169 S5, Supplementary Table S2). We recovered only six recognizable cyanophages from our 170 freshwater datasets. We attribute this to the high abundances of filamentous Cyanobacteria 171 in freshwaters that are excluded by our filtration and phage recovery methodology. Amongst 172 the other predicted hosts are several well-known freshwater genera (e.g., Limnohabitans, 173 Polynucleobacter, Fonsibacter, Flavobacterium, Novosphingobium, Sphingomonas, etc. 174 Supplementary Table S2) with no described phages so far, except for ‘Ca. Methylopumilus’ 175 [13].

176 Remarkably, 24 freshwater phages were found to encode the toxin ADP-ribosyltransferase. 177 These are eukaryotic toxins related to VIP2 like toxins (a special class of AB toxins), that 178 inhibit actin polymerization [33] and have been previously suggested to function as a “trojan 179 horse”, effecting targeted killing of eukaryotic predators that phagocytose phage-infected 180 microbes [19,34,35] (Figure 2). These are frequently encoded in phage genomes, and have 181 been found in both free-living phages or inserted prophages, e.g. Shiga toxin [36]. We also 182 found evidence of the same toxin in six marine phage genomes and 116 phages from 183 RefSeq infecting isolates of Escherichia, Mycobacterium, Aeromonas, etc. (Supplementary 184 Table S3). Some of the freshwater phages encoding this toxin are predicted to infect 185 Actinobacteria. Recently, phages have been shown to also encode ribosomal proteins that 186 likely assist phage protein translation during infection (Mizuno et al 2019). We found 20 187 freshwater phages encoding at least one ribosomal protein (either S21 or L12) 188 (Supplementary Table S4).

189 More than 10% of phages (i.e. 254 phage genomes) harbored genes involved in oxidative 190 stress mitigation (Supplementary Table S5). Aquatic routinely experience the 191 damaging effects of reactive oxygen species (ROS) (superoxide, hydrogen peroxide and 192 hydroxyl radicals) produced by their own metabolic machineries, released by other bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

193 community members or generated by UV-induced photochemical reactions [37]. In defense, 194 bacteria employ several strategies to prevent ROS formation, deactivate ROS and repair the 195 ROS-induced damage. The widespread occurrence of ROS defense mechanisms in bacteria 196 designates oxidative stress as one of the main threats to their fitness and a major culprit in 197 mortality [37–39]. Thus, based on the plethora of ROS-defense mechanisms found in these 198 freshwater phages e.g. ferritin prevents ROS formation; superoxide dismutases and 199 glutathione peroxidases inhibit ROS; thioredoxins and glutaredoxins repair oxidized amino 200 acids particularly cysteine and methionine and PAPS reductases boost reduced sulfur group 201 assimilation [39,40] (Supplementary Table S5), we consider that phages could provide their 202 hosts the means to combat the harmful effects of oxidative stress. Such a strategy could be 203 beneficial for phages, as it ensures the survival (during the lytic cycle) and proliferation of the 204 hosts (during the lysogenic cycle) and protects their own proteins and DNA against oxidative 205 damage. Given that nearly 10% of all recovered phages encode some ROS defense genes, 206 and the high rate of infections in the natural environments (estimated to be up to ca. 25% 207 [41]), it also appears that this strategy is commonly employed. However, unicellular 208 eukaryotes (primarily flagellates) may consume up to 50% of bacteria in freshwater habitats 209 on a daily basis [42]. This significant number, coupled with high phage infection frequencies 210 suggests that multiple phage-infected bacteria must be ingested by these flagellates. Not all 211 microbes are fully digested in the food vacuoles and many are expelled outside again 212 [43,44]. It is quite likely that ROS-defense mechanisms improve the odds for surviving the 213 phagolysosome where reactive oxygen species are discharged to destroy bacteria (oxidative 214 burst). We postulate that a bacterium infected by a phage containing a ROS-defense 215 mechanism will have a selective advantage during phagocytic flagellate grazing. Thus, the 216 phage-encoded proteins could help the host survive the high ROS environment that 217 characterizes the phagolysosomes [45], i.e. a phage -mediated ROS defense (Figure 2). In 218 both these mechanisms, (the trojan horse and phage-mediated ROS defense), genes that 219 enhance the preservation of the host-phage interaction at the expense of the eukaryotic 220 predators appear to be selected and widely encoded in environmental phage genomes.

221 222 Phage abundance time-series

223 Owing to the samples from multiple time points in the epi- and hypolimnion of Římov 224 reservoir, we were able to recover several, nearly identical phage genomes repeatedly. The 225 most extreme case was that of a predicted actinophage that was recovered twelve times, in 226 distinct seasonal phases (spring bloom, summer and winter). Remarkably, even though it 227 was retrieved at different times of the year, only seven “variant” locations are seen 228 (Supplementary Figure S6). Six of these are present in hypothetical genes and one in an bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

229 intergenic region. Exhaustive sequence searches using jackhmmer [46] and HHpred server 230 [47] revealed little clues to their functions. However, the retrieval of multiple, nearly identical 231 phage genomes from multiple time-points and strata suggests that some lineages are 232 persistent, likely also owing to the constant presence of the host, in this case, Actinobacteria 233 that are always abundant in the Římov reservoir [20] (Supplementary Figure S2). Similar to 234 this phage, we also recovered multiple other examples that remain unchanged during our 235 sampling efforts (Supplementary Table S2).

236 Remarkably, the abundance patterns of phages recovered from the Římov reservoir 237 (n=1398) (Figure 3) suggest seasonality in their appearance. Distinct sets of phages peak in 238 different layers during summer stratification (epi- or hypolimnion) or mixis (both spring and 239 early winter). In the hypolimnion, some phages are persistent throughout the year while 240 others appear only during stratification. Moreover, the abundances (coverage per GB) for 241 each phage across the entire timeline of the Římov reservoir (18 samples) show that most 242 phages in the epilimnion transiently achieve high abundances followed by near 243 disappearance (a boom and bust scenario) while several in the hypolimnion appear to be 244 more persistent and are recovered at multiple time points (Figure 3). The shorter timeline of 245 the Jiřická samples also shows a near continuous replacement of the abundant phages 246 comparable to the Římov epilimnion (Supplementary Figure S7) in the relatively warm time 247 period sampled here. However, as this time series is far shorter (lasting only a few summer 248 months, Supplementary Figure S2), it remains to be seen if in such dynamic systems 249 anything resembling persistence as observed in the colder hypolimnion of the Římov 250 reservoir.

251

252 Spatiotemporal dynamics of actinophages and their hosts

253 As actinobacterial phages were the largest identifiable group (owing to the presence of the 254 whiB gene), and that Actinobacteria are known to be dominant members of the community 255 throughout the year (Supplementary Figure S2), we chose to focus subsequent analyses on 256 both recovered actinophages and actinobacterial MAGs. Abundance profiles of all recovered 257 actinophages encoding whiB (confident predictions, n=125) are shown in Figure 4. Phages 258 that are nearly identical, i.e. persistent phages (>95% identity and >95% coverage) are 259 shown as part of a cluster. The profiles within a cluster appear homogenous but different 260 clusters show distinct preferences for either the epilimnion or the hypolimnion, suggesting 261 that their hosts (Actinobacteria) would also show similarly distinct patterns.

262 We recovered 444 actinobacterial metagenome-assembled genomes (MAGs) from the 263 Římov Reservoir datasets, whereof 305 MAGs fulfilled the criteria for further analyses (see bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

264 Methods). A phylogenomic analysis of the recovered actinobacterial MAGs in context of 265 known isolate genomes is shown in Figure 5. Most MAGs are placed within the three known 266 groups of Actinobacteria frequently found in freshwater habitats, ‘Ca. Nanopelagicales’ 267 (n=280), Acidimicrobiia (n=114) and Microbacteriaceae (n=37) (Supplementary Table S6). In 268 particular, the order ‘Ca. Nanopelagicales’ are the most cosmopolitan microbes from 269 freshwater [30,48,49] and have only recently been brought into culture [30,50]. Isolates from 270 Microbacteriaceae are also available [32,51] but no cultured representatives exist yet for 271 Acidimicrobiia and these are described only from metagenome-assembled genomes [52]. 272 Within the order ‘Ca. Nanopelagicales’, two genera are defined, ‘Ca. Planktophila’ (formerly 273 acI-A) and ‘Ca. Nanopelagicus’ (formerly acI-B) [30,53]. However, several other lineages 274 have been described from 16S rRNA based surveys, e.g. acI-C [53], acSTL [54], acTH1 [55]. 275 While recently a single genome from acI-C isolate has become available [50], no isolates or 276 genomes have been described for either acSTL or acTH1. Based on the presence of 16S 277 rRNA sequences, we identified MAGs that belong to acI-C2, acSTL, and acTH1 lineages. 278 The acI-C2 related MAGs branch outside the acI-A and acI-B in accordance with known 279 phylogeny. Both acSTL and acTH1 appear as a deep-branching sister group to other ‘Ca. 280 Nanopelagicales’ but still belong to the same order (as classified by GTDB [56]). In addition, 281 we also recovered ‘Ca. Nanopelagicales’ MAGs that are basal to all known groups (Figure 282 5).

283 The abundance profiles of these Actinobacteria MAGs revealed that the vast majority are 284 more abundant in the hypolimnion than the epilimnion, especially ‘Ca. Nanopelagicales’ and 285 Acidimicrobiia, while the reverse is true for Microbacteriaceae, that are nearly always more 286 abundant in the epilimnion (Figure 5). MAGs that are in close phylogenetic proximity (for 287 instance within the genus ‘Ca. Nanopelagicus’) do not necessarily show similar abundance 288 patterns implying niche divergence even within closely related organisms (Neuenschwander 289 et al. 2018) and several show only peaks in epilimnion or hypolimnion alone. Remarkably, 290 the temporal abundances of both phages and their hosts mirror three distinct states of the 291 reservoir, warm and stratified epilimnion, cold hypolimnion and mixed water column. This is 292 seen even more clearly in the predicted actinophages (Figure. 4) and reflected in the 293 abundances of their hosts (Figure 5).

294 The sporadic peaks of phages observed in the epilimnion reflect the transient niche that it 295 truly is, in comparison to the apparently more stable environment of the hypolimnion (with 296 little temperature variation). The ground state for Římov is a low-temperature regime, lasting 297 for nearly two-thirds of the year. Only at the end of the spring overturn until onset of winter a 298 shallow, peripheral zone with higher temperature and light intensity (i.e. epilimnion) is 299 established within which periodic blooms of photosynthetic organisms are observable [57]. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

300 Regardless of the sporadic peaks, we captured a complete genome of an actinophage in 301 summer (at high abundance), and the same genome was recovered repeatedly at lower 302 abundances, from both epi- and hypolimnion samples and at widely different temperatures 303 (4ºC to 24ºC) at different times of the year (cluster 1 in Figure 4). This suggests its host 304 experiences conditions favorable for its increased abundance in the warmer epilimnion what 305 leads to the higher abundance of its phage. The persistent recovery of such a phage also 306 suggests that its host remains available throughout the year. In line with these observations, 307 many actinobacterial MAGs show short-lived maxima in the epilimnion followed by lower 308 abundances in the hypolimnion as observed earlier via fluorescence in situ hybridization with 309 species to genus-specific probes [30]. It has also been shown recently that in the absence of 310 the optimal host phages may switch to sub-optimal hosts [58], further driving diversification 311 and likely helping extend the longevity of phage lineages.

312 The major observable dynamic in the hypolimnion is the remarkably similar abundance 313 patterns of the hosts and their phages, i.e. both show persistence from the onset of 314 stratification till next spring. It appears that the hypolimnion maintains a large pool of highly 315 related host genomes most of which are well-adapted to the long-lasting low-temperature 316 regime of the reservoir. A fraction of these might find favorable niches in warmer 317 temperatures, blooming, and then retreating within the hypolimnion at winter onset. However, 318 even with the observable “persistent” abundances of phages, the hypolimnion is not without 319 its perturbations (not in temperature but in other environmental variables, e.g. hypoxia, 320 irregular nutrient input). Less obvious, but sporadic peaks are observed in the hypolimnion as 321 well (Figure 3), what would suggest clonal expansions of the host, as has been reported for 322 some Planctomycetes [59], suggesting similar dynamics are also played out in deeper 323 waters.

324

325 Conclusions

326 Freshwater habitats are relatively accessible to monitoring and the use of time-series 327 metagenomes allows sensitive, genome-based surveys to capture both recurrent and 328 anomalous changes in community composition. In this work, we used a deep-sequencing 329 time-series approach to viral ecology aimed towards the recovery of a representative 330 freshwater phage genome collection that significantly expands the known viral sequence 331 space and revealed viruses infecting many different freshwater phyla for which none were 332 known before. This collection of complete phage genomes should serve as a critical 333 reference in boosting environmental genomics of viruses in freshwater habitats at large, 334 illuminating their wide genomic repertoire and revealing their ecological interactions not only bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

335 with their microbial hosts, but also with the predators of their hosts, i.e. unicellular 336 eukaryotes.

337 While this study was focused only upon phages, the metagenomes described here are 338 expected to shed light on many other viral groups as well e.g. ssDNA viruses, 339 phycodnaviruses and virophages. With the availability of relatively inexpensive and 340 increasingly higher throughput in sequencing technologies and the advent of even longer 341 reads, freshwater viral ecology can transition from a gene-based to a genome-centric view of 342 the viral world around us.

343

344 Data Availability

345 Sequence data for all metagenomes generated in this work are archived at 346 DDBJ/EMBL/GenBank and can be accessed under the Bioproject PRJNA429141 for the 347 Římov reservoir and Bioproject PRJNA429145 for Jiřická pond. All phage genomes and the 348 actinobacterial metagenome assembled genomes are available in NCBI Bioproject 349 PRJNA449258.

350 Acknowledgments

351 The authors thank Petr Znachor and Pavel Rychtecký for help with sampling of Římov 352 Reservoir and Petr Porcal for sampling Jiřická Pond. VSK was supported by the research 353 grant 17-04828S (Grant Agency of the Czech Republic). A-Ş. A was supported by the 354 research grants 17-04828S (Grant Agency of the Czech Republic) and MSM200961801 355 (Academy of Sciences of the Czech Republic). MM was supported by the Postdoctoral 356 program PPPLZ (application number L200961651) provided by the Academy of Sciences of 357 the Czech Republic. MMS was supported by research grant 19-23469S (Grant Agency of the 358 Czech Republic). RG was supported by the research grants 17-04828S (Grant Agency of the 359 Czech Republic) and CZ.02.1.01/0.0/0.0/16_025/0007417 (ERDF/ESF).

360

361 Competing interests

362 The authors declare that there are no competing interests.

363 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

364 Methods

365 Sampling site and collection: Site I: Římov reservoir (Czech Republic, 48.846361 N 366 14.487639 E) with 2.06 km2, volume 34.5x106 m3, length 13.5 km, circum-neutral pH, 367 maximum depth 40m, average depth 16.5m, retention time ~100 days, dimictic, meso- 368 eutrophic, moderately humic). Built in 1979, on the Malše River, it is part of the Czech Long 369 Term Ecological Research network [22,60,61]. Eighteen water samples were collected from 370 epilimnion (0.5m, n=10) and hypolimnion (30m, n=8) from June 2015 to August 2017. Two of 371 these have been published previously [59,62] and the rest were generated in this study. 372 Vertical profiles of the physicochemical characteristics of the water column (temperature, pH, 373 oxygen; GRYF XBQ4, Havlíčkův Broc, CZ) and chlorophyll a (FluoroProbe TS-16-12, bbe 374 Moldaenke, Kiel, Germany) were also taken. Site II: Jiřická pond (Czech Republic, 375 48.616034 N 14.676594 E) with 0.0356 km2, volume 6.59 x103 m3, pH 5.6-6.2, maximum 376 depth 3.7 m, retention time ~5-7 days, dystrophic, located in the Novohradské mountains of 377 Southern Bohemia [21]. Five samples were collected from the epilimnion (0.5m) from May 378 2016 to August 2017.

379 Filtration and DNA extraction: All water samples (ca. 10 L each) from Římov reservoir and 380 Jiřická pond were sequentially filtered through 20 µm, 5 µm and 0.22 µm polycarbonate 381 membrane filters (Sterlitech, USA). The 0.22 µm filters (containing the 5 - 0.22 µm microbial 382 size fraction) were cut in small pieces (≅ 3–5 mm) using sterile scissors and processed for 383 DNA extraction using the ZR Soil Microbe DNA MiniPrep kit (Zymo Research, Irvine, CA, 384 USA), following the manufacturer’s instructions.

385 Preprocessing of metagenomic datasets: Shotgun sequencing was performed using 386 Illumina HiSeq4000 (for samples of 2015 and 2016 - 2x 151bp) (BGI HongKong, China) and 387 Novaseq 6000 (for samples of 2017 and Jiřická 2016 - 2x 151bp) (Novogene, HongKong, 388 China). Raw Illumina metagenomic reads were preprocessed in order to remove low-quality 389 bases/reads and adaptor sequences using the bbmap package [63]. Briefly, the PE reads 390 were interleaved by reformat.sh and quality trimmed by bbduk.sh (using a Phred quality 391 score of 18). Subsequently, bbduk.sh was used for adapter trimming and 392 identification/removal of possible PhiX and p-Fosil2 contamination. Additional checks (i.e. de 393 novo adapter identification with bbmerge.sh) were performed in order to ensure that the 394 datasets meet the quality threshold necessary for assembly. The preprocessed reads were 395 assembled independently with MEGAHIT (v1.1.5) [64] using the k-mer sizes: 396 49,69,89,109,129,149, and default settings.

397 Publicly available freshwater metagenomes (total 149 datasets) were downloaded and 398 assembled as described above. Basic metadata (sampling date, location, depth, Bioproject bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

399 identifiers, SRA accessions), and sequence statistics of all metagenomes generated or used 400 in this study are provided in Supplementary Table S1.

401 16S rRNA abundance based taxonomic classification: 20 million reads were randomly 402 sampled from each metagenome and compared to the SILVA database (version 132) [65] to 403 identify candidate 16S reads using an e-value cutoff of 1e-3 using MMSeqs2 [66]. The 404 candidate reads were further screened with ssu-align [67] to find bona-fide 16S rRNA 405 sequences. The 16S rRNA sequences were compared to the SILVA database using blastn 406 and of the best hits was used to obtain the final taxonomic classification.

407 Gene prediction, phage detection and annotation: Prodigal was used for gene prediction 408 in metagenomic mode [68]. To retrieve complete phage genomes we selected assembled 409 contigs >10 Kb that provided evidence of a circular genome as described before [19] 410 (n=3576 circles). These sequences were scanned with the VirSorter tool (default settings, 411 Virome and RefSeq decontamination mode, scores 1 and 2) 412 (https://de.iplantcollaborative.org/de/) [69] and MARVEL [70], and those contigs that were 413 detected by either method to be of phage origin were retained. Finally, a set of 2034 414 genomes were judged to be complete. Existing marine phage genomes were recovered from 415 the Tara Oceans metavirome assemblies[24,25] and the uvMED dataset[71] ans re-run 416 through VirSorter (using the same criteria as for freshwater phage genomes). Additional 417 manual curation was performed using NCBI Batch CDD server to minimize errors in phage 418 identification [72].

419 Recovery of actinobacterial genomes: The curated metagenomic datasets of Římov 420 reservoir and Jiřická Pond were mapped using bbwrap.sh [73] (kfilter=31 subfilter=15 421 maxindel=80) against the assembled contigs (longer than 3 Kb) in a lake-dependent fashion. 422 The resulting BAM files (324 for Římov Reservoir, 25 for Jiřická Pond, respectively) were 423 used to generate contig abundance files with jgi_summarize_bam_contig_depths [74] (-- 424 percentIdentity 97). The contigs and their abundance files were used for binning with 425 MetaBAT2 [74](default settings). Bin completeness, contamination and strain heterogeneity 426 were estimated using CheckM [75] (with default parameters). Bins with estimated 427 completeness above 40% and contamination below 5% were denominated as metagenome- 428 assembled genomes (MAGs). MAGs were taxonomically classified with GTDB-Tk [56] with 429 default settings. MAGs belonging to Actinobacteria (444), together with reference genomes 430 (205) recovered from public repositories (Supplementary Table 6) were annotated using the 431 TIGRFAMs database [76]. 35 conserved marker proteins (Supplementary Table S7) were 432 extracted from the annotated Actinobacteria genomes. MAGs that had more than 19 markers 433 present and reference genomes were used for phylogenetic reconstruction. Briefly, 434 homologous proteins were independently aligned with PRANK [77] (default settings), bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

435 trimmed with BMGE [78] (-t AA -g 0.5 -b 3 -m BLOSUM30) and concatenated. A maximum- 436 likelihood phylogeny was constructed using IQ-TREE [79] with the VT+F+R10 substitution 437 model (chosen as the best-fitting model by ModelFinder [80]) and 1,000 ultrafast bootstrap 438 replicates [81]. These samples also allowed us to assemble and bin ca. 2400 microbial 439 genomes in order to maximize chances for host prediction for the recovered phages.

440 Sequence annotation: All selected viral contigs were compared to the NCBI non-redundant 441 protein database. Protein domains in coding sequences were annotated with Interproscan 442 [82,83]. Searches were performed locally using HMMER3 package [84] with e-value = 1e-3 443 for Clusters of Orthologous Groups (COGS) [85] and trusted score cutoffs for TIGR Families 444 – TIGRfams [76]. Pfam domains were identified in all datasets using the script pfam_scan.pl 445 (ftp://ftp.sanger.ac.uk/pub/databases/Pfam/Tools/) with the PFAM database release 31 [86].

446 Host prediction: Multiple methods were used to assign a host to a phage genome. Host- 447 specific genes were used to predict the host e.g. photosystem genes to link to cyanophages 448 [87] and whiB for Actinobacteria [19]. Integration sites in the host genome (termed attB), 449 usually a tRNA locus, were checked using BLASTN [88] for assigning a specific association 450 between host and phage as described before [71]. CRISPR spacers in microbial genomes 451 were detected using minced (https://github.com/ctSkennerton/minced) and for host prediction 452 spacers were compared to phage genomes using BLASTN with stringent cutoffs (alignment 453 length ≥30 bp, ≥97% nucleotide identity, ≥97% query coverage, ≤1e-5). Additionally a direct 454 comparison of phage genomes to host genomes was made using BLASTN to identify shared 455 nucleotide sequences (alignment length ≥30 bp, ≥97% nucleotide identity, ≥97% query 456 coverage, ≤1e-5) [89].

457 Selecting representative phage genomes: Caudovirales (tailed phages) phage genomes 458 were divided into three sets, from NCBI Viral RefSeq (n=1996), marine Caudovirales (Tara 459 Oceans and uvMED, n=1335) and freshwater Caudovirales (n=2034). Within each set all-vs- 460 all blastn comparisons were made retaining significant matches at evalue <1e-3 and >95% 461 nucleotide identity. For each comparison, two phages were considered as belonging to a 462 cluster if the genome coverage of both phages in a pairwise comparison was ≥95%. Clusters 463 of phages that meet these criteria were then merged together if they shared a phage genome 464 in common (single linkage). The longest phage in each cluster was selected as a 465 representative phage genome for this cluster. Clustering at these relatively high nucleotide 466 identity levels and genome coverages is expected to retain phage genomes that are very 467 closely related at the genomic level. At these cutoffs, the number of representatives in each 468 dataset was: RefSeq 1887, marine 1202 and freshwater 1330, making a total of 4419 469 representative phage genomes. 470 Phage proteomic tree: An all-vs-all tblastx comparison was performed for all 4419 phages bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

471 (-M BLOSUM45 –e 1e-3) and the scores of all significant hits were added together to provide 472 a comparison score for all pairwise comparisons. The comparison scores between two 473 phage genomes were normalized by the self-comparison of both phages to provide a 474 similarity metric (Dice coefficient). For example, 2 * comparison score of A and B / (self- 475 comparison score of A + self-comparison score B). The Dice coefficient was subtracted from 476 1 to provide a distance measure [25,71]. The resultant distance matrix of the all-vs-all 477 comparison was used to generate 10000 alternative, but equally valid tree topologies using 478 clearcut [90] (clearcut –in distance_matrix.txt –out 10000.trees –n 10000), and a consensus 479 tree was computed using IQ-TREE [79] (iqtree -t 10000.trees –con consensus.tre) that 480 contains confidence values for each comparison. The final tree was visualized in iTOL [91] 481 (http://itol.embl.de).

482 vContact2 classification: vContact2 was run on the Cyberverse infrastructure using 483 diamond, freshwater phages only (1330 genomes, dereplicated), marine phages only (1202 484 genomes, dereplicated), and finally both freshwater and marine phages together (2532 485 genomes).

486 Fragment recruitment: All phage genomes were compared to all metagenomic datasets 487 using RazerS 3 [92] (using cutoffs of >95% identity and alignment lengths ≥50bp) to compute 488 coverage per Gb. For microbial genomes, all rRNA sequences (5S, 16S and 23S) were 489 identified using rrna_hmm [93] and were masked prior to comparisons with metagenomic 490 sequences. At most, 20 million reads were used from metagenomic datasets for computing 491 abundances of microbial genomes while a full set of reads were used for phage genomes. 492 Raw data (coverage per Gb) is given in Supplementary Table S2 (phages) and 493 Supplementary Table S6 (Actinobacteria). A phage genome or microbial genome was 494 considered to be present only when it presented >80% genome coverage. Heatmaps were 495 created using http://heatmapper.ca [94].

496

497 References

498 1. Sommer U, Adrian R, De Senerpont Domis L, Elser JJ, Gaedke U, Ibelings B, et al. 499 Beyond the Plankton Ecology Group (PEG) Model: Mechanisms Driving Plankton 500 Succession. Annu Rev Ecol Evol Syst. Annual Reviews; 2012;43: 429–448. 501 doi:10.1146/annurev-ecolsys-110411-160251

502 2. Wetzel RG. Freshwater Ecosystems. Encyclopedia of Biodiversity. Elsevier; 2001. pp. 503 560–569. doi:10.1016/B978-0-12-384719-5.00060-5

504 3. Suttle CA. The significance of viruses to mortality in aquatic microbial communities. 505 Microb Ecol. 1994;28: 237–243. doi:10.1007/BF00166813 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

506 4. Fuhrman JA, Noble RT. Viruses and protists cause similar bacterial mortality in coastal 507 seawater. Limnol Oceanogr. 1995;40: 1236–1242. doi:10.4319/lo.1995.40.7.1236

508 5. Gobler CJ, Hutchins DA, Fisher NS, Cosper EM, Saňudo-Wilhelmy SA. Release and 509 bioavailability of C, N, P Se, and Fe following viral lysis of a marine chrysophyte. 510 Limnol Oceanogr. 1997;42: 1492–1504. doi:10.4319/lo.1997.42.7.1492

511 6. Middelboe M, Jørgensen NOG. Viral lysis of bacteria: an important source of dissolved 512 amino acids and cell wall compounds. J Mar Biol Assoc United . 2006;86: 513 605–612. doi:10.1017/S0025315406013518

514 7. Waldor MK, Mekalanos JJ. Lysogenic conversion by a filamentous phage encoding 515 cholera toxin. Science. 1996;272: 1910–4. Available: 516 http://www.ncbi.nlm.nih.gov/pubmed/8658163

517 8. Zeidner G, Bielawski JP, Shmoish M, Scanlan DJ, Sabehi G, Béjà O. Potential 518 photosynthesis gene recombination between Prochlorococcus and Synechococcus via 519 viral intermediates. Environ Microbiol. 2005;7: 1505–1513. doi:10.1111/j.1462- 520 2920.2005.00833.x

521 9. Rodriguez-Valera F, Martin-Cuadrado A-B, Rodriguez-Brito B, Pašić L, Thingstad TF, 522 Rohwer F, et al. Explaining microbial population genomics through phage predation. 523 Nat Rev Microbiol. 2009;7: 828–836. doi:10.1038/nrmicro2235

524 10. Suttle CA. Marine viruses — major players in the global ecosystem. Nat Rev Microbiol. 525 2007;5: 801–812. doi:10.1038/nrmicro1750

526 11. Yoshida T, Nagasaki K, Takashima Y, Shirai Y, Tomaru Y, Takao Y, et al. Ma-LMM01 527 Infecting Toxic Microcystis aeruginosa Illuminates Diverse Cyanophage Genome 528 Strategies. J Bacteriol. 2008;190: 1762–1772. doi:10.1128/JB.01534-07

529 12. Chénard C, Wirth JF, Suttle CA. Viruses Infecting a Freshwater Filamentous 530 Cyanobacterium ( Nostoc sp.) Encode a Functional CRISPR Array and a 531 Proteobacterial DNA Polymerase B. MBio. American Society for Microbiology; 2016;7: 532 e00667-16. doi:10.1128/mBio.00667-16

533 13. Moon K, Kang I, Kim S, Kim S-J, Cho J-C. Genome characteristics and environmental 534 distribution of the first phage that infects the LD28 clade, a freshwater methylotrophic 535 bacterial group. Environ Microbiol. Blackwell Publishing Ltd; 2017;19: 4714–4727. 536 doi:10.1111/1462-2920.13936

537 14. Moon K, Kang I, Kim S, Kim S-J, Cho J-C. Genomic and ecological study of two 538 distinctive freshwater bacteriophages infecting a Comamonadaceae bacterium. Sci 539 Rep. Nature Publishing Group; 2018;8: 7989. doi:10.1038/s41598-018-26363-y bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

540 15. Roux S, Enault F, Robin A, Ravet V, Personnic S, Theil S, et al. Assessing the 541 Diversity and Specificity of Two Freshwater Viral Communities through Metagenomics. 542 Martin DP, editor. PLoS One. 2012;7: e33641. doi:10.1371/journal.pone.0033641

543 16. Roux S, Chan L-K, Egan R, Malmstrom RR, McMahon KD, Sullivan MB. Ecogenomics 544 of virophages and their giant virus hosts assessed through time series metagenomics. 545 Nat Commun. Nature Publishing Group; 2017;8: 858. doi:10.1038/s41467-017-01086- 546 2

547 17. Skvortsov T, De Leeuwe C, Quinn JP, McGrath JW, Allen CCR, McElarney Y, et al. 548 Metagenomic characterisation of the viral community of lough neagh, the largest 549 freshwater lake in Ireland. PLoS One. Public Library of Science; 2016;11. 550 doi:10.1371/journal.pone.0150361

551 18. Arkhipova K, Skvortsov T, Quinn JP, McGrath JW, Allen CCR, Dutilh BE, et al. 552 Temporal dynamics of uncultured viruses: a new dimension in viral diversity. ISME J. 553 Nature Publishing Group; 2018;12: 199–211. doi:10.1038/ismej.2017.157

554 19. Ghai R, Mehrshad M, Mizuno CM, Rodriguez-Valera F. Metagenomic recovery of 555 phage genomes of uncultured freshwater actinobacteria. ISME J. Nature Publishing 556 Group; 2017;11: 304–308. doi:10.1038/ismej.2016.110

557 20. Šimek K, Horňák K, Jezbera J, Nedoma J, Znachor P, Hejzlar J, et al. Spatio-temporal 558 patterns of bacterioplankton production and community composition related to 559 phytoplankton composition and protistan bacterivory in a dam reservoir. Aquat Microb 560 Ecol. 2008;51: 249–262. doi:10.3354/ame01193

561 21. Gabaldón C, Devetter M, Hejzlar J, Šimek K, Znachor P, Nedoma J, et al. Repeated 562 flood disturbance enhances rotifer dominance and diversity in a zooplankton 563 community of a small dammed mountain pond. J Limnol. Page Press Publications; 564 2016;76: 292–304. doi:10.4081/jlimnol.2016.1544

565 22. Znachor P, Hejzlar J, Vrba J, Nedoma J, Seda J, Simek K, et al. Brief history of long- 566 term ecological research into aquatic ecosystems and their catchments in the Czech 567 Republic. Part I : Manmade reservoirs. 2016.

568 23. Ghai R, Mizuno CM, Picazo A, Camacho A, Rodriguez-Valera F. Metagenomics 569 uncovers a new group of low GC and ultra-small marine Actinobacteria. Sci Rep. 570 2013;3: 2471. doi:10.1038/srep02471

571 24. Brum JR, Sullivan MB, Ignacio-espinoza JC, Roux S, Doulcier G, Acinas SG, et al. 572 Patterns and ecological drivers of ocean viral communities. Science (80- ). 2015;348: 573 1261498-1–11. doi:10.1126/science.1261498 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

574 25. Nishimura Y, Watai H, Honda T, Mihara T, Omae K, Roux S, et al. Environmental Viral 575 Genomes Shed New Light on Virus-Host Interactions in the Ocean. Tamaki H, editor. 576 mSphere. American Society for Microbiology; 2017;2. doi:10.1128/mSphere.00359-16

577 26. Pernthaler J, Sattler B, Šimek K, Schwarzenbacher A, Psenner R. Top-down effects 578 on the size-biomass distribution of a freshwater bacterioplankton community. Aquat 579 Microb Ecol. 1996; doi:10.3354/ame010255

580 27. Devoto AE, Santini JM, Olm MR, Anantharaman K, Munk P, Tung J, et al. 581 Megaphages infect Prevotella and variants are widespread in gut microbiomes. Nat 582 Microbiol. Nature Publishing Group; 2019;4: 693–700. doi:10.1038/s41564-018-0338- 583 9

584 28. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of 585 protein or nucleotide sequences. Bioinformatics. 2006;22: 1658–1659. 586 doi:10.1093/bioinformatics/btl158

587 29. Bolduc B, Jang H Bin, Doulcier G, You Z-Q, Roux S, Sullivan MB. vConTACT: an 588 iVirus tool to classify double-stranded DNA viruses that infect Archaea and Bacteria. 589 PeerJ. PeerJ; 2017;5: e3243. doi:10.7717/peerj.3243

590 30. Neuenschwander SM, Ghai R, Pernthaler J, Salcher MM. Microdiversification in 591 genome-streamlined ubiquitous freshwater Actinobacteria. ISME J. Nature Publishing 592 Group; 2018;12: 185–198. doi:10.1038/ismej.2017.156

593 31. Warnecke F, Amann R, Pernthaler J. Actinobacterial 16S rRNA genes from freshwater 594 habitats cluster in four distinct lineages. Environ Microbiol. 2004;6: 242–253. 595 doi:10.1111/j.1462-2920.2004.00561.x

596 32. Hahn MW, Lunsdorf H, Wu Q, Schauer M, Hofle MG, Boenigk J, et al. Isolation of 597 Novel Ultramicrobacteria Classified as Actinobacteria from Five Freshwater Habitats in 598 Europe and Asia. Appl Environ Microbiol. 2003;69: 1442–1451. 599 doi:10.1128/AEM.69.3.1442-1451.2003

600 33. Barth H, Aktories K, Popoff MR, Stiles BG. Binary Bacterial Toxins: Biochemistry, 601 Biology, and Applications of Common Clostridium and Bacillus Proteins. Microbiol Mol 602 Biol Rev. American Society for Microbiology; 2004;68: 373–402. 603 doi:10.1128/MMBR.68.3.373-402.2004

604 34. Lainhart W, Stolfa G, Koudelka GB. Shiga Toxin as a Bacterial Defense against a 605 Eukaryotic Predator, Tetrahymena thermophila. J Bacteriol. 2009;191: 5116–5122. 606 doi:10.1128/JB.00508-09

607 35. Arnold JW, Koudelka GB. The Trojan Horse of the microbiological arms race: phage- bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

608 encoded toxins as a defence against eukaryotic predators. Environ Microbiol. 2014;16: 609 454–466. doi:10.1111/1462-2920.12232

610 36. Casas V, Miyake J, Balsley H, Roark J, Telles S, Leeds S, et al. Widespread 611 occurrence of phage-encoded exotoxin genes in terrestrial and aquatic environments 612 in Southern California. FEMS Microbiol Lett. 2006;261: 141–149. doi:10.1111/j.1574- 613 6968.2006.00345.x

614 37. Imlay JA. Where in the world do bacteria experience oxidative stress? Environ 615 Microbiol. Blackwell Publishing Ltd; 2019;21: 521–530. doi:10.1111/1462-2920.14445

616 38. Imlay JA. The molecular mechanisms and physiological consequences of oxidative 617 stress: lessons from a model bacterium. Nat Rev Microbiol. 2013;11: 443–454. 618 doi:10.1038/nrmicro3032

619 39. Ezraty B, Gennaris A, Barras F, Collet J-F. Oxidative stress, protein damage and 620 repair in bacteria. Nat Rev Microbiol. Nature Publishing Group; 2017;15: 385–396. 621 doi:10.1038/nrmicro.2017.26

622 40. Mizuno CM, Rodriguez-Valera F, Garcia-Heredia I, Martin-Cuadrado AB, Ghai R. 623 Reconstruction of novel cyanobacterial siphovirus genomes from mediterranean 624 metagenomic fosmids. Appl Environ Microbiol. 2013;79: 688–695. 625 doi:10.1128/AEM.02742-12

626 41. Weinbauer MG, Höfle MG. Significance of viral lysis and flagellate grazing as factors 627 controlling bacterioplankton production in a eutrophic lake. Appl Environ Microbiol. 628 1998;

629 42. Šimek K, Nedoma J, Znachor P, Kasalický V, Jezbera J, Hornňák K, et al. A finely 630 tuned symphony of factors modulates the microbial food web of a freshwater reservoir 631 in spring. Limnol Oceanogr. American Society of Limnology and Oceanography Inc.; 632 2014;59: 1477–1492. doi:10.4319/lo.2014.59.5.1477

633 43. Siegmund L, Burmester A, Fischer MS, Wöstemeyer J. A model for endosymbiosis: 634 Interaction between Tetrahymena pyriformis and Escherichia coli. Eur J Protistol. 635 2013;49: 552–563. doi:10.1016/j.ejop.2013.04.007

636 44. Gourabathini P, Brandl MT, Redding KS, Gunderson JH, Berk SG. Interactions 637 between Food-Borne Pathogens and Protozoa Isolated from Lettuce and Spinach. 638 Appl Environ Microbiol. 2008;74: 2518–2525. doi:10.1128/AEM.02709-07

639 45. Rehfuss MYM, Parker CT, Brandl MT. Salmonella transcriptional signature in 640 Tetrahymena phagosomes and role of acid tolerance in passage through the protist. 641 ISME J. 2011;5: 262–273. doi:10.1038/ismej.2010.128 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

642 46. Potter SC, Luciani A, Eddy SR, Park Y, Lopez R, Finn RD. HMMER web server: 2018 643 update. Nucleic Acids Res. Oxford University Press; 2018;46: W200–W204. 644 doi:10.1093/nar/gky448

645 47. Zimmermann L, Stephens A, Nam S-Z, Rau D, Kübler J, Lozajic M, et al. A 646 Completely Reimplemented MPI Bioinformatics Toolkit with a New HHpred Server at 647 its Core. J Mol Biol. Academic Press; 2018;430: 2237–2243. 648 doi:10.1016/j.jmb.2017.12.007

649 48. Glockner FO, Zaichikov E, Belkova N, Denissova L, Pernthaler J, Pernthaler A, et al. 650 Comparative 16S rRNA Analysis of Lake Bacterioplankton Reveals Globally 651 Distributed Phylogenetic Clusters Including an Abundant Group of Actinobacteria. 652 Appl Environ Microbiol. 2000;66: 5053–5065. doi:10.1128/AEM.66.11.5053-5065.2000

653 49. Ghai R, McMahon KD, Rodriguez-Valera F. Breaking a paradigm: cosmopolitan and 654 abundant freshwater actinobacteria are low GC. Environ Microbiol Rep. 2012;4: 29– 655 35. doi:10.1111/j.1758-2229.2011.00274.x

656 50. Kang I, Kim S, Islam MR, Cho J-C. The first complete genome sequences of the acI 657 lineage, the most abundant freshwater Actinobacteria, obtained by whole-genome- 658 amplification of dilution-to-extinction cultures. Sci Rep. NLM (Medline); 2017;7: 46830. 659 doi:10.1038/srep46830

660 51. Hahn MW, Schmidt J, Taipale SJ, Doolittle WF, Koll U. Rhodoluna lacicola gen. nov., 661 sp. nov., a planktonic freshwater bacterium with stream-lined genome. Int J Syst Evol 662 Microbiol. Society for General Microbiology; 2014;64: 3254–3263. 663 doi:10.1099/ijs.0.065292-0

664 52. Ghai R, Mizuno CM, Picazo A, Camacho A, Rodriguez-Valera F. Key roles for 665 freshwater Actinobacteria revealed by deep metagenomic sequencing. Mol Ecol. 666 Blackwell Publishing Ltd; 2014;23: 6073–6090. doi:10.1111/mec.12985

667 53. Newton RJ, Jones SE, Eiler A, McMahon KD, Bertilsson S. A Guide to the Natural 668 History of Freshwater Lake Bacteria. Microbiol Mol Biol Rev. American Society for 669 Microbiology; 2011;75: 14–49. doi:10.1128/MMBR.00028-10

670 54. Allgaier M, Grossart H-P. Diversity and Seasonal Dynamics of Actinobacteria 671 Populations in Four Lakes in Northeastern Germany. Appl Environ Microbiol. 2006;72: 672 3489–3497. doi:10.1128/AEM.72.5.3489-3497.2006

673 55. Wu QL, Zwart G, Wu J, Kamst-van Agterveld MP, Liu S, Hahn MW. Submersed 674 macrophytes play a key role in structuring bacterioplankton community composition in 675 the large, shallow, subtropical Taihu Lake, China. Environ Microbiol. 2007;9: 2765– bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

676 2774. doi:10.1111/j.1462-2920.2007.01388.x

677 56. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil P-A, et al. A 678 standardized bacterial taxonomy based on genome phylogeny substantially revises 679 the tree of life. Nat Biotechnol. 2018;36: 996–1004. doi:10.1038/nbt.4229

680 57. Znachor P, Visocká V, Nedoma J, Rychtecký P. Spatial heterogeneity of diatom 681 silicification and growth in a eutrophic reservoir. Freshw Biol. 2013;58: 1889–1902. 682 doi:10.1111/fwb.12178

683 58. Enav H, Kirzner S, Lindell D, Mandel-Gutfreund Y, Béjà O. Adaptation to sub-optimal 684 hosts is a driver of viral diversification in the ocean. Nat Commun. Nature Publishing 685 Group; 2018;9: 4698. doi:10.1038/s41467-018-07164-3

686 59. Andrei AŞ, Salcher MM, Mehrshad M, Rychtecký P, Znachor P, Ghai R. Niche- 687 directed evolution modulates genome architecture in freshwater Planctomycetes. 688 ISME Journal. Nature Publishing Group; 1 Apr 2019. doi:10.1038/s41396-018-0332-5

689 60. Simek K, Pernthaler J, Weinbauer MG, Hornak K, Dolan JR, Nedoma J, et al. 690 Changes in Bacterial Community Composition and Dynamics and Viral Mortality Rates 691 Associated with Enhanced Flagellate Grazing in a Mesoeutrophic Reservoir. Appl 692 Environ Microbiol. 2001;67: 2723–2733. doi:10.1128/AEM.67.6.2723-2733.2001

693 61. Šimek K, Weinbauer MG, Hornák K, Jezbera J, Nedoma J, Dolan JR. Grazer and 694 virus-induced mortality of bacterioplankton accelerates development of Flectobacillus 695 populations in a freshwater community. Environ Microbiol. 2007;9: 789–800. 696 doi:10.1111/j.1462-2920.2006.01201.x

697 62. Mehrshad M, Salcher MM, Okazaki Y, Nakano S, Šimek K, Andrei A-S, et al. Hidden 698 in plain sight—highly abundant and diverse planktonic freshwater Chloroflexi. 699 Microbiome. Springer Nature America, Inc; 2018;6. doi:10.1186/s40168-018-0563-8

700 63. Bushnell B. BBMap (version 35.14) [Software]. Available at 701 https://sourceforge.net/projects/bbmap/. 2015;

702 64. Li D, Luo R, Liu C-M, Leung C-M, Ting H-F, Sadakane K, et al. MEGAHIT v1.0: A fast 703 and scalable metagenome assembler driven by advanced methodologies and 704 community practices. Methods. 2016;102: 3–11. doi:10.1016/j.ymeth.2016.02.020

705 65. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, et al. The SILVA 706 ribosomal RNA gene database project: improved data processing and web-based 707 tools. Nucleic Acids Res. 2012;41: D590–D596. doi:10.1093/nar/gks1219

708 66. Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

709 the analysis of massive data sets. Nat Biotechnol. Springer Nature; 2017; 710 doi:10.1038/nbt.3988

711 67. Nawrocki EP, R., Sean E. ssu-align: a tool for structural alignment of SSU rRNA 712 sequences. 2010.

713 68. Hyatt D, Chen GL, LoCascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: 714 Prokaryotic gene recognition and translation initiation site identification. BMC 715 Bioinformatics. 2010;11. doi:10.1186/1471-2105-11-119

716 69. Bolduc B, Youens-Clark K, Roux S, Hurwitz BL, Sullivan MB. IVirus: Facilitating new 717 insights in viral ecology with software and community data sets imbedded in a 718 cyberinfrastructure. ISME Journal. Nature Publishing Group; 2017. pp. 7–14. 719 doi:10.1038/ismej.2016.89

720 70. Amgarten D, Braga LPP, da Silva AM, Setubal JC. MARVEL, a Tool for Prediction of 721 Bacteriophage Sequences in Metagenomic Bins. Front Genet. 2018;9. 722 doi:10.3389/fgene.2018.00304

723 71. Mizuno CM, Rodriguez-Valera F, Kimes NE, Ghai R. Expanding the Marine 724 Virosphere Using Metagenomics. Rocha EPC, editor. PLoS Genet. 2013;9: e1003987. 725 doi:10.1371/journal.pgen.1003987

726 72. Marchler-Bauer A, Bo Y, Han L, He J, Lanczycki CJ, Lu S, et al. CDD/SPARCLE: 727 Functional classification of proteins via subfamily domain architectures. Nucleic Acids 728 Res. 2017; doi:10.1093/nar/gkw1129

729 73. Segata N, Izard J, Waldron L, Gevers D, Miropolsky L, Garrett WS, et al. BBMap 730 short-read aligner, and other bioinformatics tools. Bioinformatics. 2016; 731 doi:10.1111/2041-210X.12613

732 74. Kang DD, Froula J, Egan R, Wang Z. MetaBAT, an efficient tool for accurately 733 reconstructing single genomes from complex microbial communities. PeerJ. PeerJ; 734 2015;3: e1165. doi:10.7717/peerj.1165

735 75. Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. CheckM: assessing 736 the quality of microbial genomes recovered from isolates, single cells, and 737 metagenomes. Genome Res. 2015;25: 1043–55. doi:10.1101/gr.186072.114

738 76. Haft DH. TIGRFAMs: a protein family resource for the functional identification of 739 proteins. Nucleic Acids Res. 2001;29: 41–43. doi:10.1093/nar/29.1.41

740 77. Löytynoja A. Phylogeny-aware alignment with PRANK. Methods Mol Biol. 2014; 741 doi:10.1007/978-1-62703-646-7_10 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

742 78. Criscuolo A, Gribaldo S. BMGE (Block Mapping and Gathering with Entropy): a new 743 software for selection of phylogenetic informative regions from multiple sequence 744 alignments. BMC Evol Biol. 2010;10: 210. doi:10.1186/1471-2148-10-210

745 79. Nguyen LT, Schmidt HA, Von Haeseler A, Minh BQ. IQ-TREE: A fast and effective 746 stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 747 2015; doi:10.1093/molbev/msu300

748 80. Kalyaanamoorthy S, Minh BQ, Wong TKF, Von Haeseler A, Jermiin LS. ModelFinder: 749 Fast model selection for accurate phylogenetic estimates. Nat Methods. 2017; 750 doi:10.1038/nmeth.4285

751 81. Hoang DT, Chernomor O, Von Haeseler A, Minh BQ, Vinh LS. UFBoot2: Improving 752 the ultrafast bootstrap approximation. Mol Biol Evol. 2018; 753 doi:10.1093/molbev/msx281

754 82. Bateman A. The Pfam protein families database. Nucleic Acids Res. 2004;32: 138D – 755 141. doi:10.1093/nar/gkh121

756 83. Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: 757 Genome-scale protein function classification. Bioinformatics. 2014; 758 doi:10.1093/bioinformatics/btu031

759 84. Eddy SR. Accelerated Profile HMM Searches. Pearson WR, editor. PLoS Comput 760 Biol. 2011;7: e1002195. doi:10.1371/journal.pcbi.1002195

761 85. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin E V, et al. The 762 COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003;4: 763 41. doi:10.1186/1471-2105-4-41

764 86. Finn RD, Bateman A, Clements J, Coggill P, Eberhardt RY, Eddy SR, et al. Pfam: The 765 protein families database. Nucleic Acids Research. 2014. doi:10.1093/nar/gkt1223

766 87. Mann NH, Cook A, Millard A, Bailey S, Clokie M. Bacterial photosynthesis genes in a 767 virus. Nature. 2003;424: 741–741. doi:10.1038/424741a

768 88. Altschul S. Gapped BLAST and PSI-BLAST: a new generation of protein database 769 search programs. Nucleic Acids Res. 1997;25: 3389–3402. 770 doi:10.1093/nar/25.17.3389

771 89. Edwards RA, McNair K, Faust K, Raes J, Dutilh BE. Computational approaches to 772 predict bacteriophage–host relationships. Smith M, editor. FEMS Microbiol Rev. 773 2016;40: 258–272. doi:10.1093/femsre/fuv048

774 90. Evans J, Sheneman L, Foster J. Relaxed neighbor joining: A fast distance-based bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

775 phylogenetic tree construction method. J Mol Evol. 2006; doi:10.1007/s00239-005- 776 0176-2

777 91. Letunic I, Bork P. Interactive Tree Of Life (iTOL) v4: recent updates and new 778 developments. Nucleic Acids Res. 2019; doi:10.1093/nar/gkz239

779 92. Weese D, Holtgrewe M, Reinert K. RazerS 3: Faster, fully sensitive read mapping. 780 Bioinformatics. 2012;28: 2592–2599. doi:10.1093/bioinformatics/bts505

781 93. Huang Y, Gilna P, Li W. Identification of ribosomal RNA genes in metagenomic 782 fragments. Bioinformatics. 2009;25: 1338–1340. doi:10.1093/bioinformatics/btp161

783 94. Babicki S, Arndt D, Marcu A, Liang Y, Grant JR, Maciejewski A, et al. Heatmapper: 784 web-enabled heat mapping for all. Nucleic Acids Res. 2016;44: W147–W153. 785 doi:10.1093/nar/gkw419

786

787

788

789 Figure legends

790

791 Figure 1. Genome statistics. a) Complete phage genome recovery as a function of 792 metagenome size. b) Total number of phage genomes vs dereplicated genomes from Řimov 793 reservoir (epi- and hypolimnion) and Jiřicka Pond. c) Comparison of all phage genomes and 794 dereplicated phage genomes from Viral Refseq, Marine habitat (TARA+uvMED) and the 795 freshwater habitat (UFO). d) Length distribution of marine phage genomes (inset: enlarged 796 view of phages up to length 100K). e) Length distribution of recovered freshwater phage 797 genomes (inset: enlarged view of phages up to length 100K). f) Comparative GC% 798 distributions of marine and freshwater phage genomes g) Number of unique protein clusters 799 in RefSeq, Marine and Freshwater phage genomes at 30% and 60% identity. h) Number of 800 phage genomes with predicted host of the freshwater dataset (inset: the same for lower 801 taxonomic ranks within Actinobacteria; Actino: Actinobacteriota; Nano: Nanopelagicales; 802 Acidi: Acidimicrobiales; Micro: Microbacteriales) i) GC% vs length for predicted (freshwater, 803 marine) and known actinobacterial phages (RefSeq).

804

805 Figure 2. Viral life strategies in freshwater environments. Left: The trojan horse strategy. A 806 phage encoding the eukaryotic toxin ADP-ribosyltransferase (VIP2 family) infects a microbe 807 that is then ingested by a eukaryotic predator (a flagellate is shown). In the phagolysosome bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

808 the toxin is expressed and released after cell lysis into the phagolysosome from where it 809 translocates to the cystosol. In the cytosol the toxin inhibits actino polymerization leading to 810 cell death. Right: Phage-mediated ROS defense. (a-b) A phage encoding ROS-defense 811 genes e.g. thioredoxin, glutaredoxin is ingested by a flagellate. (c-d) The damage produced 812 to the microbe by the ROS present in the phagolysosome will be reduced by the expression 813 of phage encoded genes, which will favour survival during the oxidative burst. (e) When the 814 phage infected microbe is released outside, cell lysis by the phage can proceed.

815

816 Figure 3. Relative abundance of 1398 Řimov phages in 18 metagenomes. A heatmap of 817 abundances is shown (coverage/Gb of metagenome normalized by Z-score). Phages are 818 clustered by sample and abundance (average linkage, Spearman Rank correlation). Vertical 819 bars on the right side indicate the classification of the clusters. Columns are annotated with 820 the temperature, depth of the sample and the sampling date (RE = 0.5m and RH = 30m). 821 Temperature color key is shown at top right.

822

823 Figure 4. Abundance profiles of the WhiB encoding actinophages recovered from Řimov 824 reservoir. Actinophages (n=125) are arranged based on average amino acid identity 825 (dendrogram not shown). From inside out the rings represent: phage genome size in Kb (red 826 line indicates the average genome length, 72.5Kb), phage genome GC%, water temperature 827 of samples whereof phages were assembled, span of clusters of phage genomes grouped by 828 similarity (nucleotide identity >95%), abundance profiles of each actinophage in the epi- and 829 hypolimnion time series datasets of Řimov reservoir (coverage per Gb of metagenome 830 normalized by Z-score). Color keys are shown in the center. Phages of Cluster1 are outlined 831 in black (top left of the circle).

832

833 Figure 5. Abundance profiles of actinobacterial genomes and MAGs combined with 834 phylogenomic analysis. The tree was obtained using complete reference genomes (n=205) 835 and recovered MAGs (n=350) of Řimov reservoir time series metagenomes (see methods). 836 From inside out the rings represent: the larger ring covering some tree branches highlights 837 detailed taxonomic levels of Microbacteraceae and Nanopelagicales, the second ring shows 838 class or order of Actinobacteria, followed by water temperature of samples whereof MAGs 839 were assembled and abundance profiles for each MAG in the epi- and hypolimnion time 840 series datasets of Řimov reservoir (coverage per Gb of metagenome normalized by Z-score). 841 The abundance profiles are shown only for MAGs recovered from the Řimov reservoir. The 842 red and yellow stars indicate acSTL and acTH1 MAGs with 16S rRNA sequences bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

843 respectively. Green stars indicate deep branching basal groups within the order 844 Nanopelagicales. Branch colors reflect bootstrap support (UFboot): black: >=95, red: 45-95 845 and gray: <45. Color keys are shown at the top. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made a) availableb) under aCC-BY-ND 4.0 International license. c) 2500 180 Rimov Freshwater 900 Hypolimnion RefSeq (UFO) R² = 0.54 2000 Marine Rimov (TARA+ 140 700 Epilimnion uvMED) 1500 100 500

Jiricka 1000 60 300 500 # of Phage ofgenomes Phage # # of Caudovirales Genomes # of Phage genomes Phage of # 20 100 0 100 200 300 400 500 ep. ep. ep. al ep. al ep. al ep. Total Total Total Metagenome Size (Gb) Der Der Der Tot Der Tot Der Tot Der d) 600 e) f) 150 600 150 Predicted Marine 200 (TARA+uvMED) Predicted Freshwater 500 500 (UFO) 100 100 150 400 400

300 50 300 50 100 Frequency Frequency 200 200 Frequency 0 0 50 100 0 20K 40K 60K 80K 100K 100 0 20K 40K 60K 80K 100K

0 0 0

0 100K 200K 300K 400K 0 100K 200K 300K 400K 30 40 50 60 70 Length (kb) Length (kb) (GC%) g) h) i) 400 100K Actinobacterial Phages RefSeq Isolates (RefSeq) Freshwater Acnobacteriota Marine Alphaproteobacteria Predicted Marine 80K (UFO) 325 (TARA+uvMED) Betaproteobacteriales (TARA+ 225 Predicted Freshwater (UFO) uvMED) Gammaproteobacteria 200 60K Actino. Bacteroidia 140 Planctomycetes 40K 100 Length (kb) Verrucomicrobiota Nano. 100 Bdellovibrionota 60 (acI) Acidi. 20K Cyanobacteriota # phages (acIV) # Specific Protein Clusters 20 Micro. Gemmamonadetes 0 0 0 20 40 60 80 100 120 140 160 180 200 30 40 50 60 70 30% 60% 30% 60% 30% 60% # of predicted phage genomes GC (%)

Figure 1. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Trojan Horseavailable under aCC-BY-ND 4.0 International license. Phage-mediated ROS defense

Flagella

a Actin Uninfected Uninfected cytoskeleton microbes microbes

Phage-induction b and Toxin-release Phage infected Phage infected ROS microbes microbes Cell lysis Phagosome c Phagosome e

d Disruption of Actin Cytoskeleton Viable, phage Flagellate infected microbe Flagellate Toxin translocation Exocytosis to cytosol products Nucleus Nucleus Cell-death

Figure 2. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made Row Z−Scoreavailable under aCC-BY-ND 4.0 International license.

-4 -2 0 2 4 Epilimnion Hypolimnion Mixis

y17 v15

RE-20apr16RH-20apr16RE-9nov16RH-9nov16 RH-27jun17RH-26jul17 RE-16jun15RE-4no RE-27jun17RE-26jul17 RH-15aug16RH-14apr17RH-23ma RH-18aug17RE-14apr17RE-23may17 RE-15aug16 RE-18aug17 Temperature (ºC) 4 10 18 21

Figure 3. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made AUG available under aCC-BY-ND JUL4.0 International license. JUN 2017 MAY APR HYPOLIMNION NOV AUG 2016 APR AUG JUL JUN 2017 MAY APR NOV EPILIMNION AUG 2016 APR NOV JUN 2015 CLUSTERS

RH-20APR16-C80

RE-23may17-C96 RE-27jun17-C42

RE-26jul17-C79 RH-26jul17-C209 RH-9nov16-C118 RH-23may17-C696RH-14apr17-C239

RH-27jun17-C246 RH-15aug16-C170 PHAGE GENOME_ID RE-9nov16-C108 RE-20APR16-C14

RE-23may17-C75

RE-4nov15-C26 RE-18aug17-C50 RE-26jul17-C512 TEMPERATURE (°C) RE-23may17-C288 GC % RH-27jun17-C495

RE-9nov16-C241 RH-20APR16-C187 LENGTH (Kb)

RH-15aug16-C364

RH-14apr17-C464 RH-23may17-C1427 72.5Kb RH-14apr17-C956 RH-23may17-C3230 RH-14apr17-C3266 RE-16jun15-C28 RE-23may17-C2712 RH-14apr17-C421 RH-18aug17-C3394 RE-4nov15-C131 RH-27jun17-C3583 RE-20APR16-C42 RH-20APR16-C2098 Z-score (coverage per Gb) RH-23may17-C10174 RE-14apr17-C38 RH-23may17-C2863 RH-9nov16-C234 RH-9nov16-C1702 RH-20APR16-C174 RE-23may17-C1134 RH-15aug16-C333 RH-27jun17-C7230 RH-14apr17-C6822 RH-23may17-C1325 -1.1 4.0 RH-18aug17-C6739 RE-23may17-C197 Temperature (°C) RH-23may17-C20198 RH-26jul17-C415 RH-26jul17-C6971 RH-27jun17-C456 RH-23may17-C11264 RH-18aug17-C435 RE-14apr17-C1183 RH-14apr17-C3644 RH-15aug16-C395 RE-16jun15-C2890 RH-14apr17-C488 4 10 24 RE-14apr17-C5287 RH-20APR16-C203 GC (%) RH-23may17-C32485 RH-27jun17-C1 RH-15aug16-C1620 RH-9nov16-C89521718 RE-18aug17-C1128 RE-9nov16-C9554 RH-15aug16-C342 RE-15aug16-C1082 RH-9nov16-C237 RE-23may17-C957 34 45 70 RE-20APR16-C296 RE-23may17-C3339 RH-9nov16-C961 RE-20APR16-C979 RH-15aug16-C1197 RE-23may17-C305 RE-16jun15-C177 RH-20APR16-C815 RE-18aug17-C258 RH-26jul17-C10537 RE-15aug16-C138 RH-23may17-C31223 RE-14apr17-C4966 RE-26jul17-C1538 RH-14apr17-C10761 RE-20APR16-C861 RH-27jun17-C1 RE-23may17-C15627 RH-27jun17-C3270 RH-18aug17-C10376 RH-26jul17-C1527 RH-23may17-C9366 RH-23may17-C4380 RH-27jun17-C1570 RH-23may17-C9190 RE-23may17-C3094 1229 RE-26jul17-C1584 RE-26jul17-C1635 RE-16jun15-C320 RE-26jul17-C1407 RH-23may17-C6643 RH-14apr17-C2087 RH-18aug17-C2154 RE-26jul17-C1027 RH-26jul17-C2296 RE-18aug17-C1483 RH-18aug17-C338 RH-23may17-C1236

RE-18aug17-C249 RH-14apr17-C431 RH-27jun17-C472

RH-20APR16-C178RH-26jul17-C428

RH-23may17-C1365RH-18aug17-C380 RH-15aug16-C340

RE-23may17-C205RE-20APR16-C45RE-26jul17-C170

RE-16jun15-C19 RE-27jun17-C474 RE-9nov16-C154 RE-26jul17-C183

Figure 4. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprintZ-score in perpetuity. (coverage It isper made Gb) Clade Genus available underBranch aCC-BY-ND marks 4.0 International license. Nanopelagicales Planktophila acSTL

Acidimicrobiia Nanopelagicus acI-C2 -1.1 4.0 Temperature (°C) Microbacteraceae “Luna” basal groups

AUG JUL JUN 2017 MAY 4 10 24 APR HYPOLIMNION NOV AUG 2016 APR AUG JUL JUN 2017 MAY APR NOV EPILIMNION AUG 2016 APR NOV JUN 2015 TEMPERATURE (°C) CLADE GENUS

Tree scale: 0.1

Figure 5. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Supplementary Information

Phage-centric ecological interactions in aquatic ecosystems revealed through ultra-deep metagenomics

Vinicius S. Kavagutti1, Adrian-Ştefan Andrei1, Maliheh Mehrshad1, Michaela M. Salcher1,2, Rohit Ghai1*

1Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre of the Academy of

Sciences of the Czech Republic, České Budějovice, Czech Republic.

2Limnological Station, Institute of Plant and Microbial Biology, University of Zurich, Seestrasse 187, 8802

Kilchberg, Switzerland.

*Corresponding author: Rohit Ghai

Department of Aquatic Microbial Ecology, Institute of Hydrobiology, Biology Centre of the Academy of

Sciences, Na Sádkách 7, 370 05, České Budějovice, Czech Republic

Phone: +420 387 775 881

Fax: +420 385 310 248

E-mail: [email protected]

bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

a) 2015 available2016 under aCC-BY-ND 4.0 International2017 license. 2018 J J A S O N D J F M A M J J A S O N D J F M A M J J A S O N D 0 22

Temperature (°C) 10 22

18 20

4 Depth (m) 9

30 4

40

b) M A M J J A S O NM A M J J A S O M J J A S O N 0

Oxygen [mg/L] 10 12

9

20 6 Depth (m) 3 30 0

40 2015 2016 2017

Supplementary Figure S1. a) Water temperature along a depth profile from June 2015 to December 2017 in Římov reservoir. b) Oxygen concentrations along a depth profile from March 2015 to November 2017 in Římov reservoir. Red circles indicate the time and depth of the samples. Oxygen measurements were not available for all time points. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. a) Římov Reservoir

16jun154nov15 20apr1615aug169nov16 14apr1723may1727jun1726jul17 18aug17 20apr1615aug169nov16 14apr1723may1727jun1726jul17 18aug17 Water Temperature (WT) 100 High (18~24°C) 90 Medium (10~18°C) Low (4~10°C) 80

70 Alphaproteobacteria 60 Actinobacteria Bacteroidetes 50 Betaproteobacteriales Gammaproteobacteria 40 Deltaproteobacteria 30 Cyanobacteria 16S rRNA Abundance 16S rRNA 20 Planctomycetes Chloroflexi 10 2015 2016 2017 2016 2017 Other 0 Epilimnion (0.5m) Hypolimnion (30m) b) Jiřická Pond

1may16 1 25may1728jun177aug1728aug17 100

90

80

70

60 50 Alphaproteobacteria Cyanobacteria 40 Actinobacteria Verrucomicrobia Bacteroidetes 30

16S rRNA Abundance (%) 16S rRNA Betaproteobacteriales 20 Gammaproteobacteria Parcubacteria 10 Deltaproteobacteria Omnitrophica Other 0

Supplementary Figure S2. 16S rRNA based abundances of prokaryotic groups. The figure depicts the SILVA SSU (Ref NR 99 132) classification of 16S rRNA reads in the metagenomes of (a) Římov reservoir and (b) Jiřická Pond. Water temperature at the time of sampling is also indicated (scale top right). Římov epilimnion(n=10datasets),hypolimnion (n=8datasets)andJiřická(n=5datasets). Supplementary FigureS4. phages uptolength100K). Supplementary FigureS3. bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade Frequency 100K 25K 50K 75K 0 0 doi: 1 0 https://doi.org/10.1101/670067 2 0 Římov reservoir 3 0 Epilimnion 4 0 5 0 ComparativeGC%distributionsoffreshwater metagenomicdatafor

6 Length distributionofViral RefSeqphagegenomes(inset:enlargedviewof 0 7

0 Frequency , available undera 100 200 300 400 500 600 8 0 0 9 0 0 10 0 ; this versionpostedJune13,2019. 100K 0 150 200 100 50 0 1 CC-BY-ND 4.0Internationallicense 0 0 2 Length (kb) 0 20K Římov reservoir 200K 3 0 GC% Hypolimnion 40K 4 0 5 0 60K 300K 6 0 80K 7 0 , 100K 8 0 400K 9 0 The copyrightholderforthispreprint(whichwas 10 0 . 0 1 0 2 0 3 0 4 0 Jiřická 5 0 6 0 7 0 8 0 9 0 10 0 bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder forHosts this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license. Dataset Actinobacteriota

Marine Alphaproteobacteria

RefSeq Bacteroidota

UFO Betaproteobacteriales Cyanobacteriota Bootstrap Firmicutes

0 50 100 Gammaproteobacteria

Morphology GC (%) Siphoviridae

Myoviridae 25 50 75

Podoviridae Supplementary Figure S5. Phage proteomic tree showing relationships of freshwater (n=1330), marine (n=1202) and RefSeq (n=1887) phages used in this study. For more details see methods. From inside out: Gray colored tree branches represent bootstrap values below 100, phage dataset, predicted hosts, genome GC% and known phage morphology (for RefSeq). Figure made with iTOL (http://embl.itol.de). bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

Supplementary Figure S6. Genomic variations in a persistent actinophage (Cluster1). Whole genome alignment of a phage genome that was recovered independently 12 times. The variant polymorphisms are indicated with a “*” at the top. T: Terminase Large Subunit, C: Major Capsid Protein, P: Portal protein.

Row Z−Score Temperature (ºC) 4 10 18 21

-1 0 1 Jr.7aug17 Jr.28jun17 Jr.28aug17 Jr.11may16 Jr.25may17

Supplementary Figure S7. Relative abundance of 279 Jiřická phages in 5 metagenomes (coverage per Gb of metagenome normalized by Z-score). Phages are clustered by sample and abundance by average linkage (with Spearman Rank correlation method). Columns are annotated with the temperature and the sampling date. Temperature color key is shown at top right. bioRxiv preprint doi: https://doi.org/10.1101/670067; this version posted June 13, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-ND 4.0 International license.

List of Supplementary Tables

Supplementary Table S1: Metagenomic datasets used in this study

Supplementary Table S2: Freshwater phage genome information

Supplementary Table S3: Phage Genomes encoding ADP-ribosyltransferase toxin (pfam domain PF03496)

Supplementary Table S4: Freshwater phage genomes encoding ribosomal proteins

Supplementary Table S5: Oxidative Stress Related Pfam Domains

Supplementary Table S6: Genome statistics and abundances of 444 actinobacterial bins

Supplementary Table S7: TIGRFAMs markers used for phylogenomic analyses