<<

1 Ecology and evolution of infecting uncultivated SUP05 bacteria as revealed 2 by single-cell- and meta- genomics 3 1 2 2 2 4 Simon Roux , Alyse K. Hawley , Monica Torres Beltran , Melanie Scofield , 5 Patrick Schwientek3, Ramunas Stepanauskas4, Tanja Woyke3, Steven J. Hallam2,5*, 1* 6 and Matthew B. Sullivan 7 8 Affiliations: 9 1Ecology and Evolutionary Biology, University of Arizona, USA 10 2 Department of Microbiology and Immunology, University of British Columbia, Canada 11 3 DOE Joint Genome Institute, Walnut Creek, California, USA 12 4 Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, USA 13 5 Graduate Program in Bioinformatics, University of British Columbia, Canada 14 15 *To whom correspondence should be addressed: 16 Ecology and Evolutionary Biology Department and Molecular and Cellular Biology 17 Department, Life Sciences South (Office LSS246, Lab LSS203 + LSS207). 1007 East 18 Lowell Street, Tucson AZ 85721, USA. Office: (520) 626-6297 e-mail: 19 [email protected] 20 21 University of British Columbia, Life Sciences Institute for Personalized Medicine, 22 Department of Microbiology & Immunology. 2552-2350 Health Sciences Mall, 23 Vancouver, British Columbia, V6T1Z3, Canada. Office: (604) 827-3420 FAX: (604) 822-

24 6041 e-mail: [email protected] 25 26 The authors declare no competing interests. 27 Summary: Single-cell amplified genome sequencing uncovers -host interactions in 28 uncultivated SUP05 bacteria with relevance to coupled biogeochemical cycling in marine 29 oxygen minimum zones. 30 Keywords: SUP05, , viruses, single cell genomics, oxygen minimum 31 zones, viral dark matter

1 32 Abstract: 33 Viruses modulate microbial communities and alter ecosystem functions. However, due to 34 cultivation bottlenecks specific virus-host interaction dynamics remain cryptic. Here we 35 examined 127 single-cell amplified genomes (SAGs) from uncultivated SUP05 bacteria 36 isolated from a marine oxygen minimum zone (OMZ) to identify 69 viral contigs 37 representing five new genera within dsDNA and ssDNA . 38 Infection frequencies suggest that ~1/3 of SUP05 bacteria are viral-infected, with higher 39 infection frequency where oxygen-deficiency was most severe. Observed Microviridae 40 clonality suggests recovery of bloom-terminating viruses, while systematic co-infection 41 between dsDNA and ssDNA viruses posits previously unrecognized cooperation modes. 42 Analyses of 186 microbial and viral metagenomes revealed that SUP05 viruses persisted 43 for years, but remained endemic to the OMZ. Finally, identification of virus-encoded 44 dissimilatory sulfite reductase suggests SUP05 viruses reprogram their host’s energy 45 metabolism. Together these results demonstrate closely coupled SUP05 virus-host co- 46 evolutionary dynamics with potential to modulate biogeochemical cycling in climate- 47 critical and expanding OMZs. 48 49

2 50 Main text: 51 Introduction 52 Microbial communities are critical drivers of the nutrient and energy flow in natural and 53 engineered ecosystems (Falkowski, Fenchel, and Delong 2008). In the last two decades, it 54 has progressively become clear that viral mediated predation, transfer and metabolic 55 reprogramming modulate the structure, function and evolutionary trajectory of these 56 microbial communities (Suttle 2007; Abedon 2009; Rodriguez-Valera et al. 2009; 57 Hurwitz, Hallam, and Sullivan 2013). At the same time, the vast majority of microbes and 58 viruses remain uncultivated and their diversity is extensive, so that model system-based

59 measurements rarely reflect the network properties of natural microbial communities. 60 While culture-independent methods, such as and metatranscriptomics, can 61 illuminate latent and expressed metabolic potential of microbial (Frias-Lopez et al. 2008; 62 Venter et al. 2004; Stewart, Ulloa, and DeLong 2012; DeLong et al. 2006) or viral 63 communities (Angly et al. 2006; Hurwitz, Hallam, and Sullivan 2013; Mizuno et al. 64 2013), interactions between community members remain difficult to resolve. 65 Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) containing 66 short stretches of viral or plasmid DNA separated between repeat sequences can provide 67 a record of past infections in uncultivated microbial communities. Together with 68 associated Cas (CRISPR-associated) , CRISPRs function as an adaptive immune 69 system in prokaryotes with the potential to suppress viral replication or horizontal gene 70 transfer (Sorek, Kunin, and Hugenholtz 2008). However, an application of CRISPR- 71 based virus-host association to both uncultivated hosts and viruses require the assembly 72 of complete or near-complete genomes of both entities, limiting their utility to lower 73 diversity ecosystems (Andersson and Banfield 2008; Anderson et al. 2011). Alternatively, 74 single-cell amplified genome (SAG) sequencing is emerging as a more direct method to 75 chart metabolic potential of individual cells within microbial communities with special

76 emphasis on candidate phyla that have no cultured representatives, (Rinke et al. 2013; 77 Swan et al. 2013; Yoon et al. 2011; Martinez-Garcia et al. 2012). Here we combine 78 metagenomic and single-cell genomic sequencing to explore virus-host interactions 79 within uncultivated bacteria inhabiting a marine oxygen minimum zone (OMZ). 80 Marine OMZs, defined by dissolved oxygen concentrations <20 μmol kg-1, are

3 81 oceanographic features that arise from elevated demand for respiratory oxygen in poorly 82 ventilated, highly stratified waters. OMZs are crucial for the biogeochemical cycles in 83 global ocean, as they represent hotspots for microbial-driven carbon, nitrogen and sulfur 84 transformations (Ulloa et al. 2012; Wright, Konwar, and Hallam 2012), and play a 85 disproportionate role in nitrogen loss processes and greenhouse gas cycling (Lam et al. 86 2009; Ward et al. 2009). Moreover, these zones are expanding due to changing ocean 87 water temperatures and circulation patterns (Stramma et al. 2008; Whitney, Freeland, and 88 Robert 2007). Given these changing physical and chemical conditions and the importance 89 of OMZs to ocean-atmosphere functioning, a clearer understanding of biological 90 responses is critical to develop a much-needed predictive modeling capacity for OMZs. 91 In OMZs, microbial communities drive matter and energy transformations, and 92 are typically dominated by sulfur-oxidizing Gammaproteobacteria related to the 93 chemoautotrophic gill symbionts of deep-sea clams and mussels (Stewart, Ulloa, and 94 DeLong 2012; Wright, Konwar, and Hallam 2012). Phylogenetic analysis indicates that 95 these bacteria are comprised of two primary lineages; one consisting of sequences 96 affiliated with SUP05 and clam and mussel symbionts, and the other consisting of 97 sequences affiliated with Arctic96BD-19 (Wright, Konwar, and Hallam 2012; Walsh et al. 98 2009). Both groups partition along gradients of oxygen and sulfide, with Arctic96BD-19 99 most prevalent in oxygenated waters and SUP05 most prevalent in anoxic or 100 anoxic/sulfidic waters (Wright, Konwar, and Hallam 2012). Niche partitioning between 101 SUP05 and Arctic96BD-19 is driven by complementary modes of carbon and energy 102 metabolism that harness alternative terminal electron acceptors. While both Arctic96BD- 103 19 and SUP05 use reduced sulfur compounds as electron donors to drive inorganic 104 carbon fixation, SUP05 manifests a more versatile energy metabolism linking carbon, 105 nitrogen and sulfur cycling within OMZ and hydrothermal vent waters (Stewart, Ulloa, 106 and DeLong 2012; Anantharaman et al. 2013; Canfield et al. 2010; Swan et al. 2011; 107 Mattes et al. 2013; Zaikova et al. 2010; Hawley et al. 2014; Anantharaman et al. 2014) . 108 Ocean viruses, predominantly investigated in the sunlit or photic zone, are 109 abundant, dynamic, and diverse (Suttle 2005), with growing evidence for direct roles in 110 metabolic reprogramming of microbial photosynthesis, central carbon metabolism and 111 sulfur cycling (Mann et al. 2003; Lindell et al. 2005; Clokie et al. 2006; Breitbart et al.

4 112 2007; Dammeyer et al 2008; Sharon et al. 2009; Sharon et al. 2011; Thompson et al. 113 2011; Hurwitz, Hallam, and Sullivan 2013). Preliminary studies suggest that similar 114 patterns are emerging in OMZ waters. In the Eastern Tropical South Pacific, a 115 metagenomic survey revealed specific viral populations endemic to OMZ waters 116 (Cassman et al. 2012). Consistent with most viral metagenome surveys, approximately 117 3% of sequences were affiliated with functionally annotated genes in public databases. 118 From a nitrogen and sulfur cycling perspective, viromes from the oxycline contained 119 genes encoding components of nitric oxide synthase, nitrate and nitrite ammonification 120 and ammonia assimilation pathways as well as inorganic sulfur assimilation (Cassman et 121 al. 2012). In anoxic waters, viromes contained genes encoding components of 122 denitrification, nitrate and nitrite ammonification and ammonia assimilation pathways as 123 well as sulfate reduction, thioredoxin-disulfide reductase and inorganic sulfur 124 assimilation (Cassman et al. 2012). More recently, metagenomic analyses of 125 hydrothermal vent plume microbial communities dominated by SUP05 bacteria enabled 126 phage genome assemblies presumed to infect SUP05 (Anantharaman et al. 2014). 127 Consistent with viruses encoding auxiliary metabolic genes (AMGs, Breitbart et al. 2007) 128 enabling viral reprogramming of microbial metabolic pathways (Lindell et al. 2005; 129 Thompson et al. 2011), putative SUP05 phage contained genes encoding reverse 130 dissimilatory sulfite reductase A and C positing a role for viruses in modulating the 131 marine sulfur cycle (Anantharaman et al. 2014). 132 Given that SUP05 and Arctic96BD-19 play key roles in OMZ ecology and 133 biogeochemistry, we designed an approach to target SUP05-associated viruses in a model 134 OMZ ecosystem, Saanich Inlet a seasonally anoxic fjord on the coast of Vancouver 135 Island, British Columbia, Canada. We obtained a SUP05 single-cell genomic dataset 136 spanning defined redox gradients in the Saanich Inlet water column, identified SUP05- 137 associated viruses infecting SAGs, and used resulting virus-host pairs as recruitment 138 platforms to estimate viral diversity, activity, dispersion, and potential impact on SUP05 139 population dynamics and metabolic capacity. The resulting datasets open an 140 unprecedented window on uncultivated virus-host dynamics in OMZs and provide an 141 analytical approach extensible to other natural or engineered ecosystems. 142

5 143 Results 144 Generating a SUP05 bacterial genomic dataset 145 SUP05 SAGs were generated at the Bigelow Laboratory for Ocean Sciences 146 (http://scgc.bigelow.org, (Stepanauskas and Sieracki 2007; Swan et al. 2013)). Briefly, 147 fluorescence-activated cell sorting was used to separate individual cells <10 µm in 148 diameter from 100, 150 and 185 meters water depth, spanning water column gradients of 149 oxygen and sulfide in Saanich Inlet (Figure 1 - figure supplement 1). Water column redox 150 conditions were typical for stratified summer months when SUP05 populations bloom in 151 deep basin waters. Approximately 300 anonymously sorted cells (discriminated solely 152 using fluorescence and size for sorting) per depth interval were subjected to multiple 153 displacement amplification (MDA) and the taxonomic identity of single amplified 154 genomes (SAGs) was determined by directly sequencing bacterial small subunit 155 ribosomal RNA (SSU rRNA) gene amplicons. SAGs affiliated with SUP05 (n=127) and 156 Arctic96BD-19 (n=9) populations were subsequently whole genome shotgun sequenced 157 on the Illumina HiSeq platform. Most (113/127) SUP05 SAGs fell into two major 158 operational taxonomic units (OTUs) or subclades, based on SSU rRNA gene sequence 159 clustering at the 97% identity threshold – SUP05_01 (n = 65) and SUP05_03 (n = 48) 160 (Figure 1 - figure supplement 2). SUP05_01 SAGs were recovered at 100, 150 and 185 161 meters, peaking at 150 meters, while SUP05_03 SAGs were more evenly distributed 162 between 150 and 185 meters. A number of SUP05 SAG assemblies contained viral 163 contigs consistent with sampling infected cells across the redoxcline. 164 165 New SUP05-associated phage genomes 166 Fifty bona fide viral contigs (Supplementary file 1, see Material & Methods) were 167 identified in 30 SUP05 SAGs using viral marker genes, hereafter termed “hallmark 168 genes” (Abrescia et al. 2012). SUP05 viral contigs were affiliated with known families of 169 Caudovirales (dsDNA) and Microviridae (ssDNA) bacteriophages. The presence of 170 Caudovirales is not surprising as they are commonly observed in oceanic samples 171 (Williamson et al. 2012; Hurwitz and Sullivan 2013), including the ETSP OMZ and 172 SUP05-dominated hydrothermal vent plumes (Cassman et al. 2012; Anantharaman et al. 173 2014). Microviridae, however, are usually observed in surface seawater or deep-sea

6 174 sediments and have not been previously associated with OMZs (Angly et al. 2009; 175 Tucker et al. 2011; Labonté and Suttle 2013b; Yoshida et al. 2013). Given the SUP05 176 lineages described above, we note that viral contigs recovered from SUP05_01 SAGs 177 were exclusively Caudovirales, whereas SUP05_03 SAGs contained both Caudovirales 178 and Microviridae. Using non-reference-based methods, an additional 19 contigs were 179 identified as putative viral sequences. These sequences did not encode hallmark genes, 180 but displayed genomic characteristics consistent with novel viral genomes including a 181 low ratio of characterized genes (i.e. most genes predicted on these contigs do not match 182 any sequences from the reference databases), a high number of short genes, and a low 183 number of strand changes between two consecutive genes (i.e. gene sets tend to be coded 184 on the same strand; see Material & Methods, Figure 1 - figure supplement 3). In total, 69 185 viral contigs encoding 898 predicted open reading frames over 529 kb were recovered 186 from SUP05 SAGs representing current viral infections. 187 188 Viral infection of SUP05 cells in 189 Forty-two out of 127 SUP05 SAGs sequenced contained one or more viral contigs 190 (Figure 1 – source data file), indicating that ~1/3 of SUP05 cells inhabiting the Saanich 191 Inlet water column were infected by viruses. Such lineage-specific infection frequency 192 determination is unprecedented in uncultivated or cultivated host cells, and is largely 193 consistent with community-averaged estimates for marine bacteria (Suttle 2007). As with 194 all the other means to estimate infection frequency and viral-induced microbial mortality 195 (Brum et al. 2014), there are caveats to these numbers including underestimation linked 196 to incomplete identification of viruses in the SAG datasets. Such an underestimation 197 could result from (i) lack of reference genomes, (ii) incomplete SAG genomes, (iii) early 198 infections not being detected prior to genome insertion and replication, or (iv) late 199 infections not being detected due to phage-directed degradation of host DNA preventing 200 16S identification during the SAG selection process. Since the infection frequency 201 estimates are largely consistent with community-based measurements, we expect that 202 these biases are small. 203 SUP05 viral infections showed strong depth partitioning along defined gradients 204 of oxygen and sulfide (Figure 1). At 100 meters a single SUP05 SAG (of 12) displayed

7 205 current viral infection, while the percentage of infected SUP05 SAGs increased to 28% 206 and 47% at 150 and 185 meters (Figure 1 – source data file). Consistent with previous 207 studies evaluating community-averaged lytic viral activity (Weinbauer, Brettar, and Ho 208 2003), cell-specific lytic viral infection estimates peaked where SUP05 is typically most 209 abundant and metabolically active in the Saanich Inlet water column (Hawley et al. 210 2014). Additionally, remnants of past infections were detected in SUP05 and 211 Arctic96BD-19 SAGs, including 13 putative and 25 CRISPR sequences 212 (Supplementary file 2). None of these ‘past infection’ sequences match the detected 213 ‘current infection’ viral contigs. 214 215 Patterns of co-infection between SUP05 ssDNA and dsDNA viruses 216 To better understand the ecological and evolutionary forces shaping SUP05 217 virus-host interactions in Saanich Inlet we focused on 12 viral reference contigs including 218 4 Caudovirales contigs longer than 15 kb (from 3 and 1 ) and 8 219 complete genomes of Microviridae. Genome organization (Figure 2) and phylogenetic 220 analysis (Figure 2 – figure supplement 1) revealed that all four Caudovirales contigs 221 represent new genera (share <40% of their genes, Lavigne et al. 2008, Figure 2 – source 222 data file) even when considering the viruses recently assembled from SUP05-dominated 223 microbial metagenomes (Anantharaman et al. 2014). All 8 Microviridae contigs shared 224 100% nucleotide identity, despite their recovery from different SUP05_03 SAGs 225 (Supplementary file 3), and represent a new genus within the subfamily 226 (Figure 2 – figure supplement 2 and 3). These identical Microviridae genomes could 227 represent a lineage-specific viral bloom,targeting the SUP05_03 subclade. SUP05 228 infection by Gokushovirinae extends the known host range from small parasitic bacteria 229 (namely Chlamydia, Bdellovibrio and ) to include free-living 230 Gammaproteobacteria, the first marine host identified for this subfamily of viruses 231 (Labonté and Suttle 2013a). 232 Curiously, most (11 of 12) Microviridae-infected SUP05_03 SAGs also contained 233 Podoviridae contigs (Supplementary file 4). While previously postulated based on 234 comparative genomics, lineage-specific co-infection between the ssDNA Microviridae 235 and dsDNA phages has not been observed (Roux et al. 2012). Such highly correlated

8 236 co-occurrence in SUP05 SAGs (Fisher exact test p-value = 2e-15) is consistent with non- 237 random co-infection. This could be linked to cooperative infection modes between 238 viruses or opportunistic infection of cells already infected by the other virus type, as seen 239 in the case of satellite viruses and (Murant and Mayo 1982; La Scola et al. 240 2008). It is worth noting that the exact nature of interaction between satellite and helper 241 viruses, or between virophages and their associated viruses, is still a matter of debate, and 242 this association between two phages previously thought to be autonomous and 243 independent (Microviridae and Caudovirales) presents a new variation on this theme 244 (Fisher 2012; Desnues and Raoult 2012; Krupovic and Cvirkaite-Krupovic 2012). 245 Because the modular theory of phage evolution postulates that phage genomes consist of 246 collections of gene modules, exchanged through proximity-enhanced recombination 247 (Hendrix et al. 2000) such co-infection of a single host by ssDNA and dsDNA phages 248 provides evidence for how such chimeric ssDNA-dsDNA viral genomes may come into 249 existence (Diemer and Stedman, 2012; Roux et al. 2013). 250 251 SUP05 viruses endemic to Saanich Inlet are stable over time 252 To extend our analysis of SUP05 virus-host interactions beyond individual SAGs, 253 we used the 12 reference viral contigs (i.e. the 4 Caudovirales and 8 Microviridae) as 254 platforms to recruit three years of Saanich Inlet microbial metagenome sequences 255 spanning the redoxcline (Figure 3 and Supplementary file 5). SUP05 Microviridae 256 contigs were inconsistently detected due to known methodological biases associated with 257 linker-amplified metagenome library construction (see Material & Methods), so we 258 focused on dsDNA viral contigs. All 4 SUP05 Caudovirales contigs were absent from 259 surface waters, but repeatedly detected within and below the oxycline, consistent with 260 SUP05 water column disposition (Figure 3A – figure supplement 1). Within the 261 Caudovirales, recruited microbial metagenome sequences were more similar to the 262 reference genome for Podoviridae contigs C22_13 and K04_0 (96% average amino-acid 263 identity), than for Siphoviridae G10_6 and Podoviridae M8F6_0 (92% average amino- 264 acid identity, Figure 2B). Beyond sequence variation, metagenome coverage in one 265 region of M8F6_0 (3 hypothetical open reading frames) was absent in 2009, minimal in 266 2010, and as abundant as surrounding genomic regions in 2011 (Figure 3C), suggesting a

9 267 selective sweep within this population. Contig-derived abundances of SUP05- 268 Caudovirales were in sync with host distributions, but at virus-to-host ratios of 0.01 to 269 0.3 (Figure 4). While tightly choreographed virus-host abundance dynamics parallels that 270 of cultured virus-host systems (e.g., cyanophages - (Waterbury and Valois 1993)), the 271 systematically lower (orders of magnitude lower than typical community measurements) 272 virus-to-host ratios observed here indicates that a greater diversity of SUP05 viruses 273 remains to be uncovered in the Saanich Inlet water column. 274 To determine SUP05 viral biogeography, we interrogated 74 viromes and 112 275 microbial metagenomes sourced from Pacific Ocean waters (Supplementary file 5). 276 Despite consistently recovering SUP05 viral sequences in Saanich Inlet, these sequences 277 were extremely uncommon in other locales (22 instances out of 803 possibilities; Figure 278 3 – figure supplement 2 and 3), even when proximal to Saanich Inlet (e.g. northeastern 279 subarctic Pacific (NESAP) coastal and open ocean waters along the LineP transect) or 280 when sourced from similar water column conditions (e.g. Eastern Tropical South Pacific 281 OMZ, ETSP). Of the 22 SUP05-related viruses detected, all but two were recovered 282 below 500 meters in NESAP OMZ samples, in which SUP05 bacteria were also detected 283 with similar abundance as in Saanich Inlet samples. The remaining two detections 284 derived from an ETSP OMZ virome and a hydrothermal vent plume microbial 285 metagenome from the Guaymas basin. Taken together, these observations point to 286 endemic SUP05 viral populations with the potential to modulate SUP05-mediated 287 biogeochemical cycling via lysis or metabolic reprogramming. 288 289 Potential impact of SUP05 phages on sulfur metabolism 290 Recent studies have highlighted the role of viruses in metabolic reprogramming, 291 from global photosynthesis (Mann et al. 2003; Lindell et al. 2005; Clokie et al. 2006; 292 Sullivan et al. 2006; Sharon et al. 2009) to central carbon metabolism (Sharon et al. 2011; 293 Thompson et al. 2011; Hurwitz, Hallam, and Sullivan 2013) via Auxiliary Metabolism 294 Genes (AMGs). Additionally, viruses assembled from microbial metagenomes from 295 SUP05 dominated hydrothermal vent samples contain sulfur cycling genes 296 (Anantharaman et al. 2014). Therefore, we looked for AMGs encoded on SUP05 viral 297 contigs in the Saanich Inlet water column.

10 298 Four putative AMGs were detected in 12 of the 69 viral contigs, predominantly 299 from SUP05_01 SAGs recovered from 150 meters (Supplementary file 6). One AMG 300 identified on a bona fide viral contig, phosphate-related phoH, is common among marine 301 phages, but remains functionally uncharacterized (Sullivan et al. 2010; Goldsmith et al. 302 2011). The remaining 3 AMGs including 2-oxoglutarate (2OG) and Fe(II)-dependent 303 oxygenase superfamily (2OG-FeII oxygenase), tripartite tricarboxylate transporter (tctA, 304 protein domain hit only) and dissimilatory sulfite reductase subunit C (dsrC) were 305 encoded on contigs identified by non-reference-based methods. In marine cyanophages, 306 2OG-FeII oxygenase-encoding genes are common where they are thought to modulate 307 host nitrogen metabolism during infection (Sullivan et al. 2010). However, the precise 308 metabolic role of tctA and dsrC-like genes during viral infection remains unknown. 309 Given that dsrC was found on 7 SUP05_01 viral contigs (Supplementary file 7) 310 and DsrC is critical in SUP05 energy metabolism (Walsh et al. 2009) we focused on this 311 gene. Although dsrC genes were only present on contigs identified by non-reference- 312 based methods they were closely related to dsrC-like genes encoded on the hydrothermal 313 vent plume phages (Anantharaman et al. 2014). Indeed, conceptually translated sequence 314 alignment of these viral dsrC genes including putative viral and bacterial genes from 315 microbial metagenomic datasets indicate that the Saanich Inlet 'viral' sequences belong to 316 one dsrC subgroup (dsrC_1 according to the classification of Anantharaman and 317 colleagues (Anantharaman et al. 2014)). In addition to high sequence similarity viral dsrC 318 genes from SUP05 SAGs co-localized on contigs with viral homologs (e.g. 2OG-FeII 319 oxygenase, chaperonin), and occurred in genomic context that was completely different 320 to the conserved and well-characterized dsrC region in SUP05 genomes (Figure 5A and 321 5B). 322 The dsrC_1 group encodes a protein retaining 15 conserved residues across 323 known DsrC subunits. However, the second C-terminal cysteine and a 7-8 residue 324 insertion thought to be required for DsrC function based on structural analysis of 325 Desulfovibrio vulgaris and Archaeoglobus fulgidus proteins are missing from the viral 326 protein (Figure 5 – figure supplement 1 (Mander et al. 2005, Oliveira et al. 2008)). These 327 differences suggest that either the viral encoded dsrC is non-functional or has a modified 328 function. Given that genes shared between different viral genomes rarely represent

11 329 nonfunctional genes it is likely that viral encoded dsrC plays a biological role in SUP05. 330 Indeed, there is precedent for divergent viral AMGs serving as modified functional 331 counterparts to host encoded homologues. Specifically, a highly divergent viral ‘pebA’ 332 (Sullivan et al. 2005) was experimentally demonstrated to perform the functions of two 333 host enzymes’ (pebA and pebB) as a bifunctional enzyme, phycoerythrobilin synthetase 334 (pebS) (Dammeyer et al 2008). 335 Given that viral dsrC genes were abundant in the Saanich Inlet water column over 336 a three-year time interval (Figure 5C) with peaked recovery consistent with blooming 337 SUP05 populations (Figure 5 – figure supplement 2) (Hawley et al. 2014), we posit that 338 this viral gene is functional in SUP05 sulfur cycling. Future functional characterization of 339 viral DsrC is needed to constrain viral roles in modulating SUP05 electron transfer 340 reactions during viral infection in the environment. 341 342 Conclusion 343 While new methods and model systems for identifying virus-host interactions 344 continue to emerge (Tadmor et al. 2011; Allers et al. 2013; Mizuno et al. 2013; Deng et 345 al. 2014), viral ecology remains predominantly community focused in nature. This is

346 because most hosts are uncultivated (Rappé and Giovannoni 2003), and culture-

347 independent viral metagenomes are dominated by ‘unknown’ sequences (Hurwitz and 348 Sullivan 2013), which inhibits developing a mechanism- and population-based viral 349 ecology. Here we use single-cell genomics to directly link SUP05 viruses and their hosts 350 across defined gradients of oxygen and sulfide over a three-year time interval in a model 351 OMZ ecosystem. This spatiotemporal resolution revealed endemic patterns of co- 352 infection between ssDNA and dsDNA viruses and the occurrence of AMGs with the 353 potential to modulate electron transfer reactions essential to SUP05 energy metabolism. 354 Together these findings offer novel perspectives on the ecology and evolution of viruses 355 infecting uncultivated bacterial populations. While the capacity to formulate such 356 linkages between cultured virus-host systems in nature is recognized (e.g., cyanophages 357 and pelagiphages), the use of single-cell genomics to explore such linkages in 358 uncultivated microbial communities represents a watershed moment in illuminating viral 359 dark matter and its role in modulating microbial interaction networks in natural and

12 360 engineered ecosystems. 361 362 Materials & Methods: 363 364 Sample collection, sequencing and assembly 365 Samples were collected in Saanich Inlet on Vancouver Island, British Columbia, on 366 the 09th of August 2011. Sample collection and biochemical measurements were 367 performed as previously described (Zaikova et al. 2010). Water column redox conditions 368 were typical for stratified summer months when SUP05 populations bloom in deep basin 369 waters. Individual cells <10 µm in diameter from 100, 150 and 185 meter depth samples 370 were subjected to fluorescence-activated cell sorting, multiple displacement amplification 371 (MDA) and taxonomic identification at the Bigelow Laboratory Single Cell Genomics 372 Center (SCGC; http://scgc.bigelow.org), following previously described procedures 373 (Stepanauskas and Sieracki 2007; Swan et al. 2013). A total of 315 single amplified 374 genomes (SAGs) per sample were subjected to multiple displacement amplification 375 (MDA) and the taxonomic identity of single amplified genomes (SAG) was determined 376 by directly sequencing bacterial small subunit ribosomal RNA (SSU rRNA) gene 377 amplicons. A total of 136 SAGs affiliated with SUP05 or Arctic96BD-19 were selected 378 for genome sequencing. Between 1-3 ug of MDA product was sent to Canada’s Michael 379 Smith Genome Sciences Center (Vancouver, BC) to create shotgun libraries. Briefly, the 380 DNA was sheared to 350-450 bp fragments using a Covaris E210 and purified using 381 AMPure XP Beads according to the manufacturer's instructions. The sheared DNA was 382 end-repaired and A-tailed according to the Illumina standard PE protocol and purified 383 again using AMPure XP Beads, generating paired-end 100-bp reads. Indexed libraries 384 were amplified by PCR for 6 cycles, gel-purified, pooled (11-12 samples per lane) and 385 QC assessed on a Bioanalyzer DNA Series II High Sensitivity chip (Agilent), and then 386 sequenced using an Illumina HiSeq2000 sequencer. 387 All raw Illumina sequence data was passed through DUK, a filtering program 388 developed at JGI, which removes known Illumina sequencing and library preparation 389 artifacts (Mingkun, Copeland, and Han). Artifact filtered sequence data was then screened 390 and trimmed according to the k–mers present in the dataset (Mingkun). High–depth k–

13 391 mers, presumably derived from MDA amplification bias, cause problems in the assembly, 392 especially if the k–mer depth varies in orders of magnitude for different regions of the 393 genome. Reads with high k–mer coverage (>30X average k–mer depth) were normalized 394 to an average depth of 30X. Reads with an average kmer depth of less than 2X were 395 removed. Following steps were then performed for assembly: i) normalized Illumina 396 reads were assembled using IDBA–UD version 1.0.9 (Peng et al. 2012), ii) 1–3 kb 397 simulated paired end reads were created from IDBA–UD contigs using wgsim 398 (https://github.com/lh3/wgsim), iii) normalized Illumina reads were assembled with 399 simulated read pairs using Allpaths–LG (version r42328) (Gnerre et al. 2011), iv) 400 Parameters for assembly steps were: i) IDBA–UD (––no local) ii) wgsim (–e 0 –1 100 –2 401 100 –r 0 –R 0 –X 0) iii) Allpaths–LG (PrepareAllpathsInputs: PHRED 64=1 PLOIDY=1 402 FRAG COVERAGE=125 JUMP COVERAGE=25 LONG JUMP COV=50, 403 RunAllpathsLG: THREADS=8 RUN=std shredpairs TARGETS=standard VAPI WARN 404 ONLY=True OVERWRITE=True MIN CONTIG=2000). 405 406 SAG taxonomic assignment 407 SAG taxonomy was verified using the assembled contigs in two ways using 408 MetaPathways 1.0 (Konwar et al. 2013). First, the assemblies were blasted against the 409 SILVA (v.111) database to confirm the taxonomy based on SSU rRNA. Next, MEGAN5 410 was used to carry out taxonomic binning of all ORFs from the MetaPathways BLAST 411 output using the Lowest Common Ancestor (LCA) approach (Huson et al. 2007). 412 A total of 2711 SSU rRNA sequences previously taxonomically assigned to SUP05 413 and Arctic96BD-19 lineages were aligned and clustered using mothur v.1.27.0 (Schloss et 414 al. 2009) and 20 representative sequences for the most abundant clusters (cutoff = 6) at 415 97% similarity were selected. These representative sequences were used to build the 416 phylogenetic tree differentiating between SUP05 and Arctic96BD-19. Reference SUP05 417 and Arctic96BD-19 sequences from different environments and symbionts, and cluster 418 representative sequences were aligned using the SILVA aligner tool (http://www.arb- 419 silva.de/aligner/) and imported into an in-house ARB database for SUP05. Aligned 420 sequences were exported from ARB into Mesquite for manual alignment refinement. The 421 final phylogenetic tree was inferred from manually refined Mesquite alignment of

14 422 sequences using maximum likelihood implemented in PHYML using a GTR model with 423 estimated values for the α parameter of the Γ distribution and the proportion of invariable 424 sites. The confidence of each node was determined by assembling a consensus tree of 1 425 000 bootstrap replicates. 426 427 Microbial and viral metagenomes 428 The protocols used to generate the POV (Hurwitz and Sullivan 2013), ETSP OMZ 429 viromes (Cassman et al. 2012), ETSP microbial metagenomes and metatranscriptomes 430 (Ganesh et al. 2014; Stewart, Ulloa, and DeLong 2012), and Guaymas basin metagenome 431 (Anantharaman et al. 2013) are described in their respective publications. All these 432 datasets were sequenced with Roche 454 GL FLX Titanium systems, and quality 433 controlled reads were used in the different analysis computed in this study. 434 LineP and Malaspina viral metagenomes (viromes) were obtained from samples 435 collected during LineP (http://www.pac.dfo-mpo.gc.ca/science/oceans/data-donnees/line- 436 p/index-eng.html) and Malaspina (http://scientific.expedicionmalaspina.es/) cruises. 437 Particles were precipitated with Iron-Chloride from 0.2 µm filtrates, and resuspended in 438 EDTA-Mg-Ascorbate buffer (John et al. 2011) before the DNA was extracted using 439 Promega's Wizard Prep kit. Assembly and gene prediction were conducted through the 440 IMG/M ER pipeline (Markowitz et al. 2014). Microbial metagenome samples at Saanich 441 Inlet and along the LineP transect were also collected during LineP cruises 442 (http://www.pac.dfo-mpo.gc.ca/science/oceans/data-donnees/line-p/index-eng.html). 443 Sequencing and assembly of these datasets was conducted at the JGI. A list of the 444 different web servers and accession numbers for these publicly available datasets is 445 displayed in Supplementary file 5. 446 447 Detection of viral contigs in SUP05/Arctic SAG 448 SUP05 SAG contigs were annotated with the Metavir web server (Roux et al. 449 2014). Briefly, ORFs were predicted with MetaGeneAnnotator (Noguchi, Taniguchi, and 450 Itoh 2008), and compared to the RefseqVirus database with BLASTp (Altschul et al. 451 1997). In order to select viral-associated contigs, we looked for viral specific genes i.e. 452 genes associated with the formation of the and encapsidation of the genome

15 453 (designated as “hallmark viral genes”). Thus, we searched for all genes annotated as 454 “virion structure”, “capsid”, “portal”, “tail”, or “terminase”, and selected contigs 455 including at least one of these hallmark genes (Supplementary file 1). Among the 50 viral 456 contigs detected, we highlighted a set of 12 long (>15 kb) or circular contigs as the best 457 references available for SUP05 phages (Supplementary file 1). We then compared the 458 reference sequences retrieved in this first screening round to all the SUP05/Arctic96BD- 459 19 SAG contigs, in order to extract more viral-related sequences (Supplementary file 1). 460 At this step, all contigs with at least 50% of their genes similar to a previously detected 461 SUP05 viral contigs were retained (sequence similarity between predicted genes assessed 462 through BLASTp, thresholds of 0.001 for e-value and 50 for bit score). 463 Alternatively, we compared the SUP05/Arctic96BD-19 SAG contigs to a set of 464 ocean viromes (Supplementary file 5), and looked for every contig which was covered by 465 virome reads (for 454-sequenced viromes) or predicted genes (for HiSeq-sequenced 466 viromes) on at least 3 genes with at least 90% of identity (protein sequences). However, 467 this comparison to viromes only highlighted contigs already identified as viral from the 468 hallmark gene analysis. Finally, we looked for every sequence which could come from a 469 new type of phage, based on two known properties of phage genomes: most of their 470 genes are not similar to anything in the current databases, and they tend to be mostly 471 coded on the same strand (by block, or module) (Akhter, Aziz, and Edwards 2012). We 472 thus looked for all regions in SAG contigs composed of at least 50 % of uncharacterized 473 genes, with at least 80 % of them on the same coding strand. 19 new short viral contigs 474 were highlighted through this detection (Supplementary file 1), which displayed 475 characteristics close to the viral hallmark contigs (Figure 1 - figure supplement 3). 476 A set of regions of putative viral origin within bacterial contigs also stood out. 477 These sequences were manually curated to check if they could indeed be of viral origin, 478 notably by checking if these regions were conserved between closely related bacterial 479 contigs, and 13 putative defective prophages were eventually identified among them. 480 CRISPR regions were detected with the CRISPR recognition tool (Bland et al. 2007). All 481 spacers were extracted, and compared to all SUP05/Arctic96BD-19 SAG contigs with 482 BLASTn . 483

16 484 Annotation of viral contigs 485 The annotations of selected contigs were extracted from the Metavir web server 486 (Roux et al. 2014) and manually curated. Taxonomic affiliations were based on a BLAST 487 comparison to RefseqVirus and NR databases from NCBI, with a bit score threshold of 488 50 and and e-value thrueshold of 0.001. A tBLASTx comparison of larger contigs 489 (>15 kb) against WGS (Whole-Genome shotgun), HTGS (High-Throughput Genomic 490 Shotgun) and GSS (Genomic Survey Sequences) from the NCBI was used to add the 491 most closely related sequence to the analysis, which could have not been included in the 492 NR and Refseq database yet. This screening notably lead to the detection of two contigs 493 from a Gammaproteobacteria single-cell amplified genome (Gamma proteobacterium 494 SCGC AAA160-D02) similar to SUP05 phage genome, and was therefore included in the 495 phylogenetic and genome comparison analysis. The affiliation of SUP05 viruses to new 496 or existing genera was based on the criteria of 40% of genes shared within a genus 497 previously defined for Caudovirales (Lavigne et al. 2008). Map comparison figures were 498 created with Easyfig (M. J. Sullivan, Petty, and Beatson 2011). 499 Functional annotation was achieved through a domain search against the PFAM 500 database (Punta et al. 2012) (hmmscan (Eddy 2011), using a threshold of 0.001 for e- 501 value and 30 for score). When looking for putative AMGs, defective prophages were not 502 considered since these regions are likely to be subject to rearrangement and gene transfer, 503 and the origin of single genes within these regions is uncertain. A set of microbial dsrC 504 sequences were selected as references for SUP05 viral encoded dsrC genes in genomic 505 context (Fig. S11). Briefly, all contigs in SUP05 SAGs longer than 50 kb and containing 506 a DsrC-like gene were compared through BLASTn, and displayed with Easyfig (M. J. 507 Sullivan, Petty, and Beatson 2011). 508 509 Phage multiple alignments and phylogenetic trees 510 Maximum-likelihood trees were computed with PhyML (Guindon and Gascuel 511 2003) using a LG model, a CAT approximation for Gamma parameter, and computing 512 SH-like scores for node supports. All SUP05 contigs affiliated to Podoviridae and 513 including the major capsid protein gene were added in a single tree alongside reference 514 sequences from Autographivirinae and N4-like viruses. The most closely related

17 515 sequences to each SUP05 Podoviridae, as detected from the genome comparison 516 analysis, were also included in the tree. SUP05 Microviridae were included in a 517 phylogenetic tree based on the Major Capsid protein and centered around the 518 Gokushovirinae sub-family, with sequences from Pichovirinae used as outgroup. 519 Gokushovirinae reference sequences were taken from (Roux et al. 2012) and (Labonté 520 and Suttle 2013b). In order include more aquatic sequences, complete Microviridae 521 genomes were assembled from two sets of viromes sampled from a freshwater 522 subtropical reservoir (Tseng et al. 2013) and deep-sea sediments (Yoshida et al. 2013) and 523 annotated as previously described (Roux et al. 2012). Tree figures were drawn with Itol 524 (Letunic and Bork 2007). DsrC-like predicted protein sequences were aligned with 525 Muscle v3.8.31 (Edgar 2004) and the multiple alignment was displayed with Jalview 526 (Waterhouse et al. 2009). 527 528 Recruitment of metagenomic sequences to SUP05 viral genomes 529 A set of oceanic viromes and microbial metagenomes were used for comparison 530 with SUP05 viral genomes (Supplementary file 5). Similarities between SUP05 viral 531 genomes and published viromes were assessed through BLAST comparison, BLASTx for 532 454-sequenced viromes (POV dataset (Hurwitz and Sullivan 2013), ETSP OMZ viromes 533 (Cassman et al. 2012), ETSP microbial metagenomes and metatranscriptomes (Ganesh et 534 al. 2014; Stewart, Ulloa, and DeLong 2012), and Guaymas basin metagenome 535 (Anantharaman et al. 2013) and BLASTp from predicted protein for HiSeq-sequenced 536 viromes (LineP and Malaspina viromes, Saanich Inlet and LineP microbial 537 metagenomes), with similar thresholds of 0.001 for e-value and 50 for bit score. Each 538 metagenome – viral genome association was classified based on the number of viral 539 genes detected and the amino-acid percentage identity of the BLAST hits associated: 540 when more than 75% of the genes were detected at more than 80% identity in the 541 metagenome, the viral genome was thought to be in the sample. The same ratio of genes 542 detected at lower percentage (60 to 80%) indicates the presence of a related but distinct 543 virus. We considered that less than 75% of the genes detected meant that this virus was 544 likely absent from the sample. The results of Microviridae detection with the HiSeq 545 Illumina datasets have to be carefully considered, as the linker amplification used in the

18 546 preparation of samples for HiSeq Illumina sequencing displays a strong bias against 547 ssDNA templates such as Microviridae genomes (Kim and Bae 2011). Hence, if the 548 detection of SUP05 Microviridae in HiSeq Illumina datasets undoubtedly testifies for the 549 presence of these viruses in the samples, an absence of detection is not a strong indicator 550 of their absence in the sample. 551 In order to detect the host of SUP05 viruses in the same datasets, a mapping of all 552 sequences from each metagenome to non-viral SAG contigs was computed with mummer 553 (Delcher, Salzberg, and Phillippy 2003) (minimum cluster length of 100, maximum gap 554 between two matches in a cluster of 500). The Saanich Inlet SUP05 bacteria is considered 555 present in the metagenome when more than 75% of genes are covered by metagenomic 556 sequences with average nucleotide identity above 95%.Viral-encoded dsrC was computed 557 with a threshold of 95% on average nucleotide identity, as no similarity beyond 80% 558 average nucleotide identity was detected between viral and microbial homologues, 559 whether from public database or from the SUP05 SAG microbial contigs. All 560 recruitment and coverage plots were drawn with the ggplot2 module of R software 561 (Wickham 2009). 562 563 Abundance and variability of SUP05 viral and microbial genomes 564 Assessment of variability in the populations associated with each SUP05 virus was 565 based on a BLASTp between all sequences from Saanich Inlet metagenomes recruited by 566 each SUP05 viral contig (thresholds of 50 for bit score, 0.001 for e-value,, and 80% for 567 amino-acid identity). The relative abundance of SUP05 viral and microbial genomes was 568 assessed from the recruitment of Saanich Inlet metagenomic reads to each viral contig 569 and set of microbial contigs (all contigs greater than 5 kb and not identified as viral) for 570 each “reference” SAG (i.e. the 4 SAG in which a SUP05 reference Caudovirales was 571 detected: AB-750C22AB-904 for C22_13, AB-750K04AB-904 for K04_0, 572 AB-751_G10AB-905 for G10_6, and AB-755_M08F06 for M8F6_0, Figure 2 - source 573 data file). For each metagenome, a normalized ratio of nucleotides recruited by each 574 contig or set of contigs was calculated as the number of bases recruited (sum of the length 575 of recruited reads) divided by the total number of bases in the (set of) contig(s) and the 576 total number of bases in the metagenome. The ratio of viral genomes to host genomes

19 577 was then calculated for each metagenome as the relative abundance of viral contig 578 divided by the relative abundance of bacterial contig from the same SAG. The plots of 579 genetic variability and relative abundance distributions were generated with the ggplot2 580 module of R software (Wickham 2009). The perl scripts used in the different part of the 581 bioinformatics analyses are available online at 582 http://tmpl.arizona.edu/dokuwiki/doku.php?id=bioinformatics:scripts:sup05 and as 583 “Source code”. 584 585 Acknowledgements: 586 We thank the crew aboard the MSV John Strickland for logistical and sampling support in 587 Saanich Inlet and Melanie Scofield, Jody Wright, Evan Durno, and Elena Zaikova in the 588 Hallam lab for technical assistance. We also thank the Joint Genome Institute, including 589 IMG and GOLD teams and Sussanah Tringe, Stephanie Malfatti and Tijana Glavina del 590 Rio for technical and project management assistance. This work was performed under the 591 auspices of the U.S. Department of Energy Joint Genome Institute supported by the 592 Office of Science of the U.S. Department of Energy under Contract No. DE-AC02- 593 05CH11231; the G. Unger Vetlesen and Ambrose Monell Foundations and the Tula 594 Foundation funded Centre for Microbial Diversity and Evolution, Natural Sciences and 595 Engineering Research Council (NSERC) of Canada, Canada Foundation for Innovation 596 (CFI), and the Canadian Institute for Advanced Research (CIFAR) through grants 597 awarded to S.J.H; and BIO5, NSF (OCE-0961947) and the Gordon and Betty Moore 598 Foundation (#3790) through grants awarded to M.B.S. Single cell genomics 599 instrumentation at Bigelow Laboratory for Ocean Sciences was supported by NSF grants 600 OCE-821374 and OCE-1019242 to R.S. and by the State of Maine Technology Institute. 601 The single cell genome sequences and annotations can be accessed via IMG 602 (img.jgi.doe.gov, SAG Ids are listed in Supplementary file 4). Viral contigs and defective 603 prophages identified in the SUP05 SAG are available on the Metavir webserver 604 (http://metavir-meb.univ-bpclermont.fr/), as virome “SUP05_viral_sequences” in project 605 “SUP05_SAGs”. The web servers hosting viral and microbial metagenome sequences 606 used here are listed in Supplementary file 5.

20 607 References: 608 Abedon, Stephen T. 2009. Phage Evolution and Ecology. Advances in Applied 609 Microbiology. 1st ed. Vol. 67. Elsevier Inc. doi:10.1016/S0065-2164(08)01001-0. 610 Abrescia, Nicola G A, Dennis H Bamford, Jonathan M Grimes, and David I Stuart. 2012. 611 “Structure Unifies the Viral Universe.” Annual Review of Biochemistry 81 (January): 612 795–822. doi:10.1146/annurev-biochem-060910-095130. 613 Akhter, Sajia, Ramy K. Aziz, and Robert A. Edwards. 2012. “PhiSpy: A Novel Algorithm 614 for Finding Prophages in Bacterial Genomes That Combines Similarity- and 615 Composition-Based Strategies.” Nucleic Acids Research 40 (16): 1–13. 616 Allers, Elke, Cristina Moraru, Melissa B. Duhaime, Erica Beneze, Natalie Solonenko, 617 Jimena Barrero Canosa, Rudolf Amann, and Matthew B. Sullivan. 2013. “Single- 618 Cell and Population Level Viral Infection Dynamics Revealed by phageFISH, a 619 Method to Visualize Intracellular and Free Viruses.” Environmental Microbiology 620 (January). Doi:10.1111/1462-2920.12100. 621 Altschul, S F, T L Madden, a a Schäffer, J Zhang, Z Zhang, W Miller, and D J Lipman. 622 1997. “Gapped BLAST and PSI-BLAST: A New Generation of Protein Database 623 Search Programs.” Nucleic Acids Research 25 (17) (September): 3389–402. 624 Anantharaman, K, John A Breier, Cody S Sheik, and Gregory J Dick. 2013. “Evidence 625 for Hydrogen Oxidation and Metabolic Plasticity in Widespread Deep-Sea Sulfur- 626 Oxidizing Bacteria.” Proceedings of the National Academy of Sciences of the United 627 States of America 110 (1): 330–335. 628 Anantharaman, K, Melissa B. Duhaime, John A. Breier, Kathleen Wendt, Brandy M. 629 Toner, and Gregory J Dick. 2014. “Sulfur Oxidation Genes in Diverse Deep-Sea 630 Viruses.” Science. doi:10.3354/meps145269. 631 Anderson, Rika E, William J Brazelton, and John a Baross. 2011. “Using CRISPRs as a 632 Metagenomic Tool to Identify Microbial Hosts of a Diffuse Flow Hydrothermal Vent 633 Viral Assemblage.” FEMS Microbiology Ecology 77 (1) (July): 120–33. 634 doi:10.1111/j.1574-6941.2011.01090.x. 635 Andersson, Anders F, and Jillian F Banfield. 2008. “Virus Population Dynamics and 636 Acquired Virus Resistance in Natural Microbial Communities.” Science 320 (5879) 637 (May 23): 1047–50. doi:10.1126/science.1157358. 638 Angly, Florent E, Ben Felts, Mya Breitbart, Peter Salamon, Robert A Edwards, Craig 639 Carlson, Amy M Chan, et al. 2006. “The Marine Viromes of Four Oceanic Regions.” 640 PLoS Biology 4 (11) (November): e368. 641 Angly, Florent E, Dana Willner, Alejandra Prieto-Davó, Robert A Edwards, Robert 642 Schmieder, Rebecca Vega-Thurber, Dionysios A Antonopoulos, et al. 2009. “The 643 GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average 644 Genome Size in Four Major Biomes.” PLoS Computational Biology 5 (12) 645 (December): e1000593. doi:10.1371/journal.pcbi.1000593. 646 Bland, Charles, Teresa L Ramsey, Fareedah Sabree, Micheal Lowe, Kyndall Brown, 647 Nikos C Kyrpides, and Philip Hugenholtz. 2007. “CRISPR Recognition Tool (CRT):

21 648 A Tool for Automatic Detection of Clustered Regularly Interspaced Palindromic 649 Repeats.” BMC Bioinformatics 8 (January): 209. doi:10.1186/1471-2105-8-209. 650 Breitbart, Mya, Luke R. Thompson, Curtis A Suttle, and Matthew B. Sullivan. 2007. 651 “Exploring the Vast Diversity of .” Oceanography 20 (2): 135–139. 652 Brum, Jennifer R, James Jeffrey Morris, Moira Décima, and Michael R Stukel. 2014. 653 “Mortality in the Oceans : Causes and Consequences.” In Eco-DAS IX, edited by 654 Association for the Sciences of Limnology and Oceanography, 16–48. 655 doi:10.4319/ecodas.2014.978-0-9845591-3-8.16. 656 Canfield, Don E, Frank J Stewart, Bo Thamdrup, Loreto De Brabandere, Tage Dalsgaard, 657 Edward F Delong, Niels Peter Revsbech, and Osvaldo Ulloa. 2010. “A Cryptic 658 Sulfur Cycle in Oxygen-Minimum-Zone Waters off the Chilean Coast.” Science 330 659 (6009) (December 3): 1375–8. doi:10.1126/science.1196889. 660 Cassman, Noriko, Alejandra Prieto-Davó, Kevin Walsh, Genivaldo G Z Silva, Florent 661 Angly, Sajia Akhter, Katie Barott, et al. 2012. “Oxygen Minimum Zones Harbour 662 Novel Viral Communities with Low Diversity.” Environmental Microbiology 14 663 (September 11): 3043–65. 664 Clokie, Martha R J, Jinyu Shan, Shaun Bailey, Ying Jia, Henry M Krisch, Stephen West, 665 and Nicholas H Mann. 2006. “Transcription of a ‘Photosynthetic’ T4-Type Phage 666 during Infection of a Marine Cyanobacterium.” Environmental Microbiology 8 (5) 667 (May): 827–35. doi:10.1111/j.1462-2920.2005.00969.x. 668 Dammeyer, Thorben, Sarah C Bagby, Matthew B Sullivan, Sallie W Chisholm, and 669 Nicole Frankenberg-Dinkel. 2008. “Efficient Phage-Mediated Pigment Biosynthesis 670 in Oceanic Cyanobacteria.” Current Biology 18 (6) (March 25): 442–8. 671 doi:10.1016/j.cub.2008.02.067. 672 Delcher, Arthur L, Steven L Salzberg, and Adam M Phillippy. 2003. “Using MUMmer to 673 Identify Similar Regions in Large Sequence Sets.” Current Protocols in 674 Bioinformatics Chapter 10: Unit 10.3. doi:10.1002/0471250953.bi1003s00. 675 DeLong, Edward F, Christina M Preston, Tracy Mincer, Virginia Rich, Steven J Hallam, 676 Niels-Ulrik Frigaard, Asuncion Martinez, et al. 2006. “Community Genomics 677 among Stratified Microbial Assemblages in the Ocean’s Interior.” Science 311 678 (5760) (January 27): 496–503. doi:10.1126/science.1120250. 679 Deng, Li, Julio C. Ignacio-Espinoza, Ann Gregory, Bonnie T. Poulos, Joshua S Weitz, 680 Philip Hugenholtz, and Matthew B. Sullivan. “Viral Tagging Reveals Discrete 681 Populations in Synechococcus Viral Genome Sequence Space.” Nature (in press). 682 Desnues, Christelle, and Didier Raoult. 2012. “Virophages Question the Existence of 683 Satellites.” Nature Reviews. Microbiology 10 (3) (March): 234; author reply 234. 684 doi:10.1038/nrmicro2676-c3. 685 Diemer, Geoffrey S, and Kenneth M Stedman. 2012. “A Novel Virus Genome Discovered 686 in an Extreme Environment Suggests Recombination between Unrelated Groups of 687 RNA and DNA Viruses.” Biology Direct 7 (January): 13. doi:10.1186/1745-6150-7- 688 13.

22 689 Eddy, Sean R. 2011. “Accelerated Profile HMM Searches.” PLoS Computational Biology 690 7 (10) (October): e1002195. doi:10.1371/journal.pcbi.1002195. 691 Edgar, Robert C. 2004. “MUSCLE: A Multiple Sequence Alignment Method with 692 Reduced Time and Space Complexity.” BMC Bioinformatics 5: 113. 693 Falkowski, Paul G, Tom Fenchel, and Edward F Delong. 2008. “The Microbial Engines 694 That Drive Earth’s Biogeochemical Cycles.” Science 320 (5879) (May 23): 1034–9. 695 doi:10.1126/science.1153213. 696 Fischer, Matthias G. 2012. “Sputnik and Mavirus: More than Just Satellite Viruses.” 697 Nature Reviews. Microbiology 10 (1) (January): 78; author reply 78. 698 doi:10.1038/nrmicro2676-c1. 699 Frias-Lopez, Jorge, Yanmei Shi, Gene W Tyson, Maureen L Coleman, Stephan C 700 Schuster, Sallie W Chisholm, and Edward F Delong. 2008. “Microbial Community 701 Gene Expression in Ocean Surface Waters.” Proceedings of the National Academy 702 of Sciences of the United States of America 105 (10) (March 11): 3805–10. 703 doi:10.1073/pnas.0708897105. 704 Ganesh, Sangita, Darren J Parris, Edward F Delong, and Frank J Stewart. 2014. 705 “Metagenomic Analysis of Size-Fractionated Picoplankton in a Marine Oxygen 706 Minimum Zone.” The ISME Journal 8 (1) (January): 187–211. 707 doi:10.1038/ismej.2013.144. 708 Gnerre, Sante, Iain Maccallum, Dariusz Przybylski, Filipe J Ribeiro, Joshua N Burton, 709 Bruce J Walker, Ted Sharpe, et al. 2011. “High-Quality Draft Assemblies of 710 Mammalian Genomes from Massively Parallel Sequence Data.” Proceedings of the 711 National Academy of Sciences of the United States of America 108 (4) (January 25): 712 1513–8. doi:10.1073/pnas.1017351108. 713 Goldsmith, Dawn B, Giuseppe Crosti, Bhakti Dwivedi, Lauren D McDaniel, Arvind 714 Varsani, Curtis a Suttle, Markus G Weinbauer, Ruth-Anne Sandaa, and Mya 715 Breitbart. 2011. “Development of phoH as a Novel Signature Gene for Assessing 716 Marine Phage Diversity.” Applied and Environmental Microbiology 77 (21) 717 (November): 7730–9. doi:10.1128/AEM.05531-11. 718 Guindon, Stéphane, and Olivier Gascuel. 2003. “A Simple, Fast, and Accurate Algorithm 719 to Estimate Large Phylogenies by Maximum Likelihood.” Systematic Biology 52 (5) 720 (October): 696–704. 721 Hawley, A K, H M Brewer, A D Norbeck, L Pasa-Tolic and S J Hallam. 2014. “Opening a 722 metaproteomic window on coupled biogeochemical cycling in oxygen-deficient 723 marine waters”. PNAS (in press) 724 Hendrix, R W, J G Lawrence, G F Hatfull, and S Casjens. 2000. “The Origins and 725 Ongoing Evolution of Viruses.” Trends in Microbiology 8 (11) (November): 504–8. 726 Hurwitz, Bonnie L, Steven J Hallam, and Matthew B Sullivan. 2013. “Metabolic 727 Reprogramming by Viruses in the Sunlit and Dark Ocean.” Genome Biology 14 (11) 728 (November 7): R123. doi:10.1186/gb-2013-14-11-r123. 729 Hurwitz, Bonnie L., and Matthew B. Sullivan. 2013. “The Pacific Ocean Virome (POV):

23 730 A Marine Viral Metagenomic Dataset and Associated Protein Clusters for 731 Quantitative Viral Ecology.” PLoS One 8 (2): e57355. 732 Huson, Daniel H, Alexander F Auch, Ji Qi, and Stephan C Schuster. 2007. “MEGAN 733 Analysis of Metagenomic Data.” Genome Research 17 (3) (March): 377–86. 734 doi:10.1101/gr.5969107. 735 John, Seth G, Carolina B Mendez, Li Deng, Bonnie Poulos, Anne Kathryn M Kauffman, 736 Suzanne Kern, Jennifer Brum, Martin F Polz, Edward a Boyle, and Matthew B 737 Sullivan. 2011. “A Simple and Efficient Method for Concentration of Ocean Viruses 738 by Chemical Flocculation.” Environmental Microbiology Reports 3 (2) (April): 195– 739 202. doi:10.1111/j.1758-2229.2010.00208.x. 740 Kim, Kyoung-Ho, and Jin-Woo Bae. 2011. “Amplification Methods Bias Metagenomic 741 Libraries of Uncultured Single-Stranded and Double-Stranded DNA Viruses.” 742 Applied and Environmental Microbiology 77 (21) (November): 7663–8. 743 doi:10.1128/AEM.00289-11. 744 Konwar, Kishori M, Niels W Hanson, Antoine P Pagé, and Steven J Hallam. 2013. 745 “MetaPathways: A Modular Pipeline for Constructing Pathway/genome Databases 746 from Environmental Sequence Information.” BMC Bioinformatics 14 (January): 747 202. doi:10.1186/1471-2105-14-202. 748 Krupovic, Mart, and Virginija Cvirkaite-Krupovic. 2012. “Towards a More 749 Comprehensive Classification of Satellite Viruses.” Nature Reviews Microbiology 750 10 (3) (February 16): 234–234. doi:10.1038/nrmicro2676-c4. 751 La Scola, Bernard, Christelle Desnues, Isabelle Pagnier, Catherine Robert, Lina Barrassi, 752 Ghislain Fournous, Michèle Merchat, et al. 2008. “The as a Unique 753 Parasite of the Giant .” Nature 455 (7209) (September 4): 100–104. 754 doi:10.1038/nature07218. 755 Labonté, Jessica M, and Curtis A Suttle. 2013a. “Metagenomic and Whole-Genome 756 Analysis Reveals New Lineages of Gokushoviruses and Biogeographic Separation 757 in the Sea.” Frontiers in Microbiology 4. 758 ———. 2013b. “Previously Unknown and Highly Divergent ssDNA Viruses Populate the 759 Oceans.” The ISME Journal 7 (11) (July 11): 2169–77. doi:10.1038/ismej.2013.110. 760 Lam, Phyllis, Gaute Lavik, Marlene M Jensen, Jack van de Vossenberg, Markus Schmid, 761 Dagmar Woebken, Dimitri Gutiérrez, Rudolf Amann, Mike S M Jetten, and Marcel 762 M M Kuypers. 2009. “Revising the Nitrogen Cycle in the Peruvian Oxygen 763 Minimum Zone.” Proceedings of the National Academy of Sciences of the United 764 States of America 106 (12) (March 24): 4752–7. doi:10.1073/pnas.0812444106. 765 Lavigne, Rob, Donald Seto, Padmanabhan Mahadevan, Hans-W Ackermann, and Andrew 766 M Kropinski. 2008. “Unifying Classical and Molecular Taxonomic Classification: 767 Analysis of the Podoviridae Using BLASTP-Based Tools.” Research in 768 Microbiology 159 (5) (June): 406–14. doi:10.1016/j.resmic.2008.03.005. 769 Letunic, Ivica, and Peer Bork. 2007. “Interactive Tree Of Life (iTOL): An Online Tool for 770 Phylogenetic Tree Display and Annotation.” Bioinformatics 23 (1) (January): 127– 771 128.

24 772 Lindell, Debbie, Jacob D Jaffe, Zackary I Johnson, George M Church, and Sallie W 773 Chisholm. 2005. “Photosynthesis Genes in Marine Viruses Yield Proteins during 774 Host Infection.” Nature 438 (7064) (November 3): 86–9. doi:10.1038/nature04111. 775 Mander, Gerd J, Manfred S Weiss, Reiner Hedderich, Jörg Kahnt, Ulrich Ermler, and 776 Eberhard Warkentin. 2005. “X-Ray Structure of the Gamma-Subunit of a 777 Dissimilatory Sulfite Reductase: Fixed and Flexible C-Terminal Arms.” FEBS 778 Letters 579 (21) (August 29): 4600–4. doi:10.1016/j.febslet.2005.07.029. 779 Mann, Nicholas H, Annabel Cook, Andrew Millard, Shaun Bailey, and Martha Clokie. 780 2003. “Bacterial Photosynthesis Genes in a Virus.” Nature 424 (August): 741. 781 Markowitz, Victor M, I-Min a Chen, Ken Chu, Ernest Szeto, Krishna Palaniappan, Manoj 782 Pillay, Anna Ratner, et al. 2014. “IMG/M 4 Version of the Integrated Metagenome 783 Comparative Analysis System.” Nucleic Acids Research 42 (1) (January 1): D568– 784 73. doi:10.1093/nar/gkt919. 785 Martinez-Garcia, Manuel, David Brazel, Nicole J Poulton, Brandon K Swan, Monica 786 Lluesma Gomez, Dashiell Masland, Michael E Sieracki, and Ramunas 787 Stepanauskas. 2012. “Unveiling in Situ Interactions between Marine Protists and 788 Bacteria through Single Cell Sequencing.” The ISME Journal 6 (3) (March): 703–7. 789 doi:10.1038/ismej.2011.126. 790 Mattes, Timothy E, Brook L Nunn, Katharine T Marshall, Giora Proskurowski, Deborah 791 S Kelley, Orest E Kawka, David R Goodlett, Dennis a Hansell, and Robert M 792 Morris. 2013. “Sulfur Oxidizers Dominate Carbon Fixation at a Biogeochemical Hot 793 Spot in the Dark Ocean.” The ISME Journal 7 (12) (December): 2349–60. 794 doi:10.1038/ismej.2013.113. 795 Mingkun, L. “Kmernorm.” Unpublished. 796 Mingkun, L, A Copeland, and J Han. “DUK.” Unpublished. 797 Mizuno, Carolina Megumi, Francisco Rodriguez-Valera, Nikole E Kimes, and Rohit 798 Ghai. 2013. “Expanding the Marine Virosphere Using Metagenomics.” PLoS 799 Genetics 9 (12) (December): e1003987. doi:10.1371/journal.pgen.1003987. 800 Murant, AF, and MA Mayo. 1982. “Satellites of Plant Viruses.” Annual Review of 801 Phytopathology (20): 49–70. 802 Noguchi, Hideki, Takeaki Taniguchi, and Takehiko Itoh. 2008. “MetaGeneAnnotator: 803 Detecting Species-Specific Patterns of Ribosomal Binding Site for Precise Gene 804 Prediction in Anonymous Prokaryotic and Phage Genomes.” DNA Research 15 (6) 805 (December): 387–96. doi:10.1093/dnares/dsn027. 806 Oliveira, Tânia F, Clemens Vonrhein, Pedro M Matias, Sofia S Venceslau, Inês a C 807 Pereira, and Margarida Archer. 2008. “The Crystal Structure of Desulfovibrio 808 Vulgaris Dissimilatory Sulfite Reductase Bound to DsrC Provides Novel Insights 809 into the Mechanism of Sulfate Respiration.” The Journal of Biological Chemistry 810 283 (49) (December 5): 34141–9. doi:10.1074/jbc.M805643200. 811 Peng, Yu, Henry C M Leung, S M Yiu, and Francis Y L Chin. 2012. “IDBA-UD: A de 812 Novo Assembler for Single-Cell and Metagenomic Sequencing Data with Highly

25 813 Uneven Depth.” Bioinformatics 28 (11): 1420–1428. 814 Punta, Marco, Penny C Coggill, Ruth Y Eberhardt, Jaina Mistry, John Tate, Chris 815 Boursnell, Ningze Pang, et al. 2012. “The Pfam Protein Families Database.” Nucleic 816 Acids Research 40 (Database issue) (January): D290–301. doi:10.1093/nar/gkr1065. 817 Rappé, Michael S, and Stephen J Giovannoni. 2003. “The Uncultured Microbial 818 Majority.” Annual Review of Microbiology 57 (January): 369–94. 819 doi:10.1146/annurev.micro.57.030502.090759. 820 Rinke, Christian, Patrick Schwientek, Alexander Sczyrba, Natalia N Ivanova, Iain J 821 Anderson, Jan-Fang Cheng, Aaron Darling, et al. 2013. “Insights into the Phylogeny 822 and Coding Potential of Microbial Dark Matter.” Nature 499 (7459) (July 25): 431– 823 7. doi:10.1038/nature12352. 824 Rodriguez-Valera, Francisco, Ana-Belen Martin-Cuadrado, Beltran Rodriguez-Brito, 825 Lejla Pasić, T Frede Thingstad, Forest Rohwer, and Alex Mira. 2009. “Explaining 826 Microbial Population Genomics through Phage Predation.” Nature Reviews 827 Microbiology 7 (11) (November): 828–836. 828 Roux, Simon, François Enault, Gisèle Bronner, Daniel Vaulot, Patrick Forterre, and Mart 829 Krupovic. 2013. “Chimeric Viruses Blur the Borders between the Major Groups of 830 Eukaryotic Single-Stranded DNA Viruses.” Nature Communications 4 (2700). 831 Roux, Simon, Mart Krupovic, Axel Poulet, Didier Debroas, and François Enault. 2012. 832 “Evolution and Diversity of the Microviridae Viral Family through a Collection of 833 81 New Complete Genomes Assembled from Virome Reads.” PLoS One 7 (7) 834 (January): e40418. doi:10.1371/journal.pone.0040418. 835 Roux, Simon, Jeremy Tournayre, Antoine Mahul, Didier Debroas, and François Enault. 836 2014. “Metavir 2 : Virome Comparative Analysis and Annotation of Assembled 837 Genomic Fragments.” BMC Bioinformatics (in revision). 838 Schloss, Patrick D, Sarah L Westcott, Thomas Ryabin, Justine R Hall, Martin Hartmann, 839 Emily B Hollister, Ryan a Lesniewski, et al. 2009. “Introducing Mothur: Open- 840 Source, Platform-Independent, Community-Supported Software for Describing and 841 Comparing Microbial Communities.” Applied and Environmental Microbiology 75 842 (23) (December): 7537–41. doi:10.1128/AEM.01541-09. 843 Sharon, Itai, Ariella Alperovitch, Forest Rohwer, Matthew Haynes, Fabian Glaser, Nof 844 Atamna-Ismaeel, Ron Y Pinter, et al. 2009. “Photosystem I Gene Cassettes Are 845 Present in Marine Virus Genomes.” Nature 461 (7261) (September 10): 258–62. 846 doi:10.1038/nature08284. 847 Sharon, Itai, Natalia Battchikova, Eva-Mari Aro, Carmela Giglione, Thierry Meinnel, 848 Fabian Glaser, Ron Y Pinter, Mya Breitbart, Forest Rohwer, and Oded Béjà. 2011. 849 “Comparative Metagenomics of Microbial Traits within Oceanic Viral 850 Communities.” The ISME Journal 5 (7) (July): 1178–90. doi:10.1038/ismej.2011.2. 851 Sorek, Rotem, Victor Kunin, and Philip Hugenholtz. 2008. “CRISPR--a Widespread 852 System That Provides Acquired Resistance against Phages in Bacteria and .” 853 Nature Reviews. Microbiology 6 (3) (March): 181–6. doi:10.1038/nrmicro1793.

26 854 Stepanauskas, Ramunas, and Michael E Sieracki. 2007. “Matching Phylogeny and 855 Metabolism in the Uncultured Marine Bacteria, One Cell at a Time.” Proceedings of 856 the National Academy of Sciences of the United States of America 104 (21) (May 857 22): 9052–7. doi:10.1073/pnas.0700496104. 858 Stewart, Frank J, Osvaldo Ulloa, and Edward F DeLong. 2012. “Microbial 859 Metatranscriptomics in a Permanent Marine Oxygen Minimum Zone.” 860 Environmental Microbiology 14 (1) (January): 23–40. doi:10.1111/j.1462- 861 2920.2010.02400.x. 862 Stramma, Lothar, Gregory C Johnson, Janet Sprintall, and Volker Mohrholz. 2008. 863 “Expanding Oxygen-Minimum Zones in the Tropical Oceans.” Science 320 (5876) 864 (May 2): 655–8. doi:10.1126/science.1153847. 865 Sullivan, Matthew B, Maureen L Coleman, Peter Weigele, Forest Rohwer, and Sallie W 866 Chisholm. 2005. “Three Prochlorococcus Cyanophage Genomes: Signature Features 867 and Ecological Interpretations.” PLoS Biology 3 (5) (May): e144. 868 doi:10.1371/journal.pbio.0030144. 869 Sullivan, Matthew B, Katherine H Huang, Julio C Ignacio-Espinoza, Aaron M Berlin, 870 Libusha Kelly, Peter R Weigele, Alicia S DeFrancesco, et al. 2010. “Genomic 871 Analysis of Oceanic Cyanobacterial Myoviruses Compared with T4-like Myoviruses 872 from Diverse Hosts and Environments.” Environmental Microbiology 12 (11) 873 (November): 3035–56. doi:10.1111/j.1462-2920.2010.02280.x. 874 Sullivan, Matthew B, Debbie Lindell, Jessica A. Lee, Luke R Thompson, Joseph P 875 Bielawski, and Sallie W Chisholm. 2006. “Prevalence and Evolution of Core 876 Photosystem II Genes in Marine Cyanobacterial Viruses and Their Hosts.” PLoS 877 Biology 4 (8): e234. 878 Sullivan, Mitchell J, Nicola K Petty, and Scott A Beatson. 2011. “Easyfig: A Genome 879 Comparison Visualizer.” Bioinformatics 27 (7) (April): 1009–10. 880 doi:10.1093/bioinformatics/btr039. 881 Suttle, Curtis a. 2005. “Viruses in the Sea.” Nature 437 (7057) (September 15): 356–61. 882 doi:10.1038/nature04160. 883 Suttle, Curtis A. 2007. “Marine Viruses--Major Players in the Global Ecosystem.” Nature 884 Reviews Microbiology 5 (10) (October): 801–812. 885 Swan, Brandon K, Manuel Martinez-Garcia, Christina M Preston, Alexander Sczyrba, 886 Tanja Woyke, Dominique Lamy, Thomas Reinthaler, et al. 2011. “Potential for 887 Chemolithoautotrophy among Ubiquitous Bacteria Lineages in the Dark Ocean.” 888 Science 333 (6047) (September 2): 1296–300. doi:10.1126/science.1203690. 889 Swan, Brandon K, Ben Tupper, Alexander Sczyrba, Federico M Lauro, Manuel Martinez- 890 Garcia, José M González, Haiwei Luo, et al. 2013. “Prevalent Genome Streamlining 891 and Latitudinal Divergence of Planktonic Bacteria in the Surface Ocean.” 892 Proceedings of the National Academy of Sciences of the United States of America 893 110 (28) (July 9): 11463–8. doi:10.1073/pnas.1304246110. 894 Tadmor, Arbel D, Elizabeth a Ottesen, Jared R Leadbetter, and Rob Phillips. 2011. 895 “Probing Individual Environmental Bacteria for Viruses by Using Microfluidic

27 896 Digital PCR.” Science 333 (6038) (July 1): 58–62. doi:10.1126/science.1200758. 897 Thompson, Luke R, Qinglu Zeng, Libusha Kelly, Katherine H Huang, Alexander U 898 Singer, Joanne Stubbe, and Sallie W Chisholm. 2011. “Phage Auxiliary Metabolic 899 Genes and the Redirection of Cyanobacterial Host Carbon Metabolism.” 900 Proceedings of the National Academy of Sciences of the United States of America 901 108 (39) (September 27): E757–64. doi:10.1073/pnas.1102164108. 902 Tseng, Ching-Hung, Pei-Wen Chiang, Fuh-Kwo Shiah, Yi-Lung Chen, Jia-Rong Liou, 903 Ting-Chang Hsu, Suhinthan Maheswararajah, Isaam Saeed, Saman Halgamuge, and 904 Sen-Lin Tang. 2013. “Microbial and Viral Metagenomes of a Subtropical Freshwater 905 Reservoir Subject to Climatic Disturbances.” The ISME Journal 7 (July 11): 2374– 906 86. doi:10.1038/ismej.2013.118. 907 Tucker, Kimberly P, Rachel Parsons, Erin M Symonds, and Mya Breitbart. 2011. 908 “Diversity and Distribution of Single-Stranded DNA Phages in the North Atlantic 909 Ocean.” The ISME Journal 5 (5) (May): 822–30. doi:10.1038/ismej.2010.188. 910 Ulloa, Osvaldo, Donald E Canfield, Edward F DeLong, Ricardo M Letelier, and Frank J 911 Stewart. 2012. “Microbial Oceanography of Anoxic Oxygen Minimum Zones.” 912 Proceedings of the National Academy of Sciences of the United States of America 913 109 (40) (October 2): 15996–6003. doi:10.1073/pnas.1205009109. 914 Venter, JC, K Remington, JF Heidelberg, Doug Rusch, Jonathan A. Eisen, Dongying Wu, 915 Ian Paulsen, et al. 2004. “Environmental Genome Shotgun Sequencing of the 916 Sargasso Sea.” Science 304: 66–74. 917 Walsh, David A, Elena Zaikova, Charles G Howes, Young C Song, Jody J Wright, 918 Susannah G Tringe, Philippe D Tortell, and Steven J Hallam. 2009. “Metagenome of 919 a Versatile Chemolithoautotroph from Expanding Oceanic Dead Zones.” Science 920 326 (5952) (October 23): 578–82. doi:10.1126/science.1175309. 921 Ward, B B, a H Devol, J J Rich, B X Chang, S E Bulow, Hema Naik, Anil Pratihary, and 922 a Jayakumar. 2009. “Denitrification as the Dominant Nitrogen Loss Process in the 923 Arabian Sea.” Nature 461 (7260) (September 3): 78–81. doi:10.1038/nature08276. 924 Waterbury, JB, and FW Valois. 1993. “Resistance to Co-Occurring Phages Enables 925 Marine Synechococcus Communities to Coexist with Cyanophages Abundant in 926 Seawater.” Applied and Environmental Microbiology 59 (10): 3393–99. 927 Waterhouse, Andrew M, James B Procter, David M A Martin, Michèle Clamp, and 928 Geoffrey J Barton. 2009. “Jalview Version 2--a Multiple Sequence Alignment Editor 929 and Analysis Workbench.” Bioinformatics 25 (9): 1189–1191. 930 Weinbauer, Markus G, Ingrid Brettar, and Manfred G Ho. 2003. “Lysogeny and Virus- 931 Induced Mortality of Bacterioplankton in Surface, Deep, and Anoxic Marine 932 Waters.” Limnology and Oceanography 48 (4): 1457–1465. 933 Whitney, Frank A., Howard J. Freeland, and Marie Robert. 2007. “Persistently Declining 934 Oxygen Levels in the Interior Waters of the Eastern Subarctic Pacific.” Progress in 935 Oceanography 75 (2) (October): 179–199. doi:10.1016/j.pocean.2007.08.007. 936 Wickham, Hadley. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer

28 937 Publishing Company. 938 Williamson, Shannon J, Lisa Zeigler Allen, Hernan a Lorenzi, Douglas W Fadrosh, 939 Daniel Brami, Mathangi Thiagarajan, John P McCrow, Andrey Tovchigrechko, 940 Shibu Yooseph, and J Craig Venter. 2012. “Metagenomic Exploration of Viruses 941 throughout the Indian Ocean.” PLoS One 7 (10) (January): e42047. 942 doi:10.1371/journal.pone.0042047. 943 Wright, Jody J, Kishori M Konwar, and Steven J Hallam. 2012. “ of 944 Expanding Oxygen Minimum Zones.” Nature Reviews. Microbiology 10 (6) (June): 945 381–94. doi:10.1038/nrmicro2778. 946 Yoon, Hwan Su, Dana C Price, Ramunas Stepanauskas, Veeran D Rajah, Michael E 947 Sieracki, William H Wilson, Eun Chan Yang, Siobain Duffy, and Debashish 948 Bhattacharya. 2011. “Single-Cell Genomics Reveals Organismal Interactions in 949 Uncultivated Marine Protists.” Science 332 (6030) (May 6): 714–7. 950 doi:10.1126/science.1203163. 951 Yoshida, M, Y Takaki, M Eitoku, T Nunoura, and K Takai. 2013. “Metagenomic Analysis 952 of Viral Communities in (Hado) Pelagic Sediments.” PLoS One 8 (2): e57271. 953 Zaikova, Elena, David a Walsh, Claire P Stilwell, William W Mohn, Philippe D Tortell, 954 and Steven J Hallam. 2010. “Microbial Community Dynamics in a Seasonally 955 Anoxic Fjord: Saanich Inlet, British Columbia.” Environmental Microbiology 12 (1) 956 (January): 172–91. doi:10.1111/j.1462-2920.2009.02058.x.

29 957 Figures : 958 959 Figure 1. Saanich Inlet water column characteristics and SUP05 infection frequency 960 on the SAG sampling date (August 2011). Key abiotic measurements are represented as 961 background coloring (oxygen levels) and black lined graphs at left (hydrogen sulfide and 962 temperature). SUP05 viral infections determined from 127 SAGs are indicated at right by 963 black slices in pie charts where current infections were delineated from intact viral 964 contigs and past infections were inferred from identification of defective prophages and 965 CRISPR loci. 966 967 Figure 1 - figure supplement 1. CTD measurements of oxygen concentration, 968 temperature, salinity and H2S concentration in the water column of Saanich Inlet at 969 the time of sampling (August 2011). 970 971 Figure 1 - figure supplement 2. Phylogenetic tree of SUP05 and Arctic96BD-19 972 lineages based on comparative SSU ribosomal RNA gene analysis. The tree was 973 inferred using maximum-likelihood implemented in PHYML. The percentage (≥70%) of 974 replicates in which the associated taxa clustered together in the bootstrap test (1000 975 replicates) is shown next to the branches. Reference sequences for both lineages are 976 marked with a star. Representative sequences for SUP05 and Arctic96BD-19 clusters are 977 labeled “SUP05_cluster number” followed by the name of the sequence according to 978 NCBI. SAG representative sequences are shown in red, SAG sequences distribution with 979 depth is represented by colored circles (100 m: green, 150 m: blue, 200 m: purple) whose 980 circumference indicated the total number of SAG sequences (reads) within the cluster. 981 982 Figure 1 – figure supplement 3. Metrics measured on SUP05 SAG contigs classified 983 as “Microbial”, “Viral hallmark contigs” (Table S1 A, B, C) and “Putative viral 984 contigs” (Table S1 D). For each set of contigs, the distribution of average gene size, ratio 985 of strand changes (number of strand changes between two consecutive genes divided by 986 the total number of genes on the contig), and ratio of uncharacterized genes (number of 987 genes with no significant hit in PFAM database divided by the total number of genes on 988 the contig) are displayed. 989 990 Figure 1 – source data file. Number of SUP05 viral sequences detected at the three 991 different depths sampled. For each depth, the count of SAG where viral sequence were 992 detected (“infected” SAG) is indicated, alongside the number of SAGs for which two 993 different viruses were retrieved, the number of SAGs with CRISPR spacer detected and 994 the number of SAGs with a defective identified. 995 996 Figure 2. Genetic map and synteny plots for the four references SUP05 Caudovirales 997 contigs (highlighted in bold). Viral hallmark genes are underlined and identified on 998 plots (MCP: Major Capsid Protein, Sc: Scaffolding protein, H-T conn.: Head-Tail 999 connector). Sequence similarities were deduced from a tBLASTx comparison. For clarity 1000 sake, several sequences including SUP05 viral contig M8F6_0, K04_0, and G10_6 are 1001 reverse-complemented (noted RC).

30 1002 1003 Figure 2 - figure supplement 1. Phylogenetic tree of SUP05 Podoviridae contigs, 1004 derived from major capsid protein sequences with PhyML (maximum-likelihood 1005 tree, LG model, CAT approximation of gamma parameter). All SUP05 contigs 1006 affiliated to the Podoviridae and harboring the major capsid protein gene are included in 1007 the tree and highlighted in bold. The three SUP05 Podoviridae reference contigs (longer 1008 than 15 kb) are noted with a star. SH-like branch supports are indicated on the tree, and 1009 all branches with a support lower than 0.5 were collapsed. 1010 1011 Figure 2 - figure supplement 2. Phylogenetic tree for the SUP05 Microviridae (major 1012 capsid protein). Tree was computed with PhyML (maximum-likelihood tree, LG model, 1013 gamma parameter estimated with CAT approximation), and SH-like supports are 1014 indicated for each branch. All branches with support lower than 0.50 were collapsed. The 1015 tree is focused around the Gokushovirinae subfamily and includes the Pichovirinae 1016 subfamily as an outgroup. Aquatic Gokushovirinae are colored according to their type of 1017 sample, and Saanich Inlet sequences are highlighted in bold. Cultivated Gokushovirinae 1018 are noted in black and highlighted in bold, with the associated genus associated in italic. 1019 All the other sequences are non-cultivated and currently affiliated to “Unclassified 1020 Gokushovirinae”. 1021 1022 Figure 2 - figure supplement 3. Genetic map and synteny plots for the SUP05 1023 Microviridae reference. Viral hallmark genes are labeled on the plot. Associated sequence 1024 “Marine Gokushovirus isolate SOG1-KC131024” was sampled from Strait of Georgia 1025 (Labonté and Suttle 2013b), on which the Saanich Inlet fjord is opening. 1026 1027 Figure 2 – source data file. Summary of best BLAST hit affiliation for the predicted 1028 genes of the five SUP05 reference viral contigs. For each contig, taxonomic and 1029 functional affiliation are indicated with the group or category and the number of genes 1030 affiliated to this group. The category “virion formation” includes all genes associated to 1031 the formation of the capsid and the genome encapsidation. 1032 1033 1034 Figure 3. Spatiotemporal dynamics of SUP05 viral reference genomes in Saanich 1035 Inlet. (A) SUP05 viral presence in Saanich Inlet microbial metagenomes with OMZ 1036 sample names bolded. Four categories indicate the SUP05 virus was detected (>75% of 1037 viral genes detected at >80% amino-acid identity; light blue), a SUP05 viral relative was 1038 detected (>75% of viral genes detected at 60-80% amino-acid identity; light green), no 1039 SUP05 virus was detected (red) or detection was inconclusive (e.g., Microviridae in 1040 HiSeq Illumina datasets that strongly select against ssDNA sequences; gray). (B) SUP05 1041 viral reference genomes had differing sequence conservation among recruited 1042 metagenomic reads. Upper and lower "hinges" correspond to the first and third quartiles 1043 (the 25th and 75th percentiles), while outliers are displayed as points (values beyond 1.5 1044 * Inter-Quartile Range of the hinge). (C) One SUP05 viral reference genome with low 1045 sequence conservation revealed evolution in action whereby a genomic region (see ~21- 1046 30 kb) appears to sweep through the population. 1047

31 1048 Figure 3 - figure supplement 1. Recruitment and coverage plot of SUP05 viral 1049 genome fragments by Saanich Inlet datasets sampled in 2009, 2010 and 2011. Each 1050 dot correspond to a match between a metagenome predicted gene and a gene from the 1051 SUP05 viral genome fragment, displayed according to the coordinate on the genome (x- 1052 axis) and the protein identity percentage (y-axis). For each genome, plots were only 1053 generated for datasets in which the genome was detected. Only hits with more than 80% 1054 amino-acid identity were considered. 1055 1056 Figure 3 - figure supplement 2. Heatmap of detection of SUP05 viruses in oceanic 1057 datasets. Metagenomes are classified from left to right based on the sampling depth as 1058 “Above the OMZ”, “OMZ”, and “Below the OMZ”, and vertically ordered based on the 1059 geographical sampling region, from the samples closest to Saanich Inlet (on top) to the 1060 one farthest from Saanich Inlet (at the bottom). Viral metagenomes are noted with a gray 1061 capsid symbol. Each metagenome – viral genome association was classified based on the 1062 number of viral genes detected and the amino-acid percentage identity of the BLAST hits 1063 associated. The viral genome was thought to be in the sample when more than 75 % of 1064 the genes were detected at more than 80 % of identity in the metagenome (Blue cells), 1065 when the same ratio of genes detected at lower percentage (60 to 80 %) indicates the 1066 presence of a related but distinct virus (Green cells). We considered that less than 75 % of 1067 the genes detected meant that this virus was likely absent from the sample (Red cells), 1068 except for the detection of the ssDNA Microviridae in HiSeq-Illumina-sequenced 1069 viromes, where the procedure used to process samples prior to sequencing is likely to 1070 select against the amplification of ssDNA templates (Gray cells). Metagenomes in which 1071 the associated SUP05 host was detected are highlighted in black (>75% genes on SAG 1072 microbial contigs covered with Average Nucleotide Identity >95%). 1073 1074 Figure 3 - figure supplement 3. Recruitment and coverage plot of SUP05 viral 1075 genomes by datasets sampled outside of Saanich Inlet fjord. Each dot correspond to a 1076 match between a metagenome predicted gene and a gene from the SUP05 viral genome 1077 fragment, displayed according to the coordinate on the genome (x-axis) and the protein 1078 identity percentage (y-axis). For each genome, plots were only generated for datasets in 1079 which the genome was detected. Only hits with more than 80% amino-acid identity were 1080 considered. 1081 1082 Figure 4. Uncultivated SUP05 lineage-specific virus-host ecology. Fragment 1083 recruitment from Saanich Inlet microbial metagenomes to microbial (95% nucleotide 1084 identity) and viral (100% amino-acid identity) reference contigs normalized by contig 1085 and metagenome size was used as a proxy for abundance. Hence, the relative abundance 1086 of microbial and viral genome is indicated as number of metagenomic bases recruited by 1087 contig(s) base pairs (bp) by megabase (Mb) of metagenome. Upper and lower "hinges" of 1088 the relative abundance distribution correspond to the first and third quartiles (the 25th and 1089 75th percentiles), while outliers are displayed as points (values beyond 1.5 * Inter- 1090 Quartile Range of the hinge). A virus-to-host ratio was then calculated for each SAG (i.e. 1091 each virus-host pair) as the ratio of relative abundance of viral contigs to the relative 1092 abundance of microbial contigs from the same SAG. 1093

32 1094 Figure 5. Maps of DsrC-containing contigs. (A) Seven contigs including dsrC-like 1095 gene detected as viral based on non-reference metrics (ratio of uncharacterized genes, 1096 strand coding bias). (B) Genomic context in which dsrC-like genes are retrieved in 1097 SUP05 microbial contigs from SAG. All contigs above 50 kb containing a dsrC-like gene 1098 were selected and compared to get a summary of the different regions in which dsrC-like 1099 genes are found in SUP05 genomes. (C) Map of dsrC-containing Contigs assembled 1100 from Saanich-Inlet metagenomes. One viral-like contig from SAG (020_11) is included 1101 for comparison. 1102 1103 Figure 5 - figure supplement 1. Multiple alignment of dsrC-like genes from Saanich 1104 Inlet microbial and viral contigs, hydrothermal vent phages, and microbial 1105 genomes. Viral sequences are highlighted in red, Saanich Inlet sequences in bold. Four 1106 groups could be distinguished within this set of sequences (dsrC_1 to 4). The main 1107 residues most likely needed for the protein to function os rDsrC are colored across all 1108 groups and indicated below the alignment. The specific insertion and second c-terminal 1109 cysteine, thought to be required for the dsrC function, and only retrieved in the group 1110 dsrC_2, are highlighted with a black frame. Other conserved residues are colored within 1111 each group, except for groups 3 and 4 where too few sequences are available. 1112 1113 Figure 5 - figure supplement 2. Relative abundance of viral dsrC gene on the 3 years 1114 of sampling in Saanich Inlet compared to the concentration of H2S (left) and O2 1115 (right). 1116 1117 1118 Supplementary file 1. List of viral sequences and defective prophages retrieved in 1119 SUP05/Arctic SAGs. Upper part of the table displays the 12 “SUP05 viral reference” 1120 sequences detected from the presence of viral hallmark gene and their size greater than 1121 15kb or circularity (a), then the 19 SUP05 short viral contigs (b), with taxonomic 1122 affiliation based on viral hallmark genes. The bottom part displays the 19 other sequences 1123 retrieved through the second screening (c), based on the first set as references (including 1124 contigs previously detected as “SUP05 short viral contigs”), and the 18 other putative 1125 viral contigs (d), which affiliation to the viral kingdom is uncertain since they lack a viral 1126 hallmark gene. Estimated genome sizes are based on the size of the most closely related 1127 phage genomes, or in the case of the Microviridae on the length of the circular contigs. 1128 1129 Supplementary file 2. List of contigs containing a putative defective prophage (a) or 1130 a CRISPR locus (b).

33 1131 1132 Supplementary file 3. Number of genes shared between contigs of Single-Amplified 1133 Genome (SAG) with a Gokushovirinae genome and the contigs of the five most 1134 closely related SAGs. For each SAG with a Gokushovirinae genome, the five SAGs 1135 displaying the most identical genes (100% amino-acid identity) are indicated. The 1136 number and ratio of identical genes is displayed for each pair of SAGs, alongside the 1137 number and ratio of genes similar but non identical (BLASTp hit with bit score greater 1138 than 50, e-value lower than 0.001, and identity percentage greater than 30%). Matching 1139 SAGs which also display a Gokushovirinae genome are noted with a star. 1140 1141 Supplementary file 4. List of viral sequences detected for each SUP05 SAGs with at 1142 least one viral contig or defective prophage. For the detection of viral contigs, full- 1143 length contigs are indicated by a cross (x), partial matches (short contigs matching the 1144 full-length sequence) are noted with a dash (-). For the short contigs not similar to any 1145 SUP05 viral reference sequence, the number of different contigs identified is indicated 1146 for each cell. 1147 1148 Supplementary file 5. List of metagenomic datasets used in this study. Viral 1149 metagenomes were used for both viral contig detection and recruitment plots, whereas 1150 microbial metagenomes were only included in the recruitment plot computation. OMZ 1151 samples are highlighted in bold. 1152 1153 Supplementary file 6. List of PFAM domains detcted in the 68 viral sequences 1154 identified. The four putative Auxiliary Metabolism Genes are highlighted in bold. 1155 1156 Supplementary file 7. Number of genes shared between contigs of Single-Amplified 1157 Genome (SAG) with a DsrC gene on a viral contig and the contigs of the five most 1158 closely related SAGs. For each SAG with a DsrC gene on a viral contig, the five SAGs 1159 displaying the most identical genes (100% amino-acid identity) are indicated. The 1160 number and ratio of identical genes is displayed for each pair of SAGs, alongside the 1161 number and ratio of genes similar but non identical (BLASTp hit with bit score greater 1162 than 50, e-value lower than 0.001, and identity percentage greater than 30%). Matching 1163 SAGs which also display a similar DsrC gene on a viral contig are indicated with a star.

34 Depth (m) 200 150 100 50 0 H 0 2 S S (µM) 4 8 Temperature (ITS-90) 9 11

13

Current Current Past 1 SUP05 SAGs SUP05 Number of Number Oxygen 10 (ml/L) 5 3 0 2 4 1 50 A SAR116 Puniceispirillum phage HMO-2011 NC_021864

Puniceispirillum phage HMO-2011 gene 100% Other Caudovirales gene Protein-domain affiliated only Sup05 viral contig Tail fiber MCP Tail Portal BLAST identity % M8F6_0 Replication module Uncharacterized gene RC Terminase 5 Kbp 16%

Gamma proteobacterium SCGC AAA160 D02 AA

SAR11 bacteriophage B Pelagibacter HTVC010P NC_020481

HTVC010P-like gene 100% Other Caudovirales gene Protein-domain affiliated only Sup05 viral contig BLAST identity % Uncharacterized gene C22_13 H-T conn Sc MCP Tail fiber Terminase (large & small subunits) 5 Kbp 26%

Sup05 viral contig Sup05 viral contig K04_42 K04_16 RC RC

C Erwinia phage vB eamP-S6 NC_019514

Erwinia phage vB eamP-S6 gene 100% Other Caudovirales gene Protein-domain affiliated only BLAST identity % Sup05 viral contig Uncharacterized gene K04_0 5 Kbp Replication RC Virion-encapsulated 11% MCP Portal module RNA polymerase

E. Coli phage D Enterobacteria phage T5 NC _005859 50 000 - 100 000bp

Lau87_KiloMoana_Scaffold_3 JMBY01000083 30 000 - 33 000bp RC T5-like gene 100% Other Caudovirales gene Sup05 viral contig Protein-domain affiliated only G10_6 BLAST identity % Uncharacterized gene RC Replication module Tail Capsid decoration 5 Kbp 20% protein

Sup05 viral contig Sup05 viral contig J11_30 J11_34 A SUP05 virus detected 2009 2010 2011 SUP05 viral relative detected dsDNA ssDNA dsDNA ssDNA dsDNA ssDNA SUP05 virus not detected Unconclusive K04_0 K04_0 K04_0 Depth (m) C22_13 G10_6 M8F6_0D2C1_21 C22_13 G10_6 M8F6_0D2C1_21 C22_13 G10_6 M8F6_0D2C1_21 10 SI-SI03-10-Jun09 SI-SI03-10-Feb10 SI-SI03-10-Jan11 SI-SI03-10-Nov09 SI-SI03-10-Aug10 100 SI-SI03-100-Jun09 SI-SI03-100-Feb10 SI-SI03-100-Jan11 SI-SI03-100-Aug09 SI-SI03-100-Jul10 SI-SI03-100-Feb11 SI-SI03-100-Nov09 SI-SI03-100-Aug10 SI-SI03-100-Aug11 120 SI-SI03-120-Jun09 SI-SI03-120-Feb10 SI-SI03-120-Jan11 SI-SI03-120-Aug09 SI-SI03-120-Jul10 SI-SI03-120-Nov09 SI-SI03-120-Aug10 SI-SI03-120-Feb11 135 SI-SI03-135-Jun09 SI-SI03-135-Feb10 SI-SI03-135-Jan11 SI-SI03-135-Aug09 SI-SI03-135-Jul10 SI-SI03-135-Nov09 SI-SI03-135-Aug10 SI-SI03-135-Feb11 150 SI-SI03-150-Jun09 SI-SI03-150-Feb10 SI-SI03-150-Jan11 SI-SI03-150-Aug09 SI-SI03-150-Jul10 SI-SI03-150-Feb11 SI-SI03-150-Nov09 SI-SI03-150-Aug10 SI-SI03-150-Aug11 200 SI-SI03-200-Jun09 SI-SI03-200-Feb10 SI-SI03-200-Jan11 SI-SI03-200-Aug09 SI-SI03-200-Jul10 SI-SI03-200-Feb11 SI-SI03-200-Nov09 SI-SI03-200-Aug10 SI-SI03-200-Aug11

2009 2011 B 2010 Between years C Sup05 Podoviridae : M8F6_0 100

95

90 0 100 200 300 count 2009 2010 85 2011 Amino-acididentity percentage

80 Gene color legend : Caudovirales gene Protein-domain affiliated only K04_0 G10_6 Uncharacterized gene C22_13 M8F6_0 Host abundance Virus abundance Virus to host ratio

100 100 100

120 120 120

135 135 135

Depth (m) 150 150 150

200 200 200

0.00 0.05 0.10 0.15 0.20 0.0000 0.0025 0.0050 0.0075 0.001 0.01 0.1 -1 -1 -1 -1 Bp recruited.contig(s) bp .Mb of metagenome Bp recruited.contig(s) bp .Mb of metagenome Viral abundance / Host abundance (log10 scale)

SUP05_3 M8F6_0 SUP05_1 G10_6 K04_0 C22_13 A. Putative viral contigs from SAGs C. Putative viral contigs from microbial metagenomes

2OG-FeII Oxygenase Chaperonin Cpn10 O20_11 DsrC HypotheticalMetallo-phosphoesterase protein 2OG-FeII Oxygenase Chaperonin Cpn10 DsrC P22_28 - RC SI03_Feb_2011_200m-6943

SI03_July_2010_120m-8553 N14_26 SI03_July_2010_200m-5649

SI03_Jan_2011_120m-111443 O20_11 Hypothetical proteinMetallo-phosphoesterase

SI03_Jan_2011_100m-7994

SI03_July_2010_135m-10616 E23_10

SI03_Aug_2009_100m-4159

O11_45 - RC O20_11 DsrC 1 Kbp Caudovirales gene SI03_Aug_2010_150m-1632 Protein-domain affiliated only 100% Uncharacterized gene E22_44 SI03_Aug_2010_135m-7721 100% nucleic-acid identity matches tBLASTx identity percentage

1kb SI03_Feb_2011_135m-4791 35% Caudovirales gene P07_32 Protein-domain affiliated only SI03_Jan_2011_100m-8544 Uncharacterized gene DsrC SI03_Nov_2009_100m-10892

O20_11

B. Microbial contigs from SAGs

Peptidaseplasmid S41 partitioningPeptidase S24proteinDsrC Soj 4Fe-4S diclusterNitrate reductaseDsrC MobA-like NTP transferase EPSP_synthasePrephenateAminotransferase dehydrogenasePrephenateAminotransferase dehydratase Nitrate reductaseHPP Plasmid stabilisation system protein 755_A05D07_1 - RC domain thiaminS MoeA_N MoaE 746_G10_902_0 MolybdenumMoaC Cofactor Synthesis C

750_C02_903_0 - RC 751_N14_905_3

746_O21_902_3 750_E02_904_0 - RC

750_F03_904_2 - RC 750_L07_904_0

751_G10_905_1 750_I05_904_0

Pyr_redox_2 755_E20_F12_5 - RC 746_E17_902_1 - RC

751_D23_904_0 - RC 754_P16_906_5

750_K23_904_0 - RC 755_B10_F08_2

750_P07_904_3 747_F14_903_0

DrsEDrsEDsrHDsrC

Pyr_redox_2

Nitrate_red_gam Nitrite and Nitritesulphite and sulphite reductase reductase

100% DsrC DsrC Protein-domain 750_I15_904_0 BLASTn identity percentage affiliated gene Uncharacterized gene 1kb 71% 747_I05_903_2 - RC

Thiamine biosynthesis protein tRNA synthetases class II (A) Carbamoyl phosphate synthetase Plasmid stabilisation system protein Ribosomal protein L11 methyltransferase