bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.
Viruses of the eukaryotic plankton are predicted to increase carbon export efficiency in the global sunlit ocean
Romain Blanc-Mathieu1,+, Hiroto Kaneko1,+, Hisashi Endo1, Samuel Chaffron2,3, Rodrigo Hernández-Velázquez1, Canh Hao Nguyen1, Hiroshi Mamitsuka1, Nicolas Henry4, Colomban de Vargas4, Matthew B. Sullivan5, Curtis A. Suttle6, Lionel Guidi7 and Hiroyuki Ogata1,*
+ Equal contribution
* Corresponding author
Affiliations:
1: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan

2: Université de Nantes, Centrale Nantes, CNRS - UMR 6004, LS2N, Nantes, France.
3: Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France

4: Sorbonne Universités, UPMC Université Paris 06, CNRS, Laboratoire Adaptation et Diversité en Milieu Marin, Station Biologique de Roscoff, 29680 Roscoff, France

5: Department of Microbiology and Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, United States of America

6: Departments of Earth, Ocean & Atmospheric Sciences, Microbiology & Immunology, and Botany, and the Institute for the Oceans and Fisheries, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada

7: Sorbonne Université, CNRS, Laboratoire d’Océanographie de Villefranche, LOV, F-06230 Villefranche-sur-mer, France

Key words: Biological carbon pump, viruses, shunt and pump, carbon export, Prasinovirus, Mimiviridae, Tara Oceans.
Abstract
The biological carbon pump (BCP) is the process by which ocean organisms transfer carbon from surface waters to the ocean interior and seafloor sediments for sequestration. Viruses are thought to increase the efficiency of the BCP by fostering primary production and facilitating the export of carbon-enriched materials to the deep sea (the viral “shunt and pump”). A prior study using an oligotrophic ocean-dominated dataset from the Tara Oceans expedition revealed that bacterial dsDNA viruses are better associated with variation in carbon export than either prokaryotes or eukaryotes, but eukaryotic viruses were not examined. Because eukaryotes contribute significantly to ocean biomass and net production (> 40%), their viruses might also play a role in the BCP. Here, we leveraged deep-sequencing molecular data generated in the framework of Tara Oceans to identify and quantify diverse lineages of large dsDNA and smaller RNA viruses of eukaryotes. We found that the abundance of these viruses explained 49% of the variation in carbon export (compared with 89% by bacterial dsDNA viruses) and also substantially explained the variation in net primary production (76%) and carbon export efficiency (50%). Prasinoviruses infecting Mamiellales as well as Mimivirus relatives putatively infecting haptophytes are among the eukaryotic virus lineages predicted to be the best contributors to BCP efficiency. These findings collectively provide a first-level window into how eukaryotic viruses impact the BCP and suggest that the virus-mediated shunt and pump indeed plays a role.
Introduction
The oceanic biological carbon pump (BCP) is an organism-driven process by which atmospheric carbon (i.e. CO2) is transferred to the ocean interior and seafloor for sequestration over periods ranging from centuries to geological time scales. Between 15% and 20% of net primary production (NPP) is exported out of the euphotic zone, with 1% to 3% (ca. 0.2 gigatons) of fixed carbon reaching the seafloor annually (De La Rocha and Passow 2007; Herndl and Reinthaler 2013; Quéré et al. 2018; Zhang et al. 2018).
Three components of the BCP, namely, carbon fixation, export and remineralization, are governed by complex interactions between numerous members of planktonic communities (Falkowski et al. 1998). Among these organisms, diatoms (Tréguer et al. 2018) and zooplankton (Turner 2015) have been identified as important contributors to the BCP in nutrient-replete oceanic regions. In the oligotrophic ocean, Cyanobacteria and Collodaria (Lomas and Moran 2011), diatoms (Agusti et al. 2015; Leblanc et al. 2018) and other small (pico- to nano-) plankton (Richardson and Jackson 2007; Lomas and Moran 2011) have been implicated in the BCP. Overall, the composition of the planktonic community in surface waters, rather than a single species, is better associated with the intensity of the BCP (Boyd and Newton 1995; Guidi et al. 2009; Guidi et al. 2016).
A recent gene correlation network analysis based on Tara Oceans genomic data, ranging from viruses to zooplankton, outlined predicted species interactions associated with carbon export and revealed a remarkably strong association with bacterial dsDNA viruses relative to either prokaryotes or eukaryotes (Guidi et al. 2016). Cell lysis caused by viruses promotes the production of dissolved organic matter and accelerates the recycling of potentially growth-limiting nutrient elements (i.e. nitrogen and phosphorus) in the photic zone (the “viral shunt”) (Weinbauer and Peduzzi 1995; Gobler et al. 1997; Wilhelm and Suttle 1999; Weinbauer 2004; Motegi et al. 2009). This recycling in turn may increase primary production and carbon export (Brussaard et al. 2008; Weitz et al. 2015). Viruses have also been proposed to drive particle aggregation and transfer into the deep sea via the release of sticky, carbon-rich viral lysate (the “viral shuttle”) (Proctor and Fuhrman 1991; Peduzzi and Weinbauer 1993; Shibata et al. 1997; Weinbauer 2004). The combined effect of these two viral properties, coined “shunt and pump”, is proposed to enhance the magnitude and efficiency of the BCP (Suttle 2007). A study by Guidi et al. (2016) revealed that populations of bacterial dsDNA viruses in the sunlit oligotrophic ocean are strongly associated with variation in the magnitude of the flux of particulate organic carbon. Although viruses of eukaryotes were not included in the above-mentioned study, the significant contribution of their hosts to ocean biomass and net production (Li 1995; Nelson et al. 1995; Worden et al. 2004; Liu et al. 2009) and their observed predominance over prokaryotes in sinking materials of Sargasso Sea oligotrophic surface waters (Fawcett et al. 2011; Lomas and Moran 2011) suggest that eukaryotic viruses are responsible for a substantial part of the variation in exported carbon. Furthermore, the mechanisms causing this association remain to be dissected (Sullivan et al. 2017), as such an association might emerge through indirect correlation with other parameters, such as NPP.
In this study, we explored the association between eukaryotic viruses and the BCP as well as the viral mechanisms enhancing this process. We exploited comprehensive organismal data from the Tara Oceans expedition (Sunagawa et al. 2015; Carradec et al. 2018) as well as related measurements of carbon export estimated from particle concentration and size distributions observed in situ (Guidi et al. 2016). The investigation of eukaryotic viruses based on environmental genomics has long been difficult because of their lower concentration in the water column compared with prokaryotic dsDNA viruses (Hingamp et al. 2013). Deep sequencing of planktonic community DNA and RNA, as carried out in Tara Oceans, has enabled the identification of marker genes of major viral groups infecting eukaryotes and begun to reveal that these groups represent a sizeable fraction of the marine virosphere (Hingamp et al. 2013; Allen et al. 2017; Moniruzzaman et al. 2017; Carradec et al. 2018; Mihara et al. 2018). In the present study, we identified several hundred marker-gene sequences of nucleocytoplasmic large DNA viruses (NCLDVs; so-called “giant viruses”) in the prokaryotic size fraction. We also identified RNA viruses in meta-transcriptomes of four eukaryotic size fractions. The resulting profile of viral distributions was compared with the magnitude of carbon export (CE) and its efficiency (CEE) to identify lineages of viruses predicted to contribute to the BCP.
Results and Discussion
The discovery of diverse NCLDVs and RNA viruses in Tara Oceans gene catalogs
We used profile hidden Markov model-based homology searching to identify marker-gene sequences of viruses of eukaryotes in two ocean gene catalogs. These catalogs were previously constructed from environmental shotgun sequence data of samples collected during the Tara Oceans expedition. The first catalog, the Ocean Microbial Reference Gene Catalog (OM-RGC), contains 40 million non-redundant genes predicted from the assemblies of Tara Oceans viral and microbial metagenomes (Sunagawa et al. 2015). We searched this catalog for NCLDV DNA polymerase family B (PolB) genes, as dsDNA viruses may be present in microbial metagenomes either because large virions (> 0.2 µm) have been retained on the filter or because viral genomes replicating within picoeukaryotic cells have been captured. The second gene catalog, the Marine Atlas of Tara Oceans Unigenes (MATOU), contains 116 million non-redundant genes predicted from meta-transcriptomes of single-cell microeukaryotes and small multicellular zooplankton (Carradec et al. 2018). We searched this catalog for RNA-dependent RNA polymerase (RdRP) genes of RNA viruses because transcripts of viruses actively infecting their host or genomes of RNA viruses have been captured in it.
We identified 3,486 NCLDV PolB sequences and 975 RNA virus RdRP sequences. All except 17 of the NCLDV PolBs were assigned to the families Mimiviridae (n = 2,923), Phycodnaviridae (n = 348) and Iridoviridae (n = 198) (Figure 1a). The large number of PolB sequences assigned to Mimiviridae and Phycodnaviridae compared with other NCLDV families is consistent with a previous observation based on a smaller dataset (Hingamp et al. 2013). The divergence between these environmental sequences and reference sequences from known viral genomes was greater in Mimiviridae than Phycodnaviridae (Figure 1b, Supplementary Figure 1a). Within Mimiviridae, 83% of the sequences were most similar to those from algae-infecting Mimivirus relatives. Among the sequences classified in Phycodnaviridae, 92% were most similar to those in Prasinovirus, while 5% were closest to Yellowstone lake phycodnavirus, which is closely related to Prasinovirus. RdRP sequences were assigned mostly to the order Picornavirales (n = 325) followed by the families Partitiviridae (n = 131), Narnaviridae (n = 95), Tombusviridae (n = 45) and Virgaviridae (n = 33) (Figure 1c), with most sequences being distant (30% to 40% amino acid identity) from reference viruses (Figure 1d, Supplementary Figure 1b). This result is consistent with previous studies on the diversity of marine RNA viruses, in which RNA virus sequences were found to correspond to diverse positive-polarity ssRNA and dsRNA viruses distantly related to well-characterized viruses (reviewed in Culley [2018]).
Figure 1: Numerous lineages of nucleocytoplasmic large DNA viruses (NCLDVs) and RNA viruses were identified in environmental samples collected during the Tara Oceans expedition (2009-2013). a and c Taxonomic breakdown of environmental sequences of NCLDV DNA polymerase family B and RNA virus RNA-dependent RNA polymerase. b and d Unrooted maximum likelihood phylogenetic trees of environmental (black) and reference (red) viral sequences.
Eukaryotic viruses linked to carbon export efficiency in the sunlit ocean
Among the PolB and RdRP sequences identified in the Tara Oceans gene catalogs, 37% and 17% were respectively present in at least five samples and had accompanying carbon export measurement data (Supplementary Table 1). Using abundance profiles of these 1,454 marker-gene sequences at 61 sampling sites in the photic zone of 40 Tara Oceans stations (Figure 2a–c), we tested for associations with estimates of carbon export flux at 150 meters (CE150) and measurements of carbon export efficiency (CEE). A partial least squares (PLS) regression model explained 49% (R² = 49%) of the variation in CE150, with a Pearson correlation coefficient between observed and predicted values of 0.67 (P < 1 × 10⁻⁴) (Figure 2d); this result demonstrates that viruses of eukaryotes are associated with the magnitude of carbon export. A comparison of viral abundances with CEE revealed that viruses are also strongly associated with CEE (R² = 50%, r = 0.72, P < 1 × 10⁻⁴) (Figure 2e and Supplementary Figure 2a; see Supplementary Table 2 for details of PLS models and Supplementary Figure 2b for details of the permutation tests). In these PLS regression models, 62 and 26 viruses were considered to be important predictors (i.e. variable importance in the projection [VIP] score > 2 and regression coefficient > 0) of CE150 and CEE, respectively, and only two viruses were shared between them.
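The VIP criterion used above can be made concrete with a small example. The following is a minimal sketch, not the authors' pipeline: it fits a single-response PLS model with the NIPALS algorithm in plain Python and derives one VIP score per predictor. The function name `pls1_vip`, the toy abundance matrix and the number of components are illustrative assumptions, and the additional filter on positive regression coefficients is not implemented here.

```python
import math


def pls1_vip(X, y, n_components):
    """Fit PLS1 by NIPALS and return one VIP score per predictor column.

    X is a list of samples (rows) by predictors (columns); y is the response.
    Hypothetical helper written for illustration only.
    """
    n, p = len(X), len(X[0])
    # Centre each predictor column and the response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    f = [v - y_mean for v in y]

    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: direction of maximal covariance with the response.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        # Scores, then loadings for the predictors and the response.
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate predictors and response before the next component.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        weights.append(w)
        explained.append(q * q * tt)  # response variation explained by this component

    total = sum(explained)
    return [
        math.sqrt(p * sum(s * w[j] ** 2 for s, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


# Toy data (assumed): 6 samples, 3 predictors; the response tracks predictor 0.
X = [[1, 5, 2], [2, 3, 9], [3, 8, 4], [4, 1, 7], [5, 6, 1], [6, 2, 6]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
vip = pls1_vip(X, y, n_components=2)
print([round(v, 2) for v in vip])
```

By construction the squared VIP scores sum to the number of predictors, so a score well above 1 marks a predictor that contributes more than average to the explained variation.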
CE150 was found to be correlated with NPP (r = 0.76, P < 1 × 10⁻¹¹), which suggests that the association of viruses with CE150 is due to a viral shunt effect or to primary-production enhancement of viral production. In line with this interpretation, we found that viral abundances can predict NPP variations (R² = 59%, r = 0.78, P < 1 × 10⁻⁴) (Figure 2f); in addition, the PLS model for NPP shared a larger number of predictors with the model for CE150 (44 viruses) than with the model for CEE (3 viruses) (P = 1.3 × 10⁻⁷). In contrast, CEE was not correlated with NPP (r = 0.2, P = 0.1) or CE150 (r = 0.14, P = 0.3). The association of some viruses with CEE is therefore not an indirect relationship caused by NPP; instead, viruses of eukaryotes may have a shuttling effect that enhances the efficiency of carbon export.
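The correlations reported in this paragraph can be illustrated with a short sketch. It computes a Pearson correlation coefficient and, as an assumption made only for this example, attaches a two-sided permutation P value; the text here does not state how the P values for these correlations were obtained. The function names and toy values are hypothetical.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=10000, seed=0):
    """Two-sided permutation P value for the Pearson correlation (assumed test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(x, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy values (assumed), standing in for NPP and export flux at a few sites.
npp = [210.0, 340.0, 400.0, 520.0, 610.0, 700.0, 880.0, 950.0]
ce150 = [1.1, 1.9, 1.7, 2.8, 3.0, 3.9, 4.2, 5.1]
print(round(pearson_r(npp, ce150), 3), permutation_p_value(npp, ce150, n_perm=2000))
```

Comparing the same statistic for CEE against NPP would show whether an efficiency signal is independent of production, which is the argument the paragraph makes.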
Figure 2: Abundance of nucleocytoplasmic large DNA viruses (NCLDVs) and RNA viruses is associated with carbon export efficiency and flux at 150 m in the global ocean. a Location of 40 Tara Oceans stations that were the source of 61 prokaryote-enriched metagenomes and 244 eukaryotic meta-transcriptomes (61 sites × 4 organismal size fractions) from surface and DCM layers and measurements of carbon export. b–c Relative abundance of NCLDVs and RNA viruses in samples used for association analyses. d–f Plots showing the correlation between predicted and observed values in a leave-one-out cross-validation test for carbon export flux at 150 m (d), carbon export efficiency (e) and net primary production (f). Each PLS regression model was reconstructed using abundance profiles of 1,454 marker-gene sequences (1,282 PolBs and 172 RdRPs) derived from environmental samples. r, Pearson correlation coefficient; R², square of the correlation between measured response values and predicted response values. R², which was calculated as 1 − SSE / SST (sum of squares due to error and total), measures how successful the fit is in explaining the variation of the data. Model robustness was assessed using a permutation test (n = 10,000).
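To illustrate the quantities defined in this legend, the sketch below runs a leave-one-out cross-validation and computes R² as 1 − SSE / SST. A one-predictor least-squares line stands in for the PLS model purely to keep the example short; this substitution, the function names and the toy data are assumptions of the example.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def loocv_predictions(x, y):
    """Predict each sample from a model fitted on all other samples."""
    preds = []
    for i in range(len(x)):
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        slope, intercept = fit_line(x_train, y_train)
        preds.append(slope * x[i] + intercept)
    return preds


def r_squared(measured, predicted):
    """R^2 = 1 - SSE / SST, as stated in the legend."""
    mean = sum(measured) / len(measured)
    sse = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    sst = sum((m - mean) ** 2 for m in measured)
    return 1 - sse / sst


# Toy data (assumed): one abundance-like predictor and a noisy response.
abundance = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
response = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9, 7.1]
predicted = loocv_predictions(abundance, response)
print(round(r_squared(response, predicted), 3))
```

Because each prediction comes from a model that never saw the held-out sample, this R² can be lower than the value obtained when fitting and scoring on the same data.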
Giant viruses of small algae are predicted to enhance CEE
We considered 26 viruses positively associated with CEE with a VIP score > 2 (Supplementary File 1) to be important predictors of CEE in the PLS regression. These viruses are hereafter referred to as VIPs. Eleven VIPs (nine NCLDVs and two RNA viruses; red rectangle in Figure 3) were abundant in samples from the Eastern Africa Coastal (EAFR) and Benguela Current Coastal (BENG) provinces, with some (e.g. polb 000248170 and 002035391) also relatively abundant in samples from different oceanic provinces where CEE was also relatively high. This observation suggests that these viruses enhance the BCP in different regions of the global ocean.
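The two quantities compared throughout this section can be written out explicitly. The sketch below follows the definitions given in the Figure 3 legend: CEE as (CEdeep − CEsurface) / CEdeep, and abundances as centered log-ratio transformed, gene-length normalized read counts. The pseudocount used to avoid taking the logarithm of zero, the function names and the toy numbers are assumptions of this example, not values from the study.

```python
import math


def carbon_export_efficiency(ce_surface, ce_deep):
    """CEE = (CE_deep - CE_surface) / CE_deep, following the Figure 3 legend."""
    return (ce_deep - ce_surface) / ce_deep


def clr_abundances(read_counts, gene_lengths, pseudocount=0.5):
    """Centered log-ratio of gene-length normalized read counts for one sample.

    The pseudocount is an assumption of this sketch; it keeps zero counts finite.
    """
    normalized = [(c + pseudocount) / length for c, length in zip(read_counts, gene_lengths)]
    logs = [math.log(v) for v in normalized]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


# Toy sample (assumed): read counts and gene lengths for four marker genes.
counts = [120, 0, 35, 900]
lengths = [3000, 2800, 1500, 3200]
print([round(v, 2) for v in clr_abundances(counts, lengths)])

# Toy fluxes (assumed): export flux is lower at depth than near the surface.
print(carbon_export_efficiency(ce_surface=8.0, ce_deep=4.0))
```

With this definition, CEE is negative whenever the flux decreases with depth, and values closer to zero indicate that a larger share of the surface flux is retained.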
Most VIPs (23 of 26) belonged to Mimiviridae (n = 11) and Phycodnaviridae (n = 12). All the phycodnavirus sequences were most closely related to those of prasinoviruses, with amino acid sequence identities to reference sequences ranging between 64% and 88%. The three remaining VIPs were RNA viruses. The closest homolog of one sequence (rdrp 36496887) was that of an octopus-associated member of Picornavirales (Beihai picorna-like virus 106), while the other two (90355641 and 58847823) were most similar to that of a crustacean-associated Bunyavirales virus (Wenling crustacean virus 9).
Taxonomic analysis of genes in genome fragments containing VIP PolBs further confirmed their identity as Mimiviridae or Phycodnaviridae (Supplementary Figure 3). One contig (CNJ01018797) was predicted to encode a protein having a high amino-acid identity (94%) to a gene product from the transcriptome of the diatom Triceratium dubium (Supplementary Figure 3b). NCLDVs infecting diatoms have not been reported
so far, but a previous co-abundance analysis has linked diatoms and Mimiviridae (Moniruzzaman et al. 2017).
Figure 3: Biogeography of viral lineages associated with carbon export efficiency (CEE). The upper panel shows carbon export efficiency, (CEdeep − CEsurface) / CEdeep, for the 61 sampling sites. Sites with values above and below zero are those where carbon export flux was respectively higher and lower in deep than in surface waters; hence, they correspond to sites where the biological carbon pump was higher or lower compared with other sites. The bottom panel is a map reflecting abundances, expressed as centered log-ratio transformed gene-length normalized read counts, of viruses positively associated with CEE and having a VIP score > 2. MS, Mediterranean Sea; IO, Indian Ocean; SAO, South Atlantic Ocean; SPO, South Pacific Ocean; NPO, North Pacific Ocean; NAO, North Atlantic Ocean. The bottom horizontal axis is labeled with Tara Oceans station numbers, sampling depth (SRF, surface; DCM, deep chlorophyll maximum) and abbreviations of biogeographic provinces.
Because most VIPs were only distantly related to isolated viruses, inferring
characteristics such as host type and infection strategies was not straightforward. We therefore conducted a phylogeny-guided network-based host prediction analysis (Figure 4; see Methods). Within the Prasinovirus clade, which contained six VIPs, enrichment analysis of predicted host groups along the phylogenetic tree uncovered 10 nodes significantly enriched for five different eukaryotic orders. Mamiellales, the only known host group of prasinoviruses, was detected at nine different nodes (seven of them had no parent-to-children relationships), while the other four eukaryotic orders were found at only one node (or two in the case of Eutreptiales) (Figure 4a). Within Mimiviridae, significant enrichment was detected for 10 different orders; Collodaria was detected at 15 nodes (2 of them had no parent-to-children relationships) and Prymnesiales at 6 nodes (3 of them had no parent-to-children relationships), while all other orders were present at a maximum of one node with no parent-to-children relationships. The nodes enriched in Prymnesiales and Collodaria fell within or were sister to clades containing reference viruses isolated from Prymnesiales and Phaeocystales (both are haptophyte orders) species. This placement suggests that environmental PolB sequences in this clade belong to Mimiviridae viruses that infect Prymnesiales. Interestingly, four of these Mimiviridae viruses were VIP viruses (Figure 4a). The detection of Collodaria may be the result of indirect associations that reflect a symbiotic relationship with Prymnesiales, as some acantharians, a lineage of Rhizaria related to Collodaria, are known to host Prymnesiales species (Mars Brisbin et al. 2018). Nodes enriched in the remaining three other orders fell within clades that were only distantly related to any reference Mimiviridae. The final Mimiviridae VIP (polb 000079111) in the tree was a distant relative of Aureococcus anophagefferens virus (AaV). No host groups were predicted for the clade containing the Picornavirales VIP (Figure 4b). Overall, this host prediction analysis revealed that
lineages of prasinoviruses and putative prymnesioviruses are among the viral lineages best associated with CEE.
[Figure 4, panels a and b: phylogenetic trees (tree scale: 1) of environmental and reference viral sequences. Figure key: 1, reference virus classification; 2, variable importance in projection (VIP) score; 3, correlation to CEE (PLS regression coefficients); 4, cosmopolitanism (number of stations); 5, distribution of relative abundances.]
OM-RGC.v1.000159675 BRNAV_G2 OM-RGC.v1.001250095 21880330 OM-RGC.v1.000069310 29949318 OM-RGC.v1.000062167 29724237 OM-RGC.v1.003118589 OM-RGC.v1.000064537 OM-RGC.v1.000291604 HoCV-1 OM-RGC.v1.000067387 PSIV OM-RGC.v1.000067277 TrV OM-RGC.v1.000618073 BQCV Catovirus OM-RGC.v1.000847567 HiPV OM-RGC.v1.001249129 OM-RGC.v1.000512118 SINV-1 OM-RGC.v1.000071229 KBV BsV OM-RGC.v1.000071068 84892392 OM-RGC.v1.000490625 52260257 OM-RGC.v1.000079365 6 58838535 OM-RGC.v1.000377714 Predicted eukaryotic host 54294427 OM-RGC.v1.002035311 56759010 CroV OM-RGC.v1.000104259 000072765 group at a given node 57468092 OM-RGC.v1.001058750 77677810 OM-RGC.v1.000027805 84897402 OM-RGC.v1.000057033 OM-RGC.v1.000259426 (from top to bottom of the tree) 22428174 000673383 56725325 OM-RGC.v1.001173520 OM-RGC.v1.000075116 75258857 OM-RGC.v1.000310828 24762850 OM-RGC.v1.000073545 OM-RGC.v1.000073739 24128145 OM-RGC.v1.000071604 Mamiellales Goose_dicistrovirus OM-RGC.v1.000280162 29093047 OM-RGC.v1.000049837 Apis_dicistrovirus OM-RGC.v1.000122430 Pleuronematida AnCV OM-RGC.v1.004949926 OM-RGC.v1.001016487 000061559 DCV OM-RGC.v1.000061401 Eutrepiales CrPV OM-RGC.v1.000621764 15356524 OM-RGC.v1.000109761 OM-RGC.v1.000146594 TSV OM-RGC.v1.000058809 Corethrales MCDV OM-RGC.v1.000061534 OM-RGC.v1.000282052 BRNAV_G1 OM-RGC.v1.000115299 Beroida 58873145 OM-RGC.v1.001499365 OM-RGC.v1.000058812 21733123 OM-RGC.v1.000834606 32165274 OM-RGC.v1.000860059 Chaetocerotales 44278324 OM-RGC.v1.000663173 45092716 OM-RGC.v1.004661489 OM-RGC.v1.001105267 OM-RGC.v1.000166921 Thecosomata 89591997 OM-RGC.v1.000079753 89802927 OLPV_1 89802926 OLPV_2 Syndiniales BRNAV_G5 OM-RGC.v1.000330656 OM-RGC.v1.000241877 32536384 OM-RGC.v1.002963968 58838081 000070861 Aconoidasida OM-RGC.v1.001527691 58865139 OM-RGC.v1.001230579 16250369 OM-RGC.v1.000062592 OM-RGC.v1.000070058 Calanoida 60944405 OM-RGC.v1.000075583 7501593 OM-RGC.v1.000683307 22452106 OM-RGC.v1.000297388 Bacillariales 32202687 CeV PoV OM-RGC.v1.000030837 
Cyclotrichida 36505302 000129518 100032960 OM-RGC.v1.001180381 100389194 OM-RGC.v1.002825758 Collodaria 501185 OM-RGC.v1.003483755 OM-RGC.v1.000070477 501186 OM-RGC.v1.003117577 510209 OM-RGC.v1.000148273 Sphaeropleales PparvumV 487149 OM-RGC.v1.000063931 487147 OM-RGC.v1.000492984 OM-RGC.v1.000071750 Prymnesiales 33428197 OM-RGC.v1.000105816 8855753 OM-RGC.v1.000299791 8855752 OM-RGC.v1.000401418 Cryomonadida 9164163 OM-RGC.v1.001372408 OM-RGC.v1.000102622 57554431 OM-RGC.v1.000065051 PkV_RF02 Arthracanthida 8626697 HeV_RF02 8868089 PgV 9164160 OM-RGC.v1.000046074 PpV Noctilucales 4578944 OM-RGC.v1.000033614 OM-RGC.v1.000495698 5015691 OM-RGC.v1.000054135 Peridiniales 2954311 OM-RGC.v1.004447818 2584342 OM-RGC.v1.001041167 OM-RGC.v1.000066804 4578943 OM-RGC.v1.000081486 Chaetocerotales 5015692 OM-RGC.v1.002413486 2523835 000262952 Hemiptera 2523836 OM-RGC.v1.000161220 000912507 2523834 OM-RGC.v1.000068882650 OM-RGC.v1.000079111 OM-RGC.v1.000079078 AaV 266 Figure 4: Phylogenetic position of viruses associated with the efficiency of carbon OM-RGC.v1.000081579 OM-RGC.v1.00003278267 export.3 Details of the PLS model and viral global distribution, relative abundance and OM-RGC.v1.00004278268 putative4 eukaryotic host group are superimposed on the trees. 
Phylogenetic trees contain OM-RGC.v1.001148268 OM-RGC.v1.00008359269 environmental2 (labeled in red if VIP > 2) and reference (labeled in black) sequences of OM-RGC.v1.000081984 OM-RGC.v1.00008150270 Prasinovirus3 and Mimiviridae DNA polymerase family B (a) and Picornavirales RdRP OM-RGC.v1.00003279271 (b).2 OM-RGC.v1.000294968 OM-RGC.v1.000086577 OM-RGC.v1.001196663 OM-RGC.v1.000335275 OM-RGC.v1.00008518 8 14 OM-RGC.v1.000072571 OM-RGC.v1.000109237 OM-RGC.v1.000274868 OM-RGC.v1.002750835 OM-RGC.v1.004389005 OM-RGC.v1.001132733 OM-RGC.v1.000079253 OM-RGC.v1.000051249 OM-RGC.v1.000417185 OM-RGC.v1.000110630 OM-RGC.v1.000057148 OM-RGC.v1.000120786 OM-RGC.v1.000063945 OM-RGC.v1.001599716 OM-RGC.v1.000387513 OM-RGC.v1.000155828 OM-RGC.v1.000072094 OM-RGC.v1.000060884 OM-RGC.v1.001121291 OM-RGC.v1.000055116 OM-RGC.v1.000192105 OM-RGC.v1.000054815 OM-RGC.v1.000667500 OM-RGC.v1.002156050 OM-RGC.v1.000058962 OM-RGC.v1.000700607 OM-RGC.v1.003826014 OM-RGC.v1.001170932 OM-RGC.v1.002828763 OM-RGC.v1.000187401 OM-RGC.v1.000565520 OM-RGC.v1.000064307 OM-RGC.v1.003855514 OM-RGC.v1.000071171 OM-RGC.v1.000945523 OM-RGC.v1.000684264 OM-RGC.v1.001819659 OM-RGC.v1.000064883 OM-RGC.v1.000073441 OM-RGC.v1.000072166 OM-RGC.v1.000072372 OM-RGC.v1.000066031 OM-RGC.v1.000089657 OM-RGC.v1.000068465 OM-RGC.v1.000061528 OM-RGC.v1.000061593 OM-RGC.v1.000286831 OM-RGC.v1.000158793 OM-RGC.v1.000066893 OM-RGC.v1.000066963 OM-RGC.v1.000066950 OM-RGC.v1.001897164 OM-RGC.v1.000630816 OM-RGC.v1.001350990 OM-RGC.v1.000025088 OM-RGC.v1.000017697 OM-RGC.v1.000545331 OM-RGC.v1.000058552 OM-RGC.v1.000070978 OM-RGC.v1.000060035 OM-RGC.v1.000085957 OM-RGC.v1.000328875 OM-RGC.v1.000026821 OM-RGC.v1.000235944 OM-RGC.v1.000079200 OM-RGC.v1.000064954 OM-RGC.v1.000088733 OM-RGC.v1.001004628 OM-RGC.v1.000133961 OM-RGC.v1.002822478 OM-RGC.v1.001184783 OM-RGC.v1.000064096 OM-RGC.v1.004057507 OM-RGC.v1.001296274 OM-RGC.v1.000069361 OM-RGC.v1.000066071 OM-RGC.v1.000269375 OM-RGC.v1.000539407 OM-RGC.v1.000544460 
OM-RGC.v1.001391978 OM-RGC.v1.000356992 OM-RGC.v1.000066500 OM-RGC.v1.000061999 OM-RGC.v1.000068784 OM-RGC.v1.000287835 OM-RGC.v1.000284102 OM-RGC.v1.000512830 OM-RGC.v1.000073383 OM-RGC.v1.002317890 OM-RGC.v1.003037006 OM-RGC.v1.003008254 OM-RGC.v1.003013914 OM-RGC.v1.000159675 OM-RGC.v1.001250095 OM-RGC.v1.000069310 OM-RGC.v1.000062167 OM-RGC.v1.000064537 OM-RGC.v1.000291604 OM-RGC.v1.000067387 OM-RGC.v1.000067277 OM-RGC.v1.000618073 OM-RGC.v1.000847567 OM-RGC.v1.001249129 OM-RGC.v1.000512118 OM-RGC.v1.000071229 OM-RGC.v1.000071068 OM-RGC.v1.000490625 OM-RGC.v1.000079365 OM-RGC.v1.000377714 OM-RGC.v1.002035311 OM-RGC.v1.000104259 OM-RGC.v1.000072765 OM-RGC.v1.001058750 OM-RGC.v1.000027805 OM-RGC.v1.000259426 OM-RGC.v1.000673383 OM-RGC.v1.001173520 OM-RGC.v1.000075116 OM-RGC.v1.000310828 OM-RGC.v1.000073739 OM-RGC.v1.000071604 OM-RGC.v1.000280162 OM-RGC.v1.000049837 OM-RGC.v1.000122430 OM-RGC.v1.001016487 OM-RGC.v1.000061559 OM-RGC.v1.000061401 OM-RGC.v1.000621764 OM-RGC.v1.000146594 OM-RGC.v1.000058809 OM-RGC.v1.000061534 OM-RGC.v1.000282052 OM-RGC.v1.000115299 OM-RGC.v1.000058812 OM-RGC.v1.000834606 OM-RGC.v1.000860059 OM-RGC.v1.000663173 OM-RGC.v1.004661489 OM-RGC.v1.000166921 OM-RGC.v1.000079753 OLPV_1 OLPV_2 OM-RGC.v1.000241877 OM-RGC.v1.002963968 OM-RGC.v1.000070861 OM-RGC.v1.001527691 OM-RGC.v1.001230579 OM-RGC.v1.000070058 OM-RGC.v1.000075583 OM-RGC.v1.000683307 OM-RGC.v1.000297388 CeV OM-RGC.v1.000030837 OM-RGC.v1.000129518 OM-RGC.v1.001180381 OM-RGC.v1.002825758 OM-RGC.v1.000070477 OM-RGC.v1.003117577 OM-RGC.v1.000148273 PparvumV OM-RGC.v1.000063931 OM-RGC.v1.000071750 OM-RGC.v1.000105816 OM-RGC.v1.000299791 OM-RGC.v1.000401418 OM-RGC.v1.000102622 OM-RGC.v1.000065051 PkV_RF02 HeV_RF02 PgV PpV OM-RGC.v1.000033614 OM-RGC.v1.000495698 OM-RGC.v1.000054135 OM-RGC.v1.004447818 OM-RGC.v1.000066804 OM-RGC.v1.000081486 OM-RGC.v1.002413486 OM-RGC.v1.000262952 OM-RGC.v1.000912507 272 Functional traits of eukaryotic organisms interacting with CEE-associated
273 viruses
274 The host assignment of Mamiellales, Prymnesiales (Figure 4a) and Chaetocerotales
275 (Figure 4b) to viral clades containing reference viruses isolated from these organismal
276 groups suggests that the network-based approach was able to predict virus–host
277 relationships with a certain level of reliability using our large dataset despite the reported
278 limitations of similar types of analyses (Coenen and Weitz 2018). This positive signal
279 prompted us to use these species-association networks to investigate taxonomic and
280 functional differences between the eukaryotic organisms predicted to interact with viruses
281 that were either positively or negatively associated with CEE (Figure 5). At the functional
282 level, viruses positively associated with CEE had a greater number of connections with
283 chloroplast-bearing eukaryotes (Q = 0.03) and with silicified eukaryotes (Q = 0.05)
284 compared with viruses negatively associated with CEE (Supplementary Table 4); this
285 suggests that these virus–host systems contribute to CEE. This result also supports the
286 hypothesis that viral lysis enhances the BCP; that is, lysis of autotrophs is
287 expected to release growth-limiting nutrients (Gobler et al. 1997) that are recycled to
288 grow new cells, while carbon-rich cell debris, potentially ballasted with silica, tends to
289 aggregate and sink (Suttle 2007).
[Figure 5 image, recovered as a table. Columns in the original: viral sequence ID, PLS regression coefficient, taxa, and functional traits (chloroplast, silicification, calcification) marked by black squares. The trait column of each square is not recoverable from the extracted text, so only the squares themselves are shown.]
Viral sequence ID         Reg. coeff.  Taxa                 Functional traits (■)
polb_OM.RGC.v1.001386278  0.064        Dinophyceae          ■
polb_OM.RGC.v1.000070861  0.063        Collodaria
polb_OM.RGC.v1.000042601  0.062        Dinophyceae          ■
polb_OM.RGC.v1.001743853  0.057        Trachymedusae
polb_OM.RGC.v1.001175669  0.056        Rotaliida            ■
polb_OM.RGC.v1.000129518  0.055        Dinophyceae          ■
polb_OM.RGC.v1.000611300  0.047        Ulvophyceae          ■
polb_OM.RGC.v1.000194282  0.047        Gymnodiniales        ■
rdrp_36496887             0.038        Poecilostomatoida
polb_OM.RGC.v1.010288541  0.038        Bacillariophyta      ■ ■
polb_OM.RGC.v1.000230224  0.033        Arthra_Symphy_F1
polb_OM.RGC.v1.000079111  0.032        Cephaloidophorida
polb_OM.RGC.v1.000968766  0.032        Calanoida
polb_OM.RGC.v1.000248170  0.028        Beroida
polb_OM.RGC.v1.007771300  0.027        Dinophyceae
polb_OM.RGC.v1.015907472  0.024        Coscinodiscophytina  ■ ■
polb_OM.RGC.v1.000232032  0.013        Gregarinomorphea
polb_OM.RGC.v1.000073352  0.012        Mamiellophyceae      ■
polb_OM.RGC.v1.000249074  0.010        Dinophysiales
polb_OM.RGC.v1.002035391  0.010        Coscinodiscophytina  ■ ■
polb_OM.RGC.v1.004312996  0.009        Coscinodiscophytina  ■ ■
polb_OM.RGC.v1.000912507  0.009        Acanthoecida
polb_OM.RGC.v1.000062429  -0.009       Peridiniales
polb_OM.RGC.v1.000239928  -0.018       Dinophyceae          ■
polb_OM.RGC.v1.001445888  -0.026       Copepoda
polb_OM.RGC.v1.001897164  -0.035       Copepoda
polb_OM.RGC.v1.001883961  -0.043       Ostracoda
polb_OM.RGC.v1.000844241  -0.043       Peridiniales
polb_OM.RGC.v1.000673383  -0.043       Calanoida
polb_OM.RGC.v1.000072166  -0.043       Labyrinthulida
polb_OM.RGC.v1.000274868  -0.045       Chlorophyceae        ■
polb_OM.RGC.v1.001561949  -0.048       Ostracoda
polb_OM.RGC.v1.000205020  -0.048       Collodaria
polb_OM.RGC.v1.004472615  -0.049       Calanoida
polb_OM.RGC.v1.000228894  -0.051       Variosea
polb_OM.RGC.v1.009636901  -0.051       Collodaria
polb_OM.RGC.v1.000495602  -0.056       MAST_3-12
polb_OM.RGC.v1.000074398  -0.061       Ichthyosporea
polb_OM.RGC.v1.015629864  -0.062       Peridiniales
polb_OM.RGC.v1.000287835  -0.064       Collodaria
polb_OM.RGC.v1.000471455  -0.065       Collodaria
polb_OM.RGC.v1.000110630  -0.073       Acantharea
polb_OM.RGC.v1.002652025  -0.075       Dinophyceae          ■
rdrp_89802927             -0.075       Calanoida
rdrp_89802926             -0.075       Insecta
polb_OM.RGC.v1.002767869  -0.078       Collodaria
rdrp_103795290            -0.082       Collodaria
polb_OM.RGC.v1.000017697  -0.083       Calanoida
rdrp_77677810             -0.099       Leptothecata
291 Figure 5: Functional traits and taxonomy of putative hosts of viruses associated with
292 the efficiency of carbon export. “Putative hosts” refers to eukaryotic V9-OTUs with the
293 highest absolute edge weight in FlashWeave interaction networks. Taxa correspond to
294 higher rank classification of the V9-OTU. Black squares indicate the presence of a
295 functional trait for the V9-OTU best connected to the virus.
296 Integrating previous observations to model a virus-driven BCP
297 Our PLS models revealed that viruses of eukaryotes are associated not only with the
298 magnitude of carbon export, but also with the export efficiency of particles. This finding
299 suggests that viruses contribute to BCP enhancement by promoting aggregate formation
300 and subsequent sedimentation. Consistent with this view, prasinoviruses and putative
301 prymnesioviruses were found to be among the viruses best associated with the efficiency
302 of carbon export under our PLS model for CEE. Several prasinoviruses (fourteen with an
303 available genome sequence) have been isolated for three genera of green microalgae,
304 namely Micromonas, Ostreococcus and Bathycoccus. These genera belong to the order
305 Mamiellales, which are bacterial-sized phytoplankton common in coastal and oceanic
306 environments and considered to be influential actors in oceanic systems (Not et al. 2012;
307 Weynberg et al. 2017). Chrysochromulina ericina virus, Prymnesium parvum DNA virus,
308 Prymnesium kappa virus-RF02 and Haptolina ericina virus are Mimiviridae viruses
309 isolated from Prymnesiales (Haptophyta) cultures (Figure 4). They are closely related to
310 Phaeocystis globosa virus (PgV) and Phaeocystis pouchetii virus isolated from
311 Phaeocystales (Haptophyta) cultures. Both Prymnesiales and Phaeocystales species have
312 non-calcifying organic scales and some species can form blooms and colonies. The
313 detection of several putative prymnesioviruses in samples containing small cells is
314 consistent with the observation of diverse and abundant noncalcifying haptophytes in
315 open oceans (Liu et al. 2009; Endo et al. 2018). According to the estimates of Liu et al.
316 (2009), these noncalcifying haptophytes contribute twice as much as diatoms or
317 prokaryotes to photosynthetic production, thereby highlighting their importance in the
318 BCP. Prymnesiales are an important mixotrophic group in the oligotrophic ocean (Unrein et
319 al. 2014) and mixotrophy is proposed to increase vertical carbon flux by enabling the
320 uptake of organic forms of nutrients (Ward and Follows 2016). Viral infection of such
321 organisms may thus help increase BCP efficiency.
322 Like most other cultured algal viruses, prasinoviruses and prymnesioviruses are
323 lytic. Viral lysis not only generates the cellular debris used to build aggregates; it also
324 facilitates the process of aggregation (Weinbauer 2004) via viral-induced production of
325 agglomerative compounds, such as transparent exopolymer particles (TEPs). Lønborg et
326 al. (2013) proposed that the increased DOC and TEP production observed in cultures of
327 Micromonas pusilla infected with prasinoviruses (compared with non-infected cultures)
328 could stimulate particle aggregation and thus carbon export out of the photic zone. Some
329 prasinoviruses encode glycosyltransferases of the GT2 family. Similar to the a098r gene
330 (GT2) in Paramecium bursaria Chlorella virus 1, the expression of GT2 family members
331 during infection possibly leads to the production of a dense fibrous hyaluronan network at
332 the surface of infected cells. Such a network may trigger the aggregation of host cells,
333 facilitate viral propagation (Van Etten et al. 2017) and increase the cell wall C:N ratio.
334 PgV, closely related to prymnesioviruses (Figure 4), has been linked with increased TEP
335 production and aggregate formation during the termination of a Phaeocystis bloom
336 (Brussaard et al. 2007).
337 Other previous experimental and in situ studies have linked viruses and viral
338 activity to vertical carbon flux, cell sinking, and aggregate formation. Several laboratory
339 experiments have revealed associations between viruses and sinking cells or between
340 viruses and aggregate formation. For example, sinking rates of the toxic bloom former
341 Heterosigma akashiwo are elevated following viral infection (Lawrence and Suttle 2004).
342 As early as 1993, an experimental study revealed increased particle aggregation and
343 primary productivity upon addition of virally enriched material to seawater samples
344 (Peduzzi and Weinbauer 1993). More recently, cultures of the diatom Chaetoceros
345 tenuissimus infected with a DNA virus (CtenDNAV type II) have been shown to produce
346 higher levels of large-sized particles (50 to 400 µm) compared with non-infected cultures
347 (Yamada et al. 2018). In situ studies have uncovered vertical transport of viruses between
348 photic and aphotic zones in the Pacific Ocean (Hurwitz et al. 2015) and in Tara Oceans
349 samples (Brum et al. 2015), which suggests their association with sinking material. In
350 addition, the level of active Emiliania huxleyi virus (EhV) infection in a coccolithophore
351 bloom has been found to be correlated with water column depth, which suggests that
352 infected cells are exported from the surface to deeper waters in aggregate form (Sheyn et
353 al. 2018). EhVs infecting E. huxleyi have also been found to facilitate particle
354 aggregation and downward vertical flux of both particulate organic and particulate
355 inorganic carbon in the North Atlantic (Laber et al. 2018). No Phycodnaviridae PolB sequence had
356 its best hit to EhVs in our study, probably because of the regions and periods
357 sampled. These previous observations offer potential mechanistic explanations that
358 support our results, which suggest that eukaryotic viruses have a “shuttle effect” that
359 enhances carbon export.
360 Viruses versus hosts as key markers of CE and CEE variations
361 Our analysis used data and analytical methods similar to those of Guidi et al. (2016), who
362 explored the link between planktonic networks and the BCP. Those authors reported that
363 bacterial viruses are better associated with CE150 than are cellular organisms. In their
364 study, host lineages for prasinoviruses (Mamiellales) and prymnesioviruses
365 (Prymnesiales) were not found to be associated with CE150. The fact that we detected
366 these viruses as predictors of CEE may mean that viruses are better predictors than their
367 hosts, most likely because viruses may better reflect the overall process and mechanisms
368 by which carbon is exported. Viral infection causes lysis and aggregation of cells,
369 enhances sinking rates and increases the export of organic carbon out of the sunlit ocean.
370 Among the eukaryotic organisms studied by Guidi et al. (2016), collodarians,
371 dinoflagellates and copepods were reported to be important contributors to CE150. Some
372 of the viral taxa we found to be associated with CEE were connected with collodarians
373 and dinoflagellates in our host prediction analysis, although not as clearly as with
374 Mamiellales and Prymnesiales. Known copepod viruses have a ssDNA genome and
375 hence were not captured in our data. Only one genome sequence of a ssRNA virus and
376 one PolB sequence of an NCLDV are available for dinoflagellate viruses, and none have
377 been reported for collodarians. Viruses infecting these organisms may be among those we
378 found to be associated with CEE, but the lack of reference viruses hampered their
379 identification from the environmental samples.
380 Conclusions
381 In this study, we identified associations between CEE and diverse groups of planktonic
382 eukaryotic viruses collected in the photic ocean from 61 sampling sites during the Tara
383 Oceans expedition. Two lineages were predicted to be important groups facilitating
384 carbon export in the global ocean: prasinoviruses infecting tiny green algae of the order
385 Mamiellales, and Mimivirus relatives putatively infecting haptophytes of the order
386 Prymnesiales. This finding suggests that these eukaryotic viruses, like cyanophages
387 previously reported to be strongly associated with carbon export, are important factors in
388 the BCP. This idea is in agreement with observations that their hosts, despite being one to
389 two orders of magnitude less abundant than cyanobacteria, likely dominate carbon
390 biomass and net production in the ocean. The global-scale association between eukaryotic
391 viruses and CEE is empirical evidence supporting the shunt and pump hypothesis,
392 namely, that viruses enhance carbon pump efficiency by increasing the relative amount of
393 organic carbon exported from the surface to the deeper sea.
394 Methods
395 Data context
396 We used publicly available data generated in the framework of the Tara Oceans
397 expedition (Karsenti et al. 2011). Single-copy marker-gene sequences for NCLDVs and
398 RNA viruses were identified from two gene catalogs: the Ocean Microbial Reference
399 Gene Catalog (OM-RGC) and the Marine Atlas of Tara Oceans Unigenes (MATOU).
400 The viral marker-gene abundance profiles used in our study for prokaryotic-sized
401 metagenomes and eukaryotic-sized meta-transcriptomes were from Sunagawa et al.
402 (2015) and Carradec et al. (2018), respectively. For eukaryotic 18S rDNA V9 OTUs, we
403 used an updated version of the data of de Vargas et al. (2015) that included functional
404 trait annotations (chloroplast-bearing, silicified and calcified organisms) of V9-OTUs.
405 Abundance profiles are compositional matrices in which gene abundances are expressed
406 as unnormalized (V9 barcode data) or gene-length normalized (shotgun data) read counts.
407 Eukaryotic plankton samples were filtered for categorization into the following size
408 classes: piconano (0.8–5 µm), nano (5–20 µm), micro (20–180 µm) and meso (180–2000
409 µm) (Pesant et al. 2015). Indirect measurements of carbon export (mg m⁻² d⁻¹) in 5-m
410 increments from the surface to a 1000-m depth were taken from Guidi et al. (2016). The
411 original measurements were derived from the distribution of particle sizes and
412 abundances collected using an underwater vision profiler (Picheral et al. 2010). These
413 raw data are available from PANGAEA (Picheral et al. 2014). Net primary production
414 (NPP) data were extracted from 8-day composites of the vertically generalized production
415 model (VGPM) (Behrenfeld and Falkowski 1997) at the week of sampling.
416 Carbon export and carbon export efficiency
417 Carbon flux profiles (mg m⁻² d⁻¹) based on particle size distribution and abundances
418 were estimated according to Guidi et al. (2008). Carbon flux values from depths of 30 to
419 970 m were divided into 20-m bins, each obtained by averaging the carbon flux values
420 from the designated 20 m from profiles gathered during biological sampling within a 25-
421 km radius during 24 h and having less than 50% missing data (Supplementary Figure 4).
422 For compatibility with Guidi et al. (2016), carbon export was defined as the carbon flux
423 at 150 m. Carbon export efficiency was calculated as follows: CEE = (CE_deep −
424 CE_surface)/CE_deep (Sarmiento 2013) and CEEʹ = CE_deep/CE_surface (Francois et al. 2002). In
425 these formulas, CE_surface is the maximum average carbon flux within the first 150 m (the
426 depth of the maximum layer varied between 20 and 130 m in this dataset), and CE_deep is the
427 average carbon flux 200 m below this maximum.
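As a rough illustration of the two efficiency definitions above, the following Python sketch computes CEE and CEEʹ from a binned flux profile. The function name, the dictionary layout and the flux values are hypothetical, and the sketch assumes that a bin exists exactly 200 m below the depth of the surface maximum; only the formulas and the 150-m and 200-m rules come from the text.

```python
def export_efficiency(flux_by_depth):
    """Return (CE_surface, CE_deep, CEE, CEE_prime) from a {depth_m: flux} profile.

    flux_by_depth maps the depth of each bin (m) to its average carbon flux
    (mg m^-2 d^-1). Layout and values are illustrative assumptions.
    """
    # CE_surface: maximum average flux within the first 150 m.
    shallow = {d: f for d, f in flux_by_depth.items() if d <= 150}
    z_max = max(shallow, key=shallow.get)
    ce_surface = shallow[z_max]
    # CE_deep: average flux 200 m below the depth of that maximum
    # (assumes this bin is present in the profile).
    ce_deep = flux_by_depth[z_max + 200]
    cee = (ce_deep - ce_surface) / ce_deep  # definition cited to Sarmiento (2013)
    cee_prime = ce_deep / ce_surface        # definition cited to Francois et al. (2002)
    return ce_surface, ce_deep, cee, cee_prime


# Hypothetical profile: flux peaks at 50 m, so CE_deep is read at 250 m.
profile = {30: 8.0, 50: 10.0, 70: 9.0, 250: 4.0, 270: 3.5}
ce_surface, ce_deep, cee, cee_prime = export_efficiency(profile)
```

With these made-up values the flux maximum lies at 50 m, so the deep flux is taken from the 250-m bin.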
428 Acquisition of viral marker genes from ocean gene catalogs
429 Viral genes were collected from two gene catalogs: OM-RGC version 1 (Sunagawa et al.
430 2015) and MATOU (Carradec et al. 2018). The OM-RGC data were taxonomically re-
431 annotated as in Carradec et al. (2018). Importantly, the NCBI reference tree used to
432 determine the last common ancestor was modified to reflect the current NCLDV
433 classification (see Carradec et al. [2018] for details). We classified viral gene sequences
434 as eukaryotic or prokaryotic according to their best BLAST score against viral sequences
435 in the Virus-Host DB (Mihara et al. 2016). DNA polymerase B (PolB) and RNA-
436 dependent RNA polymerase (RdRP) genes were used as markers for NCLDVs and RNA
437 viruses, respectively. For PolB, reference proteins from the NCLDV orthologous gene
438 cluster NCVOG0038 (Yutin et al. 2009) were aligned using linsi (Katoh and Standley
439 2013). An HMM profile was constructed from the resulting alignment using hmmbuild
440 (Eddy 1998). This PolB HMM profile was searched against OM-RGC amino acid
441 sequences annotated as NCLDVs, and sequences longer than 200 amino acids and having
442 hits with E-values < 1 × 10⁻⁵ were selected as putative PolBs. RdRP sequences were
443 chosen from the MATOU catalog as follows: sequences assigned to Pfam profiles
444 PF00680, PF00946, PF00972, PF00978, PF00998, PF02123, PF04196, PF04197 or
445 PF05919 and annotated as RNA viruses were retained as RdRPs. The resulting 3,486
446 PolB sequences and 975 RdRP sequences were used along with their abundance matrices
447 for PLS regression analyses, co-abundance network reconstruction and construction of
448 phylogenies.
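The two selection rules described above can be sketched in Python as simple filters. This is only an illustration under stated assumptions: the record layout (dictionaries with `id`, `length`, `evalue`, `pfam` and `taxon` keys), the function names and the example records are hypothetical, and parsing of the actual HMM search output and catalog annotations is not shown.

```python
# Pfam profiles listed in the text as RdRP markers.
RDRP_PFAMS = {"PF00680", "PF00946", "PF00972", "PF00978", "PF00998",
              "PF02123", "PF04196", "PF04197", "PF05919"}


def select_putative_polb(hits, min_length=200, max_evalue=1e-5):
    """Keep hits longer than min_length amino acids with E-value < max_evalue."""
    return [h["id"] for h in hits
            if h["length"] > min_length and h["evalue"] < max_evalue]


def select_rdrp(genes):
    """Keep genes assigned to a listed Pfam profile and annotated as RNA viruses."""
    return [g["id"] for g in genes
            if g["pfam"] in RDRP_PFAMS and g["taxon"] == "RNA viruses"]


# Hypothetical records, for illustration only.
polb_hits = [
    {"id": "seqA", "length": 850, "evalue": 1e-40},  # passes both thresholds
    {"id": "seqB", "length": 150, "evalue": 1e-30},  # too short
    {"id": "seqC", "length": 420, "evalue": 1e-3},   # E-value too high
]
rdrp_genes = [
    {"id": "geneX", "pfam": "PF00680", "taxon": "RNA viruses"},
    {"id": "geneY", "pfam": "PF00680", "taxon": "Eukaryota"},    # wrong annotation
    {"id": "geneZ", "pfam": "PF00069", "taxon": "RNA viruses"},  # not an RdRP profile
]
putative_polb = select_putative_polb(polb_hits)
putative_rdrp = select_rdrp(rdrp_genes)
```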
449 Testing for associations of viral abundance with CE150 and CEE
450 To test for an association between the abundance of viral marker genes and CEE, we
451 proceeded as follows. Only marker genes represented by at least two reads in five or
452 more samples were retained. To cope with the sparsity and compositional nature of the data, gene-
453 length normalized read-count matrices were centered log-ratio transformed separately for RNA
454 viruses and NCLDVs. We next selected genes with a Spearman correlation coefficient to
455 CE150 or CEE greater than 0.2 or smaller than -0.2 (zero values were removed). To
456 assess the association between these marker genes and CEE, we used partial least squares
457 (PLS) regression analysis as described in Guidi et al. (2016). The number of components
458 selected for the PLS model was chosen to minimize the root mean square error of
459 prediction (Supplementary Table 2). We assessed the strength of the association between
460 carbon export (the response variable) and viral abundance (the explanatory variable) by
461 correlating leave-one-out cross-validation predicted values with the measured carbon
462 export values. We tested the significance of the correlation by comparing the original
463 Pearson coefficient between explanatory and response variables with the distribution of
464 Pearson coefficients obtained from PLS models reconstructed on permuted data (10,000
465 iterations). We estimated the contribution of each gene (predictor) according to its
466 variable importance in the projection (VIP) (Chong and Jun 2005). The VIP measure of a
467 predictor estimates its contribution in the PLS regression. Predictors having high VIP
468 values (> 2) are assumed to be important for the PLS prediction of the response variable.
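The preprocessing steps that precede the PLS regression (occurrence filter, log-ratio transformation and Spearman pre-selection) might look roughly like the Python sketch below. It is not the authors' code. It assumes that the transformation is a centered log-ratio computed per sample over non-zero entries and that "zero values were removed" means zeros are skipped pairwise when correlating; all function names and example values are hypothetical.

```python
import math


def keep_gene(counts, min_reads=2, min_samples=5):
    """Occurrence filter: at least min_reads reads in at least min_samples samples."""
    return sum(c >= min_reads for c in counts) >= min_samples


def clr(sample):
    """Centered log-ratio of one sample; zeros are skipped and returned as None."""
    logs = [math.log(v) for v in sample if v > 0]
    mean_log = sum(logs) / len(logs)
    return [math.log(v) - mean_log if v > 0 else None for v in sample]


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation over the pairs where x is defined (zeros removed)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None]
    xs, ys = zip(*pairs)
    return pearson(ranks(xs), ranks(ys))


# Hypothetical transformed abundances of one gene across six samples
# (None marks a removed zero) and the matching CEE values.
gene = [0.5, 1.2, None, 2.0, 1.5, 2.4]
cee = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9]
rho = spearman(gene, cee)
selected = abs(rho) > 0.2  # pre-selection threshold stated in the text
```

Genes passing this pre-selection would then enter the PLS model, which is not reproduced in this sketch.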
469 Phylogenetic analysis
470 Environmental PolB sequences annotated as Phycodnaviridae or Mimiviridae were
471 searched against reference PolB sequences using BLAST. Environmental sequences with
472 hits to a reference sequence having >40% identity and an alignment length greater than
473 400 amino acids were kept and aligned with reference sequences using linsi (Katoh and
474 Standley 2013). We further removed sequences that occurred in fewer than 10 samples to
475 reduce the size of the tree. Environmental RdRP sequences annotated as Picornavirales
476 were translated into six frames of amino acid sequences, and reference Picornavirales
477 RdRP sequences collected from the Virus-Host DB (Mihara et al. 2016) were searched
478 against the CDD database (Marchler-Bauer et al. 2015) using rpsBLAST. The resulting
479 alignment was used to trim reference and environmental RdRP sequences to the
480 conserved part corresponding to the domain, CDD: 279070, before alignment with linsi.
481 PolB and RdRP multiple sequence alignments were manually curated to discard poorly
482 aligned sequences. Trees were constructed using the build function of ETE3 (Huerta-
483 Cepas et al. 2016) as implemented at GenomeNet (https://www.genome.jp/tools-bin/ete).
484 Columns were automatically trimmed using trimAl (Capella-Gutiérrez et al. 2009), and
485 trees were constructed using FastTree with default settings (Price et al. 2009).
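The six-frame translation applied to the environmental RdRP sequences can be illustrated with a short Python sketch. The tool actually used for this step is not stated in the text, so this is only a minimal stand-in: it assumes the standard genetic code, writes unknown codons as X, and uses hypothetical function names and a toy input sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq):
    """Translate complete codons only; codons not in the table become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Return three forward-strand frames followed by three reverse-strand frames."""
    seq = seq.upper()
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:])
            for strand in (seq, reverse_complement)
            for offset in range(3)]


# Toy sequence, for illustration only.
frames = six_frame_translation("ATGGCCTAA")
```

Each translated frame could then be searched against a domain database to locate the conserved RdRP region, as described above.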
486 Virus–eukaryote co-occurrence analysis
487 We used FlashWeave (Tackmann et al. 2018) with Julia 1.0 to detect virus–host
488 associations on the basis of their co-abundance patterns. Read count matrices for the
489 3,486 PolB, 975 RdRP and 18S V9 DNA barcodes obtained from samples collected at the
490 same location were fed into FlashWeave. 18S-V9 data were filtered to retain OTUs with
491 an informative taxonomic annotation. 18S-V9 OTUs and viral marker sequences were
492 further filtered to conserve only those present in at least five samples. FlashWeave
493 analyses were run separately for each of the four eukaryotic size fractions. The number of
494 samples per size fraction ranged between 51 and 99 for NCLDVs and between 36 and 62
495 for RNA viruses. The number of retained OTUs per size fraction varied between 1,775
496 and 2,269 for NCLDVs and between 48 and 125 for RNA viruses (Supplementary Table
497 3). FlashWeave was run under the settings heterogeneous = false and sensitive = true, and
498 PolB/RdRP-V9-OTU edges receiving a weight > 0.2 and a Q-value < 0.01 (the default)
499 were retained.
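As a rough illustration of the final edge selection, the sketch below keeps only edges that link a viral marker to a V9 OTU and whose weight exceeds 0.2. It assumes that the network has already been exported as (node, node, weight) tuples and that the Q-value cut-off was applied by FlashWeave itself. The function name, the tuple layout and the identifiers are hypothetical and are not part of FlashWeave.

```python
def select_virus_otu_edges(edges, viral_ids, otu_ids, min_weight=0.2):
    """Keep virus-to-OTU edges with weight > min_weight.

    edges: iterable of (node_a, node_b, weight) tuples (assumed export format).
    Returned tuples are oriented as (virus, otu, weight).
    """
    selected = []
    for a, b, weight in edges:
        if weight <= min_weight:
            continue  # weak or negative association
        if a in viral_ids and b in otu_ids:
            selected.append((a, b, weight))
        elif b in viral_ids and a in otu_ids:
            selected.append((b, a, weight))
        # virus-virus and OTU-OTU edges are ignored
    return selected


# Toy network (invented for illustration).
viral_ids = {"polb_1", "polb_2", "rdrp_2"}
otu_ids = {"otu_3", "otu_9"}
edges = [
    ("polb_1", "otu_9", 0.35),
    ("otu_3", "rdrp_2", 0.21),
    ("polb_1", "polb_2", 0.90),
    ("polb_2", "otu_3", 0.20),
    ("polb_2", "otu_9", -0.40),
]
print(select_virus_otu_edges(edges, viral_ids, otu_ids))
# [('polb_1', 'otu_9', 0.35), ('rdrp_2', 'otu_3', 0.21)]
```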
500 Mapping of putative hosts onto viral phylogenies
501 In our association networks, individual viral sequences were often associated with
502 multiple 18S-V9 OTUs belonging to drastically different eukaryotic groups, a situation
503 that can reflect interactions among multiple organisms but also noise associated with this
504 type of analysis (Coenen and Weitz 2018). To extract meaningful information from these
505 networks, we reasoned as follows. We assumed that evolutionarily related viruses infect
506 evolutionarily related organisms, similar to the case of phycodnaviruses (Clasen and Suttle
507 2009). In the interaction networks, the number of connections between viruses in a given
508 clade and its eukaryotic host group should accordingly be enriched compared with the
509 number of connections with non-host organisms arising by chance. Following this
510 reasoning, we assigned the most likely eukaryotic host group as follows. The tree
511 constructed from viral marker-gene sequences (PolB or RdRP) was traversed from root to
512 tips to visit every node. We counted how many connections existed between leaves of
513 each node and the V9-OTUs of a given eukaryotic group (order level). We then tested
514 whether the node was enriched compared with the rest of the tree using Fisher’s exact
515 test and applied the Benjamini-Hochberg procedure to control the FDR among
516 comparisons of each eukaryotic taxon (order level). To avoid the appearance of
517 significant associations driven by a few highly connected leaves, we required at least half of the
518 leaves within a node to be connected to a given eukaryotic group. Significant enrichment
519 of connections between a virus clade and a eukaryotic order was considered to be
520 indicative of a possible virus–host relationship. We refer to the above approach, in which
521 taxon interactions are mapped onto a phylogenetic tree of a target group using the
522 organism’s associations predicted from a species co-abundance-based network, as TIM,
523 for Taxon Interaction Mapper. The script is available upon request. This approach can be
524 extended to interactions other than virus–host relationships.
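The enrichment test at the core of this approach can be illustrated with a minimal, standard-library-only sketch. It is not the TIM script. Several details are assumptions made for the example: the layout of the 2x2 table (edges inside versus outside the node, to the focal group versus to other groups), the use of a one-sided test, and all function and variable names. Tree traversal is omitted, and the leaf set of each node is taken as given.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1))
    return tail / comb(n, row1)


def node_enrichment(node_leaves, edges, group_of, group, min_leaf_fraction=0.5):
    """P-value for enrichment of edges between a node's leaves and one group.

    edges: list of (viral_leaf, otu); group_of: dict mapping otu to its group.
    Returns None when fewer than min_leaf_fraction of the node's leaves are
    connected to the group (the "half of the leaves" requirement).
    """
    node_leaves = set(node_leaves)
    a = b = c = d = 0
    connected = set()
    for leaf, otu in edges:
        in_node = leaf in node_leaves
        in_group = group_of[otu] == group
        if in_node and in_group:
            a += 1
            connected.add(leaf)
        elif in_node:
            b += 1
        elif in_group:
            c += 1
        else:
            d += 1
    if len(connected) < min_leaf_fraction * len(node_leaves):
        return None
    return fisher_exact_greater(a, b, c, d)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, `node_enrichment` would be called for every node of the PolB or RdRP tree and every eukaryotic order, and the p-values obtained for a given order would then be passed together to `benjamini_hochberg`.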
525 Acknowledgements
526 This work was supported by JSPS/KAKENHI (Nos. 26430184, 18H02279, 19H05667 to
527 H.O.), Scientific Research on Innovative Areas from the Ministry of Education, Culture,
528 Sports, Science and Technology (MEXT) of Japan (Nos. 16H06429, 16K21723,
529 16H06437 to H.O.), the Collaborative Research Program of the Institute for Chemical
530 Research, Kyoto University (2019-29 to S.C.), the Future Development Funding Program
531 of Kyoto University Research Coordination Alliance (to R.B.M.) and the ICR-KU International
532 Short-term Exchange Program for Young Researchers (to S.C.). Computational time was
533 provided by the SuperComputer System, Institute for Chemical Research, Kyoto
534 University. We thank the Tara Oceans consortium, the projects Oceanomics and France
535 Genomique (grants ANR-11-BTBR-0008 and ANR-10-INBS-09), and the people and sponsors
536 who supported the Tara Oceans Expedition (http://www.embl.de/tara-oceans/) for
537 making the data accessible. This is contribution number XXX of the Tara Oceans
538 Expedition 2009–2013.
539 References
540 Agusti S, González-Gordillo JI, Vaqué D, Estrada M, Cerezo MI, Salazar G, Gasol JM,
541 Duarte CM. 2015. Ubiquitous healthy diatoms in the deep sea confirm deep
542 carbon injection by the biological pump. Nat. Commun. 6:7608.
543 Alberti A, Poulain J, Engelen S, Labadie K, Romac S, Ferrera I, Albini G, Aury J-M,
544 Belser C, Bertrand A, et al. 2017. Viral to metazoan marine plankton
545 nucleotide sequences from the Tara Oceans expedition. Sci. Data [Internet] 4.
546 Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5538240/
547 Allen LZ, McCrow JP, Ininbergs K, Dupont CL, Badger JH, Hoffman JM, Ekman M,
548 Allen AE, Bergman B, Venter JC. 2017. The Baltic Sea Virome: Diversity and
549 Transcriptional Activity of DNA and RNA Viruses. mSystems 2:e00125-16.
550 Behrenfeld MJ, Falkowski PG. 1997. Photosynthetic rates derived from satellite-
551 based chlorophyll concentration. Limnol. Oceanogr. 42:1–20.
552 Boyd P, Newton P. 1995. Evidence of the potential influence of planktonic
553 community structure on the interannual variability of particulate organic
554 carbon flux. Deep Sea Res. Part Oceanogr. Res. Pap. 42:619–639.
555 Brum JR, Ignacio-Espinoza JC, Roux S, Doulcier G, Acinas SG, Alberti A, Chaffron S,
556 Cruaud C, Vargas C de, Gasol JM, et al. 2015. Patterns and ecological drivers of
557 ocean viral communities. Science 348:1261498.
558 Brussaard CPD, Bratbak G, Baudoux A-C, Ruardij P. 2007. Phaeocystis and its
559 interaction with viruses. Biogeochemistry 83:201–215.
560 Brussaard CPD, Wilhelm SW, Thingstad F, Weinbauer MG, Bratbak G, Heldal M,
561 Kimmance SA, Middelboe M, Nagasaki K, Paul JH, et al. 2008. Global-scale
562 processes with a nanoscale drive: the role of marine viruses. ISME J. 2:575–
563 578.
564 Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for
565 automated alignment trimming in large-scale phylogenetic analyses.
566 Bioinforma. Oxf. Engl. 25:1972–1973.
567 Carradec Q, Pelletier E, Silva CD, Alberti A, Seeleuthner Y, Blanc-Mathieu R, Lima-
568 Mendez G, Rocha F, Tirichine L, Labadie K, et al. 2018. A global ocean atlas of
569 eukaryotic genes. Nat. Commun. 9:373.
570 Chong I-G, Jun C-H. 2005. Performance of some variable selection methods when
571 multicollinearity is present. Chemom. Intell. Lab. Syst. 78:103–112.
572 Clasen JL, Suttle CA. 2009. Identification of freshwater Phycodnaviridae and their
573 potential phytoplankton hosts, using DNA pol sequence fragments and a
574 genetic-distance analysis. Appl. Environ. Microbiol. 75:991–997.
575 Coenen AR, Weitz JS. 2018. Limitations of Correlation-Based Inference in Complex
576 Virus-Microbe Communities. mSystems 3:e00084-18.
577 Culley A. 2018. New insight into the RNA aquatic virosphere via viromics. Virus Res.
578 244:84–89.
579 De La Rocha CL, Passow U. 2007. Factors influencing the sinking of POC and the
580 efficiency of the biological carbon pump. Deep Sea Res. Part II Top. Stud.
581 Oceanogr. 54:639–658.
582 Eddy SR. 1998. Profile hidden Markov models. Bioinformatics 14:755–763.
583 Endo H, Ogata H, Suzuki K. 2018. Contrasting biogeography and diversity patterns
584 between diatoms and haptophytes in the central Pacific Ocean. Sci. Rep.
585 8:10916.
586 Falkowski PG, Barber RT, Smetacek V. 1998. Biogeochemical Controls and
587 Feedbacks on Ocean Primary Production. Science 281:200–206.
588 Fawcett SE, Lomas MW, Casey JR, Ward BB, Sigman DM. 2011. Assimilation of
589 upwelled nitrate by small eukaryotes in the Sargasso Sea. Nat. Geosci. 4:717–
590 722.
591 Francois R, Honjo S, Krishfield R, Manganini S. 2002. Factors controlling the flux of
592 organic carbon to the bathypelagic zone of the ocean. Glob. Biogeochem.
593 Cycles 16:34-1-34–20.
594 Gobler CJ, Hutchins DA, Fisher NS, Cosper EM, Sañudo-Wilhelmy SA. 1997. Release
595 and bioavailability of C, N, P, Se, and Fe following viral lysis of a marine
596 chrysophyte. Limnol. Oceanogr. 42:1492–1504.
597 Guidi L, Chaffron S, Bittner L, Eveillard D, Larhlimi A, Roux S, Darzi Y, Audic S,
598 Berline L, Brum JR, et al. 2016. Plankton networks driving carbon export in
599 the oligotrophic ocean. Nature 532:465.
600 Guidi L, Jackson GA, Stemmann L, Miquel JC, Picheral M, Gorsky G. 2008.
601 Relationship between particle size distribution and flux in the mesopelagic
602 zone. Deep Sea Res. Part Oceanogr. Res. Pap. 55:1364–1374.
603 Guidi L, Stemmann L, Jackson GA, Ibanez F, Claustre H, Legendre L, Picheral M,
604 Gorsky G. 2009. Effects of phytoplankton community on production, size,
605 and export of large aggregates: A world-ocean analysis. Limnol. Oceanogr.
606 54:1951–1963.
607 Herndl GJ, Reinthaler T. 2013. Microbial control of the dark end of the biological
608 pump. Nat. Geosci. 6:718–724.
609 Hingamp P, Grimsley N, Acinas SG, Clerissi C, Subirana L, Poulain J, Ferrera I,
610 Sarmento H, Villar E, Lima-Mendez G, et al. 2013. Exploring nucleo-
611 cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes. ISME
612 J. 7:1678–1695.
613 Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: Reconstruction, analysis and
614 visualization of phylogenomic data. Mol. Biol. Evol.:msw046.
615 Hurwitz BL, Brum JR, Sullivan MB. 2015. Depth-stratified functional and taxonomic
616 niche specialization in the “core” and “flexible” Pacific Ocean Virome. ISME J.
617 9:472–484.
618 Karsenti E, Acinas SG, Bork P, Bowler C, Vargas CD, Raes J, Sullivan M, Arendt D,
619 Benzoni F, Claverie J-M, et al. 2011. A Holistic Approach to Marine Eco-
620 Systems Biology. PLOS Biol. 9:e1001177.
621 Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version
622 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–780.
623 Laber CP, Hunter JE, Carvalho F, Collins JR, Hunter EJ, Schieler BM, Boss E, More K,
624 Frada M, Thamatrakoln K, et al. 2018. Coccolithovirus facilitation of carbon
625 export in the North Atlantic. Nat. Microbiol. 3:537–547.
626 Lawrence JE, Suttle CA. 2004. Effect of viral infection on sinking rates of
627 Heterosigma akashiwo and its implications for bloom termination. Aquat.
628 Microb. Ecol. 37:1–7.
629 Leblanc K, Quéguiner B, Diaz F, Cornet V, Michel-Rodriguez M, Madron XD de,
630 Bowler C, Malviya S, Thyssen M, Grégori G, et al. 2018. Nanoplanktonic
631 diatoms are globally overlooked but play a role in spring blooms and carbon
632 export. Nat. Commun. 9:953.
633 Li W. 1995. Composition of Ultraphytoplankton in the Central North-Atlantic. Mar.
634 Ecol. Prog. Ser. 122:1–8.
635 Liu H, Probert I, Uitz J, Claustre H, Aris-Brosou S, Frada M, Not F, de Vargas C. 2009.
636 Extreme diversity in noncalcifying haptophytes explains a major pigment
637 paradox in open oceans. Proc. Natl. Acad. Sci. U. S. A. 106:12803–12808.
638 Lomas MW, Moran SB. 2011. Evidence for aggregation and export of cyanobacteria
639 and nano-eukaryotes from the Sargasso Sea euphotic zone. Biogeosciences
640 8:203–216.
641 Lønborg C, Middelboe M, Brussaard CPD. 2013. Viral lysis of Micromonas pusilla:
642 impacts on dissolved organic matter production and composition.
643 Biogeochemistry 116:231–240.
644 Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He
645 J, Gwadz M, Hurwitz DI, et al. 2015. CDD: NCBI’s conserved domain database.
646 Nucleic Acids Res. 43:D222–D226.
647 Mars Brisbin M, Mesrop LY, Grossmann MM, Mitarai S. 2018. Intra-host Symbiont
648 Diversity and Extended Symbiont Maintenance in Photosymbiotic
649 Acantharea (Clade F). Front. Microbiol. [Internet] 9. Available from:
650 https://www.frontiersin.org/articles/10.3389/fmicb.2018.01998/full
651 Mihara T, Koyano H, Hingamp P, Grimsley N, Goto S, Ogata H. 2018. Taxon Richness
652 of “Megaviridae” Exceeds those of Bacteria and Archaea in the Ocean.
653 Microbes Environ. 33:162–171.
654 Mihara T, Nishimura Y, Shimizu Y, Nishiyama H, Yoshikawa G, Uehara H, Hingamp P,
655 Goto S, Ogata H. 2016. Linking Virus Genomes with Host Taxonomy. Viruses
656 8:66.
657 Moniruzzaman M, Wurch LL, Alexander H, Dyhrman ST, Gobler CJ, Wilhelm SW.
658 2017. Virus-host relationships of marine single-celled eukaryotes resolved
659 from metatranscriptomics. Nat. Commun. 8:16054.
660 Motegi C, Nagata T, Miki T, Weinbauer MG, Legendre L, Rassoulzadegan F. 2009.
661 Viral control of bacterial growth efficiency in marine pelagic environments.
662 Limnol. Oceanogr. 54:1901–1910.
663 Nelson DM, Tréguer P, Brzezinski MA, Leynaert A, Quéguiner B. 1995. Production
664 and dissolution of biogenic silica in the ocean: Revised global estimates,
665 comparison with regional data and relationship to biogenic sedimentation.
666 Glob. Biogeochem. Cycles 9:359–372.
667 Not F, Siano R, Kooistra WHCF, Simon N, Vaulot D, Probert I. 2012. Chapter One -
668 Diversity and Ecology of Eukaryotic Marine Phytoplankton. In: Piganeau G,
669 editor. Advances in Botanical Research. Vol. 64. Genomic Insights into the
670 Biology of Algae. Academic Press. p. 1–53. Available from:
671 http://www.sciencedirect.com/science/article/pii/B9780123914996000013
673 Peduzzi P, Weinbauer MG. 1993. Effect of concentrating the virus-rich 2–200-nm size
674 fraction of seawater on the formation of algal flocs (marine snow). Limnol.
675 Oceanogr. 38:1562–1565.
676 Pesant S, Not F, Picheral M, Kandels-Lewis S, Bescot NL, Gorsky G, Iudicone D,
677 Karsenti E, Speich S, Troublé R, et al. 2015. Open science resources for the
678 discovery and analysis of Tara Oceans data. Sci. Data 2:150023.
679 Picheral M, Guidi L, Stemmann L, Karl DM, Iddaoud G, Gorsky G. 2010. The
680 Underwater Vision Profiler 5: An advanced instrument for high spatial
681 resolution studies of particle size spectra and zooplankton. Limnol. Oceanogr.
682 Methods 8:462–473.
683 Picheral M, Searson S, Taillandier V, Bricaud A, Boss E, Stemmann L, Gorsky G, Tara
684 Oceans Consortium C, Tara Oceans Expedition P. 2014. Vertical profiles of
685 environmental parameters measured from physical, optical and imaging
686 sensors during Tara Oceans expedition 2009-2013. Available from:
687 https://doi.pangaea.de/10.1594/PANGAEA.836321
688 Price MN, Dehal PS, Arkin AP. 2009. FastTree: Computing Large Minimum Evolution
689 Trees with Profiles instead of a Distance Matrix. Mol. Biol. Evol. 26:1641–
690 1650.
691 Proctor LM, Fuhrman JA. 1991. Roles of viral infection in organic particle flux. Mar.
692 Ecol. Prog. Ser. 69:133–142.
693 Quéré CL, Andrew RM, Friedlingstein P, Sitch S, Hauck J, Pongratz J, Pickers PA,
694 Korsbakken JI, Peters GP, Canadell JG, et al. 2018. Global Carbon Budget 2018.
695 Earth Syst. Sci. Data 10:2141–2194.
696 Richardson TL, Jackson GA. 2007. Small Phytoplankton and Carbon Export from the
697 Surface Ocean. Science 315:838–840.
698 Sarmiento JL. 2013. Ocean Biogeochemical Dynamics. Princeton University Press.
699 Sheyn U, Rosenwasser S, Lehahn Y, Barak-Gavish N, Rotkopf R, Bidle KD, Koren I,
700 Schatz D, Vardi A. 2018. Expression profiling of host and virus during a
701 coccolithophore bloom provides insights into the role of viral infection in
702 promoting carbon export. ISME J.:1.
703 Shibata A, Kogure K, Koike I, Ohwada K. 1997. Formation of submicron colloidal
704 particles from marine bacteria by viral infection. Mar. Ecol. Prog. Ser.
705 155:303–307.
706 Steinberg DK, Mooy BASV, Buesseler KO, Boyd PW, Kobari T, Karl DM. 2008.
707 Bacterial vs. zooplankton control of sinking particle flux in the ocean’s
708 twilight zone. Limnol. Oceanogr. 53:1327–1338.
709 Sullivan MB, Weitz JS, Wilhelm S. 2017. Viral ecology comes of age. Environ.
710 Microbiol. Rep. 9:33–35.
711 Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B,
712 Zeller G, Mende DR, Alberti A, et al. 2015. Ocean plankton. Structure and
713 function of the global ocean microbiome. Science 348:1261359.
714 Suttle CA. 2007. Marine viruses--major players in the global ecosystem. Nat. Rev.
715 Microbiol. 5:801–812.
716 Tackmann J, Rodrigues JFM, Mering C von. 2018. Rapid inference of direct
717 interactions in large-scale ecological networks from heterogeneous microbial
718 sequencing data. bioRxiv:390195.
719 Tréguer P, Bowler C, Moriceau B, Dutkiewicz S, Gehlen M, Aumont O, Bittner L,
720 Dugdale R, Finkel Z, Iudicone D, et al. 2018. Influence of diatom diversity on
721 the ocean biological carbon pump. Nat. Geosci. 11:27–37.
722 Turner JT. 2015. Zooplankton fecal pellets, marine snow, phytodetritus and the
723 ocean’s biological pump. Prog. Oceanogr. 130:205–248.
724 Unrein F, Gasol JM, Not F, Forn I, Massana R. 2014. Mixotrophic haptophytes are key
725 bacterial grazers in oligotrophic coastal waters. ISME J. 8:164–176.
726 Van Etten JL, Agarkova I, Dunigan DD, Tonetti M, De Castro C, Duncan GA. 2017.
727 Chloroviruses Have a Sweet Tooth. Viruses [Internet] 9. Available from:
728 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408694/
729 de Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, Lara E, Berney C, Le
730 Bescot N, Probert I, et al. 2015. Ocean plankton. Eukaryotic plankton
731 diversity in the sunlit ocean. Science 348:1261605.
732 Ward BA, Follows MJ. 2016. Marine mixotrophy increases trophic transfer efficiency,
733 mean organism size, and vertical carbon flux. Proc. Natl. Acad. Sci. 113:2958–
734 2963.
735 Weinbauer MG. 2004. Ecology of prokaryotic viruses. FEMS Microbiol. Rev. 28:127–
736 181.
737 Weinbauer MG, Peduzzi P. 1995. Effect of virus-rich high molecular weight
738 concentrates of seawater on the dynamics of dissolved amino acids and
739 carbohydrates. Mar. Ecol. Prog. Ser. 127:245–253.
740 Weitz JS, Stock CA, Wilhelm SW, Bourouiba L, Coleman ML, Buchan A, Follows MJ,
741 Fuhrman JA, Jover LF, Lennon JT, et al. 2015. A multitrophic model to
742 quantify the effects of marine viruses on microbial food webs and ecosystem
743 processes. ISME J. 9:1352–1364.
744 Weynberg KD, Allen MJ, Wilson WH. 2017. Marine Prasinoviruses and Their Tiny
745 Plankton Hosts: A Review. Viruses 9.
746 Wilhelm SW, Suttle CA. 1999. Viruses and Nutrient Cycles in the Sea: Viruses play
747 critical roles in the structure and function of aquatic food webs. BioScience
748 49:781–788.
749 Worden AZ, Nolan JK, Palenik B. 2004. Assessing the dynamics and ecology of
750 marine picophytoplankton: The importance of the eukaryotic component.
751 Limnol. Oceanogr. 49:168–179.
752 Yamada Y, Tomaru Y, Fukuda H, Nagata T. 2018. Aggregate Formation During the
753 Viral Lysis of a Marine Diatom. Front. Mar. Sci. [Internet] 5. Available from:
754 https://www.frontiersin.org/articles/10.3389/fmars.2018.00167/full
755 Yutin N, Wolf YI, Raoult D, Koonin EV. 2009. Eukaryotic large nucleo-cytoplasmic
756 DNA viruses: Clusters of orthologous genes and reconstruction of viral
757 genome evolution. Virol. J. 6:223.
758 Zhang C, Dang H, Azam F, Benner R, Legendre L, Passow U, Polimene L, Robinson C,
759 Suttle CA, Jiao N. 2018. Evolving paradigms in biological carbon cycling in the
760 ocean. Natl. Sci. Rev. 5:481–499.
761 Supplementary material
762 • Supplement.docx: supplementary figures and tables.
763 • Supplementary_file_1: Information on the predictors (viruses) in the PLS model
764 predicting carbon export efficiency. Predictors are sorted by VIP score.
765 • Supplementary data available at GenomeNet FTP:
766 ftp://ftp.genome.jp/pub/db/community/tara/Cpump/