<<

bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

1 of the eukaryotic are predicted to increase carbon export

2 efficiency in the global sunlit ocean

3 Romain Blanc-Mathieu1,+, Hiroto Kaneko1,+, Hisashi Endo1, Samuel Chaffron2,3, Rodrigo

4 Hernández-Velázquez1, Canh Hao Nguyen1, Hiroshi Mamitsuka1, Nicolas Henry4,

5 Colomban de Vargas4, Matthew B. Sullivan5, Curtis A. Suttle6, Lionel Guidi7 and

6 Hiroyuki Ogata1,*

7 + Equal contribution

8 * Corresponding author

9 Affiliations:

10 1: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho,

11 Uji, Kyoto 611-0011, Japan

12

13 2: Université de Nantes, Centrale Nantes, CNRS - UMR 6004, LS2N, Nantes, France.

14 3: Research Federation (FR2022) Tara Oceans GO-SEE, Paris, France

15

16 4: Sorbonne Universités, UPMC Université Paris 06, CNRS, Laboratoire Adaptation

17 et Diversité en Milieu Marin, Station Biologique de Roscoff, 29680 Roscoff, France

18

19 5: Department of Microbiology and Department of Civil, Environmental and Geodetic

20 Engineering, Ohio State University, Columbus, OH, United States of America

21

1 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

22 6: Departments of Earth, Ocean & Atmospheric Sciences, Microbiology & Immunology,

23 and Botany, and the Institute for the Oceans and Fisheries, University of British

24 Columbia, Vancouver, BC, V6T 1Z4, Canada

25

26 7: Sorbonne Université, CNRS, Laboratoire d’Océanographie de Villefanche, LOV, F-

27 06230 Villefranche-sur-mer, France

28

29

30 Key words: Biological carbon pump, viruses, shunt and pump, carbon export,

31 Prasinovirus, , Tara Oceans.

2 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

32 Abstract

33 The biological carbon pump (BCP) is the process by which ocean organisms transfer

34 carbon from surface waters to the ocean interior and seafloor sediments for sequestration.

35 Viruses are thought to increase the efficiency of the BCP by fostering primary production

36 and facilitating the export of carbon-enriched materials in the deep sea (the viral “shunt

37 and pump”). A prior study using an oligotrophic ocean-dominated dataset from the Tara

38 Oceans expedition revealed that bacterial dsDNA viruses are better associated with

39 variation in carbon export than either prokaryotes or , but eukaryotic viruses

40 were not examined. Because eukaryotes contribute significantly to ocean biomass and net

41 production (> 40%), their viruses might also play a role in the BCP. Here, we leveraged

42 deep-sequencing molecular data generated in the framework of Tara Oceans to identify

43 and quantify diverse lineages of large dsDNA and smaller RNA viruses of eukaryotes.

44 We found that the abundance of these viruses explained 49% of the variation in carbon

45 export (compared with 89% by bacterial dsDNA viruses) and also substantially explained

46 the variation in net primary production (76%) and carbon export efficiency (50%).

47 Prasinoviruses infecting Mamiellales as well as Mimivirus relatives putatively infecting

48 haptophytes are among the eukaryotic lineages predicted to be the best contributors

49 to BCP efficiency. These findings collectively provide a first-level window into how

50 eukaryotic viruses impact the BCP and suggest that the virus-mediated shunt and pump

51 indeed plays a role.

3 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

52 Introduction

53 The oceanic biological carbon pump (BCP) is an organism-driven process by which

54 atmospheric carbon (i.e. CO2) is transferred to the ocean interior and seafloor for

55 sequestration over periods ranging from centuries to those of geological time-scale

56 durations. Between 15% and 20% of net primary production (NPP) is exported out of the

57 euphotic zone, with 1% to 3% (ca. 0.2 gigatons) of fixed carbon reaching the seafloor

58 annually (De La Rocha and Passow 2007; Herndl and Reinthaler 2013; Quéré et al. 2018;

59 Zhang et al. 2018).

60 Three components of the BCP, namely, carbon fixation, export and

61 remineralization, are governed by complex interactions between numerous members of

62 planktonic communities (Falkowski et al. 1998). Among these organisms, diatoms

63 (Tréguer et al. 2018) and zooplankton (Turner 2015) have been identified as important

64 contributors to the BCP in nutrient-replete oceanic regions. In the oligotrophic ocean,

65 Cyanobacteria and Collodaria (Lomas and Moran 2011), diatoms (Agusti et al. 2015;

66 Leblanc et al. 2018) and other small (pico- to nano-) plankton (Richardson and Jackson

67 2007; Lomas and Moran 2011) have been implicated in the BCP. Overall, the

68 composition of the planktonic community in surface waters, rather than a single species,

69 is better associated with the intensity of the BCP (Boyd and Newton 1995; Guidi et al.

70 2009; Guidi et al. 2016).

71 A recent gene correlation network analysis based on Tara Oceans genomic data,

72 ranging from viruses to zooplankton, outlined predicted species interactions associated

73 with carbon export and revealed a remarkably strong association with bacterial dsDNA

74 viruses relative to either prokaryotes or eukaryotes (Guidi et al. 2016). Cell lysis caused

4 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

75 by viruses promotes the production of dissolved organic matter and accelerates the

76 recycling of potentially growth-limiting nutrient elements (i.e. nitrogen and phosphorus)

77 in the photic zone (the “viral shunt”) (Weinbauer and Peduzzi 1995; Gobler et al. 1997;

78 Wilhelm and Suttle 1999; Weinbauer 2004; Motegi et al. 2009). This recycling in turn

79 may increase primary production and carbon export (Brussaard et al. 2008; Weitz et al.

80 2015). Viruses have also been proposed to drive particle aggregation and transfer into the

81 deep sea via the release of sticky, carbon-rich viral lysate (the “viral shuttle”) (Proctor

82 and Fuhrman 1991; Peduzzi and Weinbauer 1993; Shibata et al. 1997; Weinbauer 2004).

83 The combined effect of these two viral properties, coined “shunt and pump”, is proposed

84 to enhance the magnitude and efficiency of the BCP (Suttle 2007). A study by Guidi et al.

85 (2016) revealed that populations of bacterial dsDNA viruses in the sunlit oligotrophic

86 ocean are strongly associated with variation in the magnitude of the flux of particulate

87 organic carbon. Although viruses of eukaryotes were not included in the above-

88 mentioned study, the significant contribution of their hosts to ocean biomass and net

89 production (Li 1995; Nelson et al. 1995; Worden et al. 2004; Liu et al. 2009) and their

90 observed predominance over prokaryotes in sinking materials of Sargasso Sea

91 oligotrophic surface waters (Fawcett et al. 2011; Lomas and Moran 2011) suggest that

92 eukaryotic viruses are responsible for a substantial part of the variation in exported

93 carbon. Furthermore, the mechanisms causing this association remain to be

94 dissected (Sullivan et al. 2017), as such an association might emerge through indirect

95 correlation with other parameters, such as NPP.

96 In this study, we explored the association between eukaryotic viruses and the BCP

97 as well as the viral mechanisms enhancing this process. We exploited comprehensive

5 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

98 organismal data from the Tara Oceans expedition (Sunagawa et al. 2015; Carradec et al.

99 2018) as well as related measurements of carbon export estimated from particle

100 concentration and size distributions observed in situ (Guidi et al. 2016). The investigation

101 of eukaryotic viruses based on environmental genomics has long been difficult because of

102 their lower concentration in the water column compared with prokaryotic dsDNA viruses

103 (Hingamp et al. 2013). Deep sequencing of planktonic community DNA and RNA, as

104 carried out in Tara Oceans, has enabled the identification of marker genes of major viral

105 groups infecting eukaryotes and begun to reveal that these groups represent a sizeable

106 fraction of the marine virosphere (Hingamp et al. 2013; Allen et al. 2017; Moniruzzaman

107 et al. 2017; Carradec et al. 2018; Mihara et al. 2018). In the present study, we identified

108 several hundred marker-gene sequences of nucleocytoplasmic large DNA viruses

109 (NCLDVs; so-called “giant viruses”) in the prokaryotic size fraction. We also identified

110 RNA viruses in meta-transcriptomes of four eukaryotic size fractions. The resulting

111 profile of viral distributions was compared with the magnitude of carbon export (CE) and

112 its efficiency (CEE) to identify lineages of viruses predicted to contribute to the BCP.

113 Results and Discussion

114 The discovery of diverse NCLDVs and RNA viruses in Tara Oceans gene

115 catalogs

116 We used profile hidden Markov model-based homology searching to identify marker-

117 gene sequences of viruses of eukaryotes in two ocean gene catalogs. These catalogs were

118 previously constructed from environmental shotgun sequence data of samples collected

119 during the Tara Oceans expedition. The first catalog, the Ocean Microbial Reference

6 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

120 Gene Catalog (OM-RGC), contains 40 million non-redundant genes predicted from the

121 assemblies of Tara Oceans viral and microbial metagenomes (Sunagawa et al. 2015). We

122 searched this catalog for NCLDV DNA polymerase family B (PolB) genes, as dsDNA

123 viruses may be present in microbial metagenomes because large virions (> 0.2 µm) have

124 been retained on the filter or viral replicating within picoeukaryotic cells have

125 been captured. The second gene catalog, the Marine Atlas of Tara Oceans Unigenes

126 (MATOU), contains 116 million non-redundant genes predicted from meta-

127 transcriptomes of single-cell microeukaryotes and small multicellular zooplankton

128 (Carradec et al. 2018). We searched this catalog for RNA-dependent RNA polymerase

129 (RdRP) genes of RNA viruses because transcripts of viruses actively infecting their host

130 or genomes of RNA viruses have been captured in it.

131 We identified 3,486 NCLDV PolB sequences and 975 RNA virus RdRP

132 sequences. All except 17 of the NCLDV PolBs were assigned to the families Mimiviridae

133 (n = 2,923), (n = 348) and (n = 198) (Figure 1a). The large

134 number of PolB sequences assigned to Mimiviridae and Phycodnaviridae compared with

135 other NCLDV families is consistent with a previous observation based on a smaller

136 dataset (Hingamp et al. 2013). The divergence between these environmental sequences

137 and reference sequences from known viral genomes was greater in Mimiviridae than

138 Phycodnaviridae (Figure 1b, Supplementary Figure 1a). Within Mimiviridae, 83% of the

139 sequences were most similar to those from algae-infecting Mimivirus relatives. Among

140 the sequences classified in Phycodnaviridae, 92% were most similar to those in

141 Prasinovirus, while 5% were closest to Yellowstone lake phycodnavirus, which is closely

142 related to Prasinovirus. RdRP sequences were assigned mostly to the order

7 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

143 (n = 325) followed by the families (n = 131), Narnaviridae

144 (n = 95), (n = 45) and (n = 33) (Figure 1c), with most

145 sequences being distant (30% to 40% amino acid identity) from reference viruses (Figure

146 1d, Supplementary Figure 1b). This result is consistent with previous studies on the

147 diversity of marine RNA viruses, in which RNA virus sequences were found to

148 correspond to diverse positive-polarity ssRNA and dsRNA viruses distantly related to

149 well-characterized viruses (reviewed in Culley [2018]).

Tree scale: 0.1 a b Phycodnaviridae Prasinovirus Mimivirus Iridoviridae

Unclassif ied PBC Vs YSL PVs C ro V (n=198) Other NCLDVs NCLDVs Phycodnaviridae HaV Iridoviridae (n=17) Pithovirus (n=348) AaV

PoV

PgV, PpV, HeV- RF02, PkV-RF02, PpDNAV CeV Mi mi vi ri d a e OLPVs Mimiviridae

(n=2,923) Tombusviridae Tree scale: 1 c d Other families Other families

Bromoviridae RNA viruses Unclassified (n=57) Picornavirales (n=78) Narnaviridae (n=325) Other Bunyavirales families Tot iviridae (n=289)

Partitiviridae Part it iviridae (n=131) Narnaviridae 150 (n=95) Picornavirales

151 Figure 1: Numerous lineages of nucleocytoplasmic large DNA viruses (NCLDVs) 152 and RNA viruses were identified in environmental samples collected during the 153 Tara Oceans expedition (2009-2013). a and c Taxonomic breakdown of environmental 154 sequences of NCLDV DNA polymerase family B and RNA virus RNA-dependent RNA 155 polymerase. b and d Unrooted maximum likelihood phylogenetic trees of environmental 156 (black) and reference (red) viral sequences.

8 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

157 Eukaryotic viruses linked to carbon export efficiency in the sunlit ocean

158 Among the PolB and RdRP sequences identified in the Tara Oceans gene catalogs, 37%

159 and 17% were respectively present in at least five samples and had accompanying carbon

160 export measurement data (Supplementary Table 1). Using abundance profiles of these

161 1,454 marker-gene sequences at 61 sampling sites in the photic zone of 40 Tara Oceans

162 stations (Figure 2a–c), we tested for associations with estimates of carbon export flux at

163 150 meters (CE150) and measurements of carbon export efficiency (CEE). PLS regression

2 164 model explained 49% (R = 49%) of the variation in CE150, with a Pearson correlation

4 165 coefficient between observed and predicted values of 0.67 (P < 1 × 10− ) (Figure 2d); this

166 result demonstrates that viruses of eukaryotes are associated with the magnitude of

167 carbon export. A comparison of viral abundances with CEE revealed that viruses are also

2 4 168 strongly associated with CEE (R = 50%, r = 0.72, P < 1 × 10− ) (Figure 2e and

169 Supplementary Figure 2a; see Supplementary Table 2 for details of PLS models and

170 Supplementary Figure 2b for details of the permutation tests). In these PLS regression

171 models, 62 and 26 viruses were considered to be important predictors (i.e. variable

172 importance in the projection [VIP] score > 2 and regression coefficient > 0) of CE150 and

173 CEE, respectively, and only two viruses were shared between them.

11 174 CE150 was found to be correlated with NPP (r = 0.76, P < 1 × 10− ), which

175 suggests that the association of viruses with CE150 is due to a viral shunt effect or to

176 primary-production enhancement of viral production. In line with this interpretation, we

2 4 177 found that viral abundances can predict NPP variations (R = 59%, r = 0.78, P < 1 × 10− )

178 (Figure 2f); in addition, a larger number of shared predictors were obtained under the

7 179 PLS model for CE150 (44 viruses) than for CEE (3 viruses) (P = 1.3 × 10− ). In contrast,

9 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

180 CEE was not correlated with NPP (r = 0.2, P = 0.1) or CE150 (r = 0.14, P = 0.3). The

181 association of some viruses with CEE is therefore not an indirect relationship caused by

182 NPP; instead, viruses of eukaryotes may have a shuttling effect that enhances the

183 efficiency of carbon export.

a b

145 146 152 23 50 150 25 ● ● 132 ● ●● 30 142 ● ● ● ● ● ● ● ● ● 151 36 137 ● 38 39 148 ● ●●● ● 138 ● 123 ● 109 ● ● 149 41 ● 0 128 ● 122 ● lat ● ● 110 72 ●●● ● ● 52

Latitude 125 ● 100 76 ● 111 ● ● 70 ● 102 64 Mimiviridae Prasinovirus Iridoviridae Other families ● ● ● 124 ● 68 ● 98 ● ● c 93 78 65 −50 66

−100 0 100 200 Longitudelong

Picornavirales Partitiviridae Narnaviridae Other families Unclassified RNA viruses d e f ● ●● ● 8 2 ● 2 2 2 R = 0.49 R = 0.50 R = 0.78 ● ● ● ● r = 0.7 ● r = 0.72 ● r = 0.59 1000 -4 0 -4 -4

6 P<10 P<10 P<10 ●● ●●●

2 ● ● ● ● ● − ●● ● ● 800 ● ●● ●● ●

4 ● ● ● 4 ● ● ● − ● ●●●● ● ● ● ● ● ●● ●● ● ● ●●● ● ● ● ● ●● ● ● ● ● ● 600 ● 6 ● ●●● ●● ●● ● ● ●● ● ● ● ● ● 2 ● ●●●●● ● − ● ● ● ● ●●●●● ● ● ● ● ●● ●● ● ● ● ● ● ● ● ● ● ●●●● ● predicted carbon export ● ● ● ●●● ●● 8 ●● ●●● ●●● − ● 400 ●● ● ● ● ● ●● ● ● 0 ● ● ● predicted net primary production ● predicted carbon export efficency ● ● 10 0 2 4 6 8 10 12 − −15 −10 −5 0 200 600 1000 1400 184 measured carbon export measured carbon export efficiency measured net primary production

185 Figure 2: Abundance of nucleocytoplasmic large DNA viruses (NCLDVs) and RNA 186 viruses is associated with carbon export efficiency and flux at 150m in the global 187 ocean. a Location of 40 Tara Oceans stations that were the source of 61 prokaryote- 188 enriched metagenomes and 244 eukaryotic meta-transcriptomes (61 sites x 4 organismal 189 size fractions) from surface and DCM layers and measurements of carbon export. b–c 190 Relative abundance of NLCVDs and RNA viruses in samples used for association 191 analyses. d–f Plots showing the correlation between predicted and observed values in a 192 leave-one-out cross-validation test for carbon export flux at 150 m (d), carbon export 193 efficiency (e) and net primary production (f). Each PLS regression model was 194 reconstructed using abundance profiles of 1,454 marker-gene sequences (1,282 PolBs and 195 172 RdRPs) derived from environmental samples. r, Pearson correlation coefficient; R2, 196 square of the correlation between measured response values and predicted response 197 values. R2, which was calculated as 1 – SSE / SST (sum of squares due to error and total), 198 measures how successful the fit is in explaining the variation of the data. Model 199 robustness was assessed using a permutation test (n = 10,000).

10 bioRxiv preprint doi: https://doi.org/10.1101/710228; this version posted July 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC 4.0 International license.

200 Giant viruses of small algae are predicted to enhance CEE

201 We considered 26 viruses positively associated with CEE with a VIP score > 2

202 (Supplementary File 1) to be important predictors of CEE in the PLS regression. These

203 viruses are hereafter referred to as VIPs. Eleven VIPs (nine NCLDVs and two RNA

204 viruses; red rectangle in Figure 3) were abundant in samples from Eastern India coast

205 (EAFR) and Benguela current coast (BENG) provinces, with some (e.g. polb 00248170

206 and 002035391) also relatively abundant in samples from different oceanic provinces

207 where CEE was also relatively high. This observation suggests that these viruses enhance

208 the BCP in different regions of the global ocean.

209 Most VIPs (23 of 26) belonged to Mimiviridae (n = 11) and Phycodnaviridae (n =

210 12). All the phycodnavirus sequences were most closely related to those of

211 prasinoviruses, with amino acid sequence identities to reference sequences ranging

212 between 64% and 88%. The three remaining VIPs were RNA viruses. The closest

213 homolog of one sequence (rdrp 36496887) was that of an octopus-associated member of

214 Picornavirales (Beihai picorna-like virus 106), while the other two (90355641 and

215 58847823) were most similar to that of a crustacean-associated Bunyavirales virus

216 (Wenling crustacean virus 9).

217 Taxonomic analysis of genes in fragments containing VIP PolBs further

218 confirmed their identity as Mimiviridae or Phycodnaviridae (Supplementary Figure 3).

219 One contig (CNJ01018797) was predicted to encode a protein having a high amino-acid

220 identity (94%) to a gene product from the transcriptome of the diatom Triceratium

221 dubium (Supplementary Figure 3b). NCLDVs infecting diatoms have not been reported

11 0 −2 −4 −6 −8 −10 −12 Exp. efficiency Exp. −14

rdrp_107558617 rdrp_77677810 rdrp_105714054 polb_OM.RGC.v1.000017697 rdrp_103795290 rdrp_89591997 polb_OM.RGC.v1.002767869 rdrp_89802926 rdrp_89802927 polb_OM.RGC.v1.002652025 polb_OM.RGC.v1.000110630 polb_OM.RGC.v1.000072765 polb_OM.RGC.v1.000471455 polb_OM.RGC.v1.000287835 222 sopolb_OM.RGC.v1.015629864 far, but a previous co-abundance analysis has linked diatoms and Mimiviridae rdrp_88935421 polb_OM.RGC.v1.000074398 rdrp_84897402 polb_OM.RGC.v1.000495602 223 (Moniruzzamanpolb_OM.RGC.v1.013940193 et al. 2017). polb_OM.RGC.v1.009636901 polb_OM.RGC.v1.000228894 polb_OM.RGC.v1.004472615 polb_OM.RGC.v1.000205020 MS IO SAO SPO NPO NAO polb_OM.RGC.v1.001561949 polb_OM.RGC.v1.0002748680 polb_OM.RGC.v1.000072166−2 polb_OM.RGC.v1.000673383−4 polb_OM.RGC.v1.000844241 polb_OM.RGC.v1.001883961−6 polb_OM.RGC.v1.001897164−8 polb_OM.RGC.v1.001445888−10 polb_OM.RGC.v1.000239928 efficiency −12 Exp. efficiency Exp. polb_OM.RGC.v1.000262952Carbonexport −14 polb_OM.RGC.v1.000062429 polb_OM.RGC.v1.000912507 polb_OM.RGC.v1.004312996rdrp_107558617 polb_OM.RGC.v1.002035391rdrp_77677810 polb_OM.RGC.v1.000249074rdrp_105714054 polb_OM.RGC.v1.000017697polb_OM.RGC.v1.000073352 polb_OM.RGC.v1.000232032rdrp_103795290 polb_OM.RGC.v1.002503270rdrp_89591997 polb_OM.RGC.v1.002767869rdrp_58847823 polb_OM.RGC.v1.015907472rdrp_89802926 polb_OM.RGC.v1.007771300rdrp_89802927 polb_OM.RGC.v1.002652025polb_OM.RGC.v1.000248170 polb_OM.RGC.v1.000110630rdrp_90355641 polb_OM.RGC.v1.000072765polb_OM.RGC.v1.000968766 polb_OM.RGC.v1.000471455polb_OM.RGC.v1.000079111 polb_OM.RGC.v1.000287835polb_OM.RGC.v1.000230224 polb_OM.RGC.v1.015629864polb_OM.RGC.v1.010288541 rdrp_88935421rdrp_36496887 polb_OM.RGC.v1.000074398polb_OM.RGC.v1.000194282 polb_OM.RGC.v1.000611300rdrp_84897402 polb_OM.RGC.v1.000495602polb_OM.RGC.v1.000129518 polb_OM.RGC.v1.013940193polb_OM.RGC.v1.001175669 polb_OM.RGC.v1.009636901polb_OM.RGC.v1.001743853 polb_OM.RGC.v1.000228894polb_OM.RGC.v1.000042601 polb_OM.RGC.v1.004472615polb_OM.RGC.v1.000070861 polb_OM.RGC.v1.000205020polb_OM.RGC.v1.001386278 polb_OM.RGC.v1.001561949polb_OM.RGC.v1.000061559 polb_OM.RGC.v1.000274868 polb_OM.RGC.v1.000072166 polb_OM.RGC.v1.000673383 polb_OM.RGC.v1.000844241Abundance: polb_OM.RGC.v1.001883961

-4 0 4 8 23_SRF 25_SRF 30_SRF 36_SRF 38_SRF 41_SRF 52_SRF 64_SRF 65_SRF 66_SRF 68_SRF 70_SRF 72_SRF 76_SRF 78_SRF 93_SRF 98_SRF polb_OM.RGC.v1.001897164 23_DCM 25_DCM 30_DCM 36_DCM 38_DCM 39_DCM 41_DCM 52_DCM 64_DCM 68_DCM 72_DCM 76_DCM 78_DCM − 4 − 2 0 2 4 6 8 100_SRF 102_SRF 109_SRF 110_SRF 111_SRF 122_SRF 123_SRF 124_SRF 125_SRF 128_SRF 132_SRF 137_SRF 138_SRF 142_SRF 145_SRF 146_SRF 148_SRF 149_SRF 150_SRF 151_SRF 152_SRF polb_OM.RGC.v1.001445888 100_DCM 102_DCM 109_DCM 110_DCM 111_DCM 128_DCM 137_DCM 142_DCM 150_DCM 151_DCM PEOD ARAB ARAB ISSG EAFR BENG CHIL SPSG CHIL PEOD NPST PNEC GFST NAST-W NAST-E polb_OM.RGC.v1.000239928 MEDI MONS SATL SPSG CARB polb_OM.RGC.v1.000262952 polb_OM.RGC.v1.000062429 polb_OM.RGC.v1.000912507 polb_OM.RGC.v1.004312996 polb_OM.RGC.v1.002035391 224 polb_OM.RGC.v1.000249074 polb_OM.RGC.v1.000073352 polb_OM.RGC.v1.000232032 polb_OM.RGC.v1.002503270 225 Figure rdrp_588478233: Biogeography of viral lineages associated with carbon export efficiency polb_OM.RGC.v1.015907472 polb_OM.RGC.v1.007771300 226 (CEE).polb_OM.RGC.v1.000248170 The upper panel shows carbon export efficiency, ( CEdeep – CEsurface ) / CEdeep, for rdrp_90355641 227 thepolb_OM.RGC.v1.000968766 61 sampling sites. Sites with values above and below zero are those where carbon polb_OM.RGC.v1.000079111 228 exportpolb_OM.RGC.v1.000230224 flux was respectively higher and lower in deep than in surface waters; hence, they polb_OM.RGC.v1.010288541 rdrp_36496887 229 correspondpolb_OM.RGC.v1.000194282 to sites where the biological carbon pump was higher or lower compared with polb_OM.RGC.v1.000611300 230 otherpolb_OM.RGC.v1.000129518 sites. The bottom panel is a map reflecting abundances, expressed as center-log polb_OM.RGC.v1.001175669 231 ratiopolb_OM.RGC.v1.001743853 transformed gene-length normalized read counts, of viruses positively associated polb_OM.RGC.v1.000042601 232 withpolb_OM.RGC.v1.000070861 CEE and having a VIP score > 2. MS, Mediterranean Sea; IO, Indian Ocean; SAO, polb_OM.RGC.v1.001386278 233 Southpolb_OM.RGC.v1.000061559 Atlantic Ocean; SPO, South Pacific Ocean; NPO, North Pacific Ocean; NAO, 234 North Atlantic Ocean. The bottom horizontal axis is labeled with Tara Oceans station

235 numbers, sampling 23_SRF depth25_SRF 30_SRF (SRF,36_SRF 38_SRF 41_SRF surface;52_SRF 64_SRF 65_SRF 66_SRF DCM,68_SRF 70_SRF 72_SRF 76_SRF deep78_SRF 93_SRF 98_SRF chlorophyll maximum) and 23_DCM 25_DCM 30_DCM 36_DCM 38_DCM 39_DCM 41_DCM 52_DCM 64_DCM 68_DCM 72_DCM 76_DCM 78_DCM 100_SRF 102_SRF 109_SRF 110_SRF 111_SRF 122_SRF 123_SRF 124_SRF 125_SRF 128_SRF 132_SRF 137_SRF 138_SRF 142_SRF 145_SRF 146_SRF 148_SRF 149_SRF 150_SRF 151_SRF 152_SRF 236 abbreviations of biogeographic provinces. 100_DCM 102_DCM 109_DCM 110_DCM 111_DCM 128_DCM 137_DCM 142_DCM 150_DCM 151_DCM 237 238 Because most VIPs were only distantly related to isolated viruses, inferring

239 characteristics such as host type and infection strategies was not straightforward. We

240 therefore conducted a phylogeny-guided network-based host prediction analysis (Figure

12 241 4; see Methods). Within the Prasinovirus clade, which contained six VIPs, enrichment

242 analysis of predicted host groups along the phylogenetic tree uncovered 10 nodes

243 significantly enriched for five different eukaryotic orders. Mamiellales, the only known

244 host group of prasinoviruses, was detected at nine different nodes (seven of them had no

245 parent-to-children relationships), while the other four eukaryotic orders were found at

246 only one node (or two in the case of Eutreptiales) (Figure 4a). Within Mimiviridae,

247 significant enrichment was detected for 10 different orders; Collodaria was detected at 15

248 nodes (2 of them had no parent-to-children relationships) and Prymnesiales at 6 nodes (3

249 of them had no parent-to-children relationship), while all other orders were present at a

250 maximum of one node with no parent-to-children relationships. The nodes enriched in

251 Prymnesiales and Collodaria fell within or were sister to clades containing reference

252 viruses isolated from Prymnesiales and Phaeocystales (both are haptophytes orders)

253 species. This placement suggests that environmental PolB sequences in this clade belong

254 to Mimiviridae viruses that infect Prymnesiales. Interestingly four of these Mimiviridae

255 viruses were VIP viruses (Figure 4a). The detection of Collodaria may be the result of

256 indirect associations that reflect a symbiotic relationship with Prymnesiales, as some

257 acantharians, a lineage of Rhizaria related to Collodaria, are known to host Prymnesiales

258 species (Mars Brisbin et al. 2018). Nodes enriched in the remaining three other orders fell

259 within clades that were only distantly related to any reference Mimiviridae. The final

260 Mimiviridae VIP (polb 000079111) in the tree was a distant relative of Aureococcus

261 anophagefferens virus (AaV). No host groups were predicted for the clade containing the

262 Picornavirales VIP (Figure 4b). Overall, this host prediction analysis revealed that

13 Tree scale: 1

OM-RGC.v1.000236724 0 1 2 3 OM-RGC.v1.001143251 OM-RGC.v1.001557307 OM-RGC.v1.001274286 OM-RGC.v1.002268728 OtV_RT-2011 OlV_1 OtV_2 OtV_1 263 lineages of prasinoviruses and putative prymnesioviruses are among the viral lineages MpV_SP1 OM-RGC.v1.000435873 MpV_1 OM-RGC.v1.000200745 264 best associated with CEE. OM-RGC.v1.002503270 OM-RGC.v1.001009342 Tree scale: 1 Tree scale: 1 1 2 3 4 5

OM-RGC.v1.002503271 0 1 2 3 4 0 1 2 3 a b Picornaviridae OM-RGC.v1.000236724 OM-RGC.v1.001143256 1 OM-RGC.v1.001557307 5973300 OM-RGC.v1.000194282 OM-RGC.v1.001274286 5890355 OM-RGC.v1.002268728 8625456 OtV_RT-2011 13828801 OlV_1 Kilifi_virus OM-RGC.v1.000065429 OtV_2 50667108 OtV_1 MpV_SP1 50475018 OM-RGC.v1.000435873 50273061 MpV_1 50460461 OM-RGC.v1.000692758 OM-RGC.v1.000200745 BV3 002503270 RiPV-1 OM-RGC.v1.001009342 76240729 OM-RGC.v1.002503271 Figure key 65266812 OM-RGC.v1.000230224 000194282 51241105 OM-RGC.v1.000065429 OM-RGC.v1.000692758 49554577 000230224 82602001 OM-RGC.v1.000192208 42876889 OM-RGC.v1.000192208 OM-RGC.v1.000191910 42456736 OM-RGC.v1.000249993 OM-RGC.v1.000320572 52388179 OM-RGC.v1.000251027 26568779 OM-RGC.v1.000250875 Reference viruses 29675950 OM-RGC.v1.000191910 OM-RGC.v1.000249217 1 76136173 BpV_2 84432702 OM-RGC.v1.000449426 classification 63982103 OM-RGC.v1.000236849 64666778 OM-RGC.v1.000249993 OM-RGC.v1.000457706 109205596 BpV_1 OM-RGC.v1.002318741 Prasinovirus Posavirus_3 OM-RGC.v1.000240849 Posavirus_2 000239928 Bu-1 OM-RGC.v1.000320572 OM-RGC.v1.000240662 Tottori-HG1 000249074 Prasinovirus OM-RGC.v1.000239005 Posavirus_1 OM-RGC.v1.001647925 Bu-3 000248170 Megavirinae 76170225 OM-RGC.v1.000251027 OM-RGC.v1.000231895 59731273 000073352 HaRNAV OM-RGC.v1.000231558 Mesomimivirinae 1493543 OM-RGC.v1.000468213 45643841 OM-RGC.v1.000250875 OM-RGC.v1.000745571 63103152 OM-RGC.v1.000229407 OM-RGC.v1.000231911 Unclassified 37006855 OM-RGC.v1.000122879 33077172 OM-RGC.v1.000479532 53572985 OM-RGC.v1.000249217 Mimivirus Mimiviridae 52479932 Tupanvirus_SL Tupanvirus_DO 11722969 Moumouvirus 15963946 Megavirus 79173268 BpV_2 Hokovirus Labyrnavirus 44060994 Indivirus 45028609 OM-RGC.v1.003118589 Catovirus Bacillarnavirus BRNAV_G3 OM-RGC.v1.000449426 BsV 29093457 CroV AWando15 OM-RGC.v1.000057033 Picornaviridae AsRNAV OM-RGC.v1.000073545 16996831 OM-RGC.v1.004949926 6753846 OM-RGC.v1.000236849 OM-RGC.v1.000109761 OM-RGC.v1.001499365 32150309 OM-RGC.v1.001105267 24747727 OM-RGC.v1.000330656 Unclassified 25124614 OM-RGC.v1.000457706 OM-RGC.v1.000062592 24128940 PoV 16254629 OM-RGC.v1.003483755 15962113 OM-RGC.v1.000492984 Picornavirales OM-RGC.v1.001372408 11236314 BpV_1 OM-RGC.v1.000046074 APLV3 OM-RGC.v1.001041167 33069045 OM-RGC.v1.000161220 7873561 OM-RGC.v1.000068880 7903458 000079111 10956500 OM-RGC.v1.002318741 OM-RGC.v1.000079078 AaV 58837197 OM-RGC.v1.000081579 DpRNAV OM-RGC.v1.000032783 103609885 OM-RGC.v1.000240849 OM-RGC.v1.000042784 AglaRNAV OM-RGC.v1.001148268 2 Variable importance in 27815517 OM-RGC.v1.000083592 10541891 OM-RGC.v1.000081984 OM-RGC.v1.000081503 Projection (VIP) score 12433837 OM-RGC.v1.000239928 OM-RGC.v1.000032792 JP-B OM-RGC.v1.000294968 TPLV1 OM-RGC.v1.000086577 APLV4 OM-RGC.v1.001196663 0 1 2 3 4 29660803 OM-RGC.v1.000335275 107558617 OM-RGC.v1.000240662 OM-RGC.v1.000085188 OM-RGC.v1.000072571 6040574 OM-RGC.v1.000109237 60392120 000274868 5252376 OM-RGC.v1.000249074 OM-RGC.v1.002750835 6175769 OM-RGC.v1.004389005 5971958 OM-RGC.v1.001132733 45022612 OM-RGC.v1.000079253 OM-RGC.v1.000051249 44709998 OM-RGC.v1.000239005 OM-RGC.v1.000417185 44709997 000110630 45370407 OM-RGC.v1.000057148 44049884 OM-RGC.v1.000120786 23270150 OM-RGC.v1.001647925 OM-RGC.v1.000063945 82254166 OM-RGC.v1.001599716 OM-RGC.v1.000387513 35179764 OM-RGC.v1.000155828 3 33049404 OM-RGC.v1.000072094 Correlation to CEE 33056153 OM-RGC.v1.000248170 OM-RGC.v1.000060884 33056152 OM-RGC.v1.001121291 21733119 OM-RGC.v1.000055116 (PLS regression coefficients) OM-RGC.v1.000192105 8901086 OM-RGC.v1.000054815 9116058 OM-RGC.v1.000231895 OM-RGC.v1.000667500 FsPLV OM-RGC.v1.002156050 97531995 OM-RGC.v1.000058962 70963812 OM-RGC.v1.000700607 -0.08 -0.1 21804986 OM-RGC.v1.000073352 OM-RGC.v1.003826014 CsfrRNAV OM-RGC.v1.001170932 OM-RGC.v1.002828763 -0.04 -0.05 RsetRNAV01 OM-RGC.v1.000187401 23395631 OM-RGC.v1.000565520 22980344 OM-RGC.v1.000231558 OM-RGC.v1.000064307 0 0 APLV2 OM-RGC.v1.003855514 26169224 OM-RGC.v1.000071171 OM-RGC.v1.000945523 0.04 0.03 JP-A OM-RGC.v1.000684264 6031360 OM-RGC.v1.000468213 OM-RGC.v1.001819659 83943423 OM-RGC.v1.000064883 0.08 0.06 35193888 OM-RGC.v1.000073441 60395683 000072166 CtenRNAV01 OM-RGC.v1.000745571 OM-RGC.v1.000072372 41784406 OM-RGC.v1.000066031 OM-RGC.v1.000089657 58840825 OM-RGC.v1.000068465 5972655 OM-RGC.v1.000061528 5257731 OM-RGC.v1.000229407 OM-RGC.v1.000061593 36496887 OM-RGC.v1.000286831 97623802 OM-RGC.v1.000158793 OM-RGC.v1.000066893 4 Cosmopolitanism 98629691 OM-RGC.v1.000066963 32661547 OM-RGC.v1.000231911 OM-RGC.v1.000066950 97436908 001897164 (number of station) 97454952 OM-RGC.v1.000630816 478607 OM-RGC.v1.001350990 12533 OM-RGC.v1.000122879 OM-RGC.v1.000025088 54635440 000017697 OM-RGC.v1.000545331 0 0 BV1 OM-RGC.v1.000058552 5309472 OM-RGC.v1.000070978 98636599 OM-RGC.v1.000479532 OM-RGC.v1.000060035 25 10 5296497 OM-RGC.v1.000085957 5998385 OM-RGC.v1.000328875 20 43017044 OM-RGC.v1.000026821 50 38186194 Mimivirus OM-RGC.v1.000235944 21728043 OM-RGC.v1.000079200 OM-RGC.v1.000064954 75 30 20451590 OM-RGC.v1.000088733 21391908 OM-RGC.v1.001004628 21872260 Tupanvirus_SL OM-RGC.v1.000133961 100 40 AmV-A1 OM-RGC.v1.002822478 OM-RGC.v1.001184783 42975987 OM-RGC.v1.000064096 21729800 OM-RGC.v1.004057507 16323154 Tupanvirus_DO OM-RGC.v1.001296274 16005444 OM-RGC.v1.000069361 89028829 OM-RGC.v1.000066071 OM-RGC.v1.00026937Mimiviridae 5 Moumouvirus OM-RGC.v1.000539407 OM-RGC.v1.000544460 OM-RGC.v1.001391978 OM-RGC.v1.000356992 5 Distribution of relative 79539407 OM-RGC.v1.000066500 35164655 Megavirus OM-RGC.v1.000061999 29626693 OM-RGC.v1.000068784 000287835 abundances 59958592 OM-RGC.v1.000284102 Centovirus_AC OM-RGC.v1.000512830 AGV3 Hokovirus OM-RGC.v1.000073383 ALPV OM-RGC.v1.002317890 OM-RGC.v1.003037006 APLV1 OM-RGC.v1.003008254 25% 50% 75% 29628862 OM-RGC.v1.003013914 98536406 Indivirus Min. Max. OM-RGC.v1.000159675 BRNAV_G2 OM-RGC.v1.001250095 21880330 OM-RGC.v1.000069310 29949318 OM-RGC.v1.000062167 29724237 OM-RGC.v1.003118589 OM-RGC.v1.000064537 OM-RGC.v1.000291604 HoCV-1 OM-RGC.v1.000067387 PSIV OM-RGC.v1.000067277 TrV OM-RGC.v1.000618073 BQCV Catovirus OM-RGC.v1.000847567 HiPV OM-RGC.v1.001249129 OM-RGC.v1.000512118 SINV-1 OM-RGC.v1.000071229 KBV BsV OM-RGC.v1.000071068 84892392 OM-RGC.v1.000490625 52260257 OM-RGC.v1.000079365 6 58838535 OM-RGC.v1.000377714 Predicted eukaryotic host 54294427 OM-RGC.v1.002035311 56759010 CroV OM-RGC.v1.000104259 000072765 group at a given node 57468092 OM-RGC.v1.001058750 77677810 OM-RGC.v1.000027805 84897402 OM-RGC.v1.000057033 OM-RGC.v1.000259426 (from top to bottom of the tree) 22428174 000673383 56725325 OM-RGC.v1.001173520 OM-RGC.v1.000075116 75258857 OM-RGC.v1.000310828 24762850 OM-RGC.v1.000073545 OM-RGC.v1.000073739 24128145 OM-RGC.v1.000071604 Mamiellales Goose_dicistrovirus OM-RGC.v1.000280162 29093047 OM-RGC.v1.000049837 Apis_dicistrovirus OM-RGC.v1.000122430 Pleuronematida AnCV OM-RGC.v1.004949926 OM-RGC.v1.001016487 000061559 DCV OM-RGC.v1.000061401 Eutrepiales CrPV OM-RGC.v1.000621764 15356524 OM-RGC.v1.000109761 OM-RGC.v1.000146594 TSV OM-RGC.v1.000058809 Corethrales MCDV OM-RGC.v1.000061534 OM-RGC.v1.000282052 BRNAV_G1 OM-RGC.v1.000115299 Beroida 58873145 OM-RGC.v1.001499365 OM-RGC.v1.000058812 21733123 OM-RGC.v1.000834606 32165274 OM-RGC.v1.000860059 Chaetocerotales 44278324 OM-RGC.v1.000663173 45092716 OM-RGC.v1.004661489 OM-RGC.v1.001105267 OM-RGC.v1.000166921 Thecosomata 89591997 OM-RGC.v1.000079753 89802927 OLPV_1 89802926 OLPV_2 BRNAV_G5 OM-RGC.v1.000330656 OM-RGC.v1.000241877 32536384 OM-RGC.v1.002963968 58838081 000070861 OM-RGC.v1.001527691 58865139 OM-RGC.v1.001230579 16250369 OM-RGC.v1.000062592 OM-RGC.v1.000070058 Calanoida 60944405 OM-RGC.v1.000075583 7501593 OM-RGC.v1.000683307 22452106 OM-RGC.v1.000297388 Bacillariales 32202687 CeV PoV OM-RGC.v1.000030837 Cyclotrichida 36505302 000129518 100032960 OM-RGC.v1.001180381 100389194 OM-RGC.v1.002825758 Collodaria 501185 OM-RGC.v1.003483755 OM-RGC.v1.000070477 501186 OM-RGC.v1.003117577 510209 OM-RGC.v1.000148273 Sphaeropleales PparvumV 487149 OM-RGC.v1.000063931 487147 OM-RGC.v1.000492984 OM-RGC.v1.000071750 Prymnesiales 33428197 OM-RGC.v1.000105816 8855753 OM-RGC.v1.000299791 8855752 OM-RGC.v1.000401418 Cryomonadida 9164163 OM-RGC.v1.001372408 OM-RGC.v1.000102622 57554431 OM-RGC.v1.000065051 PkV_RF02 Arthracanthida 8626697 HeV_RF02 8868089 PgV 9164160 OM-RGC.v1.000046074 PpV 4578944 OM-RGC.v1.000033614 OM-RGC.v1.000495698 5015691 OM-RGC.v1.000054135 2954311 OM-RGC.v1.004447818 2584342 OM-RGC.v1.001041167 OM-RGC.v1.000066804 4578943 OM-RGC.v1.000081486 Chaetocerotales 5015692 OM-RGC.v1.002413486 2523835 000262952 Hemiptera 2523836 OM-RGC.v1.000161220 000912507 2523834 OM-RGC.v1.000068882650 OM-RGC.v1.000079111 OM-RGC.v1.000079078 AaV 266 Figure 4: Phylogenetic position of viruses associated with the efficiency of carbon OM-RGC.v1.000081579 OM-RGC.v1.00003278267 export.3 Details of the PLS model and viral global distribution, relative abundance and OM-RGC.v1.00004278268 putative4 eukaryotic host group are superimposed on the trees. Phylogenetic trees contain OM-RGC.v1.001148268 OM-RGC.v1.00008359269 environmental2 (labeled in red if VIP > 2) and reference (labeled in black) sequences of OM-RGC.v1.000081984 OM-RGC.v1.00008150270 Prasinovirus3 and Mimiviridae DNA polymerase family B (a) and Picornavirales RdRP OM-RGC.v1.00003279271 (b).2 OM-RGC.v1.000294968 OM-RGC.v1.000086577 OM-RGC.v1.001196663 OM-RGC.v1.000335275 OM-RGC.v1.00008518 8 14 OM-RGC.v1.000072571 OM-RGC.v1.000109237 OM-RGC.v1.000274868 OM-RGC.v1.002750835 OM-RGC.v1.004389005 OM-RGC.v1.001132733 OM-RGC.v1.000079253 OM-RGC.v1.000051249 OM-RGC.v1.000417185 OM-RGC.v1.000110630 OM-RGC.v1.000057148 OM-RGC.v1.000120786 OM-RGC.v1.000063945 OM-RGC.v1.001599716 OM-RGC.v1.000387513 OM-RGC.v1.000155828 OM-RGC.v1.000072094 OM-RGC.v1.000060884 OM-RGC.v1.001121291 OM-RGC.v1.000055116 OM-RGC.v1.000192105 OM-RGC.v1.000054815 OM-RGC.v1.000667500 OM-RGC.v1.002156050 OM-RGC.v1.000058962 OM-RGC.v1.000700607 OM-RGC.v1.003826014 OM-RGC.v1.001170932 OM-RGC.v1.002828763 OM-RGC.v1.000187401 OM-RGC.v1.000565520 OM-RGC.v1.000064307 OM-RGC.v1.003855514 OM-RGC.v1.000071171 OM-RGC.v1.000945523 OM-RGC.v1.000684264 OM-RGC.v1.001819659 OM-RGC.v1.000064883 OM-RGC.v1.000073441 OM-RGC.v1.000072166 OM-RGC.v1.000072372 OM-RGC.v1.000066031 OM-RGC.v1.000089657 OM-RGC.v1.000068465 OM-RGC.v1.000061528 OM-RGC.v1.000061593 OM-RGC.v1.000286831 OM-RGC.v1.000158793 OM-RGC.v1.000066893 OM-RGC.v1.000066963 OM-RGC.v1.000066950 OM-RGC.v1.001897164 OM-RGC.v1.000630816 OM-RGC.v1.001350990 OM-RGC.v1.000025088 OM-RGC.v1.000017697 OM-RGC.v1.000545331 OM-RGC.v1.000058552 OM-RGC.v1.000070978 OM-RGC.v1.000060035 OM-RGC.v1.000085957 OM-RGC.v1.000328875 OM-RGC.v1.000026821 OM-RGC.v1.000235944 OM-RGC.v1.000079200 OM-RGC.v1.000064954 OM-RGC.v1.000088733 OM-RGC.v1.001004628 OM-RGC.v1.000133961 OM-RGC.v1.002822478 OM-RGC.v1.001184783 OM-RGC.v1.000064096 OM-RGC.v1.004057507 OM-RGC.v1.001296274 OM-RGC.v1.000069361 OM-RGC.v1.000066071 OM-RGC.v1.000269375 OM-RGC.v1.000539407 OM-RGC.v1.000544460 OM-RGC.v1.001391978 OM-RGC.v1.000356992 OM-RGC.v1.000066500 OM-RGC.v1.000061999 OM-RGC.v1.000068784 OM-RGC.v1.000287835 OM-RGC.v1.000284102 OM-RGC.v1.000512830 OM-RGC.v1.000073383 OM-RGC.v1.002317890 OM-RGC.v1.003037006 OM-RGC.v1.003008254 OM-RGC.v1.003013914 OM-RGC.v1.000159675 OM-RGC.v1.001250095 OM-RGC.v1.000069310 OM-RGC.v1.000062167 OM-RGC.v1.000064537 OM-RGC.v1.000291604 OM-RGC.v1.000067387 OM-RGC.v1.000067277 OM-RGC.v1.000618073 OM-RGC.v1.000847567 OM-RGC.v1.001249129 OM-RGC.v1.000512118 OM-RGC.v1.000071229 OM-RGC.v1.000071068 OM-RGC.v1.000490625 OM-RGC.v1.000079365 OM-RGC.v1.000377714 OM-RGC.v1.002035311 OM-RGC.v1.000104259 OM-RGC.v1.000072765 OM-RGC.v1.001058750 OM-RGC.v1.000027805 OM-RGC.v1.000259426 OM-RGC.v1.000673383 OM-RGC.v1.001173520 OM-RGC.v1.000075116 OM-RGC.v1.000310828 OM-RGC.v1.000073739 OM-RGC.v1.000071604 OM-RGC.v1.000280162 OM-RGC.v1.000049837 OM-RGC.v1.000122430 OM-RGC.v1.001016487 OM-RGC.v1.000061559 OM-RGC.v1.000061401 OM-RGC.v1.000621764 OM-RGC.v1.000146594 OM-RGC.v1.000058809 OM-RGC.v1.000061534 OM-RGC.v1.000282052 OM-RGC.v1.000115299 OM-RGC.v1.000058812 OM-RGC.v1.000834606 OM-RGC.v1.000860059 OM-RGC.v1.000663173 OM-RGC.v1.004661489 OM-RGC.v1.000166921 OM-RGC.v1.000079753 OLPV_1 OLPV_2 OM-RGC.v1.000241877 OM-RGC.v1.002963968 OM-RGC.v1.000070861 OM-RGC.v1.001527691 OM-RGC.v1.001230579 OM-RGC.v1.000070058 OM-RGC.v1.000075583 OM-RGC.v1.000683307 OM-RGC.v1.000297388 CeV OM-RGC.v1.000030837 OM-RGC.v1.000129518 OM-RGC.v1.001180381 OM-RGC.v1.002825758 OM-RGC.v1.000070477 OM-RGC.v1.003117577 OM-RGC.v1.000148273 PparvumV OM-RGC.v1.000063931 OM-RGC.v1.000071750 OM-RGC.v1.000105816 OM-RGC.v1.000299791 OM-RGC.v1.000401418 OM-RGC.v1.000102622 OM-RGC.v1.000065051 PkV_RF02 HeV_RF02 PgV PpV OM-RGC.v1.000033614 OM-RGC.v1.000495698 OM-RGC.v1.000054135 OM-RGC.v1.004447818 OM-RGC.v1.000066804 OM-RGC.v1.000081486 OM-RGC.v1.002413486 OM-RGC.v1.000262952 OM-RGC.v1.000912507 272 Functional traits of eukaryotic organisms interacting with CEE-associated

273 viruses

274 The host assignment of Mamiellales, Prymnesiales (Figure 4a) and Chaetocerotales

275 (Figure 4b) to viral clades containing reference viruses isolated from these organismal

276 groups suggests that the network-based approach was able to predict virus–host

277 relationships with a certain level of reliability using our large dataset despite the reported

278 limitations of similar types of analyses (Coenen and Weitz 2018). This positive signal

279 prompted us to use these species-association networks to investigate taxonomic and

280 functional differences between the eukaryotic organisms predicted to interact with viruses

281 that were either positively or negatively associated with CEE (Figure 5). At the functional

282 level, viruses positively associated with CEE had a greater number of connections with

283 -bearing eukaryotes (Q = 0.03) and with silicified eukaryotes (Q = 0.05)

284 compared with viruses negatively associated with CEE (Supplementary Table 4); this

285 suggests that these virus–host systems contribute to CEE. This result also supports the

286 proposed idea that viral lysis enhances the effect of the BCP; that is, lysis of autotrophs is

287 expected to release growth-limiting nutrients (Gobler et al. 1997) that are recycled to

288 grow new cells, while carbon-rich cell debris, potentially ballasted with silica, tends to

289 aggregate and sink (Suttle 2007).

15 Viral sequence ID Taxa Reg. Coeff. Reg. Chloroplast Silicification Calcification polb_OM.RGC.v1.001386278 0.064 n polb_OM.RGC.v1.000070861 0.063 Collodaria polb_OM.RGC.v1.000042601 0.062 Dinophyceae n polb_OM.RGC.v1.001743853 0.057 Trachymedusae polb_OM.RGC.v1.001175669 0.056 Rotaliida n polb_OM.RGC.v1.000129518 0.055 Dinophyceae n polb_OM.RGC.v1.000611300 0.047 Ulvophyceae n polb_OM.RGC.v1.000194282 0.047 n rdrp_36496887 0.038 Poecilostomatoida polb_OM.RGC.v1.010288541 0.038 Bacillariophyta n n polb_OM.RGC.v1.000230224 0.033 Arthra_Symphy_F1 polb_OM.RGC.v1.000079111 0.032 Cephaloidophorida polb_OM.RGC.v1.000968766 0.032 Calanoida polb_OM.RGC.v1.000248170 0.028 Beroida polb_OM.RGC.v1.007771300 0.027 Dinophyceae polb_OM.RGC.v1.015907472 0.024 Coscinodiscophytina n n polb_OM.RGC.v1.000232032 0.013 Gregarinomorphea polb_OM.RGC.v1.000073352 0.012 Mamiellophyceae n polb_OM.RGC.v1.000249074 0.010 Dinophysiales polb_OM.RGC.v1.002035391 0.010 Coscinodiscophytina n n polb_OM.RGC.v1.004312996 0.009 Coscinodiscophytina n n polb_OM.RGC.v1.000912507 0.009 Acanthoecida polb_OM.RGC.v1.000062429 -0.009 Peridiniales polb_OM.RGC.v1.000239928 -0.018 Dinophyceae n polb_OM.RGC.v1.001445888 -0.026 Copepoda polb_OM.RGC.v1.001897164 -0.035 Copepoda polb_OM.RGC.v1.001883961 -0.043 Ostracoda polb_OM.RGC.v1.000844241 -0.043 Peridiniales polb_OM.RGC.v1.000673383 -0.043 Calanoida polb_OM.RGC.v1.000072166 -0.043 Labyrinthulida polb_OM.RGC.v1.000274868 -0.045 Chlorophyceae n polb_OM.RGC.v1.001561949 -0.048 Ostracoda polb_OM.RGC.v1.000205020 -0.048 Collodaria polb_OM.RGC.v1.004472615 -0.049 Calanoida polb_OM.RGC.v1.000228894 -0.051 Variosea polb_OM.RGC.v1.009636901 -0.051 Collodaria polb_OM.RGC.v1.000495602 -0.056 MAST_3-12 polb_OM.RGC.v1.000074398 -0.061 Ichthyosporea polb_OM.RGC.v1.015629864 -0.062 Peridiniales polb_OM.RGC.v1.000287835 -0.064 Collodaria polb_OM.RGC.v1.000471455 -0.065 Collodaria polb_OM.RGC.v1.000110630 -0.073 Acantharea polb_OM.RGC.v1.002652025 -0.075 Dinophyceae n rdrp_89802927 -0.075 Calanoida rdrp_89802926 -0.075 Insecta polb_OM.RGC.v1.002767869 -0.078 Collodaria rdrp_103795290 -0.082 Collodaria polb_OM.RGC.v1.000017697 -0.083 Calanoida 290 rdrp_77677810 -0.099 Leptothecata

291 Figure 5: Functional traits and of putative hosts of viruses associated with 292 the efficiency of carbon export. “Putative hosts” refers to eukaryotic V9-OTUs with the 293 highest absolute edge weight in FlashWeave interaction networks. Taxa correspond to 294 higher rank classification of the V9-OTU. Black squares indicate the presence of a 295 functional trait for the V9-OTU best connected to the virus.

16 296 Integrating previous observations to model a virus-driven BCP

297 Our PLS models revealed that viruses of eukaryotes are associated not only with the

298 magnitude of carbon export, but also with the export efficiency of particles. This finding

299 suggests that viruses contribute to BCP enhancement by promoting aggregate formation

300 and subsequent sedimentation. Consistent with this view, prasinoviruses and putative

301 prymnesioviruses were found to be among the viruses best associated with the efficiency

302 of carbon export under our PLS model for CEE. Several prasinoviruses (fourteen with an

303 available genome sequence) have been isolated for three genera of green microalgae,

304 namely , Ostreococcus and Bathycoccus. These genera belong to the order

305 Mamiellales, which are bacterial-sized phytoplankton common in coastal and oceanic

306 environments and considered to be influential actors in oceanic systems (Not et al. 2012;

307 Weynberg et al. 2017). Chrysochromulina ericina virus, Prymnesium parvum DNA virus,

308 Prymnesium kappa virus-RF02 and Haptolina ericina virus are Mimiviridae viruses

309 isolated from Prymnesiales (Haptophyta) cultures (Figure 4). They are closely related to

310 Phaeocystis globosa virus (PgV) and Phaeocystis poutcheti virus isolated from

311 Phaeocystales (Haptophyta) cultures. Both Prymnesiales and Phaeocystales species have

312 non-calcifying organic scales and some species can form blooms and colonies. The

313 detection of several putative prymnesioviruses in samples containing small cells is

314 consistent with the observation of diverse and abundant noncalcifying haptophytes in

315 open oceans (Liu et al. 2009; Endo et al. 2018). According to the estimates of Liu et al

316 (2009), these noncalcifying haptophytes contribute twice as much as diatoms or

317 prokaryotes to photosynthetic production, thereby highlighting their importance in the

318 BCP. Prymnesiales are an important mixotrophic group in oligotrophic ocean (Unrein et

17 319 al. 2014) and mixotrophy is proposed to increase vertical carbon flux by enabling the

320 uptake organic forms of nutrients (Ward and Follows 2016). Viral infection of such

321 organisms may thus help increase BCP efficiency.

322 As most other cultured algal viruses, prasinoviruses and prymnesioviruses are

323 lytic. Viral lysis not only generates the cellular debris used to build aggregates; it also

324 facilitates the process of aggregation (Weinbauer 2004) via viral-induced production of

325 agglomerative compounds, such as transparent exopolymer particles (TEPs). Lønborg et

326 al. (2013) proposed that the increased DOC and TEP production observed in cultures of

327 Micromonas pusilla infected with prasinoviruses (compared with non-infected cultures)

328 could stimulate particle aggregation and thus carbon export out of the photic zone. Some

329 prasinoviruses encode glycosyltransferases of the GT2 family. Similar to the a098r gene

330 (GT2) in bursaria Chlorella virus 1, the expression of GT2 family members

331 during infection possibly leads to the production of a dense fibrous hyaluronan network at

332 the surface of infected cells. Such a network may trigger the aggregation of host cells,

333 facilitate viral propagation (Van Etten et al. 2017) and increase the cell wall C:N ratio.

334 PgV, closely related to prymnesioviruses (Figure 4), has been linked with increased TEP

335 production and aggregate formation during the termination of Phaeocystis bloom

336 (Brussaard et al. 2007).

337 Other previous experimental and in situ studies have linked viruses and viral

338 activity to vertical carbon flux, cell sinking, and aggregate formation. Several laboratory

339 experiments have revealed associations between viruses and sinking cells or between

340 viruses and aggregate formation. For example, sinking rates of the toxic bloom former

341 Heterosigma akashiwo are elevated following viral infection (Lawrence and Suttle 2004).

18 342 As early as 1993, an experimental study revealed increased particle aggregation and

343 primary productivity upon addition of virally enriched material to seawater samples

344 (Peduzzi and Weinbauer 1993). More recently, cultures of the diatom Chaetoceros

345 tenuissimus infected with a DNA virus (CtenDNAV type II) have been shown to produce

346 higher levels of large-sized particles (50 to 400 µm) compared with non-infected cultures

347 (Yamada et al. 2018). In situ studies have uncovered vertical transport of viruses between

348 photic and aphotic zones in the Pacific Ocean (Hurwitz et al. 2015) and in Tara Oceans

349 samples (Brum et al. 2015), which suggests their association with sinking material. In

350 addition, the level of active Emiliania huxleyi virus (EhV) infection in a coccolithophore

351 bloom has been found to be correlated with water column depth, which suggests that

352 infected cells are exported from the surface to deeper waters in aggregate form (Sheyn et

353 al. 2018). EhVs infecting E. huxleyi have also been found to facilitate particle

354 aggregation and downward vertical flux of both particulate organic and particulate

355 inorganic carbon in the North Atlantic (Laber et al. 2018). No Phycodnaviridae PolB had

356 their best hits to EhVs in our study, which is probably because of sampling regions and

357 periods. These previous observations come as potential mechanistic explanations that

358 provide support for our results suggesting eukaryotic viruses have a “shuttle effect” that

359 enhance carbon export.

360 Viruses versus hosts as key markers of CE and CEE variations

361 Our analysis used similar data and analytical methods as those of Guidi et al. (2016), who

362 explored the link between planktonic networks and the BCP. Those authors reported that

363 bacterial viruses are better associated with CE150 than are cellular organisms. In their

364 study, host lineages for prasinoviruses (Mamiellales) and prymnesioviruses

19 365 (Prymnesiales) were not found to be associated with CE150. The fact that we detected

366 these viruses as predictors of CEE may mean that viruses are better predictors than their

367 hosts, most likely because viruses may better reflect the overall process and mechanisms

368 by which carbon is exported. Viral infection causes lysis and aggregation of cells,

369 enhances sinking rates and increases the export of organic carbon out of the sunlit ocean.

370 Among the eukaryotic organisms studied by Guidi et al. (2016), collodarians,

371 and copepods were reported to be important contributors to CE150. Some

372 of the viral taxa we found to be associated with CEE were connected with collodarians

373 and dinoflagellates in our host prediction analysis, although not as clearly as with

374 Mamiellales and Prymnesiales. Known copepod viruses have a ssDNA genome and

375 hence were not captured in our data. Only one genome sequence of a ssRNA virus and

376 one PolB sequence of a NCLDV are available for dinoflagellates viruses, and none have

377 been reported for collodarians. Viruses infecting these organisms may be among those we

378 found to be associated with CEE, but the lack of reference viruses hampered their

379 identification from the environmental samples.

380 Conclusions

381 In this study, we identified associations between CEE and diverse groups of planktonic

382 eukaryotic viruses collected in the photic ocean from 61 sampling sites during the Tara

383 Oceans expedition. Two lineages were predicted to be important groups facilitating

384 carbon export in the global ocean: prasinoviruses infecting tiny of the order

385 Mamiellales, and Mimivirus relatives putatively infecting haptophytes of the order

386 Prymnesiales. This finding suggests that these eukaryotic viruses, like cyanophages

20 387 previously reported to be strongly associated with carbon export, are important factors in

388 the BCP. This idea is in agreement with observations that their hosts, despite being one to

389 two orders of magnitude less abundant than cyanobacteria, likely dominate carbon

390 biomass and net production in the ocean. The global-scale association between eukaryotic

391 viruses and CEE is empirical evidence supporting the shunt and pump hypothesis,

392 namely, that viruses enhance carbon pump efficiency by increasing the relative amount of

393 organic carbon exported from the surface to the deeper sea.

394 Methods

395 Data context

396 We used publicly available data generated in the framework of the Tara Oceans

397 expedition (Karsenti et al. 2011). Single-copy marker-gene sequences for NCLDVs and

398 RNA viruses were identified from two gene catalogs: the Ocean Microbial Reference

399 Gene Catalog (OM-RGC) and the Marine Atlas of Tara Oceans Unigenes (MATOU).

400 The viral marker-gene abundance profiles used in our study for prokaryotic-sized

401 metagenomes and eukaryotic-sized meta-transcriptomes were from Sunagawa et al.

402 (2015) and Carradec et al. (2018), respectively. For eukaryotic 18S rDNA V9 OTUs, we

403 used an updated version of the data of de Vargas et al. (2015) that included functional

404 trait annotations (chloroplast-bearing, silicified and calcified organisms) of V9-OTUs.

405 Abundance profiles are compositional matrices in which gene abundances are expressed

406 as unnormalized (V9 barcode data) or gene-length normalized (shotgun data) read counts.

407 Eukaryotic plankton samples were filtered for categorization into the following size

408 classes: piconano (0.8–5 µm), nano (5–20 µm), micro (20–180 µm) and meso (180–2000

21 1 2 1 409 µm) (Pesant et al. 2015). Indirect measurements of carbon export (mg− m− d− ) in 5-m

410 increments from the surface to a 1000-m depth were taken from Guidi et al. (2016). The

411 original measurements were derived from the distribution of particle sizes and

412 abundances collected using an underwater vision profiler (Picheral et al. 2010). These

413 raw data are available from PANGEA (Picheral et al. 2014). Net primary production

414 (NPP) data were extracted from 8-day composites of the vertically generalized production

415 model (VGPM) (Behrenfeld and Falkowski 1997) at the week of sampling.

416 Carbon export and carbon export efficiency

1 2 1 417 Carbon flux profiles (mg− m− d− ) based on particle size distribution and abundances

418 were estimated according to Guidi et al. (2008). Carbon flux values from depths of 30 to

419 970 m were divided into 20-m bins, each obtained by averaging the carbon flux values

420 from the designated 20 m from profiles gathered during biological sampling within a 25-

421 km radius during 24 h and having less than 50% missing data (Supplementary Figure 4).

422 For compatibility with Guidi et al. (2016), carbon export was defined as the carbon flux

423 at 150 m. Carbon export efficiency was calculated as follows: CEE = (CEdeep −

424 CEsurface)/CEdeep (Sarmiento 2013) and CEEʹ = CEdeep/CEsurface (Francois et al. 2002). In

425 these formulas, CEsurface is the maximum average carbon flux within the first 150 m (the

426 maximum layer depth varied between 20 to 130 m in this dataset), and CEdeep is the

427 average carbon flux 200 m below this maximum.

428 Acquisition of viral marker genes from ocean gene catalogs

429 Viral genes were collected from two gene catalogs: OM-RGC version 1 (Sunagawa et al.

430 2015) and MATOU (Carradec et al. 2018). The OM-RGC data were taxonomically re-

22 431 annotated as in Carradec et al. (2018). Importantly, the NCBI reference tree used to

432 determine the last common ancestor was modified to reflect the current NCLDVs

433 classification (see Carradec et al. [2018] for details). We classified viral gene sequences

434 as eukaryotic or prokaryotic according to their best BLAST score against viral sequences

435 in the Virus-Host DB (Mihara et al. 2016). DNA polymerase B (PolB) and RNA-

436 dependent RNA polymerase (RdRP) genes were used as markers for NCLDVs and RNA

437 viruses, respectively. For PolB, reference proteins from the NCLDV orthologous gene

438 cluster NCVOG0038 (Yutin et al. 2009) were aligned using linsi (Katoh and Standley

439 2013). A HMM profile was constructed from the resulting alignment using hmmbuild

440 (Eddy 1998). This PolB HMM profile was searched against OM-RGC amino acid

441 sequences annotated as NCLDVs, and sequences longer than 200 amino acids and having

5 442 hits with E-values < 1 × 10− were selected as putative PolBs. RdRP sequences were

443 chosen from the MATOU catalog as follows: sequences assigned to Pfam profiles

444 PF00680, PF00946, PF00972, PF00978, PF00998, PF02123, PF04196, PF04197 or

445 PF05919 and annotated as RNA viruses were retained as RdRPs. The resulting 3,486

446 PolB sequences and 975 RdRP sequences were used along with their abundance matrices

447 for PLS regression analyses, co-abundance network reconstructions and to build

448 phylogenies.

449 Testing for associations of viral abundance with CE150 and CEE

450 To test for an association between the abundance of viral marker genes and CEE, we

451 proceeded as follows. Only marker genes represented by at least two reads in five or

452 more samples were retained. To cope with the sparsity and composition of the data, gene-

453 length normalized read-count matrices were center-log transformed separately for RNA

23 454 viruses and NCLDVs. We next selected genes with a Spearman correlation coefficient to

455 CE150 or CEE greater than 0.2 or smaller than -0.2 (zero values were removed). To

456 assess the association between these marker genes and CEE, we used partial least square

457 (PLS) regression analysis as described in Guidi et al. (2016). The number of components

458 selected for the PLS model was chosen to minimize the root mean square error of

459 prediction (Supplementary Table 2). We assessed the strength of the association between

460 carbon export (the response variable) and viral abundance (the explanatory variable) by

461 correlating leave-one-out cross-validation predicted values with the measured carbon

462 export values. We tested the significance of the correlation by comparing the original

463 Pearson coefficient between explanatory and response variables with the distribution of

464 Pearson coefficients obtained from PLS models reconstructed on permutated data (10,000

465 iterations). We estimated the contribution of each gene (predictor) according to its

466 variable importance in the projection (VIP) (Chong and Jun 2005). The VIP measure of a

467 predictor estimates its contribution in the PLS regression. Predictors having high VIP

468 values (> 2) are assumed to be important for the PLS prediction of the response variable.

469 Phylogenetic analysis

470 Environmental PolB sequences annotated as Phycodnaviridae or Mimiviridae were

471 searched against reference PolB sequences using BLAST. Environmental sequences with

472 hits to a reference sequence having >40% identity and an alignment length greater than

473 400 amino acids were kept and aligned with reference sequences using linsi (Katoh and

474 Standley 2013). We further removed sequences that occurred in fewer than 10 samples to

475 reduce the size of the tree. Environmental RdRP sequences annotated as Picornavirales

476 were translated into six frames of amino acid sequences, and reference Picornavirales

24 477 RdRP sequences collected from the Virus-Host DB (Mihara et al. 2016) were searched

478 against the CDD database (Marchler-Bauer et al. 2015) using rpsBLAST. The resulting

479 alignment was used to trim reference and environmental RdRP sequences to the

480 conserved part corresponding to the , CDD: 279070, before alignment with linsi.

481 PolB and RdRP multiple sequence alignments were manually curated to discard poorly

482 aligned sequences. Trees were constructed using the build function of ETE3 (Huerta-

483 Cepas et al. 2016) as implemented at GenomeNet (https://www.genome.jp/tools-bin/ete).

484 Columns were automatically trimmed using trimAl (Capella-Gutiérrez et al. 2009), and

485 trees were constructed using FastTree with default settings (Price et al. 2009).

486 Virus– co-occurrence analysis

487 We used FlashWeave (Tackmann et al. 2018) with Julia 1.0 to detect virus–host

488 associations on the basis of their co-abundance patterns. Read count matrices for the

489 3,486 PolB, 975 RdRP and 18S V9 DNA barcodes obtained from samples collected at the

490 same location were fed into FlashWeave. 18S-V9 data were filtered to retain OTUs with

491 an informative taxonomic annotation. 18S-V9 OTUs and viral marker sequences were

492 further filtered to conserve only those present in at least five samples. FlashWeave

493 analyses were run separately for each of the four eukaryotic size fractions. The number of

494 samples per size fraction ranged between 51 and 99 for NCLDVs and between 36 and 62

495 for RNA viruses. The number of retained OTUs per size fraction varied between 1,775

496 and 2,269 for NCLDVs and between 48 and 125 for RNA viruses (Supplementary Table

497 3). FlashWeave was run under the settings heterogenous = false and sensitive = true, and

498 PolB/RdRP-V9-OTU edges receiving a weight > 0.2 and a Q-value < 0.01(the default)

499 were retained.

25 500 Mapping of putative hosts onto viral phylogenies

501 In our association networks, individual viral sequences were often associated with

502 multiple 18S-V9 OTUs belonging to drastically different eukaryotic groups, a situation

503 that can reflect interactions among multiple organisms but also noise associated with this

504 type of analysis (Coenen and Weitz 2018). To extract meaningful information from these

505 networks we reasoned as follow: We assumed that evolutionarily related viruses infect

506 evolutionary related organisms, similar to the case of phycodnaviruses (Clasen and Suttle

507 2009). In the interaction networks, the number of connections between viruses in a given

508 clade and its eukaryotic host group should accordingly be enriched compared with the

509 number of connections with non-host organisms arising by chance. Following this

510 reasoning, we assigned the most likely eukaryotic host group as follows. The tree

511 constructed from viral marker-gene sequences (PolB or RdRP) was traversed from root to

512 tips to visit every node. We counted how many connections existed between leaves of

513 each node and the V9-OTUs of a given eukaryotic group (order level). We then tested

514 whether the node was enriched compared with the rest of the tree using Fischer’s exact

515 test and applied the Benjamini-Hochberg procedure to control the FDR among

516 comparisons of each eukaryotic taxon (order level). To avoid the appearance of

517 significant associations driven by a few highly connected leaves, we required half of the

518 leaves within a node to be connected to a given eukaryotic group. Significant enrichment

519 of connections between a virus clade and a eukaryotic order was considered to be

520 indicative of a possible virus–host relationship. We refer to the above approach, in which

521 taxon interactions are mapped onto a phylogenetic tree of a target group using the

522 organism’s associations predicted from a species co-abundance-based network, as TIM,

26 523 for Taxon Interaction Mapper. The script is available upon request. This approach can be

524 extended to interactions other than virus–host relationships.

525 Acknowledgements

526 This work was supported by JSPS/KAKENHI (Nos. 26430184, 18H02279, 19H05667 to

527 H.O.), Scientific Research on Innovative Areas from the Ministry of Education, Culture,

528 Science, Sports and Technology (MEXT) of Japan (Nos. 16H06429, 16K21723,

529 16H06437 to H.O.), the Collaborative Research Program of the Institute for Chemical

530 Research, Kyoto University (2019-29 to S.C.), the Future Development Funding Program

531 of Kyoto University Research Coordination Alliance (to R.B.M.), ICR-KU International

532 Short-term Exchange Program for Young Researchers (to S.C.). Computational time was

533 provided by the SuperComputer System, Institute for Chemical Research, Kyoto

534 University. We thank the Tara Oceans consortium, the projects Oceanomics and France

535 Genomique (grants ANR-11-BTBR-0008 and ANR-10-INBS-09), people and sponsors

536 who supported the Tara Oceans Expedition (http://www.embl.de/tara-oceans/) for

537 making the data accessible. This is contribution number XXX of the Tara Oceans

538 Expedition 2009–2013.

539 References

540 Agusti S, González-Gordillo JI, Vaqué D, Estrada M, Cerezo MI, Salazar G, Gasol JM,

541 Duarte CM. 2015. Ubiquitous healthy diatoms in the deep sea confirm deep

542 carbon injection by the biological pump. Nat. Commun. 6:7608.

27 543 Alberti A, Poulain J, Engelen S, Labadie K, Romac S, Ferrera I, Albini G, Aury J-M,

544 Belser C, Bertrand A, et al. 2017. Viral to metazoan marine plankton

545 nucleotide sequences from the Tara Oceans expedition. Sci. Data [Internet] 4.

546 Available from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5538240/

547 Allen LZ, McCrow JP, Ininbergs K, Dupont CL, Badger JH, Hoffman JM, Ekman M,

548 Allen AE, Bergman B, Venter JC. 2017. The Baltic Sea Virome: Diversity and

549 Transcriptional Activity of DNA and RNA Viruses. mSystems 2:e00125-16.

550 Behrenfeld MJ, Falkowski PG. 1997. Photosynthetic rates derived from satellite-

551 based chlorophyll concentration. Limnol. Oceanogr. 42:1–20.

552 Boyd P, Newton P. 1995. Evidence of the potential influence of planktonic

553 community structure on the interannual variability of particulate organic

554 carbon flux. Deep Sea Res. Part Oceanogr. Res. Pap. 42:619–639.

555 Brum JR, Ignacio-Espinoza JC, Roux S, Doulcier G, Acinas SG, Alberti A, Chaffron S,

556 Cruaud C, Vargas C de, Gasol JM, et al. 2015. Patterns and ecological drivers of

557 ocean viral communities. Science 348:1261498.

558 Brussaard CPD, Bratbak G, Baudoux A-C, Ruardij P. 2007. Phaeocystis and its

559 interaction with viruses. Biogeochemistry 83:201–215.

560 Brussaard CPD, Wilhelm SW, Thingstad F, Weinbauer MG, Bratbak G, Heldal M,

561 Kimmance SA, Middelboe M, Nagasaki K, Paul JH, et al. 2008. Global-scale

28 562 processes with a nanoscale drive: the role of marine viruses. ISME J. 2:575–

563 578.

564 Capella-Gutiérrez S, Silla-Martínez JM, Gabaldón T. 2009. trimAl: a tool for

565 automated alignment trimming in large-scale phylogenetic analyses.

566 Bioinforma. Oxf. Engl. 25:1972–1973.

567 Carradec Q, Pelletier E, Silva CD, Alberti A, Seeleuthner Y, Blanc-Mathieu R, Lima-

568 Mendez G, Rocha F, Tirichine L, Labadie K, et al. 2018. A global ocean atlas of

569 eukaryotic genes. Nat. Commun. 9:373.

570 Chong I-G, Jun C-H. 2005. Performance of some variable selection methods when

571 multicollinearity is present. Chemom. Intell. Lab. Syst. 78:103–112.

572 Clasen JL, Suttle CA. 2009. Identification of freshwater Phycodnaviridae and their

573 potential phytoplankton hosts, using DNA pol sequence fragments and a

574 genetic-distance analysis. Appl. Environ. Microbiol. 75:991–997.

575 Coenen AR, Weitz JS. 2018. Limitations of Correlation-Based Inference in Complex

576 Virus-Microbe Communities. mSystems 3:e00084-18.

577 Culley A. 2018. New insight into the RNA aquatic virosphere via viromics. Virus Res.

578 244:84–89.

579 De La Rocha CL, Passow U. 2007. Factors influencing the sinking of POC and the

580 efficiency of the biological carbon pump. Deep Sea Res. Part II Top. Stud.

581 Oceanogr. 54:639–658.

29 582 Eddy SR. 1998. Profile hidden Markov models. Bioinformatics 14:755–763.

583 Endo H, Ogata H, Suzuki K. 2018. Contrasting biogeography and diversity patterns

584 between diatoms and haptophytes in the central Pacific Ocean. Sci. Rep.

585 8:10916.

586 Falkowski PG, Barber RT, Smetacek V. 1998. Biogeochemical Controls and

587 Feedbacks on Ocean Primary Production. Science 281:200–206.

588 Fawcett SE, Lomas MW, Casey JR, Ward BB, Sigman DM. 2011. Assimilation of

589 upwelled nitrate by small eukaryotes in the Sargasso Sea. Nat. Geosci. 4:717–

590 722.

591 Francois R, Honjo S, Krishfield R, Manganini S. 2002. Factors controlling the flux of

592 organic carbon to the bathypelagic zone of the ocean. Glob. Biogeochem.

593 Cycles 16:34-1-34–20.

594 Gobler CJ, Hutchins DA, Fisher NS, Cosper EM, Saňudo‐Wilhelmy SA. 1997. Release

595 and bioavailability of C, N, P Se, and Fe following viral lysis of a marine

596 chrysophyte. Limnol. Oceanogr. 42:1492–1504.

597 Guidi L, Chaffron S, Bittner L, Eveillard D, Larhlimi A, Roux S, Darzi Y, Audic S,

598 Berline L, Brum JR, et al. 2016. Plankton networks driving carbon export in

599 the oligotrophic ocean. Nature 532:465.

30 600 Guidi L, Jackson GA, Stemmann L, Miquel JC, Picheral M, Gorsky G. 2008.

601 Relationship between particle size distribution and flux in the mesopelagic

602 zone. Deep Sea Res. Part Oceanogr. Res. Pap. 55:1364–1374.

603 Guidi L, Stemmann L, Jackson GA, Ibanez F, Claustre H, Legendre L, Picheral M,

604 Gorskya G. 2009. Effects of phytoplankton community on production, size,

605 and export of large aggregates: A world-ocean analysis. Limnol. Oceanogr.

606 54:1951–1963.

607 Herndl GJ, Reinthaler T. 2013. Microbial control of the dark end of the biological

608 pump. Nat. Geosci. 6:718–724.

609 Hingamp P, Grimsley N, Acinas SG, Clerissi C, Subirana L, Poulain J, Ferrera I,

610 Sarmento H, Villar E, Lima-Mendez G, et al. 2013. Exploring nucleo-

611 cytoplasmic large DNA viruses in Tara Oceans microbial metagenomes. ISME

612 J. 7:1678–1695.

613 Huerta-Cepas J, Serra F, Bork P. 2016. ETE 3: Reconstruction, analysis and

614 visualization of phylogenomic data. Mol. Biol. Evol.:msw046.

615 Hurwitz BL, Brum JR, Sullivan MB. 2015. Depth-stratified functional and taxonomic

616 niche specialization in the “core” and “flexible” Pacific Ocean Virome. ISME J.

617 9:472–484.

31 618 Karsenti E, Acinas SG, Bork P, Bowler C, Vargas CD, Raes J, Sullivan M, Arendt D,

619 Benzoni F, Claverie J-M, et al. 2011. A Holistic Approach to Marine Eco-

620 Systems Biology. PLOS Biol. 9:e1001177.

621 Katoh K, Standley DM. 2013. MAFFT multiple sequence alignment software version

622 7: improvements in performance and usability. Mol. Biol. Evol. 30:772–780.

623 Laber CP, Hunter JE, Carvalho F, Collins JR, Hunter EJ, Schieler BM, Boss E, More K,

624 Frada M, Thamatrakoln K, et al. 2018. Coccolithovirus facilitation of carbon

625 export in the North Atlantic. Nat. Microbiol. 3:537–547.

626 Lawrence JE, Suttle CA. 2004. Effect of viral infection on sinking rates of

627 Heterosigma akashiwo and its implications for bloom termination. Aquat.

628 Microb. Ecol. 37:1–7.

629 Leblanc K, Quéguiner B, Diaz F, Cornet V, Michel-Rodriguez M, Madron XD de,

630 Bowler C, Malviya S, Thyssen M, Grégori G, et al. 2018. Nanoplanktonic

631 diatoms are globally overlooked but play a role in spring blooms and carbon

632 export. Nat. Commun. 9:953.

633 Li W. 1995. Composition of Ultraphytoplankton in the Central North-Atlantic. Mar.

634 Ecol. Prog. Ser. 122:1–8.

635 Liu H, Probert I, Uitz J, Claustre H, Aris-Brosou S, Frada M, Not F, de Vargas C. 2009.

636 Extreme diversity in noncalcifying haptophytes explains a major pigment

637 paradox in open oceans. Proc. Natl. Acad. Sci. U. S. A. 106:12803–12808.

32 638 Lomas MW, Moran SB. 2011. Evidence for aggregation and export of cyanobacteria

639 and nano-eukaryotes from the Sargasso Sea euphotic zone. Biogeosciences

640 8:203–216.

641 Lønborg C, Middelboe M, Brussaard CPD. 2013. Viral lysis of Micromonas pusilla:

642 impacts on dissolved organic matter production and composition.

643 Biogeochemistry 116:231–240.

644 Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He

645 J, Gwadz M, Hurwitz DI, et al. 2015. CDD: NCBI’s conserved domain database.

646 Nucleic Acids Res. 43:D222–D226.

647 Mars Brisbin M, Mesrop LY, Grossmann MM, Mitarai S. 2018. Intra-host Symbiont

648 Diversity and Extended Symbiont Maintenance in Photosymbiotic

649 Acantharea (Clade F). Front. Microbiol. [Internet] 9. Available from:

650 https://www.frontiersin.org/articles/10.3389/fmicb.2018.01998/full

651 Mihara T, Koyano H, Hingamp P, Grimsley N, Goto S, Ogata H. 2018. Taxon Richness

652 of “Megaviridae” Exceeds those of Bacteria and Archaea in the Ocean.

653 Microbes Environ. 33:162–171.

654 Mihara T, Nishimura Y, Shimizu Y, Nishiyama H, Yoshikawa G, Uehara H, Hingamp P,

655 Goto S, Ogata H. 2016. Linking Virus Genomes with Host Taxonomy. Viruses

656 8:66.

33 657 Moniruzzaman M, Wurch LL, Alexander H, Dyhrman ST, Gobler CJ, Wilhelm SW.

658 2017. Virus-host relationships of marine single-celled eukaryotes resolved

659 from metatranscriptomics. Nat. Commun. 8:16054.

660 Motegi C, Nagata T, Miki T, Weinbauer MG, Legendre L, Rassoulzadegand F. 2009.

661 Viral control of bacterial growth efficiency in marine pelagic environments.

662 Limnol. Oceanogr. 54:1901–1910.

663 Nelson DM, Tréguer P, Brzezinski MA, Leynaert A, Quéguiner B. 1995. Production

664 and dissolution of biogenic silica in the ocean: Revised global estimates,

665 comparison with regional data and relationship to biogenic sedimentation.

666 Glob. Biogeochem. Cycles 9:359–372.

667 Not F, Siano R, Kooistra WHCF, Simon N, Vaulot D, Probert I. 2012. Chapter One -

668 Diversity and Ecology of Eukaryotic Marine Phytoplankton. In: Piganeau G,

669 editor. Advances in Botanical Research. Vol. 64. Genomic Insights into the

670 Biology of Algae. Academic Press. p. 1–53. Available from:

671 http://www.sciencedirect.com/science/article/pii/B978012391499600001

672 3

673 Peduzzi P, Weinbauer MG. 1993. Effect of concentrating the virus-rich 2-2nm size

674 fraction of seawater on the formation of algal flocs (marine snow). Limnol.

675 Oceanogr. 38:1562–1565.

34 676 Pesant S, Not F, Picheral M, Kandels-Lewis S, Bescot NL, Gorsky G, Iudicone D,

677 Karsenti E, Speich S, Troublé R, et al. 2015. Open science resources for the

678 discovery and analysis of Tara Oceans data. Sci. Data 2:150023.

679 Picheral M, Guidi L, Stemmann L, Karl DM, Iddaoud G, Gorsky G. 2010. The

680 Underwater Vision Profiler 5: An advanced instrument for high spatial

681 resolution studies of particle size spectra and zooplankton. Limnol. Oceanogr.

682 Methods 8:462–473.

683 Picheral M, Searson S, Taillandier V, Bricaud A, Boss E, Stemmann L, Gorsky G, Tara

684 Oceans Consortium C, Tara Oceans Expedition P. 2014. Vertical profiles of

685 environmental parameters measured from physical, optical and imaging

686 sensors during Tara Oceans expedition 2009-2013. Available from:

687 https://doi.pangaea.de/10.1594/PANGAEA.836321

688 Price MN, Dehal PS, Arkin AP. 2009. FastTree: Computing Large Minimum Evolution

689 Trees with Profiles instead of a Distance Matrix. Mol. Biol. Evol. 26:1641–

690 1650.

691 Proctor LM, Fuhrman JA. 1991. Roles of viral infection in organic particle flux. Mar.

692 Ecol. Prog. Ser. 69:133–142.

693 Quéré CL, Andrew RM, Friedlingstein P, Sitch S, Hauck J, Pongratz J, Pickers PA,

694 Korsbakken JI, Peters GP, Canadell JG, et al. 2018. Global Carbon Budget 2018.

695 Earth Syst. Sci. Data 10:2141–2194.

35 696 Richardson TL, Jackson GA. 2007. Small Phytoplankton and Carbon Export from the

697 Surface Ocean. Science 315:838–840.

698 Sarmiento JL. 2013. Ocean Biogeochemical Dynamics. Princeton University Press

699 Sheyn U, Rosenwasser S, Lehahn Y, Barak-Gavish N, Rotkopf R, Bidle KD, Koren I,

700 Schatz D, Vardi A. 2018. Expression profiling of host and virus during a

701 coccolithophore bloom provides insights into the role of viral infection in

702 promoting carbon export. ISME J.:1.

703 Shibata A, Kogure K, Koike I, Ohwada K. 1997. Formation of submicron colloidal

704 particles from marine bacteria by viral infection. Mar. Ecol. Prog. Ser.

705 155:303–307.

706 Steinberg DK, Mooy BASV, Buesseler KO, Boyd PW, Kobari T, Karl DM. 2008.

707 Bacterial vs. zooplankton control of sinking particle flux in the ocean’s

708 twilight zone. Limnol. Oceanogr. 53:1327–1338.

709 Sullivan MB, Weitz JS, Wilhelm S. 2017. Viral ecology comes of age. Environ.

710 Microbiol. Rep. 9:33–35.

711 Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, Djahanschiri B,

712 Zeller G, Mende DR, Alberti A, et al. 2015. Ocean plankton. Structure and

713 function of the global ocean microbiome. Science 348:1261359.

714 Suttle CA. 2007. Marine viruses--major players in the global ecosystem. Nat. Rev.

715 Microbiol. 5:801–812.

36 716 Tackmann J, Rodrigues JFM, Mering C von. 2018. Rapid inference of direct

717 interactions in large-scale ecological networks from heterogeneous microbial

718 sequencing data. bioRxiv:390195.

719 Tréguer P, Bowler C, Moriceau B, Dutkiewicz S, Gehlen M, Aumont O, Bittner L,

720 Dugdale R, Finkel Z, Iudicone D, et al. 2018. Influence of diatom diversity on

721 the ocean biological carbon pump. Nat. Geosci. 11:27–37.

722 Turner JT. 2015. Zooplankton fecal pellets, marine snow, phytodetritus and the

723 ocean’s biological pump. Prog. Oceanogr. 130:205–248.

724 Unrein F, Gasol JM, Not F, Forn I, Massana R. 2014. Mixotrophic haptophytes are key

725 bacterial grazers in oligotrophic coastal waters. ISME J. 8:164–176.

726 Van Etten JL, Agarkova I, Dunigan DD, Tonetti M, De Castro C, Duncan GA. 2017.

727 Chloroviruses Have a Sweet Tooth. Viruses [Internet] 9. Available from:

728 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408694/

729 de Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, Lara E, Berney C, Le

730 Bescot N, Probert I, et al. 2015. Ocean plankton. Eukaryotic plankton

731 diversity in the sunlit ocean. Science 348:1261605.

732 Ward BA, Follows MJ. 2016. Marine mixotrophy increases trophic transfer efficiency,

733 mean organism size, and vertical carbon flux. Proc. Natl. Acad. Sci. 113:2958–

734 2963.

37 735 Weinbauer MG. 2004. Ecology of prokaryotic viruses. FEMS Microbiol. Rev. 28:127–

736 181.

737 Weinbauer MG, Peduzzi P. 1995. Effect of virus-rich high molecular weight

738 concentrates of seawater on the dynamics of dissolved amino acids and

739 carbohydrates. Mar. Ecol. Prog. Ser. 127:245–253.

740 Weitz JS, Stock CA, Wilhelm SW, Bourouiba L, Coleman ML, Buchan A, Follows MJ,

741 Fuhrman JA, Jover LF, Lennon JT, et al. 2015. A multitrophic model to

742 quantify the effects of marine viruses on microbial food webs and ecosystem

743 processes. ISME J. 9:1352–1364.

744 Weynberg KD, Allen MJ, Wilson WH. 2017. Marine Prasinoviruses and Their Tiny

745 Plankton Hosts: A Review. Viruses 9.

746 Wilhelm SW, Suttle CA. 1999. Viruses and Nutrient Cycles in the SeaViruses play

747 critical roles in the structure and function of aquatic food webs. BioScience

748 49:781–788.

749 Worden AZ, Nolan JK, Palenik B. 2004. Assessing the dynamics and ecology of

750 marine picophytoplankton: The importance of the eukaryotic component.

751 Limnol. Oceanogr. 49:168–179.

752 Yamada Y, Tomaru Y, Fukuda H, Nagata T. 2018. Aggregate Formation During the

753 Viral Lysis of a Marine Diatom. Front. Mar. Sci. [Internet] 5. Available from:

754 https://www.frontiersin.org/articles/10.3389/fmars.2018.00167/full

38 755 Yutin N, Wolf YI, Raoult D, Koonin EV. 2009. Eukaryotic large nucleo-cytoplasmic

756 DNA viruses: Clusters of orthologous genes and reconstruction of viral

757 genome evolution. Virol. J. 6:223.

758 Zhang C, Dang H, Azam F, Benner R, Legendre L, Passow U, Polimene L, Robinson C,

759 Suttle CA, Jiao N. 2018. Evolving paradigms in biological carbon cycling in the

760 ocean. Natl. Sci. Rev. 5:481–499.

761 Supplementary material

762 • Supplement.docx: supplementary figures and tables.

763 • Supplementary_file_1: Information on the predictors (viruses) in the PLS model

764 predicting carbon export efficiency. Predictors are sorted by VIP score.

765 • Supplementary data available at GenomeNet FTP:

766 ftp://ftp.genome.jp/pub/db/community/tara/Cpump/

39