bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 A freshwater radiation of diplonemids 2 3 4 Indranil Mukherjee1#, Michaela M Salcher1, Adrian-Ștefan Andrei1*, Vinicius 5 Silva Kavagutti1,2, Tanja Shabarova1, Vesna Grujčić3, Markus Haber1, Paul 6 Layoun1,2, Yoshikuni Hodoki4, Shin-ichi Nakano4, Karel Šimek1,2, Rohit 7 Ghai1# 8 9 1Department of Aquatic Microbial Ecology, Institute of Hydrobiology,Biology Centre of 10 the Czech Academy of Sciences, , Na Sadkach 7, 37005, České Budějovice, Czech 11 Republic 12 13 2Department of Ecosystem Biology, Faculty of Science, University of South Bohemia, 14 Branišovská 1760, 37005, České Budějovice, Czech Republic 15 16 3KTH Royal Institute of Technology, Science for Life Laboratory, Department of Gene 17 Technology, School of Engineering Sciences in Chemistry, Biotechnology and Health, 18 Stockholm, Sweden 19 20 4Center for Ecological Research, Kyoto University, Otsu, Shiga, Japan 21 22 *Current address: Limnological Station, Institute of Plant and Microbial Biology, 23 University of Zurich, Seestrasse 187, 8802, Kilchberg, Switzerland 24 25 26 27 Keywords: 18S amplicon sequencing/ CARD-FISH/ diplonemids/ freshwater lakes/

28 protists/ metagenomic 18S rRNA assembly

29 Running Title: Diplonemids in freshwater lakes 30 31 32 #Authors to whom correspondence should be addressed

33 Indranil Mukherjee ([email protected])

34 Rohit Ghai ([email protected])

35

36

37

38

1 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

39 Abstract

40 Diplonemids are considered marine protists and have been reported among the most

41 abundant and diverse in the world oceans. Recently we detected the presence

42 of freshwater diplonemids in Lake Biwa, Japan. However, their distribution and

43 abundances in freshwater ecosystems remain unknown. We assessed abundance and

44 diversity of diplonemids from several geographically distant deep freshwater lakes of the

45 world by amplicon-sequencing, shotgun metagenomics and CARD-FISH. We found

46 diplonemids in all the studied lakes, albeit with low abundances and diversity. We

47 assembled long 18S rRNA sequences from freshwater diplonemids and showed that they

48 form a new lineage distinct from the diverse marine clades. Freshwater diplonemids are a

49 sister-group to marine isolates from coastal and bay areas, suggesting a recent habitat

50 transition from marine to freshwater habitats. Images of CARD-FISH targeted freshwater

51 diplonemids suggest they feed on bacteria. Our analyses of 18S rRNA sequences retrieved

52 from single cell genomes of marine diplonemids shows they encode multiple rRNA copies

53 that may be very divergent from each other, suggesting that marine diplonemid abundance

54 and diversity both have been overestimated. These results have wider implications on

55 assessing eukaryotic abundances in natural habitats by using amplicon-sequencing alone.

56

57 Introduction

58 Diplonemids appear as one of the more intriguing findings in amplicon-sequencing

59 surveys of the global oceans microbiota [1]. While the abundance and distribution of these

60 protists were nearly unknown till the last decade, they are now considered (based mainly

61 on metabarcoding approaches) as one of the most abundant and diverse microbial

62 eukaryotic group in the world oceans [2,3]. Marine diplonemid 18S rRNA sequences

63 exhibit high diversity and are the sixth most frequently recovered eukaryotic signatures in

2 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

64 the world’s oceans in amplicon-based surveys [3]. They appear to be more abundant in the

65 mesopelagic zone of the oceans [1,4]. These discoveries have ignited interest in

66 understanding the importance and distribution of these protists in other aquatic non-

67 marine ecosystems. Diplonemids are considered marine protists [5] and were till recently

68 never reported from freshwater ecosystems. Our interest in studying the distribution of

69 free-living kinetoplastids (a sister group of diplonemids) from freshwater lakes using

70 kinetoplastid-specific primers accidentally led us to the discovery of freshwater

71 diplonemids in deep freshwater lakes of Japan [6]. Similar to kinetoplastids, diplonemids

72 are generally not targeted by commonly used eukaryotic V4 primers due to their divergent

73 18S rRNA genes [7,8] and this explains their exclusion from most diversity studies in

74 freshwater environments. This finding suggested that diplonemids are not limited to

75 marine ecosystems and are likely present in other freshwater lakes of the world. To

76 examine the distribution of diplonemids in freshwater habitats, we sampled seven deep

77 freshwater lakes (different depths, both monthly time series and isolated samples) of

78 various trophic statuses from Japan, Czech Republic, and Switzerland. The diversity and

79 abundance of diplonemids were analyzed using PCR based amplicon sequencing, shotgun

80 metagenomic sequencing and CARD-FISH analyses. Our results suggest a broad

81 distribution (albeit with low abundance) of diplonemids in freshwater ecosystems, and the

82 presence of one widespread freshwater clade that appears to be a radiation of known

83 marine isolates.

84

85 Results and Discussions

86 Freshwater diplonemid 18S rRNA gene sequences

87 A single diplonemid sequence (OTU_4) was described recently from the lakes Biwa and

88 Ikeda using 18S rRNA amplicon sequencing [6]. This sequence provided the first

3 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

89 intriguing evidence for the existence of freshwater diplonemids, which have been until

90 now considered as marine eukaryotes [5]. However, more conclusive evidence for the

91 presence of these eukaryotes and their distribution in other freshwater habitats is lacking.

92 As the first diplonemid sequence was described from Lake Biwa, in an effort to more

93 clearly examine the depth- or seasonal-related abundance of freshwater diplonemids, we

94 carried out multiple sampling campaigns with different sampling frequencies at this site.

95 The first campaign in Lake Biwa lasted for three months in spring/summer 2016 and

96 samples were collected from three different depths (Table 1, Supplementary Figure S3).

97 The second campaign lasted throughout the year 2017 with a monthly sampling frequency

98 (Table 1, Supplementary Figure S4). A single depth profile was also collected from Lake

99 Ikeda to examine the vertical distribution of diplonemids (Table 1, Supplementary Figure

100 S5). To expand the known distribution of diplonemids outside Japan, we also collected

101 samples from five European lakes (Řimov, Medard, Zurich, Constance and Lugano). A

102 complete list of sampling sites and their locations, depths and analyses performed with the

103 samples from each site is given in Table 1.

104 Amplicon-sequencing was performed for the sampling campaigns from Lake Biwa

105 and Lake Ikeda (Figure 1A, B, C and Supplementary Figure S3, S4, S5). Relative

106 abundance of diplonemid sequences in these samples was always below 1%. A total of 18

107 sequences were recovered, somewhat more frequently from the hypolimnion. The highest

108 relative abundance was observed in the hypolimnion of Lake Ikeda (100 m), contributing

109 up to 0.96 % of total sequence reads (Figure 1C). Although rare, they were present in

110 nearly all the seasons in Lake Biwa (Figure 1) in both epilimnion (5 m) and metalimnion

111 (10-20 m) but only occasionally in the hypolimnion (50-60 m). The highest amplicon read

112 abundance in Lake Biwa was observed in the hypolimnion of May (0.5% of total

113 sequences, Figure 1A). Diplonemid sequences were more abundant in the epilimnion and

4 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

114 metalimnion during the early part of the spring and as the stratification developed their

115 numbers increased in the hypolimnion (Figure 1A). However, data from the entire year

116 showed the absence of diplonemid sequences in the hypolimnion during summer and

117 autumn (Figure 1B). This discrepancy could be due to the sample collection in different

118 years (spring sampling in 2016 and monthly sampling in 2017) and depths (50 vs. 60 m).

119 However, the low numbers of recovered sequences preclude definitive conclusions

120 regarding seasonal or depth related preferences for diplonemids. Similar results were

121 obtained upon examining the abundance of freshwater diplonemids using an existing

122 shotgun metagenomic timeline from the Řimov reservoir (Figure 1D). Here, diplonemid

123 sequences in the epilimnion were found only during spring or early summer with low

124 abundance (maximum 0.04% of total eukaryotic 18S rRNA reads) (Figure 1D). In contrast

125 to Lake Biwa, diplonemid sequences were found at all seasons in the hypolimnion of the

126 Řimov reservoir with relatively higher proportions (up to 0.6% of total metagenomic

127 reads) (Supplementary Table S2) although these samples were pre-filtered with 5 μm

128 filters, which may have reduced the number of diplonemids. Moreover, the differences in

129 trophic status and sample collection from different depths (Table 1) might be responsible

130 for the observed variation. It is also relevant to mention that dynamics of diplonemids

131 inferred from raw sequence reads is not fully quantitative, as we observed multiple copies

132 of 18S rRNA in single cell genomes from marine diplonemids (Supplementary Table S1).

133 To further examine the distribution of diplonemids at other locations, we sampled

134 several European lakes for shotgun metagenomics (Table 1). As the sequences obtained

135 by amplicon-sequencing were short (300 bp), we wondered if it was possible to identify

136 longer assembled 18S rRNA sequences from freshwater diplonemids in assembled

137 metagenomic contigs. Additionally, we also used previously available metagenomic

138 samples from the Řimov reservoir [17] and other published metagenomic datasets from

5 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

139 the deep Lake Baikal [18,19]. Multiple long sequences were retrieved from all samples

140 except for Lake Baikal and all of them were closely related to the amplicon-derived

141 sequences (See Figure 2). The longest assembled 18S rRNA sequence was 2029 bp

142 suggesting the less examined potential for assembled metagenomic sequence data for

143 recovery of long eukaryotic 18S rRNA sequences.

144 Long sequences (>300 bp) of diplonemids from the metagenomic datasets allowed

145 us to construct a robust maximum-likelihood phylogenetic tree including all closely

146 related sequences of diplonemids from public databases (Figure 2). We observed that the

147 sequences recovered from geographically distant freshwater lakes were highly similar and

148 appeared to be a sister clade of several marine isolates. High similarity of diplonemid

149 sequences from freshwater lakes of the world reiterates what has been suggested already

150 [32] that geographic distance does not restrict the distribution of protists in freshwater

151 lakes around the world. The freshwater sequences formed a separate freshwater cluster

152 that was separated from the marine cultured and environmental sequences with high

153 bootstrap values, suggesting that freshwater diplonemids diverged from the marine ones

154 and are phylogenetically different. The highly congruent freshwater clade provides

155 conclusive evidence for an evolutionary radiation of diplonemids in freshwater

156 ecosystems. Intriguingly, a single isolate (accession MF422200) that was

157 obtained from Tokyo Bay also appeared to be part of this freshwater clade. This might

158 suggest that even though this isolate was obtained in the bay of Tokyo (Supplementary

159 Table S3), it is possible that it is actually derived from the freshwater inflow into the bay.

160 Moreover, the freshwater cluster is closely related to the marine isolates (Figure 2) and

161 majority of those isolates were from bay, estuarine and near-shore regions (Supplementary

162 Table S3) further suggesting a habitat transition of diplonemids from marine to

163 freshwaters.

6 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

164

165 Occurrence, morphology, and potential feeding mode of diplonemids

166 We conducted CARD-FISH with the diplonemid-specific probe ‘Diplo1590R’ [31]

167 targeting all freshwater and marine sequences, to enumerate their abundance in freshwater

168 lakes. However, the number of CARD-FISH stained cells was too low to confidently

169 assess their abundance and dynamics in all the studied lakes. Their abundance was low

170 even during spring, when 18S rRNA read numbers were relatively high. In total, we

171 detected 22 CARD-FISH stained cells from all lakes, again pointing at their rarity in

172 freshwater habitats. However, images of these cells provide some insights about cell size,

173 morphology and potential trophic strategies. All detected diplonemids were ovoid cells in

174 the range of 5-10 μm in length (Figure 3, Supplementary Figure S6). Thus, the freshwater

175 diplonemids observed in the present study are smaller when compared to the cultured

176 marine diplonemids (which are around 20 μm in size), and also the ones observed from

177 marine waters by single cell sorting, which are much diverse in size (7-20 μm) and shape

178 [20].

179 There is little information on the feeding behaviour of diplonemids apart from the

180 consensus that they are heterotrophic. Some studies have also reported diplonemids as

181 mainly benthic dwellers feeding by phagotrophy on bacteria, osmotrophy or by detrivory

182 [33,34]. Protists of size ranges observed here are also known to be efficient grazers of

183 bacteria in aquatic ecosystems [35–39]. All CARD-FISH positive cells observed here also

184 appeared to contain several bacteria (Figure 3). However, it is impossible to conclude

185 from these images whether or not these microbes are inside the cells (suggesting

186 phagotrophic behavior or symbiosis) or outside (epibionts, commensals, etc.) (Figure 3).

187 Although there is no direct evidence of their feeding behavior from environmental

188 samples, diplonemid sequences were detected from the larger planktonic fractions (180-

7 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

189 2000 µm) in TARA Ocean data and suggested to possess a parasitic or symbiotic lifestyle

190 [40]. Moreover, a recent study reported the presence of endosymbiotic bacteria inside the

191 cells of some cultured diplonemids [41], which might mimic bacterivory. Direct evidence

192 using FLB (fluorescently labelled bacteria) uptake experiments [42,43] are necessary to

193 confirm their phagotrophy on bacteria.

194

195 Copy number of 18S rRNA genes in diplonemids

196 Diplonemids have divergent rRNA genes and due to the large number of mismatches, they

197 are generally not targeted in diversity studies using universal eukaryotic primers

198 (especially V4 primers) [7,8]). This explains why they remained undetected in a majority

199 of eukaryotic diversity studies in the world oceans until the TARA expedition. TARA

200 strategically used primers targeting the V9 region [3], which is a short region (~150 bp)

201 considered best to capture a large diversity [10]. Although sequence abundance of

202 diplonemids was high in the global oceanic waters, direct quantitative evidence of their

203 actual abundances is scarce. Only a single study from the Atlantic Ocean quantified the

204 abundance of diplonemids from various depths using CARD-FISH and found their

205 abundance to be <1% of total planktonic eukaryotes [31], similar to our observations from

206 freshwaters. In the microscopic analysis we could barely observe CARD-FISH positive

207 cells even in those samples, which had relatively higher sequence numbers assigned to

208 diplonemids. These observations suggest that diplonemids (especially marine ones)

209 possess high copy numbers of divergent 18S rRNA genes, that could contribute to their

210 amplified sequence abundance [44]. In order to test this, we analyzed ten previously

211 available single cell genomes from marine diplonemids [20]. These genomes are very

212 incomplete (most complete being 9%), so the number of actual 18S rRNAs detected in

213 these is likely to be severely underestimated. Even so, we detected multiple 18S rRNA

8 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

214 gene sequences in these incomplete genomes (up to 21 sequences >300bp in cell_13,

215 Supplementary Table S4). Moreover, the sequences from the same cell did not cluster

216 with each other in a phylogenetic tree in several cases (e.g. cell-13, cell-27 and cell-4sb),

217 suggesting a high intra-genomic diversity of this frequently used gene for assessing protist

218 diversity [44] (Supplementary Figure S2). As the raw sequence data for the single cell

219 genomes are not available, we used the coverage of the contigs to obtain an estimate of

220 18S rRNA copy number (Supplementary Table S4). A conservative estimate, using

221 median coverage of an 18S rRNA containing contig vs coverage of the remaining contigs

222 (>3kb) suggests multiple copies of 18S rRNA genes in one single cell (up to 141 copies,

223 Supplementary Table S1 and Supplementary Table S4). Only a single, related isolate

224 genome is available (Diplonema papillatum sp.) with an estimated genome completeness

225 of ca. 30% (Supplementary Figure S1). We could detect 6 copies of 18S rRNAs, which

226 appears to be somewhat low [22]. However, we could not estimate the putative number of

227 rRNAs, as coverage information for contigs was not available. Given such wide variations

228 in the number of 18S rRNA copies and sequence divergence in the presently available

229 incomplete genomic data, accurate quantifications of protist diversity or abundance by

230 sequencing data alone are expected to remain a pressing issue until improved genomes are

231 available.

232

233 Conclusions

234 We describe a clade of freshwater diplonemids from multiple lakes around the world.

235 Freshwater diplonemids are related to cultured marine diplonemids and distinct from the

236 as yet uncultured marine diplonemids. Direct observations using CARD-FISH also

237 suggest that freshwater diplonemids are consistently smaller in size in comparison to those

238 from the marine habitat. Although diplonemid 18S rRNA gene sequences were found to

9 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

239 be highly abundant in marine waters [3], a prior quantification of marine diplonemids

240 using CARD-FISH suggests that they are rare components of the marine habitat [31]. It

241 has also been recently suggested that estimates from amplicon sequencing are far from

242 quantitative [44,45]. Taken together with the highly divergent and multiple 18S sequences

243 derived from single diplonemid cells, it appears the actual abundances of marine

244 diplonemids using sequence data might have been overestimated. Direct counts by

245 CARD-FISH using an identical probe that targets both freshwater and marine clades

246 suggest that diplonemid abundances in both marine and freshwater ecosystems are very

247 low. However, amplicon-sequencing results from both habitats are drastically different,

248 while a large diversity of diplonemid sequences have been obtained from the marine

249 habitat [3], very limited diversity was observed in freshwaters. This suggests that

250 freshwater diplonemids are less diverse than their marine counterparts and their close

251 relationship indicates a recent habitat transition from marine to freshwaters. Nevertheless,

252 it is important to note that this contradictory result of diversity and abundance might not

253 be restricted to diplonemids alone and may also extend to other protistan groups as well

254 [7,43,45]. Further studies using single-cells, cultures, genome sequencing and

255 experimental approaches will be necessary for an improved understanding of the

256 ecological role and evolutionary history of these poorly understood groups of protists.

257

258 Materials and Methods

259 Sampling: Samples were collected from seven freshwater lakes, in Japan (Biwa- 35°12′

260 N 135°59′ E and Ikeda- 31°14′ N 130°33′ E), Czech Republic (Řimov- 48°51′ N 14°29′ E,

261 and Medard- 50°10′ N 12°35′ E), and Switzerland (Zurich- 47°18’N 8°34’E, Lugano-

262 45°59'N 8°58'E, and Constance- 47°32'N 9°31'E) (Table 1). All these lakes are >100m

263 deep with the exception of the Řimov reservoir and Medard, which are 43 and 50m deep,

10 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

264 respectively. Samples from Lake Biwa, Japan were collected from three depths

265 representing epilimnion, metalimnion and hypolimnion from station Ie-1, a long-term

266 limnological survey station of Kyoto University, Japan. These samples were collected

267 monthly for one year in 2017 (from depths 5, 20 and 60 m) and also from a high

268 frequency weekly sampling, which was conducted for three months during the spring of

269 2016 (from depths 5, 10 and 50 m) (Table 1). Samples from the Řimov Reservoir, Czech

270 Republic were collected seasonally for two years (2016 and 2017) from two depths (0.5

271 and 30 m), representing epilimnion and hypolimnion (Table 1) from a station located at

272 the dam (deepest part of the reservoir). For other lakes, samples were collected once or

273 twice during the thermal stratification period from depths representing the epilimnion and

274 hypolimnion from stations located at the deepest part of the lakes. Samples were collected

275 with a 5-liter Niskin sampler (General Oceanics, Miami, USA) or with a Freidinger

276 sampler. Temperature, concentration of chlorophyll a and dissolved oxygen was

277 determined with a multiparameter probe (911 plus, Sea Bird Electronics, Inc., USA; or

278 YSI 6600 multiparameter sonde V2, Yellow Springs Instruments, USA) (Figure 1).

279

280 DNA extraction, amplicon sequencing and data processing: Amplicon sequencing was

281 conducted only from samples collected from Biwa and Ikeda lakes. Water from the

282 different sampling depths was pre-filtered just after collection using a 20-μm mesh

283 plankton net to exclude larger organisms. One to two liters were filtered onto 0.8 μm

284 polycarbonate filters (47 mm diameter, Costar) at low vacuum (7.5 cm Hg) and

285 immediately frozen at -30°C until analysis. DNA was extracted with the Power Soil DNA

286 isolation and purification kit (MoBio laboratories, Carlsbad, CA, USA) and quantified

287 using a Nanodrop ND-1000 spectrophotometer (NanoDrop Technologies, Inc.,

288 Wilmington, DE, USA). Polymerase chain reactions (PCR) were conducted using

11 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

289 universal eukaryotic primers Euk1391f and EukBr to amplify the V9 region of 18S rRNA

290 genes of all eukaryotes [9,10]. PCRs were performed in 25 µl of reaction volume

291 according to the 18S Illumina amplicon protocol of the Earth Microbiome Project

292 (http://www.earthmicrobiome.org/protocols-and-standards/18s/). Amplicons were

293 examined using agarose gel electrophoresis and DNA was purified using magnetic beads.

294 Samples were quantified using Qubit. The samples were pooled to have uniform DNA

295 concentration in each and were sequenced using the Illumina MiSeq platform. The

296 processing and quality control of the sequencing data were conducted using DADA2 [11].

297 Chimera check was conducted with DADA2 and also with the reference-based chimera

298 searches against the PR2 database [12]. Amplicons with quality score more than Q30 were

299 retained and amplicons were trimmed to have 300 bp using the ‘Filter and Trim’ function

300 of DADA2. A total of 656,427 reads were obtained after processing. Classification of

301 unique sequences was conducted against the SILVA 132 reference database [13]. The

302 unclassified sequences were additionally examined for their closest relatives with BLAST

303 searches against the NCBI NT database [14].

304

305 Filtration and DNA extraction for shotgun metagenomic analysis: Shotgun

306 metagenomic sequencing was conducted for all lakes except for the Japanese ones (Biwa

307 and Ikeda). Water samples (ca. 10 L each) from lakes Řimov (Czech Republic, depths 0.5

308 m and 30 m), Medard (Czech Republic, depths 5 m and 20 m), Zurich (Switzerland,

309 depths 5 m and 80 m), Constance (Switzerland/Germany, depths 5 m and 200 m) and

310 Lugano (Switzerland/Italy, depths 5 m and 50 m) were progressively filtered through 20

311 μm, 5 μm, and 0.22 μm polycarbonate or polysulfone membrane filters (Sterlitech, USA).

312 DNA extraction was performed from the 0.22 μm filters (containing the 5–0.22 μm

12 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

313 microbial size fraction) using the ZR Soil Microbe DNA MiniPrep kit (Zymo Research,

314 Irvine, CA, USA), following the manufacturer’s instructions.

315

316 Preprocessing of metagenomic datasets: Shotgun metagenome sequencing was

317 performed with Novaseq 6000 (2 × 151 bp) (Novogene, HongKong, China). Low quality

318 bases/reads and adaptor sequences were removed from the raw Illumina reads using the

319 bbmap package [15]. Briefly, the reads were quality trimmed by bbduk.sh (using a Phred

320 quality score of 18). Subsequently, bbduk.sh was used for adapter trimming and

321 identification/removal of possible PhiX and p-Fosil2 contamination. Additional checks

322 (i.e., de novo adapter identification with bbmerge.sh) were performed in order to ensure

323 that the datasets meet the quality threshold necessary for assembly. Each metagenomic

324 dataset (i.e. each sample) was assembled independently with MEGAHIT (v1.1.5) [16]

325 using the k-mer sizes: 49, 69, 89, 109, 129, 149, and default settings. Previously available

326 metagenomes and assemblies from Řimov reservoir [17] and Lake Baikal were included

327 in the analysis [18,19].

328

329 Genomic data processing: Assembled sequence data for single cell derived genomes

330 from 10 published marine diplonemids were downloaded [20]. Genome completeness

331 estimates performed by us (and similar to those reported earlier) for these were found to

332 be very low with the best assembly missing 91% of conserved single copy orthologs (as

333 reported by BUSCO [21]). The assembled Diplonema papillatum genome [22] was

334 downloaded from NCBI (accession LMZG00000000). Only the assembled genome is

335 available and raw data was not available although it was mentioned to be 21.5Gb. The

336 genome assembly size is ca. 107Mb (shortest contig 1000bp, longest contig 100kb). The

337 authors estimated the genome size to be 175Mb, but no details were provided on how this

13 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

338 estimate was obtained [22]. Genome completeness estimates for all available genomes

339 (Diplonema papillatum genome and the 10 marine diplonemid SAGs) were obtained using

340 BUSCO [21] with the lineage (euglenozoa_odb10 version 2019-11-21) (busco

341 -m genome -i genome.fna -o output --lineage_dataset euglenozoa_odb10) (Supplementary

342 Figure S1). Eukaryotic small ribosomal subunit sequences were detected in these genomic

343 assemblies using ssu-align [23]. Only those sequences that were >300bp were retained for

344 further analysis. The results are summarized in Supplementary Table S1.

345

346 Retrieval of 18S rRNA sequences from assembled shotgun metagenomic datasets: All

347 diplonemid sequences were extracted from the SILVA database version 138 [13]. These

348 sequences along with the freshwater diplonemid sequence (OTU_4) described previously

349 [6], were used as queries for BLASTN against the metagenomic datasets. Hits with e-

350 value <1e-5, alignment length >200 bp and percentage identity >90% to the queries were

351 extracted. These candidate sequences were aligned to the global SILVA alignment for

352 rRNA genes using the SINA web aligner [24] and subsequently inserted into the guide

353 tree of SILVA SSU database 138 RefNR 99 in ARB [25] using maximum parsimony and

354 only those sequences that affiliated to diplonemids and of length >300 bp were retained.

355

356 Quantification of 18S rRNA sequences of freshwater diplonemids in a shotgun

357 metagenomic time-series from Řimov reservoir: All assembled freshwater diplonemid

358 sequences (from previous section) along with OTU_4 from Lake Biwa [6] were added to

359 the existing SILVA 138 database of 18S sequences. At least half the reads from each

360 metagenomic dataset (minimum 94 million reads, maximum 277 million reads) were

361 scanned by ssu-align to extract potential eukaryotic 18S reads (minimum ca. 65,000,

362 maximum 180,000 found) and these were used as queries against the SILVA138 database

14 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

363 supplemented with freshwater diplonemid sequences using mmseqs2 [26]. Metagenomic

364 query sequences matching the freshwater diplonemid sequences with >95% identity,

365 evalue <1e-5 and alignment length >50bp were considered to originate from freshwater

366 diplonemids (Supplementary Table S2).

367

368 18S rRNA phylogenetic tree: Assembled 18S rRNA sequences identified to be from

369 freshwater diplonemids were aligned to all known clades of using PASTA [27]

370 and a maximum likelihood phylogenetic tree (Figure 2) was made using IQ-TREE 2 using

371 automatic model selection (TIM3e+R10 model), ultrafast bootstraps and SH-aLRT tests (-

372 bb 1000 -alrt 1000) [28–30]. Additionally, all 18S sequences (>300 bp) from the single

373 cell diplonemid genomes and the Diplonema papillatum genome were aligned to the same

374 set of sequences and a phylogenetic tree (Supplementary Figure S2) was created as

375 described above.

376

377 Catalyzed Reporter Deposition-Fluorescent in situ Hybridization (CARD-FISH):

378 Samples for CARD-FISH were collected from the same depths as the samples used for

379 DNA extractions (Table 1). Water samples were fixed in a 2% final concentration of

380 formaldehyde (freshly prepared by filtering through 0.2 µm syringe filter) for at least 3-4

381 hours before filtration. 50 ml of epilimnion/metalimnion and 100 ml of hypolimnion

382 samples were filtered through polycarbonate filters (pore size 0.8 µm, Advantec), rinsed

383 twice with 1X PBS and twice with MilliQ water, air dried and frozen at -20°C until further

384 processing. CARD-FISH was performed according to the method described previously

385 [7]. The filters were hybridized at 35°C for 12 h with a 0.5 µg ml-1 probe concentration

386 and a 30% concentration of formamide. The probe targeting diplonemids ‘DiploR1792’

387 [31] was purchased from Biomer, Germany. Counting was performed using an Olympus

15 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

388 BX 53 epifluorescence microscope under 1000 × magnification at blue/UV excitation.

389 Total microbial eukaryotes were counted simultaneously by DAPI staining under UV

390 excitation. Images were taken and processed using a semiautomatic image analysis system

391 (NIS-Elements 3.0, Laboratory Imaging, Prague, Czech Republic).

392

393 Data Accessibility

394 The sequence data of Japanese lakes generated from amplicon sequencing were submitted

395 to European Nucleotide Archive (ENA) and are available under the BioProject:

396 PRJEB37597, BioSamples ERS4413153–226. Sequence data for all metagenomes

397 generated in this work are archived at EBI European Nucleotide Archive and can be

398 accessed under the BioProject PRJEB35770, BioSamples ERS4146214-50 (Řimov

399 reservoir), BioProject PRJEB35640, BioSamples ERS4065084-85 (Lake Medard),

400 ERS4065070-71 (Lake Lugano), BioProject PRJEB35770, BioSamples ERS4409418–21

401 (Lake Constance), BioSamples ERS4409422–23 (Lake Zurich). All 18S rRNA sequences,

402 alignments and trees are available at figshare

403 (https://doi.org/10.6084/m9.figshare.12091749.v1). This study did not generate any code.

404

405 Author’s Contributions

406 IM and RG conceived the study. IM performed sampling, amplicon analyses and CARD-

407 FISH. RG performed metagenomic and phylogenetic analyses. MMS, A-ŞA and VSK

408 performed sampling and phylogenetic analyses. A-ŞA, TS, VG, MH, PL, YH, SN, KS

409 performed sampling and DNA extractions. IM and RG drafted the manuscript with the

410 input of all other authors.

411

412

16 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

413 Competing interests

414 The authors declare that they have no competing interests.

415

416 Funding

417 This study was supported by the JSPS Bilateral Japanese-Czech Joint Research Project

418 No. JSPS-17-17, funding the collaboration of KS, SN, IM, VG and YH. IM was supported

419 by the research grants CZ.02.1.01/0.0/ 0.0/16_025/0007417 (ERDF/ESF) and 20-12496X

420 (Grant Agency of the Czech Republic). RG was supported by the research grants 17-

421 04828S, 20-12496X (Grant Agency of the Czech Republic) and CZ.02.1.01/0.0/

422 0.0/16_025/0007417 (ERDF/ESF). MMS was supported by the research grants 20-

423 12496X (Grant Agency of the Czech Republic) and 310030_185108 (Swiss National

424 Science Foundation). A-ŞA was supported by the research grants 17-04828S (Grant

425 Agency of the Czech Republic) and MSM200961801 (Academy of Sciences of the Czech

426 Republic). VSK was supported by the research grants 17-04828S, 20-12496X (Grant

427 Agency of the Czech Republic) and 116/2019/P (Grant Agency of the University of South

428 Bohemia in České Budějovice 2019-2021). MH was supported by the research grant

429 310030_185108 (Swiss National Science Foundation). PL was supported by the research

430 grant 19-23469S (Grant Agency of the Czech Republic) and 022/2019/P (Grant Agency of

431 the University of South Bohemia in České Budějovice 2019-2021).

432

433 Acknowledgements

434 We are thankful to the captains of the research vessel ‘Hasu’, Y. Goda and T. Akatsuka

435 for their help in sample collection in Lake Biwa. We are also thankful to Shohei Fujinaga

436 for his help in sample collection and data analysis. P. Znachor, P. Rychtecky and J.

437 Nedoma are acknowledged for help in sampling of Řimov Reservoir and Lake Medard. T.

17 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

438 Posch, E. Loher and V. Lanta are thanked for help in the sample collection of Lake

439 Zurich. S. Dirren and the crew of the research vessel ‘Kormoran’ are thanked for help in

440 sampling of Lake Constance, and F. Leporelli and V. Lanta for help in sampling of Lake

441 Lugano.

442

443 References

444 1. Lukeš J, Flegontova O, Horák A. 2015 Diplonemids. Curr. Biol. 25, R702–704.

445 (doi:https://doi.org/10.1016/j.cub.2015.04.052)

446 2. Lara E, Moreira D, Vereshchaka A, López-García P. 2009 Pan-oceanic distribution of

447 new highly diverse clades of deep-sea diplonemids. Environ. Microbiol. 11, 47–55.

448 (doi:10.1111/j.1462-2920.2008.01737.x)

449 3. de Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, et al. 2015 Eukaryotic

450 plankton diversity in the sunlit ocean. Science 348, 1261605. (doi:

451 10.1126/science.1261605)

452 4. Flegontova O, Flegontov P, Malviya S, Audic S, Wincker P, de Vargas C, Bowler C,

453 Lukeš J, Horák A. 2016 Extreme Diversity of Diplonemid Eukaryotes in the Ocean. Curr.

454 Biol. 26, 3060–3065. (doi:10.1016/j.cub.2016.09.031)

455 5. Tashyreva D, Prokopchuk G, Votýpka J, Yabuki A, Horák A, Lukeš J. 2018 Life cycle,

456 ultrastructure, and phylogeny of new diplonemids and their endosymbiotic bacteria. MBio

457 9, 1–20. (doi:10.1128/mBio.02447-17)

458 6. Mukherjee I, Hodoki Y, Okazaki Y, Fujinaga S, Ohbayashi K, Nakano SI. 2019

459 Widespread Dominance of Kinetoplastids and Unexpected Presence of Diplonemids in

460 Deep Freshwater Lakes. Front. Microbiol. 10, 1–13. (doi:10.3389/fmicb.2019.02375)

18 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

461 7. Mukherjee I, Hodoki Y, Nakano S. 2015 Kinetoplastid flagellates overlooked by

462 universal primers dominate in the oxygenated hypolimnion of Lake Biwa, Japan. FEMS

463 Microbiol. Ecol. 91, 1–11. (doi:10.1093/femsec/fiv083)

464 8. Bochdansky AB, Clouse MA, Herndl GJ. 2017 Eukaryotic microbes, principally fungi

465 and labyrinthulomycetes, dominate biomass on bathypelagic marine snow. ISME J. 11,

466 362–373. (doi:10.1038/ismej.2016.113)

467 9. Amaral-Zettler LA, McCliment EA, Ducklow HW, Huse SM. 2009 A method for

468 studying protistan diversity using massively parallel sequencing of V9 hypervariable

469 regions of small-subunit ribosomal RNA Genes. PLoS One 4, 1–9.

470 (doi:10.1371/journal.pone.0006372)

471 10. Stoeck T, Bass D, Nebel M, Christen R, Jones MDM, Breiner HW, Richards TA. 2010

472 Multiple marker parallel tag environmental DNA sequencing reveals a highly complex

473 eukaryotic community in marine anoxic water. Mol. Ecol. 19, 21–31. (doi:10.1111/j.1365-

474 294X.2009.04480.x)

475 11. Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP. 2016

476 DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods

477 13, 581–583. (doi:10.1038/nmeth.3869)

478 12. Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, et al. 2013 The Protist

479 Ribosomal Reference database (PR2): A catalog of unicellular Small Sub-Unit

480 rRNA sequences with curated taxonomy. Nucleic Acids Res. 41, 597–604.

481 (doi:10.1093/nar/gks1160)

482 13. Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO.

483 2012 The SILVA ribosomal RNA gene database project: improved data processing and

484 web-based tools. Nucleic Acids Res. 41, D590–D596. (doi:10.1093/nar/gks1219)

19 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

485 14. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990 Basic local alignment

486 search tool. J. Mol. Biol. 215, 403–410. (doi:10.1016/S0022-2836(05)80360-2)

487 15. Bushnell B. 2016 BBMap Short-read aligner, and other bioinformatics tools.

488 Bioinformatics. (doi:https://sourceforge.net/projects/bbmap/)

489 16. Li D, Luo R, Liu CM, Leung CM, Ting HF, Sadakane K, Yamashita H, Lam TW.

490 2016 MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced

491 methodologies and community practices. Methods 102, 3–11.

492 (doi:10.1016/j.ymeth.2016.02.020)

493 17. Kavagutti VS, Andrei AŞ, Mehrshad M, Salcher MM, Ghai R. 2019 Phage-centric

494 ecological interactions in aquatic ecosystems revealed through ultra-deep metagenomics.

495 Microbiome 7, 0–15. (doi:10.1186/s40168-019-0752-0)

496 18. Cabello-Yeves PJ, Zemskaya TI, Rosselli R, Coutinho FH, Zakharenko AS, Blinov V

497 V, Rodriguez-Valera F, Drake HL. 2018 Genomes of Novel Microbial Lineages

498 Assembled from the Sub-Ice Waters of Lake Baikal. Appl. Environ. Microbiol. 84,

499 e02132–17. (doi: 10.1128/AEM.02132-17)

500 19. Cabello‐ Yeves PJ, Zemskaya TI, Zakharenko AS, Sakirko M V., Ivanov VG, Ghai R,

501 Rodriguez‐ Valera F. 2019 Microbiome of the deep Lake Baikal, a unique oxic

502 bathypelagic habitat. Limnol. Oceanogr., lno.11401. (doi:10.1002/lno.11401)

503 20. Gawryluk RMR, del Campo J, Okamoto N, Strassert JFH, Lukeš J, Richards TA,

504 Worden AZ, Santoro AE, Keeling PJ. 2016 Morphological Identification and Single-Cell

505 Genomics of Marine Diplonemids. Curr. Biol. 26, 3053–3059.

506 (doi:10.1016/j.cub.2016.09.013)

507 21. Seppey M, Manni M, Zdobnov EM. 2019 BUSCO: Assessing genome assembly and

508 annotation completeness. In Methods in Molecular Biology, pp. 227–245. Humana Press

509 Inc. (doi:10.1007/978-1-4939-9173-0_14)

20 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

510 22. Morales J, Hashimoto M, Williams TA, Hirawake-Mogi H, Makiuchi T, Tsubouchi A,

511 et al. 2016 Differential remodelling of peroxisome function underpins the environmental

512 and metabolic adaptability of diplonemids and kinetoplastids. Proc. R. Soc. B Biol. Sci.

513 283, 20160520. (doi:10.1098/rspb.2016.0520)

514 23. Nawrocki EP. 2009 Structural alignment and RNA homology search and alignment

515 using covariance models (Ph. D. thesis). Washington University in Saint Louis, School of

516 Medicine. (doi:https://openscholarship.wustl.edu/etd/256/)

517 24. Pruesse E, Peplies J, Glöckner FO. 2012 SINA: Accurate high-throughput multiple

518 sequence alignment of ribosomal RNA genes. Bioinformatics 28, 1823–1829.

519 (doi:10.1093/bioinformatics/bts252)

520 25. Ludwig W, Strunk O, Westram R, Richter L, Meier H, Buchner A, et al. 2004 ARB: a

521 software environment for sequence data. Nucleic Acids Res. 32, 1363–1371.

522 (doi:10.1093/nar/gkh293)

523 26. Steinegger M, Söding J. 2017 MMseqs2 enables sensitive protein sequence searching

524 for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028.

525 (doi:10.1038/nbt.3988)

526 27. Mirarab S, Nguyen N, Guo S, Wang LS, Kim J, Warnow T. 2015 PASTA: Ultra-large

527 multiple sequence alignment for nucleotide and amino-acid sequences. J. Comput. Biol.

528 22, 377–386. (doi:10.1089/cmb.2014.0156)

529 28. Quang B, Schmidt HA, Chernomor O, Schrempf D, Woodhams MD, Von Haeseler A,

530 Lanfear R. 2020 IQ-TREE 2: New Models and Efficient Methods for Phylogenetic

531 Inference in the Genomic Era. Mol. Biol. Evol. 37, 1530–1534.

532 (doi:10.1093/molbev/msaa015)

21 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

533 29. Kalyaanamoorthy S, Minh BQ, Wong TKF, Von Haeseler A, Jermiin LS. 2017

534 ModelFinder: Fast model selection for accurate phylogenetic estimates. Nat. Methods 14,

535 587–589. (doi:10.1038/nmeth.4285)

536 30. Hoang DT, Chernomor O, von Haeseler A, Minh BQ, Vinh LS. 2018 UFBoot2:

537 Improving the Ultrafast Bootstrap Approximation. Molecular biology and evolution. Mol.

538 Biol. Evol. 35, 518–522. (doi:10.5281/zenodo.854445)

539 31. Morgan-Smith D, Clouse MA, Herndl GJ, Bochdansky AB. 2013 Diversity and

540 distribution of microbial eukaryotes in the deep tropical and subtropical North Atlantic

541 Ocean. Deep. Res. Part I Oceanogr. Res. Pap. 78, 58–69. (doi:10.1016/j.dsr.2013.04.010)

542 32. Boenigk J, Pfandl K, Stadler P, Chatzinotas A. 2005 High diversity of the ‘Spumella-

543 like’ flagellates: An investigation based on the SSU rRNA gene sequences of isolates

544 from habitats located in six different geographic regions. Environ. Microbiol. 7, 685–697.

545 (doi:10.1111/j.1462-2920.2005.00743.x)

546 33. López-García P, Vereshchaka A, Moreira D. 2007 Eukaryotic diversity associated

547 with carbonates and fluid-seawater interface in Lost City hydrothermal field. Environ.

548 Microbiol. 9, 546–554. (doi:10.1111/j.1462-2920.2006.01158.x)

549 34. Roy J, Faktorová D, Benada O, Lukeš J, Burger G. 2007 Description of

550 euleeides n. sp. (Diplonemea), a free-living marine euglenozoan. J. Eukaryot. Microbiol.

551 54, 137–145. (doi:10.1111/j.1550-7408.2007.00244.x)

552 35. Massana R, Unrein F, Rodríguez-Martínez R, Forn I, Lefort T, Pinhassi J, Not F. 2009

553 Grazing rates and functional diversity of uncultured heterotrophic flagellates. ISME J. 3,

554 588–595. (doi:10.1038/ismej.2008.130)

555 36. Boenigk J, Arndt H. 2002 Bacterivory by heterotrophic flagellates: community

556 structure and feeding strategies. Antonie Van Leeuwenhoek 81, 465–480. (doi:

557 https://doi.org/10.1023/A:1020509305868)

22 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

558 37. Šimek K, Kasalický V, Jezbera J, Horňák K, Nedoma J, Hahn MW, Bass D, Jost S,

559 Boenigk J. 2013 Differential freshwater flagellate community response to bacterial food

560 quality with a focus on Limnohabitans bacteria. ISME J. 7, 1519–1530.

561 (doi:10.1038/ismej.2013.57)

562 38. SŠimek K, Nedoma J, Znachor P, Kasalický V, Jezbera J, Hornňák K, Sed’a J. 2014 A

563 finely tuned symphony of factors modulates the microbial food web of a freshwater

564 reservoir in spring. Limnol. Oceanogr. 59, 1477–1492. (doi:10.4319/lo.2014.59.5.1477)

565 39. Šimek K, Grujčić V, Hahn MW, Horňák K, Jezberová J, Kasalický V, Nedoma J,

566 Salcher MM, Shabarova T. 2018 Bacterial prey food characteristics modulate community

567 growth response of freshwater bacterivorous flagellates. Limnol. Oceanogr. 63, 484–502.

568 (doi:10.1002/lno.10759)

569 40. Lima-Mendez G, Faust K, Henry N, Decelle J, Colin S, Carcillo F, et al. 2015

570 Determinants of community structure in the global plankton interactome. Science 348,

571 1262073. (doi:10.1126/science.1262073)

572 41. George EE, Husnik F, Tashyreva D, Prokopchuk G, Horák A, Kwong WK, Lukeš J,

573 Keeling PJ. 2020 Highly Reduced Genomes of Protist Endosymbionts Show Evolutionary

574 Convergence. Curr. Biol. 30, 925-933.e3. (doi:10.1016/j.cub.2019.12.070)

575 42. Šimek K, Grujčić V, Nedoma J, Jezberová J, Šorf M, Matoušů A, et al. 2019

576 Microbial food webs in hypertrophic fishponds: Omnivorous ciliate taxa are major

577 protistan bacterivores. Limnol. Oceanogr. 64, 2295–2309. (doi:10.1002/lno.11260)

578 43. Grujcic V, Nuy JK, Salcher MM, Jensen M, Simek K. 2018 Cryptophyta as major

579 bacterivores in freshwater summer plankton. ISME J. , 1668–1681. (doi:10.1038/s41396-

580 018-0057-5)

581 44. Caron DA, Hu SK. 2018 Are We Overestimating Protistan Diversity in Nature? Trend

582 Microbiol. 27, 197–205. (doi:10.1016/j.tim.2018.10.009)

23 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

583 45. Piwosz K, Shabarova T, Pernthaler J, Posch T, Šimek K, Porcal P, Salcher MM. 2020

584 Bacterial and Eukaryotic Small-Subunit Amplicon Data Do Not Provide a Quantitative

585 Picture of Microbial Communities, but They Are Reliable in the Context of Ecological

586 Interpretations. mSphere 5. (doi:10.1128/msphere.00052-20)

587

588

589

590

591

592

593

594

595

596

597

598

599

600

601

602

603

604

605

606

607

24 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

608 Table 1. Details of sampling sites Lake, Trophic Surface Z max Sampling depths Sampling date Analyses country status area (km2) (m) (m) performed

Biwa, JP Mesotrophic 674.0 104 5, 10, 50 (2016) Mar-May 2016 (weekly) AS, CF 5, 20, 60 (2017) 2017 (monthly) AS, CF

Ikeda, JP Mesotrophic 11.0 200 5, 50, 100, 200 25.11.2017 AS, CF

Řimov, CZ Eutrophic 2.1 43 0.5, 30 Apr, Aug, Nov 2016, MG, CF 2017 (monthly) MG, CF

Medard, CZ Oligotrophic 4.9 50 5, 20 09.07.2019 MG

Constance, Mesotrophic 536 252 5, 200 03.07.2018 MG, CF CH/DE/AT

Zurich, CH Oligo- 88.7 136 5, 80 11.07.2018 MG, CF mesotrophic 13.11.2019 CF

Lugano, Eutrophic 48.7 288 5, 50 02.04.2019 MG CH/IT 05.11.2019 CF

609 Abbreviations: JP, Japan; CZ, Czech Republic; CH, Switzerland; DE, Germany; AT, Austria; IT, 610 Italy; AS, 18S amplicon sequencing, MG, shotgun metagenomics; CF, CARD-FISH

611

612

613

25 bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B C Proportion of diplonemid 18S rRNA reads (%) 0 0.2 0.4 0.6 0.8 1.0 1.2 0.5 30 6 0.5 30 6 0 5m 5m Chl. (μg/L) 25 5 25 5 0.4 Chl. (μg/L) 0.4 50 Temp. (°C) Temp. (°C) 20 4 20 4 0.3 0.3 15 3 15 3 100 0.2 0.2 Temp. (°C) Chl. (μg/L) Depth (m) 10 2 10 2 150 Lake Ikeda 0.1 5 1 0.1 5 1 Profile 2017 200 0 0 0 0 0 0 J F M A M J J A S O N D 0 5 10 15 20 25 30 March April May

Proportion of diplonemid 18S rRNA reads (%) Proportion of diplonemid 18S rRNA 0 1 2 3 4 D 0.5 30 6 0.5 30 6 0.6 25 50 10m 20m 0.5m 0.4 25 5 0.4 25 5 0.5 20 40 Chl. (μg/L) 20 4 20 4 0.4 0.3 0.3 15 30 Temp. (°C) 15 3 Temp. (°C) 15 3 0.3 Chl. (μg/L) 0.2 Temp. (°C) 0.2 10 20 10 2 10 2 0.2 Chl. (μg/L)

0.1 5 1 0.1 5 1 0.1 5 10

0 0 0 0 0 0 0 0 0 March May April Apr Aug Nov Apr May Jun Jul Aug Proportion of diplonemid 18S rRNA reads (%) Proportion of diplonemid 18S rRNA 2016 2017

0.5 30 6 0.5 30 6 0.6 25 50 50m 60m 30m 0.4 25 5 0.4 25 5 0.5 20 40 20 4 20 4 0.4 0.3 0.3 15 30 15 3 15 3 0.3 0.2 0.2 10 20 Temp. (°C) Temp. (°C) 10 2 10 2 0.2 Temp. (°C) 0.1 Chl. (μg/L) 5 1 0.1 5 1 0.1 5 10 Chl. (μg/L) Chl. (μg/L) 0 0 0 0 0 0 0 0 0 March May J F M A M J J A S O N D April Apr Aug Nov Apr May Jun Jul Aug Proportion of diplonemid 18S rRNA reads (%) Proportion of diplonemid 18S rRNA 2016 2017 Lake Biwa - Spring 2016 Lake Biwa - Timeline 2017 Rimov reservoir 2016-2017

Figure 1. Dynamics of freshwater diplonemids (18S rRNA gene sequence abundance) by A) amplicon sequencing in a weekly spring/summer sampling campaign from Lake Biwa, Japan (depths 5, 10 and 50m), B) amplicon sequencing in a one year sampling campaign from Lake Biwa (depths 5, 20 and 60m) C) shotgun metagenomic time series of the Rimov reservoir, Czech Republic (depths 0.5 and 30m) and D) in a single depth profile from Lake Ikeda (depths 0, 50, 100, 150 and 200m). bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

FN598299 Uncultured marine diplonemid clone BIO9 D4 (1345) EU635597 Uncultured marine diplonemid clone Ti11 D1 20 (1199) EU635596 Uncultured marine diplonemid clone Bi2 D1 21 (1198) FJ000259 Uncultured eukaryote clone SW1F01 (1616) FJ000076 Uncultured eukaryote clone BB2F06 (1610) FN598233 Uncultured marine diplonemid clone BIO1 H12 (1319) FN598252 Uncultured marine diplonemid clone BIO3 A2 (1343) FN598397 Uncultured marine diplonemid clone BS23 B2 (1325) KX189135 Uncultured marine diplonemid clone Tara 960 6 (1585) EU635593 Uncultured marine diplonemid clone Ma126 D1 21 (1198) FN598248 Uncultured marine diplonemid clone BIO2 F3 (1348) FN598420 Uncultured marine diplonemid clone BS25 H3 (1336) FN598413 Uncultured marine diplonemid clone BS25 B2 (1341) KX532973 Uncultured marine eukaryote clone o10 B 77 (1466) KX533083 Uncultured marine eukaryote clone G3 93 (1466) FN598250 Uncultured marine diplonemid clone BIO2 H4 (1348) KJ757576 Uncultured eukaryote clone SGYY747 (2061) KX189168 Uncultured marine diplonemid clone Tara 1645 11 (1606) KX532691 Uncultured marine eukaryote clone o10 800 118 (1335) FN598441 Uncultured marine diplonemid clone BS3 G10 (1348) FN598287 Uncultured marine diplonemid clone BIO7 E10 (1349) FJ000090 Uncultured eukaryote clone 571C11 (1614) Marine EU635618 Uncultured marine diplonemid clone DH117 D1 19 (1200) FN598436 Uncultured marine diplonemid clone BS3 D8 (1349) Diplonemids FJ000260 uncultured eukaryote (1351) FJ000261 Uncultured eukaryote clone 452D08 (1610) DQ504319 Uncultured diplonemid clone LC22 5EP 10 (1098) FJ000071 Uncultured eukaryote clone BB2A02 (1612) EU635651 Uncultured marine diplonemid clone Ma121 D1 21 (1198) KY947153 Diplonemida sp 27 RG-2016 (2068) EU635652 Uncultured marine diplonemid clone Ma131 D1 06 (1198) EU635654 Uncultured marine diplonemid clone DH136 D1 57 (1198) EU635630 Uncultured marine diplonemid clone R26 D1 33 (1199) FJ000255 Uncultured eukaryote clone SW1D09 (1565) EU635606 Uncultured marine diplonemid clone BS16 D1 14 (1195) FN598347 Uncultured marine diplonemid clone BS13 H8 (1331) KX533087 Uncultured marine eukaryote clone G3 97 (1239) EU635672 Uncultured marine diplonemid clone Ma126 D1 39 (1198) 100/100 KY947154 Diplonemida sp 37 RG-2016 (2063) EU635605 Uncultured marine diplonemid clone DH117 D1 41 (1199) AY665088 Uncultured eukaryote clone SCM38C39 (2005) GU820768 Uncultured euglenozoan clone BCA3F14RJ1A09 (1610) 72/94 KJ757308 Uncultured eukaryote clone SGYY1477 (2055) KX189132 Uncultured marine diplonemid clone Tara 960 3 (1907) KJ762701 Hemistasia uncultured eukaryote (2064) 97/100 AB948221 Hemistasia phaeocysticola (2020) MF422202 Hemistasiidae sp (1995) Hemistasia MF422203 Hemistasiidae sp (2064) LH-02apr19-C1499987 Lugano hypolimnion (542) ZE-Oct18-C228045 Zurich hypolimnion (1223) RH-26jul17-C234517 Rimov hypolimnion (1093) MH-09jul19-C1664111 Medard hypolimnion (307) RH-14apr17-C3362255 Rimov hypolimnion (331) RH-27jun17-C1571205 Rimov hypolimnion (491) MH-09jul19-C523660 Medard hypolimnion (762) RH-27jun17-C1179729 Rimov hypolimnion (590) CE-Jul18-C700394 Constance hypolimnion (770) RH-18aug17-C423291 Rimov hypolimnion (994) Freshwater RH-26jul17-C462493 Rimov hypolimnion (487) RH-23may17-C110669 Rimov hypolimnion (2029) Diplonemids RH-27jun17-C1910473 Rimov hypolimnion (417) MH-09jul19-C873310 Medard hypolimnion (518) 100/100 RH-20APR16-C630432 Rimov hypolimnion (642) RH-18aug17-C497411 Rimov hypolimnion (939) CE-Jul18-C713090 Constance epilimnion (440) RH-15aug16-C835002 Rimov hypolimnion (437) CE-Oct18-C2726209 Constance epilimnion (387) 100/100 CH-Jul18-C1818609 Constance hypolimnion (380) OTU 4 Lake Biwa amplicon (293) MF422200 Diplonemida sp (2007) DQ504350 uncultured diplonemid Rhynchopus (1917) AY490209 Rhynchopus sp SH-2004-I (1995) MF422197 Rhynchopus sp (1990) AY425013 Rhynchopus sp ATCC 50226 (2032) AF380997 Rhynchopus sp ATCC 50226 (2012) GAOU01013166 Rhynchopus (1352) AY490210 Rhynchopus sp SH-2004-II (2008) AY180037 uncultured (1743) Marine AY425012 Diplonema sp ATCC 50232 (2148) AF119811 Diplonema papillatum (1981) Isolates FN598401 uncultured marine diplonemid (1330) AY425009 Diplonema ambulator (2062) AY425011 Diplonema sp ATCC 50225 (2067) MF422192 Diplonema sp (2021) MF422190 Diplonema sp (2023) 88/96 AY425010 Diplonema sp ATCC 50224 (2046) Mf422201 Diplonemida gen 2 sp 1 AH-2017 (2010)

100/100 Kinetoplastids

0.4

Figure 2. Maximum-likelihood tree of 18S rRNA of diplonemids. Marine diplonemids, Hemistasia, freshwater diplonemids, sequences from isolates and kinteoplastids are shown. Other euglenids (e.g. EUglena, Phacus) were used as outgroup (not shown and indicated by an arrow). Ultra-fast bootstrap values and SH branch supports are indicated for selected nodes. Scale bar indicates the number of substitutions per site. bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Lake Biwa Řimov Reservoir Lake Zurich A A A

10 µm 10 µm 10 µm

B B B 10 µm

Figure 3. Freshwater diplonemids under fluorescent microscope. A - DAPI, B - FISH, bacteria inside the cells are shown by arrows. bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

cell_13 C:1 [S:1, D:0], F:0, M:129, n:130

cell_1sb C:0 [S:0, D:0], F:0, M:130, n:130

cell_21 C:1 [S:1, D:0], F:4, M:125, n:130

cell_21sb C:0 [S:0, D:0], F:0, M:130, n:130

cell_27 C:0 [S:0, D:0], F:0, M:130, n:130

cell_3 C:1 [S:1, D:0], F:0, M:129, n:130

cell_37 C:0 [S:0, D:0], F:0, M:130, n:130

cell_47 C:1 [S:0, D:1], F:0, M:129, n:130

cell_4sb C:0 [S:0, D:0], F:2, M:128, n:130

dpap C:30 [S:23, D:7], F:12, M:88, n:130

0 20 40 60 80 100

%BUSCOs

Complete (C) and single-copy (S) Complete (C) and duplicated (D)

Fragmented (F) Missing (M)

Supplementary Figure S1. Genome completeness estimation for all available diplonemid genomes using BUSCO (Seppey et al. 2019). Single cell genomes are prefixed with "cell", Diplonema papillatum isolate genome: dpap. bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint cell-27-c31026 (333) (which was not certifiedcell-27-c26100 by peer review)(594) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is cell-27-c602 (382) cell-27-c39174 (411)made available under aCC-BY-NC-ND 4.0 International license. cell-27-c41645 (501) KY947153 Diplonemida sp 27 RG-2016 (2068) FJ000071 Uncultured eukaryote clone BB2A02 (1612) EU635651 Uncultured marine diplonemid clone Ma121 D1 21 (1198) cell-27-c28331 (428) cell-27-c35771 (341) EU635652 Uncultured marine diplonemid clone Ma131 D1 06 (1198) EU635654 Uncultured marine diplonemid clone DH136 D1 57 (1198) cell-9sb-c18102 (386) EU635630 Uncultured marine diplonemid clone R26 D1 33 (1199) cell-37-c56087 (847) cell-4sb-c77108 (810) cell-13-c30360 (745) cell-27-c9571 (320) FJ000255 Uncultured eukaryote clone SW1D09 (1565) EU635606 Uncultured marine diplonemid clone BS16 D1 14 (1195) FN598413 Uncultured marine diplonemid clone BS25 B2 (1341) FN598347 Uncultured marine diplonemid clone BS13 H8 (1331) cell-13-c25882 (377) cell-13-c30273 (673) cell-13-c31185 (492) cell-13-c1523 (1355) cell-13-c16512 (781) cell-13-c7866 (525) cell-13-c11377 (399) cell-13-c9716 (1044) cell-13-c11965 (344) cell-13-c31257 (338) cell-13-c10157 (1421) cell-13-c25288 (302) cell-13-c15957 (717) cell-13-c20800 (379) cell-13-c18566 (379) cell-13-c24568 (732) cell-13-c28805 (972) cell-13-c15517) (300 FJ000259 Uncultured eukaryote clone SW1F01 (1616) FJ000076 Uncultured eukaryote clone BB2F06 (1610) FN598299 Uncultured marine diplonemid clone BIO9 D4 (1345) EU635597 Uncultured marine diplonemid clone Ti11 D1 20 (1199) EU635596 Uncultured marine diplonemid clone Bi2 D1 21 (1198) FN598233 Uncultured marine diplonemid clone BIO1 H12 (1319) FN598252 Uncultured marine diplonemid clone BIO3 A2 (1343) FN598397 Uncultured marine diplonemid clone BS23 B2 (1325) cell-4sb-c42147 (417) EU635593 Uncultured marine diplonemid clone Ma126 D1 21 (1198) KX189135 Uncultured marine diplonemid clone Tara 960 6 (1585) Marine cell-27-c10228 (360) FN598248 Uncultured marine diplonemid BIO2 F3 (1348 Diplonemids cell-47-c14652 (682) cell-47-c7122 (682) cell-47-c11972 (682) cell-47-c12257 (682) cell-47-c22866 (682) cell-47-c14510 (682) cell-4sb-c52750 (593) FN598420 Uncultured marine diplonemid clone BS25 H3 (1336) KX532973 Uncultured marine eukaryote clone o10 B 77 (1466) KX533083 Uncultured marine eukaryote clone G3 93 (1466) FN598250 Uncultured marine diplonemid clone BIO2 H4 (1348) KJ757576 Uncultured eukaryote clone SGYY747 (2061) KX189168 Uncultured marine diplonemid clone Tara 1645 11 (1606) KX532691 Uncultured marine eukaryote clone o10 800 118 (1335) cell-21-c159800 (1279) cell-21-c173570 (395) cell-21-c133830 (568) cell-3-c4076 (2066) FN598287 Uncultured marine diplonemid clone BIO7 E10 (1349) FN598441 Uncultured marine diplonemid clone BS3 G10 (1348) cell-21sb-c5277 (1548) FJ000090 Uncultured eukaryote clone 571C11 (1614) EU635618 Uncultured marine diplonemid clone DH117 D1 19 (1200) 0.8 FN598436 Uncultured marine diplonemid clone BS3 D8 (1349) cell-1sb-c28866 (936) cell-1sb-c37040 (682) KX533087 Uncultured marine eukaryote clone G3 97 (1239) FJ000260.1.1351 Uncultured eukaryote (1351) FJ000261 Uncultured eukaryote clone 452D08 (1610) cell-1sb-c19113 (1194) DQ504319 Uncultured diplonemid clone LC22 5EP 10 (1098) cell-4sb-c100636 (314) cell-4sb-c10234 (707) cell-47-c13440 (1206) cell-37-c68542 (494) cell-37-c110782 (365) cell-4sb-c20910 (302) cell-37-c16423 (445) cell-37-c12122 (902) cell-37-c68032 (597) cell-37-c50818 (786) cell-37-c88864 (561) KY947154 Diplonemida sp 37 RG-2016 (2063) EU635672 Uncultured marine diplonemid clone Ma126 D1 39 (1198) EU635605 Uncultured marine diplonemid clone DH117 D1 41 (1199) AY665088 Uncultured eukaryote clone SCM38C39 (2005) GU820768 Uncultured euglenozoan clone BCA3F14RJ1A09 (1610) KJ757308 Uncultured eukaryote clone SGYY1477 (2055) KX189132 Uncultured marine diplonemid clone Tara 960 3 (1907) 89.8/98 cell-13-c31734 (306) cell-13-c25200 (306) KJ762701 Hemistasia Uncultured eukaryote (2064) Ab948221 Hemistasia phaeocysticola (2020) Mf422202 Hemistasiidae sp (1995) Hemistasia Mf422203 Hemistasiidae sp (2064) LH-02apr19-C1499987 (542) ZE-Oct18-C228045 (1223) RH-26jul17-C234517 (1093) RH-27jun17-C1571205 (491) MH-09jul19-C1664111 (307) RH-23may17-C110669 (2029) CE-Jul18-C713090 (440) RH-27jun17-C1910473 (417) MH-09jul19-C873310 (518) RH-20APR16-C630432 (642) Freshwater RH-15aug16-C835002 (437) CE-Oct18-C2726209 (387) Diplonemids RH-18aug17-C497411 (939) MH-09jul19-C523660 (762) RH-14apr17-C3362255 (331) CH-Jul18-C1818609 (380) RH-26jul17-C462493 (487) 100/100 RH-18aug17-C423291 (994) CE-Jul18-C700394 (770) 100/99 RH-27jun17-C1179729 (590) OTU 4 (293) MF422200 Diplonemida sp (2007) DQ504350 Uncultured diplonemid (1917) AY490209 Rhynchopus sp SH-2004-I (1995) Mf422197 Rhynchopus sp (1990) AF380997 Rhynchopus sp ATCC 50226 (2012) AY425013 Rhynchopus sp ATCC 50226 (2032) GAOU01013166 Rhynchopus sp(1352) AY490210 Rhynchopus sp SH-2004-II (2008) LMZG01024242 Diplonema papillatum (414) LMZG01012232 Diplonema papillatum (358) LMZG01005445 Diplonema papillatum (423) AF119811 Diplonema papillatum (1981) LMZG01015819 Diplonema papillatum (301) Marine 100/100 LMZG01013342 Diplonema papillatum (302) Isolates 99.9/ LMZG01022859 Diplonema papillatum (323) 100 AY180037 Uncultured euglenid (1743) AY425012 Diplonema sp ATCC 50232 (2148) AY425009 Diplonema ambulator (2062) AY425011 Diplonema sp (2067) 88.5/99 MF422192 Diplonema sp (2021) MF422190 Diplonema sp (2023) AY425010 Diplonema sp (2046) FN598401 Uncultured marine diplonemid (1330) MF422201 Diplonemida gen 2 sp 1 AH-2017 (2010)

100/100 Kinetoplastids Supplementary Figure S2. Maximum-likelihood tree of 18S rRNA gene sequences of diplonemids highlighting the copy numbers in single cells. 18S rRNA sequences from the genome of single cells were obtained from public databases and are highlighted with same color. D. papillatum group is highlighted in red in marine isolates. Marine diplonemids, Hemistasia, freshwater diplonemids, sequences from isolates and kinteoplastids are also shown. Other euglenids (e.g. Euglena, Phacus) were used as outgroup (not shown and indicated by an arrow). Ultra-fast bootstrap values and SH branch supports are indicated for selected nodes. The scale bar indicates the number of substitutions per site. bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. A 100 Bacillariophyta Cercozoa Chlorophyta 75 Chrysophyta Ciliophora Cryptophyta Dinophyta 50 Fungi Kathablepharis 18S Abundance (%) 18S Oomycota 25 Others Telonema Unclassified Uroglena 0 Mar Apr May

100 B

Bacillariophyta Cercozoa Chlorophyta 75 Chrysophyta Ciliophora Cryptophyta 50 Dinophyta Fungi Kathablepharis 18S Abundance (%) 18S Others 25 Synurophyta Telonema Unclassified

0 Uroglena

Mar Apr May C 100

Bacillariophyta Bicosoecida Cercozoa 75 Chlorophyta Choanoflagellata Chrysophyta 50 Ciliophora Cryptophyta Dinophyta 18S Abundance (%) 18S Fungi 25 Mesomyceta Others Telonema

0 Unclassified

Mar Apr May

Supplementary Figure S3. Community composition of microbial eukaryotes in Lake Biwa during weekly spring/summer sampling campaign by 18S amplicon sequencing from A) Epilimnion (5m), B) Metalimnion (10m) and C) Hypolimnion (50m). bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is A made available under aCC-BY-NC-ND 4.0 International license. Bacillariophyta 100 Cercozoa Chlorophyta Choanoflagellida

75 Chrysophyta Ciliophora Cryptophyta 50 Dinophyta Fungi

18S Abundance (%) 18S Kathablepharis Others 25 Prymnesiophyta Stramenopile Telonema Unclassified 0 Uroglena J F M A M J J A S O N D

B Bacillariophyta 100 Bicosoecida Cercozoa Chlorophyta Choanoflagellida 75 Chrysophyta Ciliophora Cryptophyta Diatom 50 Dinophyta Fungi

18S Abundance (%) 18S Kathablepharis Others 25 Prymnesiophyta Stramenopile Telonema Unclassified 0 J F M A M J J A S O N D Uroglena

C Bacillariophyta 100 Bicosoecida Cercozoa Choanoflagellida Chrysophyta 75 Ciliophora Cryptophyta Diatom Dinophyta 50 Fungi Heliozoa

18S Abundance (%) 18S Kathablepharis Others 25 Pedinella Perkinsozoa Stramenopile Telonema 0 J F M A M J J A S O N D Unclassified Supplementary Figure S4. Community composition of microbial eukaryotes in Lake Biwa in 2017 by 18S amplicon sequencing from (A) epilimnion (5 m), (B) metalimnion (20 m) and (C) hypolimnion (60 m) bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

100

Cercozoa Chlorophyta 75 Chrysophyta Ciliophora Cryptophyta Diatom 50 Dinophyta Fungi Kathablepharis 18S Abundance (%) 18S Kinetoplastida 25 Others Telonema Unclassified Uroglena 0 5 50 100 150 200 Depth (m)

Supplementary Figure S5. Community composition of microbial eukaryotes in various depths of Lake Ikeda by 18S amplicon sequencing. bioRxiv preprint doi: https://doi.org/10.1101/2020.05.14.095992; this version posted May 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A1 B1 C1

10µm 10µm 10µm A2 B2 C2

10µm 10µm 10µm

D1 E1 F1

10µm 10µm 10µm D2 E2 F2

10µm 10µm 10µm

Supplementary Figure S6. Fluorescent microscopic images of freshwater diplonemids from lake Biwa (A-D), Římov (E) and lake Zurich (F). 1 and 2 refers to the same cells visualized by DAPI (under UV excitation) and CARD-FISH (under blue excitation) respectively.