bioRxiv preprint doi: https://doi.org/10.1101/2020.04.15.042127; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Lateral gene transfer of anion-conducting channelrhodopsins between green algae and giant viruses

Andrey Rozenberg 1,5, Johannes Oppermann 2,5, Jonas Wietek 2,3, Rodrigo Gaston Fernandez Lahore 2, Ruth-Anne Sandaa 4, Gunnar Bratbak 4, Peter Hegemann 2,6, and Oded Béjà 1,6

1 Faculty of Biology, Technion - Israel Institute of Technology, Haifa 32000, Israel. 2 Institute for Biology, Experimental Biophysics, Humboldt-Universität zu Berlin, Invalidenstraße 42, Berlin 10115, Germany. 3 Present address: Department of Neurobiology, Weizmann Institute of Science, Rehovot 7610001, Israel. 4 Department of Biological Sciences, University of Bergen, N-5020 Bergen, Norway. 5 These authors contributed equally: Andrey Rozenberg, Johannes Oppermann. 6 These authors jointly supervised this work: Peter Hegemann, Oded Béjà. e-mail: [email protected] ; [email protected]

ABSTRACT

Channelrhodopsins (ChRs) are algal light-gated ion channels widely used as optogenetic tools for manipulating neuronal activity 1,2. Four ChR families are currently known: green algal 3–5 and cryptophyte 6 cation-conducting ChRs (CCRs), cryptophyte anion-conducting ChRs (ACRs) 7, and the MerMAID ChRs 8. Here we report the discovery of a new family of phylogenetically distinct ChRs encoded by marine giant viruses and acquired from their unicellular green algal prasinophyte hosts. These previously unknown viral and green algal ChRs act as ACRs when expressed in cultured neuroblastoma-derived cells and are likely involved in behavioral responses to light.

MAIN

Channelrhodopsins (ChRs) are microbial rhodopsins that directly translate absorbed light into ion fluxes along electrochemical gradients across cellular membranes, controlling behavioural light responses in motile algae 9. They are widely used in optogenetics to manipulate cellular activity using light 10, and there is therefore a constant demand for new types of ChRs with different functions, be it different absorption spectra 11, ion selectivity 12, or kinetics 11. So far, ChRs have only been reported from cultured representatives of two groups of algae: cryptophytes and 2,13. Recently, metagenomics proved to be a useful tool to identify novel ChRs, as a new family of anion-conducting ChRs (ACRs) with intensely desensitizing photocurrents was detected in uncultured and yet to be identified 8.

Channelrhodopsins in metagenomic contigs of putative viral origin. To extend the search for uncharacterized distinct ChRs with potentially new functions, we further screened various metagenomic datasets from Tara Oceans 14–16. In total, four unique sequences belonging to a previously undescribed family of ChRs were found in five metagenomic contigs from the prokaryotic/girus fractions from tropical and temperate waters of the Atlantic and Pacific Oceans. Two of the contigs were long enough (11 kb and 20 kb) to provide sufficient genomic context (Fig. 1a). We searched for similar sequences in several metagenomic datasets and located multiple contigs with synteny to the two longer ChR-containing contigs (Fig. 1a, Suppl. File 1). The two contigs recruited two clusters of related fragments (v21821 and v2164382 contig clusters), mostly from marine samples of Tara Oceans. Interestingly, the v21821-cluster contigs came from the same South Atlantic station, except for one contig with lower identity and synteny length from a soda lake metagenome 17 (LFCJ01000229.1, see Fig. 1a). The v2164382 cluster was more diverse in both gene order and geography, but all of the recruited contigs came from the marine realm. Although none of those recruited fragments contained ChR genes, they could be utilized in downstream analyses to clarify the origin of the ChR-containing contigs. Surprisingly, inspection of all of those metagenomic contigs demonstrated a typical viral genome organization with intronless genes separated by short spacers and several tRNA genes. The fragments harbored high proportions of genes with affinities to two families of nucleo-cytoplasmic large DNA viruses (NCLDVs), Mimiviridae and Phycodnaviridae, the two most abundant NCLDV groups in the ocean 18, included multiple nucleo-cytoplasmic virus orthologous groups (NCVOGs) 19, and demonstrated genome composition similar to these viruses and distinct from the potential host groups (Fig. S1).

Phylogenetically, the viral channelrhodopsins appeared to be different from the four currently known ChR families and indeed formed a well-supported family of their own (Fig. 1b). Although some members of the viral families Phycodnaviridae and Mimiviridae are known to harbor other microbial rhodopsins and heliorhodopsins 20–22, no virus has previously been described to code for channelrhodopsins. This encouraged us to investigate the function and origins of these ChRs and to identify the corresponding viruses and their putative hosts.

[Figure 1. Panels: (a) map of ChR-containing and syntenic metagenomic contigs alongside the Pyramimonas orientalis virus PoV-01B draft genome assembly (scale bar 10 kb; genes colored by distribution of matches: Phycodnaviridae, Mimiviridae, other NCLDVs, other organisms, no hits; NCVOGs and prasinophyte genes marked); (b) unrooted tree of green algal CCRs, prasinophyte and viral ACRs, MerMAIDs, cryptophyte ACRs, cryptophyte CCRs and putative ChRs from Labyrinthulea, with confirmed activity (ACR or CCR) indicated; (c) ordination of tetranucleotide composition, axes CA1 and CA2.]

Fig. 1. Phylogeny and diversity of viral and green algal ACRs. (a) Metagenomic contigs containing ChR genes (in bold) and syntenic contigs used for phylogenetic analyses, alongside corresponding genomic fragments from Pyramimonas orientalis virus PoV-01B. Annotated contigs and the draft genome sequence are available in Suppl. Files 1 and 2. (b) Unrooted tree showing phylogenetic relationships between confirmed and putative channelrhodopsins including the previously characterized families of green algal CCRs, cryptophyte ACRs and CCRs, MerMAIDs and the prasinophyte/viral ACRs reported here. The four members of the prasinophyte/viral ACR family characterized in the current study are highlighted. The gray circles represent ultrafast bootstrap support values (70-100%); the scale bar indicates the average number of amino acid substitutions per site. See Fig. S2 and Suppl. File 3b for the full version of the tree and the alignment. (c) Comparison of tetranucleotide composition of ChR genes from prasinophytes and viral metagenomic contigs (encircled in red) against the background of genes from Pyramimonas (green), metagenomic contigs (light-blue) and PoV-01B (lilac).

Homologs of viral ChRs in prasinophyte algae. An initial screen for proteins similar to the viral ChRs yielded homologous genes in several transcriptomes of green algae from two transcriptomic datasets 23,24. An in-depth analysis of the available green algal genomes and transcriptomes showed that the homologues of the viral ChRs were strictly confined to three related clades of prasinophytes (a paraphyletic assemblage of early diverging unicellular chlorophytes): Pyramimonadophyceae, Mamiellophyceae and Nephroselmidophyceae, with the viral ChRs all clustering together with proteins from the genus Pyramimonas (Fig. 1b). Among the previously characterized ChRs, the new family was most similar to green algal CCRs, MerMAIDs and cryptophyte ACRs. The lack of Asp and Glu residues in transmembrane domain 2 (TM2), which are well conserved in green algal CCRs but generally substituted in ACRs (Fig. S3), led us to hypothesize that the putative channelrhodopsins from this new clade might conduct anions, hence the provisional name vPyACRs for “viral ChRs similar to putative Pyramimonas ACRs”.

Prasinophyte and viral ChRs conduct anions. To examine the function of the new ChR family, we expressed two viral ChRs (vPyACR_21821 and vPyACR_2164382) and one ChR from Pyramimonas melkonianii CCMP722 (PymeACR1) in mouse neuroblastoma × rat neuron hybrid (ND7/23) cells. Two days after transfection, we recorded bidirectional photocurrents under whole-cell voltage-clamp conditions and determined wavelength sensitivity and ion selectivity. While the full-length PymeACR1 construct expressed well, both full-length viral ChRs showed strong retention in the cytosol and did not yield any photocurrents. We modified the proteins’ N- and C-termini to improve protein folding as well as membrane trafficking and localization (Fig. S4, Methods) 25. Even though the viral constructs remained cytotoxic, these modifications enabled us to analyze vPyACR_21821, whereas for vPyACR_2164382 only single measurements in standard buffer were possible (Fig. S5).

We determined the action spectra of vPyACR_21821 and PymeACR1 by recording transient photocurrents upon stimulation with light between 390 and 690 nm (Fig. 2a). vPyACR_21821 is most sensitive (λmax) to 482 nm light and PymeACR1 to 505 nm light (Fig. 2a, inset and Fig. S6a). Upon longer illumination, photocurrents of both constructs are non-inactivating during excitation (Fig. 2b,c), but the photocurrent amplitudes of vPyACR_21821 are roughly five times smaller than those of PymeACR1 (Fig. 2b,c and Fig. S5, S6b). Next, we tested the ion selectivity by recording photocurrents at different membrane voltages and ionic conditions and determined the reversal potential (Erev), which is the membrane voltage at which inward and outward ion flow cancel each other at a given ion gradient. Changing the concentration of the conducted ions causes reversal potential shifts (ΔErev). Upon reduction of the external Cl- concentration ([Cl-]ex) from 150 mM to 80 mM and 10 mM, the reversal potential shifts almost equally for PymeACR1 and vPyACR_21821 (Fig. 2d,e and Fig. S6d,e) to more positive values, in accordance with the theoretical Nernst potential (Fig. S6c), whereas it shifts slightly more negative upon replacement of Cl- by Br- or NO3-, indicating non-specific anion conductivity (Fig. 2e and Fig. S6d,e). Replacing external Na+ with N-methyl-D-glucamine (NMDG+), while keeping [Cl-] constant, does not affect the reversal potential and excludes Na+ as a transported charge carrier (Fig. 2e and Fig. S6d,e). We therefore conclude that PymeACR1 and vPyACR_21821 are anion-conducting ChRs (ACRs) that naturally conduct Cl- and potentially Br- and NO3-, but do not conduct Na+ (Fig. 2e). While for PymeACR1 the current amplitudes at -60 mV increased at lower [Cl-]ex, similar to previously described ACRs 8,26, in vPyACR_21821 the amplitudes decreased under the same conditions (Fig. 2b,f and Fig. S6f,g).
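
The expected direction and rough size of the reversal potential shift for a purely Cl--selective channel can be illustrated with a minimal sketch of the Nernst relation. The recording temperature (22 °C) and the function name are assumptions made for illustration and are not taken from the study; the internal Cl- concentration cancels out when only the external concentration changes.

```python
import math

R = 8.314462618   # gas constant, J mol^-1 K^-1
F = 96485.33212   # Faraday constant, C mol^-1

def nernst_shift_mV(c_ex_before_mM, c_ex_after_mM, z=-1, temp_K=295.15):
    """Shift of the Nernst potential (mV) when only the external
    concentration of an ion with valence z changes.

    E = (RT / zF) * ln(c_ex / c_in), so the internal concentration
    cancels in the difference E_after - E_before.
    temp_K = 295.15 K (22 degrees C) is an assumed value.
    """
    return 1000.0 * (R * temp_K) / (z * F) * math.log(c_ex_after_mM / c_ex_before_mM)

# External chloride steps described in the text: 150 mM -> 80 mM and 150 mM -> 10 mM
shift_80 = nernst_shift_mV(150.0, 80.0)
shift_10 = nernst_shift_mV(150.0, 10.0)
print(f"150 -> 80 mM Cl-: {shift_80:+.1f} mV")
print(f"150 -> 10 mM Cl-: {shift_10:+.1f} mV")
```

Under these assumptions both shifts are positive, matching the direction reported for PymeACR1 and vPyACR_21821; measured values can deviate from the ideal prediction when the channel also conducts other ions or when junction potentials are not corrected.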

Additionally, we analyzed the photocurrents of a second prasinophyte channelrhodopsin from Pyramimonas sp. CCMP2087 (Py2087ACR1) (Fig. S7). In contrast to PymeACR1 and similarly to both vPyACRs, this sequence interestingly has the widely conserved Arg in helix 3 replaced by Gln (position R120 in CrChR2, see Fig. S3). Py2087ACR1 is most sensitive to 509 nm light, and the reversal potential of the non-inactivating photocurrents shifts strongly upon reduction of [Cl-]ex from 150 mM to 10 mM, indicating that Py2087ACR1 is an ACR as well (Fig. S7).

[Figure 2. Panels: (a) action spectra, normalized photocurrent versus wavelength λ (nm), with inset showing λmax; (b) PymeACR1 photocurrent traces (510 nm light, -80 to +40 mV) at [Cl-]ex of 150, 80 and 10 mM; (c) vPyACR_21821 photocurrent traces (470 nm light, -80 to +40 mV); (d) normalized photocurrent versus holding potential Ehold (mV); (e) reversal potential shifts ΔErev (mV) for 80 mM Cl-, 10 mM Cl-, Br-, NO3- and NMDG+ buffers; (f) normalized photocurrent amplitudes for the same buffers.]

Fig. 2. Electrophysiology of viral and green algal ACRs. (a) Action spectra of vPyACR_21821 and PymeACR1 normalized to the maximum stationary current. Solid lines represent fitted data. Inset shows determined maximum sensitivity (λmax) for both ChRs. Photocurrent traces of (b) PymeACR1 and (c) vPyACR_21821 at indicated extracellular chloride concentrations ([Cl-]ex) recorded from -80 to +40 mV in 20 mV steps. Gray bars indicate light application of denoted wavelengths. (d) Current-voltage relationship of PymeACR1 and vPyACR_21821 at [Cl-]ex 150 mM (solid line) and 10 mM (dashed line). Photocurrents were normalized to the stationary current at -80 mV with [Cl-]ex 150 mM. (e) Reversal potential shifts (ΔErev) upon exchange of the external buffer. (f) Photocurrent amplitudes at -80 mV upon exchange of the extracellular buffer normalized to the photocurrent amplitude in 150 mM Cl- buffer. Data are shown as single data points (squares or circles), while statistics denote mean ± standard error. The number of conducted experiments (n) is reported in grey. Source data are provided in Suppl. File 5 (a, d-f).

Prasinophyte ACRs are likely part of the visual system. The fact that green algae appeared to contain previously unknown channelrhodopsins comes as no surprise, since the same group is already known to possess a different family of ChRs, the green algal cation channelrhodopsins (CCRs) 3,13. Nevertheless, the distribution of the ACRs is much narrower, as they appear only in some prasinophytes, in striking contrast to the CCRs, which are distributed widely in chlorophytes and are even present in some streptophytes (Fig. 3 and Fig. S8). At the same time, we notice that in prasinophytes the appearance of CCRs nearly coincides with that of the ACRs. Interestingly, all of the prasinophyte species with genes coding for at least one ChR family have eyespots, the main photosensitive organelle that provides the algae with directional sensitivity 5,27,28, while for those prasinophyte species that lack the eyespot neither ChR family could be detected. The sister-group relationship between the Pyramimonadophyceae and Mamiellophyceae 29,30 suggests that the last common ancestor of this clade possessed both families of ChRs, namely CCRs and ACRs, and that they were lost at least three times in this group together with the loss of the eyespot (in the 31 and 32 and the cocci Ostreococcus-Bathycoccus 33) (see Fig. 3). These associations indicate that both families of ChRs are likely involved in sensing light, similar to what is known for CCRs in chlorophyceans 3,5,13,28,34.

Independent support for the hypothesis of a shared cellular function of the green algal ACRs and CCRs as sensors of light came from the observation of structural similarities between the two families. Similar to their viral homologs, full-length prasinophyte ACRs possess, C-terminally to the rhodopsin domain, a domain with high similarity to response regulators (RR) from two-component regulatory systems (Fig. S9). When searching for remote homology, a similar RR domain was surprisingly discovered also in green algal CCRs, including those coming from prasinophytes, but not in other known groups of ChRs: cryptophyte ACRs and CCRs and the MerMAIDs (see Fig. S9a). In green algal CCRs, this domain corresponds to one of the three previously noticed conserved regions, con2, in the C-terminal extensions of chlorophycean ChRs 34. An RR-like domain could also be identified in putative channelrhodopsins from Labyrinthulea, a marine heterotrophic group (see Fig. S9). This indicates that this domain is not restricted to ChRs from green algae and was likely present in the last common ancestor of at least the two families of green algal ChRs and their homologs from Labyrinthulea. Interestingly, this domain organization is reminiscent of His-kinase rhodopsins (HKRs), a group of enzymerhodopsins from green algae exemplified by COP5–COP12 from Chlamydomonas reinhardtii 5,35,36. Yet, in these proteins the rhodopsin and the RR domains are invariably associated with a corresponding transducer (His-kinase) domain as part of a complete two-component system typical of other eukaryotic sensor systems 37,38 (see Fig. S9). In addition, in contrast to the classical response regulators, including the RR domain of the enzymerhodopsins, the Asp-57 residue that serves as the phosphorylation site, as well as the highly conserved residues Ser/Thr-87 and Lys-109 that are responsible for the conformational change mediated by phosphorylation 39,40, are mutated in nearly all of the ChR sequences from green algae, viruses and Labyrinthulea (see Fig. S9b). Asp-less receiver domains (ALDs) are known and are widespread across the tree of life 41. Asn-57 at the phosphorylation site, as seen in the prasinophyte and viral ACRs and the putative labyrinthulean ChRs, is the most frequent substitution in ALDs and has the potential to undergo deamidation to aspartate 41,42. At the same time, substitution of the highly conserved Ser/Thr-87 and Lys-109 residues puts the RR-like domains of the ChRs, including those of the prasinophyte ACRs, in a minority position even among ALDs. This likely indicates that the RR-like domains in ChRs do not undergo phosphorylation and conformational changes and thus function as constitutive signals 41 or lack a signal transduction function altogether.

The C-terminal extension of C. reinhardtii ChR1 is known to participate in regulation and trafficking 35,43. Moreover, there is evidence that in Chlamydomonas, ChRs and HKRs have a similar spatial distribution on the cellular membrane 36,37 and that at least two proteins from these families, ChR1 and COP8, utilize intraflagellar transport (IFT) for their delivery to the eyespot and flagella 36. We thus tentatively hypothesize that the RR and RR-like domains of HKRs and ChRs, respectively, might function as a shared trafficking signal, in addition to their signaling function or, in the case of the ChRs, even instead of it. Note in this context the lack of such domains in both the cryptophyte ChRs (see Fig. S9a) and the cryptophyte sensory rhodopsins 44. It must be noted that since the close association of the eyespot with microtubular roots is characteristic only of the UTC clade (Ulvophyceae, Trebouxiophyceae and Chlorophyceae) and is rarely observed in other chlorophytes (e.g. in Nephroselmis) 45,46, the trafficking signal is not expected to be directly associated with IFT per se.

[Figure 3. Presence/absence chart of ChRs across prasinophyte species (Mamiellophyceae, Picocystophyceae, Chloropicophyceae, Nephroselmidophyceae, Pyramimonadophyceae and others), core chlorophytes and streptophytes. Legend categories: channelrhodopsins (ACRs or CCRs with confirmed activity, putative ACRs, putative CCRs); vegetative morphology (multicellular, non-motile cells or their colonies, flagellates without eyespots, flagellates with eyespots); morphology of swarmers (assumed to be absent, without eyespots, with eyespots); data source (transcriptomes, genomes); gene set completeness (complete genes, fragmented, missing).]

Fig. 3. Distribution of ChRs among green algae. Presence/absence of the two families of green algal ChRs in the available transcriptomes and genomes of prasinophytes, as well as summarized counts for the core chlorophyte and streptophyte taxa. For each species several transcriptomes and/or genomes were merged into a single gene set when available. Morphological features with respect to motility and presence of the eyespot are indicated for each prasinophyte species and the dominant types are indicated for the two other groups. The presence of ChRs with confirmed and expected ion selectivities is indicated at the level of species for prasinophytes and at the level of higher taxa for core chlorophytes and streptophytes, with the darker color signifying the presence of at least one ChR with confirmed activity in the corresponding taxon. Numbers indicate the number of all available species with genetic data and the corresponding number of species with ChRs. Note that the are represented here by three basal land species (*). See a full version of the figure in Fig. S8.

vPyACRs indeed come from giant viruses. With a plausible hypothesis about the function of the prasinophyte ACRs in hand, we turned to the analysis of the origin of their putative viral homologues that motivated our study in the first place. None of the available NCLDV genomes demonstrated close similarity to the metagenomic fragments in terms of synteny and sequence identity, including those viruses that are known to infect green algae 47 and specifically prasinophytes 48. First we noticed that the vPyACRs originated from a particular clade of prasinophyte ACRs from Pyramimonas, as suggested by phylogenetic analysis (see Fig. S2), and that members of the same genus are the only green algal group with ACRs reported to demonstrate viral infections in natural populations and in culture 49,50 (see also Fig. S10). We thus focused on the sole isolate of a Pyramimonas-infecting virus: Pyramimonas orientalis virus PoV-01B (Mimiviridae), isolated in Norway two decades ago 50. Although the virus has since been lost in culture and no complete genome was released, enough genetic data had been generated to assemble a draft genome. The PoV-01B genome assembly indeed contained large fragments syntenic with contigs from the v21821 cluster (see Fig. 1a), which provided a direct connection between the metagenomic contigs and a described viral isolate. Interestingly, despite the synteny, no sequence from this cluster other than SAMEA2620979_21821 itself contained the channelrhodopsin gene at the corresponding locus, and the same applied to the PoV-01B genome, which did not contain any rhodopsin genes in its sequenced parts either. Analogously, the metagenomic contig with the highest similarity to SAMEA2620979_21821, which came from the same sampling location, showed exactly the same gene arrangement except for the lack of the channelrhodopsin gene (see Fig. 1a). That the ChR-containing viruses from this cluster are indeed relatively scarce was further supported by the fact that even at that particular station they were outnumbered by their ChR-lacking counterparts (Fig. S11). This diversity in gene arrangement and the fact that vPyACR homologs were found in green algae prompted us to test whether the ChR-containing contigs could actually come not from independent viruses, but from fossilized viral fragments in prasinophyte genomes. We compared tetranucleotide composition for the viral ChRs and their green algal homologs against the background of other genes from the metagenomic contigs and PoV-01B on one hand and from the algae on the other. The two resulting clouds of genes were generally well separated, and the viral ChR genes fell separately from their algal homologs and well within the viral cloud, thus rejecting the hypothesis that the ChR genes are not part of the original viral genomes (Fig. 1c).
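
The tetranucleotide comparison described above can be illustrated with a minimal sketch that turns a gene sequence into a 256-dimensional vector of tetranucleotide frequencies and compares vectors by Euclidean distance. The function names and the toy sequences are hypothetical and are not taken from the study; the ordination step behind Fig. 1c (axes CA1 and CA2) is not reproduced here, and whether both strands were counted is not stated in the text, so only the given strand is used.

```python
from collections import Counter
from itertools import product
import math

# All 256 tetranucleotides in a fixed order, so vectors are comparable
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetranucleotide_freqs(seq):
    """Relative frequencies of the 256 tetranucleotides in seq.

    Overlapping windows are counted; windows containing characters
    other than A, C, G, T (e.g. N) are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[k] / total for k in KMERS]

def euclidean(a, b):
    """Euclidean distance between two frequency vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy sequences (hypothetical): an AT-rich "viral-like" gene,
# a GC-rich "algal-like" gene, and a query gene to be placed.
viral_like = "ATTATAAATTTAAATATTTAATAAATTTATATAAAT"
algal_like = "GCGGCCGCGGGCCCGCGGCGCCGGGCGCGCCGGCGC"
query = "ATTAAATATTTATAAATTAATATTTAAATTATAAAT"

q = tetranucleotide_freqs(query)
d_viral = euclidean(q, tetranucleotide_freqs(viral_like))
d_algal = euclidean(q, tetranucleotide_freqs(algal_like))
print("closer to viral-like background:", d_viral < d_algal)
```

In practice such vectors would be computed for every gene in the viral contigs and in the algal transcriptomes, and the full matrix would then be passed to an ordination method to obtain a two-dimensional view like the one in Fig. 1c.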

Notwithstanding the monophyly of the viral ChRs, the viruses from which they came appeared to belong to at least two different lineages. First, we noticed that the PoV-01B genome showed synteny with only one of the two clusters of metagenomic contigs, and furthermore that the two clusters showed virtually no overlap in gene composition and had different proportions of mimivirid and phycodnavirid genes (see Fig. 1c). That the two clusters might represent two viral lineages and not merely two disjoint genomic locations was hinted at by phylogenetic analysis of the sole shared gene, the D5-like helicase/primase, which placed contigs from the v21821 cluster together with one of the two helicase/primase genes from PoV-01B confidently among mesomimiviruses (Mimiviridae) and those from the v2164382 cluster in a separate but related clade (Fig. S12). Analogously, based on gene composition analyses (Fig. S13), the longest contigs from the v21821 cluster, along with PoV-01B, could be clearly attributed to mesomimiviruses. In these analyses, the sole long contig from the second cluster showed more affinity to the Phycodnaviridae, and in particular to Raphidovirus, a virus with a unique position among the members of the family 51. Finally, phylogenetic analysis of highly conserved genes placed PoV-01B well among the v21821-cluster contigs and the whole clade, again, within the mesomimiviruses (Fig. S14). The only non-marine representative of this cluster, the contig LFCJ01000229.1 from a soda lake, appeared as the basalmost, early branching member of this clade. The long v2164382-cluster contig was resolved as a deep-branching member of the Phycodnaviridae, with Raphidovirus as the closest cultured virus, although the exact branching order remained unclear. The putative viral genomes coding for the two other ChRs remain unidentified.

The vPy ACR-containing viruses infect Pyramimonas algae. Several lines of evidence

suggest that the ChR-harboring viruses and at least some of their relatives infect

prasinophytes from the genus Pyramimonas . First of all, as noted above the shared origin

300 of all four viral ACRs could be traced back to this particular prasinophyte group.

Noteworthy, they branch within the clade composed of ChRs from species of the

monophyletic subgenus Vestigifera and thus these algae can be conclusively identified as

the donors of the viral ChRs (Fig. 1b, Fig. S2 and Fig. S15 ). Note that the host of PoV-01B,

a member of the v21821-cluster, is a Pyramimonas species from the same clade (see Fig.

S15). Moreover, among the genes associated with the two putative viral lineages, four,

including a gene for plastidic ATP/ADP-transporter, were found to have no homologs in

other known viral genomes, but instead were detected in plants (see Fig. 1a). The three

genes that were distributed widely among green algae demonstrated the highest similarity

specifically to corresponding homologs from Pyramimonas (see Fig. S8 ). In this respect,

the virus from which the sole non-marine fragment analyzed here (LFCJ01000229.1, most distant to

the ChR-containing contig SAMEA2620979_21821) comes must have

a different host, as no Pyramimonas species are known from soda lakes.

CONCLUSIONS

Here we characterize a new family of anion-conducting ChRs with

intriguing physiological and ecological implications. Although we identified the first

members of the family as putative viral proteins, these ChRs were found to be widespread

among prasinophyte green algae. In motile members of this group we find both ACRs and

close relatives of CCRs from other green algae. Furthermore, similarities in the

C-termini of proteins from the two families and their evolutionary association with the eyespot

imply that both ACRs and CCRs are utilized by motile prasinophytes for light sensing. It

remains to be discovered how ACRs and CCRs co-operate in regulating motility in these

algae and what allows other green algae to rely solely on CCRs apparently without

concomitant simplification in swimming behavior (compare e.g. the behavioral spectra of

Pyramimonas 52 and Chlamydomonas 28).

The viral homologs of the prasinophyte ACRs that initially prompted our study

represent a relatively recent acquisition from host genomes by Pyramimonas-infecting

viruses, as it does not predate the diversification of the genus into morphologically distinct

lineages, yet the nucleotide sequences of the corresponding genes have lost any trace of their algal

origin. Despite coming from at least two viral lineages, the mimivirid pyramiviruses and a

putative phycodnavirid clade, the four distinct ChRs from viruses form a monophylum and

thus originate from a single alga-virus lateral gene transfer. This raises the question of

why viruses would carry channelrhodopsin genes and what selective advantage they

provide. Given the likely role of their algal homologs in sensing light and the preservation

of the intracellular C-terminus in viral ACRs with respect to host ChRs, we

hypothesize that the role of the viral ACRs is to manipulate the host’s swimming behavior. A

similar hypothesis has been proposed for the Group-I and Group-II viral rhodopsins, a

different family of microbial rhodopsins found in genomes of several mesomimiviruses 20,21,

that are hypothesized to function as pumps or channels, respectively 21,53. The benefits of

modifying phototactic or photophobic responses of the host cell by the virus might range

from avoidance of oxidative stress to optimization of photosynthesis for the needs of the

virus.

Acknowledgments We thank José Flores-Uribe for help with bioinformatics, Eunsoo

Kim for providing us with an unpublished transcriptome assembly of Cymbomonas

tetramitiformis , Stuart D. Sym and Charles J. O'Kelly for their comments on Pyramimonas

and Richard Pienaar and Stuart D. Sym for providing us with micrographs of

viral infection in Pyramimonas pseudoparkeae . This work was supported by Israel Science

Foundation grant 143/2018 (O.B.), the Milgrom Foundation (O.B.), Research Council of

Norway project VirVar 294363 (R.-A.S.), and German Research Foundation grant SFB

1078 B2 (P.H.). P.H. is a Hertie Senior Professor for Neuroscience supported by the Hertie

Foundation. O.B. holds the Louis and Lyra Richmond Chair in Life Sciences.

Author contributions O.B. and J.W. conceived the project. A.R. and O.B. performed

bioinformatic analyses. P.H. and J.O. designed the molecular characterization. J.O. and

R.G.F.L. acquired and analyzed electrophysiology and imaging data, respectively. R.-A.S.

and G.B. performed genome sequencing of the PoV-01B virus. A.R., J.O., and O.B. wrote

the paper, with contributions from all authors.

References

1. Govorunova, E. G., Sineshchekov, O. A., Li, H. & Spudich, J. L. Microbial rhodopsins: diversity,

mechanisms, and optogenetic applications. Annu. Rev. Biochem. 86, 845–872 (2017).

2. Govorunova, E. G. et al. The expanding family of natural anion channelrhodopsins reveals

large variations in kinetics, conductance, and spectral sensitivity. Sci. Rep. 7 , 43358 (2017).

3. Nagel, G. et al. Channelrhodopsin-1: a light-gated proton channel in green algae. Science 296,

2395–2398 (2002).

4. Nagel, G. et al. Channelrhodopsin-2, a directly light-gated cation-selective membrane channel.

Proc. Natl. Acad. Sci. 100, 13940–13945 (2003).

5. Kateriya, S., Nagel, G., Bamberg, E. & Hegemann, P. “Vision” in single-celled algae.

Physiology 19, 133–137 (2004).

6. Sineshchekov, O. A., Govorunova, E. G., Li, H. & Spudich, J. L. Bacteriorhodopsin-like

channelrhodopsins: Alternative mechanism for control of cation conductance. Proc. Natl. Acad.

Sci. 114 , E9512–E9519 (2017).

7. Govorunova, E. G., Sineshchekov, O. A., Janz, R., Liu, X. & Spudich, J. L. Natural light-gated

anion channels: A family of microbial rhodopsins for advanced optogenetics. Science 349,

647–650 (2015).

8. Oppermann, J. et al. MerMAIDs: a family of metagenomically discovered marine

anion-conducting and intensely desensitizing channelrhodopsins. Nat. Commun. 10, 1–13

(2019).

9. Hegemann, P. Algal sensory photoreceptors. Annu. Rev. Plant Biol. 59, 167–189 (2008).

10. Boyden, E. S., Zhang, F., Bamberg, E., Nagel, G. & Deisseroth, K. Millisecond-timescale,

genetically targeted optical control of neural activity. Nat. Neurosci. 8, 1263–1268 (2005).

11. Klapoetke, N. C. et al. Independent optical excitation of distinct neural populations. Nat.

Methods 11 , 338–346 (2014).

12. Rappleye, M. & Berndt, A. Structural basis for ion selectivity and engineering in

channelrhodopsins. Curr. Opin. Struct. Biol. 57, 176–184 (2019).

13. Deisseroth, K. & Hegemann, P. The form and function of channelrhodopsin. Science 357 ,

(2017).

14. Sunagawa, S. et al. Structure and function of the global ocean microbiome. Science 348 ,

(2015).

15. Brum, J. R. et al. Patterns and ecological drivers of ocean viral communities. Science 348 ,

(2015).

16. Philosof, A. et al. Novel abundant oceanic viruses of uncultured marine group II

Euryarchaeota. Curr. Biol. 27 , 1362–1368 (2017).

17. Vavourakis, C. D. et al. Metagenomic Insights into the Uncultured Diversity and Physiology of

Microbes in Four Hypersaline Soda Lake Brines. Front. Microbiol. 7, (2016).

18. Hingamp, P. et al. Exploring nucleo-cytoplasmic large DNA viruses in Tara Oceans microbial

metagenomes. ISME J. 7, 1678–1695 (2013).

19. Yutin, N., Wolf, Y. I., Raoult, D. & Koonin, E. V. Eukaryotic large nucleo-cytoplasmic DNA

viruses: clusters of orthologous genes and reconstruction of viral genome evolution. Virol. J. 6,

223 (2009).

20. Yutin, N. & Koonin, E. V. Proteorhodopsin genes in giant viruses. Biol. Direct 7 , 34 (2012).

21. Needham, D. M. et al. A distinct lineage of giant viruses brings a rhodopsin photosystem to

unicellular marine predators. Proc. Natl. Acad. Sci. 116 , 20574–20583 (2019).

22. Pushkarev, A. et al. A distinct abundant group of microbial rhodopsins discovered using

functional metagenomics. Nature 558 , 595–599 (2018).

23. Keeling, P. J. et al. The Marine Microbial Transcriptome Sequencing Project

(MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through

transcriptome sequencing. PLoS Biol. 12 , e1001889 (2014).

24. One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and

the phylogenomics of green plants. Nature 574 , 679–685 (2019).

25. Grimm, C., Silapetere, A., Vogt, A., Sierra, Y. A. B. & Hegemann, P. Electrical properties,

substrate specificity and optogenetic potential of the engineered light-driven sodium pump

eKR2. Sci. Rep. 8 , 1–12 (2018).

26. Govorunova, E. G., Sineshchekov, O. A. & Spudich, J. L. Proteomonas sulcata ACR1: a fast

anion channelrhodopsin. Photochem. Photobiol. 92 , 257–263 (2016).

27. Hegemann, P. Vision in . Planta 203 , 265–274 (1997).

28. Hegemann, P. & Berthold, P. Sensory photoreceptors and light control of flagellar activity. in

The Chlamydomonas Sourcebook (Second Edition) (eds. Harris, E. H., Stern, D. B. & Witman,

G. B.) 395–429 (Academic Press, 2009). doi:10.1016/B978-0-12-370873-1.00050-2.

29. Leliaert, F. et al. Phylogeny and molecular evolution of the green algae. Crit. Rev. Plant Sci. 31 ,

1–46 (2012).

30. Lopes Dos Santos, A. et al. Chloropicophyceae, a new class of picophytoplanktonic

prasinophytes. Sci. Rep. 7, 14019 (2017).

31. Inouye, I., Hori, T. & Chihara, M. Absolute configuration analysis of the flagellar apparatus of

Pterosperma cristatum () and consideration of its phylogenetic position. J.

Phycol. 26 , 329–344 (1990).

32. Henshaw, R., Jeanneret, R. & Polin, M. Phototaxis of the dominant marine pico-eukaryote

Micromonas sp.: from population to single cell. bioRxiv 740571 (2019) doi:10.1101/740571.

33. Chrétiennot-Dinet, M.-J. et al. A new marine picoeucaryote: Ostreococcus tauri gen. et sp. nov.

(, Prasinophyceae). Phycologia 34 , 285–292 (1995).

34. Kianianmomeni, A., Stehfest, K., Nematollahi, G., Hegemann, P. & Hallmann, A.

Channelrhodopsins of carteri are photochromic proteins that are specifically expressed

in somatic cells under control of light, temperature, and the sex inducer. Plant Physiol. 151 ,

347–366 (2009).

35. Greiner, A. et al. Targeting of photoreceptor genes in Chlamydomonas reinhardtii via

Zinc-finger nucleases and CRISPR/Cas9. Plant Cell 29, 2498–2518 (2017).

36. Awasthi, M., Ranjan, P., Sharma, K., Veetil, S. K. & Kateriya, S. The trafficking of bacterial type

rhodopsins into the Chlamydomonas eyespot and flagella is IFT mediated. Sci. Rep. 6, 34646

(2016).

37. Luck, M. et al. A photochromic histidine kinase rhodopsin (HKR1) that is bimodally switched by

ultraviolet and blue light. J. Biol. Chem. 287 , 40083–40090 (2012).

38. Thomason, P. & Kay, R. Eukaryotic signal transduction via histidine-aspartate phosphorelay. J.

Cell Sci. 113 ( Pt 18) , 3141–3150 (2000).

39. Bourret, R. B. Receiver domain structure and function in response regulator proteins. Curr.

Opin. Microbiol. 13 , 142–149 (2010).

40. Lewis, R. J., Brannigan, J. A., Muchová, K., Barák, I. & Wilkinson, A. J. Phosphorylated

aspartate in the structure of a response regulator protein. J. Mol. Biol. 294 , 9–15 (1999).

41. Maule, A. F. et al. The aspartate-less receiver (ALR) domains: distribution, structure and

function. PLOS Pathog. 11, e1004795 (2015).

42. Wolanin, P. M., Webre, D. J. & Stock, J. B. Mechanism of phosphatase activity in the

chemotaxis response regulator CheY. Biochemistry 42, 14075–14082 (2003).

43. Böhm, M. et al. Channelrhodopsin-1 phosphorylation changes with phototactic behavior and

responds to physiological stimuli in Chlamydomonas . Plant Cell 31, 886–910 (2019).

44. Sineshchekov, O. A. et al. Rhodopsin-mediated photoreception in cryptophyte flagellates.

Biophys. J. 89 , 4310–4319 (2005).

45. Kreimer, G. Light perception and signal modulation during photoorientation of green

algae. in Photomovement (eds. Häder, D.-P. & Breure, A. M.) vol. 1 193–227 (Elsevier, 2001).

46. Moestrup, Ø. & Hori, T. Ultrastructure of the flagellar apparatus in Pyramimonas octopus

(Prasinophyceae). II. Flagellar roots, connecting fibres, and numbering of individual flagella in

green algae. Protoplasma 148 , 41–56 (1989).

47. Etten, J. L. V., Agarkova, I. V. & Dunigan, D. D. Chloroviruses. Viruses 12 , 20 (2019).

48. Weynberg, K. D., Allen, M. J. & Wilson, W. H. Marine prasinoviruses and their tiny

hosts: a review. Viruses 9 , 43 (2017).

49. Moestrup, Ø. & Thomsen, H. A. An ultrastructural study of the flagellate Pyramimonas

orientalis with particular emphasis on Golgi apparatus activity and the flagellar apparatus.

Protoplasma 81 , 247–269 (1974).

50. Sandaa, R. A., Heldal, M., Castberg, T., Thyrhaug, R. & Bratbak, G. Isolation and

characterization of two viruses with large genome size infecting ericina

() and Pyramimonas orientalis (Prasinophyceae). Virology 290, 272–280

(2001).

51. Maruyama, F. & Ueki, S. Evolution and phylogeny of large DNA viruses, Mimiviridae and

Phycodnaviridae including newly characterized Heterosigma akashiwo virus. Front. Microbiol.

7, 1942 (2016).

52. Sym, S. D., Kawachi, M. & Inouye, I. Diversity of swimming behavior in Pyramimonas

(Prasinophyceae). Phycol. Res. 48 , 149–154 (2000).

53. Bratanov, D. et al. Unique structure and function of viral rhodopsins. Nat. Commun. 10, 4939

(2019).

Lateral gene transfer of anion-conducting channelrhodopsins between green algae and giant viruses

Andrey Rozenberg, Johannes Oppermann, Jonas Wietek, Rodrigo Gaston

Fernandez Lahore, Ruth-Anne Sandaa, Gunnar Bratbak, Peter Hegemann, and

Oded Béjà

Supplementary information

This file contains:
- Materials & Methods
- Additional references
- Supplementary Figures
- Supplementary Tables
- List of supplementary data files

Materials & Methods

Metagenomic contigs. The viral ChRs were found in five assembled contigs from Tara Oceans: SAMEA2620979_21821, SAMEA2623079_2164382, SAMEA2621277_1099815, SAMEA2619548_2902552 and SAMEA2619399_2980695 (the SAMEA* prefixes refer to the corresponding NCBI biosamples). Although the search was performed on several available assemblies of Tara Oceans, all five contigs come from the assembly generated previously by 1. The two shortest contigs contained two overlapping fragments of a single ORF and could be merged thanks to an identical overlap of 225 bp. ORFs were annotated using GeneMarkS v. 4.32 2 in the eukaryotic viral mode and prokka v. 1.14.5 3 in the viral mode (giving preference to the GeneMarkS gene boundaries in cases of conflict) with manual corrections, and tRNA genes were annotated using tRNAscan-SE v. 2.0.3 4. To increase the set of genes suitable for identification of the corresponding ChR-containing contigs, additional longer contigs were recruited by searching for contigs containing homologs to at least four genes from the ChR-containing contigs using blastp v. 2.2.31+ 5. One of the longest recruited contigs without ChRs, SAMEA2620979_1476951 (34,130 bp), could be significantly extended further by stitching it with a different contig retrieved from the same marine station, SAMEA2620979_1432764 (39,065 bp), thanks to an exceptionally long overlap of 11,902-11,903 bp with an identity level of 88.5%.
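The merging of the two shortest contigs through their identical 225 bp overlap can be illustrated with a minimal Python sketch. The function name `merge_by_overlap` and the `min_overlap` parameter are hypothetical and not part of the original pipeline. The sketch handles exact suffix-prefix matches only, so it would not cover the longer overlap with 88.5% identity, which requires an alignment step.

```python
from typing import Optional


def merge_by_overlap(left: str, right: str, min_overlap: int = 20) -> Optional[str]:
    """Join two sequences if a suffix of `left` is identical to a prefix of `right`.

    Hypothetical helper: tries the longest possible overlap first and returns
    None when no exact overlap of at least `min_overlap` bases exists.
    """
    longest = min(len(left), len(right))
    for k in range(longest, min_overlap - 1, -1):
        if left[-k:] == right[:k]:
            # Keep the shared region once and append the rest of `right`.
            return left + right[k:]
    return None


# Toy example with a 4-base identical overlap ("TTGA").
merged = merge_by_overlap("ACGTACGTTTGA", "TTGACCCA", min_overlap=4)
```

In practice the overlap would first be located with a search tool such as blastn, and the sketch only mirrors the final joining step.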

Draft genome assembly of Pyramimonas orientalis virus PoV-01B. Isolation and culture of Pyramimonas orientalis virus PoV-01B were described previously 6. Shotgun libraries of randomly sheared, end-repaired DNA from PoV-01B (1-2 kb) were prepared by Lucigen (https://www.lucigen.com/) using the pSMART-HCKan cloning vector (Lucigen, WI, USA). Clones were sequenced by Sanger sequencing using the MegaBACE 1000 and 4000 instruments (Symbio Corporation, CA, USA) for a total yield of 9666 reads. Base-calling was performed with phred v. 0.020425.c 7. The data were assembled with phrap v. 0.990329 8, resulting in 141 contigs (total length 689,147 bp, N50 = 12,634 bp, L50 = 15). For further analysis the contigs were trimmed at low-quality regions and those longer than 2000 bp were taken for downstream analysis (except for Contig39 which contained the gene for helicase D10). Additionally, a Nextera library was prepared from an infection experiment and a pilot MiSeq run was performed with a total yield of 9225 reads mappable to the genome draft. The contigs were checked with blastn for potential overlaps undetected by the assembler due to sequencing errors and manually joined with the assistance of the Illumina data when needed. The resulting assembly amounted to 36 contigs (total length 523,791 bp, N50 = 23,083 bp, L50 = 7, raw read alignment rate 91.12%, coverage 12.64 reads/bp). Annotation was performed using the same pipeline as for the metagenomic contigs. The genetic data were initially intended to assist discovery of phylogenetic markers 6,9 and were not planned to be released as a genome assembly because of incompleteness and sequencing errors. Nevertheless, we managed to retrieve 11 out of 12 highly conserved genes (see Fig. S14), and thus the assembly might be considered close to complete. With the caveat of remaining sequencing errors, the assembly is released here as a supplementary file (Suppl. File 2), and the sequences of phylogenetic markers are provided in Suppl. File 6a. The search for potential rhodopsin genes was performed with blastp and tblastn searches against assembled contigs and raw reads.
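For readers unfamiliar with the assembly statistics quoted above, the following Python sketch shows one common way to compute N50 and L50 from a list of contig lengths. The function name `n50_l50` is hypothetical, and the convention used (the first contig at which the running total reaches half of the assembly length) is an assumption; the original values were presumably reported by the assembly tools themselves.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the cumulative length,
    summed from the longest contig downwards, first reaches half of
    the total assembly length; L50 is the rank of that contig.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, rank
    raise ValueError("no contigs given")


# Toy assembly of five contigs with a total length of 30.
stats = n50_l50([2, 10, 6, 8, 4])
```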

Green algal ChRs. Two transcriptomic datasets were recruited in the search for viral ChR homologs: MMETSP 10 re-assemblies (https://doi.org/10.5281/zenodo.746048) and 1KP 11 assemblies. After green algae appeared as the only group containing homologs of the viral ChRs, additional genome and transcriptome assemblies from green algae from NCBI and JGI were added to the dataset. Previously unannotated genomes were annotated with GeneMark-ES v. 4.38 2 in the self-training mode with default settings. For genome assemblies with low N50 values, the gene annotation was performed by running training with the minimum contig length lowered to 5000 (NCBI assembly GCA_004000685.1) or by using models trained on closely related genomes (NCBI assemblies GCA_001630525.1 [GCA_002588565.1 as reference], GCA_002317545.1 [GCA_004335915.1], GCA_003612995.1 [GCA_002897115.1], GCA_003613005.1 [GCA_002897115.1], GCA_004335885.1 [GCA_002284615.1], GCA_004335895.1 [GCA_001662425.1], GCA_004764505.1 [GCA_004335915.1], GCA_008037345.1 [GCA_002814315.1]). The transcriptome of Pyramimonas tychotreta was obtained by clustering contigs from kmers assemblies provided in NCBI SRA for runs SRR4293310-SRR4293315, SRR4293322 and SRR4293323 (Bioproject PRJNA342459) using CD-HIT v. 4.6 12 at the identity level of 99%. Coding sequences for all of the transcriptomes were predicted with TransDecoder v. 5.5.0 (https://github.com/TransDecoder/TransDecoder). Channelrhodopsins were searched for by running a custom pipeline combining pfam profile matching assisted by hmmer v. 3.2 (http://hmmer.org/) with NCBI blast 5 searches and confirming their identity by alignment and phylogenetic reconstruction (see below).

Species assignments of some algal strains were corrected or updated as indicated in Suppl. Table 2, with the biggest changes affecting the recently revised genera Picochloron and Micromonas 13,14. Green algal transcriptome and genome assemblies were combined at the level of species, and their redundancy was reduced by clustering protein sequences with CD-HIT v. 4.6 at the identity level of 99%. The completeness of the resulting per-species gene sets was tested with BUSCO v. 4.0.2 15 using the viridiplantae_odb10 reference dataset.

Prasinophyte ACRs were defined as the group of taxonomically and structurally close ChR sequences that includes the four confirmed ACRs. Green algal CCRs were defined as those ChRs from green algae which fall within the smallest clade encompassing all known green algal ChRs with confirmed CCR activity (see Fig. S2). Although no experimental evidence exists for the cation-conducting activity of the prasinophyte proteins falling within this clade, primarily because of their cytotoxicity (pers. observations; two of these proteins were unsuccessfully tested in 16), they nevertheless possess the well-conserved Asp positions of the green algal CCRs (see Fig. S3). This definition effectively excluded several green algal putative ChRs of uncertain activity (see Fig. S2), which nevertheless had little influence on the picture of the overall distribution of CCRs, as most of those proteins came from species also containing proteins from the defined CCR clade.

Phylogenetic relationships between green algae were adopted from 17 and further refined based on 13. Two cases of uncertain phylogenetic position were verified by extracting and blasting rbcL and 18S sequences: Scourfieldia sp. M0560/2 (1KP assembly EGNB, related to Tetraselmis and Scherffelia) and Trebouxiophyceae sp. KSI-1 (NCBI assembly GCA_003568905.1, belongs to the Watanabea clade). Morphological descriptions and habitats were taken from AlgaeBase (https://www.algaebase.org) and primary literature. Vegetative stages were assigned to one of the following categories: 1) multicellular thalli (filamentous, parenchymatous or pseudoparenchymatous); 2) cocci or colonies/clusters of non-motile cells; 3) flagellates a) with or b) without eyespots. Algae with life-cycles involving alternating vegetative flagellated and non-motile resting phases were coded as flagellates. The algae from the first two categories were further annotated with information about the presence of non-vegetative flagellated stages (zoospores and/or gametes): 1) those that are assumed to have no flagellated stages; 2) those that have at least one flagellated stage, a) with or b) without eyespots in at least one such stage. Only direct morphological evidence was taken into consideration for species or genera (whenever the corresponding characteristic was included in the generic diagnosis), except for Zygnematophyceae, which are known to lack motile stages as a clade 18. The complete list of analyzed transcriptomes and genomes is provided in Suppl. File 4.

Phylogenetic relationships between the Pyramimonas species for which transcriptomes were available were analyzed by extracting nucleotide sequences of the ribulose-1,5-bisphosphate carboxylase/oxygenase large subunit (rbcL) gene using blast from the corresponding datasets and recruiting previously published sequences 19,20. Sequences from Cymbomonas were included for outgroup rooting. Alignments were obtained with mafft and analyzed with iqtree (automatic model selection) without trimming.

Analysis of ChR domain organization. Initial analysis of ChR domain organization was performed by running the InterProScan v. 5.36-75.0 pipeline 21 on individual sequences. This strategy allowed the identification of a conserved region in the C-terminal extensions of prasinophyte ACRs as a response regulator (RR) domain, but failed to identify the con2 region (see 22) in green algal CCRs that could be aligned with it. To obtain independent confirmation of this homology, different sets of ChRs were created based on well-defined phylogenetic clades and aligned using mafft (G-INS-i). The alignments were converted into protein profiles with HHmake from HH-suite v. 3.2.0 23 (requiring 50% coverage to record a match), supplied with secondary structure predictions (addss.pl) and analyzed with HHsearch against the pdb70 v. 200101 database. The final alignment was created from complete sequences of prasinophyte and viral ACRs, green algal CCRs, green algal HKRs and CheY as the reference with mafft (G-INS-i). Neither InterProScan, HHsearch nor alignment could identify any domain with homology to RRs in the C-termini of cryptophyte ACRs and CCRs and MerMAIDs.

Tetranucleotide composition analysis. Two separate analyses of tetranucleotide composition were performed: (1) whole-genome composition was calculated for both strands for metagenomic contigs, viruses from the Phycodnaviridae and Mimiviridae, as well as representatives of their photosynthetic host groups (, green algae and ); and (2) per-gene composition was calculated for CDS sequences of the ACR genes from the metagenomic contigs and prasinophytes. For the second analysis, random samples of non-ChR genes with CDSs of at least 200 bp were taken as background: 100 genes each from the metagenomic contigs and PoV-01B, and 40 genes from each of the following Pyramimonas transcriptomes: MMETSP0059, MMETSP1081, MMETSP1169, MMETSP1445 and PRJNA342459-36897 (P. tychotreta). The tetranucleotide compositions were analyzed with correspondence analysis (cca function from the vegan v. 2.5-5 24 package).
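As an illustration of the first step of this analysis, the Python sketch below counts tetranucleotides on both strands of a sequence and converts the counts to relative frequencies. The function names, the skipping of windows with ambiguous bases and the normalization to frequencies are assumptions made for the example and are not taken from the original workflow. One such vector per genome or gene would form a row of the matrix passed to correspondence analysis.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tetranucleotide_freqs(seq, both_strands=True):
    """Relative frequencies of all 256 tetranucleotides (hypothetical helper)."""
    seq = seq.upper()
    strands = [seq, revcomp(seq)] if both_strands else [seq]
    counts = Counter()
    for strand in strands:
        for i in range(len(strand) - 3):
            kmer = strand[i:i + 4]
            # Skip windows containing ambiguous bases such as N.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


freqs = tetranucleotide_freqs("ACGTACGT")
```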

Phylogenetic analysis. The alignment of the channelrhodopsin sequences for phylogenetic analysis was performed as follows. All of the collected putative channelrhodopsin protein sequences from the transcriptomic and genomic datasets as well as reference sequences were clustered with CD-HIT at the identity level of 98% and aligned with mafft (G-INS-i). The rhodopsin domain was extracted from the alignment and the positions occupied by gaps in less than 50% of the sequences were trimmed. The resulting alignment was clustered at 100% identity and used to perform phylogenetic analysis with iqtree v. 1.6.10 25 (automatic model selection, 1000 ultrafast bootstrap replicates).

The phylogenetic relationships within the families Phycodnaviridae and Mimiviridae and the metagenomic contigs were resolved as follows. Homologous genes were collected with GET_HOMOLOGUES v. 11042019 26 using all three available algorithms (BDBH, COG and OMCL with inflation values 1, 1.5, 2, 3, 4 and 5) with an e-value threshold of 1e-3 for full genomes of cultured viruses (excluding the known phycodnavirid outliers Medusavirus, Mollivirus and Pandoravirus), and the resulting clusters were filtered by requiring no paralogs and a taxonomic coverage of greater than 90%. Homologous genes from the rest of the genomes and metagenomic contigs were fetched by taking best hits with diamond v. 0.9.24 27 (e-value threshold of 1e-5 and subject coverage of at least 50%). The homologs were aligned using mafft v. 7.310 28 (G-INS-i method), trimmed with trimAl v. 1.4.rev15 29 (the “automated1” mode) and the phylogeny was reconstructed with iqtree specifying the orthologs as individual partitions, picking optimal partitioning scheme and substitution models and testing the resulting ML phylogeny with 1000 ultrafast bootstrap replicates.

The D5-like helicase/primase dataset was created by blasting the protein sequences of the helicase-primase genes from the metagenomic contigs against NCLDV protein sequences. The sequences were aligned with mafft (G-INS-i), trimmed with trimAl (automated1) and the phylogeny was reconstructed with iqtree (selecting best-fit model, applying 1000 ultrafast bootstrap replicates). To increase the resolution, the process was repeated by focusing on the clade covering the genes from the metagenomic contigs.
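The cluster filtering used for the multi-gene viral phylogeny (no paralogs, taxonomic coverage greater than 90%) can be expressed as a short Python sketch. The input layout (a dictionary mapping cluster identifiers to lists of genome and gene pairs) and the function name `filter_clusters` are assumptions made for illustration and do not reflect the GET_HOMOLOGUES output format.

```python
def filter_clusters(clusters, n_genomes, min_coverage=0.9):
    """Keep single-copy clusters present in more than `min_coverage` of genomes.

    clusters: dict mapping a cluster id to a list of (genome_id, gene_id) pairs.
    n_genomes: total number of genomes in the reference set.
    """
    kept = {}
    for cluster_id, members in clusters.items():
        genomes = [genome for genome, _ in members]
        # A genome contributing two or more genes indicates paralogs.
        has_paralogs = len(genomes) != len(set(genomes))
        coverage = len(set(genomes)) / n_genomes
        if not has_paralogs and coverage > min_coverage:
            kept[cluster_id] = members
    return kept


# Toy reference set of ten genomes, g0 to g9.
all_genomes = [f"g{i}" for i in range(10)]
toy = {
    "complete": [(g, f"{g}_a") for g in all_genomes],
    "sparse": [(g, f"{g}_b") for g in all_genomes[:9]],
    "paralogous": [(g, f"{g}_c") for g in all_genomes] + [("g0", "g0_c2")],
}
selected = filter_clusters(toy, n_genomes=10)
```

Only the cluster named "complete" passes here: "sparse" covers exactly 90% of the genomes, which does not exceed the threshold, and "paralogous" contains two genes from the same genome.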

Gene sharing analyses. For the gene sharing analyses, orthogroups for the longest metagenomic contigs and Phycodnaviridae and Mimiviridae genomes were collected with GET_HOMOLOGUES (COG algorithm) with an e-value threshold of 1e-3. Genome clustering (Ward’s method) was performed on the genome-by-genome matrix of relative numbers of shared orthogroups. Genome ordination was obtained with correspondence analysis run on the genome-by-orthogroup presence-absence matrix.
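As an illustration of the genome-by-genome matrix mentioned above, the following sketch derives it from orthogroup membership. The normalization is an assumption: the text does not say how the numbers of shared orthogroups were made relative, so the count is divided here by the orthogroup count of the smaller genome. The function names are hypothetical.

```python
def presence_sets(orthogroups):
    """Map each genome to the set of orthogroups it contains.

    orthogroups: dict mapping orthogroup id -> iterable of genome names.
    """
    genomes = {}
    for orthogroup, members in orthogroups.items():
        for genome in members:
            genomes.setdefault(genome, set()).add(orthogroup)
    return genomes


def sharing_matrix(orthogroups):
    """Return (genome names, matrix of relative shared-orthogroup counts).

    Each entry is |shared orthogroups| / min(|A|, |B|); this
    normalization is an assumption, not taken from the source.
    """
    sets = presence_sets(orthogroups)
    names = sorted(sets)
    matrix = []
    for a in names:
        row = []
        for b in names:
            shared = len(sets[a] & sets[b])
            row.append(shared / min(len(sets[a]), len(sets[b])))
        matrix.append(row)
    return names, matrix
```

The rows of such a matrix could then be passed to a hierarchical clustering routine, for example `scipy.cluster.hierarchy.linkage` with `method="ward"`, although the software actually used for this step is not named in the text.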

Molecular biology. For electrophysiological recordings in ND7/23 cells, human/mouse codon-optimized sequences encoding vPyACR_21821, vPyACR_2164382, PymeACR1, and Py2087ACR1 were synthesized (GenScript, Piscataway, NJ) and cloned in frame with mCherry into the pmCherry-C1 vector using NheI and AgeI restriction sites (FastDigest, Thermo Fisher Scientific, Waltham, MA). To improve the membrane localization, vPyACR_21821 and vPyACR_2164382 were further subcloned in frame with eYFP into the pEYFP-N1 vector using Gibson assembly 30. As previously reported 31, a membrane trafficking sequence (KSRITSEGEYIPLDQIDINV) and an endoplasmic reticulum release sequence (FCYENEV) flanked the fluorophore eYFP, and the N-terminus was extended (MDYGGALSAVGLFQTSYTLENNGSVICIPNNGQCFCLAWLKSNG). Furthermore, the last 131 amino acids of the C-terminus were truncated. Molecular cloning was planned using NEBuilder v. 2.1.0 (New England Biolabs Inc., Ipswich, MA) and SnapGene v. 4.3+ (GSL Biotech LLC, Chicago, IL). The sequences of the codon-optimized CDSs of the prasinophyte and viral ACRs and the corresponding translations were deposited in GenBank (accessions MT353681-MT353684).
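The layout of the eYFP fusion described above can be sketched at the protein level. The order of elements is an assumption based on the wording: N-terminal extension, ACR lacking its last 131 residues, trafficking sequence, fluorophore, ER release sequence. Whether the extension replaces or precedes the native start methionine, and whether linker residues are present, is not stated, so none are modeled. The function name and any input sequences are hypothetical placeholders.

```python
# Peptide elements quoted in the text.
N_TERMINAL_EXTENSION = "MDYGGALSAVGLFQTSYTLENNGSVICIPNNGQCFCLAWLKSNG"
TRAFFICKING_SEQUENCE = "KSRITSEGEYIPLDQIDINV"
ER_RELEASE_SEQUENCE = "FCYENEV"
C_TERMINAL_TRUNCATION = 131  # residues removed from the ACR C-terminus


def assemble_construct(acr_protein, fluorophore):
    """Return the assumed fusion: extension + truncated ACR + TS + fluorophore + ER.

    acr_protein and fluorophore are amino-acid strings supplied by the caller.
    The element order is an assumption and no linkers are modeled.
    """
    if len(acr_protein) <= C_TERMINAL_TRUNCATION:
        raise ValueError("ACR sequence is shorter than the truncation length")
    truncated_acr = acr_protein[:-C_TERMINAL_TRUNCATION]
    return (
        N_TERMINAL_EXTENSION
        + truncated_acr
        + TRAFFICKING_SEQUENCE
        + fluorophore
        + ER_RELEASE_SEQUENCE
    )
```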


Electrophysiology. ND7/23 cell culture (ECACC 92090903, Sigma-Aldrich, Munich, Germany) and electrophysiological experiments were performed as described elsewhere 31,32. In detail, cells were cultured in Dulbecco’s Modified Eagle Medium (DMEM) supplemented with 5% (v/v) fetal bovine serum (FBS) and 1 µg/ml penicillin/streptomycin at 37 °C and 5% CO2. For experiments, cells were seeded on poly-D-lysine-coated coverslips at a density of 0.5–1.0×10^5 cells/ml and supplemented with 1 µM all-trans retinal. The next day, cells were transiently transfected with 2 µg DNA using FuGENE® HD (Promega, Madison, WI). Whole-cell patch-clamp recordings were performed at room temperature two days after transfection, with a 140 mM NaCl agar bridge as reference electrode, at membrane resistances ≥0.5 GΩ and access resistances <10 MΩ. Patch pipettes were pulled to resistances of 1.5–2.5 MΩ using a P-1000 micropipette puller (Sutter, Novato, CA) and fire-polished. An AxoPatch200B and a DigiData400 were used to amplify and digitize signals, respectively. Signals were acquired using Clampex 10.4 (all from Molecular Devices, Sunnyvale, CA). A Polychrome V (TILL Photonics, Planegg, Germany) with the bandwidth set to 7 nm served as light source. The light was collimated into an Axiovert 100 microscope (Carl Zeiss, Jena, Germany) and controlled using a programmable shutter system (VS25 and VCM-D1; Vincent Associates, Rochester, NY). Buffer osmolarity was measured (Osmomat 3000basic, Gonotec, Berlin, Germany) and set to 320 mOsm for extracellular buffers and 290 mOsm for intracellular buffers using glucose. The pH was adjusted using N-methyl-D-glucamine or citric acid. Liquid junction potentials were calculated using Clampex 10.4 and corrected on-line. To determine the ion selectivity, extracellular buffers (Suppl. Table S1) were exchanged in random order by manually adding and removing at least 4 × 1 ml of the respective buffer to the measuring chamber (volume ~0.5 ml). Photocurrents were induced with 470-nm (viral ChRs) or 510-nm (CCMP277-1 and CCMP2087) light for 500 ms and recorded while the membrane potential was held at -80 to +40 mV in steps of 20 mV. Action spectra were recorded at -60 mV using 10-ms pulses of low-intensity light between 390 and 680 nm in steps of 10 nm. To maintain an equal photon irradiance at all wavelengths, a motorized neutral-density filter wheel (Newport, Irvine, CA) was moved into the light path and controlled by custom software written in LabVIEW (National Instruments, Austin, TX).
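The requirement of equal photon irradiance at all wavelengths follows from the photon energy E = hc/λ: at a fixed energy irradiance, longer wavelengths deliver more photons, so they need more attenuation. The sketch below is not the authors' LabVIEW routine. It only illustrates the arithmetic, assuming the energy irradiance of the light source is known at each wavelength (the input values are hypothetical) and that the filter can only attenuate.

```python
PLANCK = 6.62607015e-34     # Planck constant, J s
LIGHT_SPEED = 2.99792458e8  # speed of light, m / s


def photon_irradiance(energy_irradiance_w_m2, wavelength_nm):
    """Convert energy irradiance (W/m^2) to photon irradiance (photons/m^2/s)."""
    wavelength_m = wavelength_nm * 1e-9
    return energy_irradiance_w_m2 * wavelength_m / (PLANCK * LIGHT_SPEED)


def filter_transmissions(lamp_irradiance):
    """Return the filter transmission (0..1) per wavelength that equalizes photon flux.

    lamp_irradiance: dict mapping wavelength in nm -> energy irradiance in W/m^2.
    The weakest photon flux sets the common target because a
    neutral-density filter can only reduce the light.
    """
    flux = {wl: photon_irradiance(e, wl) for wl, e in lamp_irradiance.items()}
    target = min(flux.values())
    return {wl: target / f for wl, f in flux.items()}
```

For a source with flat energy irradiance, the transmission required at 600 nm relative to 400 nm is 400/600, or about 0.67.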

The wavelength sensitivity (λmax) was determined by fitting a three-parametric Weibull function to the data normalized to the maximum photocurrent between 410 and 680 nm.
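The exact parameterization of the three-parametric Weibull function is not given in the text. The sketch below assumes one common peak-shaped form, with position x0, width b and shape c, scaled so that the curve equals 1 at x = x0; λmax is then read off as the fitted x0. A brute-force grid search stands in for a proper least-squares optimizer, and all function names and grids are hypothetical.

```python
import math


def weibull_peak(x, x0, b, c):
    """Peak-normalized three-parameter Weibull curve; equals 1 at x = x0 (needs c > 1)."""
    mode = ((c - 1.0) / c) ** (1.0 / c)
    s = (x - x0) / b + mode
    if s <= 0.0:
        return 0.0  # the curve is zero left of its support
    scale = ((c - 1.0) / c) ** ((1.0 - c) / c)
    return scale * s ** (c - 1.0) * math.exp(-(s ** c) + (c - 1.0) / c)


def fit_lambda_max(wavelengths, responses, x0_grid, b_grid, c_grid):
    """Return the (x0, b, c) on the grid with the smallest sum of squared errors.

    responses are expected to be normalized to their maximum.
    """
    best = None
    for x0 in x0_grid:
        for b in b_grid:
            for c in c_grid:
                sse = sum(
                    (weibull_peak(x, x0, b, c) - y) ** 2
                    for x, y in zip(wavelengths, responses)
                )
                if best is None or sse < best[0]:
                    best = (sse, x0, b, c)
    return best[1:]
```

In practice a continuous optimizer would refine the grid estimate; the grid search only keeps the sketch free of external dependencies.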

Confocal microscopy. For confocal imaging, ND7/23 cells were cultured as described above and seeded at a density of 0.2×10^5 cells/ml in polymer-bottom, 35 mm µ-dishes (ibidi). Two days after transfection with 2 to 2.5 µg DNA using FuGENE HD (Promega), confocal images were acquired using an FV1000 confocal laser scanning microscope (Olympus, Shinjuku, Tokyo, Japan) equipped with a 60x water immersion objective with a numerical aperture of 1.2 (UPlanSApo, Olympus). Protein localization was detected by exciting mCherry or eYFP with a 559 nm diode laser and a 515 nm argon laser, respectively (5% transmissivity for both). Acquired z-stacks were analyzed with ImageJ 33. Relevant z-planes were z-projected for representative images of membrane fluorescence.


References

1. Philosof, A. et al. Novel abundant oceanic viruses of uncultured marine group II Euryarchaeota. Curr. Biol. 27, 1362–1368 (2017).

2. Besemer, J., Lomsadze, A. & Borodovsky, M. GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 29, 2607–2618 (2001).

3. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069 (2014).

4. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964 (1997).

5. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).

6. Sandaa, R. A., Heldal, M., Castberg, T., Thyrhaug, R. & Bratbak, G. Isolation and characterization of two viruses with large genome size infecting Chrysochromulina ericina (Prymnesiophyceae) and Pyramimonas orientalis (Prasinophyceae). Virology 290, 272–280 (2001).

7. Ewing, B., Hillier, L., Wendl, M. C. & Green, P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 8, 175–185 (1998).

8. Bastide, M. de la & McCombie, W. R. Assembling genomic DNA sequences with PHRAP. Curr. Protoc. Bioinforma. 17, 11.4.1–11.4.15 (2007).

9. Larsen, J. B., Larsen, A., Bratbak, G. & Sandaa, R.-A. Phylogenetic analysis of members of the Phycodnaviridae virus family, using amplified fragments of the major capsid protein gene. Appl. Environ. Microbiol. 74, 3048–3057 (2008).

10. Keeling, P. J. et al. The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP): illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing. PLoS Biol. 12, e1001889 (2014).

11. One Thousand Plant Transcriptomes Initiative. One thousand plant transcriptomes and the phylogenomics of green plants. Nature 574, 679–685 (2019).

12. Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658–1659 (2006).

13. Lopes Dos Santos, A. et al. Chloropicophyceae, a new class of picophytoplanktonic prasinophytes. Sci. Rep. 7, 14019 (2017).

14. Simon, N. et al. Revision of the genus Micromonas Manton et Parke (Chlorophyta, Mamiellophyceae), of the type species M. pusilla (Butcher) Manton & Parke and of the species M. commoda van Baren, Bachy and Worden and description of two new species based on the genetic and phenotypic characterization of cultured isolates. Protist 168, 612–635 (2017).

15. Waterhouse, R. M. et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol. 35, 543–548 (2018).

16. Klapoetke, N. C. et al. Independent optical excitation of distinct neural populations. Nat. Methods 11, 338–346 (2014).

17. Leliaert, F. et al. Phylogeny and molecular evolution of the green algae. Crit. Rev. Plant Sci. 31, 1–46 (2012).

18. Gontcharov, A. A. Phylogeny and classification of Zygnematophyceae (Streptophyta): current state of affairs. Fottea 8, 87–104 (2008).

19. Daugbjerg, N., Moestrup, Ø. & Arctander, P. Phylogeny of the genus Pyramimonas (Prasinophyceae, Chlorophyta) inferred from the rbcL gene. J. Phycol. 30, 991–999 (1994).

20. Suda, S., Bhuiyan, M. A. H. & Faria, D. G. Genetic diversity of Pyramimonas from Ryukyu Archipelago, Japan (Chlorophyceae, Pyramimonadales). J. Mar. Sci. Technol. 21, 285–296 (2013).

21. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).

22. Kianianmomeni, A., Stehfest, K., Nematollahi, G., Hegemann, P. & Hallmann, A. Channelrhodopsins of Volvox carteri are photochromic proteins that are specifically expressed in somatic cells under control of light, temperature, and the sex inducer. Plant Physiol. 151, 347–366 (2009).

23. Steinegger, M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 20, 473 (2019).

24. Oksanen, J. et al. vegan: Community Ecology Package. (2019).

25. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).

26. Contreras-Moreira, B. & Vinuesa, P. GET_HOMOLOGUES, a versatile software package for scalable and robust microbial pangenome analysis. Appl. Environ. Microbiol. 79, 7696–7701 (2013).

27. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).

28. Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).

29. Capella-Gutiérrez, S., Silla-Martínez, J. M. & Gabaldón, T. trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25, 1972–1973 (2009).

30. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

31. Grimm, C., Silapetere, A., Vogt, A., Sierra, Y. A. B. & Hegemann, P. Electrical properties, substrate specificity and optogenetic potential of the engineered light-driven sodium pump eKR2. Sci. Rep. 8, 1–12 (2018).

32. Oppermann, J. et al. MerMAIDs: a family of metagenomically discovered marine anion-conducting and intensely desensitizing channelrhodopsins. Nat. Commun. 10, 1–13 (2019).

33. Abramoff, M., Magalhães, P. & Ram, S. J. Image processing with ImageJ. Biophotonics Int. 11, 36–42 (2003).

34. Piganeau, G., Grimsley, N. & Moreau, H. Genome diversity in the smallest marine photosynthetic eukaryotes. Res. Microbiol. 162, 570–577 (2011).

35. Daugbjerg, N., Fassel, N. M. & Moestrup, Ø. Microscopy and phylogeny of Pyramimonas tatianae sp. nov. (Pyramimonadales, Chlorophyta), a scaly quadriflagellate from Golden Horn Bay (eastern Russia) and formal description of Pyramimonadophyceae classis nova. Eur. J. Phycol. 1–15 (2019).


Supplementary Figures

[Figure: scatter plot with axes CA1 and CA2. Legend groups: metagenomic contigs (v21821 cluster, v2164382 cluster, SAMEA2621277_1099815, SAMEA2619399_2980695; contigs with vChRs marked separately); Phycodnaviridae clades (Chlorovirus, Coccolithovirus, Medusa-Molli-Pandoravirus, Phaeovirus, Prasinovirus, Raphidovirus, Tetraselmis virus 1); Mimiviridae clades (Klosneuvirinae, Megavirinae, Mesomimivirinae, PPVC, Pyramimonas orientalis virus); Stramenopile clades (Bacillariophyceae, Pelagophyceae, Phaeophyceae); Viridiplantae clades (Core Chlorophyta, Mamiellophyceae, Picocystophyceae, Pyramimonadales, Streptophyta); Haptophytes (Haptophyta).]

Fig. S1. Ordination of the metagenomic contigs containing putative viral ChRs and related contigs, genomes of viruses from the Mimiviridae and Phycodnaviridae and genomes of several photosynthetic organisms in the canonical axes of the correspondence analysis of tetranucleotide frequencies.
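The input to the ordination in Fig. S1 is a table of tetranucleotide frequencies per contig or genome. A minimal sketch of how one such frequency vector could be computed is given below. It assumes overlapping windows on a single strand, skips windows containing ambiguous bases, and does not merge reverse complements; none of these choices is specified in the caption, and the function name is hypothetical.

```python
from itertools import product

# All 256 tetranucleotides in a fixed (lexicographic) order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_frequencies(sequence):
    """Return relative frequencies of the 256 tetranucleotides in TETRAMERS order.

    Overlapping windows are counted on the given strand only; windows with
    characters other than A, C, G or T (for example N) are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TETRAMERS)
    return [counts[k] / total for k in TETRAMERS]
```

Stacking one such vector per contig or genome gives the matrix on which a correspondence analysis could be run.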

[Figure: tree of channelrhodopsin sequences; individual tip labels (sequence identifiers with source organisms) are not reproduced here. Clade labels: Cryptophyte CCRs, Cryptophyte ACRs, putative ChRs from Labyrinthulea, Prasinophyte ACRs, Viral ACRs, MerMAIDs. Legend: known activities (ACR, CCR).]

MMETSP1081_DN19374_c1_g1_i1.p1 [Pyramimonas amylifera]

LGRX01000921.1_gmes_3651 [Cymbomonas tetramitiformis]

MMETSP1081_DN19374_c2_g1_i1.p1 [Pyramimonas amylifera]

1KP_QJYX_2006028.p1 [Oltmannsiellopsis viridis]

MMETSP1169_DN16214_c3_g1_i1.p1 [Pyramimonas melkonianii] MMETSP1106_DN7780_c0_g2_i1.p1 [Mantoniella antarctica]

1KP_RRSV_2003374.p1 [Pedinomonas minor]

MMETSP1445_DN18621_c2_g1_i1.p1 [Pyramimonas sp. CCMP2087] MMETSP1106_DN7780_c0_g3_i1.p1 [Mantoniella antarctica]

MMETSP0491_DN16650_c0_g1_i2.p1 [Tetraselmis chui] PRJNA342459-36897_SRR4293310_539.p1 [Pyramimonas tychotreta] AHH02137.1 [Tetraselmis chui] 1KP_VIAU_2064267.p1 [ crucifera]

1KP_DXNY_2004539.p1 [Microthamnion kuetzingianum] 1KP_GJIY_2000345.p1 [ sp. UTEX LB 1445]

1KP_GJIY_2000346.p1 [Neochloris sp. UTEX LB 1445]

1KP_FXHG_2014554.p1 [Hafniomonas1KP_AJUW_2011513.p1 reticulata] [Chloromonas rosae] 1KP_QPDY_2006865.p1 [ irregularis] 1KP_ZLBP_3016371.p1 [Chloromonas reticulata]

MMETSP0819_DN10510_c2_g1_i1.p1 [Tetraselmis striata] 1KP_FMVB_2004044.p1 [Scherffelia dubia] 1KP_TNAW_2078033.p1 [Pyramimonas parkeae] MMETSP0818_DN13137_c0_g1_i1.p1 [Tetraselmis striata] 1KP_HHXJ_2042658.p1 [Tetraselmis striata] AHH02112.1 [Pyramimonas parkeae]

AGF84747.1 PsChR2 [Tetraselmis subcordiformis] AER58217.1 CyChR1 [Chlamydomonas yellowstonensis] MMETSP0804_DN10817_c0_g1_i1.p1 [Tetraselmis astigmatica] MMETSP0491_DN8307_c0_g1_i1.p1 [Tetraselmis chui] 1KP_FMVB_2009447.p1 [Scherffelia dubia]

MMETSP0419_DN14107_c5_g1_i1.p1 [Tetraselmis sp. GSL 018] 1KP_AOUJ_2001741.p1 [Chlamydomonas sp. M2779] 1KP_PRIQ_2051706.p1 [Chlorococcum oleofaciens] MMETSP0419_DN13867_c1_g1_i1.p1 [Tetraselmis sp. GSL 018] MMETSP0419_DN13584_c9_g1_i1.p1 [Tetraselmis sp. GSL 018] 1KP_DUMA_2001575.p1 [Tetraselmis cordiformis]

1KP_RUIF_2054670.p1 [Carteria obtusa] MMETSP0491_DN16897_c0_g1_i1.p1 [Tetraselmis chui] QAXP01006359.1_gmes_251 [Characiochloris sp. AAM3] MMETSP0804_DN5417_c0_g1_i1.p1 [Tetraselmis astigmatica] QAXH01005832.1_gmes_12198 [Chloromonas sp. AAM2] 1KP_HHXJ_2002733.p1 [Tetraselmis striata]

MMETSP0491_DN14406_c0_g1_i1.p1 [Tetraselmis chui]

1KP_HVNO_2016747.p1 [Tetraselmis chui] 1KP_KFEB_2012309.p1 [Haematococcus pluvialis] 1KP_HHXJ_2002819.p1 [Tetraselmis striata] 1KP_FMVB_2001758.p2 [Scherffelia dubia]

MMETSP0819_DN7537_c0_g1_i1.p1 [Tetraselmis striata]

GAX82612.1 [Chlamydomonas eustigma] 1KP_MULF_2004916.p1 [Chlamydomonas bilatus] AHH02125.1 [Haematococcus droebakensis] 1KP_SRGS_3007306.p1 [Chlamydomonas bilatus] 1KP_DUMA_2008289.p1 [Tetraselmis cordiformis] 1KP_AJUW_2055939.p11KP_XOZZ_2023960.p1 [Chloromonas [Chlamydomonas rosae] sp.] 1KP_GDUD_3008238.p1 [Chloromonas reticulata] 1KP_GUBD_2050375.p1 [Brachiomonas submarina]

1KP_ENAU_2000997.p1 [Spermatozopsis similis] MMETSP1392_DN10989_c1_g1_i1.p1 [Chlamydomonas chlamydogama] 1KP_LNIL_2008704.p1 [Pteromonas angulosa] MMETSP0804_DN3453_c0_g1_i1.p3 [Tetraselmis astigmatica] 1KP_VALZ_2006701.p1 [Chlamydomonas noctigama] 1KP_QPDY_2004406.p1 [Coleochaete irregularis] 1KP_ZLQE_2008018.p1 [Stephanosphaera pluvialis] 1KP_VJDZ_2001928.p1 [Botryococcus sudeticus] 1KP_AOUJ_2010847.p1 [Chlamydomonas sp. M2779] 1KP_ZIVZ_2051645.p1 [Phacotus lenticularis] QAXP01000305.1_gmes_10456 [Characiochloris sp. AAM3] 1KP_ENAU_2000733.p2 [Spermatozopsis similis] QAXK01003763.1_gmes_8296 [ sp. KRBP]

1KP_ZNUM_2026218.p1 [Leptosira obovata] 1KP_RUIF_2054617.p1 [Carteria obtusa] 1KP_PRIQ_2002050.p1 [Chlorococcum oleofaciens] 1KP_ISGT_2001631.p1 [Uronema sp. CCAP 334/1] 1KP_AOUJ_2010869.p1 [Chlamydomonas sp. M2779] JGI_Chlpit_DN14232_c5_g1_i1.p1 [Chlamydomonas pitschmannii]

MMETSP1367_DN16944_c1_g1_i1.p2 [Symbiodinium1KP_JRGZ_2001873.p1 sp. C1] [Chlamydomonas moewusii]

1KP_MXDS_2001007.p1 [Spermatozopsis exsultans]

1KP_ACRY_2003180.p2 [Pteromonas sp. M2761] MMETSP0063_DN14184_c0_g2_i1.p1 [Chlamydomonas euryale] AHH02128.1 [Chlamydomonas bilatus] 1KP_JMUI_2054714.p1 [Stigeoclonium helveticum] 1KP_FOYQ_2036621.p1 [Microspora cf. tumidulaASW 06004 (M1292)]

JGI_Chlaci_DN14711_c0_g2_i5.p1 [Chlamydomonas acidophila]

NSMO01014337.1_gmes_23702 [Coelastrella sp. UTEX]

1KP_BAZF_2070338.p1 [Chaetopeltis orbicularis]

1KP_MULF_2007075.p1 [Chlamydomonas bilatus] 1KP_MXDS_2001008.p1 [Spermatozopsis exsultans]

1KP_VALZ_2008404.p1 [Chlamydomonas noctigama]

VIIQ02000124.1_gmes_4764 [Desmodesmus armatus]

1KP_VIAU_2008761.p1 [Carteria crucifera]

MMETSP1180_DN11300_c0_g1_i1.p1 [Chlamydomonas cf. sp. CCMP681] 1KP_JKKI_2057435.p1 [Lobomonas rostrata]

NEDT01002701.1_gmes_17554 [Tetradesmus obliquus]

1KP_SRGS_3010032.p2 [Chlamydomonas bilatus]

Fig. S2. Phylogenetic analysis of known and putative channelrhodopsins. The prasinophyte and viral ACRs form a well-supported clade not nested in any of the described families of ChRs. The ultrafast bootstrap support values are indicated by circles (70-100 range). The sequences and the phylogenetic tree are available in Suppl. File 3a,b.

[Fig. S3, alignment panel: multiple sequence alignment of TM2 and TM3 (positions 80-140). Sequence groups shown: Prasinophyte/viral ACRs, Putative prasinophyte CCRs, Chlorophycean CCRs, Chlorodendrophycean CCRs, Mesostigma CCR, MerMAIDs, Cryptophyte ACRs, Cryptophyte CCRs. The residue-level alignment is not recoverable from the extracted text.]

Fig. S3. Alignment of the transmembrane domains 2 and 3 (TM2 and TM3) of different channelrhodopsins. Representative sequences of the ChRs with known activities were obtained by clustering at the 60% identity level. The prasinophyte ChRs tested in this study are marked with red arrows; the activity of the other proteins from this group is putative. Green algal CCRs are subdivided into taxonomic groups (cf. Fig. S2). The location of the TMs and the position numbering correspond to the structure of CrChR2 (PDB: 6EID); the alignment positions are highlighted according to conservation level; the Asp motifs in green algal CCRs are marked with dashed green frames; the XCP motif in TM3 is marked with asterisks; the unique Arg>Gln substitution in TM3 in viral ACRs and Py2087ACR1 is indicated with a dashed lilac frame.

[Fig. S4, panels a and b: construct schematics (a: vPyACR_21821, RR-like, mCherry; b: vPyACR_21821, TS, eYFP, ER), confocal images, and fluorescence intensity profiles. Profile axes: fluorescence intensity (AU, 0.0-1.0) against distance (µm; 0-30 in a, 0-16 in b).]

Fig. S4. Membrane targeting of vPyACR_21821. Confocal images of ND7/23 cells after two days' expression of full-length (a) or membrane-targeted (b) vPyACR_21821. Fluorescence (left) of mCherry is shown in red and of eYFP in yellow. Fluorescence intensity profiles on the right were measured at the locations indicated by a thin white line in the fluorescence images.


[Fig. S5, trace panels: vPyACR_2164382 (470 nm light; [Cl-]ex 150 mM); vPyACR_21821 (470 nm light) and PymeACR1 (510 nm light), each shown for [Cl-]ex of 150 mM, 80 mM and 10 mM and for NMDG+, Br- and NO3- external buffers. Time scale bars: 0.1 s.]

Fig. S5. Photocurrent traces of vPyACR_2164382, vPyACR_21821, and PymeACR1. Photocurrents were induced with light of the indicated wavelengths (gray bars) and recorded at membrane potentials between -80 mV (lightest colored lines) and +40 mV (darkest colored lines) in steps of 20 mV. The external buffer contained high concentrations of the indicated ions (see Suppl. Table 1 for buffer compositions).

[Fig. S6, panels a-g: estimation plots and a comparison with the Nernst potential. Plotted quantities: λ_max (nm); -I_S (nA or pA) and Δ-I_S; E_rev and ΔE_rev (mV); [Cl-]ex (mM). Sample sizes recoverable for panels a and b: vPyACR_21821, n = 7 (a) and n = 21 (b); PymeACR1, n = 10 (a) and n = 17 (b).]

Fig. S6. Theoretical vs experimental reversal potential and estimation plots for electrophysiological data. (a) & (b) Estimation plots of λ_max for vPyACR_21821 and PymeACR1 (a) and stationary photocurrent amplitudes (I_S) at -60 mV (b). (c) Comparison of the theoretical Nernst potential for chloride (line) with the experimentally derived reversal potentials (E_rev) for vPyACR_21821 (purple) and PymeACR1 (green) at external chloride concentrations ([Cl-]ex) of 10 mM, 80 mM, and 150 mM. (d, e) Estimation plots of paired E_rev (top) and resulting ΔE_rev (bottom) upon exchange of the external buffer as indicated (see Suppl. Table S1 for buffer compositions) for PymeACR1 (d) and vPyACR_21821 (e). (f, g) Estimation plots of paired stationary photocurrent amplitudes at -60 mV (-I_S) and resulting shifts (Δ-I_S) upon exchange of the external buffer as indicated (see Suppl. Table S1 for buffer compositions) for PymeACR1 (f) and vPyACR_21821 (g). The estimation plots show the mean difference between test and control group. Both groups are plotted as pairs or as single data points (a and b) with the mean values (white dot) ± standard deviation (solid line). The mean difference is indicated by a solid dot, with each bootstrap sampling distribution indicated by a filled curve and the error bars indicating the 95% confidence interval. The number of biological replicates is indicated (n). Source data are provided in Suppl. File 5.
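The estimation plots described in this caption report a mean paired difference together with a bootstrap 95% confidence interval. The following is a minimal Python sketch of that idea, assuming a plain percentile bootstrap; the caption does not state which bootstrap variant or software was used, so this is an illustration and not the authors' procedure. The function name `paired_bootstrap_ci` and the example values are hypothetical and are not taken from Suppl. File 5.

```python
import random
import statistics


def paired_bootstrap_ci(control, test, n_resamples=5000, alpha=0.05, seed=0):
    """Return (mean paired difference, CI lower bound, CI upper bound).

    Assumption: percentile bootstrap over the paired differences.
    """
    if len(control) != len(test):
        raise ValueError("paired data must have equal length")
    # One difference per cell: value after buffer exchange minus value before.
    diffs = [t - c for c, t in zip(control, test)]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(diffs)
    # Resample the differences with replacement and record each resample mean.
    boot_means = sorted(
        statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_resamples)
    )
    lower = boot_means[int((alpha / 2) * n_resamples)]
    upper = boot_means[int((1 - alpha / 2) * n_resamples) - 1]
    return statistics.fmean(diffs), lower, upper


# Hypothetical reversal potentials (mV) for five cells, before and after
# lowering the external chloride concentration. Illustrative values only.
control_erev = [-4.0, -6.5, -5.2, -3.8, -7.1]
test_erev = [58.0, 61.5, 55.9, 60.2, 57.4]

mean_shift, ci_low, ci_high = paired_bootstrap_ci(control_erev, test_erev)
print(f"mean shift = {mean_shift:.1f} mV, 95% CI [{ci_low:.1f}, {ci_high:.1f}] mV")
```

Resampling the paired differences, rather than the two groups independently, keeps each cell's before and after values together, which is what a paired estimation plot displays.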

[Fig. S7, panels a-c: (a) normalized photocurrent against wavelength (400-600 nm), with λ_max = 508.6 ± 0.6 nm; (b) photocurrent traces under 510 nm light at [Cl-]ex of 150 mM and 10 mM, from -80 mV to +40 mV, time scale bar 0.1 s; (c) normalized photocurrent against holding potential E_hold (mV, -80 to +40), with an inset showing E_rev and ΔE_rev (mV) at [Cl-]ex of 150 mM and 10 mM.]

Fig. S7. Basic electrophysiological characterization of Py2087ACR1. a, Action spectrum normalized to the maximum photocurrent. Dots are single measurements (n = 9) fitted (solid line) to determine maximum activity wavelength (λ_max) indicated as mean ± SEM. b, Example current traces recorded with an external chloride concentration ([Cl-]ex) of 150 mM or 10 mM (internal [Cl-]: 120 mM) at membrane potentials between -80 mV and +40 mV in steps of 20 mV. Photocurrents were elicited with 510 nm light (gray bars). c, Current-voltage relationship of photocurrents at [Cl-]ex of 150 mM (solid line; n = 4) and 10 mM (dashed line; n = 4). Inset shows a paired estimation plot of the determined reversal potentials (E_rev) for the respective measurements (left) and the resulting reversal potential shifts (ΔE_rev; right). The mean is indicated by a solid dot with the error bars indicating the 95% confidence interval. The bootstrap sampling distribution is indicated by a filled curve. Source data are provided in Suppl. File 5.
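Reversal potentials measured at different external chloride concentrations are compared in these figures with the chloride Nernst potential. As a rough illustration, the sketch below evaluates the Nernst equation, E = (RT / zF) ln([Cl-]ex / [Cl-]in), for the three external concentrations used here. It assumes the internal [Cl-] of 120 mM given in the Fig. S7 caption, a temperature of 25 °C (not stated in this section), and concentrations in place of activities. The function name `nernst_mV` is hypothetical.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol


def nernst_mV(c_out_mM, c_in_mM, z=-1, temp_C=25.0):
    """Nernst potential in mV for an ion of valence z (z = -1 for chloride).

    Assumptions: ideal behaviour (concentrations used as activities) and
    a default temperature of 25 degrees C, which the source does not state.
    """
    temp_K = temp_C + 273.15
    return 1000.0 * (R * temp_K) / (z * F) * math.log(c_out_mM / c_in_mM)


# Internal chloride as stated in the Fig. S7 caption.
CL_IN_MM = 120.0

for cl_ex in (150.0, 80.0, 10.0):
    e_cl = nernst_mV(cl_ex, CL_IN_MM)
    print(f"[Cl-]ex = {cl_ex:5.0f} mM -> E_Cl = {e_cl:6.1f} mV")
```

For an anion, lowering the external concentration below the internal one moves the Nernst potential to positive values, which is the direction of the reversal potential shift expected for a chloride-conducting channel.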

[Figure S8, cladogram panel: species-level leaf labels omitted (see Suppl. File 4). Clade labels in the figure: Streptophyta (Mesostigmatophyceae, Chlorokybophyceae, Klebsormidiophyceae, Charophyceae, Coleochaetophyceae, Embryophyta, Zygnematophyceae); Chlorophyta, with the groupings "Prasinophytes" and Core chlorophytes and the labels Nephroselmidophyceae, Palmophyllophyceae, Pseudoscourfieldiales, Pyramimonadophyceae, Mamiellophyceae, Picocystophyceae, Chloropicophyceae, Pedinophyceae, Chlorodendrophyceae, Ulvophyceae, Trebouxiophyceae, and Chlorophyceae (Chaetopeltidales, Oedogoniales, Chaetophorales, Sphaeropleales, Chlamydomonadales).

Legend:
Channelrhodopsins: ACRs with confirmed activity; CCRs with confirmed activity; putative ACRs; putative CCRs.
Vegetative morphology: filamentous or (pseudo)parenchymatous thalli; cocci or colonies/clusters of non-motile cells; motile cells or their colonies (flagellates without eyespots; flagellates with eyespots).
Morphology of swarmers (zoospores or gametes): motile stages assumed to be absent; swarmers without eyespots; zoospores and/or motile gametes with eyespots.
Homologs of viral contig genes: 1, SAMEA2620979_21821_0060 (plastidic ATP/ADP-transporter); 2, SAMEA2620980_763909_0790; 3, SAMEA2623079_2164382_0180.
Gene set completeness: complete genes; fragmented; missing.
Data sources: genomes; transcriptomes.]

Figure S8. Distribution of confirmed and putative channelrhodopsins among green algae. The cladogram reflects the consensus topology of green algal phylogeny largely based on Leliaert et al. 17 and further refined based on 13,34. Available genome and transcriptome assemblies were merged together on the level of the species (see Suppl. File 4).


[Figure residue, panel a: domain architecture schematics on a scale of 1 to 1,396. Sequence groups and labels: Green algal His-kinase rhodopsins (CrHKR1 and CrCop8 [Chlamydomonas reinhardtii]); Prasinophyte/viral ACRs (vPyACR_21821; PymeACR1 [Pyramimonas melkonianii]); Putative prasinophyte CCRs (MMETSP1169_DN16214_c3_g1_i1.p1 and MMETSP1169_DN16301_c20_g1_i1.p1 [P. melkonianii]); Chlorophycean CCRs (CrChR2 [Chlamydomonas reinhardtii]); Putative labyrinthulean ChRs (GBG24568.1 [Hondaea fermentalgiana]); additional labels MerMAID1, GtACR1, GtCCR1. Domain labels: HK, RR, Cyc, SAM, RR-like, con1.

Panel b: response regulator web-logo and alignment (positions 10-110) of response regulator domains from CheY [Escherichia coli], Green algal His-kinase rhodopsins (CrHKR1 [Chlamydomonas reinhardtii], Vop5 [Volvox carteri f. nagariensis], Cop8 and Cop6 [Chlamydomonas reinhardtii], VcCop6 [Volvox carteri f. nagariensis]) and Prasinophyte/viral ACRs (vPyACR_2980695, vPyACR_21821, vPyACR_2164382, vPyACR_1099815, PymeACR1 [Pyramimonas melkonianii]). The residue-level alignment is not recoverable from the extracted text.]
P V I S Y ---- GRN I PKDEMDNRTAT G ----- I D D F L V A P Py2087ACR1 [Pyramimonas sp. CCMP2087] I L L V E Q DE - V F GY F F YNM L - V Q H S C - A V D M A K D A -- N E V M Q K TLTV G V ---- Q Y --- D M I M IN H D L ARANN Y - Q I M A D ---- IR K S L F M L P V I A Y ---- GRN I PKDDMDNRNAT G ----- I D D F L V A P 1KP_ISIM_2036828.p1 [Nephroselmis pyriformis] I L I L E D D D - V M V S L FHIK L - RH H K C - Q V D S C R S V -- D Q L TN L VEASSL ---- Q F --- D F I L GN YE L IRANN Y - E L M Y K ---- LR KT G S I L P V V A Y ---- S R I F A PEETA Q MMR S G ----- V D D Y I E A P 1KP_QXSZ_2010086.p1 [Mantoniella squamata] L L C SAH D D - Y V S V V LRTK L - A A I A M - Q VT V L GN T -- E D I M S L LDKEAD ---- R F --- A F V I V S L P M LRFCN G - E I A Q H A S V CARS PRK I L P L V T Y ---- T QQ INDEDFSF IRSLE ----- V D D F I D A P MMETSP0033_DN11041_c4_g1_i2.p1 [D. tenuilepis] I L M S P LND ---- G L L VNV T - AE VN A ------L V --- Q A A L YN L E D TSTEA -- T M T A A L --- R L R A ------A L Q AKTANKLE GKSD G - A R M S A D Q E P K A P Putative prasinophyte CCRs 1KP_XOAL_2007882.p2 [Dolichomastix tenuilepis] V V V L D GT G - A L T A F F QQQ L - T A L P V - N V M A A S S P -- A E A AN L A S Q H R A ------D F L L V ------PNN - AE L I A ---- Q V R N V T P S P L V A V ------TE P GT P A G M S G ----- T DDE V E L P 1KP_QXSZ_2011642.p1 [Mantoniella squamata] V I L G D P Q G - S L V P Y YTKA F - S K L P A - K LT V V D S M -- E S L LDE LETMK Q N G Q - R C --- D M C L V A PT L LEHEA ----- F D ---- S L RSE H A V K I V V Y V -- E G L A A I PEDSLA PFRER ----- C D D F I E A P MMETSP1169_DN16214_c3_g1_i1.p1 [P. melkonianii] F I I L D S S G - T I G A F FSEE F - K K V G A - E A D I VKT V -- T Q I P V L LEELEEDED - P Y --- E F V L L S A G D M A D P M ----- VN ---- A L KNE HN L P L V V F ------ATEA QQ DDEALDL ----- A D D I I P I P MMETSP1445_DN18621_c2_g1_i1.p1 [P. sp. 
CCMP2087] F L I L D T A G - T V S A F FSDE F - K K V G A - A AT T ASN L -- DT V V D L L M Q VDEDTD E P Y --- E F L I L G V S D MN GH Q ----- V A ---- A I K D Q H K V P L V V F ------ATDS QQ D D PNLDE ----- A D D I I PT P 1KP_XOAL_2042306.p1 [Dolichomastix tenuilepis] V L I C DST G - T L GN F YAEE L - T K L G C - A TT I CTR T -- D D A L Q A V Q RMR L ------N M V L A E D G F F D P A A ----- C G ---- R I R A S F G L P I V G F ------G PR G Q A S PSN G ----- L D D F V T S P 1KP_QXSZ_2010783.p1 [Mantoniella squamata] V I V L DARNSE F A A F F G E K L ESE F G L - R V I F AAN V -- A D L Q SN MLAH G G ---- A V --- A G V F FAET T M D PET ----- V A ---- S M R A S W R K P F I A Y ------GMRNL Q A S G L P ----- A D D Y L K V P MMETSP1169_DN16301_c20_g1_i1.p1 [P. melkonianii] V A V A D H M G F QQ G Q F FVAK M E Q E M G A - V V Q A CRSK -- Q E L L Q T L Q F A QQ S GR - P T --- D M V L A I D G L LT P S D ----- A I ---- S LH QQ F K T V V V Q F ------GTAMA P QQ S F ----- G E G Y L Q T P MMETSP1445_DN18216_c0_g2_i1.p1 [P. sp. CCMP2087] V V I L D K M G F Q A S M L L SNK L EAE M G A - S C V T VSTK -- P E V FE Q M QQ A M A G G L - A V --- D M V L A H D G M I TAND ----- A I ---- M LH QQ YR V A V V Q F ------G PNMS P QQ A F ----- G E G Y VKT P Chlorophycean CCRs BsChR2 [Brachiomonas submarina] V I V A I V D A - Q M L P A F Q ST F - A S L P A - A IE M V P A L G L E A T L Q L AH Q A L M M G G -- L --- D C V L AH P D L LKDTS P - T G L Q M ---- R L R S - A G I P V I A C ---- G W V V G G PYNE I I ESA G ----- V D G W VE G P CsChR [Chloromonas subdivisa] I I L C V P D M - S L V D F FRE Q F - S Q M P V - P FE V V P A L G P E V A L Q L V QQ A L S I G G -- AN Y I D Y V M VH PE F LRDRS P - G G L I G ---- R L K M - S G MR V C A F ---- G W V P S G A PREL IEAC G --- P M L D G F VE G P NsChR [Neochlorosarcina sp.] 
V I L A V P D V - S M I D F FRE Q F - A G L P S - P VE L I P A L G ADT A V Q I V QQ A A A L G G -- C --- D F V L VH PE F LRDLS G - N G L I A ---- R L R S - LN Q R V C A F ---- G WSKA G P IKDL I EAAA ----- L D G F VE G P CrChR2 [Chlamydomonas reinhardtii] V I L A V P D I - S M V D F FRE Q F - A Q L S V - T YE L V P A L G ADN T L A L VT Q A Q N L G G -- V --- D F V L IH PE F LRDRSS - T S I L S ---- R L R G - A G Q R V A A F ---- G W A Q L G PMRDL IESAN ----- L D G W LE G P VChR2 [Volvox carteri f. nagariensis] V I L A V P D I - S M V D Y FRE Q F - A Q L P V - Q YE V V P A L G ADN A V Q L V V Q A A G L G G -- C --- D F V L LH PE F LRDKSS - T S L P A ---- R L R S - I G Q R V A A F ---- G W S P V G PVRDL I ESA G ----- L D G W LE G P CrChR1 [Chlamydomonas reinhardtii] V I L A V P D I - S M V D F FRE Q F - AR L P V - P YE L V P A L G AEN T L Q L V QQ A Q S L G G -- C --- D F V L MH PE F LRDRS P - T G L L P ---- R L K M - G G Q R A A A F ---- G W A A I G PMRDL IE G S G ----- V D G W LE G P VChR1 [Volvox carteri f. 
nagariensis] V I L A V P D I - S M V D F FRE Q F - A Q L P V - P YE V V P A L G AEN T V Q L V QQ A A M L G G -- C --- D F V L MH PE F LRDR G P - T G L L P ---- Q V K M - M G Q R T A A F ---- G W S Q M G PMRDL I ESS G ----- V G A W LE G P Chrimson [Chlamydomonas noctigama] V I L A DT D I - F VTE M F K A Q F - A Q L P A - A IE L I P A L G ADN A L Q L V QQ A S V L G G -- C --- D F V M VH P Q F LKDNS P - S G L V A ---- R L R M - M G Q R V V A F ---- G P A --- NLREL I ESCD ----- V D A W I E A P Chlorodendrophycean CCRs SdChR [Scherffelia dubia] V L V AAT D P - G T L G F I Q E Q M - A S L P T - H VT V V T A L G PE E L Q E T LKSR GH ---- Q V --- D G V L L A PE F LSST P --- E L V A ---- T L K Q R L L L P V Y A F ---- G W Q R F G PLKEL I ERS G ----- V D G W LE G P TcChR [Tetraselmis cordiformis] I I I AVT DE - S I L S F F QQ E V - N K L G A - T VR I ISC M G M QQ LDE A L Q K G A ------D G L F L S P D Y L QQ P G ---- I V Q ---- H I K T K Y K I G V Y A Y ---- G W GKTS PARSV I ESS G ----- V D G W LE G P PsChR2 [Tetraselmis subcordiformis] I V L A V A D P - M TLT F FT QQ L - S Q L D A - T IR A T P A M G Q G Q LE Q V L E K G G F ------D G V L V S PE Y I QQ V G ---- L V Q ---- R L K D K YH M P V Y A F ---- G W GKSS PWRSV IE G S G ----- V D G W LE G P Mesostigma CCR MvChR1 [Mesostigma viride] V L F L DYT G G G Y V S F FE QQ L - SN M G V - N V T K CWSD -- D D M YN T A GVANVK --- Q L F -- H F A M I PNN A L G ----- G Q M V M ---- D L R G - T G L L V V A Y ------G PE P P M P G M G ----- Q DE F V P L Q Putative labyrinthulean ChRs GBG23965.1 [Hondaea fermentalgiana] I L I I E P A V - S F Q R L FMYM L - K D VN V - D AE M A L D L -- NH A K S I LRR GRVD --- D F --- E T I L VN LT T HFDR P -- N A LKE F K K IVSEE P F Q V P V L G YT F D ------DNCLAM LEKEDKRNE L TH G I VHH V estExt_fgenesh1_pg.C_3_t10474 [Schiz. 
aggregatum] I L V V ER E P - H I Q R L IEYL T - Q D AN V - N VE F AFDN -- D Q ARR I LDRCRLD --- Q F --- S C V F IN L S D ALNMD P - HE MEE A K V HFSAE P FC M P V V G YT F DYSDT V Q DHE IAL ------F CHAR VAH M GBG24568.1 [Hondaea fermentalgiana] L L I V E P K I - E L Q R L FMF I L - R D A G I - E GE I A F D L -- E S A T K I IRRDSL T --- H Y --- D A I L LN LT V AFE Q R -- I A T Q K F R M Q FR R K P FY L P L L G YC F D ------DNYA P L AEREEVE RN L S D G F VRH V MMETSP1346_DN5182_c0_g3_i1.p1 [Aplan. stocchinoi] V L L L DKS I - L V H K T VVMM M - ER V G V - E I S I A F D F -- Q H A K D H LRE Q P S Y --- L Y --- D A V F V V P D H YTTKEE KEN L K A FSE F I C S S P F K I P M L G LY F ------H N K G K LKN P G --- E Y VH G I IER P 435 Fig. S9. Domain organization in different channelrhodopsin families, compared to the related family of green algal His-kinase rhodopsins (HKRs). (a) Domains in the intracellular C-terminal extensions in representative ChRs and HKRs. Prasinophyte and chlorophycean CCRs belong to green algal CCRs. The inset shows the three ChR families without C-terminal extensions. Gray boxes indicate rhodopsin TM domains. HK — His-kinase 440 domains, RR(-like) — response regulator(-like) domains, Cyc — nucleotide cyclase domains, SAM — sterile alpha motif domain, con1 and 3 — conserved regions 1 and 3 first described CCRs from Chlamydomonas and Volvox 22. (b) Response regulator-like domains in three families of channelrhodopsins compared to their functional homologs: CheY and response regulator domains from green algal His-kinase rhodopsins. Green algal CCRs are 445 subdivided into taxonomic groups. The numeration corresponds to residue positions in bioRxiv preprint doi: https://doi.org/10.1101/2020.04.15.042127; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. 
It is made available under aCC-BY-NC-ND 4.0 International license. 23

CheY; above the alignment is a web-logo for the seed alignment of the Pfam profile Response_reg; response regulator active sites are indicated with symbols (as summarized in NCBI CDD: cd00156): * — phosphorylation site, v — divalent cation binding sites, # — dimerization interface, ! — other active sites. bioRxiv preprint doi: https://doi.org/10.1101/2020.04.15.042127; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. 24
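The CheY-based numbering used in panel (b) can be illustrated with a minimal sketch: alignment columns are numbered by counting only the non-gap residues of the reference sequence. The function names and the toy sequences below are hypothetical and are not taken from Fig. S9 or from the authors' pipeline.

```python
def reference_numbering(aligned_ref, gap="-"):
    """For each alignment column, return the 1-based residue number of the
    reference sequence in that column, or None where the reference has a gap."""
    numbering = []
    position = 0
    for symbol in aligned_ref:
        if symbol == gap:
            numbering.append(None)
        else:
            position += 1
            numbering.append(position)
    return numbering


def residue_at(aligned_seq, numbering, ref_position):
    """Return the symbol of another aligned sequence in the column that
    holds the given reference residue number."""
    column = numbering.index(ref_position)
    return aligned_seq[column]


# Hypothetical toy alignment of equal length (illustration only).
toy_reference = "MK-LVD--DS"
toy_homolog = "MRALIENNDA"

numbering = reference_numbering(toy_reference)
print(numbering)
print(residue_at(toy_homolog, numbering, 5))
```

With this convention, columns where the reference has a gap carry no number, so insertions in other sequences do not shift the reference coordinates.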

[Figure S10: TEM micrograph in two panels with structure labels sc, ch, m, t and st; scale bars: 2 μm and 1 μm.]

Figure S10. TEM micrograph of a Pyramimonas pseudoparkeae cell infected with a giant virus. Upper panel: general overview of viral factories and surrounding cellular structures; lower panel: closeup of a viral factory. ch: chloroplast; m: mitochondria; t: trichocyst; st: starch grains; sc: scales. The diameter of the mature viral particles is ca. 199 nm. Courtesy of Richard Pienaar.

[Figure S11: bar plots of coverage (reads/residue) per gene (ORF5 to ORF9) in two panels, DNA and Protein, for contigs SAMEA2620979_14769512 (w/o ChR) and SAMEA2620979_21821 (with ChR), shown above a gene map of ORF5 to ORF9 with ORF8 marked as the ChR; map scale 1 to 1000 bp; DNA identity shading from 50% to 100%.]

Figure S11. Abundance of the two variants of the genomic location around the channelrhodopsin gene at station TARA_067.

[Figure S12: phylogenetic tree of D5-like helicase-primase proteins from Mimiviridae, Phycodnaviridae and metagenomic contigs. Legend: metagenomic contigs (v2164382 cluster, v21821 cluster, contigs with ChRs); Phycodnaviridae clades (Raphidovirus); Mimiviridae clades (Klosneuvirinae, Megavirinae, Mesomimivirinae, PPVC); intein inserts (intein-containing proteins). Tree scale: 1. Ultrafast bootstrap support: 70 to 100.]

Figure S12. D5-like helicase-primase phylogeny. Proteins with intein inserts are indicated with red squares.

[Figure S13. Panel (a): ordination plot with axes CA1 and CA2. Legend: metagenomic contigs (vACR_21821 cluster, vACR_2164382 cluster); Phycodnaviridae clades (Chlorovirus, Coccolithovirus, Medusa-Molli-Pandoravirus, Phaeovirus, Prasinovirus, Raphidovirus); Mimiviridae clades (Klosneuvirinae, Megavirinae, Mesomimivirinae, predatory protist viral clade). Panel (b): clustered heatmap of viral genomes and metagenomic contigs; color scale: proportion of genes shared, 0 to 100%.]

Fig. S13. Gene sharing analyses for the members of the Phycodna- and Mimiviridae and the metagenomic contigs. (a) Ordination of long metagenomic contigs and genomes of related viruses in the canonical axes of correspondence analysis based on orthogroup presence/absence. The genomes of Coccolithovirus, Medusavirus, Mollivirus and Pandoravirus appeared as outliers with respect to the core Phycodnaviridae and the Mimiviridae and were omitted in the second round of this analysis as presented here. (b) Heatmap and clustering of long metagenomic contigs and viruses from the Phycodnaviridae and Mimiviridae families based on orthogroup sharing. Color intensity reflects the proportion of genes shared between a genome pair (relative to the genome in the row).
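The row-relative sharing measure described for panel (b) can be sketched as follows. This is a minimal illustration, assuming that each genome is represented by its set of orthogroup identifiers and that sharing is counted per orthogroup rather than per gene copy; the function name, genome names and orthogroup IDs are hypothetical.

```python
def sharing_matrix(orthogroups):
    """orthogroups: dict mapping genome name -> set of orthogroup IDs.

    Returns (names, matrix), where matrix[(row, col)] is the fraction of the
    row genome's orthogroups that are also present in the column genome.
    """
    names = sorted(orthogroups)
    matrix = {}
    for row in names:
        row_set = orthogroups[row]
        for col in names:
            shared = len(row_set & orthogroups[col])
            # Normalize by the row genome, so the matrix is not symmetric.
            matrix[(row, col)] = shared / len(row_set) if row_set else 0.0
    return names, matrix


# Hypothetical toy data (illustration only).
toy = {
    "genome_A": {"OG1", "OG2", "OG3", "OG4"},
    "contig_B": {"OG1", "OG2"},
}

names, matrix = sharing_matrix(toy)
for row in names:
    print(row, [matrix[(row, col)] for col in names])
```

Because the proportion is taken relative to the genome in the row, a short contig can share all of its orthogroups with a large genome while the large genome shares only a fraction of its own with the contig.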

[Figure S14: phylogenetic tree of Phycodnaviridae, Mimiviridae and metagenomic contigs, annotated with the distribution of the 12 marker genes in the alignment, rhodopsin gene counts, virion morphology, host group, completeness and genome size (0 to 2500 Kbp). Legend: metagenomic contigs (v21821 cluster, v2164382 cluster); Phycodnaviridae clades (Chlorovirus, Coccolithovirus, Medusa-Molli-Pandoravirus, Phaeovirus, Prasinovirus, Raphidovirus); Mimiviridae clades (Klosneuvirinae, Megavirinae, Mesomimivirinae, predatory protist viral clade); rhodopsin genes (including non-ChR type 1 rhodopsins and heliorhodopsins); virion morphology (icosahedral, amphora-shaped, spherical); host group (including amoebae, heterotrophic flagellates and green algae); completeness (full-representation MAGs, partial genome). Tree scale: 0.1. Ultrafast bootstrap support: 70 to 100.]

Fig. S14. Phylogenetic relationships among Mimi- and Phycodnaviridae, and the lineages including ChR-containing viruses, showing the distribution of rhodopsin genes, virion morphology, known host groups and genome sizes. The relationships between the long metagenomic contigs and their shorter relatives containing ChRs are shown in Fig. 1c. The phylogenetic reconstruction is based on 12 near-universal genes; their names in Mimivirus (with functions in parentheses) and their distribution in the alignment are depicted. The protein sequences for each of the 12 genes are available in Suppl. File 6a,b.

[Figure S15: rbcL tree of Pyramimonas strains with Cymbomonas sequences as the outgroup. Labeled clades: Trichocystis, Vestigifera, Punctatae and Pyramimonas; bracket labels: "subgenus II sensu Daugbjerg et al 2019" and "subgenus 1 sensu Suda et al 2013". Tree scale: 0.01.]

Fig. S15. Nucleotide chloroplast rbcL phylogeny of Pyramimonas. Strains appearing in Fig. 1c are indicated in green and the host of PoV-01B is indicated in bold. The clade names (subgenera) follow the previous studies 34–36. Dots indicate ultrafast bootstrap support values (70-100 range); the tree is rooted using rbcL sequences from Cymbomonas. The rbcL sequence alignment is available in Suppl. File 7.


Supplementary Tables

Supplementary Table S1. Composition of intra- and extracellular buffers for electrophysiological experiments. All concentrations are given in mM, LJPs are listed in mV. Asp, aspartate; EGTA, ethylene glycol tetraacetic acid; HEPES, 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid; LJP, liquid junction potential; NMDG, N-Methyl-D-glucamine.

Solution          NaCl  KCl  MgCl2  CaCl2  CsCl  NaAsp  NMDG  HCl  NaBr  NaNO3  HEPES  EGTA  LJP
Intra   Std.      110   1    2      2      1     -      -     -    -     -      10     10    -
Extra   150 Cl-   140   1    2      2      1     -      -     -    -     -      10     -     0.6
        80 Cl-    70    1    2      2      1     70     -     -    -     -      10     -
        10 Cl-    -     1    2      2      1     140    -     -    -     -      10     -     -12.6
        NMDG+     1     1    2      2      1     -      140   140  -     -      10     -     6.3
        Br-       -     1    2      2      1     -      -     -    140   -      10     -     1.0
        NO3-      -     1    2      2      1     -      -     -    -     140    10     -     -0.3
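As a worked check of Supplementary Table S1, the total chloride in each buffer can be summed from the listed salts (counting two Cl- per MgCl2 and CaCl2), and a Nernst potential for Cl- can be estimated against the intracellular standard solution. This is a minimal sketch: the temperature of 25 °C is an assumption not stated in the table, concentrations are used in place of activities, any chloride added during pH adjustment is ignored, and no LJP correction is applied. The dictionary keys and function names are hypothetical.

```python
import math

# mM of each chloride-containing component, transcribed from Table S1.
# Tuple order: NaCl, KCl, MgCl2, CaCl2, CsCl, HCl ("-" entries taken as 0).
CHLORIDE_SALTS = {
    "intra_std": (110, 1, 2, 2, 1, 0),
    "extra_150Cl": (140, 1, 2, 2, 1, 0),
    "extra_80Cl": (70, 1, 2, 2, 1, 0),
    "extra_10Cl": (0, 1, 2, 2, 1, 0),
    "extra_NMDG": (1, 1, 2, 2, 1, 140),
}
# Number of Cl- ions released per formula unit, same order as above.
CL_PER_FORMULA = (1, 1, 2, 2, 1, 1)


def total_chloride(salts):
    """Total Cl- concentration (mM) contributed by the listed components."""
    return sum(conc * n for conc, n in zip(salts, CL_PER_FORMULA))


def nernst_mv(conc_out, conc_in, valence, temperature_c=25.0):
    """Nernst potential in mV; temperature_c is an assumed value."""
    gas_constant = 8.314462618   # J mol^-1 K^-1
    faraday = 96485.33212        # C mol^-1
    kelvin = temperature_c + 273.15
    return 1000.0 * gas_constant * kelvin / (valence * faraday) * math.log(conc_out / conc_in)


cl_in = total_chloride(CHLORIDE_SALTS["intra_std"])
for name in ("extra_150Cl", "extra_80Cl", "extra_10Cl"):
    cl_out = total_chloride(CHLORIDE_SALTS[name])
    print(name, cl_out, round(nernst_mv(cl_out, cl_in, valence=-1), 1))
```

The summed chloride for the three chloride-substitution buffers comes out at 150, 80 and 10 mM, matching the row labels, and the intracellular solution sums to 120 mM.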


List of supplementary data files

Suppl. File 1. Annotated metagenomic contigs analyzed in this study including the contigs containing ChR genes, file in Genbank format.

Suppl. File 2. Annotated genome draft of Pyramimonas orientalis virus PoV-01B, file in Genbank format.

Suppl. File 3a. Database of collected ChR sequences, xlsx file: non-redundant protein sequences, confirmed activities, representative sequences after 98% identity clustering, representative sequences after 100% identity clustering of trimmed rhodopsin domains.

Suppl. File 3b. Final dataset of rhodopsin domains and the phylogenetic tree, nexus file.

Suppl. File 4. List of analyzed green algal transcriptomes and viral genomes, xlsx file. Sheet A: Green algal species analyzed for the presence of ACRs and CCRs: taxonomy and data sources (strains: accessions). References are provided for data sources other than MMETSP 10 and 1KP 11. Original species identifications are provided in parentheses when different. Sheet B: Viral genomes used for clarification of the origin of the metagenomic contigs containing ChR genes.

Suppl. File 5. Source data from the electrophysiological experiments used in Figs. 2, S6 and S7, xlsx file.

Suppl. File 6a. The twelve genes used for phylogenetic analysis of the Mimiviridae and Phycodnaviridae, xlsx file: L250 (TFIIb), L312 (RNR2), L396 (helicase), L437 (A32), R313 (RNR1), R322 (PolB), R325 (WLM), R339 (TFIIS), R409, R429 (VLTF3), R468, R480 (TopoIIA).

Suppl. File 6b. Phylogenetic tree of the Mimiviridae and Phycodnaviridae and the metagenomic contigs, nexus file.

Suppl. File 7. Alignment of rbcL gene sequences from Pyramimonas and Cymbomonas species and the corresponding phylogenetic tree, nexus file.

Submitted to Genbank - Constructs.gbk. Mammalian codon-optimized constructs of prasinophyte and viral ACR coding sequences used for expression (Genbank accession numbers MT353681-MT353684).