Supplementary Information for

Diel transcriptional oscillations of light-sensitive regulatory elements in open ocean eukaryotic plankton communities

Sacha N. Coesel1*, Bryndan P. Durham2, Ryan D. Groussman1, Sarah K. Hu3, David A. Caron4, Rhonda L. Morales1, François Ribalet1, and E. Virginia Armbrust1.

1School of Oceanography, University of Washington, Seattle, Washington 98195, USA.
2Department of Biology, Genetics Institute, University of Florida, Gainesville, Florida 32610, USA.
3Marine Chemistry & Geochemistry, Woods Hole Oceanographic Institution, Woods Hole, MA 02543, USA.
4Department of Biological Sciences, University of Southern California, 3616 Trousdale Parkway, Los Angeles, CA 90089-0371, USA.

*Sacha N. Coesel Email: [email protected]

This PDF file includes:

Figures S1 to S10
Table S1
Legends for Datasets S1 to S7

Other supplementary materials for this manuscript include the following:

Datasets S1 to S7


[Figure S1: tree image not reproduced. Taxonomy color key: Archaea, Bacteria, Opisthokonta, Alveolata, Cryptophyta, Haptophyta, Stramenopiles, Rhodophyta, Glaucophyta, Viruses, Other. Functional annotations shown include enzyme rhodopsin, channel rhodopsin (phototaxis), sensory rhodopsin, viral rhodopsin, proteorhodopsin (H+ pump), Na+ and Chl- pump rhodopsins, and heliorhodopsin (sensory).]

Fig. S1. Marine-relevant reference tree for rhodopsin. A mid-point rooted approximate maximum likelihood tree (Fasttree) of 933 reference sequences with known taxonomy. Local support values (Shimodaira-Hasegawa test, 1,000 resamples) of > 0.85 are indicated. The edges are colored according to taxonomy. The main functional clades (A to E) are grey-shaded. The red circles indicate the JGI- and NCBI-derived sequences used to build the HMM-profile; all others were obtained by hmmsearches on marine-relevant genomes and transcriptomes. Functional annotations (black text) are based on the NCBI-derived reference sequences, and protein domain annotations (blue text) are derived from cd-hit blast (NCBI) domains of the full-length sequences. The abbreviations are as follows: CYCc, Adenylyl-/guanylyl cyclase, catalytic domain; SMC_N, structural maintenance chromosomes; PDE, rhodopsin phosphodiesterase; Guan cyc, rhodopsin guanylate cyclase.


[Figure S2: tree image not reproduced. Taxonomy color key: Archaea, Bacteria, Opisthokonta, Rhizaria, Alveolata, Cryptophyta, Haptophyta, Stramenopiles, Chlorophyta / Streptophyta, Rhodophyta, Glaucophyta, Viruses, Other. Functional annotations shown include 6-4/dual-function CPF, 6-4 photolyase, animal cryptochrome, bacterial cryptochrome, Cry-DASH-like, Cry-DASH, class I-III CPD photolyase, cryptochrome-like, plant cryptochrome, class II CPD-like, and class II CPD.]

Fig. S2. Marine-relevant reference tree for cryptochrome/photolyase. A mid-point rooted approximate maximum likelihood tree (Fasttree) of 1135 reference sequences with known taxonomy. Local support values (Shimodaira-Hasegawa test, 1,000 resamples) of > 0.85 are indicated. The edges are colored according to taxonomy. The main functional clades (A to D2) are grey-shaded. The red circles indicate the JGI- and NCBI-derived sequences used to build the HMM-profile; all others were obtained by hmmsearches on marine-relevant genomes and transcriptomes. Functional annotations (black text) are based on the NCBI-derived reference sequences.


[Figure S3: tree image not reproduced. Taxonomy color key: Archaea, Bacteria, Opisthokonta, Rhizaria, Alveolata, Cryptophyta, Haptophyta, Stramenopiles, Chlorophyta / Streptophyta, Rhodophyta, Glaucophyta, Viruses, Other. Functional annotations shown include LOV-PAS (bacterial), LOV-PAS-kinase (bacterial), Zeitlupe/Adagio, Helmchrome, Aureochrome, Phototropin/Neochrome, and K-voltage gated channel (animalia). Protein domain labels shown include GTP cyclohydrolase II, pkinase, REC, kelch-repeat F-box, neuralized, RGS, HLH, EAL and GGDEF, Homeobox, bZIP, HSF, globin-like, and SAM and RRM.]

Fig. S3. Marine-relevant reference tree for LOV-domain proteins. An unrooted approximate maximum likelihood tree (Fasttree) of 2509 reference sequences with known taxonomy. Local support values (Shimodaira-Hasegawa test, 1,000 resamples) of > 0.85 are indicated. The edges are colored according to taxonomy. The main functional clades (A to E) are grey-shaded. The red circles indicate the JGI- and NCBI-derived sequences used to build the HMM-profile; all others were obtained by hmmsearches on marine-relevant genomes and transcriptomes. Functional annotations (black text) are based on the NCBI-derived reference sequences, and protein domain annotations (blue text) are derived from cd-hit blast (NCBI) domains of the full-length sequences. The abbreviations are as follows: REC, signal receiver domain; RGS, regulator of G-protein signaling; HLH, helix-loop-helix domain; bZIP, basic leucine zipper domain; HSF, Heat shock factor; SAM, S-adenosylmethionine-dependent methyltransferases; RRM, RNA recognition motif.


[Figure S4: tree image not reproduced. Taxonomy color key: Bacteria, Opisthokonta, Cryptophyta, Stramenopiles, Chlorophyta / Streptophyta, Glaucophyta, Viruses, Other.]

Fig. S4. Marine-relevant reference tree for phytochrome. A midpoint-rooted approximate maximum likelihood tree (Fasttree) of 301 reference sequences with known taxonomy. Local support values (Shimodaira-Hasegawa test, 1,000 resamples) of > 0.85 are indicated. The edges are colored according to taxonomy.


[Figure S5: jitter box plots not reproduced. Panel A, n = 850 (92%); panel B, n = 140 (100%); panel C, n = 1033 (89%); panel D, n = 68 (100%). Numeric axis: pendant_length (0.0 to 1.2). Category axis: predicted taxonomic order, with 'NA' for unplaced reads. Genome color key: Chlre4, Emihu1, Phatr2, THAOC, Thaps3.]

Fig. S5. Phylogenetic placement analysis of synthetic metatranscriptome reads with known taxonomy. The accuracy of the phylogenetic placement was assessed using a synthetic mutated metatranscriptome dataset generated from genome-derived gene models of Chlamydomonas reinhardtii (Chlre4; Chlorophyceae-Chlamydomonadales), Emiliania huxleyi (Emihu1; Prymnesiophyceae-Isochrysidales), Phaeodactylum tricornutum (Phatr2; Bacillariophyceae-Naviculales), Thalassiosira oceanica (THAOC) and Thalassiosira pseudonana (Thaps3; Bacillariophyceae-Thalassiosirales) (see Methods). Phylogenetic placements of HMM-identified short reads ('synthetic transcripts') with homology to the LOV-domain, rhodopsin, cryptochrome/photolyase, and phytochrome are shown in panels A to D, respectively. The number of synthetic short reads identified by the hmmsearch (E < 0.001) is indicated by (n). The taxonomic placement of each synthetic short read is indicated by a dot in the jitter box plots. The dots are color-coded by their genome of origin (see boxed color key). The Y-axis categories represent the pplacer-predicted taxonomy at the phylogenetic order level. Reads that could not be placed at the order level are indicated with 'NA'. The X-axis represents the predicted maximum likelihood pendant branch length, i.e. the branch length for the placement edge of the short reads. A pendant length cut-off of 0.7 was used throughout this work. The percentage of correctly placed short reads with a pendant length < 0.7 is indicated in parentheses.
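The accuracy figure reported per panel can be illustrated with a minimal sketch. It assumes each synthetic read carries its true order, a predicted order (or none, for 'NA'), and a pendant length, and it assumes the percentage is computed among reads with a pendant length below the cut-off. The `Placement` record, the `placement_accuracy` function, and the example values are hypothetical and are not taken from the authors' pipeline or from pplacer output.

```python
from typing import List, NamedTuple, Optional, Tuple


class Placement(NamedTuple):
    """Hypothetical record for one placed synthetic read."""
    read_id: str
    true_order: str                 # order of the genome the read came from
    predicted_order: Optional[str]  # None stands for 'NA' (no order-level placement)
    pendant_length: float           # pendant branch length of the placement


def placement_accuracy(placements: List[Placement],
                       cutoff: float = 0.7) -> Tuple[int, float]:
    """Return (number of reads kept, percent correctly placed among them).

    Assumption: reads with pendant_length >= cutoff are discarded before
    accuracy is computed; the caption does not spell out the denominator.
    """
    kept = [p for p in placements if p.pendant_length < cutoff]
    if not kept:
        return 0, float("nan")
    correct = sum(1 for p in kept if p.predicted_order == p.true_order)
    return len(kept), 100.0 * correct / len(kept)


# Made-up example values, for illustration only.
example = [
    Placement("r1", "Naviculales", "Naviculales", 0.12),
    Placement("r2", "Thalassiosirales", "Thalassiosirales", 0.35),
    Placement("r3", "Isochrysidales", None, 0.50),
    Placement("r4", "Chlamydomonadales", "Mamiellales", 0.95),
]
n_kept, pct_correct = placement_accuracy(example)
print(n_kept, round(pct_correct, 1))
```

In this toy input, one read exceeds the cut-off and is dropped, and an 'NA' placement counts as incorrect; whether the original analysis treated 'NA' that way is also an assumption.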


[Figure S6: circular tree image not reproduced. Annotation strips: Taxonomy, Motility, Trophic mode. Trophic mode key: Heterotrophic, Phototrophic, Mixotrophic, Photo/Mixotrophic, Unknown.]

Fig. S6. Trophic mode and motility of different groups of marine protists. An 18S rRNA maximum likelihood phylogenetic tree, representing 117 different eukaryotic orders relevant for the marine environment, was used to visualize trophic mode, motility and taxonomy of the represented protists. For illustrative purposes, one representative per order was retained to represent all species within that order. The inner colored strip indicates the different trophic modes. The black checkmarks and red stars indicate the presence of flagella or cilia, respectively, indicative of motility of mature cells. The taxonomic phylum and class-level classifications are indicated by the colored ranges and outer colored strip.


[Figure S7: heat map not reproduced. Color key: reads L⁻¹. Rows: Dinophyceae orders (Gonyaulacales, Peridiniales, Suessiales, Prorocentrales, Gymnodiniales) with summed reads L⁻¹. Columns: sampling times 06, 10, 14, 18, 22, 02, repeated over four days.]

Fig. S7. LOV-neuralized transcript abundance over the four-day sampling period. Heat map representing color-coded transcript levels (reads L⁻¹; mean of biological duplicates) of transcripts mapped to LOV-neuralized over the four-day sampling period. Indicated are the number of reads and the assigned taxonomy up to the phylogenetic order level. Grey blocks indicate sampling points for which transcripts were not detected. Significant diel periodicity (RAIN, p-value 0.001) was not established for transcripts in these categories.


Fig. S8 - part 1

[Figure S8, part 1: image not reproduced. Phylogenetic tree of reference and environmental (TRINITY-assembled) sequences with aligned sequence motifs beside each tip; Consensus (50%): GrNCRFLQG. Protein domain key includes pKinase, homeodomain, EF hand, bZIP, HSF, and RGS. Taxonomy strip: Pavlovales, Coccolithales, Isochrysidales, Phaeocystales, Prymnesiales, Zygodiscales, Bacillariophyceae, Chrysophyceae, Dictyochophyceae, Pelagophyceae, Chlorophyta, Dinophyceae, Environmental.]

Fig. S8 - part 2

[Fig. S8, part 2 (figure image): maximum likelihood tree whose leaves are labelled with PAS-domain seed sequences (PF00989), LOV-domain reference sequences (hmm-LOV), MMETSP reference identifiers and environmental contig identifiers, shown next to the aligned light-sensitive motif of each sequence (50% consensus: GrNCRFLQG). Legend: colored ranges for protein domains; branch style (PF00989, hmm-LOV, MMETSP and Env); taxonomy strip. Tree scale: 0.1.]

Fig. S8. Phylogenetic and domain analysis of LOV-containing environmental contigs with sequence identifiers and untrimmed light-sensitive motif alignment. Maximum likelihood tree (RAxML) of two subsets (part 1 and part 2) of environmental LOV-domain contigs that are transcribed with a diel rhythmicity and their closest MMETSP-derived homologues (black edges), PAS-domain (PF00989 seed sequences; grey edges) and LOV-domain (hmm-LOV; dashed edges) reference sequences, as described in Figure 3B and C, respectively. Effector domains are indicated by the colored ranges. Bootstrap values of 80% and higher (100 iterations) are indicated with black circles. The arrow indicates the placement site of the WRKY-LOV sequences (SI Appendix Fig. S9) by pplacer. Shown are the clades in which the LOV-domain was found associated with an effector domain. Indicated is the alignment of the light-sensitive motif GXNCRFLQG within the LOV-domain (ClustalX color scheme), with the untrimmed gaps introduced by the potassium voltage-gated (Kv) channel motifs as indicated by the asterisk.

A
                          10        20        30        40        50        60        70        80
                  ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|
TRINITY_DN1994156 ------
TRINITY_DN463329  ------IKAGS
TRINITY_DN1926794 ---LCGSEALGGRSASPEPLDLSDIWADDDAVLDDVSAEEAASCANPVPAKQMEPPRKRQVDNLPGNWRKYGQKSIKAGS
TRINITY_DN2011055 GDGLCGSESLGGRSGSPEPLDLSDIWADDDAVLDDVSAEEAASCANPVPAKQMEPPRKRQVDNLPGNWRKYGQKSIKAGS
TRINITY_DN2602224 GRGLCGSESLGGRSGSPEPLDLSDIWADDDAVLDDVSAEEAASCANPVPAKQMEPPRKRQVDNLPGNWRKYGQKSIKAGS
TRINITY_DN2977501 FSG------A
TRINITY_DN1998658 ASQVAGSNSVADLLGSDVSQAVYDVAEADQQGLDQLSQKSLESCPQPLPQPLPPPPPQQQLENIPGNWRKYGQKTMKGKG

WRKY-domain
                          90       100       110       120       130       140       150       160
                  ....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|....|
TRINITY_DN1994156 ---RVRSYYRCTRPGCPAKKRVEIHPVGGETISVCLSSAHNHRVSEEPLSVTQGRKSLATPMPLASPDLDWNELLANQHP
TRINITY_DN463329  DGARVRSYYRCTRPGCPAKKRVEIDPVGGETISVCLSSAHNHRVSEEPLSVTQGRKSLATPMPLASPDLDWNELLANQHP
TRINITY_DN1926794 DGARVRSYYRCTRPGCPAKKRVEIDPDGGETISVCLSSAHNHRVSEEPSSVTQGRKSLATPMPLASPDLDWNELLANQHP
TRINITY_DN2011055 DGARVRSYYRCTRPGCPAKKRVEIDPDGGETISVCLSSAHNHRVSEEPSSVTQGRKSLATPMPLASPDLDWNELLANQHP
TRINITY_DN2602224 DGARVRSYYRCTRPGCPAKKRVEIDPVGGGTISVCLSSAHNHRVGE-ALSASQGQKPLATPMPLASPDLDWNELLGNQHP
TRINITY_DN2977501 TRETTRCYYRCNIPGCKVRKVVVVH--GNERPIVKLTGTHSHPVNEENESDNEGPTVARDGI----PDLDQAKLAVTCQT
TRINITY_DN1998658 DPDYTRCYFRCNIPGCGVRKVVE----GPFPVVVKVIGKHNHPCRESDGESEEGPVMTKDGFPV--PELDPYKIMKASMP

LOV-domain
                         170       180       190       200       210       220       230       240
                  ....|....|....|....|....|....|...GXNCRFLQG..|....|....|....|....|....|....|....|
TRINITY_DN1994156 HFVICDPNLPDCPIVFASSGFCQLTGYLLHX------
TRINITY_DN463329  HFVICDPNLPDCPIVFASSGFCQLTGYLLHEVLGRNCRILQGKDTNPHTVSQLRLAIQRRREVHTTILNYRX------
TRINITY_DN1926794 HFVVCDPNLPDCPIVFASSGFCQLTGYLLHEVLGRNCRILQGKDTNPHTVSQLRLAIQRRREVHTTILNYRKDGSPFWNL
TRINITY_DN2011055 HFVVCDPNLPDCPIVFASSGFCQLTGYLLHEVLGRNCRILQGKDTNPHTVSQLRLAIQRRREVHTTILNYRKDGSPFWNL
TRINITY_DN2602224 HFVVCDPNLPDCPIVFASSGFCQLTGYLLHEVLGRNCRILQGKDTNPHTVSQLRLAIQRRREVHTTILNYRKDGSPFWNL
TRINITY_DN2977501 NFCISDPYRPDCPIIYASPGFSLMTGYDTREVMNKNCRFLQGPDTNPDAVRMIRTAIQREEHIRVIILNYKKNGX-----
TRINITY_DN1998658 NFVITDPQRPDNPIIFASPAFAKMTGYSQKEVLNKNCRFLQGPSTNPLAVRQISEAVKQLRSIRIILLNYKKNGQPFWNL

                         250       260       270       280       290       300
                  ....|....|....|....|....|....|....|....|....|....|....|....|....
TRINITY_DN1994156 ------
TRINITY_DN463329  ------
TRINITY_DN1926794 LHLSPVQKDNKIFSYVGSQMDVTAHTAGQQSHQNICTWRSAVNSTIVPSRAANPQSTPTRILTV
TRINITY_DN2011055 LHLSPVQKDNKIFSYVGSQMDVTAHAAGQQSHQNICTWRSAVNSTIVPSRAANPQSIPTRILTV
TRINITY_DN2602224 LHLSPVQKDNKIFSYVGSQMDVTAHTAGQQSHQNICTWRSALNSTIVPSRAANPQSTPTRILTV
TRINITY_DN2977501 ------
TRINITY_DN1998658 LQIHPIMDDNKLVSCVGVQMDVSRKPAVRQMDKPLKRRMGALEVEDAEDMYANRQVTPRVVLGG

[Panels B and C: figure images (map and Krona plot).]

Fig. S9. Environmental WRKY-LOV sequences. A) Alignment of the environmental WRKY-LOV sequences identified in this work. Amino acid residues are shaded at 80% identity. Indicated above the alignment are the WRKY domain (yellow), the LOV domain (blue) and the light-sensitive motif GXNCRFLQG. Cysteine residues located outside of the light-sensitive motif are indicated in red. B) Geographical distribution of WRKY-LOV homolog abundances (RPKM) in surface samples of the Ocean Gene Atlas (tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/) for the 0.8-5 µm (blue), 5-20 µm (yellow) and 20-180 µm (red) size classes. C) Taxonomic distribution (Krona plot) of WRKY-LOV homologs as deposited in the Ocean Gene Atlas (MATOU_v1_metaT database). The Tara Oceans sequence identifiers used for this analysis are given in Dataset S7C.
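The environmental sequences in panel A carry variants of the light-sensitive motif rather than an exact GXNCRFLQG match (for example GRNCRILQG and NKNCRFLQG). The following Python sketch illustrates one way to locate such variants; it is not the procedure used in this work. The function name `scan_motif` and the tolerance of one mismatch are assumptions made for illustration, and the two fragments are copied from the LOV-domain block of the alignment.

```python
def scan_motif(sequence, motif="GXNCRFLQG", max_mismatches=1):
    """Return (start, window, mismatches) for windows resembling the motif.

    'X' in the motif matches any residue; all other positions are compared
    literally. Windows with more than `max_mismatches` differences are skipped.
    The mismatch tolerance is an illustrative assumption.
    """
    hits = []
    width = len(motif)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        mismatches = sum(
            1 for m, residue in zip(motif, window) if m != "X" and m != residue
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits


# Fragments copied from the LOV-domain block of the alignment in panel A.
fragments = {
    "TRINITY_DN2602224": "HEVLGRNCRILQGKDTNP",
    "TRINITY_DN1998658": "KEVLNKNCRFLQGPSTNP",
}
hits = {name: scan_motif(seq) for name, seq in fragments.items()}
for name, found in hits.items():
    print(name, found)
```

With `max_mismatches=0` neither fragment is reported, which is why a tolerance (or a profile-based search) is needed to recover these variants.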


[Fig. S10 (figure image): heat map with a color key in reads L⁻¹. Rows are labelled with the summed reads L⁻¹, the function-clade assignment (Rhodopsin-Sensory or Rhodopsin-Channel, with clade) and the taxonomy; columns are the sampling times (06, 10, 14, 18, 22, 02 h) repeated over the four sampling days.]

Fig. S10. Sensory and channel rhodopsin transcript abundance over the four-day sampling period. Heat map representing color-coded transcript levels (reads L⁻¹; mean of biological duplicates) of transcripts mapped to channel and sensory rhodopsin over the four-day sampling period. Indicated are the number of reads, the functional and clade assignments (corresponding to SI Appendix Fig. S1/Dataset S3A), and the assigned taxonomy up to the phylogenetic order level. Included are the categories with an overall of > 100.000 reads L⁻¹. Grey blocks indicate sampling points for which transcripts were not detected. Significant diel periodicity (RAIN, p-value 0.001) was not established for transcripts in these categories.
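The caption describes two simple operations: averaging biological duplicates per sampling point and retaining categories above a read threshold. The Python sketch below illustrates them on invented numbers. The category names and values are hypothetical, and two choices are assumptions rather than the documented procedure: the "overall" value is taken as the sum of the per-time-point means, and a time point is treated as not detected when either duplicate is missing.

```python
from statistics import mean

# Hypothetical transcript levels (reads per litre) for two made-up categories.
# Each time point holds a pair of biological duplicates; None means not detected.
samples = {
    "category_A": [(60000, 80000), (30000, 50000), (None, None)],
    "category_B": [(1000, 3000), (None, None), (2000, 2000)],
}
THRESHOLD = 100_000  # cut-off quoted in the caption, in reads per litre


def duplicate_means(pairs):
    """Average each pair of duplicates; keep None where nothing was detected."""
    return [None if None in pair else mean(pair) for pair in pairs]


def overall(means):
    """Sum the per-time-point means, ignoring undetected time points."""
    return sum(value for value in means if value is not None)


heatmap_rows = {name: duplicate_means(pairs) for name, pairs in samples.items()}
kept = [name for name, row in heatmap_rows.items() if overall(row) > THRESHOLD]
print(kept)
```

The `None` entries in `heatmap_rows` correspond to the grey blocks of the heat map.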


Table S1. 18S rDNA and rRNA sequence abundance for each taxonomic group averaged across the 4-day sampling period

Taxonomic group                                   cnts DNA     std DNA % total DNA    cnts RNA     std RNA % total RNA
Amoebozoa       Amoebozoa                               NA          NA          NA          NA          NA          NA
Opisthokonta    Animalia, Fungi, Choanozoa           41003       13928          22       12566        7329          11
Rhizaria        Chlorarachneae, Foraminifera             1          12           0           3          44           0
Alveolata       Ciliophora                            2752        1834           1       25852       16316          23
Alveolata       Apicomplexa                             NA          NA          NA          NA          NA          NA
Alveolata       Dinophyceae                         128823       30053          70       47885       18669          42
Cryptophyta     Cryptophyta                              7           4           0         151          86           0
Haptophyta      Haptophyta                            1990         748           1        5045        5585           4
Stramenopiles   Bigyra                                 315         186           0         198         139           0
Stramenopiles   Bacillariophyceae                     8931        5644           5       15691        8600          14
Stramenopiles   Chrysophyceae                           28          19           0        1260         762           1
Stramenopiles   Dictyochophyceae                       163         105           0        3744        1922           3
Stramenopiles   Pelagophyceae                           39          27           0         453         298           0
Stramenopiles   Pinguiophyceae                           1           2           0           2           3           0
Stramenopiles   Synurophyceae                           NA          NA          NA          NA          NA          NA
Archaeplastida  Chlorophyta                            551         343           0         327         157           0
Archaeplastida  Rhodophyta                              NA          NA          NA          NA          NA          NA
Archaeplastida  Glaucophyta                             NA          NA          NA          NA          NA          NA
Viruses         Viruses                                 NA          NA          NA          NA          NA          NA
Total                                               184603       52895         100      113176       59871         100

This table is generated from data presented in S. K. Hu, P. E. Connell, L. Y. Mesrop, D. A. Caron, A Hard Day’s Night: Diel Shifts in Microbial Eukaryotic Activity in the North Pacific Subtropical Gyre. Front. Mar. Sci. 5, 205–217 (2018). Briefly, DNA and RNA were simultaneously extracted from filters collected over the four-day sampling period, at the same time points as the samples in this study. The RNA was reverse transcribed to cDNA, and both the cDNA and the DNA were PCR amplified, targeting the V4 hypervariable region of the 18S rRNA gene, and subsequently Illumina-sequenced. Sequences were clustered into Operational Taxonomic Units (OTUs) and taxonomy was assigned based on the PR2 database. The counts (cnts) and standard deviation (std) are calculated from the rDNA and rRNA sequence abundance for each taxonomic group, averaged across the 4-day sampling period. The fraction of 18S DNA or RNA sequences per taxonomic bin is indicated as the percentage of the total number of sequences (% total). 'NA' signifies taxonomies that were studied in this work but were not represented in the Hu et al. (2018) tag-sequencing survey.
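The "% total" columns follow directly from the averaged counts and the Total row. As a minimal illustration in Python, the sketch below recomputes three of the DNA percentages from values copied from Table S1; the function name `percent_of_total` is hypothetical, and rounding to a whole percentage is assumed from the way the table reports its values.

```python
# Averaged 18S rDNA counts copied from Table S1 (subset of rows).
dna_counts = {
    "Dinophyceae": 128823,
    "Opisthokonta": 41003,
    "Bacillariophyceae": 8931,
}
dna_total = 184603  # "Total" row of Table S1, cnts DNA


def percent_of_total(count, total):
    """Return the share of `count` in `total`, rounded to a whole percentage."""
    return round(100 * count / total)


dna_percent = {
    group: percent_of_total(count, dna_total) for group, count in dna_counts.items()
}
print(dna_percent)
```

The three values agree with the % total DNA column of the table (70, 22 and 5).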


Supplementary Datasets (separate files)

Dataset S1. Description of the genomes and transcriptomes included in the marine reference database used in this study.

Dataset S2. Photoreceptor HMM profile input alignments (selex format) used in this study.

Dataset S3. Reference photoreceptor protein sequences (full length) with annotations used in this study.

Dataset S4. Phylogenetic placement of synthetically generated metatranscriptome short reads with known taxonomic origin.

Dataset S5. Environmental photoreceptor transcript abundance and results for all diel periodicity and depth analyses.

Dataset S6. NCBI Batch Web CD-search results for all full-length LOV-domain and rhodopsin reference sequences, as well as diel-selected LOV-domain assembled environmental contigs.

Dataset S7. 18S ribosomal nucleotide sequences of reference organisms with taxonomic and morphological information, environmental LOV-domain and rhodopsin sequences of diel transcribed assembled contigs (translated into amino acid space) with domain and taxonomic information, and Ocean Gene Atlas WRKY-LOV identifiers used for SI Appendix, Fig. S9.
