bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Minimal Tool Set for a Prokaryotic

Nicolas M Schmelling1, Anika Wiegard1, Robert Lehmann2, Paushali Chaudhury3, Christian Beck2, Sonja V Albers3, and Ilka M Axmann∗1

1Institute for Synthetic Microbiology, Cluster of Excellence on Plant Sciences (CEPLAS), Heinrich-Heine-University D¨usseldorf,Universit¨atsstr. 1, 40225 D¨usseldorf, Germany 2Institute for Theoretical Biology, Humboldt University Berlin, Invalidenstr. 43, 10115, Berlin, Germany 3Molecular Biology of , University of Freiburg, Institute of Biology II, Sch¨anzlestr.1,79104 Freiburg, Germany

September 13, 2016

Abstract Circadian clocks can be found in almost all organisms including photosynthetic , whereby large diversity exists within the protein components involved. In the model cyanobacterium elongatus PCC 7942 circadian rhythms are driven by a unique KaiABC protein clock, which is embedded in a network of input and output factors. Homologous proteins to the Syne- chococcus elongatus PCC 7942 clock components have been observed in Bacteria and Archaea, where evidence for circadian behavior in these domains is accumulating. However, interaction and func- tion of non-cyanobacterial Kai-proteins as well as homologous input and output components remain mainly unclear. Using a universal BLAST analyses, we identified putative KaiC-based timing systems in organ- isms outside as well as variations within Cyanobacteria. A systematic analyses of publicly avail- able microarray data elucidated interesting variations in circadian gene expression between different cyanobacterial strains, which might be correlated to the diversity of genome encoded clock com- ponents. Based on statistical analyses of co-occurrences of the clock component homologous to Synechococcus elongatus PCC 7942, we propose putative networks of reduced and fully functional clock systems. Further, we studied KaiC sequence conservation to determine functionally important regions of diverged KaiC homologs. Biochemical characterization of exemplary cyanobacterial KaiC proteins as well as homologs from two thermophilic Archaea demonstrated that kinase activity is always present. However, a KaiA-mediated phosphorylation is only detectable in KaiC1 orthologs. Our analysis of 11,264 genomes clearly demonstrates that components of the Synechococcus elonga- tus PCC 7942 circadian clock are present in Bacteria and Archaea. However, all components are less abundant in other organisms than Cyanobacteria and KaiA, Pex, LdpA, and CdpA are only present in the latter. Thus, only reduced KaiBC-based or even simpler, solely KaiC-based timing systems might exist outside of the cyanobacterial phylum, which might be capable of driving diurnal oscillations.

1 Introduction

2 Life on Earth is under the influence of changing environmental conditions, which not only pose a chal-

3 lenge to organisms, but also present a chance of adaptation and therefore a possible fitness advantage over

4 competitors [1, 2]. Using inner timing systems organisms can coordinate their physiology and behavior

5 according to the daily recurring changes. Simple timing systems work in an hour glass like fashion and

6 need to be reset every day by environmental stimuli, whereas true circadian clocks generate self-sustained

7 and temperature-compensated 24-hour rhythms of biological activities [3, 4].

8 Circadian clocks are found in many eukaryotes such as algae, plants and mammals [5]. Even though

9 circadian clocks seem like a conserved trait in evolution, differences in the protein components, involved

10 in circadian timing, suggest a convergent evolution of timing mechanisms [5]. For many years it has been

[email protected]

1 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

11 believed that something as complex as a circadian clock could not have been evolved in unicellular organ-

12 isms like [5, 6, 7]. However, the existence of temperature compensated 24-hour rhythms of

13 cell division in Synechococcus sp. WH 7803 and circadian nitrogen fixation in Cyanothece sp. PCC 8801

14 proved otherwise [8, 9, 10, 11, 12]. The molecular basis of the cyanobacterial circadian clock was inten-

15 sively investigated in Synechococcus elongatus PCC 7942 (hereafter Synechococcus 7942), where the core

16 clockwork resembles a posttranslational oscillator [13, 14, 15]. In contrast, eukaryotic circadian rhythms

17 are believed to be mainly based on transcriptional-translational feedback loops. However, findings on

18 post-translational systems are accumulating [16, 17, 18] and might exist also in Archaea [16].

19 Light is assumed to be the driving stimulus in circadian clock entrainment [19]. In Synechococcus 7942,

20 contrary to eukaryotic circadian clock systems, a photoreceptor in the input pathway of the clock could

21 not be detected thus far. Instead, Synechococcus 7942 cells sense light indirectly through the redox and

22 energy state of the cell [20]. Here, two metabolic components are considered to play a major role [21]:

23 The ATP to ADP ratio and the redox state of the plastoquinone (PQ) pool [22, 23]. The core of the

24 circadian clock in Synechococcus 7942 consists of three proteins KaiA, KaiB and KaiC. KaiC monomers

25 are composed of two domains, which assemble into two hexameric rings [24, 25]. The C-terminal ring is

26 capable of autophosphorylation and –dephosphorylation [26, 27]. KaiC phosphorylation is stimulated by

27 the interaction with KaiA [28, 29], and additionally, affected by the ATP/ADP ratio of the cell [30]. KaiB

28 inhibits the activating effect of KaiA and initializes dephosphorylation [31]. Altogether, KaiC hexamers

29 phosphorylate and dephosphorylate rhythmically during the course of a day. The binding of oxidized

30 quinones to KaiA has been suggested to stop the clock directly by causing KaiA aggregation [32, 33].

31 The KaiABC core clock is embedded into a network of input and output pathways. The input factors

32 that interact with the core clock are Pex, LdpA, PrkE, NhtA, IrcA, CdpA [20, 34, 35, 36, 37, 38]. Output

33 factors are SasA, LabA, LalA, CpmA, Crm, RpaA, and RpaB, as well as CikA, which is functioning both

34 in input and output pathway of the circadian clock [39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51].

35 Sequence analysis indicated that at least three different types of timing systems are present in Cyanobac-

36 teria, (i) a KaiABC-based system as in Synechococcus 7942, (ii) a reduced system with a KaiBC core

37 and a reduced set of input/output factors as in Prochlorococcus and (iii) a reduced KaiABC system as

38 in Synechococcus sp. WH 8102, which despite including all three kai genes, has the same input/output

39 factors as the reduced KaiBC system [52]. Furthermore, multiple kai genes can exist in an organism

40 [53, 54]. In Synechocystis sp. PCC 6803 (hereafter Synechocystis 6803) for example, three copies of both

41 kaiB and kaiC are found. Besides the core clock components, KaiA, KaiB1 and KaiC1, most similar to

42 the kai genes of Synechococcus 7942 [55, 56], KaiB3 and KaiC3 are thought to function as fine tuning

43 factors for the circadian clock in Synechocystis 6803, whereas no circadian function has been found for

44 KaiB2 and KaiC2 [56]. However, recently kaiC2 -dependent adaptive growth and diurnal rhythms of

45 nitrogen fixation were observed in Rhodopseudomonas palustris [57]. Homologs of kaiB and kaiC genes

46 also exist in other Bacteria and even Archaea, where a shortend KaiC most often resembles only one

47 domain [53]. An archaeal one-domain KaiC homolog was shown to form a hexameric ring, similar to the

48 two duplicated domains of Synechococcus 7942 KaiC [58]. In Haloferax volcanii the transcripts of four

49 kaiC homologs display diurnal accumulation profiles [59]. However, the function of non-cyanobacterial

50 Kai-proteins is mainly unclear, so far. Although some of the input- and output factors were also found

51 in prokaryotes other than cyanobacteria [49, 52, 60, 61], it is unknown whether the Kai homologs outside

52 the cyanobacterial phylum or the additional cyanobacterial Kai homologs interact with (other) in- and

53 output factors.

54 In this study, we performed BLAST analyses to first identify possible KaiC-based timing systems in organ-

55 isms outside of Cyanobacteria and second to explore variations in circadian clocks within Cyanobacteria.

56 Further, we examined variations in circadian gene expression between different cyanobacterial strains us-

57 ing microarray data. Together, this aims at decoding the correlation between Kai proteins and additional

58 clock components. Based on the co-occurrence of clock components known from Synechococcus 7942 we

59 propose putative networks of reduced and fully functional clock systems. Further, we used the sequence

60 information of KaiC and its homologs in Cyanobacteria to determine the similarities at important sites of

61 the protein. We chose cyanobacterial KaiC proteins as well as homologs from two thermophilic Archaea

62 and demonstrated that kinase activity is always present. However, a KaiA-mediated phosphorylation is

63 only detectable in cyanobacterial KaiC1 homologs.

2 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

64 Materials and Methods

65 Programming languages

66 The programming languages Python (version 3.5.1) and R (version 3.2.3) were used in this work. The

67 processing and analysis of the microarray time series datasets was performed using R. Regarding the

68 distribution analysis, the Biopython project ([62]; version 1.66) was used to download from GenBank

69 as well as to work with FASTA files. Besides Biopython, the Python packages: IPython ([63] version

70 4.1.1) as an interactive Python environment with the IPython notebook; numpy and scipy ([64]; version

71 1.10.4, version 0.17.0) for numerical operations; matplotlib ([65]; version 1.5.1) for data visualization; and

72 pandas ([66]; version 0.17.1) for data analyses were used. The code necessary to reproduce the analyses

73 is available on GitHub (https://github.com/schmelling/reciprocal BLAST).

74 Reciprocal BLAST and NCBI

75 The coding sequences of all entries in the genbank protein database [67], which were labeled as ”Complete

76 Genome” or ”Chromosome”, were downloaded from the NCBI FTP server (version May 2016). These

77 sequences, including the coding sequences of Synechococcus elongatus PCC 7942 and Synechocystis sp.

78 PCC 6803, were used to construct a custom protein database for the homology search. Further, pro-

79 tein sequences of the 23 clock related proteins (Table S1), from Synechococcus elongatus PCC 7942 and

80 Synechocystis sp. PCC 6803, respectively, were checked against the entries in the Cyanobase Database

81 to ensure correctness [68] (version May 2016). These 23 protein sequences were used as queries for a

82 search of homologs within the custom protein database, applying the standalone version of BLASTP

83 2.2.30+ [69] (May 2016) with standard parameter (wordsize: 3, substitution matrix: BLOSUM62). The −5 84 10,000 best hits with an e-value of 10 or lower were filtered for further analyses. The first BLAST run

85 returned circa 65,000 hits for all 23 cyanobacterial proteins combined.

86 These hits were used as queries for a second reverse BLASTP run, searching for homologs in Synechococ-

87 cus 7942 or Synechocystis 6803 genomes using the same parameters as above with an altered e-value of

88 10. Only hits with the original query protein as best reversal hit were accepted for further analyses, thus

89 minimizing false positive results.

90 Raw and processed data is available on figshare (https://dx.doi.org/10.6084/m9.figshare.3823902.v2,

91 https://dx.doi.org/10.6084/m9.figshare.3823899.v2).

92 Testing of co-occurence

93 Co-occurrence of circadian clock proteins was examined by using the right-sided Fisher’s exact test [70].

94 For each of the 94 cyanobacterial strains, all identified homologous clock genes were gathered into one

95 set. The phylogenetic distribution of cyanobacteria in the NCBI genbank database is very imbalanced.

96 Some genera (e.g. Prochlorococcus and Synechococcus) are covered better than others. To avoid selection

97 bias, we removed sets with identical combinations of genes, resulting in 69 unique clock systems. Null

98 hypothesis of Fisher’s exact test is a pairwise independent distribution of the proteins across all clock

99 systems. P-values were corrected for multiple testing after Benjamini-Hochberg [71] with an excepted −2 100 false discovery rate of 10 . We denote that due to the nature of statistical testing, proteins appearing in

101 almost all clock systems are always virtually independent to others. All proteins were clustered according

102 to their corrected p-values.

103 Microarray analysis

104 The diurnal expression program of six cyanobacterial strains was probed using microarray time series

105 datasets (Table S3). Unfortunately, the data for the two reported Synechocystis 6803 experiments by

106 Labiosa and colleagues [72] and Kucho and coworkers [73] could not be obtained. The study by Toepel

107 and coworkers [74] had to be discarded due to the employed ultradian light cycles (6:6 LD cycles). The

108 Synechocystis 6803 datasets, the Synechococcus 7942 dataset of Ito and colleagues, and the Anabaena

109 sp. PCC 7120 dataset were l2m transformed, while the Cyanothece ATCC 51142 datasets are only

110 available after transformation. The two biological replicates of the Anabaena sp. PCC 7120 dataset were

111 concatenated for the following analyses [75], similar to the Synechocystis 6803 dataset [76]. Expression

112 profiles were smoothed using a Savitzky-Golay lowpass filter, as proposed by Yang and colleagues [77],

113 in order to remove pseudo peaks prior to the detection of periodic genes. Diurnally oscillating expression

114 profiles were detected using harmonic regression analysis [78]. The derived p-values for each gene is

115 based on the assumption of a linear background profile as compared to the sinoidal foreground model.

116 After multiple hypothesis testing correction according to Benjamini-Hochberg [71], all datasets yielded

3 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

117 significantly oscillating genes (q ≤ 0.05) except for Microcystis aeruginosa PCC 7806, with 7 samples the

118 shortest dataset.

119 Multiple sequence alignment

120 Multiple sequence alignments were constructed by the standalone version of CLUSTAL Omega 1.2.1-1

121 [79] using 20 iterations, while only one iteration was used to construct the guide tree (May 2016). The

122 sequences for the alignments were obtained from the processed data generated, as described in the method

123 section ”Reciprocal BLAST and NCBI”. Afterwards the alignments were adjusted to Synechococcus 7942

124 sequence with Jalview [80] and edited multialignments were used to create WebLogos [81].

125 Cloning, heterologous expression and purification of Kai proteins

126 To express either GST-fused or full length KaiC1 proteins from Synechocystis sp. PCC 6714, Nostoc

127 punctiforme ATCC 29133, Cyanothece sp. PCC 7424 as well as KaiC3 from Cyanothece sp. PCC

128 7424, Microcystis aeruginosa PCC 7806, Pycrococcus horikoshii OT3 PH0833, and Thermococcus litoralis

129 DSM5473, the respective kaiC genes were amplified by PCR from genomic wildtype DNA. The ORFs and

130 primers are listed in Table S4. Amplified sequences were ligated into BamHI and NotI restriction sites

131 of the plasmid pGEX-6P-1 (GE Healthcare). For P.horikoshii and T.litoralis KaiC3 the amplified PCR

132 products were ligated into pETDuet1 Vector using BamHI/HindIII and PstI/HindIII restriction enzymes

133 in MCS1 respectively. Escherichia coli DH5α or BL21 (DE3) cells were transformed with the resulting

134 plasmids (pGEX-kaiC1-Sy6714, pGEX-kaiC1-Npun29133, pGEX-kaiC1-Cy7424, pGEX-kaiC3-Cy7424,

135 pGEX-kaiC3-Mic7806, pSVA3151, pSVA3152). For GST-fused expression of KaiA-7942 and KaiC-7942

136 pGEX derivatives, kindly provided by T. Kondo (Nagoya University, Japan), were used. Expression of ◦ 137 GST-Kai proteins occurred at 37 C and 200 rpm in Terrific broth medium containing 100 µg ampicillin −1 138 ml . GST-KaiA-7942 expression was induced with 1 mM isopropyl β-D-thiogalactopyranoside (IPTG)

139 and carried out overnight, whereas GST-KaiC homologs were expressed for 72 hours without induction.

140 Cells were harvested and lysed in ice-cold extraction buffer (50 mM Tris/HCl (pH 8), 150 mM NaCl, 0.5 141 mM EDTA, 1 mM DTT, 5 mM MgCl2 and 1 mM ATP). Recombinant GST-Kai proteins were affinity 142 purified using Protino Gluthatione Agarose 4B (Macherey–Nagel) as described in Wiegard and colleagues

143 [55]. During the procedure the GST-tag was removed with PreScission protease in cleavage buffer (50 144 mM Tris/HCl (pH 8), 150 mM NaCl, 1 mM EDTA, 1 mM DTT, 5 mM MgCl2 and 0.5 mM ATP). 145 Homogeneity of the recombinant proteins was controlled by separating them via SDS-PAGE. If it was

146 not sufficient, proteins were further purified by anion-exchange chromatography using a MonoQ 5/50 GL

147 or ResourceQ column (GE Healthcare). After dialysis in reaction buffer (20 mM Tris/HCl (pH 8), 150 148 mM NaCl, 0.5 mM EDTA, 5 mM MgCl2, and 1 mM ATP) protein concentration was determined using 149 infrared spectroscopy (Direct detect, Merck Millipore). For full-length KaiC3 homologs, E. coli BL21

150 (DE3) RIL cells were transformed with pSVA3151 (P.horikoshii) and pSVA3152 (T.litoralis), and grown ◦ −1 151 as preculture overnight at 37 C in LB medium containing ampicillin (50 µg ml ) and chloramphenicol −1 152 (34 µg ml ). Fresh medium containing antibiotic was inoculated with 0.1 % preculture and grown at ◦ 153 37 C to an OD600 of 0.7. After induction with 0.3 mM of IPTG, growth was continued for 16 hours ◦ ◦ 154 at 16 C. Cells were collected by centrifugation, frozen in liquid nitrogen, and, after storage at -80 C,

155 resuspended in 50 ml lysis buffer (50 mM Hepes-NaOH, pH 7.2, 150 mM NaCl) containing Complete

156 EDTA-free protease inhibitor cocktail (Roche) together with DNase I and lysed by sonication. Cell ◦ 157 debris were removed by centrifugation at 4 C for 30 min at 20,000 x g. Further Ni-NTA (Sigma Aldrich,

158 Seelze, Germany) based purification was performed using columns equilibrated in purification buffer (50

159 mM Hepes-NaOH, pH 7.2, 150 mM NaCl). For the removal of unspecifically bound protein columns

160 were washed with 15 column volumes of equilibration buffer including 10 mM imidazole. T.litoralis

161 KaiC3 was eluted in the same buffer with 150 mM imidazole containing equilibration buffer, whereas for

162 P.horikoshii KaiC3 the elution was carried out in 20 mM MES pH 6.2, 150 mM NaCl, 150 mM imidazole

163 as this protein is stable in low pH buffer. Further purification of T.litoralis KaiC3 was achieved by size ◦ 164 exclusion chromatography using Superdex 200 10/300 GL. P.horikoshii KaiC3 was incubated at 50 C for

165 20 min and centrifuged for 15 min at 10.000 x g. Subsequently, the supernatant was dialyzed overnight

166 against 20 mM MES pH 6.2, 150 mM NaCl buffer. As a quality control, proteins were separated via ◦ 167 SDS-PAGE and pure proteins were frozen in liquid nitrogen and kept at -80 C.

168 In vitro phosphorylation assays

169 To investigate KaiA dependent phosphate uptake 12 µg of KaiC-7942, KaiC1-Sy6714, KaiC1-N 29133,

170 KaiC1-Cy7424, KaiC3-Cy7424, KaiC3-Mic7806, KaiC3-T.lit or KaiC3-P.hor were mixed with 10 µCi 32 171 γ-P -ATP in 60 µl Tris reaction buffer (20 mM Tris/HCl (pH 8), 150 mM NaCl, 0.5 mM EDTA, 5

4 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

172 mM MgCl2, 1 mM ATP) in the presence or absence of 6 µg KaiA-7942. 10 µl aliquots were taken after ◦ 173 0, 0.75, 1.5, 3 and 22 hours of incubation at 30 C and reaction was stopped by adding SDS-sample

174 buffer. Proteins were separated in high-resolution polyacrylamide gels (10 % T, 0.67 % C) by SDS-PAGE

175 (modified from [82]), stained with Coomassie brilliant blue and subjected to autoradiography. Signals

176 were analyzed using a Fujifilm FLA-3000 (FUJIFILM). To analyze in vitro phosphorylation of KaiC3-T.lit 32 177 and KaiC3-P.hor at higher temperatures, the recombinant proteins were incubated with 10 µCi γ-P - 178 ATP in HEPES reaction buffer (50 mM HEPES (pH 7.2), 150 mM NaCl, 5 mM MgCl2, 1 mM ATP) 179 or MES reaction buffer (50 mM MES (pH 6), 150 mM NaCl, 5 mM MgCl2, 1 mM ATP), respectively, ◦ 180 at 75 C. After 0, 5, 10 and 15 minutes 10 µl aliquots were taken and analyzed by SDS-PAGE and

181 autoradiograpy as described above.

5 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

182 Results and Discussion

183 Most complete set of circadian clock orthologs in Cyanobacteria

184 Circadian rhythms are reported for many Cyanobacteria and first sequence analyses revealed that the

185 core genes, known from Synechococcus 7942, are conserved in almost all cyanobacterial species [4, 53,

186 55, 83]. Even though daily rhythms seem to be conserved in Cyanobacteria, composition and quantity

187 of corresponding genes on the genome level show high variability [54]. Dvornyk and colleagues first

188 attempted to describe the variety of cyanobacterial core circadian clock systems in 2003 [53]. Since then,

189 the amount and depth of sequencing data increased manifold, which allowed us to perform a detailed

190 analysis of the KaiC-based circadian clock including the input and output pathways. The circadian clock

191 proteins of Synechococcus 7942 (Table S1) and the KaiB and KaiC homologs from Synechocystis 6803

192 (Table S1) were the basis of our analysis. Their protein sequences served as base for a reciprocal best

193 hit BLAST analysis. Organisms with at least one homolog to KaiC were retained for further analysis.

194 This stringent filter was essential since KaiC represents the core of the circadian clock in Synechococcus

195 7942. These organisms were grouped by their corresponding genus for a first overview of the homolog

196 distribution (Fig. 1A). We found homologs in Cyanobacteria, , Archaea, as well as other

197 Bacteria such as Chloroflexi. This finding is in good agreement with previous studies [53, 55, 84]. However,

198 our comprehensive study identified a plethora of new bacterial and archaeal genera harboring homologs

199 to the circadian clock genes (Fig. 1A). Nevertheless, Cyanobacteria represent the phylum with the highest

200 degree of sequence similarity and integrity of the system followed by Bacteria, mostly Proteobacteria. In

201 Archaea, homologs to only a fraction of core genes could be identified (Fig. 1A).

202 Core circadian clock factors KaiB and KaiC beyond the phylum of Cyanobacteria

203 Four out of seventeen studied factors are exclusively found in Cyanobacteria (Fig. 1A). One of these

204 factors is KaiA, as previously reported in studies with a smaller sample size [53, 84]. However, even

205 some Cyanobacteria, like Candidatus Atelocyanobacterium thalassa (previously named Cyanobacterium

206 UCYN-A), and all representatives from the genus Prochlorococcus lack kaiA (Fig. 1A) [54]. Interestingly,

207 multiple copies of kaiA in a single cyanobacterial genome could not be identified. This is of special interest,

208 because we could observe strong sequence length variations for KaiA (Fig. 5A, C) and multiple copies for

209 the other core proteins KaiB and KaiC have been reported (Fig. 1B) [53, 54, 55]. With the kaiA-lacking

210 Candidatus Atelocyanobacterium thalassa and a kaiA-containing cyanobacterial endosymbiont [85] only

211 two out of 94 studied Cyanobacteria do not contain a kaiB, whereas a third of the cyanobacterial genera

212 contain multiple copies of kaiB and kaiC, which are also homologs to other known KaiB or KaiC variants

213 (Fig 1B, C). There are a few exceptional cyanobacteria like Gloeobacter violaceus [86], which are even

214 lacking the kaiC gene and are thus not detected in this analysis due to the previously described filtering

215 criteria. However, all Cyanobacteria having homologs to kaiB and kaiC are coding for at least one pair of

216 proteins most similar to KaiB1 and KaiC1 from Synechocystis 6803 (Fig. 1B, C), which is consistent with

217 previous studies [53, 55]. Only Cyanothece sp. PCC 7822 has homologs similar to all three kaiB and kaiC

218 copies from Synechocystis 6803 (Fig. 1C). The majority (67.68%) of the bacterial genera, outside of the

219 cyanobacterial phylum, encode KaiC3-like proteins. Approximately half of them also contain additional

220 KaiC homologs, most similar to KaiC1, KaiC2, or sometimes even both (Fig. 1B). This observation

221 does not hold true for KaiB homologs, here KaiB2 is the major KaiB homolog (75.86%). One group of

222 organisms, all belonging to the phylum of Bacteroidetes, stands out in this analysis, because they encode

223 only proteins similar to KaiB2/KaiC2 (Fig. 1B). Archaeal genera show mainly homologs with highest

224 identity to KaiC1 or KaiC3. Furthermore, almost all of the Archaea have multiple copies of the KaiC

225 protein. Whereas only four have homologs to KaiB, which is either similar to KaiB2 or in thermophilic

226 Methanothermobacter similar to KaiB1 (Fig. 1B).

6 bioRxiv preprint was notcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade hw h ubro ooosfrec rti nec ynbceim oynumbers Copy cyanobacterium. each in condensed. protein (Red), each were Archaea for of five of homologs (Blue), of than proteins proteins Proteobacteria number higher the clock clock (Green), shows circadian Cyanobacteria circadian (C) the the (Grey). highlighted: Other for are for similarities groups similarities mean main mean individual the taxonomic of the shows number shows the (B) represents (A) from parenthesis 7942. in homolog number genus. KaiC The per a similarity. KaiC contains genomes and that group genus their by each sorted for protein each of 1: Figure doi: https://doi.org/10.1101/075291 A ( proteins. clock circadian of Distribution 0 KaiA KaiA KaiB KaiB

10 KaiC KaiC Pex Pex LdpA LdpA

20 PrkE PrkE NhtA NhtA IrcA IrcA CdpA CdpA 30 CikA CikA SasA SasA LabA LabA 40 LalA LalA Identity [%] Crm Crm RpaA RpaA available undera 50 RpaB RpaB CpmA CpmA Halanaerobium (1) Actinoplanes (2) Verrucomicrobium (1) Rubinisphaera (1) Granulicella ( Thermobaculum (1) Solibacter (1) Spirochaeta (1) Koribacter (1) Desulforudis ( Terriglobus (1) Thermacetogenium (1) Rhodopirellula (1) Carboxydothermus (1) Fimbriimonas (1) Hydrogenobacter (1) Opitutus (1) Bacillus ( Paenibacillus (1) Desulfotomaculum (1) Caldithrix (1) Thermosipho (1) Pseudothermotoga (1) Symbiobacterium (1) Flavobacterium (1) Fibrella (1) Niastella Thermotoga ( Hymenobacter (1) Chitinophaga (1) Chloroflexus ( Chloroherpeton (1) Phycisphaera (1) Roseiflexus (2) Pontibacter (1) Methanospirillum (1) Natronobacterium (1) Natrialba (1) Methanofollis (1) Natronococcus (1) Methanobacterium (4) Methanocella (2) Methanosaeta (2) Halanaeroarchaeum (1) Methanoculleus (2) Halopiger (1) Methanolobus (1) Aciduliprofundum (2) Salinarchaeum (1) Palaeococcus (1) Thermosphaera (1) Halalkalicoccus (1) Halomicrobium (1) Haloarcula (4) Halorhabdus (2) Natronomonas (2) Methanolinea (1) Methanohalophilus (1) Hyperthermus (1) Halorubrum (1) Methanoregula ( Methanolacinia (1) Halogeometricum (1) Caldisphaera (1) Caldivirga (1) Haloterrigena (1) Acidilobus ( Methanomethylovorans (1) Ignicoccus (1) Haloferax ( Methanococcoides (2) Sulfolobus (24) Thermoproteus ( Methanoplanus (1) Methanohalobium (1) Halobacterium (4) Natrinema Acidianus ( Thermococcus (1 Thermogladius (1) Korarchaeum (1) Archaeoglobus ( Desulfurococcus (3) Methanosalsum (1) Methanothermobacter ( Pyrococcus (7) Nitrosopumilus (2) Thermofilum (4) Vulcanisaeta (2) Pyrobaculum (1) Methanosarcina ( Pyrodictium ( Staphylothermus (2) Geoglobus ( Methanothermus (1) Pyrolobus (1) Ferroglobus (1) Methanosphaerula (1) Methanothermococcus (1) Methanotorris (1) Methanocaldococcus (5) Methylocystis (1) Myxococcus (4) Paraburkholderia ( Corallococcus (1) Lysobacter ( Methylocella (1) Pseudoxanthomonas (1) Nitratiruptor (1) Stigmatella (1) Bordetella (1) Alkalilimnicola (1) Herbaspirillum (4) Burkholderia (1 Chromohalobacter (1) Polyangium (1) Chondromyces (1) Sorangium (1) Paucibacter (1) Neorhizobium ( Acidithiobacillus (1) Phenylobacterium (1) Xanthomonas (3 Erythrobacter (1) Nitrosococcus (1) Roseateles (1) Ramlibacter (1) Anaeromyxobacter (4) Rhodomicrobium (1) Pseudomonas ( Nitrosospira (1) Desulfomicrobium (1) Rhizobium (1 Caulobacter (1) Herminiimonas (1) Variovorax (3) Dyella (1) Agrobacterium (5) Sphingobium ( Sinorhizobium ( Archangium (1) Stenotrophomonas (2) Sphingomonas (3) Massilia ( Spiribacter (1) Aureimonas (1) Methylobacterium (11) Janthinobacterium ( Azospirillum (4) Bradyrhizobium (8) Legionella ( Geobacter (1) Halorhodospira ( Mesorhizobium ( Magnetospirillum (1) Thioflavicoccus (1) Haliangium (1) Desulfobacter (1) Rubrivivax (1) Allochromatium (1) Thiocystis (1) Desulfovibrio (1) Rhodobacter ( Rhodopseudomonas (4) Rhodospirillum (4) Stanieria (1) Oscillatoria (2) Chroococcidiopsis (1) Cylindrospermum ( Synechocystis ( Microcoleus (1) Geitlerinema (1) Rivularia (1) Microcystis (3) Pleurocapsa (1) Cyanothece ( Cyanobacterium (1) Geminocystis (1) Calothrix (3) Atelocyanobacterium (1) Prochlorococcus (13) Pseudanabaena ( Nostoc ( Cyanobium (1) Acaryochloris (1) Endosymbiont (1) Fischerella (1) Nodularia (1) Planktothrix (1) Leptolyngbya (1) Crinalium (1) Gloeocapsa (1) Dactylococcopsis (1) Arthrospira (4) Anabaena (3) Chamaesiphon (1) Thermosynechococcus (2) Halothece (1) Synechococcus ( Trichodesmium 60 ; 6) (1) 2) 2) this versionpostedSeptember15,2016. (1) 1) 3) 1) 8) 2) 2) 0) 6) 2) 1) 9) 3) 1) 5) 5) 70 2) 81) (1) 5) 5) 9) 5) 2) 0) 8) 1) 2) 22) 6) 1) 21) 5) 1) 1) C B 3) 80 KaiB1 KaiB1

KaiB2 KaiB2 CC-BY 4.0Internationallicense

90 KaiB3 KaiB3 KaiC1 KaiC1 KaiC2 KaiC2

100 KaiC3 KaiC3 7 0 KaiA KaiA KaiB1 KaiB1 KaiC1 KaiC1 KaiB2 KaiB2 1

hw r h ensqec similarites sequence mean the are Shown A,B) KaiC2 KaiC2 KaiB3 KaiB3 KaiC3 KaiC3

No. Pex Pex

2 LdpA LdpA

of copies PrkE PrkE NhtA NhtA IrcA IrcA CdpA CdpA 3 CikA CikA SasA SasA Synechococcus LabA LabA . LalA LalA The copyrightholderforthispreprint(which

4 Crm Crm RpaA RpaA RpaB RpaB CpmA CpmA 5+ Synechocystis Nostoc cyanobacteriumOscillatoriales JSC-1 Acaryochloris marinaMBIC11017 Oscillatoria acuminata 63 PCC Planktothrix agardhii NIVA-CYA Nostoc Cylindrospermum stagnale 74 PCC sp Geitlerinema Chroococcidiopsis 7 PCC thermalis Microcoleus sp Microcystis panniformis FACHB-1757 Microcystis aeruginosa NIES- Microcystis aeruginosa NIES-8 Pleurocapsa 7327 sp.PCC 8801 Cyanothece sp.PCC Cyanothece sp 74 Cyanothece sp.PCC Rivularia sp Cyanothece sp.ATCC Cyanothece sp Oscillatoria nigro-viridis 7 PCC Calothrix sp cyanosphaeraStanieria 7437 PCC Nostoc 070 'Nostoc azollae' Anabaena sp.90 Synechococcus 7002 sp. PCC Synechococcus 73109 sp. PCC Leptolyngbya 7376 sp. PCC Anabaena sp Anabaena cylindrica 7 PCC Dactylococcopsis 8305 salina PCC Arthrospira YZ platensis Arthrospira NIES-39 platensis Arthrospira sp. PCC Arthrospira C1 platensis Synechococcus sp Thermosynechococcus elongatus BP-1 Nodularia spumigena CCY Halothece sp Trichodesmium IMS erythraeum 742 Cyanothece sp.PCC Gloeocapsa sp Crinalium epipsammum PCC Calothrix sp. PCC Fischerella sp. NIES-3754 Nostoc 7107 sp.PCC Nostoc 7120 sp.PCC Calothrix sp Atelocyanobacterium ALOHA thalassa isolate Prochlorococcus marinusMIT str. Prochlorococcus marinus subsp Prochlorococcus sp.MIT Prochlorococcus marinusMIT str. Prochlorococcus marinusNATL1A str. Prochlorococcus marinus str Prochlorococcus marinusMIT 921 str. Prochlorococcus marinusMIT 9301 str. Prochlorococcus marinus str Prochlorococcus marinus subsp Prochlorococcus marinusMIT 9 str. Prochlorococcus sp.MIT 0801 Prochlorococcus marinusMIT 9313 str. Synechococcus sp Thermosynechococcus sp. NK Endosymbiont Pseudanabaena 7367 sp. PCC Synechococcus sp. JA-2-3B'a(2-13) Synechococcus sp. JA-3-3Ab Cyanobium 6307 gracilePCC Synechococcus 7336 sp. PCC Synechococcus sp. WH 8 Synechococcus sp. CC9311 Synechococcus sp. WH 810 Synechococcus sp. WH 81 Synechococcus sp. CC960 Synechococcus sp. CC99 Synechococcus sp. KORDI- Synechococcus sp. KOR Synechococcus sp. RCC307 Synechococcus sp. WH 78 Synechococcus sp. KOR Synechococcus elongatus 63 PCC Synechococcus elongatus 79 PCC Synechococcus sp. UTEX Chamaesiphon minutus PCC Synechococcus 7 sp. PCC Synechocystis 6714 sp. PCC herdmaniiGeminocystis PCC Cyanobacterium aponinum 1060 PCC substr. PCC-N Synechocystis 6803 sp. PCC substr. PCC-P Synechocystis 6803 sp. PCC substr. GT-ISynechocystis 6803 sp. PCC Synechocystis sp. PCC punctiforme 7 PCC 7524 sp.PCC piscinale CENA . PCC 7116 . PCC . 336/3 . PCC 7507 . PCC . wa102 . PCC 7418 . PCC . PCC 7428 . PCC . PCC 7113 . PCC . PCC 8802 . PCC 7822 . PCC . PCC 7407 . PCC 6303 . PCC 6312 . PCC . WH 8109 8005 8 92 h eeaare genera The 7942. 51142 21 6803 24 DI-100 DI-49 5 0604 02 020 502 2973 9414 5 3102 03 03 52 122 2 . NATL2A . AS9601 2549 6605 9333 6308 55a 43 112 04 . marinus str.CCMP1375 . pastoris str.CCMP198 126/8 101 01 42 9211 9312 203 17 515 2 5 5 83 h four The 6803. Synechococcus 6 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

227 Circadian clock factors involved in solely cyanobacterial input pathways

228 Multiple clock input factors are present in all four taxonomic groups namely PrkE, NhtA, and CikA. The

229 latter also acts in the output of the circadian clock (Table S2, Fig. 1A) [36, 37, 44]. The prevalence of

230 CikA is in good agreement with its tremendous effect on resetting the circadian clock in Synechococcus

231 7942 [37], where its interaction with the Kai complex is mediated by additional proteins [36, 38]. CikA

232 destabilizes in the presence of oxidized quinones and thereby integrates information about the cellular

233 redox state into the oscillator [36]. Naturally, CikA-lacking Prochlorococcus are not able to reset their

234 timing mechanism in continuous light [87], pointing at the importance of this factor. However, single

235 mutations in sasA can restore circadian properties in Synechococcus 7942 cikA mutants and it has been

236 suggested that a simpler network with modified interactions of the other clock proteins can exist [88].

237 The input factors Pex, LdpA, and CdpA are found to be unique for cyanobacteria (Fig. 1A), confirming

238 a previous analysis of ldpA [89]. Hence, with Pex, a transcriptional repressor of kaiA [34], and LdpA, a

239 redox-sensing protein [20] two factors, which sense the cellular metabolic state of the circadian clock in

240 Synechococcus 7942, are missing outside of the cyanobacterial phylum. LdpA is also the only input factor

241 present in the reduced but functional timing system of Prochlorococcus [54, 83]. This might indicate the

242 necessity of this factor for the entrainment of the clock [60]. Nevertheless the possibility of a functional

243 clock without Pex and LdpA remains, since LdpA and Pex mutants in Synechococcus 7942 are only

244 altered in their period length [90, 35] and Pex is also missing in the KaiA-lacking Prochlorococcus and the

245 KaiA-containing Synechocystis 6803 (Fig. 1A). In Synechococcus 7942 the third unique cyanobacterial

246 input factor, CdpA, influences phase resetting and acts in parallel to CikA [38]. Since CdpA seems to be

247 essential in Synechococcus 7942 [38], it likely has a prior role in processes other than phase resetting and

248 is hence dispensable for input pathways in other organisms than Cyanobacteria.

249 Altogether, the absence of three important input factors outside of the cyanobacterial phylum suggests

250 that other entrainment systems might be used for putative timing systems. In Rhodobacter sphaeroides,

251 which displays circadian gene expression rhythms, a histidine kinase is encoded in an operon with kaiBC

252 and was suggested as a candidate for transducing the redox signal to KaiBC [91]. Further, the direct

253 entrainment by the ATP/ADP ratio [30, 92] might be the primary mechanism to synchronize the circadian

254 clock with metabolism and the environment.

255 Central output factors are missing in Archaea and non-cyanobacterial genera

256 The output pathway of the circadian clock in Synechococcus 7942 involves eight proteins (see Conclussion).

257 RpaA serves as a key regulator [43]. Its activity is indirectly modulated depending on the phosphorylation

258 state and the ATPase activity of KaiC [44, 93]. SasA (antagonistically to CikA) connects the core clock

259 to RpaA, which in turn regulates global gene expression, including the kaiBC promoter [41, 43, 44, 94].

260 LabA, CrmA and RpaB are also known to affect RpaA [42, 46, 47, 50, 94]. CpmA modulates kaiA

261 expression by an unknown mechanism [49]. In contrast to the unique cyanobacterial input factors, we

262 found none of the eight output proteins exclusively in Cyanobacteria. Homologs of five factors (SasA,

263 CikA, LalA, Crm, CpmA) are present in all four investigated taxonomic groups (Table S2). CpmA is

264 a member of a superfamily essential for purine biosynthesis and thus likely to have orthologs in other

265 organisms [49]. However, RpaA and RpaB are not present in Archaea and SasA is only found in the

266 methanogenic genera Methanospirillum, and Methanosalsum. Hence, the entire central output pathway is

267 missing in Archaea (Fig. 1A). In addition, orthologs for RpaA are found in only nine non-cyanobacterial

268 genera, questioning whether another transcription factor might read out the putative core timer in other

269 Bacteria (Fig. 1A). A previous BLAST search by Dvornyk and colleagues revealed that SasA homologs in

270 non-cyanobacterial prokaryotes lack the KaiB-like domain [61]. This finding is confirmed in our analysis,

271 indicating that stimulations of SasA homologs by KaiC outside of Cyanobacteria are very unlikely, because

272 interaction occurs via this KaiB-like domain, which adopts a thioredoxin-like fold [39, 95]. Altogether,

273 our analysis reveals that possible circadian clocks of Bacteria and Archaea must use an output pathway

274 that is different from the one described in Synechococcus 7942.

275 Co-occurrence analysis hints at the core module for circadian timing

276 The previous analysis revealed substantial differences in the composition of the clock components between

277 Cyanobacteria, other Bacteria, and Archaea. Even within Cyanobacteria there is a huge variety in the

278 composition of the potential circadian clocks (Fig. 1C). Cyanobacteria have either a severely reduced

279 timing systems, such as the one in Prochlorococcus, a standard system as seen in Synechococcus 7942, or

280 an inflated system as found in Synechocystis 6803 (Fig. 1C). This trichotomy of systems raises questions

281 about essentiality and pairwise co-occurrence of circadian clock proteins. These questions were answered

282 in a series of right-sided Fisher’s exact tests. To avoid systematic biases due to an overrepresentation

8 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

283 of closely related strains [96], we extracted 69 unique combinations of the 21 circadian clock factors as

284 described in materials and methods.

285 Within these 69 unique systems two factors are always present: (i) KaiC, because we selected for organisms

286 containing at least one KaiC-like protein and (ii) RpaB, which is associated with cell size and circadian

287 gene expression [42, 51]. RpaB competes for promoter binding sites with RpaA and its phosphorylated

288 state is thought to inhibit the phosphorylation of RpaA [47]. Other factors present in the majority

289 (≥ 90 %) of the observed unique clock systems are KaiA, KaiB, LdpA, IrcA, CikA, SasA, RpaA and

290 CpmA. Because of their abundance, most of these factors show no pairwise co-occurence. For example

291 RpaA and RpaB are found in 68 and all 69 clock systems, respectively. Thus their joint presence

292 comes to no surprise. Instead, the finding confirms essential roles in global transcription regulation of

293 Cyanobacteria [97]. With Fisher’s exact test we seek to identify gene pairs rather unexpectedly co-

294 occuring in a smaller subset of organisms. Such findings can indicate a common function in the circadian

295 clock system. Only KaiA and CikA, out of the most abundant factors, show significant co-occurence with

296 other factors. In addition, Pex, PrkE, CdpA, LabA, LalA, as well as KaiB2 and KaiC2, and KaiB3 and

297 KaiC3, co-occur significantly (Fig. 2A). Interestingly, all of these factors are missing in Prochlorococcus.

298 Additionally, PrkE, CikA, LabA, and LalA are also missing in most marine Synechococcus species.

299 We detected three significant co-occurences between factors of the input pathway (Fig. 2): (i) between

300 CikA and its interaction partner PrkE [38] (ii) between PrkE and CdpA and (iii) between the kaiA

301 repressor Pex and CdpA. The first two results are in good agreement with a previous study [38]. Within

302 the output pathway, there is a significant co-occurrence between LabA and its ortholog LalA. Interestingly,

303 we identified several significant co-occurrences between factors of the input and the output pathway.

304 CikA, which functions in the input and output of the clock, co-occurs significantly with LabA, and LalA

305 (Fig. 2). This fits well in the overall picture as CikA and LabA are thought to regulate the activity of

306 RpaA [44, 50]. Additionally, PrkE shows also significant occurrences with both LabA and LalA (Fig. 2).

307 Furthermore, CdpA was found to co-occur significantly with LabA. (Fig. 2) This is of special interest since

308 PrkE and CdpA are only known as interaction partners of CikA, and both are involved in phase resetting,

309 and cell division, respectively [38]. This result, however, hints at potential increased involvement of PrkE

310 and CdpA in the RpaA regulation and supports the view of an integrated network with overlapping

311 interactions of input and output factors [98]. Notably, no co-occurrence of NhtA and LdpA was detected,

312 although it was suggested that NhtA might be involved in assembly of the iron-sulfur cofactor of LdpA

313 [38]. Significant co-occurrences with core factors could only be observed between KaiA and LalA (Fig. 2).

314 However, KaiA shows a strong, but not significant, co-occurrence with CikA (p = 0.0105). Lastly, we

315 also found significant co-occurrences between KaiB2 and KaiC2 as well as KaiB3 and KaiC3 (Fig. 2).

316 This indicates two distinct function of the two pairs. In this context it is worth mentioning that kaiB2 -

317 containing archaeal genomes always encode a KaiC2 homolog.

318 In summary, we identified a conserved set of factors (Fig. 2), both in input and output that show

319 significant co-occurrences. This set, composed of KaiA, PrkE, CdpA, CikA, LabA, and LalA, is found in

320 Cyanobacteria with a true circadian clock such as Synechococcus 7942 and Synechocystis 6803 [73] but is

321 missing in cyanobacterial strains with reduced timing mechanisms such as Prochlorococcus. This finding

322 hints at the importance of these factors for the functionality of a circadian clock. On the other hand,

323 NhtA and Crm seem to play only a minor or extending role in clock regulation, because they are neither

324 always present nor show significant co-occurrence with other factors.

9 bioRxiv preprint was notcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade 346 345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327 326 325 351 350 349 348 347 ytmtcaayi fcrainepeso nCyanobacteria in node. expression that are circadian of links of degree analysis Missing the systematic to significance. A proportional of is level size Node the to 0.01. than corresponds the p-value of color higher results line a the had The to ( that regard test. those in algorithm. calculated exact clustering cyanobacteria, agglomerative Fisher’s p-values in hierarchical Only factors pair-wise a clock heatmap. by circadian a sorted co-occurring in significant are visualized Proteins are tests, significant. exact as Fisher’s pair-wise by lated 2: Figure uiiiybtentorsetv esrmnsi omo ttsial infiateeae correlation elevated significant p statistically 0.31, in are of = patterns repro- estimates (ρccc form expression good amplitude ex- in circadian indicates and oscillating amplitudes the measurements phase and with of respective cases phases genes two these oscillation to the in between of limited only ducibility comparison were since Direct datasets analyses descriptors. compared following meaningful both The established in phases. was profiles strain-specific pression expression programs of peak reproducibility expression estimated the circadian coefficient establish to two correlation to between circular datasets the Similarity replicate using biological programs. compare expression gene. we sinusoidal circadian each estimates step a yielding of first background, assumes amplitude a as and method profiles resulting In phase expression This the linear peak subjected detection. uses the and and oscillation of profiles raw-data regression expression the harmonic circadian and reprocessed of a instead we shape regulated regulated to comparison, diurnally circadian-clock series con- as direct were time of expression a which expression experiments combination oscillating allowing with microarray For Supplement), a genes chosen to circadian. S3, to the of refer Table leads of therefore which see We all conditions, details genes. Not light-induced light (for cyanobacterial analysis. constant different conditions this under of light for ducted datasets applicable diurnal pattern. time-series and or regulatory in microarray available transcriptional light changes published were common nine constant major a of under requires expect total which strains might a lifestyle, one compared night, photoautotrophic we and a Thus, that day Considering of between 101]. challenge metabolism 100, the the [99, patterns share regulation repeatedly Cyanobacteria circadian have with all strains genes cyanobacterial of of fractions range substantial a indicated in measurements expression time-resolved Genome-wide ekepeso hsswt ag al a n al ih lses u h ag eaiecorrelation negative large the but clusters, night early leh13 and the day to early compared large dataset with beck14 phases the expression in day peak both the Interestingly, during phases measurements. expression of concentration distinct the While A doi: https://doi.org/10.1101/075291 10 oocrec fcraincokpoen ncyanobacteria. in proteins clock circadian of Co-occurrence Synechocystis

-2 KaiB1 CpmA

 SasA RpaA .1,and 0.01), IrcA LdpA Pex p-values 83dtst hwsgicn iiaiy h orlto sdmnse ythe by diminished is correlation the similarity, significant show datasets 6803 CdpA PrkE available undera 10 KaiA Cyanothece

-1 LabA CikA

LalA ; this versionpostedSeptember15,2016. Cyanothece NhtA KaiC1

Synechococcus RpaB c sdsrbdb amlmdk n am 12 applied [102] Sarma and Jammalamadaka by described as ρccc KaiB2 p TC512( 51142 ATCC sp. KaiC2 CC-BY 4.0Internationallicense Crm KaiB3 10 p TC512dtst xii odareetof agreement good a exhibit datasets 51142 ATCC sp. KaiC3 0 KaiC3 KaiB3 Crm KaiC2 KaiB2 RpaB KaiC1 NhtA LalA CikA LabA KaiA PrkE CdpA Pex LdpA IrcA RpaA SasA CpmA KaiB1 10 c .1 p 0.61, = (ρccc 7942 B c 05,p -0.51, = ρccc 10 -17 KaiB2 . Crm KaiC3 KaiC2 The copyrightholderforthispreprint(which KaiB3 NhtA CikA RpaB LabA   p-values 0.01), .1 Fgr o row). top 4 (Figure 0.01) h -aus calcu- p-values, The (A) PrkE 10 KaiC1 KaiA ≤ -10 .1aeconsidered are 0.01 Synechocystis KaiB1 CdpA ewr of Network B) CpmA LalA Pex SasA LdpA RpaA 10 IrcA 6803 -3 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

352 emphasizes the presence of a significant number of anti-phasic gene pairs. The corresponding oscillation

353 amplitude values exhibit high statistically significant correlations (ρccc > 0.77) for all three datasets

354 (Fig. 3 bottom row).

355 This observation motivated the second step of the analysis, the comparison of expression patterns between

356 different cyanobacterial strains. To facilitate this comparison, the prediction of homologous genes in the

357 cyanobacterial clade by Beck and colleagues [103] was used as starting point. We focused first on the set of

358 genes with oscillating expression patterns in all datasets. The estimate of the core diurnal genome based

359 on the presented data collection spans 95 genes (Table S5), which are mostly involved in central metabolic

360 processes. Most strikingly, 18 out of 64 genes annotated by Cyanobase [68] as “ribosomal protein” genes

361 in Synechocystis 6803 fall into the core diurnal set, furthermore seven out of 27 genes annotated with

362 ”photosystem II”. The remaining diurnally expressed genes are found interspersed across the metabolic

363 network. While genes coding for parts of photosystem, the RNA polymerase, or the ribosomal proteins

364 can be expected to serve important roles in the adaptation to photic and aphotic phases, this analysis

365 ascribes similar importance to other metabolic processes e.g. in the repair of UV-damaged Photosystem

366 II centers (slr1390 ) [104], in the phosphate transport system (pstB2 ), pyrimidine and arginine biosyn-

367 thesis (sll1498 ), or the glycolysis/gluconeogenesis via the fructose-bisphosphate aldolase (sll0018 ). By

368 excluding genes, which possess light-inducible expression, marker gene candidates for a working clock can

369 be derived from the core diurnal genome, such as the light-independent protochlorophyllide reductase

370 subunit ChlB and the fructose-bisphosphate aldolase fbp. Indeed, fbp also shows circadian expression

371 patterns in Clamydomonas reinhardtii [105] and Arabidopsis thaliana where its late-night peaks may re-

372 flect the great importance of these aldolases in higher plants for the mobilization of plastidic starch [106].

373 Particularly in higher plants, the mobilization of starch, the conversion into sucrose, and its transport to

374 other parts of the plant occur mainly at night.

375 The group of 15 ”hypothetical protein” genes in the core diurnal genome constitutes an excellent can-

376 didate set for novel clock-driven genes in strains with a working core clock. Interestingly, several of

377 these genes are implicated with cell division, such as the YlmG-related hypothetical gene ssl0353, which

378 is required for proper distribution of nucleoids in cyanobacteria and chloroplasts [107]. Similarly, the

379 hypothetical protein slr1577 is suggested to function in the separation of chromosomes during cell divi-

380 sion (Uniprot entry P74610). For the gene slr1847 (Uniprot entry P73057) a DNA binding capability is

381 suggested, which could therefore regulate expression, aid nucleoid organization, or protect the DNA.

382 The core clock genes kaiA, kaiB, and kaiC are a good starting point for a detailed comparative expression

383 analysis. Only the kaiB1 is significantly oscillating in all considered datasets. Interestingly, the kaiA gene

384 features only very low amplitude expression oscillations and is arrhythmic in the vijayan09 Synechococcus

385 7942 dataset. The expression phases vary from dawn (Microcystis aeruginosa PCC 7806) to morning

386 (Synechocystis 6803), over midday (stoeckel08 dataset of Cyanothece sp. ATCC 51142), and dusk (ito09

387 dataset of Synechococcus 7942), into night (Anabaena sp. PCC 7120). The observed expression phases

388 of kaiB1 are comparable to those of kaiA, but with significantly larger amplitude in Synechococcus 7942

389 datasets. The phase of the kaiB1 homolog in Prochlorococcus marinus MED4 peaks before dawn, com-

390 parable to Anabaena sp. PCC 7120. The kaiC1 expression phases and amplitudes match those of kaiB1,

391 with the notable exception of Cyanothece sp. ATCC 51142 for which antiphasic late-night peaks are

392 observed. In Prochlorococcus marinus MED4, kaiC1 peaks during the early night in contrast to the late

393 night phase of kaiB1.

394 Many aspects agree well with previous knowledge. In Synechococcus 7942, the core clock genes kaiB

395 and kaiC are arranged in the kaiBC operon resulting in similar expression patterns [108, 109, 110],

396 while Cyanothece sp. ATCC 51142 features the kaiAB1C1 operon [111]. Interestingly, Cyanothece sp.

397 ATCC 51142 features consistent anti-phasic expression of kaiB1 and kaiC1 whereas the remaining strains

398 show co-expression, hinting at Cyanothece-specific post-transcriptional regulation of kaiB1 or kaiC1. Os-

399 cillations in the kaiA gene expression, as reported by Ishiura and colleagues, feature small expression

400 amplitudes compared to kaiB and kaiC [112]. In fact, kaiA consistently falls below the threshold of

401 2-fold expression change for the classification as circadian oscillator, which is commonly employed in

402 microarray studies.

403 In the second step we generalized the detailed analysis of expression phases as presented for the core

404 diurnal genome. We applied the circular correlation measure to all possible combinations of expres-

405 sion datasets. The resulting distribution reveals a clear separation between pairs of biological replicate

406 datasets, featuring large numbers of shared oscillating genes and more extreme correlations (Fig. 4 red),

407 and pairs of different cyanobacterial strains with fewer shared oscillating genes and much less extreme

408 correlation coefficients (Fig. 4 blue). The only exception to this separation is the Cyanothece sp. ATCC

409 51142 dataset stoeckel08 (Fig. 4 green), which shares many oscillating genes with both Synechocystis

410 6803 datasets (leh13, beck14). The corresponding correlation coefficients are, however, similarly small

411 compared to other inter-strain pairs. The full set of pairwise phase comparisons, which underlay this

11 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

412 analysis are shown in Figure S1. This result indicates that the diurnal peak expression phase is not

413 preserved amongst homologous genes in the cyanobacterial clade but might instead be tuned according

414 to the metabolic gene outfit and the environmental needs of the respective strain.

xx xxxxxxxxxx x x x x x xxx xxxxxxx x x xx x xxx x x xxx xx x x xxxxx xxxxxxxxxxxxxx x xxx x x xxxxxxx x xx x xx x xx x xx x x xxxxxxxxxx xxxxx xxxxxxxxxx xxxxxxxxxxx xxxx x x xx xxxxxxx xx x x xx x xxxxxxxx x x x x x xxx x x xxxxxxxxx A xxxxxxxxxxx xx xx x xx x x x xx xxxxxxxxxxxxxxx B x x x x C x x xxx x x xx xxxxx xxxxxx xxxx xxxxxxxx xxxxx x xxxxxxxx x xxxxxxx x xx x xx x x x xx xxxx xxxxxx xxxxxxxxxxxxx xx x x x x xx xxxxxxxxxxxxxxxxxxxxxxx xx x x xxx x xxxxx xx xx x x xx xx xxxxxxxxxxxxx xx x x xxxxxxxxxxxxxxxxxxxxx x xxx xx x x x x x x x xx x xxx xxx xx xx x x xxxxxxxxxxxxxxxxxxxxxxxxx x x xxxxx x x x x x x x xxxxxxxx x xx xx xxxxx x x x x x xxxxxxxxxxxxxxxxx x xxx xxx x x xx x x x x xxxx xxx xx x xxxxx x x xxx xxxxxxxxxxxxxxxxxxxx x xxxx xx x x xx x x x xx x xxxxxxx xxxxxxxxxx x x xx x x x xxx xx xx xxx xxx xx xx x x xxx x x x xxx x x x xxxxxxx xxxx xx xx x x xx xx xx x x xx xx x xx xx xx xx xx x xxxx xx xxx x x xx x x xxx xxxxx xxxxxxxxx xxxx xx x x xx xx xx x x xxx xxxxxxx xx x xxx xxxxx xx xx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxx x x xxx xx x x xx xxxxxx x xxxxx x xxxxxxxxxxxxxxxxx x x xxxx xx xxxxxxxxxxxxxxx xxxxxxxx xxx x x x x x x x xxx xxxx xx x x xx xxxx x xxxxxxxxxx x x x x x x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xx x x x xx xxx xx x xxx xxxxxx xx x x x xx xx x x xx x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xx x xx x x xxxxxxxxx xx x x xxxxxx x x xx xx x x xxxxx xxxxxxxxxxxxxxx xx x xx x xxxxxxxx x x xxxx x xx x xx xxx x xxxx x xx xxxxxxxxxxxx xxxxxxxx xxxx xx x xx xxxx xx x x x xxxxxxx xx xxx xx x x x x xxxxxxxxxxxxxx x x x xx xxxx xx x x x xx xxxxxxx xx x x xx x x x x xxx x xx xxxxxxxxxxxxxxxxxx xx x x x x x x xxxxxx xx x x x x xxxx xx x xxx x x xx x x xxxxxxxxxxxxx xx x xx x x xxxxx x x x x x xxxxx xxxx x x x x x x xxxxxxxxxxxxx xxxxxxxx x x xx x xxxxxxxxxxxxx xx x xxx xxx xxx x x x x x xxx xx x xx x xxxxxxxxxxxxxxxxxx x x x x x x xxxxxxx x x x x x x xxxxx x x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xx xxxx xxxxx xxx x x xxxx x x x x x x xx x x x xxx xxxxxxxxxxxxx xxxxxxxxxxxxx xx x x x x xxx xxxxxxx xx x xxxx xx x x x x xxxxxxxxxxxxxxxxxxx xxxx x x xx xxx x xxxxxxx x x xx xx xx xx x xx x xxxxxxxxxxxxxxxxxxx x xxxx xx x x xxx x xxxxxxxx xx x x xxxxxxxx x x x x x x xx x x x xxxxxxxxxxxxxxxxxxxx xx xx xx x x xxxxxxxxxxxxxxxxxx xxxxx x xxxxxx x x x x x x x xx xxxx xxxxxxxxxxxxxx xx x x xx x xx x x x xxx xxxxxxxxxxx xxxxxxxxxxxxxxxxx x xxxx x x x xx xx xx x xxxx xxxxxxxxxxx x xxxx xxxxx xxx x x x x xx x xx xxxxxxxxxxxxxxxxxxxxx x x x x xxxx x xx x xxx x x x xxxxx xxxxxxxxxxxx x x x x x xx xx xx xxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxx xx x x xx x x x xxx x xxxxxx x xx x x x x x xxxxxx xxxxxxxxxxxxxxxxxx xxx xx x xxxxxxxxxx xx xx xx x x xx xxx xx xxxxx x xx x x x x xx xxxx x xxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxx xx xxxxx xxx xxx x x xx x xxx xx xx xxx xx xxxxx x x x xx xxxx x x xxxxxxxxx xxxx x x x x x xxxxxxxxxxxxxx xxxxxxxxxxxx xx x x x x xx xx xxx xxxxxx xxxxxx x x xx x x x x xxxxx x x xxx x x x x xxxxxxxxxxxx xxxxxxxxxxxxxxx xx x xxx x x x x xxx xx x x x x xxx xxxxx φ [CT] ito09 x xxx x xx x x xxx xx xxx xxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xx xx x xxx xxxxxxx xxx x x x xxx xx xxx x xxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xxxxxxxxxxx xxxx xxx x x x xx x x x x x x φ [CT] beck14 x xx xxxxxxxxxxxxxxx x x x x xxx x x xx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x x xxxxxxxxxx xxxxxx x x x x x x xxxxxxxxxxxxxxxxxxxxx x φ [CT] toepel08 xxxxx x x x x x x x x x xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxx xxx x x x x xx xxxxxxxxxxxxxxxxxxxxxxx x xxx x x xxx xx x x xxx xx xxxx x xxx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxx x x x xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxx xx x x x x xx xx xxx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xxx xxxxxxxxxxxxxxxxxxxxxxxxxx xx x x x x x xx xxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxxxxxxxx x x xx xxx xx xxx x x x x x x xxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxx x xx xx xxx x xx x x x xx xxx xxxx x x x x xxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxx x xxxx x x x xxxx x x x x xxxx x x x x xxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx xx x xxxxx x xxxxx xx xx xxxxx x x xxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxx xx xxx x xxx xxxx xxxxxxxxxxx xxx x x xxx xx x xxxx x xxxxxxx xxxxxxxxxx xxxxx xxx x x x xxxxx x xxxxx xxxxxxx xxxx xx xxx x x x xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx xx x xxxx xxx xxxxx xxx xxxxxxxxx xxxxx x xxx xxx x x xxxxxxxxxxxxxxxxx x x xxxx x xxx x xxxxxxxxxxxx x xxxxxxx xx x x x xx x x x xxxxxxxxxxxx x xxxxxxxx xx xx xxxxxx xxxxxxxxxxx xxxx x x x x xxx xxxxx xxx xx xxx xxxx xx x x x xxxxxx x xx xx xxxxxxxxxxxxxxxxx x x xx xxxx x x x xxx xx x x x xx x xxx xxxx xxx xxxxx 0x 5 10 15 20 x x x xx x x x 0 5 10 15 20 x x 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 φ [CT] vijayan09 n=1490 φ [CT] leh13 n=2566 φ [CT] stoeckel08 n=2345 ρccc =0.61 (p <<0.01 ) ρccc =0.31 (p <<0.01 ) ρccc =−0.51 (p <<0.01 )

x x x x x x x x x x x xx x xx x x x x x x x x xxx x x x x x x x xx x x x xx xxxx xx x x x x x xx x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x x xx xx xx x x x x x x x x x x xxxxx x x x xx x x x x x x xx x x x x x x x x x x x x x x x x x x x xx x x x x x x x x x x x xx x xx x x xxx x x x x x x x x x x x xx x xx xxx x x xx x x x x x x x x x x xxxxxx x x x x x x xxxxxxx x x x x x xxxxx xxx x x xxxxx xx x x x x x x x x x x x xx x x x xx xxx xx x xx xxx xx x x x x xx xx xx xx x xx xx x xxxxx x xxxxxxx x x x x x x x A [a.u.] ito09 x x x xx x x xxxx xx xxxxx x x x xx x xx xxxx x xxx xx x xxxxxxxx x xxx xxxx x x x xx x x x xx xx x xx x x A [a.u.] beck14 xx xxxx xxxxxx xxx x xx xx x x x x xxxx xxxxxxxx xxxxxxx x x xx xxxxxxxx xxxxxxxx xx xxxxxx x x x x xxxx xxxxxxxxxxxxx x x A [a.u.] toepel08 xx xxx x x xx x xx x x x x xxxxx xx x xx xx x xxxxxxxxxxxxxxxxxx xxxxx x x xx xx xxxxxxxxxxxxxxx xxxxx x xx x xxx x xx xxx x x xxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxx xx xxxx xxxxxxxxxxxxxxxxxxxx xxx xx xxx x x xx xx xxxxxxxxxxxxxxxxxxxxxxxxxxx xxx x xxxxxx xxxxxxxxxxxxxxxxxxxxx xxx x x x x x xx xxx xxxxx xx x xxxxxxxxxxxxxxxxxxxxx xxxx xxxxxxx xxx xxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xxxxxxxxx x xx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx xxx xxxxxxxxxxxxxxxxxx xxxx xxxx xxxxxxxxxxxxxxxxxxxxxxxxx xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxx x x x xxxxxxxxxxxxxxxxxx xx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxx xxx xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx xx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxx xxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxx xxxxxxxxxxxx x xx xxxxxxxxx xxxx xx xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxx xx x xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxx xxx xxxxxxxx x x xxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxx xx x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx xxxx x xxxxxxx xx xxxxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x xxxxxxxxxxxxxxxxxxxxxx x xx x xxxxxxxxxxxxxx xxxxxxx xxxxx xxxx xx 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.0 0.5 1.0 1.5 2.0 2.5 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 0.0 0.5 1.0 1.5 2.0 0 1 2 3 4 0 1 2 3 4 A [a.u.] vijayan09 n=1490 A [a.u.] leh13 n=2566 A [a.u.] stoeckel08 n=2345 ρ =0.78 (p <<0.01 ) ρ =0.93 (p <<0.01 ) ρ =0.77 (p <<0.01 )

Figure 3: Phase and amplitude reproducibility of diurnal genes within cyanobacterial strains. Comparison of expression phase ϕ (top, [CT]) and amplitude A (bottom, [a.u.]) of diurnal genes shared between independent datasets of the same cyanobacterial strain. (A) Synechococcus datasets of Vijayan and colleagues [113] (x-axis) and Ito and coworkers [101] (y-axis), (B) Synechocystis in the datasets of Lehmann and coworkers [76] (x-axis) and Beck and colleagues [103] (y-axis), and (C) Cyanothece in St¨ockel and colleagues [114] (x-axis) and Toepel and colleagues [115] (y-axis). The number of genes n found to oscillate significantly and the corresponding Pearson correlation coefficient ρ between ϕ and A is provided below each panel, followed by the respective p-value for ρ differing from 0. For ϕ, the circular correlation coefficient ρccc is also provided. Axis labels are shown in the strain-specific color.

12 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

2500 Synechocystis Cyanothece sp. 6803 ATCC 51142 2000

Synechococcus s c

o el. 7942 1500 N 1000 500

−0.4 −0.2 0.0 0.2 0.4 0.6

ρccc

Figure 4: Expression phase similarity versus number of shared oscillating genes. The compar- ison is carried out between all dataset combinations. Biological replicate experiments available for three cyanobacterial strains show a large number of shared oscillating genes as well as large correlation coeffi- cients (red). The comparison of the stoeckel08 dataset (Cyanothece) with both Synechocystis datasets is marked in green. The remaining comparisons between various strains are shown in blue.

415 Core circadian clock proteins, KaiA, KaiB, KaiC, vary in number and length

416 The previously observed diversity of circadian clocks within Cyanobacteria, and between other Bacteria

417 and Archaea prompted further sequence analyses of the core clock proteins KaiA, KaiB and KaiC. Length

418 comparisons gave rise to some new features of variations between the core factors of the circadian clock

419 (Fig. 5). As described in the preceding, our BLAST analysis detected KaiA exclusively in Cyanobacteria.

420 Interestingly, we could distinguish three subtypes of KaiA. While the sequence length of most KaiA

421 is around 300 amino acids (AA) (Synechococcus 7942: 284 AA) some stains have shortend homologs

422 with a length of roughly 200 and 100 AA, respectively (Fig. 5A, C). Truncated KaiA proteins are almost

423 exclusively found in members of the order Nostocales. Similar results were reported previously by Dvornyk

424 and colleagues, who also observed a higher degree of polymorphism for the kaiA gene in comparison to

425 kaiB and kaiC [52, 53]. Multiple alignments of the KaiA proteins (Fig. S2) verified that the truncated

426 KaiA proteins have a shortened N-terminal sequence, which functions in the complete protein as the

427 amplitude amplifier [116]. However, all of these KaiA orthologs contain the C-terminal part important

428 for clock oscillation [116].

429 Distribution of the KaiB protein length reveals two distinct groups. The KaiB homologs are either as

430 long as the one from Synechococcus 7942 (102 AA) or about 250 AA in length (Fig. 5B, C). Microcoleus

431 sp. PCC 7113 even has a KaiB with the length of 381 AA. KaiB homologs with query length are

432 present in all four groups (Fig. 5B). Elongated KaiB proteins are mainly present in Cyanobacteria in the

433 subclass Oscillatoriophycideae and the order Nostocales, specifying findings of Dvornyk [53] (Fig. 5C).

434 BLAST analyses using the KaiB homologs from Synechocystis 6803 revealed that elongated variants are

435 most similar to KaiB1. Elongation via concatenation of two KaiB was ruled out by visually inspecting

436 alignments with two artificially concatenated KaiB1. Instead the elongated KaiB1 have a ∼150 AA N-

437 terminal extension. BLAST searches of the N-terminal region showed no homologous sequences in other

438 organisms than Cyanobacteria, and no putative conserved domains could be identified. However, the

439 N-terminal part is highly conserved within the Cyanobacteria having this KaiB variant. Interestingly,

440 those Nostocales with an elongated KaiB1, also show a truncated KaiA.

441 The KaiC protein from Synechococcus 7942 is 519 AA in length and is build up by two domains, the CI and

442 CII domain, which have a high similarity and are connected by a linker-domain [117, 118]. The C-terminal

443 CII domain of KaiC comprises the interaction sites with KaiA as well as the specific phosphorylation

444 sites [29, 119, 120, 121]. KaiC homologs were detected with lengths varying between 101 AA and 741

445 AA (Fig. 5A, B). There are KaiC homologs in Archaea representing the whole observed length spectrum

446 of KaiC, whereas bacterial KaiCs are almost always about 500 AA in length (Fig. 5A, B). Furthermore,

447 in Bacteria and Archaea KaiB (and KaiA in Cyanobacteria) is only found when a ‘full-length’ KaiC is

448 present. In these bacterial organisms the length of KaiC is almost constant, regardless of the length of

449 KaiA and KaiB homologs (Fig. 5A, B). However, in KaiB-possessing Archaea additional shorter KaiC

450 homologs are found (Fig. 5B).

13 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

451 Moreover, the length distribution of KaiC revealed a substantial amount of KaiC homologs with a length

452 of circa 250 AA, which is approximately the length of one KaiC domain. This KaiC variant is mainly

453 found in Archaea, but also in a few Bacteria. In these bacterial species no KaiB homolog could be

454 identified. Shorter KaiC homologs do not contain the important phosphorylation sites for maintaining

455 the oscillator function. Therefore, they might not restore the full functionality of the Synechococcus 7942

456 KaiC, but can rather answer questions about the evolution of KaiC [53]. Regarding the evolution of

457 KaiC, two valid hypotheses exist, both of which state that KaiC arose from a shorter ancestral recA

458 gene followed by a gene duplication and fusion. However, on the one hand Leipe and colleagues [122]

459 hypothesize that an ancestral single-domain KaiC originated in Bacteria, was transferred into Archaea,

460 where its two-domain version originated, and a second lateral transfer event introduced the double domain

461 KaiC into cyanobacteria. On the other hand Dvornyk and coworkers [53] argue in a follow up study that

462 KaiC has to be of cyanobacterial origin. Given the amount of new genomic data further studies would

463 help to unravel the evolutionary history of KaiC.

A B

700 700

600 600

500 500

400 400

300 300

200 200 KaiC Length [AA] KaiC Length [AA]

100 100

0 0 0 100 200 300 400 0 100 200 300 400 KaiA Length [AA] KaiB Length [AA] C

400 Cyanobacteria Proteobacteria 300 Archaea Other

200

100 KaiB Length [AA]

0 0 100 200 300 400 KaiA Length [AA]

Figure 5: Protein length distribution of the circadian clock factors from Synechococcus 7942 and its homologs. The four taxonomic main groups are highlighted: Cyanobacteria (Green), Pro- teobacteria (Blue), Archaea (Red), Other (Grey). The curves outside of the plot represent the cumulative density distribution of the respective protein. (A) KaiC length distribution in dependency of the KaiA length. (B) KaiC length distribution in dependency of the KaiB length. (C) KaiB length distribution in dependency of the KaiA length.

464 Conserved motifs and activities in the cyanobacterial KaiC subgroups

465 For KaiC2 homologs outside of the cyanobacterial phylum an involvement in stress response (Legionella

466 pneumophila [123]) and adaptive growth under rhythmic conditions (Rhodopseudomonas palustris [57])

467 has been demonstrated. Both proteins display autophosphorylation and KaiC2 from Rhodopseudomonas

468 palustris shows elevated ATPase activity [57]. Nevertheless, the function of cyanobacterial KaiC2 and

14 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

469 KaiC3 homologs remains unclear. We already demonstrated that KaiC2 and KaiC3 from Synechocystis

470 6803 displays kinase activity, which is independent of KaiA, whereas KaiC1 behaved like its Synechococcus

471 7942 ortholog [55]. Those activities could also be predicted from the C-terminal amino acid sequences

472 [55]. To test whether general features of the three KaiC subgroups can be predicted, multiple alignments

473 of the cyanobacterial KaiC1, KaiC2 and KaiC3 sequences were constructed. A WebLogo analysis revealed

474 that relevant motifs for phosphorylation and dephosphorylation in the CII domain are highly conserved.

475 The ATP-binding Walker Motif A (P-loop in Fig. 6A, GXXXXGKT, [112, 124, 125]) is present in all three

476 KaiC subgroups. Strikingly, the respective sequence of KaiC-7942 (GATGTGKT) shows almost no mod-

477 ifications in KaiC1 and KaiC3 proteins. Furthermore, catalytic glutamates (EE in Fig. 6A, [126, 127]),

478 the R-finger contacting the γ-phosphate of ATP [128], and the truncated Walker motif B (WalkerB in

479 Fig. 6A, [26, 112, 125]), were found in all cyanobacterial KaiC subgroups. Notably, the arginine residue

480 of the Synechococcus 7942 Walker B motif is not conserved in KaiC2 homologs. Serine and subsequent

481 threonine are the dominant phosphorylation sites in KaiC1 and KaiC3 proteins, like S431 and T432 in

482 KaiC-7942 [119, 120], whereas KaiC2 homologs display two serine residues. In some KaiC3 homologs

483 a tyrosin is present as second phosphorylation site. T426, which is important for dephosphorylation of

484 KaiC-7942 [119, 126, 129, 130], is also highly conserved. Therefore, phosphorylation and dephosphoryla-

485 tion via autokinase [26], ATP synthase and ATPase activity [27, 126], respectively, are very likely for all

486 cyanobacterial KaiC homologs. The same holds true for the N-terminal ATPase activity: We observed

487 high conservation of the Walker motif A (P-loop in Fig. 6A), the catalytic glutamate residues (EE in

488 Fig. 6A) and the R-finger in the CI domains of all cyanobacterial KaiC subgroups. The presence of the

489 R-linker in CI domains of KaiC1 and KaiC3 homologs indicate a structural coupling of the N-terminal

490 CI and the C-terminal CII-domain as it was demonstrated for Thermosynechococcus KaiC [128], whereas

491 KaiC2 homologs lack the R-linker in CI.

492 KaiC homologs from the genus Prochlorococcus were classified as KaiC1 orthologs in our BLAST anal-

493 ysis (Fig. 1B). However, Prochlorococcus strains do not contain KaiA, and KaiC from Prochlorococcus

494 marinus MED4 was demonstrated to phosphorylate independently of KaiA due to a modified A-loop

495 sequence [83]. Therefore, we compared WebLogos of the A-loop sequence [28] for KaiC1 orthologs from

496 Cyanobacteria with and without KaiA (Fig. 6A). The most obvious difference to KaiC1 proteins from

497 cyanobacterial strains with KaiA is the presence of neutral glutamine in the second position, instead of

498 a positively charged arginine. In KaiC2 and KaiC3 Weblogos motifs are even less conserved as already

499 described for the Synechocystis 6803 representatives [55]. Interestingly, the KaiC3 WebLogo motif does

500 not display any charged residue anymore. The absence of A-loop residues that are important to keep it

501 in the buried state [28] indicate that the KaiA-independent phosphorylation is characteristic for KaiC2

502 and KaiC3 homologs. This is supported by the low conservation of the 438-444-loop and/or the 422-

503 loop, which are part of the interaction network that mediates inhibition of phosphorylation by the buried

504 A-loops in Synechococcus 7942 [28, 118]. The high conservation of these loops in KaiA-lacking strains

505 remains enigmatic.

506 To test this hypothesis, phosphate uptake as an exemplary KaiC activity was analyzed for representa- 32 ◦ 507 tive cyanobacterial KaiC1 and KaiC3 proteins by incubation with γ-P -ATP at 30 C in the presence

508 and absence of Synechococcus 7942 KaiA (KaiA-7942) The well-studied KaiC from Synechococcus 7942

509 (KaiC-7942) served as control. As demonstrated in Figure 6B and 6C all recombinant KaiC proteins in-

510 corporated phosphate over time. The intrinsic kinase activity of KaiC1 homologs from Nostoc punctiforme

511 ATCC 29413 (KaiC1-N294133), Synechocystis sp. PCC 6714 (KaiC1-Sy6714), and Cyanothece sp. PCC

512 7424 (KaiC1-Cy7424) was stimulated by KaiA-7942, similar to KaiC-7942 (Fig. 6B). As expected, KaiA

513 had no effect on autophosphorylation of KaiC3 from Cyanothece sp. PCC 7424 (KaiC3-Cy7424) and Mi-

514 crocystis aeruginosa PCC 7806 (KaiC3-Mic7806, Fig. 6C). To extend the analysis to non-cyanobacterial

515 proteins, KaiC3 from the hyperthermophilic Archaea Thermococcus litoralis (KaiC3-T.lit) and Pyrococ- ◦ ◦ 516 cus horikoshii (KaiC3-P.hor), which show optimal growth at 85 C and 98 C [131, 132], were analyzed ◦ 517 in a similar way. Again both recombinant KaiC3 proteins displayed phosphorylation at 30 C, which ◦ 518 was independent of KaiA-7942 (Fig. 6C). Incubation at 75 C indicated that the two archaeal KaiC3

519 homologs display kinase activity also at high growth temperatures (Fig. 6D). Hence kinase activity of

520 KaiC proteins seems to be well-conserved, independent of the growth conditions of the strains, they are

521 originating from.

15 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 6: Activity of diverged KaiC homologs. (A) Conservation of important motifs in cyanobac- terial KaiC1, KaiC2 and KaiC3 homologs based on a WebLogo analyses. Motifs in KaiC1 homologs are displayed for proteins from organisms, which encode a KaiA protein or lack KaiA, respectively. Numbers indicate the residues in KaiC-7942. Properties of the residues are displayed as follows: polar (green), neutral (purple), basic (blue), acidic (red), and hydrophobic (black). (B,C) Phosphate uptake analyses of selected KaiC1 (B) and KaiC3 (C) homologs at 30 ◦C in dependence of KaiA. KaiC proteins were incubated with or without KaiA-7942 in the presence of γ-P32-ATP. After 0, 0.75, 1.5, 3, and 22 hours samples were separated via SDS-PAGE, stained with Coomassie (CBB), and subjected to autoradio- graphy (ARG). The asterix indicates a contaminating protein. (D) Kinase activity of archaeal KaiC3 homologs at 75 ◦C. KaiC3 proteins from Thermococcus litoralis and Pyrococcus horikoshii were incubated with γ-P32-ATP and autophosphorylation was analyzed after 0, 5, 10, and 15 minutes. Shown are the Coomassie stained proteins (CBB) and autoradiography (ARG).

16 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

522 Conclusion

523 A core module for circadian regulation

524 Our analysis of 11,264 genomes clearly demonstrates that components of the Synechococcus 7942 circadian

525 clock are present in various bacteria and archaea. However, the frequency of Kai-clock related proteins

526 is highest in Cyanobacteria. In fact KaiA, Pex, LdpA, and CdpA are exclusive to organisms of this

527 phylum. In other organisms, e.g Rhodobacter sphaerodies, reduced KaiBC-based clock systems are likely

528 able to drive circadian oscillations [91]. An even simpler system solely dependent on KaiC might enable

529 diurnal rhythms in Haloferax volcanii [59], probably using the ATP/ADP ratio for clock entrainment.

530 Predictions for KaiC activities based on sequence alignments and motif analyses were validated through

531 a series of biochemical experiments. We confirmed ATPase and kinase activity for ‘full-length’ KaiC

532 proteins composed of one CI and one CII domains, even in organism without kaiA or kaiB. KaiA from

533 Synechococcus 7942 enhanced KaiC-phosphorylation only in strains naturally possessing a kaiA gene.

534 Our co-occurrence analysis hints to a conserved extension set for circadian regulation, which is present in

535 cyanobacteria with observed circadian behavior and absent in cyanobacteria having a diurnal, hourglass-

536 like lifestyle only (see also Fig. 7). A diurnal core set, which is important to enable an hourglass-like

537 timing system that resets every day, might be composed of KaiB, KaiC, LdpA, IrcA, SasA, RpaA, RpaB,

538 and CpmA. However, our identified circadian core set, which potentially enables a selfsustained clock,

539 consists of KaiA, the two input factors CdpA, and PrkE as well as the input and output factor CikA,

540 and the output factors LabA, and LalA.

541 The systematic comparison of microarray timeseries datasets indicates that the diurnal peak expression

542 phase is not conserved amongst homologous genes in the cyanobacterial clade. Instead, the expression

543 phase may be tuned according to the gene outfit and varying environmental needs. The analysis yielded

544 a set of 95 genes in the core diurnal genome, which can be considered critical for the adaptation to day

545 and night. Particularly the subset of non-light induced genes are prime candidates for circadian clock

546 marker genes. This set furthermore contains several hypothetical genes, which are interesting candidates

547 for novel clock-driven genes.

548 The gained insights about the diversity within the composition of the components involved in the circadian

549 protein clock as well as the diversity on the sequence level of the core factors call for further modification

550 and simplification of the clock. The exponential increase of molecular tools for synthetic applications in

551 recent years sets the stage for such ambitious projects. A future goal could be the reduction of complexity

552 by removing as much factors as possible so that an integration of a circadian clock in synthetic and

553 industrially valuable organisms becomes feasible.

17 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure 7: Schematic overview of the circadian clock in Synechococcus 7942. Shown are the protein interactions based on [20, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]. The plastoquinone pool and ATP/ADP ratio serve as input signals to entrain the circadian clock with the environment and metabolic state of the cell. The signals are recognized by the input factors LdpA and CikA and the core clock factors KaiA and KaiC. Other input factors are Pex, and the CikA interaction partner NhtA, PrkE, IrcA, and CdpA. The stimulating effect of KaiA on KaiC is antagonized by KaiB. The output of the clock is comprised by SasA, CikA, RpaA, RpaB, LabA, LalA, Crm, and CpmA. SasA interacts with KaiC and further phosphorylates RpaA, whereas CikA acts as a phosphatase on RpaA. RpaA is also regulated by LabA, Crm, and RpaB. RpaA together with RpaB controls global gene expression as well as the expression of the kaiBC cluster. CpmA is a transcriptional regulator of kaiA. The seven most interconnected factors we found in our co-occurrence analysis are highlighted in green. Factors colored in light grey showed no co-occurrence in the Fisher’s exact test and are present in < 90% of all observed systems. Asterisks (**) indicate the factors missing in Prochlorococcus.

18 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

554 Supplemental Material

x xxxxxxx x x xx x x xxxxxx xx x xxxx x x xxxxxxxxxxxxxxxxx x x x xx xxxxxxxxxxxxxxxxx x x xxxxxxxxxx x xxxxxxxxxxxxxxxxxxxx x x x xxx xxx x x xxxxxxxxxxxxxxxxxxxxx x xx xxxxxxxxxxxxxxxxxxx x x x x xx x x xxxxxx x x xxxxxxxxxxxxx x x x x xxxxx x x xxxxxxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxx xx x x xxxxxxxxxxxxxx x xxxxxxxxxxxxxxx x xxxxx xx xxxxxxxxxxxxxxxx xxx x xxxxxxxxxxxxxxxxxxxxxxxxxx x x x x x xxxxxxxxxxxx x xxxxxxxxxxxxxxxxxxxx x x x xxxx xxxxx x x x x x xxxxxxxxxxxxxxxx x x x xxxxxxxxxxxxxxxxxx xx x x xxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxx xx x xx x 20 x x xx x x x xxxxxxx x xxxxxxxxx x x xx xxx x x xxxx x xxxx xxxx x x xxxxx xx x xx xxxxxxxxxxxxx x xxxxxxxxxxxxxxx x xxxxxxxxxxxxxxx 20 xxxxxxxxxxxxxx x x x x x xx xxxxxxxx x xxxxxxxxx xx x xxxxxxxxxxxx x xx xxxxxxxxxx x xxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxx x x x xx x x xxxxxx xxxx x x xxxxxxxxxxxxxxxx x x xxxxxxx x x x x xxxxx x x xxxxxxxxxxxxxxx x x x el08 xx x xxxxxxxxxxxxx x xx x xxxxxxxxxx x xxxxxxxxxxxx x x xxxxxxxxx x xxxxxxxxx x x xx xxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxx xx x xx xx x xx xxxxxxxxxxxxx xx x xxxxxxxx xx x x xxxxxxxxxxx xx x xxxxx xxxxxxxxxx xx x xxxxxxxxxxxxxxxxxxxxxx x x xxxxxxx xxxxxxxxxxx x x xxxxxxx xxxxxxxxxx x x x xxxxx x x xxxxxxxxx x x x x xx xxxxxxxxxxxxxxxxxxx x xx xx x x xx x xxxxxxxxxxxx x xxxx xx xx x x x xxxxxxxxxx xxxxxx x xxxxxxxxxx xx xxxxxxxxxxxxx x x x x x x xxxxxxxxxxxxxxxxxxx xxxxxx x xxx xxxxx xx 15 x x x x x x xxx x x xx x xx x x xxxxxxxxx xxx x xxxx x xxxxxxxxxxxx xx x x xxxxxxxx x x x x xx xxxxxxxxxxx x x 15 x x x xxxxxxxxxxxxxxxx x xxxxxx xxx x x xxxxxx x x x x x xxxxx x x xxxxxx x x Beck14 xx xxx x x xxxxxxx x xx xxxx x xxx x x xxxxxxx x x xxx xxx xxxxxxx x x xx x x xxxxxxxxxxxxxxxxxxx xxx x xx xxxxxxxxxxx xx x xxxxxxxxxxxx x x x x x x xxxxxxxxxxxxxxxxxxxxx x x xx x x x xx x x x x x an09 xxxxxxxxxxxxxx x x xxxxxxx x xxxxxxxxxx x xxxxxxxxxx xx x x x x xxxxx xx xxxxxxxxxxxxxxxxxxx xx x x x x x xx xxxxxxxxxxxxxxxxx x xxxxxxxxxxxx x x xx x xxxxxxxxxxxxxxxxxxx x x xxxxxxx x x x x y xx xxxxxx xx xx x x xxxxxxxxxxxxx x x xxxxxxxxxxxxx x x xxx xxxxxxxxxxxx x x x x x x xxxx xx x xx x xxxxx x Stoec k x x x xx x x xxxxxxxx x xx x xx x x xxxxxxxx xxxx xxx xxxxxxxxxxxxxxxxxxxxxxxx xx xx x x xx xxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxx x x x x x xxxxxxxx x xxx x x x xxxxxxxx x x x Ito09 xxx xxxxxxxxxxxxxxxxxxxxx xx x xxx xx xxxxxxxxxxxxxxxx x x x x xxxxxxxxxxx x x xx x 10 10 xxxxx xxxxx x x x xxx x x xxxxxxxxxxxxxxxx xxxxxxxxxx xxxxxxxx xxx x xxxxxxxxxxxxxxx xxxx xxx x xxxxxxx xxxx x x xxxxxxxxxxxxxxx x x x x x xxxxxxxxxxxxxxx xx x xxxxxxxxxxxx x xx x xxxxxxxxxxxxxxxxxxxxxxx x x xxx x xx x x x x xxxx xx x Zinser09 x xxxxxxxxxxxx x x xxxxx x x xxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxx x Vij a x xxxxx xxxxxxxxxxxx x x xxxxxxxxx x xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxx x x xx xxxxxxx xx xxxxxxxxxxxxx x x xxxxxx xx xxxxxxxxxxx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xxxxxxxxxx x x xxxxxxxx x x xxxx x x xxxxxxxxxxxxxxx xxx x x xx φ (CT), xxxxxxxxxxxxxxxxxxxxxxxx x x xxxx xx x x x xxxxxx x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxxx xxx xx x xxxxxx x x x x xx x x xxxxxxxxxxxxxx xxx x x xx x xxxxxxxxxxxxxxxx x xxx xxx 5 xxxxxx 5 x xxxxxxxxxxxxxxxxxxxxxxxx x x x x x xxxxxxxxxxx x x x x x x x x xx xx x x xxxxxxxxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxxx xxxxxxx xx x x xxx x (CT), x xxxxxxxxxxxxxxxxxx xx x xxxxxxxxxxxxxxxxxxxx x x x xxxxxxxxxxxxx xx x x xxxxxxx x x xxxxx xxxxxxxxxxxxxxxxxxxxxxx x x x x xxxxxxxxx x xx xxxxxxxxxxxxx x x x x x x φ xxxxxxxxxxxxxx x x xx xxxxxxxxx x xxxxxxxxxxxxxxxxxx x xxxx xxx x xxx xxxxxxxxxxxxxxxxxxxxxxxx xxx xxx x xx xx xxxxxxxxxx xx xxxxxxxxxxxxxxxxxxxx x x x xx xxxxxxx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxx x x x x xxx x xxxxxxxxxxxxxxxxxxx xx x xxxxxxxxxxx x x x xxxxxxxx x x xx xxxxxxxxxxxxxxxxxxxxx x x xxx x xxxxx x x xxx xxxxxxxxxxxxxxxxxxxx xx x x x xxxxxxxxxxxxxxxxxxx x x x xxxx xx xx xxxxxxxxxxxxxxxxxxxxx xx xx x x xx xxxxxxxxxxxxx x x xxxxxx x x xxx 0 x xxxxxxxxxxxxx x 0 x x x xxxxxxxxxxxxxxxx x xx xxxxxxxx x x x x xxx xxx x xxxxxxxxxxxxxx

0 5 10 15 20 Leh13 , n= 1208 Leh13 , n= 1047 Leh13 , n= 825 0 5 10 15 20 φ φ (CT), Leh13, n= 2566 ccc= 0.032 , p< 0.28 ccc= 0.025 , p< 0.43 ccc= −0.029 , p< 0.38 (CT), Leh13, n= 1737 ccc= 0.31 , p<< 0.001 ccc= 0.088 , p< 0.00023 xxxxxxxxxx xx x x x xx xxx x xx xx xxxxxxxxxxxxxxx xxx x xxx xx xx xxx x xxxxxxxxxxxxxxxxxxxx x x xxx xx x x x xxxxxxxxx x xxxxxxxx x x x x x x xxxxx xxxx x x xxxxxxxxxxxxxxxxxxxxxx x x xxx xx x xx xx x xxxxxxxxxxxxxxxxxxxxxxx x x x x xxxx x x x x x xxxx x x x x xxxxxxxx x xxxxxxxxxxxxxxxx x x x xxxxxxxxxxxxxxxxxxx x xx xxxxxxxx x x xx xx xxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxx xxxx xx xxxxx xx xx xxxxxxxxx x x xxxxxxxx x x x x xxxxxxxxxxxxxxxxxxxxx x x x x xx x x xxxxxxxxxxxxxxxx x x x x x x xxx x xx xx xxxxxxxxxxxxxxxxxxxxxxxx xx xxxx xxx x xxxxxx x xxxxxxxxxxxxxxxxxx x x x x xxxxxxx x x xxx x xxxxxxxxxxxxxxxxx x x x x xxxxxxxxxxxxxxx xxxxx x x xxxxxxxxxxxxxxxxx x x xx xxx x x xx x xxxxxxxxxxxxxxxxxx x x x x x x x x xxxxxxxxxx x x x xxxxxxxxxxxxxxx x x x x xxxxxxxxxxxxxxxxxx xx xxx xxx xx xxxxxxxxxxxxxxx xx xx x x x x x xx xxxxxxxxxxxxxx x x x x x x x xxxxxxxxxxx x x xx x xxxxxxxxxx x x x xxxxxxxxxxxxxx x x x xxxxxxxxx xxx x x x xxxxxxxxxxxxxx x x x x xx x xxxxxxxxxxxx x x x x xxxxxxx x x x xxxxxxxx xx x xx x x xxx xxx x x x x xx xxxxxxxxxxxxxxx x xx xx xx x x xxxxxxxxxxxx xx xxxx xx xx x xxxxxxxxxxxxxxx x x x xxx x xxxxxxx xx x xxxxxxxxxxxxx xxx x x x x xxx x x x xxx xxxxxxx xx x x x xxxxxx x xxx xx x x x x xxxxxxxxxx xx x xx x xxx x xx x x x x xxxxxxxxxx x xxx x x x x xxxxxxxxxxxx xxx x x x xx x x xxx x xxxxxxxxxx x xx xx x xxxx x x x xxxxxxxxxx xx x x xx xx x x xxxxxxxxxx x x x x xx x xxxxxxxxxxxxxxxxxxx x xxxxxxxx x x xx xxxxxxxx xx xxxxxx x x xxxxxxxxx xx x xx x x xxxx x x xxxxxxxxx xxx x x x x xxxxxxxxxxxx x x xx x x xxxxxxxxxxx x xx xx x x xxxxxxxxx xxxx x x xxx xx x xx x x xxxxxxxxxxxxx x xx x x xxx xxxxx x x x x x xxxxxxxxx x xx xx xxxxxxxxxxxxxxxxxxx x x xxx x xxxxxxxxxxxxxxxxxxxxx x x xx xxx xx x x x xxxxxxxxxx x x xxxx xxxxxxxxx x x x xxx x x x x xxxxxxxxxxxxxx x x x x an09 x xxxxxxxxxxxxx xxx xx xx xxxx x xx x x xxxxxxx xx x xxxxxxxxxxxxxxx xxx x xx xxxxxx x x xxxxxxxxxxxxxxxxxxxxx x x x x x x x x xx xxxxxxxxxxx xx x x x x xx x xxxxxxxxxxxxx x x xxxx x x xxx xxxxxxxxxxxxxx x x xxxx y xxxxxxxxxxxxx x x x x xxx x xxxxxxxxxxxxxxx x x x x xxxxxxxxxx x x xxxx x xxxx x xx xxxxxxxxxxx x x xx xxxxxxx xx xxxx xxxxx x x x xx xxxxxxxxxxxxxxxxxxx x xxxx xx x xxxxx x x x xxxxxxxxxxxxxxxx x x x xxxx x x x x x xxx x x x x x xxxxxx x x x x Ito09 xxxxxxxxxxxxxxxxx xxx xxxx x x xxxxxxxxxxxx xx x x x xx xxxxxxxxxx x xxxxxxxxxxxx x xxxxxxxxxx x x xx x xxxxxxxxxxxxxxxxxxxx x x x x xxx x xx xxxxxxxxxxxxxxxxx x x xx xxxxxxxxxxx xxxxxxxxxxx x xxxxxx xx x x x x xxxxxxxxxxxxxxx x xxxxx xx xxxxxxxxxxxxxxxx xxx xx x x x x x x x Zinser09 x Toepel08 x xxxxxx x xx xxxxx x x xx xx xxx xxxxx xx xxxxxxxxx xx x xxxxxxxxx x x x x x xxxxxxx x x Vij a x xx x x x xxxx x x x x xxxx xx x x xx x Kushige13 x x x x x xxx x xxxxxx x xxxxxxxxxxxxx x x xxxxxxxx xx x x x xxxxxx x xx xxxxxxxx x xx x x x x x x x xxx x x x x xxxxxxxxxx x x xxxxx x x x xxxxxxxxxx x x x xxxx x x x xxxxxxxxxx xx xxx x x xxxx xxxxxxxxxxxxxxx xxx x x x xxxxx xxxxx x x x xx xxxx x x x x xx x x xxxxxxxxxxxxxxxxxxxxxx x x x xx xxxx x xxxxxxxxxxxxxx x x x x xx xx x x x x x xxxxxxxxxxxxxxxxx xx x x x xxxx x xx xxxxxx x xxxxx xxx xx x x x xx xx x xxxxxxxxxxxxxxxxxxx xx x x x xxx x xxxxxxxxxxxxxxxxx xx xx x x xxxxx xx x x x xx x x x xxxxxxxxxxxxxxxxxxxxxx x x x xxxxx x xxxxxxxxxxxxxxxxxxxxx x x x xx xx xxxxxxxxx x xx x x x x xx xxxxxxxxxxxxxxxxx x x x x x x xxxxxxxxxxxxxxxxx xx xxx x xxx xxxx x x x x x x x xxxxxxxxxxxxxxxxxx x x x x x x x xxxxxxxxxxx xxx x xx xx x xxxx x x xxxxxxx x x x x x xx x x x x x xxxxxxxxxxxxxxxxx xxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxx x xx x xx x x xxxxxxxxxx x xxxxx x x x xxxxxxxxxxxxxxx x x xxx xxxx x x xxxxxxxxxxxxxxxxxxxxxxx xxx x x x xxx xxxxxxx x xxxxxxxxxxx xxx x x xxx xxx x x

Leh13 , n= 1140 Leh13 , n= 472 Beck14 , n= 1149 Beck14 , n= 985 Beck14 , n= 791 ccc= 0.037 , p< 0.24 ccc= −0.076 , p< 0.13 ccc= 0.036 , p< 0.21 ccc= −0.036 , p< 0.24 ccc= −0.13 , p< 0.00034

x xxx x x x x x x x xx xxxx x x xxxxxxxxxxxxxxxxxxxxxx x xx x x x x xxxxxxxxx x x xx x xx xxxx x x x x x xxxxxxxxxxxxxxxxxx x x x x x x xxxxxxxxxxxxxxxxxx x x x x x x xx xxx x xxxxxxxxxxxx x x xx xxx x x x xxxxxxxxxxx xx x xx x x x xxxxxxx x x x xxxxxxxxxxxxxxxxx xx x xx x xxxxxxxxxxxxx xxx x x x x x x xx x xxxxxxxxxxxx xxx x xx xxxxxxxxxxxx x x xxxxxxxxxxxxxxxxxxx xx xx x x xxxxxxxxxxxxxxxxx xx x x xxxxxxxxxxxxxxxxxxxxxxx xxx x x x xx xxxx x xxxxxxx xx xxxxxxxxxxxxxxxxx x x x xx x xxxxxxx x xxxxxxxxxxx x x x xxxxxxxxxxxxxxxxxx x x xxxxxxx xx xxxxxxxxxxxxx xxx x x x x x xxxxxxxxxxxxxxxxxxxxxx xxxx x x x x xxxxxxxxxxxxxxxxx xxxxxxxxxxxxxx x xxx xx xxx x x x xxxxxx x x x x x xxxxxxxxxxxxxxx x xxxx xxx x x xxxxx xx x x xxxxxxxxxxxxxxxxxxxxxx xxxx xxxxx xx x x x xxx x x xxxxxxxxxxxxxxx x xxxxxxxxx xx x xxx xxxxxxxxxxx xx x xxx x xx x x x xxxxxxxxxxxxx x xxx x x x x x x xxxxxxxx x x xxxxxxxxxxxxxxxxxxxxxxx xx xxx xxx x x xxx xxxxxxxxx xxxx xxxxxx xxxxxxxxxxxxxxxxxx x x x xx x xx x x xxxxxx xxx xx x x xxxxxxxx xx x x x x x x x xxxx xx x x xxx xxxxxxxxxxxxxxx xx xxx x xxx xxxxxxxxxxx xxxxxxxxxxxxxxxxx xxxxxxxxxxxxx xx xx x xx x x x xxxxxxxxx xxx x xx x x xx x xxx x xxx x xxxxxxxxx xxxx xxxxx x x xxx xxxxxxxxxx xxxx xxx xx x x xxxxxxxxxxxxxxxxxx x xx x xxxxx xxxxx x xxxxxxxxxxxxxxx xxxxxxxxxxxxxx x xx x xx x x x x xxxxxxx xx x x x xx x xxx xxxxxxx x xx xx xxxxxxx x xx xxxxxxxxxxxxxxxxx xx x x xxxxxxxx xxxxxx x x xxxxxxx xxx x xx xx x xxxx x x x xx x x xxx xx x xxxx x x xxx x x xxx xxxxxx x xxxxxxxxxxxxxxx xxxx xxxx xxxxxxxxxxxxxxx xxxxxxxxxxxx x x x xxxxxx x x x x xx xxx x xx xxx x x x xxx x x x xxx xxx x x xxxxxxxxxxxxxxxx xxxx x xxxxxxxxxxxxxx xxxxx xxxxxxxxx x x xxxxx x x x xx x x x x xxx xxxx xxx x xxxx x xx x xxxxx x xx x xxx x xxxxxxxxxxxx x xxxx xx xxxxxxxxxxxxxx x xxxxxx xxxxx x x x x x xx x x x xxxxxxxxxxxxxxxx xxxx x x xxxx xx x xxxxxxxx el08 x x xxx x x x x x xx x x x x x x xxxxxxxxxxxxxxxxxx xx xxx xxx xx xx xxxxxxxxxxx xxx xx x x x x x x x x xxxxxxxxxxxxxxxxxxx x xxx x xx xxxxxxx x xxx x x xxxx xxxxxxxxxxxxxxxxxx xxx xx x xx xx x xxxxxxxxxxxxxxxx x x x xx x xx x xx x x x xxxxxxxxxxxx xx xxx x x x x xxx xxxxxx xx x x x x xxxxx xxxxxxxxxxxxxxxxxx xx xxx xx xxx x xxxxxxx xxxxxxxxxx xx x x x x x x x x xxxxxx x x x x xxxx xxxxxxxxxxxxxxxxxxx x xxxx xxx x xxxxxx x x x xxxxxxxxxxxxxxx x xxxxxxxxxxxxx x x xxx xxx xxxxx x x xxxxxxxxxxxxxx x x xxxxxxx x x x xxx x x xxx x xxxxxxxxxxxx xx xxxxxxxxx x x x xxxxxxxxx xxxxxxxx xx xxxxxxx x x xxxxxxxxx x x x x xxx x x x x xxxxx x x xxxxx xxxxxxx x x x x xxx x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxx x xxxxx x x x xxx xxxxxxxxxxx x x x x xx x x x x x Ito09 xx x x xxxxxxxxxxxxxxxxxxxx x x x xxx xx x xxxxx x xxxxxxxxxxxxxxx x x xxxxxx xx xxxx x x xxxxxxxxxxxxx x xx x xxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx x xxxxxxxxx x xx xxxx xxxxx x xxxxxxxxx xx xxxxxxxxxxxxxx x x xxxx x xx xxxx xx x x x x xxxxxxxxxxx x x x xx xx xxxxxxxxxxxxxxxxxxxxxx xxxxx xx xx x xx xxxxx x xxx xxxx xxx xxx xxxxxxxx x x x x xxxxx Zinser09 x x x x x Toepel08 xx x x x x x x xxxxxxxx xxxx x x x x x xxxxxxx xxx x x x x x xxxxx xxxx x xx xxxxxxx xxxxxxxxxxxx xxxxxx xxx x xxx x xxx x xxx x xxx xxxx xxx x x x x xx x x Kushige13 x x x xx x x x x xx xx xx x xx x x x x Stoec k xxxxxxxxxxxx x xx xxx x xxxxxxxxxxxxx xx xx xxxxxxx x xx x xxxxxxxx xxxxxxxx xx xxx xxx xx x x xx x xx xx x xxxxxxxxxxxxxxxxxxxx x xx xx x xxx xxxxxxx x xx x x x x xx xx xxx x xxxxxxxxxx xxx xxxx xx x x xx x xxxxxxxxxxxxxxx xxxx x x x x x xxxxxxxxxxxxxxx x x x xxxx x x x xxx x xxxx x xxx x x x x x x x x x x x xxxxxxxxxxxxxxxxx x xxx xxx x x xxxxxxxxxxxxxxxxxxxx x x x xx x x xxx xxx x xxxxxxxxxxxx xxxx x x x x x xx x x xx xxxxxxxxxxxxxxxxxxxxx xx x xx x xx x xxxxxxxxxxxxxxxxxxxx x x xx x xx x xxx x x x xxxxxxxxx xx x x x x x x xxxxx xxxxxxxxxxxxxxxxxxxx x x x x xxxx xx x x xxxxxxxxxxxxxxxxxxxxxx x xx xxxxx x xx x x xxxxxx xxx x x x x x xx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxx x xxxxx x x x xxxxxxxxxxxxxxxxxxxxx xx x x xxxxx x xx xxx xxxxxxxxxxxxx x x x x x x x x xxx xxxxxxxxxxxxxxxxx xx xxx xxxxxx x x x xxxxxxxxxxxxxxxxxxxxx xxx xx x xxxxxx x x xxxxxx xxxx xxx x x x x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxx x x xxx xxx x xxxxxxxxxxxxxxxxxxx x x xx x x xxxxxxxxxxx x xxx x xxxx x xx x xx x x x x x x xxxxxxxxxxxxxxxxxx x x x x x x x xx xxxxxxxxxxxxxxx x xx x x xx x xxxxxxxxxxxxxx x x x x xxxxxx x x x x xxx x x x xxxxxxxxxxxxxxxxxxx xxx xxx x x xxxxxxxxxxx xx x x xx x x x xxxxxxxx xxxxxxxxxxxxxxxxxx x x x x x xxx xxxxx x x x x x x xxxxx x

Beck14 , n= 1635 Beck14 , n= 1065 Beck14 , n= 426 Vijayan09 , n= 1490 Vijayan09 , n= 794 ccc= −0.055 , p< 0.024 ccc= 0.078 , p< 0.013 ccc= −0.085 , p< 0.063 ccc= 0.61 , p<< 0.001 ccc= −0.038 , p< 0.28

x x xx x x x x x x xxxxx xx x x x x xx x x xxxxxxxxxxxxxxx xxxxx x x x x xxx xxxx xxx xxx x x x x x x x x xxxx xx xxx xx xx xx x xxxxxxxxxxxxxxx xxxxx xxxx x xx xxx x xxx x xxxxxxxxxxx x xxxx xx x xx x x xx x x xx x xx x x xxx xx xx x x x xx x xx xxx x xxxx xxx xxxxxxxxxxxxxxxxx x xxxxx xxxxx xxxx xxxxxx xxx xxx x x x xx x xxx xxxx x xxxx xx x xxx x xx xxx xxx x xxxxxx x x xxxx x x xx xxxxxx x xx xxxx x xxxxxxx xxx xx x x x x xxxx xxxx xx xxxxxxxxxxxxx x x xxx xx x x x xx x xx x xxxxxxxxxxxxxxxxx xxxx x x x x x xx x x xxxxxxxxxxxxxxx x x x x xxxxx xxxxxxxxxxxxx xx xx xx xx xx x x x xxxxxx x xxxxxx x xxxx xxx x x x x xx xx xx xxx x x xxxxx xxxxxx xxxxxxxx xx xxx x x xxxx x x xxxxxxxxxxxxxxxxxx x x xx x xxxxx x xxx xxxx xxxxxxxxx xx x x xxx xx xx xxxxx x x xxx xxxxxxx x xx xx xxx xx x x x xx x x xxxxxxx x xxxxx xxxxx xxxx x x x xxxxxxxxxx x xxxx xxxxxxxxxxxxx xx x xxx xx xx xxxxxxxxx x xxx xx x xx xx xxxx xxxxx x x x xxx xxxxxxx xxx x x xx x xx x xx x xxx xx x xx x x xxx xx x x x x xxx xxxxxxxxxxxx xxx xxxxxxxxxxxxxx xx xxx x xxxxxxxx x x xx xxxx xxxxx xxxxxxx x x xxxx xx xxxxxxx x xxxxxxx xx xxxx x x x xxxxxxxxxxx x x xx x x x x xx xx xxxxxxxxxxxxx x x xx xxxxxxx xx x x xxx x xxxx xx xxxxx xxxxx x xxx xx xxxxxxxxxxxxxxxxxxxxxx xxxxxxxx x x x x x xx xxxx x x xxxxxxxxx x x x xxx xxxxxx xxxxx xx xxx xxxxxxxxx x x x x x x x x xxxxx xx x xxxxxxxxxx xxxxxxx xx x x x xxxxxxxxxxxxxxxxxx xxxxx xx xxxxx xx xx x x xxxxx xxx xx x x x x xx x x x xxxx x x xxxxxxxxxx x x x xxx xxxxxxxxxx xxxxxxxxxx xxxx xx xxxxxxxxx xxxxxx xx xxx xxxxx xxxxxxxxxxxxxxxx xxxx xxxxx xx x xx x x x xxx x x xxxx x xx x x x x x x x xx x x xxxx x x xx xxx xxxxxxx xxxxxxxxxxx x xxxxxxxxxxxxxx xxxxxx xxxx x x xxxxxxxxx xxxxxxxxxxx xxx xxx xx x xx xxx x x x xx xx x x xxxx x x x xx xxxxx x xxxx xx x xxxxxxxxxxx xxx xxxxxxxxxxxxxx xxxx x x x xxxxx x xx xx xxxxx xx xx x x xxx x xx xx x xx x x x x x x xx xxx x xxxx xxx x xxxxxxxx x xxx x xxxx xxxx el08 x x x x x xx x xxxxx xxx x xx xxx xx xx x x x xxxx x x x x x xx x x x x xxx x x xx el08 x xxxxxx xx x x xx xx x x xxxx x xxxxxxxx xxx xxx x xxxx xxx xxxxx x x xxxxxx xxx x xxx x xxxxxxxxx x x x x xxxxxx xxx xx xxxxxx x xx x xx xxxxxx x x xx xxxxxx xxxxxx x xx x x xxxxxxx xxx x x x xxxx xx x x x xxx xxxxxxxxxxxxxx xx x xx xx x x x x x x x x xxxxx x xx x xxxxxxxxxxx x x xxxxxxxxxx x x xxxxxxxxxxx xx xxx xxxx x x xx xx xxxxxxxxxxxx xxx x xx x x x x x x xxxx x xxxxxxxx x x xxx xxxxx x x x x x xx x xxxxxxx xxx x xxxxxxxx x xx x xx xxxxxxxxxxxxx x xx xxxxxxxx xx xx x x x x x xxxxx x xx x xx xxxxxx xxx xx x x x xxxxxx xx xxx x xxx x x x x x x xx x xx xxxxx xxx x x xx xxxxxxx x x xx xxxx x x xxxxxx xxxx xx x x xxxxxx x xx xxx x xxxx x x x xx x xx x x x x x x xx xx x xxx xxx x xx x xxxxxxxxxxxxxx x x x xx xx xx x xx x xxx x x x x xx xx x x xx xx xxx xx x xx xx x x xxx x xx x xxx x xx xx x xx xx x x xxxxxxxx x xx xxxxxxx xxxxx x xx xx x x x x xxx xxxx x x x xxx x x x x x x xxx x x x x x xx x x x xxx xxx xx xxx xxxxxxxxx xx xx xx xxxxxxxxxxxx x x x xx xx xxx x x x Zinser09 x x x Toepel08 x x xxx x xx x x x xx xx x xx xx x x x xx x x xx xxx xx x x x x x x xx x x xx xx x xxxxxxx xx x xxx x xxxxx x x xxx Kushige13 x x x x x x x x x Stoec k xxxx x x xx xxx xx xxxx xx x xx x x xx x x x xx xx x x x x x x x xx x Stoec k xxxx xxxx x x x x x x xxxxx xx xxx xxxx xxx xx x xx x x x x xx x x x xx x x x x x x x x x x x x x xx x x xxx xxxxxx xx xx xx xx xxxxxxx xxx xxxx x xxxxxxxxx x x x xxx xx xxxxxxxx xx x x xxxxx xx xx xxx x x x x x x x x xx x x xxx xxxx xx xxxxxxxx x x x xxxxxxx xx x xxxxxx x x xxxx xx x x xx xxxxxx xxxxxx xxxxx xx x xx xxxxxx xx xx xxxxx x x x x x xx x x x x xxxxxxxx x x xx xxxx xxxx xxxxxxxxxxx x x x xxxxxxxxx x xx xx xx xxxxxxxxxxxx xxx x x xxx x x xxxxxxxx x x x x xx x x x x xxxxxx xxxxxxxxxxxxxx xxxxxxxx xxxxxxxxxxxxxxx xxxx xx xx xxxx x xx xxxx x xxxxxxxxxx xxxx xxxxx x xx x xx x xx x x xxxxxxxxx x x xx xx xx xxxxxx x x xxxxxxxx xx x xxxxxxxx xxxxx xxxxxxxxxxxx xx xx xxxxxxxxxxxxx x xx x xxxxxxxxxxx xxxxxx xxxx x xx x xxxxxx xxxx xxxxx xxxxxxxxxxxxxxx x x x x x x x x x x x xxxxxxxxxxxxxx x xxxx xxxxxxxxxxxx xxxxxxxxxxxxxx x x xxxxxx xxxx xxxxxxxxxxxxxxxxxxxx xxxxxxxx xx x x x xxxxxxx xxxxxxxxxxxxxxxxxx x x x x xx xxxx xx x x x x x xxxxxxxxx xx x xxx xxxxxxxx xxxxxxxxxxxxxx x xx xx xxxx xxxxxxxxxxx xxx xxx x xxx xxxx xx xxxxxxxxxx x x x x xx xx x xxx x xxxxxxx xxx xxxxxxxxxxxx xxxxxxxxxxx xxxxxxxxxxx x x xx x x xx x xxxxx xxxxxxxxxxx xx xx xx x x x x x xx xx xxxxxx x x x x xx x x x x x x x x xxx x x x x x xxxxx xxxxxxxxxxxxxxxxxx xxxxxxxxxxxx x xx xx xx xxxxxxxxxxxxxxx xxxxxxx xx x xx x x xxx xxxxx x xx x x x x xx xxx x x x xx x x xxxx xxxx x x x xx x xx x x x xxxxxxxxxxxxxxxx

Vijayan09 , n= 1264 Vijayan09 , n= 829 Vijayan09 , n= 380 Ito09 , n= 709 Ito09 , n= 1103 ccc= 0.23 , p< 3.8e−15 ccc= 0.025 , p< 0.48 ccc= 0.15 , p< 0.0029 ccc= 0.027 , p< 0.47 ccc= 0.21 , p< 3.4e−11

x x xxx x x xx x x x x x xx x xxxx x x x x x x x x xx xx xxx x x x x xx xx x xxx x xxxx xx xx x x xx x xxx x x x xx x x x xxxxxx x x xxx x x x x x xx xx x xxx x x x xxxxxx x x x x xxxx x x x x x xxxx xx x xxx xxxxxx xx xxxxxx xxx x xxxxxx x x x x x xx xxx x x x x xx xx xxxx xx x x x x x xx xxxxxxx xxxxxxxxxxx xx xxx xx xxx x x xxxxxxxxx x x xx x xx xx x xxx xxxxxxxx xxxx xxxxxxx xx x xx xx x xx xx xx x xxxxxxxxx x x xxxxxxxxxxx xx xx xxxxxx xx x xxxxxxxxxx x xx x x x x xxx x xxxx xxxxxx x xxxxxxxxxxxxxx x x x x x x x x xxx x x x x xx xxxxx x xxx xxx xxxxx xxxx x xx xxx x xxx x xxxxxx xx x x x xx x x x xxxx xxx xxxxx xx xxxxxxxxxx x x x xx x xx x xx xx x x x x x xx xx x xx x x xx x x xx xxx xx x x xx x 20 x x xx xxxxxxx x x x x x x x xx xx xx x x xxxxxx x x xxxxx xxx x xx xxxxxxxxx xxxxxxxxxxxxxxxx x xx x x xx xxx xx x x x xxxxx x x x x x x xxx x x xx xxxxxxxx x x x x xxx xxxxxxx xxxxxxx x x x xxx x xx x xx x x xx x x x x xxxxxxx x xx xx xx x x x x xxxxxxxx xx x x xxxxxxxxxxxxxxxxxxxxxxx x xxx xxx x xxxxx x x x xx x xxx x x x x xxxxx x x x x x x xxxxx xx xx xxxxxxxxxxxx x x x x xx xx x x x x x x x xx x xxx x xxxx xxx x xx xxx xxxxxxxx x x xxxxxxxxxxx x x x x x xx x xxx x x xxx xx xxxxxxx x x x xx xx x xxx xxxxxx x xx x xx xxx x xx x x x xx xx x x x x x x xx xx xxx xxx xx xxxx x xxx x x 15 x x xxxxx x x xx xxxx x x x x el08 xx xx xx xx x xxx x x x xxxx x xxxxxx x x xxx x xx xxxxxx x xx x x x xxxxxxxxx x xx xx xxx xxxxxxxxx x xxxx x x xxxxxxx xx xx x x xxxx xx x x x xx x x xxxxxxxxx x x x xxxxxxxx x xx x xx x xx xxxxxxxxx xx x x x x xx x xxxxxxxxx x xx xxx x x x x x x xx xx xx xx xxxx x x xxxxxxxxx x x xxxxxxxxxxx x x x x xxxx xxxx x x xxx xx x xx x xxx xx x x x x xx x x x xx x x x x Kushige13 x x x xx x xx x xx x x x x x xx xx xxx x xx x x x x xxx xx x xx x xx xx x xxxxx x x xxxx x x x x x x x xxxxx x x xx x x x x xxxxx x x x x x x x xx xx x x xxx x xx x x x x x x x x x x 10 x x x x x xxx xxx x x x xx xxxx x x x x xx xx x x x x xxxx xx x xxx x x x xxxx x xx x x x

Toepel08 x Toepel08 x xxxx xx x x xx x xx xxxx x x xxxxx xxxx x x x x x xxxxx x x xx xx x x x x xx x x x xx Kushige13 x x x xx x x x x x x x x xx x Stoec k x x x xx x x x x x x xx x x x x x x x x x xx x x xxx xx x xxxxxxxx x xxx x xxxx x x x x xx x x x x x x xx x x x xx xx xx x x x x x x xx xx xxxxx xx x x xx xx x x xx x x x x x xx xx xx xxx xxxxxxxxxxx x x xxxx x x xxx x x x xx x x xxx x xx x x x x xxx xx x x xx xx xx xxxxxxxx x xx xxx x x x x x 5 xx xx xx xxxx xxxx x xxxxxxxxxxxxx xx x x x xx x x xx xx xxxxxxx x x xx x xx x x x xx x x x xxxxxxxxx xx x x x x x x xxx x x x x xx x xx xx x xx xxxxxxxxxxxx x xxxx x x xxxx x xxx x xxxxxxx xx x x x x xxxxx x x xxxxxxxxxxxxxx x x xx x xxxxxx xxxxxx x x x xxxxx xxxxxxxxx x x x x x x xx x x xxx x x x xxxxxx xx xx xxx x x xxxxxxx xx xxxxxxxxxxx x φ (CT), x x x xxxx xxxxxxxxxxxxxxxxx x xxxxxxxxxxxxxxxxxx x x x xx x x xx xxxxxxx xx x x xx xxxxxxxxx xxxx x x x xxxxxxxxxxxx xxxxxxxxxxxxx xxxxxxxxx x x x x x x xxxx xx xxx xxx xxx x xx xxxxxxxxxxx x xx x x xxxxxx xxxxx x xxxxxxx xxxxxx x x x x xxx x xxxxxx xxxxxxx x xxx x x x xx x x xxxxxxx x xxxxxxxxx x x x x xx x xxxx xx xxxxx xxxxxx x xxxx x xx x x x xx xxx xxxxxxx x x x x xxx x x x x xxxx x x xx x xxx xxx xxx xx x x xxxxxxxxxxxx x xx x xxxx x x xx x x x x x xx x x xxxx xx x x xxx xx x x xxxx xxx x xxx x x x x xx xx xxxx x x 0 x x xx xx

Ito09 , n= 722 Ito09 , n= 322 Zinser09 , n= 847 Zinser09 , n= 552 0 5 10 15 20 φ ccc= −0.014 , p< 0.72 ccc= 0.14 , p< 0.0068 ccc= 0.12 , p< 0.00046 ccc= −0.2 , p< 3.8e−06 (CT), Zinser09 , n= 274 ccc= 0.23 , p< 0.00047 xx xxxxx x xxx x x x xxx x x x xxxxxxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xx x x x x x xxx xx x x x x x xx x x x x x x x xxxxx xx x x x xxxxxxxxxx xxxxxxxxxxxxx x xxxx xx xxx xxxx xxxx xx xx x xxx x xxx x xx x x xx x x xx xxx x x xxxxxxxxxx x xxxxxxxxxxxxx xxxxxxxx xx x xx x xxxxx xxxxxx xxxxxx x xxx xxxxxx xx xxx xx xxxxxxxxxxx x xx x x x x x x xxxxxxxxxxxx xx xxxxxxxxxxxxxxxx x x x xxxxxxxxxxxxxxxxxx xxx x xx xxxxxxxxxxxxxx x xxx xxxx xxxxxxx xxx x x xxxxxxxxxx xxxx xx xxx xxxxxxxx xx x x x xxxx xxxxxxx xxxxxxxxxxxxxxxxxxxxx xx x xxxxxxxxxxxx x xxx xxxx xxx xxxxx x 20 x xx xxxxxxxxxxxxxxxxxxxxx xx x xxxxx xxxx x xxxx x xxxxxxx xxx xxxxx x xxxxxxx xx x x xx xxx x x xxxxx xxxx x x xx x xxxxxxxxxxxxxxxxxxxxxxxxxx xx x xx xxxx x x x xxxx xx x x x x xxx xx x xx x x x x xxxx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x xxx x x x x xx x x xx x x xxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx x xx xx x xx x x x x x x x x xxx x xx x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x x x xx x x x x x xx xx xxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x xxx xxx x xx xxx xxx x xxxxxxxxxxxxxxxxxx xx x xx xx x x x x 15 x x xxxxxxxx xx x x x x x x xx x xxx xxx xx x xx xxxxxxxxxxxxxxxxxxxxx xx x x x xx xx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x x x x xx xxx x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx x x x x x x x x x x x x x x Toepel08 x xxxxxxxxxxxxxxx x x x x x xxxx xxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x x x x x x x x xxx xxxxxx xxxxxxxxxxxxxxxxxxx xxxx x xx x x x x xx x x x x x x x x x x x x xxxxxxxxxxxxxxxxxxxxxxxxx xxxxx x x xx xxxxx x x x x x x x x xxxx x x x xx 10 x xxx xxxxx x x x x x x xxxx xx x xx xxx x x xx xx x x xxxxx xxxxxxx xxxxxxxxxxxxx x xx x x xx xxxxx x x xx x x xx xx x xxx x xx xxx x x x xx x x x xx xxxxxxxxxxxxxxxxxxx xxx xxx x xxxxx xx xxx xx x x x xxx xx x xx xxx xx x xx x xx x Kushige13 x x x x x Kushige13 x x x x x x xx xxxxxxxxxxxxxxxxxxxxxx x x x x x x x x xx xx xx xx x x x xxxx x x xx xx x xxxxxxxxxx xxxxx x x xxx xxx xx xxx xx x x x x x x x x xx x x xxxx xx xx xxxxxxxxxxxxxxxxxxxxxxxxxxxx xx xxx xxxxxxx xx x x x x x x xx xx x (CT), xxx xxxxxxxxxxxxxxxxxx xx x x x x x x x x x x x 5 xxxxxxx x xx xxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx x xx xxxxx xx xx xxx xx xx x x x x x xx x x φ xxxxxxxxxxxxxxxxxxxxxxxx xxxxx xx xxxxxx x xx x xx xxx x x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xx xxxxxxx x x xxxx xxx xx x x x x xxxxxxxxxxxxxxxxxxxxxxxxxxxxx x xxx xxxxx xxxxxxxxxxxxx x x x x xx x xxxxxxxxxxxxxxxxxxxxx xxx x x x xxxxxxxxxxxxxxxxx xxxxxx x xxxxxxxxxxxxxxxxxxxx xx x x xxxxxxxxx xxxxxxxxxxxxxx x x x xx xx xxx xxx x x x x xxxxxxxxxxx xx x xxxxxxxxxxx xxxx xxxxx x x xxx x x x x x x xx x 0 xxxxx xx x x x xx xxxxxx xxxxxxxxxxxx x xx x x x x x xx x x x

0 5 10 15 20 Stoeckel08 , n= 538 Toepel08 , n= 350 φ (CT), Stoeckel08, n= 2345 ccc= 0.13 , p< 0.0029 ccc= −0.16 , p< 0.0031 ccc= −0.51 , p<< 0.001

Figure S1: Expression phase similarity across cyanobacterial strains by pairwise comparison between available datasets. All possible pairwise combinations of available circadian datasets are compared with respect to peak expression phase, considering only genes, which oscillate significantly (fdr < 0.05) in both datasets. While phases are compared directly for same-strain combinations, gene pairs across different strains are derived via homology prediction. The respective dataset is shown on each axis, the count of homologous genes significantly oscillating is provided with the x-axis label, together with the circular correlation coefficient ρccc and the resulting p-value.

19 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Figure S2: Multiple sequence alignment of KaiA. Conserved regions are highlight: ≤100% (black), ≤80% (grey), ≤60% (light grey), ≤40% (white).

Table S1: Proteins of the circadian clock used as queries for the reciprocal BLAST analysis. Organism Name Function Protein Name Gene ID KaiA Synpcc7942 1218 Core KaiB Synpcc7942 1217 KaiC Synpcc7942 1216 Pex Synpcc7942 0677 LdpA Synpcc7942 0624 NhtA Synpcc7942 2160 Input PrkE Synpcc7942 0600 IrcA Synpcc7942 2383 Synechococcus elongatus PCC 7942 CdpA Synpcc7942 1604 Input/Output CikA Synpcc7942 0644 SasA Synpcc7942 2114 LabA Synpcc7942 1891 LalA Synpcc7942 1143 Output Crm Synpcc7942 0096 RpaA Synpcc7942 0095 RpaB Synpcc7942 1453 CpmA Synpcc7942 1168 KaiB1 slr0757 Core KaiC1 slr0758 KaiB2 sll1596 Synechocystis sp. PCC 6803 KaiC2 sll1595 KaiB3 sll0486 KaiC3 slr1942

20 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Table S2: Distribution of Synechococcus 7942 based circadian clock proteins in the four main groups. The percentage indicates the coverage of the orthologs per group. Group Core Clock Input Pathway Output Pathway KaiA (94.29%), Pex (62.86%), CikA (94.29%), KaiB (94.29%), LdpA (97.14%), SasA (100%), KaiC (100%) CikA (94.29%), LabA (88.57%), NhtA (48.57%), LalA (91.43%), Cyanobacteria PrkE (85.71%), Crm (45.71%), IrcA (97.14%), RpaA (97.14%), CdpA (74.29%) RpaB (100%), CpmA (97.14%) KaiB (28.13%), CikA (56.25%), CikA (56.25%), KaiC (100%) NhtA (26.56%), SasA (32.81%), PrkE (51.56%) LabA (4.69%), LalA (39.06%), Proteobacteria Crm (20.31%), RpaA (7.81%), RpaB (48.44%), CpmA (21.86%) KaiB (6.06%), CikA (12.12%), CikA (12.12%), KaiC (100%) NhtA (22.73%), SasA (3.03%), PrkE (13.64%) LabA (1.51%), Archaea LalA (33.33%), Crm (7.58%), CpmA (63.64%) KaiB (31.43%), CikA (25.71%), CikA (25.71%), KaiC (100%) NhtA (17.14%), SasA (31.43%), PrkE (71.43%), LalA (5.71%), Other IrcA (54.29%) Crm (2.86%), RpaA (11.43%), RpaB (48.57%), CpmA (51.43%)

21 bioRxiv preprint Table S3: A collection of circadian and diurnal expression datasets in the cyanobacterial clade. Datasets with assigned abbreviation were used to was notcertifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmade determine the core oscillatory genome. Information is provided about the ability to fix nitrogen, the habitat (freshwater F, saltwater S), the total number of genes, the publication reference, the absolute and relative number of diurnally expressed genes reported in the original publications, the applied light and sampling schema, the experimental culture conditions, and the methods for microarray normalisation and oscillating gene detection. References to datasets employed in the following

comparison are shown bold. doi:

Strain Strain N2 Habitat Total Ref. Dataset Diurnal Light Conditions Culture Conditions Normalization / Oscilla- https://doi.org/10.1101/075291 Abbrev. Fixa- Genes Abbrev. Genes tion Detection tion Prochlorcoccus ProMED4 - M 1766 [133] zinser09 1403 LD (14:10, Tsamp Pro99 Medium, RMA, Fourier Analysis marinus (79%) 2h) stirred, 24◦C, Batch MED4 Synechocystis Syn6803 - F 3628 [76] leh13 1133 LD (12:12, Tsamp ir- BG11 Medium, air LOS, Fourier Analysis ◦

sp. PCC 6803 (31%) reg.) bubbling, 30 C, available undera Batch [103] beck14 (27%) LD (12:12, T BG11 Medium, air LOS, Fourier Analysis samp ; 2h) bubbling, 30◦C, this versionpostedSeptember15,2016. Batch

[72] - 1349 LD (14:10, Tsamp BG11 Medium, Standorf Microarray CC-BY 4.0Internationallicense (37%) 2h) stirred, 3% CO2air Database standard, bubbling, 27◦C, ANOVA and correlation Turbidostat with light 22 [73] - 237 LL (Tsamp 4h) BG11 Medium, air LOWESS, modified (9%) bubbling, stirred, Cosiner 30◦C, Batch with manual dilution Synechococcus Syc7942 - F 2719 [113] vijayan09 1748 LL (Tsamp 4h) BG11 Medium, 1% Loess and Quantile, elongatus (64%) air CO2 bubbling, Fourier Analysis . PCC 7942 30◦C, Continuous The copyrightholderforthispreprint(which Culture ◦ [101] ito09 800 LL (Tsamp 2h) BG11 Medium, 30 C, Replicate Mean Polish- (29%) Continuous Culture ing, Correlation to Sine Microcystis Mic7806 - F 6360 [134] straub11 1344 LD (12:12, Tsamp ir- BG11 Medium, 1% LOWESS, significant aeruginosa (21%) reg.) air CO2 bubbling, difference to CT0 PCC 7806 22◦C, Batch Anabaena sp. Ana7120 • F 6222 [75] kushige13 78 LL (Tsamp 4h) BG11 + N Medium, Replicate Mean Polish- PCC 7120 (1.25%) 30◦C, Contiuous Cul- ing, Correlation to Sine ture ◦ Cyanothece Cyn51142 • M 5354 [114] stoeckel08 1445 LD (12:12, Tsamp ASP2 Medium, 30 C, LOWESS, Correlation ATCC 51142 (≈ 30%) 4h) air bubbling, Batch Network [115] toepel08 1424 LD (12:12)/24h, LL ASP2 Medium, 30◦C, LOWESS, Differential (≈ 20%) Tsamp 4h) Airlift Bioreactor Expression ◦ [74] - 1400 LD (6:6, Tsamp 2h) ASP2 Medium, 30 C, LOWESS, Correlation (27%) Airlift Bioreactor Network bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

Table S4: Oligonucleatides used for cloning. Number Primer Name Primer Sequence ORF 1 fw-kaiC1-Syn6714-BamHI CTACGGATCCAACTCACCCA D082 30580 TCGTTAACG 2 rev-kaiC1-Syn6714-NotI GAAGCGGCCGCCTACTCGAC D082 30580 GGTTTTATC 3 fw-kaiC1-Npun29133-BamHI CTACGGATCCAGTCAAAACG Npun R2886 AGCAAG 4 rev-kaiC1-Npun29133-NotI CGAAGCGGCCGCTTAGGGTT Npun R2886 CGGAAC 5 fw-kaiC1-Cy7424-BamHI CATAGGATCCAATGAACCCA PCC7424 0599 TTCCCAACG 6 rev-kaiC1-Cy7424-NotI CATTGCGGCCGCTTATTCAT PCC7424 0599 CTAAAGTTTTATC 7 fw-kaiC3-Cy7424-BamHI CGAAGGATCCAATCAAGACA PCC7424 3006 ACGAAC 8 rev-kaiC3-Cy7424-NotI CTGTGCGGCCGCCTAAGACC PCC7424 3006 GTTCTTCAAAC 9 fw-kaiC3-Mic7806-BamHI CTACGGATCCACGCAAAATA IPF 2046 ATCCCCTAG 10 rev-kaiC3-Mic7806-NotI GAAGCGGCCGCCTAACTACG IPF 2046 ATCCTCA 11 fw-KaiC3-PH RS03935 CCCGGATCCGATGCTCTTAA Gene ID: BamHI (DSM 12428) TTGTTGGAACTCC 1443164 12 rev-KaiC3-PH RS03935 CGGGGAAGCTTTTACTCATA Gene ID: HindIII (DSM 12428) AATTTCCACCCCTC 1443164 13 fw-KaiC3-OCC RS02010 GGGCTGCAGATGAGCAGAAC Gene ID: PstI (DSM 5473) GGGAATTG 16548747 14 rev-KaiC3-OCC RS02010 GCCGGGAAGCTTTTATTCAT Gene ID: HindIII (DSM 5473) AAATTTCCACCC 16548747

Table S5: Diurnal core CLOGs across cyanobacterial strains excluding Microcystis aeruginosa PCC 7806. In every CLOG (row) at least one gene of each of the considered datasets (columns) ex- hibited diurnal expression.

Syn6803 Syc7942 ProMED4 Cyn51142 Ana7120 Function sll0158 1085 PMM0584 cce 2248 all0713 1,4-alpha-glucan branching enzyme sll1817 2210 PMM1536 cce 4038 all4192 30S ribosomal protein S11 sll1096 0887 PMM1511 cce 4091 all4340 30S ribosomal protein S12 ssl3437 2223 PMM1549 cce 4023 asl4206 30S ribosomal protein S17 ssl3432 2228 PMM1554 cce 4018 asl4211 30S ribosomal protein S19 sll1260 2530 PMM0753 cce 0705 all4792 30S ribosomal protein S2 sll1804 2226 PMM1552 cce 4020 all4209 30S ribosomal protein S3 sll1812 2216 PMM1542 cce 4031 all4199 30S ribosomal protein S5 sll1097 0886 PMM1510 cce 4090 all4339 30S ribosomal protein S7 sll1809 2219 PMM1545 cce 4028 all4202 30S ribosomal protein S8 sll1822 2205 PMM1531 cce 4043 all4187 30S ribosomal protein S9 sll1821 2206 PMM1532 cce 4042 all4188 50S ribosomal protein L13 sll1802 2229 PMM1555 cce 4017 all4212 50S ribosomal protein L2 slr1678 1219 PMM1344 cce 1391 all0147 50S ribosomal protein L21 sll1803 2227 PMM1553 cce 4019 all4210 50S ribosomal protein L22 sll1807 2221 PMM1547 cce 4025 asl4204 50S ribosomal protein L24 sll1799 2232 PMM1558 cce 4013 all4215 50S ribosomal protein L3 sll1800 2231 PMM1557 cce 4015 all4214 50S ribosomal protein L4 sll1810 2218 PMM1544 cce 4029 all4201 50S ribosomal protein L6 sll0329 0039 PMM0770 cce 3746 alr5275 6-phosphogluconate dehydrogenase sll1323 0333 PMM1454 cce 4485 all0008 ATP synthase subunit b’ of CF(0)

23 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

sll1908 1501 PMM1354 cce 2134 alr1890 D-3-phosphoglycerate dehydroge- nase sll0519 1343 PMM0160 cce 2224 alr0223 NADH dehydrogenase subunit 1 slr0331 1976 PMM0150 cce 4299 alr3957 NADH dehydrogenase subunit 4 (in- volved in photosystem-1 cyclic elec- tron flow) sll0689 2359 PMM1600 cce 1688, alr0091, Na+/H+ antiporter cce 2468 all1303 sll1818 2209 PMM1535 cce 4039 all4191 RNA polymerase alpha subunit slr1265 1523 PMM1484 cce 3838 alr1595 RNA polymerase gamma-subunit sll0306, 1746, 0672 PMM1629 cce 3594, alr3810, RNA polymerase group 2 sigma fac- sll2012 cce 0644 alr3800 tor slr1424 1740 PMM0021 cce 2372 alr5066 UDP-N- acetylenolpyruvoylglucosamine reductase sll0108 0442 PMM0263 cce 3261 alr0991 ammonium/methylammonium per- mease ssl2667 0450 PMM0418 cce 1017 asr1309 an assembly factor for iron-sulfur culsters slr0585 0009 PMM1707 cce 4370 alr4798 argininosuccinate synthetase slr0549 1848 PMM1654 cce 0293 all3680 aspartate beta-semialdehyde dehy- drogenese slr1720 1313 PMM1688 cce 4152 all2436 aspartyl-tRNA synthetase sll1498 2122 PMM0951 cce 0902 alr1155 carbamoyl-phosphate synthase small chain sll1028, 1421 PMM0549 cce 4283, alr0867, carbon dioxide concentrating mecha- sll1029 cce 4282 all0868 nism protein CcmK slr1390 0942 PMM0743 cce 1270 all3642 cell division protein FtsH sll0109 1915 PMM1181 cce 3372 all4012 chorismate mutase slr0757 1217 PMM1343 cce 0423 alr2885 circadian clock protein KaiB ho- molog slr1138 2604 PMM0444 cce 1975 alr0952 cytochrome c oxidase subunit III slr0550 1847 PMM1653 cce 0294 all3679 dihydrodipicolinate synthase slr1626 0040 PMM0591 cce 4420 alr3512 dihydroneopterin aldolase slr2026 2303 PMM0830 cce 4305 alr4386 dihydropteroate synthase slr1051 0126 PMM0282 cce 0460 all4391 enoyl-[acyl-carrier-protein] reductase sll0018 1443 PMM0781 cce 1357 all4563 fructose-bisphosphate aldolase, class II slr1843 2334 PMM1074 cce 2536 all4019 glucose 6-phosphate dehydrogenase slr0638 2457 PMM1165 cce 3990 all1985 glycyl-tRNA synthetase alpha chain slr1718 2589 PMM0614 cce 1018 all2568 hypothetical protein sll0098 1758 PMM1481 cce 2727 all0355 hypothetical protein sll0996 2374 PMM1305 cce 2810 alr1312 hypothetical protein slr1847 0464 PMM0020 cce 4379 alr5067 hypothetical protein slr1471 1617 PMM1186 cce 1364 alr3415 hypothetical protein slr1577 0426 PMM0909 cce 0429 all5166 hypothetical protein ssr0332 0658 PMM1563 cce 4619 asr4076 hypothetical protein slr0742 2023 PMM1491 cce 1842 alr3828 hypothetical protein ssl0353 2017 PMM0061 cce 0650 asl0940 hypothetical protein slr0362 2396 PMM0789 cce 0268 alr3102 hypothetical protein sll1898 2601 PMM0447 cce 4599 all0949 hypothetical protein slr1896 0286 PMM0390 cce 2055 all0876 hypothetical protein sll0372 0362 PMM0239 cce 2501 all2849 hypothetical protein sll1866 1864 PMM0319 cce 4203 alr0116 hypothetical protein sll0854 0826 PMM1517 cce 3387 all3378 hypothetical protein slr0506 2503 PMM0542 cce 0320 all1743 light-dependent NADPH- protochlorophyllide oxidoreductase slr0772 1838 PMM0544 cce 1954 alr3441 light-independent protochlorophyl- lide reductase subunit ChlB slr1540 1826 PMM0797 cce 3779 alr4831 mRNA-binding protein

24 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

slr1055 2137 PMM0831 cce 4358 all4365 magnesium protoporphyrin IX chelatase subunit H slr2033 1179 PMM0295 cce 1309 alr3843 membrane-associated rubredoxin, essential for photosystem I assembly sll0902 2514 PMM1263 cce 3251 alr4907 ornithine carbamoyltransferase sll1553 1293 PMM0871 cce 1321 alr4958 phenylalanyl-tRNA synthetase sll0684 2441 PMM0725 cce 0883 all4572 phosphate transport ATP-binding protein PstB homolog slr0597 0396 PMM0266 cce 4354 all3093 phosphoribosyl aminoimidazole carboxy formyl formyltrans- ferase/inosinemonophosphate cyclohydrolase (PUR-H(J)) slr1645 0343 PMM0507 cce 3633 all1258 photosystem II 11 kD protein sll0851 0656 PMM1158 cce 0659 alr4291 photosystem II CP43 protein slr1311, 0893, PMM0223 cce 3411, alr3742, photosystem II D1 protein sll1867, 1389, cce 0267, alr4866, slr1181 0424 cce 3501, alr4592, cce 0636 alr3727, all3572 slr0906 0697 PMM0315 cce 1837 all0138 photosystem II core light harvesting protein sll0427 0294 PMM0228 cce 2572 all3854 photosystem II manganese- stabilizing polypeptide sll0849, 1637, 0655 PMM1157 cce 2485, alr4290, photosystem II reaction center D2 slr0927 cce 0660 alr4548 protein sll0171 2308 PMM1687 cce 4346 all4609 probable aminomethyltransferase slr1673 1199 PMM1299 cce 0402 alr0175 probable tRNA/rRNA methyltrans- ferase slr0774 0142 PMM0929 cce 4646 all0121 protein-export membrane protein SecD sll1786 1521 PMM1486 cce 3489 alr1593 putative deoxyribonuclease, tatD ho- molog slr2034 1178 PMM0296 cce 1308 alr3844 putative homolog of plant HCF136, which is essential for stability or as- sembly of photosystem II sll1841 1068 PMM0405 cce 2750 alr3606 pyruvate dehydrogenase dihy- drolipoamide acetyltransferase component (E2) sll1282 2244 PMM1643 cce 4679 alr3993 riboflavin synthase beta subunit slr0194 0584 PMM1489 cce 0103 all0888 ribose 5-phosphate isomerase slr0012 1427 PMM0551 cce 3164 alr1526 ribulose bisphosphate carboxylase small subunit slr0743 2022 PMM1492 cce 1841 alr3829 similar to N utilization substance protein ssr1600 2121 PMM0950 cce 1801 asr1156 similar to anti-sigma f factor antago- nist sll1820 2207 PMM1533 cce 4041 all4189 tRNA pseudouridine synthase 1 sll1980 2128 PMM0242 cce 0972 alr0570 thiol:disulfide interchange protein TrxA sll1615 1582 PMM0189 cce 4596 all4677 thiophen and furan oxidation protein sll0755 2309 PMM0856 cce 2409 alr4641 thioredoxin peroxidase slr1793 2297 PMM0519 cce 4687 all2563 transaldolase sll1957 0688 PMM0714 cce 3556 alr2766 transcriptional regulator slr1884 1308 PMM0598 cce 3534 all1269 tryptophanyl-tRNA synthetase

25 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

555 References

556 [1] Y. Ouyang, C. R. Andersson, T. Kondo, S. S. Golden, and C. H. Johnson. Resonating circadian

557 clocks enhance fitness in cyanobacteria. Proceedings of the National Academy of Sciences of the

558 United States of America, 95(15):8660–4, 1998.

559 [2] M. A. Woelfle, Y. Ouyang, K. Phanvijhitsiri, and C. H. Johnson. The adaptive value of circadian

560 clocks: an experimental assessment in cyanobacteria. Current Biology, 14(16):1481–6, 2004.

561 [3] M. J. Simons. The evolution of the cyanobacterial posttranslational clock from a primitive ”phoscil-

562 lator”. Journal of Biological Rhythms, 24(3):175–82, 2009.

563 [4] J.L. Ditty, S.B. Williams, and S.S. Golden. A Cyanobacterial Circadian Timing Mechanism. Annual

564 Review of Genetics, 37(1):513–543, 2003.

565 [5] C. H. Johnson, P. L. Stewart, and M. Egli. The cyanobacterial circadian system: from biophysics

566 to bioevolution. Annual Review of Biophysics, 40:143–67, 2011.

567 [6] S. S. Golden and S. R. Canales. Cyanobacterial circadian clocks–timing is everything. Nature

568 Review Microbiology, 1(3):191–9, 2003.

569 [7] C. S. Pittendrigh. Temporal organization: reflections of a darwinian clock-watcher. Annual Review

570 of Physiology, 55:16–54, 1993.

571 [8] B. M. Sweeney and M. B. Borgese. A in cell division in a , the

572 cyanobacterium synechococcus wh7803. Journal of Phycology, 25(1):183–186, 1989.

573 [9] A. Mitsui, S. Kumazawa, A. Takahashi, H. Ikemoto, S. Cao, and T. Arai. Strategy by which

574 nitrogen-fixing unicellular cyanobacteria grow photoautotrophically. Nature, 323(6090):720–722,

575 1986.

576 [10] T. C. Huang and T. J. Chow. New type of n-2-fixing unicellular cyanobacterium (blue-green-alga).

577 FEMS Microbiology Letters, 36(1):109–110, 1986.

578 [11] N. Grobbelaar, T. C. Huang, H. Y. Lin, and T. J. Chow. Dinitrogen-fixing endogenous rhythm in

579 synechococcus rf-1. FEMS Microbiology Letters, 37(2):173–177, 1986.

580 [12] T. C. Huang and T. J. Chow. Characterization of the rhythmic nitrogen-fixing activity of syne-

581 chococcus sp rf-1 at the transcription level. Current Microbiology, 20(1):23–26, 1990.

582 [13] J. Tomita, M. Nakajima, T. Kondo, and H. Iwasaki. No transcription-translation feedback in

583 circadian rhythm of phosphorylation. Science, 307(5707):251–4, 2005.

584 [14] M. Nakajima, K. Imai, H. Ito, T. Nishiwaki, Y. Murayama, H. Iwasaki, T. Oyama, and T. Kondo.

585 Reconstitution of circadian oscillation of cyanobacterial kaic phosphorylation in vitro. Science,

586 308(5720):414–5, 2005.

587 [15] J. L. Ditty, S. R. Canales, B. E. Anderson, S. B. Williams, and S. S. Golden. Stability of the

588 synechococcus elongatus pcc 7942 circadian clock under directed anti-phase expression of the kai

589 genes. Microbiology, 151(Pt 8):2605–2613, 2005.

590 [16] R. S. Edgar, E. W. Green, Y. Zhao, G. van Ooijen, M. Olmedo, X. Qin, Y. Xu, M. Pan, U. K.

591 Valekunja, K. A. Feeney, E. S. Maywood, M. H. Hastings, N. S. Baliga, M. Merrow, A. J. Millar,

592 C. H. Johnson, C. P. Kyriacou, J. S. O’Neill, and A. B. Reddy. Peroxiredoxins are conserved

593 markers of circadian rhythms. Nature, 485(7399):459–64, 2012.

594 [17] J. S. O’Neill and A. B. Reddy. Circadian clocks in human red blood cells. Nature, 469(7331):498–

595 503, 2011.

596 [18] J. S. O’Neill, G. van Ooijen, L. E. Dixon, C. Troein, F. Corellou, F.-Y. Bouget, A. B. Reddy, and

597 A. J. Millar. Circadian rhythms persist without transcription in a eukaryote. Nature, 469(7331):554–

598 8, 2011.

599 [19] C. R. McClung. Circadian rhythms in plants. Annual Review of Plant Physiology and Plant

600 Molecular Biology, 52:139–162, 2001.

26 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

601 [20] Natalia B Ivleva, Matthew R Bramlett, Paul a Lindahl, and Susan S Golden. Ldpa: a component

602 of the circadian clock senses redox state of the cell. The EMBO Journal, 24(6):1202–1210, 2005.

603 [21] Gopal K. Pattanayak and Michael J. Rust. The Cyanobacterial Clock and Metabolism. Current

604 Opinion in Microbiology, 18:90–95, 2014.

605 [22] Y.-I. Kim, D. J. Vinyard, G. M. Ananyev, G. C. Dismukes, and S. S. Golden. Oxidized quinones

606 signal onset of darkness directly to the cyanobacterial circadian oscillator. Proceedings of the

607 National Academy of Sciences of the United States of America, 109(44):17765–17769, 2012.

608 [23] Y. Hihara, K. Sonoike, M. Kanehisa, and M. Ikeuchi. Dna microarray analysis of redox-responsive

609 genes in the genome of the cyanobacterium synechocystis sp. strain pcc 6803. Journal of Bacteri-

610 ology, 185(5):1719–25, 2003.

611 [24] F. Hayashi, H. Suzuki, R. Iwase, T. Uzumaki, A. Miyake, J. R. Shen, K. Imada, Y. Furukawa,

612 K. Yonekura, K. Namba, and M. Ishiura. Atp-induced hexameric ring structure of the cyanobacte-

613 rial circadian clock protein kaic. Genes to Cells, 8(3):287–96, 2003.

614 [25] T. Mori, S. V. Saveliev, Y. Xu, W. F. Stafford, M. M. Cox, R. B. Inman, and C. H. Johnson.

615 Circadian clock protein kaic forms atp-dependent hexameric rings and binds dna. Proceedings of

616 the National Academy of Sciences of the United States of America, 99(26):17203–8, 2002.

617 [26] T Nishiwaki, H Iwasaki, M Ishiura, and T Kondo. Nucleotide binding and autophosphorylation of

618 the clock protein KaiC as a circadian timing process of cyanobacteria. Proceedings of the National

619 Academy of Sciences of the United States of America, 97(1):495–499, 2000.

620 [27] Taeko Nishiwaki and Takao Kondo. Circadian autodephosphorylation of cyanobacterial clock

621 protein KaiC occurs via formation of ATP as intermediate. Journal of Biological Chemistry,

622 287(22):18030–18035, 2012.

623 [28] Y.-I. Kim, G. Dong, Jr. Carruthers, C. W., S. S. Golden, and A. LiWang. The day/night switch

624 in kaic, a central oscillator component of the circadian clock of cyanobacteria. Proceedings of the

625 National Academy of Sciences of the United States of America, 105(35):12825–30, 2008.

626 [29] R. Pattanayek, D. R. Williams, S. Pattanayek, Y. Xu, T. Mori, C. H. Johnson, P. L. Stewart,

627 and M. Egli. Analysis of -kaic protein interactions in the cyano-bacterial circadian clock using

628 hybrid structural methods. The EMBO Journal, 25(9):2017–28, 2006.

629 [30] Michael J Rust, Susan S Golden, and Erin K O’Shea. Light-driven changes in energy metabolism

630 directly entrain the cyanobacterial circadian oscillator. Science, 331(6014):220–3, 2011.

631 [31] Y. Kitayama, H. Iwasaki, T. Nishiwaki, and T. Kondo. Kaib functions as an attenuator of kaic

632 phosphorylation in the cyanobacterial circadian clock system. The EMBO Journal, 22(9):2127–34,

633 2003.

634 [32] S. B. Williams, I. Vakonakis, S. S. Golden, and A. C. LiWang. Structure and function from

635 the circadian clock protein kaia of synechococcus elongatus: a potential clock input mechanism.

636 Proceedings of the National Academy of Sciences of the United States of America, 99(24):15357–62,

637 2002.

638 [33] T. L. Wood, J. Bridwell-Rabb, Y. I. Kim, T. Gao, Y. G. Chang, A. LiWang, D. P. Barondeau,

639 and S. S. Golden. The kaia protein of the cyanobacterial circadian oscillator is modulated by a

640 redox-active cofactor. Proceedings of the National Academy of Sciences of the United States of

641 America, 107(13):5804–9, 2010.

642 [34] Shinsuke Kutsuna, Takao Kondo, and Setsuyuki Aoki. A Period-Extender Gene , pex , That

643 Extends the Period of the Circadian Clock in the Cyanobacterium Synechococcus sp . Strain A

644 Period-Extender Gene , pex , That Extends the Period of the Circadian Clock in the Cyanobac-

645 terium Synechococcus sp . Strain PCC. Journal of Bacteriology, 180(8):2167–2174, 1998.

646 [35] Shinsuke Kutsuna, Takao Kondo, Haruki Ikegami, Tatsuya Uzumaki, Mitsunori Katayama, and

647 Masahiro Ishiura. The circadian clock-related gene pex regulates a negative cis element in the kaiA

648 promoter region. Journal of Bacteriology, 189(21):7690–7696, 2007.

27 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

649 [36] Natalia B Ivleva, Tiyu Gao, Andy C LiWang, and Susan S Golden. Quinone sensing by the circadian

650 input kinase of the cyanobacterial circadian clock. Proceedings of the National Academy of Sciences

651 of the United States of America, 103(46):17468–17473, 2006.

652 [37] O Schmitz, M Katayama, S B Williams, T Kondo, and S S Golden. Cika, a bacteriophytochrome

653 that resets the cyanobacterial circadian clock. Science, 289(5480):765–768, 2000.

654 [38] Shannon R. Mackey, Jong Soon Choi, Yohko Kitayama, Hideo Iwasaki, Guogang Dong, and Susan S.

655 Golden. Proteins found in a CikA interaction assay link the orcadian clock, metabolism, and cell

656 division in synechococcus elongatus. Journal of Bacteriology, 190(10):3738–3746, 2008.

657 [39] Hideo Iwasaki, Stanly B. Williams, Yohko Kitayama, Masahiro Ishiura, Susan S. Golden, and Takao

658 Kondo. A KaiC-interacting sensory histidine kinase, SasA, necessary to sustain robust circadian

659 oscillation in cyanobacteria. Cell, 101(2):223–233, 2000.

660 [40] R. M. Smith and S. B. Williams. Circadian rhythms in gene transcription imparted by chromosome

661 compaction in the cyanobacterium synechococcus elongatus. Proceedings of the National Academy

662 of Sciences of the United States of America, 103(22):8564–9, 2006.

663 [41] Naoki Takai, Masato Nakajima, Tokitaka Oyama, Ryotaku Kito, Chieko Sugita, Mamoru Sugita,

664 Takao Kondo, and Hideo Iwasaki. A KaiC-associating SasA-RpaA two-component regulatory sys-

665 tem as a major circadian timing mediator in cyanobacteria. Proceedings of the National Academy

666 of Sciences of the United States of America, 103(32):12109–12114, 2006.

667 [42] Mitsumasa Hanaoka, Naoki Takai, Norimune Hosokawa, Masayuki Fujiwara, Yuki Akimoto, Nami

668 Kobori, Hideo Iwasaki, Takao Kondo, and Kan Tanaka. RpaB, another response regulator operating

669 circadian clock-dependent transcriptional regulation in Synechococcus elongatus PCC 7942. Journal

670 of Biological Chemistry, 287(31):26321–26327, 2012.

671 [43] Joseph S. Markson, Joseph R. Piechura, Anna M. Puszynska, and Erin K. O’Shea. Circadian control

672 of global gene expression by the cyanobacterial master regulator rpaa. Cell, 155(6):1396–1408, 2013.

673 [44] Andrian Gutu and Erin K. O’Shea. Two Antagonistic Clock-Regulated Histidine Kinases Time the

674 Activation of Circadian Gene Expression. Molecular Cell, 50(2):288–294, 2013.

675 [45] Yasuhito Taniguchi, Tomoe Nishikawa, Takao Kondo, and Tokitaka Oyama. Overexpression of

676 lalA, a paralog of labA, is capable of affecting both circadian gene expression and cell growth in

677 the cyanobacterium Synechococcus elongatus PCC 7942. FEBS Letters, 586(6):753–759, 2012.

678 [46] Joseph S Boyd, Juliana R Bordowitz, Anna C Bree, and Susan S Golden. An allele of the crm gene

679 blocks cyanobacterial circadian rhythms. Proceedings of the National Academy of Sciences of the

680 United States of America, 110(34):13950–5, 2013.

681 [47] Javier Espinosa, Joseph S. Boyd, Raquel Cantos, Paloma Salinas, Susan S. Golden, and Asuncion

682 Contreras. Cross-talk and regulatory interactions between the essential response regulator RpaB

683 and cyanobacterial circadian clock output. Proceedings of the National Academy of Sciences of the

684 United States of America, page 201424632, 2015.

685 [48] F. Moronta-Barrios, J. Espinosa, and A. Contreras. In vivo features of signal transduction by the

686 essential response regulator rpab from synechococcus elongatus pcc 7942. Microbiology, 158(Pt

687 5):1229–37, 2012.

688 [49] Mitsunori Katayama, Nicholas F. Tsinoremas, Takao Kondo, and Susan S. Golden. cpma, a gene

689 involved in an output pathway of the cyanobacterial circadian system. Journal of Bacteriology,

690 181(11):3516–3524, 1999.

691 [50] Yasuhito Taniguchi, Mitsunori Katayama, Rie Ito, Naoki Takai, Takao Kondo, and Tokitaka

692 Oyama. laba : a novel gene required for negative feedback regulation of the cyanobacterial cir-

693 cadian clock protein kaic. Genes and Development, pages 60–70, 2007.

694 [51] F. Moronta-Barrios, J. Espinosa, and A. Contreras. Negative control of cell size in the cyanobac-

695 terium synechococcus elongatus pcc 7942 by the essential response regulator rpab. FEBS Letters,

696 587(5):504–9, 2013.

697 [52] V. Dvornyk. The Circadian Clock Gear in Cyanobacteria: Assembled by Evolution, book section 14,

698 pages 241–258. Springer, Berlin Heidelberg, 2009.

28 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

699 [53] Volodymyr Dvornyk, Oxana Vinogradova, and Eviatar Nevo. Origin and evolution of circadian

700 clock genes in prokaryotes. Proceedings of the National Academy of Sciences of the United States

701 of America, 100(5):2495–2500, 2003.

702 [54] Ilka M. Axmann, Stefanie Hertel, Anika Wiegard, Anja K. D¨orrich, and Annegret Wilde. Diversity

703 of KaiC-based timing systems in marine Cyanobacteria. Marine Genomics, 14:3–16, 2014.

704 [55] Anika Wiegard, Anja K. D¨orrich, Hans Tobias Deinzer, Christian Beck, Annegret Wilde, Julia

705 Holtzendorff, and Ilka M. Axmann. Biochemical analysis of three putative kaic clock proteins from

706 synechocystis sp. pcc 6803 suggests their functional divergence. Microbiology, 159(PART 5):948–

707 958, 2013.

708 [56] S. Aoki and K. Onai. Circadian clocks of Synechocystis sp. strain PCC 6803, Thermosynechococcus

709 elongatus, Prochlorococcus spp., Trichodesmium spp. and other species. Springer Berlin Heidelberg,

710 2009.

711 [57] P. Ma, T. Mori, C. Zhao, T. Thiel, and C. H. Johnson. Evolution of kaic-dependent timekeepers:

712 A proto-circadian timing mechanism confers adaptive fitness in the purple bacterium rhodopseu-

713 domonas palustris. PLoS Genetics, 12(3):e1005922, 2016.

714 [58] H. J. Kang, K. Kubota, H. Ming, K. Miyazono, and M. Tanokura. Crystal structure of kaic-like

715 protein ph0186 from hyperthermophilic archaea pyrococcus horikoshii ot3. Proteins, 75(4):1035–9,

716 2009.

717 [59] M. Maniscalco, J. Nannen, V. Sodi, G. Silver, P. L. Lowrey, and K. A. Bidle. Light-dependent

718 expression of four cryptic archaeal circadian gene homologs. Frontiers in Microbiology, 5:79, 2014.

719 [60] Ivan Baca, Daniel Sprockett, and Volodymyr Dvornyk. Circadian input kinases and their homologs

720 in cyanobacteria: Evolutionary constraints versus architectural diversification. Journal of Molecular

721 Evolution, 70(5):453–465, 2010.

722 [61] Volodymyr Dvornyk, Hong Wen Deng, and Eviatar Nevo. Structure and molecular phylogeny of

723 sasa genes in cyanobacteria: Insights into evolution of the prokaryotic circadian system. Molecular

724 Biology and Evolution, 21(8):1468–1476, 2004.

725 [62] P. J. Cock, T. Antao, J. T. Chang, B. A. Chapman, C. J. Cox, A. Dalke, I. Friedberg, T. Hamel-

726 ryck, F. Kauff, B. Wilczynski, and M. J. de Hoon. Biopython: freely available python tools for

727 computational molecular biology and bioinformatics. Bioinformatics, 25(11):1422–3, 2009.

728 [63] Fernando P´erezand E. Granger. Ipython: A system for interactive scientific computing, computing

729 in science and engineering. Computing in Science and Engineering, 9(3):21–29, 2007.

730 [64] Stefan van der Walt, S. Chris Colbert, and Gael Varoquaux. The numpy array: A structure for

731 efficient numerical computation. Computing in Science and Engineering, 13(2):22–30, 2011.

732 [65] J. D. Hunter. Matplotlib: A 2d graphics environment. Computing in Science and Engineering,

733 9(3):90–95, 2007.

734 [66] Wes McKinney. Data structures for statistical computing in python. In Proceedings of the 9th

735 Python in Science Conference, volume 445, pages 51–56.

736 [67] D. A. Benson, I. Karsch-Mizrachi, D. J. Lipman, J. Ostell, and D. L. Wheeler. Genbank. Nucleic

737 Acids Research, 36(Database issue):D25–30, 2008.

738 [68] Mitsuteru Nakao, Shinobu Okamoto, Mitsuyo Kohara, Tsunakazu Fujishiro, Takatomo Fujisawa,

739 Shusei Sato, Satoshi Tabata, Takakazu Kaneko, and Yasukazu Nakamura. Cyanobase: the

740 cyanobacteria genome database update 2010. Nucleic Acids Research, 38(Database issue):D379–

741 D381, 2010.

742 [69] S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman.

743 Gapped blast and psi-blast: a new generation of protein database search programs. Nucleic Acids

744 Research, 25(17):3389–402, 1997.

2 745 [70] R. A. Fisher. On the interpretation of χ from contingency tables, and the calculation of p. Journal

746 of the Royal Statistical Society, 85(1):87–94, 1922.

29 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

747 [71] Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful

748 approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological),

749 57(1):289–300, 1995.

750 [72] Rochelle G. Labiosa, Kevin R. Arrigo, Chao J. Tu, Devaki Bhaya, Stephen Bay, Arthur R. Gross-

751 man, and Jeff Shrager. Examination of diel changes in global transcript accumulation in syne-

752 chocystis (cyanobacteria). Journal of Phycology, 42(3):622–636, 2006.

753 [73] K.-i. Kucho, K. Okamoto, Y. Tsuchiya, S. Nomura, M. Nango, M. Kanehisa, and M. Ishiura. Global

754 analysis of circadian expression in the cyanobacterium synechocystis sp. strain pcc 6803. Journal

755 of Bacteriology, 187(6):2190–9, 2005.

756 [74] J. R. Toepel, J. E. McDermott, T. C. Summerfield, and L. A. Sherman. Transcriptional analysis

757 of the unicellular, diazotrophic cyanobacterium cyanothece sp. 51142 grown under short day/night

758 cycles. Journal of Phycology, 45(3):610–20, 2009.

759 [75] H. Kushige, H. Kugenuma, M. Matsuoka, S. Ehira, M. Ohmori, and H. Iwasaki. Genome-wide

760 and heterocyst-specific circadian gene expression in the filamentous cyanobacterium anabaena sp.

761 strain pcc 7120. Journal of Bacteriology, 195(6):1276–84, 2013.

762 [76] Robert Lehmann, Rainer Machn´e,Jens Georg, Manuela Benary, Ilka Axmann, and Ralf Steuer.

763 How cyanobacteria pose new problems to old methods: challenges in microarray time series analysis.

764 BMC Bioinformatics, 14:133, 2013.

765 [77] Qiong Yang, Bernardo F. Pando, Guogang Dong, Susan S. Golden, and Alexander van Oude-

766 naarden. Circadian gating of the cell cycle revealed in single cyanobacterial cells. Science,

767 327(5972):1522–1526, 2010.

768 [78] P. O. Westermark and H. Herzel. Mechanism for 12 hr rhythm generation by the circadian clock.

769 Cell Reports, 3(4):1228–38, 2013.

770 [79] F. Sievers, A. Wilm, D. Dineen, T. J. Gibson, K. Karplus, W. Li, R. Lopez, H. McWilliam,

771 M. Remmert, J. Soding, J. D. Thompson, and D. G. Higgins. Fast, scalable generation of high-

772 quality protein multiple sequence alignments using clustal omega. Molecular Systems Biology, 7:539,

773 2011.

774 [80] Andrew M. Waterhouse, James B. Procter, David M. A. Martin, Mich`eleClamp, and Geoffrey J.

775 Barton. Jalview version 2—a multiple sequence alignment editor and analysis workbench. Bioin-

776 formatics, 25(9):1189–1191, 2009.

777 [81] Gavin E. Crooks, Gary Hon, John-Marc Chandonia, and Steven E. Brenner. Weblogo: A sequence

778 logo generator. Genome Research, 14(6):1188–1190, 2004.

779 [82] U. K. Laemmli. Cleavage of structural proteins during the assembly of the head of bacteriophage

780 t4. Nature, 227(5259):680–5, 1970.

781 [83] Ilka M. Axmann, Ulf D¨uhring,Luiza Seeliger, Anne Arnold, Jens T. Vanselow, Achim Kramer,

782 and Annegret Wilde. Biochemical evidence for a timing mechanism in Prochlorococcus. Journal of

783 Bacteriology, 191(17):5342–5347, 2009.

784 [84] M. Loza-Correa, L. Gomez-Valero, and C. Buchrieser. Circadian clock proteins in prokaryotes:

785 hidden rhythms? Frontiers in Microbiology, 1:130, 2010.

786 [85] Takuro Nakayama, Ryoma Kamikawa, Goro Tanifuji, Yuichiro Kashiyama, Naohiko Ohkouchi,

787 John M. Archibald, and Yuji Inagaki. Complete genome of a nonphotosynthetic cyanobacterium

788 in a diatom reveals recent adaptations to an intracellular lifestyle. Proceedings of the National

789 Academy of Sciences of the United States of America, 111(31):11407–11412, 2014.

790 [86] Yasukazu Nakamura, Takakazu Kaneko, Shusei Sato, Mamoru Mimuro, Hideaki Miyashita, Tohru

791 Tsuchiya, Shigemi Sasamoto, Akiko Watanabe, Kumiko Kawashima, Yoshie Kishida, Chiaki

792 Kiyokawa, Mitsuyo Kohara, Midori Matsumoto, Ai Matsuno, Naomi Nakazaki, Sayaka Shimpo,

793 Chie Takeuchi, Manabu Yamada, and Satoshi Tabata. Complete genome structure of gloeobacter

794 violaceus pcc 7421, a cyanobacterium that lacks thylakoids. DNA Research, 10(4):137–145, 2003.

30 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

795 [87] Julia Holtzendorff, Fr´ed´ericPartensky, Daniella Mella, Jean-Fran¸coisLennon, Wolfgang R Hess,

796 and Laurence Garczarek. Genome streamlining results in loss of robustness of the circadian clock

797 in the marine cyanobacterium Prochlorococcus marinus PCC 9511. Journal of Biological Rhythms,

798 23(3):187–199, 2008.

799 [88] Ryan K Shultzaberger, Joseph S Boyd, Takeo Katsuki, Susan S Golden, and Ralph J Greenspan.

800 Single mutations in sasa enable a simpler δcika gene network architecture with equivalent circadian

801 properties. Proceedings of the National Academy of Sciences of the United States of America,

802 111(47):E5069–75, 2014.

803 [89] Volodymyr Dvornyk. Molecular evolution of ldpA, a gene mediating the circadian input signal in

804 cyanobacteria. Journal of Molecular Evolution, 60(1):105–112, 2005.

805 [90] Mitsunori Katayama, Takao Kondo, Jin Xiong, S Susan, and Susan S Golden. ldpA Encodes

806 an Iron-Sulfur Protein Involved in Light-Dependent Modulation of the Circadian Period in the

807 Cyanobacterium Synechococcus elongatus PCC 7942. Journal of Bacteriology, 185(4):1415–1422,

808 2003.

809 [91] H. Min, H. Guo, and J. Xiong. Rhythmic gene expression in a purple photosynthetic bacterium,

810 rhodobacter sphaeroides. FEBS Letters, 579(3):808–12, 2005.

811 [92] T. Nishiwaki-Ohkawa, Y. Kitayama, E. Ochiai, and T. Kondo. Exchange of adp with atp in the cii

812 atpase domain promotes autophosphorylation of cyanobacterial clock protein kaic. Proceedings of

813 the National Academy of Sciences of the United States of America, 111(12):4455–60, 2014.

814 [93] G. Dong, Q. Yang, Q. Wang, Y. I. Kim, T. L. Wood, K. W. Osteryoung, A. van Oudenaarden,

815 and S. S. Golden. Elevated atpase activity of kaic applies a circadian checkpoint on cell division in

816 synechococcus elongatus. Cell, 140(4):529–39, 2010.

817 [94] Y. Taniguchi, N. Takai, M. Katayama, T. Kondo, and T. Oyama. Three major output path-

818 ways from the kaiabc-based oscillator cooperate to generate robust circadian kaibc expression in

819 cyanobacteria. Proceedings of the National Academy of Sciences of the United States of America,

820 2010.

821 [95] Ioannis Vakonakis, Douglas A. Klewer, Stanly B. Williams, Susan S. Golden, and Andy C. LiWang.

822 Structure of the n-terminal domain of the circadian clock-associated histidine kinase sasa. Journal

823 of Molecular Biology, 342(1):9–17, 2004.

824 [96] Yanmei Shi, Gene W Tyson, John M Eppley, and Edward F DeLong. Integrated metatranscriptomic

825 and metagenomic analyses of stratified microbial assemblages in the open ocean. The ISME Journal,

826 5(6):999–1013, 2011.

827 [97] Annegret Wilde and Yukako Hihara. Transcriptional and posttranscriptional regulation of

828 cyanobacterial . Biochimica et Biophysica Acta - Bioenergetics, 2015.

829 [98] Ryan K. Shultzaberger, Joseph S. Boyd, Spencer Diamonad, Ralph J. Greenspan, and Susan S.

830 Golden. Giving time purpose: The synechococcus elongatus clock in a broader network context.

831 Annual Review of Genetics, 49(1):485–505, 2015.

832 [99] Y. Liu, N. F. Tsinoremas, C. H. Johnson, N. V. Lebedeva, S. S. Golden, M. Ishiura, and T. Kondo.

833 Circadian orchestration of gene expression in cyanobacteria. Genes and Development, 9(12):1469–

834 78, 1995.

835 [100] Mark A. Woelfle and Carl Hirschie Johnson. No promoter left behind: Global circadian gene

836 expression in cyanobacteria. Journal of Biological Rhythms, 21(6):419–431, 2006.

837 [101] H. Ito, M. Mutsuda, Y. Murayama, J. Tomita, N. Hosokawa, K. Terauchi, C. Sugita, M. Sugita,

838 T. Kondo, and H. Iwasaki. Cyanobacterial daily life with kai-based circadian and diurnal genome-

839 wide transcriptional control in synechococcus elongatus. Proceedings of the National Academy of

840 Sciences of the United States of America, 106(33):14168–73, 2009.

841 [102] S Rao Jammalamadaka and Ashis SenGupta. Topics in Circular Statistics. World Scientific, 2001.

31 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

842 [103] Christian Beck, Stefanie Hertel, Anne Rediger, Robert Lehmann, Anika Wiegard, Adrian K¨olsch,

843 Beate Heilmann, Jens Georg, Wolfgang R. Hess, and Ilka M. Axmann. Daily expression pattern of

844 protein-encoding genes and small noncoding RNAs in synechocystis sp. strain PCC 6803. Applied

845 and Environmental Microbiology, 80(17):5195–5206, 2014.

846 [104] Otilia Cheregi, Cosmin Sicora, Peter B. Kos, Peter J. Nixon, and Imre Vass. The ftsh protease

847 is required for the repair of photosystem ii in the cyanbacterium synechocystis6803 damaged uv-b

848 radiation. BMC Plant Biology, 5(1):1–2, 2005.

849 [105] S. Jacobshagen, J. R. Whetstine, and J. M. Boling. Many but not all genes in chlamydomonas

850 reinhardtii are regulated by the circadian clock. Plant Biology, 3(6):592–597, 2001.

851 [106] Stacey L. Harmer, John B. Hogenesch, Marty Straume, Hur-Song Chang, Bin Han, Tong Zhu, Xun

852 Wang, Joel A. Kreps, and Steve A. Kay. Orchestrated transcription of key pathways in arabidopsis

853 by the circadian clock. Science, 290(5499):2110–2113, 2000.

854 [107] Yukihiro Kabeya, Hiromitsu Nakanishi, Kenji Suzuki, Takanari Ichikawa, Youichi Kondou, Minami

855 Matsui, and Shin-ya Miyagishima. The ylmg protein has a conserved function related to the

856 distribution of nucleoids in chloroplasts and cyanobacteria. BMC Plant Biology, 10(1):1–13, 2010.

857 [108] K. Imai, T. Nishiwaki, T. Kondo, and H. Iwasaki. Circadian rhythms in the synthesis and degrada-

858 tion of a master clock protein kaic in cyanobacteria. Journal of Biological Chemistry, 279(35):36534–

859 9, 2004.

860 [109] Hideo Iwasaki, Taeko Nishiwaki, Yohko Kitayama, Masato Nakajima, and Takao Kondo. KaiA-

861 stimulated KaiC phosphorylation in circadian timing loops in cyanobacteria. Proceedings of the

862 National Academy of Sciences of the United States of America, 99(24):15788–93, 2002.

863 [110] D. Zwicker, D. K. Lubensky, and P. R. Ten Wolde. Robust circadian clocks from coupled protein-

864 modification and transcription-translation cycles. Proceedings of the National Academy of Sciences

865 of the United States of America, 107(52):22540–5, 2010.

866 [111] Danish Memon, Abhay K. Singh, Himadri B. Pakrasi, and Pramod P. Wangikar. A global analysis

867 of adaptive evolution of operons in cyanobacteria. Antonie van Leeuwenhoek, 103(2):331–346, 2013.

868 [112] M. Ishiura, S. Kutsuna, S. Aoki, H. Iwasaki, C. R. Andersson, A. Tanabe, S. S. Golden, C. H.

869 Johnson, and T. Kondo. Expression of a gene cluster kaiabc as a circadian feedback process in

870 cyanobacteria. Science, 281(5382):1519–23, 1998.

871 [113] Vikram Vijayan, Rick Zuzow, and Erin K O’Shea. Oscillations in supercoiling drive circadian gene

872 expression in cyanobacteria. Proceedings of the National Academy of Sciences of the United States

873 of America, 106(52):22564–8, 2009.

874 [114] J. St¨ockel, Eric a Welsh, Michelle Liberton, Rangesh Kunnvakkam, Rajeev Aurora, and Himadri B.

875 Pakrasi. Global transcriptomic analysis of cyanothece 51142 reveals robust diurnal oscillation of

876 central metabolic processes. Proceedings of the National Academy of Sciences of the United States

877 of America, 105(16):6156–6161, 2008.

878 [115] J¨orgToepel, Eric Welsh, Tina C. Summerfield, Himadri B. Pakrasi, and Louis A. Sherman. Dif-

879 ferential transcriptional analysis of the cyanobacterium Cyanothece sp. strain ATCC 51142 during

880 light-dark and continuous-light growth. Journal of Bacteriology, 190(11):3904–3913, 2008.

881 [116] Tatsuya Uzumaki, Masayasu Fujita, Toru Nakatsu, Fumio Hayashi, Hiroyuki Shibata, Noriyo Itoh,

882 Hiroaki Kato, and Masahiro Ishiura. Crystal structure of the c-terminal clock-oscillator domain of

883 the cyanobacterial kaia protein. Nature Structural and Molecular Biology, 11(7):623–631, 2004.

884 [117] H Iwasaki, Y Taniguchi, M Ishiura, and T Kondo. Physical interactions among circadian clock

885 proteins KaiA, KaiB and KaiC in cyanobacteria. The EMBO Journal, 18(5):1137–1145, 1999.

886 [118] M. Egli, R. Pattanayek, J. H. Sheehan, Y. Xu, T. Mori, J. A. Smith, and C. H. Johnson. Loop-loop

887 interactions regulate kaia-stimulated kaic phosphorylation in the cyanobacterial kaiabc circadian

888 clock. Biochemistry, 52(7):1208–20, 2013.

32 bioRxiv preprint doi: https://doi.org/10.1101/075291; this version posted September 15, 2016. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY 4.0 International license.

889 [119] Y. Xu, T. Mori, R. Pattanayek, S. Pattanayek, M. Egli, and C. H. Johnson. Identification of

890 key phosphorylation sites in the circadian clock protein kaic by crystallographic and mutage-

891 netic analyses. Proceedings of the National Academy of Sciences of the United States of America,

892 101(38):13933–8, 2004.

893 [120] T. Nishiwaki, Y. Satomi, M. Nakajima, C. Lee, R. Kiyohara, H. Kageyama, Y. Kitayama,

894 M. Temamoto, A. Yamaguchi, A. Hijikata, M. Go, H. Iwasaki, T. Takao, and T. Kondo. Role

895 of kaic phosphorylation in the circadian clock system of synechococcus elongatus pcc 7942. Pro-

896 ceedings of the National Academy of Sciences of the United States of America, 101(38):13927–32,

897 2004.

898 [121] I. Vakonakis and A. C. LiWang. Structure of the c-terminal domain of the clock protein kaia in

899 complex with a kaic-derived peptide: implications for kaic regulation. Proceedings of the National

900 Academy of Sciences of the United States of America, 101(30):10925–30, 2004.

901 [122] Detlef D. Leipe, L. Aravind, Nick V. Grishin, and Eugene V. Koonin. The bacterial replicative

902 helicase dnab evolved from a reca duplication. Genome Research, 10(1):5–16, 2000.

903 [123] Maria Loza-Correa, Tobias Sahr, Monica Rolando, Craig Daniels, Pierre Petit, Tania Skarina,

904 Laura Gomez Valero, Delphine Dervins-Ravault, Nadine Honor´e,Aleksey Savchenko, and Carmen

905 Buchrieser. The Legionella pneumophila kai operon is implicated in stress response and confers

906 fitness in competitive environments. Environmental Microbiology, 16(2):359–381, 2014.

907 [124] R. Pattanayek, J. Wang, T. Mori, Y. Xu, C. H. Johnson, and M. Egli. Visualizing a circadian clock

908 protein: crystal structure of kaic and functional insights. Molecular Cell, 15(3):375–88, 2004.

909 [125] J. E. Walker, M. Saraste, M. J. Runswick, and N. J. Gay. Distantly related sequences in the alpha-

910 and beta-subunits of atp synthase, myosin, kinases and other atp-requiring enzymes and a common

911 nucleotide binding fold. The EMBO Journal, 1(8):945–51, 1982.

912 [126] M. Egli, T. Mori, R. Pattanayek, Y. Xu, X. Qin, and C. H. Johnson. Dephosphorylation of the core

913 clock protein kaic in the cyanobacterial kaiabc circadian oscillator proceeds via an atp synthase

914 mechanism. Biochemistry, 51(8):1547–58, 2012.

915 [127] Y. Kitayama, T. Nishiwaki-Ohkawa, Y. Sugisawa, and T. Kondo. Kaic intersubunit communication

916 facilitates robustness of circadian rhythms in cyanobacteria. Nature Communications, 4:2897, 2013.

917 [128] R. Pattanayek, Y. Xu, A. Lamichhane, C. H. Johnson, and M. Egli. An arginine tetrad as mediator

918 of input-dependent and input-independent atpases in the clock protein kaic. Acta Crystallographica

919 Section D: Biological Crystallography, 70(Pt 5):1375–90, 2014.

920 [129] R. Pattanayek, T. Mori, Y. Xu, S. Pattanayek, C. H. Johnson, and M. Egli. Structures of kaic

921 circadian clock mutant proteins: a new phosphorylation site at t426 and mechanisms of kinase,

922 atpase and phosphatase. PLoS One, 4(11):e7529, 2009.

923 [130] Y. Xu, T. Mori, X. Qin, H. Yan, M. Egli, and C. H. Johnson. Intramolecular regulation of phos-

924 phorylation status of the circadian clock protein kaic. PLoS One, 4(11):e7509, 2009.

925 [131] Annemarie Neuner, Holger W. Jannasch, Shimshon Belkin, and Karl O. Stetter. Thermococcus

926 litoralis sp. nov.: A new species of extremely thermophilic marine archaebacteria. Archives of

927 Microbiology, 153(2):205–207, 1990.

928 [132] Juan M. Gonzalez, Yaeko Masuchi, Frank T. Robb, James W. Ammerman, Dennis L. Maeder, Miki

929 Yanagibayashi, Jin Tamaoka, and Chiaki Kato. Pyrococcus horikoshii sp. nov., a hyperthermophilic

930 archaeon isolated from a hydrothermal vent at the okinawa trough. Extremophiles, 2:123–130, 1998.

931 [133] Erik R. Zinser, Debbie Lindell, Zackary I. Johnson, Matthias E. Futschik, Claudia Steglich, Mau-

932 reen L. Coleman, Matthew A. Wright, Trent Rector, Robert Steen, Nathan McNulty, Luke R.

933 Thompson, and Sallie W. Chisholm. Choreography of the transcriptome, photophysiology, and cell

934 cycle of a minimal photoautotroph, Prochlorococcus. PLoS One, 4(4), 2009.

935 [134] C´ecileStraub, Philippe Quillardet, Julia Vergalli, Nicole Tandeau de Marsac, and Jean Francois

936 Humbert. A day in the life of microcystis aeruginosa strain pcc 7806 as revealed by a transcriptomic

937 analysis. PLoS One, 6(1), 2011.

33