bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

Functional innovation in the evolution of the -dependent system of the eukaryotic

1 Daniel E. Schäffer1,2†, Lakshminarayan M. Iyer 1, A. Maxwell Burroughs1, L. Aravind1*

2 1National Center for Biotechnology Information, National Library of Medicine, National Institutes of 3 Health, Bethesda, MD, USA 4 2Science, Mathematics, and Computer Science Magnet Program, Montgomery Blair High School, 5 Silver Spring, MD, USA.

6 * Correspondence: 7 L. Aravind 8 [email protected]

9 †School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

10 Keywords: calcium binding, calcium stores, , channels, eukaryote origins, 11 endomembranes, SOCE pathway, wolframin.

12 Abstract

13 The origin of eukaryotes was marked by the emergence of several novel subcellular systems. One 14 such is the calcium (Ca2+)-stores system of the endoplasmic reticulum, which profoundly influences 15 diverse aspects of cellular function including , motility, division, and 16 biomineralization. We use comparative genomics and sensitive sequence and structure analyses to 17 investigate the evolution of this system. Our findings reconstruct the core form of the Ca2+- stores 18 system in the last eukaryotic common ancestor as having at least 15 that constituted a basic 19 system for facilitating both Ca2+ flux across endomembranes and Ca2+-dependent signaling. We 20 present evidence that the key EF-hand Ca2+-binding components had their origins in a likely bacterial 21 symbiont other than the mitochondrial progenitor, whereas the phosphatase subunit of the 22 ancestral calcineurin complex was likely inherited from the asgardarchaeal progenitor of the stem 23 eukaryote. This further points to the potential origin of the eukaryotes in a Ca2+-rich biomineralized 24 environment such as stromatolites. We further show that throughout eukaryotic evolution there were 25 several acquisitions from bacteria of key components of the Ca2+-stores system, even though no 26 prokaryotic lineage possesses a comparable system. Further, using quantitative measures derived 27 from comparative genomics we show that there were several rounds of lineage-specific 28 expansions, innovations of novel gene families, and gene losses correlated with biological innovation 29 such as the biomineralized molluscan shells, coccolithophores, and animal motility. The burst of 30 innovation of new in animals included the wolframin protein associated with Wolfram 31 syndrome in humans. We show for the first time that it contains previously unidentified Sel1, EF- 32 hand, and OB-fold domains, which might have key roles in its .

33 bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

34 1 Introduction

35 The emergence of a conserved endomembrane system marks the seminal transition in cell structure 36 that differentiates eukaryotes from their prokaryotic progenitors (Jekely, 2007). This event saw the 37 emergence of a diversity of eukaryotic systems and organelles such as the nucleus, the endoplasmic 38 reticulum (ER), vesicular trafficking, and several, novel signaling systems that are uniquely 39 associated with this sub-cellular environment. A major eukaryotic innovation in this regard is the 40 intracellular ER-dependent calcium (Ca2+)-stores system that regulates the cytosolic concentration of 41 Ca2+ (Ashby and Tepikin, 2001). Although Ca2+ ions are maintained at a 4000 to 10,000-fold higher 42 concentration in the endoplasmic reticulum (ER) lumen as compared to the (Woo et al., 43 2018), upon appropriate stimulus they are released into the cytoplasm by Ca2+-release channels such 44 as the receptors (IP3Rs) and the ryanodine receptors (RyRs). The process is 45 then reversed and Ca2+ is pumped back into the ER by the ATP-dependent SERCA 46 (sarcoplasmic/endoplasmic reticulum calcium ATPase) pumps, members of the P-type ATPase 47 superfamily (Ashby and Tepikin, 2001; Altshuler et al., 2012). In addition to the above core 48 components that mediate the flux of Ca2+ from and into the ER-dependent stores, several other 49 proteins have been linked to the regulation of this process and transmission of Ca2+-dependent 50 signals, including: 1) chaperones such as , , and in the ER lumen 51 (Kozlov et al., 2010); 2) diverse EF-hand proteins such as calmodulin (and its relatives), calcineurin 52 B, and sorcin that bind Ca2+ and regulate the response to the Ca2+ flux (Denessiouk et al., 2014); 3) 53 channel proteins such as the voltage-gated calcium channels (VGCC) and trimeric intracellular cation 54 channels (TRIC) that influence the flow of Ca2+ (Lanner et al., 2010; Zhou et al., 2014); and 4) 55 protein (calcium/calmodulin-dependent kinases (CaMKs)) and protein phosphatases 56 (calcineurin A) that mediate the Ca2+-dependent signaling response (Berridge, 2012). Together, the 57 Ca2+-stores system and intracellular Ca2+-dependent signaling apparatus regulate a variety of cellular 58 functions required for eukaryotic life, such as transcription, cellular motility, cell growth, stress 59 response, and cell division (Clapham, 2007; Berridge, 2012; Krebs et al.).

60 Comparative evolutionary analyses of the proteins in the ER Ca2+-stores and -signaling system have 61 revealed that some components were either present in the last eukaryotic common ancestor (LECA) 62 (e.g. Calmodulin and SERCA) or derived early in the evolution of the eukaryotes (e.g. IP3R) (Nolan 63 et al., 1994; Moreno and Docampo, 2003; Reiner et al., 2003; Prole and Taylor, 2011; Plattner and 64 Verkhratsky, 2013; Verkhratsky and Parpura, 2014; Perez-Gordones et al., 2015). Other proteins 65 show a patchier distribution in lineages outside of metazoans (e.g. calreticulin and calnexin) (Moreno 66 and Docampo, 2003; Banerjee et al., 2007), or were reconstructed to have been derived in lineages 67 closely related to the metazoans (e.g. RyR, which diverged from the ancestor of IP3R at the base of 68 filozoans) (Alzayady et al., 2015). A substantial number of the components that have been studied in 69 this system are primarily found in the metazoans, with no identifiable homologs outside of metazoa 70 (Cai et al., 2015). Most studies have focused on animal proteins of these systems, highlighting the 71 general lack of knowledge regarding the regulation of ER Ca2+-stores and the potential diversity in 72 the regulatory systems present in other eukaryotes. To our knowledge, a systematic assessment of the 73 evolutionary origins of the entire Ca2+-stores system and its regulatory components, as currently 74 understood, has yet to be attempted.

75 Given our long-term interest in the origin and evolution of the eukaryotic subcellular systems, we 76 conducted a comprehensive analysis of the core and regulatory components of the Ca2+-stores 77 system, analyzing their known and predicted interactions and inferring the evolutionary depth of 78 various components. We show that an ancient core of at least 15 protein families was already in place 79 at the stem of the eukaryotic lineage. Of these, a subset of proteins is of recognizable bacterial

2

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

80 ancestry, although there is no evidence of a bacterial Ca2+-stores system resembling those in 81 eukaryotes. We also show that gene loss and lineage-specific expansions of these components shaped 82 the system in different eukaryotic lineages, and sometimes corresponds to recognized adaptive 83 features unique to particular organisms or lineages. Further, we conducted a systematic domain 84 analysis of the proteins in the system, uncovering three novel unreported domains in the enigmatic 85 wolframin protein. These provide further testable hypotheses on the functions of wolframin in the 86 context of the Ca2+-stores system and in protecting cells against the response stresses that impinge on 87 the ER.

88 2 Methods

89 2.1 Sequence analysis 90 Iterative sequence profile searches were performed using the PSI-BLAST program (RRID: 91 SCR_001010) (Altschul et al., 1997) against a curated database of 236 eukaryotic proteomes 92 retrieved from the National Center for Biotechnology Information (NCBI), with search parameters 93 varying based on the query sequence (see Supplementary Figure S1) and composition. For building a 94 curated dataset of eukaryotic proteomes, completely sequenced eukaryotic genomes were culled from 95 Refseq and Genbank, and representative genomes (with a preference for reference genomes) were 96 chosen from different phyletic groups using the eukaryotic phylogenetic tree as guide. If more than 97 one genome was available for a species, we typically chose the one listed as a reference genome, or 98 one that had the best assembly, or one with the most complete proteome set. The list of genomes is 99 given in the supplementary data of the supplementary material. The program HHpred 100 (RRID:SCR_010276) (Soding, 2005; Alva et al., 2016) was used for profile-profile comparisons. The 101 BLASTCLUST program1 (RRID: SCR_016641) was used to cluster protein sequences based on 102 BLAST similarity scores. Support for inclusion of a protein in an orthologous cluster involved 103 reciprocal BLAST searches, conservation of domain architectures, and, when required, construction 104 of phylogenetic trees with FastTree 2.1.3 (RRID: SCR_015501) (Price et al., 2010) with default 105 parameters. The trees were visualized using FigTree2 (RRID: SCR_008515). Selected taxonomic 106 absences were further investigated with targeted BLASTP (RRID: SCR_001010) (Altschul et al., 107 1990) and TBLASTN (RRID: SCR_011822) (Gertz et al., 2006) searches against NCBI’s non- 108 redundant (nr) and nucleotide (nt) databases (Benson et al., 2013), respectively. Multiple sequence 109 alignments were constructed using the MUSCLE (Edgar, 2004) and GISMO (Neuwald and Altschul, 110 2016) programs with default parameters. Alignments were manually adjusted using BLAST high- 111 score pair (hsp) results as guides. Secondary structure predictions were performed with the Jpred 4 112 program (RRID: SCR_016504) with default settings (Drozdetskiy et al., 2015). EMBOSS (RRID: 113 SCR_008493) pepwheel3 was used to generate renderings of amino acid positions on the 114 circumference of an α-helix.

115 2.2 Protein network construction 116 Protein-protein interactions were extracted from published data sources, updating any outdated 117 gene/protein names and making substantial efforts to disambiguate between paralogs (Supplementary 118 Data). High-throughput/predicted protein-protein interactions were extracted from the FunCoup

1 ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html 2 http://tree.bio.ed.ac.uk/software/figtree 3 http://www.bioinformatics.nl/cgi-bin/emboss/pepwheel 3

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

119 database (Ogris et al., 2018). Networks were visualized using the R-language implementations of the 120 iGraph and qGraph packages. For network rendering, the Fruchterman-Reingold force-directed 121 algorithm was used (Fruchterman and Reingold, 1991).

122 2.3 Comparative genome analyses 123 Quantitative analysis of phyletic patterns and paralog counts for different proteins/families were first 124 obtained using a combination of sequence similarity searches as outlined above. We filtered the 125 counts to exclude multiple identical sequences annotated with the same gene name, using the latest 126 genome assemblies for each of these organisms as available in NCBI GenBank (RRID: 127 SCR_002760) as anchors. Proteomes with identifiable quality issues were removed from downstream 128 analyses (e.g. genomes containing sequences with ambiguous strain assignment), leaving counts for 129 216 organisms. These counts and their phyletic patterns defined two sets of vectors, namely the 130 distribution by organism for a given protein and the complement of proteins for a given organism. 131 These vectors for the protein families and organisms were used to compute the inter-protein or inter- 132 organism Canberra distances (Lance and Williams, 1966), which is best suited for such integer data 133 of the form of presences and absences. The Canberra distance between two vectors and is defined 134 as: Ä |¿ ¿| , |¿||¿| ¿Ͱ̊ 135 These distances were used to cluster the protein families and organisms through agglomerative 136 hierarchical clustering using Ward’s method (Kaufman and Rousseeuw, 1990). The results were 137 rendered as dendrograms. Ward’s method takes the distance between two clusters, A and B to be the 138 amount by which the sum of squares from the center of the cluster will increase when they are 139 merged. Ward’s method then tries to keep this growth as small as possible. It tends to merge smaller 140 clusters that are at the same distance from each other as larger ones, a behavior useful in lumping 141 “stragglers” in terms of both organisms and proteins with correlated phyletic patterns. 142 143 The protein complement vectors for organisms were also used to perform principal component 144 analysis to detect spatial clustering upon reducing dimensionality. The variables were scaled to have 145 unit variance for this analysis. Similarly, a linear discriminant analysis was performed on these 146 vectors using representatives of the major eukaryotic evolutionary lineages (see Supplementary Table 147 S1 for list) as the prior groups for classification. This was then used on our complete phyletic pattern 148 data for classification of the organisms based on their protein complements. 149 150 Organism polydomain scores were calculated as follows: if , counts the number of paralogs of 151 some family in some organism , is the set of all proteins studied, and is the 152 set of all organisms studied, then the polydomain score for an organism is defined as: , ÆǨ¬ 153 where is the mean of for all proteins , and is defined as: ∑ÅǨ« , ̋ ∑∑ÇǨ¬ ÅǨ« , 154 Computations and visualizations were performed using the R language.

155 3 Results

4

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

156 3.1 Protein-protein interaction network for the ER-dependent calcium stores 157 regulatory/signaling apparatus 158 Metazoan Ca2+ stores have been extensively studied, leading to the identification of several proteins 159 directly or indirectly involved in calcium transport or signaling and regulation of these processes. In 160 order to apprehend the global structure of this system, we used published literature and the FunCoup 161 database to derive a network of protein-protein interactions (PPIs) containing human genes that have 162 roles in Ca2+ stores, centering the network on the three families of ER Ca2+ channels (SERCA, IP3R, 163 RyR) (see Methods). The resulting network totaled 173 protein nodes and 761 interaction edges 164 (Figure 1A, Supplementary Data). The distribution of the number of connections per node (degree 165 distribution) in this network displays an inverse (rectangular hyperbolic) relationship (R2= 0.88; 166 Figure 1C). This is a notable departure from typical PPI networks which tend to show power-law 167 degree distributions (Bader and Hogue, 2002; Rodrigues et al., 2011). To better understand this 168 pattern, we studied its most densely connected subnetworks by searching for cliques, where every 169 node is connected to every other node in the subnetwork. The largest cliques in this network have ten 170 nodes. As the degree distribution graph shows an inflection around degree 6, we merged all cliques 171 of size 6 or greater resulting in a subnetwork of 46 nodes comprising close to 50% of the edges of the 172 overall network (Figure 1B). This suggests that the inverse relationship of the degree distribution is a 173 consequence of the presence of a core of several highly connected nodes (6 or more edges), which is 174 in contrast to other system-specific PPI networks showing a power-law degree distribution, as in the 175 case of the network (Venancio et al., 2009). 176 177 Analysis of the proteins in this highly connected sub-network suggests that the 46 proteins can be 178 broadly classified into 5 groups: 1) the channels and ATP-dependent pumps which constitute the core 179 Ca2+ transport system for ER stores; 2) EF-hand domain proteins such as calmodulin and sorcin that 180 bind Ca2+ and consequently interact with and regulate the biochemistry of numerous other proteins; 181 3) proteins involved in folding and stability of other proteins, such as chaperones, redox proteins, and 182 protein disulfide isomerases; 4) components of the protein phosphorylation response that is 183 downstream of Ca2+ flux into the cytoplasm; and 5) proteins linking the network to other major 184 functional systems, such as Bcl2, which is involved in the apoptosis response in animals, and beclin- 185 1, which is involved in autophagy.

186 3.2 Phyletic distribution of key proteins and implications for the evolution of the ER Ca2+- 187 stores and -signaling pathway 188 This densely connected sub-network invites questions about its evolutionary origin, particularly 189 given that the wet-lab results that inform the connections are predominantly drawn from mammalian 190 studies (see Methods). To understand better the emergence of this sub-network and the conservation 191 of its nodes across eukaryotes, we systematically analyzed their phyletic patterns (Figure 2A, 192 Supplementary Figure S1 and Data). A list of the 34 proteins and protein families studied, as well as 193 their domain architectures, is in Supplementary Table S2. Further, Ward clustering analysis of the 194 core components based on their phyletic patterns (see Methods) revealed the presence of 5 distinct 195 clusters (Figure 2C). These clusters appear to have an evolutionary basis with distinct clusters 196 accommodating proteins that could be inferred as having been in the LECA (e.g. cluster 1) and those 197 that emerged in a metazoan-specific expansion of the ER Ca2+-stores system (e.g. clusters 4 and 5). 198 3.2.1 The core LECA complement of the Ca2+-stores system 199 At least 15 proteins of the ER Ca2+-stores and signaling system are found across all or most 200 eukaryotic lineages, suggesting that they were present in the LECA. These proteins include key 201 components of the dense sub-network, such as 1) the SERCA Ca2+ pumps; 2) the Ca2+-binding EF-

5

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

202 hand proteins like calmodulin; 3) chaperones involved in that are mostly found in the 203 ER and that sometimes act as either Ca2+ binding proteins (e.g. calreticulin, calnexin) or as regulators 204 of other components of the Ca2+-stores and -signaling system (e.g. ERp57/PDIA3, calstabin, TMX1, 205 ERdj5/DNAJC10); and 4) core of the Ca2+-dependent phosphorylation-based signaling 206 system including the CaMKs and the calcineurin A phosphatase. This set of components is likely to 207 have comprised the minimal ER-Ca2+-stores and -signaling system in the LECA and suggests that 208 there were already sub-systems in place to mediate: 1) the dynamic transport of Ca2+ ions across the 209 emerging eukaryotic intracellular membrane system and 2) the transmission of signals affecting a 210 wide-range of subcellular processes based on the sensing of Ca2+ ions. 3) A potential stress response 211 system that channels Ca-dependent signals to regulation of protein folding (Krebs et al.). 212 213 Notably, the IP3R channels are absent in lineages that are often considered the basal-most 214 eukaryotes, namely the parabasalids and diplomonads; however, they are present in some other early- 215 branching eukaryotes such as kinetoplastids (Cavalier-Smith et al.). Their absence in certain extant 216 eukaryotes (Figure 2A) suggests that they can be dispensable, or that their role can be taken up by 217 other channels in eukaryotes that lack them. A comparable phyletic pattern is also seen for certain 218 other key components of the densely connected sub-network, namely the ERdj5 and calnexin 219 chaperones. 220 221 3.2.2 Components with clearly identifiable prokaryotic origins 222 Deeper sequence-based homology searches revealed that at least three protein families of the ER 223 Ca2+-stores system have a clearly-identifiable bacterial provenance, namely the P-type ATPase pump 224 SERCA, , and calmodulin and related EF-hand proteins. Phylogenetic analyses suggest 225 that the P-type ATPases SERCA and plasma membrane calcium-transporting ATPase (PMCA) were 226 both present in the LECA. They are most closely-related to bacterial P-type ATPases (Plattner and 227 Verkhratsky, 2015), which commonly associate with transporters (e.g. Na+-Ca2+ antiporters), ion 228 exchangers (Na+-H+ exchangers), permeases, and other distinct P-type ATPases in conserved gene- 229 neighborhoods (Supplementary Figure S2) suggesting a role in maintenance of ionic homeostasis 230 even in bacteria. 231 232 The GTPases EHD and sarcalumenin, whose GTPase domains belong to the dynamin family (Leipe 233 et al., 2002), also show clear bacterial origins based on their phyletic patterns and phylogenetic 234 affinities. The closest bacterial homologs possess a pair of transmembrane helices C-terminal to the 235 GTPase domain, suggesting a possible role in membrane dynamics (Figure 3B, Supplementary Data). 236 Phylogenetic trees show that although they are related to the dynamins, the progenitor of the 237 eukaryotic sarcalumenin and EHD was acquired independently of the dynamins via a separate 238 transfer from a proteobacterial lineage early in eukaryotic evolution (Figure 3A) (Leipe et al., 2002). 239 This gave rise to the EH-domain-containing EHD clade of GTPases, which are involved in regulating 240 vesicular trafficking and membrane/Golgi reorganization. A further secondary transfer, likely from 241 the kinetoplastid lineage to the metazoans, gave rise to sarcalumenin proper, which has characterized 242 roles in the Ca2+-stores system (Figure 3A). This raises the possibility that in other eukaryotes, EHD 243 performs additional roles in the Ca2+-stores system overlapping with metazoan sarcalumenin. 244 245 Our analysis revealed that the bacterial calmodulin-like EF-hands show a great diversity of domain- 246 architectural associations. Versions closest to the eukaryotic ones are found in actinobacteria, 247 cyanobacteria, proteobacteria, and verrucomicrobiae. These prokaryotic calmodulin-like EF-hand 248 domains are found fused to a variety of other domains (Figure 3E, Supplementary Data), such as a 7- 249 transmembrane domain (7TM) (cyanobacteria and to a lesser extent in actinobacteria), the

6

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

250 prokaryotic Tic110-like α-helical domain (cyanobacteria), heme-oxygenases (actinobacteria), the 251 nitric oxide synthase and NADPH oxidase with ferredoxin and nucleotide-binding domains 252 (cyanobacteria and δ-proteobacteria), as well as cNMP-binding, thioredoxin, cytochrome, and 253 sulfatase domains (verrucomicrobiae). Comparable architectural associations with several of these 254 domains, such as the fusions to the sulfatases and the redox-regulator domains thioredoxin, 255 cytochromes, glyoxylases, and NADPH oxidases, are also observed in eukaryotes. However, our 256 analysis showed that these fusions appear to be independently derived in the two superkingdoms. 257 These diverse associations suggest that even in bacteria the calmodulin-like EF-hands function in the 258 context of membrane-associated signaling and redox reactions, possibly regulated in a Ca2+-flux- 259 dependent manner (Zhou et al., 2006; Dominguez et al., 2015). We infer that a version of these was 260 transferred to the stem eukaryote, probably from the cyanobacteria or the actinobacteria, and had 261 already triplicated by the LECA, giving rise to the ancestral versions of calmodulin and calcineurin 262 B, which function as part of the Ca2+-stores system, and the centrins, which were recruited for a 263 eukaryote-specific role in cell division is association with the centrosome (Dantas et al., 2012). 264 265 Among components with indirect regulatory roles in the ER Ca2+-dependent system is the peptidyl- 266 prolyl cis-trans isomerase (PPIase) calstabin (FKBP1A/B in humans), which is a member of the 267 FKBP family of PPIase chaperones. Phyletic pattern analysis indicates that calstabin was present in 268 the LECA (Figure 2A). Eukaryotic PPIases are more similar to bacterial PPIases (Supplementary 269 Data); hence, the ancestral eukaryotic PPIase was likely acquired from the alphaproteobacterial 270 mitochondrial progenitor. This is also supported by the evidence from extant pathogenic/symbiotic 271 bacteria wherein the bacterial FKBP-like PPIases play a role in establishing associations with 272 eukaryotic hosts (Unal and Steinert, 2014). In the stem eukaryotes, the ancestral PPIase acquired 273 from the bacterial source underwent a large radiation resulting in diverse PPIases that probably went 274 hand-in-hand with the eukaryotic expansion of low-complexity proteins with potential substrate 275 prolines. Thus, eukaryotes acquired a wide range of substrate proteins in several eukaryote-specific 276 pathways and function in several cellular compartments (Trandinh et al., 1992). Calstabin is one of 277 the paralogs that arose as part of this radiation and appears to have been dedicated to the ER Ca2+- 278 stores system. 279 280 Beyond the above-mentioned, several other components inferred to be part of the LECA complement 281 of the ER Ca2+-stores system are likely of bacterial origin. However, they do not have obvious 282 bacterial orthologs and might have diverged considerably from their bacterial precursors in the stem 283 eukaryote itself. These include the TRIC-like channels (Silverio and Saier, 2011), the CaMKs, and 284 the thioredoxin domains found in ERp57, ERdj5, and TMX1. 285 286 In contrast to the several components of bacterial provenance, the large eukaryotic assemblage of 287 calcineurin-like protein phosphatases, which includes the Ca2+-stores regulator calcineurin A, are 288 specifically related to an archaeal clade to the exclusion of all other members of the superfamily 289 (Figure 3A, Supplementary Data). Notably, these close relatives are present in several 290 Asgardarchaea, suggesting the eukaryotes may have directly inherited the ancestral version of these 291 phosphoesterases from their archaeal progenitor (Zaremba-Niedzwiedzka et al., 2017). Strikingly, the 292 archaeal calcineurin-like phosphatases occur in a conserved operon (Figure 3A) with genes for two 293 others proteins, one combining a zinc ribbon fused to a phosphopeptide-recognition FHA domain and 294 the second with a vWA domain fused to a β-barrel-like domain. This supports a similar role for these 295 archaeal calcineurin-like phosphatases to their eukaryotic counterparts in transducing a signal 296 through dephosphorylation of a protein-substrate. 297

7

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

298 Thus, the core, ancestrally-conserved components of the ER-dependent stores system predominantly 299 descend from the bacteria, although at least one component was inherited from the archaea. While the 300 roles for some of these domains in possible bacterial Ca2+-dependent systems are apparent, there is 301 no evidence that these versions function in a coordinated fashion in any single bacterial species or 302 clade. Further, it is also clear that there was likely more than one bacterial source for the proteins: 303 components such as SERCA and calstabin, whose closest relatives are proteobacterial, are likely to 304 have been acquired from the mitochondrial ancestor, whereas calmodulin is likely to have been 305 derived from a cyanobacterium or actinobacterium. Thus, the ER-dependent Ca2+-stores network was 306 assembled in the stem eukaryote from diverse components drawn from different prokaryotic lineages. 307 This assembly of the system in eukaryotes is strikingly illustrated by the case of the calcineurin 308 complex. Here, the protein phosphatase component has a clear-cut origin from the archaeal precursor 309 of the eukaryotes, whereas the Ca2+-binding calcineurin B component descends from a bacterial 310 source. It was the combination of these proteins with very distinct ancestries that allowed the 311 emergence of a Ca2+-signaling system. 312 313 3.2.3 Lineage-specific expansions and gene loss shape the Ca2+ response across eukaryotes 314 In order to obtain some insights regarding the major developments in the evolution of the eukaryotic 315 Ca2+-stores system, we systematically assembled protein complement vectors for each of the 316 organisms in the curated proteome dataset (see Methods). After computing the pairwise Canberra 317 distance between these vectors, we performed clustering using the Ward algorithm (see Methods). 318 The resulting clusters recapitulate aspects of eukaryote phylogeny, with animals, fungi, plants, 319 kinetoplastids and apicomplexans forming distinct monotypic clusters (Figure 2B). These 320 observations together indicate that most major eukaryotic lineages have likely evolved specific 321 component around the core Ca2+-stores system inherited from the LECA that distinguish them from 322 related lineages. We then used principal component and linear discriminant analysis (see Methods) 323 on these vectors to obtain a global quantitative view of the diversification of the Ca2+-stores system. 324 Plotting the first two principal components/discriminants reveals discernable spatial separation of the 325 metazoans from their nearest sister group, the fungi (Figure 2D-E). Further, diverse photosynthetic 326 eukaryotes tended to be spatially colocalized. These observations suggest that certain accretions to 327 the core Ca2+-stores system probably occurred alongside unique functional developments such as 328 motility and multicellularity (metazoa) and autotrophy (photosynthetic lineages). Hence, 329 investigations on individual lineages are likely to unearth novel, lineage-specific regulatory devices, 330 which potentially reflect adaptations to their respective environments and provide hints about the 331 biological contexts in which they were found. 332 333 We further investigated these potential lineage-specific developments by defining a measure, the 334 polydomain score (see Methods), which captures the overall amplification of the Ca2+-stores system 335 of an organism in terms of the contribution of the different constituent protein families of the system 336 (Figure 2F). While PCA and LDA help identify the overall tendencies among organisms in terms of 337 the system under consideration, the polydomain score is better-suited for identifying specific unusual 338 features in terms of individual protein families/domains in particular organisms or clades. This score 339 allowed us to capture some of the key developments in the reconstructed Ca2+-stores system of 340 particular organisms and lineages. Notably, this led to the identification some of the distinctions 341 between Ca2+-regulatory systems that may reflect differential strategies for the incorporation of 342 calcium into exo/endo-skeletal structures in eukaryotes. 343 344 Comparison of polydomain scores within metazoa (Figure 2G) showed a striking reduction of the 345 Ca2+-stores system in arthropods as well as in some molluscs and other marine lineages, consistent

8

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

346 with their distinct grouping in the PCA/LDA plots. This appears to be generally related to the lower 347 dependence on biomineralized structures in these organisms – arthropods utilize chitin as the primary 348 exoskeletal material as opposed to calcareous structures, and the molluscan lineages showing this 349 pattern have lost their calcified shells (e.g. octopi). Similarly, fungi, which use chitin as a central cell 350 wall component, show lower polydomain scores relative to sister eukaryotic lineages. Conversely, we 351 noted elevated polydomain scores in molluscs with calcareous shells. Likewise, ciliates, which have a 352 evolved a distinct extension to the ancestral eukaryotic Ca2+-stores system in the form of the alveoli 353 (Plattner, 2017), also have elevated polydomain scores. Ca2+-based signaling has been shown to be 354 important for defensive trichocyst exocytosis, ciliary action, and other pathways in ciliates (Plattner, 355 2015). These observations suggest a linkage between strategies for structural and signaling usage of 356 calcium and the regulation of Ca2+ distribution between intracellular compartments. 357 358 In qualitative terms, the distinct spatial positioning of different eukaryotic lineages in the PCA/LDA 359 plots and the difference in their polydomain scores could be explained on the basis of lineage-specific 360 expansions (LSEs), gene losses, and domain architectural diversity between lineages (Lespinet et al., 361 2002). We identified specific gains and losses based on inferred ancestral protein complements 362 (Supplementary Figure S1). One of the most frequent proteins displaying LSEs is the CaMK: 363 expansions have occurred independently in animals, plants, oomycetes and alveolates (Figure 3C, 364 Supplementary Data). Ciliates in particular show a dramatic expansion of the family with over one 365 hundred copies in Paramecium and Stentor. Other proteins that show LSEs in specific lineages 366 (Supplementary Data) include calmodulin (in metazoans, certain fungal lineages, and plants), 367 calstabin (independently in different stramenopile lineages and haptophytes), PP2R3-like proteins (in 368 kinetoplastids and Trichomonas), calcineurin A (in ciliates and Entamoeba), calnexins (in 369 Trichomonas and diatoms), ORAI (in Emiliania, Figure 3D), and SERCA (in certain fungi). The 370 metazoans also display LSEs for several protein families that appear to have specifically emerged in 371 metazoa (see below). 372 373 Some of these LSEs have clear biological correlates that were also suggested by the polydomain 374 scores. For example, LSEs of the calmodulin family in the shelled molluscs (20-46 copies) might 375 correspond to or be involved in the regulation of Ca2+ concentration during the biomineralization 376 processes involved in the formation of calcareous shells. Concomitant with this expansion, in shelled 377 molluscs several transporters, including P-type ATPases, voltage-gated channels, and ion exchangers, 378 have been adapted for the transportation of Ca2+ ions for shell formation (Sillanpaa et al., 2018). 379 Similarly, the development of the distinct set of Ca2+ stores in the form of the alveoli, which play 380 several key ciliate-specific roles (Plattner, 2015; 2017), might explain the expansions in these 381 organisms. Thus, the very large LSEs of the CaMK and calcineurin A-like phosphoesterase likely 382 reflect the many diverse pathways into which Ca2+-based signaling is incorporated in these 383 organisms. Additionally, experimental evidence shows that some pan-eukaryotic members of the ER 384 Ca2+ stores system have acquired additional roles in ciliates; for example, SERCA-like Ca2+-pumps 385 are also in the membranes of Paramecium alveolar calcium stores (Plattner et al., 1999). 386 387 ORAI family LSEs in Emiliania might correspond to the need to regulate Ca2+ transport for 388 mineralizing the calcareous coccoliths (Yin et al., 2018). Further, calcium for coccolith formation 389 may also supplied in part by an ER-membrane-localized Ca2+/H+ exchanger, as well as by increased 390 activity of SERCA and a plasma membrane Ca2+/H+ exchanger (Mackinder et al., 2011). Here, as in 391 ciliates, these expansions might have accompanied the emergence of a distinct Ca2+-stores system 392 associated with the nuclear envelope (an extension of the ER), which is close in proximity to the 393 envelope of the coccolith and the site of biomineralization (Brownlee et al., 2015). 394

9

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

395 In contrast, we observed extensive gene losses of proteins including ERdj5, sarcalumenin, IP3R, and 396 ORAI across several or all fungi and losses of at least 10 conserved proteins in Entamoeba (Figure 397 2A). However, experimental studies have shown that both fungi and Entamoeba have Ca2+ transients 398 and associated signaling systems (Makioka et al., 2002; Kim et al., 2012); hence, despite these losses, 399 the evidence favors these organisms retaining at least a limited Ca2+-store-dependent signaling 400 network. The CREC (calumenin, reticulocalbin, and Cab45) family displays an unusual phyletic 401 pattern of particular note, being found only in metazoans and land plants (Figure 2A). Barring the 402 unusual possibility of lateral transfer between these two lineages, this would imply extensive loss of 403 these proteins across most other major eukaryotic lineages. Within metazoa, several proteins display 404 unusual loss patterns, including sorcin, calsequestrin, and SelN (Figure 2A). 405 406 Notably, the land plants and their sister group the green algae have entirely lost ancient core Ca2+- 407 dependent signaling proteins such as calcineurin A and HOMER. The IP3R channels are retained by 408 the chlorophytes but lost entirely by the land plants, while the ORAI channels are present in the basal 409 land plants Physcomitrella and Selaginella but are not observed in crown land plants (Edel et al., 410 2017). Land plants have also lost the TRIC channels, but the basal streptophyte Chara braunii retains 411 a copy. The differential retention and expansion of various ancestral Ca2+ components (Figure 2A) in 412 the land plants might be seen as adaptations to a sessile lifestyle no longer requiring Ca2+-signaling 413 components associated with active cell or organismal motility. However, the paradoxical LSEs of the 414 calmodulin-like and EF-hand-fused CaMK-like proteins (CDPK family) (Edel et al., 2017) in land 415 plants relative to their chlorophyte sister-group (Figure 2A, 3C). These might have emerged as part 416 of Ca2+ sequestration and signaling mechanisms relevant in the Ca2+-poor freshwater ecosystems, 417 wherein the land plants originated from algal progenitors (Delwiche and Cooper, 2015). 418 419 Little in the way of domain architectural diversity is observed in the core conserved proteins of the 420 animal Ca2+-stores system. A notable exception is seen in the CaMKs: metazoan CaMKs show a 421 higher architectural complexity via their fusion to several distinct globular domains. These domains 422 act as adaptors in interactions with Ca2+-binding regulatory domains, effectively broadening the total 423 range of signaling pathways in which the CaMKs participate in metazoa (Wang et al., 2015). In 424 contrast, land plants display lower architectural complexity with CaMK orthologs directly fused to 425 Ca2+-binding domains (Klimecka and Muszynska, 2007). 426 427 3.2.4 Accretion of novel Ca2+ signaling pathway components in the Metazoa 428 Case by case examination revealed that the distinct position of the metazoans in the PCA/LDA plots 429 relative to other eukaryotes (Figure 2D-E) is due to a major accretion of novel signaling and flux- 430 related Ca2+-stores components at the base of the metazoan lineage (Figure 2A). Our analyses suggest 431 three distinct origins of these proteins: 1) several emerged as paralogs of domains already present in 432 the LECA and functioning in the context of Ca2+-signaling, including the EF-hand domains found in 433 the STIM1/2, calumenin, SelN, S100, sorcin, and NCS1 proteins, the thioredoxin domain found in 434 the ERp44 and calsequestrin proteins, and the RyR channel proteins. 2) Proteins or component 435 domains which are involved in a broader range of functions but have been recruited to roles in Ca2+- 436 stores regulation specifically in the animal lineages, such as the SAM domain in the STIM1/2 437 proteins, the PDZ domain in the neurabin protein, and the TRPC channels. 3) Proteins containing 438 domains which appear to be either novel metazoan innovations, such as the KRAP domain of the 439 Tespa1 protein, or whose origins have yet to be traced, such as VGCC and wolframin. Further, the 440 origin of certain metazoan-specific proteins such as , which inhibits the SERCA 441 ATPase, remain difficult to trace because of their small size and highly-biased composition as 442 membrane proteins. Inspection of the neighbors of these proteins in the constructed network suggests

10

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

443 that in almost all instances, these proteins were added to the Ca2+ signaling network via interactions 444 with one or more of the ancient components of the system (Figure 1A). 445 446 This sudden accretion of Ca2+-stores components likely coincided with emergence of well-studied 447 aspects of differentiated metazoan tissues, such as muscle contraction and neurotransmitter release 448 (Zucchi and Ronca-Testoni, 1997; Clapham, 2007). Other additions to the network likely arose via 449 interface with other pathways like apoptosis and autophagy in the context of ER stress response 450 (Smaili et al., 2013). Notable in this regard is the metazoan-specific Bcl2 family of membrane- 451 associated proteins associated with regulation of apoptosis. They appear to have been derived via 452 rapid divergence in metazoans from pore-forming toxin domains of ultimately bacterial provenance 453 (Peng et al., 2009), which are found in pathogenic bacteria and fungi (Aravind et al., 2012). 454 455 Still other proteins were recruited to the system as part of newly-emergent regulatory subnetworks, 456 such as the Store-Operated Calcium Entry (SOCE) pathway for re-filling the ER from extracellular 457 Ca2+ stores (Prakriya and Lewis, 2015; Ong et al., 2016). Of the proteins identified in this pathway, 458 the ORAI Ca2+ channels are present in the early-branching kinetoplastid lineage but were lost in 459 several later-branching lineages (Figure 2A), while other components like the ORAI-regulating 460 STIM1/2 proteins and the TRPC channels emerged around the metazoan accretion event, suggesting 461 the core SOCE pathway came together at or near the base of the animal lineage. In course of this 462 analysis, we identified a common origin for the ORAI and Jiraiya/TMEM221 ER channels, the latter 463 of which are characterized in BMP signaling (Aramaki et al., 2010). The Jiraiya channels are 464 observed in animals but are absent in earlier-branching eukaryotic lineages, suggesting they emerged 465 from a duplication of an ORAI channel early in metazoan evolution (Figure 3D, Supplementary 466 Data). Jiraiya channel domains lack the Ca2+ binding residues seen in ORAI channels, suggesting 467 they are unlikely to directly bind Ca2+. However, it is possible that Jiraiya/TMEM221 physically 468 associates with components of the Ca2+-stores system to regulate them.

469 3.3 Domain architectural anatomy and functional analysis of the wolframin protein 470 As noted above, one of the uniquely metazoan proteins in the Ca2+-stores system is wolframin, a 471 transmembrane protein localized to the ER membrane (Hildebrand et al., 2008; Rigoli et al., 2011; 472 Qian et al., 2015) (Figure 2A, Supplementary Data). Wolframin, along with the structurally-unrelated 473 and more widely phyletically distributed (Figure 2A) Wolfram syndrome 2 (WFS2) protein, is 474 implicated in Wolfram syndrome (Inoue et al., 1998; Strom et al., 1998; Amr et al., 2007; Urano, 475 2016). Experimental studies have attributed biological roles to unannotated regions upstream and 476 downstream of the TM region in wolframin, and our analyses revealed four uncharacterized globular 477 regions therein (Figure 4A). The cytosolic N-terminal region was identified through iterative 478 database searches (see Methods) as containing Sel1-like repeats (SLRs; query: NP_005996, hit: 479 OYV16035, iteration 2, e-value: 5x10-16), which are α-helical superstructure forming repeats 480 structurally comparable to the tetratricopeptide repeats (TPRs) (Ponting et al., 1999; Karpenahalli et 481 al., 2007). Profile-profile searches (see Methods) unified the remaining two globular regions 482 respectively with various EF-hand domains (e.g., query: XP_011608878, hit: 1SNL_A, p-value 483 2.3x10-4) and OB fold-containing domains (e.g. query: XP_017330582, hit: 2FXQ_A, p-value: 484 1.3x10-4). 485 486 The wolframin SLR possess several conserved residues seen in the classical SLRs (Figure 4C, 487 Supplementary Table S4)(Mittl and Schneider-Brachert, 2007), and additionally display a truncated 488 first loop when compared to known SLRs (Figure 4C). The SLRs of wolframin appear most closely- 489 related to bacterial versions, suggesting a possible horizontal transfer at the base of animals (Ponting

11

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

490 et al., 1999); however, the short length and rapid divergence of such repeats complicates definitive 491 ascertainment of such evolutionary relationships. SLRs and related α/α repeats are often involved in 492 coordinating interactions within protein complexes (Karpenahalli et al., 2007; Mittl and Schneider- 493 Brachert, 2007) and have been characterized specifically in ER stress and misfolded protein 494 degradation responses (Jeong et al., 2016) and in mediating interactions of membrane-associated 495 protein complexes (Mittl and Schneider-Brachert, 2007). As the wolframin SLRs roughly correspond 496 to an experimentally-determined calmodulin-binding region (Yurimoto et al., 2009), they might 497 specifically mediate that protein interaction. 498 499 The EF-hand region of wolframin contains the two characteristic copies of the bihelical repeat that 500 form the basic EF-hand unit. However, it lacks the well-characterized Ca2+-binding DxDxDG motif 501 or any comparable residue conservation (Figure 4D) (Gifford et al., 2007; Denessiouk et al., 2014). It 502 also features striking loop length variability in between the two helix-loop-helix motifs (Figure 4D). 503 Such “inactive” EF-hand units typically dimerize with other EF-hand proteins (Kawasaki et al., 1998; 504 Gifford et al., 2007), suggesting that wolframin could be self-dimerizing or that its EF-hand could 505 interact at the ER membrane with calmodulin and/or a distinct EF-hand protein. 506 507 The C-terminal OB fold domain (Figure 4B,E) lacks the conserved polar residues typical of nucleic 508 acid-binding OB-fold domains (Watson et al., 2007; Guardino et al., 2009). Further, the localization 509 of this region to the ER lumen suggests that it is unlikely to be involved in nucleic acid binding 510 (Arcus, 2002; Flynn and Zou, 2010; Krishna et al., 2010). Alternatively, this OB-fold domain could 511 mediate protein-protein interactions as has been observed for other members of the fold (Flynn and 512 Zou, 2010) and is also consistent with previous experimental studies implicating this region of 513 wolframin in binding the pre-folded form of ATP1B1 (Zatyka et al., 2008). Strikingly, in the region 514 N-terminal to the OB fold domain we observe a further globular region containing a set of six 515 absolutely-conserved cysteines (Figure 4E). This region is not unifiable with any known domains and 516 could conceivably represent an extension to the core OB fold domain. These conserved cysteine 517 residues could contribute to disulfide-bond-mediated cross-linking, a well-studied regulatory 518 mechanism of Ca2+-stores regulation (Ushioda et al., 2016). Additionally, C-terminal to the OB-fold, 519 wolframin has a hydrophobic helix that might be involved in intra- or inter-molecular interactions 520 (Figure 4E). 521 522 The wolframin TM region, located in the central region of the protein (Figure 4A), consists of nine 523 transmembrane helices (Hofmann et al., 2003; Rigoli et al., 2011; Qian et al., 2015). Despite 524 extensive studies on the TM region, its precise role in affecting Ca2+ flow across the ER membrane 525 remains the subject of some debate (Osman et al., 2003; Aloi et al., 2012; Zatyka et al., 2015; 526 Cagalinec et al., 2016). Inspection of a multiple sequence alignment of the wolframin transmembrane 527 region (Supplementary Figure S3) revealed a concentration of polar residues which are spatially 528 alignable in helices 4 and 5 (Supplementary Figure S4). This is reminiscent of membrane associated 529 polar residue configurations seen in proteins that allow transmembrane flux of ions. Hence, it would 530 be of interest to investigate if these residues might play a role in ion transport by wolframin.

531 4 Discussion

532 4.1 Evolutionary and functional considerations 533 4.1.1 Early and later landmarks in the evolution of the ER Ca2+-stores system 534 The ER Ca2+-stores system displays several parallels in its evolutionary history to other 535 endomembrane-dependent systems such as the nuclear membrane and vesicular trafficking systems

12

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

536 (Mans et al., 2004; Jekely, 2008). Like in the case of these systems, dedicated ER Ca2+-stores 537 systems are absent in the prokaryotes, despite the presence of Ca2+ transients and Ca2+-dependent 538 signaling pathways in them (Dominguez et al., 2015). However, several of the more ancient 539 individual components of eukaryotic Ca2+-stores systems are of clear-cut prokaryotic origin. Notably, 540 we show here that not all of these proteins originated from the proto-mitochondrion; notably, 541 calmodulin is of likely cyanobacterial or actinobacterial provenance and the calcineurin-like 542 phosphoesterases originally descended from the archaea. The former observation adds to 543 accumulating evidence of LECA acquiring bacterial contributions from non-α-proteobacterial 544 lineages. It is therefore possible that LECA had a more extensive set of associated symbionts than 545 what was fixed as the mitochondrion in eukaryotic evolution (Huang and Gogarten; Burroughs et al., 546 2017; Verma et al., 2018). 547 548 The cyanobacterial/actinobacterial origin of calmodulin and the role for the closely-related and early- 549 branching centrin family of EF-hand proteins in microtubule dynamics during cell division suggest 550 that the LECA already had a strong Ca2+ dependency. This raises the possibility that the prokaryotic 551 ancestors of the eukaryotes might have existed in a calcium-rich environment such as the 552 biomineralized structures (e.g. stromatolites) formed by cyanobacteria (Bosak et al., 2013). This is 553 consistent with the diversity of domain architectures for calmodulin-like proteins with ramifications 554 into various functional systems in the cyanobacteria that we reported here. This diversified pool of 555 Ca2+-binding domains could have contributed raw materials needed during the initial emergence of 556 Ca2+ flux-based signaling and regulation across endomembranes in eukaryotes. The newly emerged 557 intracellular Ca2+ gradients were likely fixed by the myriad advantages it bestowed in the stem 558 eukaryotes, including increased signaling capacity and a regulatory mechanism for processes like 559 growth/proliferation, , and motility. 560 561 Figure 5 shows a summary of key acquisitions and losses of Ca2+-stores proteins during eukaryotic 562 evolution, placed onto a simplified model of eukaryotic evolution. In inferring ancestral states, and 563 thereby gains and losses, we followed the (relatively) certain contours of the eukaryotic tree (see also 564 Supplementary Figure S1). For example, we assumed that the root of eukaryotes lies in the excavates 565 and that the SAR (stramenopile/alveolate/rhizarian) group is monophyletic. In general, we strove to 566 make the least number of assumptions possible; therefore, many of our estimates – for example of the 567 number of Ca2+-stores proteins in the LECA – are lower bounds and conservative. By these criteria, 568 we infer that the ancestral eukaryotic ER-dependent Ca2+-stores system likely consisted of a 569 combination of the SERCA pump, a cation channel, EF-hand-containing proteins, and 570 phosphorylation enzymes (Figure 5). Early in eukaryotic evolution, domains associated 571 with protein folding and thioredoxin fold domains were added to the system, likely recruited from 572 roles as general regulatory domains, some of which could also bind Ca2+ (Figure 5). Association with 573 thioredoxin fold domains, involved in disulfide-bond isomerization, is of note as this points to early 574 emergence of a link between redox-dependent folding of cysteine-rich proteins and Ca2+ 575 concentrations. The striking presence of cysteine-rich domains associated with cyanobacterial 576 calmodulin homologs (Figure 3E; DES, LMI, and LA unpublished observations) suggests that such a 577 connection might have emerged even before the origin of the eukaryotic Ca2+-stores system. These 578 functional links might have persisted until later in eukaryotic evolution as hinted by the cysteine-rich 579 domain present in wolframin. This led to the basic system as reconstructed in the LECA (Figure 2A). 580 581 Waves of additional accretion events added components to the Ca2+-stores system at distinct points in 582 eukaryotic evolution, often appearing to correlate with adaptations to distinct lifestyles, such as the 583 evolution of motile multicellular forms, loss of motility (the crown plant lineage), or the evolution of

13

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

584 calcium-rich biomineralized skeletons and shells (Figures 2A, 5, Supplementary Figure S1). 585 Strikingly, we appear to observe some correlation between the loss-and-gain patterns of the 586 regulatory components of Ca2+-stores systems and the degree of structural utilization of calcium. For 587 example, a relative dearth of these components is observed in the arthropod and fungal lineage, 588 which use chitin-based structural components as opposed to the calcium-based biomineralized 589 skeletons and shells of vertebrates and certain molluscs (Figure 2F-G). “Because the data in this 590 study necessarily relies on experimental findings primarily from animals, it is best suited to 591 characterizing the system from the viewpoint of metazoa. However, variations in presence/absence 592 across lineages and LSEs provide insight into the dynamic evolution of Ca2+ stores regulation and 593 suggests there are further complexities to explore in more poorly-characterized eukaryotes/

594 4.1.2 Wolframin domain architecture and interactors 595 Even within animal Ca2+-stores systems, several proteins with important regulatory roles remain 596 poorly-understood in terms of their functional mechanisms. Wolframin is such a protein whose 597 domain composition has eluded researchers for over two decades (Inoue et al., 1998; Strom et al., 598 1998; Osman et al., 2003). Assignment of domains at the N- and C-termini of the central TM region 599 (Figure 4A), as well as the positioning of wolframin in the assembled interaction network, (Figure 600 1A) (Zatyka et al., 2015) supports a role in coordinating interactions on both sides of the ER 601 membrane. These are likely to take the form of PPIs with Ca2+-binding proteins or through disulfide 602 bond interactions (see above). 603 604 However, outside of a possible bacterial origin for the SLR region (see above, Figure 4C), the precise 605 evolutionary origins of the remaining domains comprising wolframin remain mostly unclear. It 606 appears likely these domains were derived from paralogs of existing EF-hand and OB fold domains, 607 both of which had already undergone extensive domain radiations in the eukaryotes prior to the 608 emergence of the animals, and then assembled and recruited to a Ca2+-stores regulatory role at the ER 609 (Lespinet et al., 2002). Their rapid divergence, evident by the lack of recognizable relationships to 610 known families with their respective folds, could have resulted from the extraordinary selective 611 pressures occurring with the major burst of evolutionary changes during the emergence of the 612 metazoan lineage. 613 614 Despite observations that genic disruptions contribute to similar though not identical phenotypes 615 (Urano, 2016), WFS1 and WFS2 are structurally unrelated. Our interaction network analysis further 616 failed to uncover any shared interactors (Figure 1A), although both have been linked in the past to 617 activity (Lu et al., 2014), mitochondrial dysfunction (Chang et al., 2012; Cagalinec et al., 618 2016), and apoptosis (Yamada et al., 2006; Chang et al., 2010). Recent research on WFS2 has 619 particularly focused on its role in calcium stores regulation at the intersection of the ER and 620 mitochondrial membranes (Rouzier et al., 2017). We believe that our identification of the constituent 621 domains of wolframin reported herein might help clarify its function better through target deletion 622 and mutagenesis of these domains.

623 4.2 Conclusions 624 Reconstruction of the evolution of the eukaryotic Ca2+-stores regulatory system points to a core of 625 domains inherited from distinct prokaryotic sources conserved across most eukaryotes. Lineage- 626 specific differentiation of the system across eukaryotes is driven by complexities stemming from 627 both the loss and/or expansion of the core complement of domains by the addition of components via 628 LSEs or recruitment of domains of diverse provenance. We analyze in depth one such striking 629 example of the latter, namely wolframin, which has been previously implicated in human disease.

14

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

630 The evolution of wolframin provides a model for how regulatory components of the Ca2+-stores 631 system emerged: through the combining of existing mediators of Ca2+ signaling, like the EF-hand 632 domains, with other domains originally not found in the system. Such transitions often happened at 633 the base of lineages that subsequently underwent substantial diversification. We hope the findings 634 presented here open novel avenues for the ongoing research on the regulation of calcium stores 635 across eukaryotes, including providing new handles for understanding the functional mechanisms of 636 wolframin and its dysregulation in Wolfram syndrome.

637 5 Conflict of Interest

638 The authors declare that the research was conducted in the absence of any commercial or financial 639 relationships that could be construed as a potential conflict of interest.

640 6 Author Contributions

641 Conceptualization, DS, AMB, LA; formal analysis, DS, AMB, LMI; analytical tools, DS, LA; 642 project administration, AMB, LMI, LA; visualization, DS; writing—original draft DS; AMB.; 643 writing—review and editing, AMB, LMI, and LA

644 7 Funding

645 DES, LMI., AMB., and LA are supported by the Intramural Research Program of the NIH, National 646 Library of Medicine.

647 8 Acknowledgments

648 This is a short text to acknowledge the contributions of specific colleagues, institutions, or agencies 649 that aided the efforts of the authors.

650 9 Data Availability Statement

651 All datasets analyzed for this study are included in the manuscript and the supplementary files.

652 10 Figure legends

653 Figure 1. Protein-protein interaction network of the ER-dependent Ca2+-stores and -signaling system. 654 (A) The network. Edges are classified as follows: interactions reported in the literature in which each 655 partner gene either has no other paralog or has been specifically identified (solid black), reported 656 interactions where the specific paralog for at least one of the two genes is unclear (dashed gray), and 657 interactions found in FunCoup 4.0 with P ≥ 0.9 (light pink). Medium- and large-sized nodes represent 658 the proteins whose phyletic distribution was selected for detailed analysis and are colored based on 659 phylogeny as follows: pan-eukaryotic, green; metazoans and close relatives, yellow; metazoan- 660 specific, orange; chordate-specific, red. Nodes labeled by HUGO nomenclature to capture paralog- 661 specific interactions. The borders of nodes involved in selected intracellular systems are colored 662 based on the key given in the lower left. (B) The highly-connected subnetwork. Edge coloring has no 663 significance; nodes are colored and sized as in (A). (C) A histogram of the degrees of the nodes of 664 the network shown in (A). A curve of best fit is shown in gray.

665 Figure 2. Comparative genome analyses. (A) A visualization of the number of paralogs of the 34 key 666 proteins/protein families (vertical axis) across 216 eukaryotes (horizontal axis). Each circle

15

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

667 represents the number of paralogs of one protein/family in one organism; the radius of each circle is 668 scaled by the hyperbolic arcsine of the represented number of paralogs. The proteins and protein 669 families shown can be broadly divided into two categories: those that are largely pan-eukaryotic 670 (calmodulin to ORAI) and those that originate in the metazoans or their close relatives (STIM to 671 S100). Eukaryotic clades are color-coded and labeled below the plot. Where needed, protein/protein 672 family names are supplemented in parenthesis with human gene names from Figure 1, where node 673 labels distinguish between paralogs. (B) A dendrogram of the 216 eukaryotes based on the counts of 674 their paralogs of the 34 proteins/families. Tip labels are colored by clade using the same system as in 675 panel A. (C) A dendrogram of the 34 proteins/families based on the counts of their paralogs. Five 676 major clusters are numbered to their left. Branch coloring corresponds to major clusters. (D-E) Plots 677 of the first two (D) principal components and (E) discriminants of each of the 216 eukaryotes 678 resulting from principal component and linear discriminant analyses on the paralog count dataset. 679 Color keys linking colors to high-level clades are given in the upper right and lower right, 680 respectively, of the plots. (F-G) Histograms of the polydomain scores of (F) all 216 eukaryotes, 681 shown on a base-10 log scale, and (G) 50 metazoans, shown on a linear scale. Contributions of some 682 clades to each bar of the histograms are shown through coloring given by the keys in the upper right 683 of the histograms.

684 Figure 3. (A-D) Stylized phylogenetic trees showing (A) sarcalumenin, EHD, and related bacterial 685 dynamins; (B) a partial tree of the calcineurin-like superfamily, containing the classical calcineurin A 686 phosphatases with their immediate eukaryotic relatives and newly-recognized archaeal orthologs, 687 along with the related MRE11/rad32/sbcD-like phosphoesterases as an outgroup; (C) the expansions 688 of the calcium/calmodulin-dependent in eukaryotes; and (D) ORAI and Jiraiya. Collapsed 689 groups are colored as follows: universal distribution, yellow; bacterial, blue; archaeal, red; pan- 690 eukaryotic, dark green; metazoan, light green; restricted to non-metazoan eukaryotes, blue-green. 691 Branches with bootstrap support of greater than 85% are marked with a black circle. Genome 692 contextual associations pertinent to a given clade are provided within context of trees. Conserved 693 gene neighborhoods are depicted as boxed arrows and protein domain architectures as boxes linked 694 in the same polypeptide. The trees are also provided in Newick format in the Supplementary Data. 695 (E) Domain architectures of proteins containing calmodulin-like EF-hands. Each EF-hand is a dyad 696 of EF repeats. Long (>200 residue) regions without an annotated domain are collapsed using “//”.

697 Figure 4. Structural and sequence overview of wolframin. (A) Domain architecture of wolframin. (B) 698 Topology diagram of the wolframin OB-fold. The labeled secondary structure elements correspond 699 with the labeled secondary structure elements and coloring in the OB-fold alignment in part E. The β- 700 strand shaded in gray is a possible sixth strand stacking with the core OB-fold domain barrel. (C-E) 701 Multiple sequence alignments of the (C) Sel1-like repeats, (D) EF-hand, (E) and cysteine-rich and 702 OB-fold domains of wolframin. Sequences are labeled to the left of the alignments by organism 703 abbreviation (see Supplementary Table S3) and NCBI GenBank accession number. Secondary 704 structure is provided above the alignments; green arrows represent strands and red cylinders represent 705 helices. The two EF motifs are marked with blue arrows above the secondary structure line. The 706 boundaries of the core OB-fold and the cysteine-rich region (E) are marked with pentagonal arrows 707 above the secondary structure line and pointing towards the center of the domain. The six secondary 708 structure elements that comprise the OB-fold are labeled with blue text in the secondary structure 709 line. Conserved cysteines are marked with an asterisk. A 90% consensus line is provided below the 710 alignments; the coloring and abbreviations used are: h (hydrophobic), l (aliphatic), and a (aromatic) 711 are shown on a yellow background; o (alcohol) is shown in salmon font; p (polar) is shown in blue 712 font; + (positively charged), - (negatively charged), and c (charged) are shown in pink font; s (small) 713 is shown in green font; u (tiny) is shown on a green background; b (big) is shaded gray.

16

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

714 Figure 5. Summary of the evolution of the Ca2+ stores system. The temporal timing of the emergence 715 of the core components of the system are overlaid on a consensus phylogenetic tree depicting 716 eukaryotic evolution. Colors indicate component provenance, as labeled in the key provided in upper- 717 left corner. Components with identified losses or expansions within a particular lineage are listed to 718 the right, with ‘-’ and ‘*’ labels denoting loss and expansion of a particular component, respectively. 719 A detailed breakdown of losses/expansions within the labeled lineages are provided in Figure 2 and 720 Supplementary Figure S1.

721 11 References

722 Aloi, C., Salina, A., Pasquali, L., Lugani, F., Perri, K., Russo, C., et al. (2012). Wolfram syndrome: 723 new mutations, different phenotype. PLoS One 7(1), e29150. doi: 724 10.1371/journal.pone.0029150. 725 Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990). Basic local alignment 726 search tool. J Mol Biol 215(3), 403-410. doi: 10.1016/s0022-2836(05)80360-2. 727 Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped 728 BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic 729 Acids Res 25(17), 3389-3402. 730 Altshuler, I., Vaillant, J.J., Xu, S., and Cristescu, M.E. (2012). The evolutionary history of 731 sarco(endo)plasmic calcium ATPase (SERCA). PLoS One 7(12), e52617. doi: 732 10.1371/journal.pone.0052617. 733 Alva, V., Nam, S.Z., Soding, J., and Lupas, A.N. (2016). The MPI bioinformatics Toolkit as an 734 integrative platform for advanced protein sequence and structure analysis. Nucleic Acids Res 735 44(W1), W410-415. doi: 10.1093/nar/gkw348. 736 Alzayady, K.J., Sebe-Pedros, A., Chandrasekhar, R., Wang, L., Ruiz-Trillo, I., and Yule, D.I. (2015). 737 Tracing the Evolutionary History of Inositol, 1, 4, 5-Trisphosphate Receptor: Insights from 738 Analyses of Capsaspora owczarzaki Ca2+ Release Channel Orthologs. Mol Biol Evol 32(9), 739 2236-2253. doi: 10.1093/molbev/msv098. 740 Amr, S., Heisey, C., Zhang, M., Xia, X.J., Shows, K.H., Ajlouni, K., et al. (2007). A homozygous 741 mutation in a novel zinc-finger protein, ERIS, is responsible for Wolfram syndrome 2. Am J 742 Hum Genet 81(4), 673-683. doi: 10.1086/520961. 743 Aramaki, T., Sasai, N., Yakura, R., and Sasai, Y. (2010). Jiraiya attenuates BMP signaling by 744 interfering with type II BMP receptors in neuroectodermal patterning. Dev Cell 19(4), 547- 745 561. doi: 10.1016/j.devcel.2010.09.001. 746 Aravind, L., Anantharaman, V., Zhang, D., de Souza, R.F., and Iyer, L.M. (2012). Gene flow and 747 biological conflict systems in the origin and evolution of eukaryotes. Front Cell Infect 748 Microbiol 2, 89. doi: 10.3389/fcimb.2012.00089. 749 Arcus, V. (2002). OB-fold domains: a snapshot of the evolution of sequence, structure and function. 750 Curr Opin Struct Biol 12(6), 794-801. 751 Ashby, M.C., and Tepikin, A.V. (2001). ER calcium and the functions of intracellular organelles. 752 Semin Cell Dev Biol 12(1), 11-17. doi: 10.1006/scdb.2000.0212. 753 Bader, G.D., and Hogue, C.W. (2002). Analyzing yeast protein-protein interaction data obtained 754 from different sources. Nat Biotechnol 20(10), 991-997. doi: 10.1038/nbt1002-991. 755 Banerjee, S., Vishwanath, P., Cui, J., Kelleher, D.J., Gilmore, R., Robbins, P.W., et al. (2007). The 756 evolution of N--dependent endoplasmic reticulum quality control factors for 757 folding and degradation. Proc Natl Acad Sci U S A 104(28), 11676-11681. doi: 758 10.1073/pnas.0704862104. 759 Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., et al. (2013). 760 GenBank. Nucleic Acids Res 41(Database issue), D36-42. doi: 10.1093/nar/gks1195.

17

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

761 Berridge, M.J. (2012). Calcium signalling remodelling and disease. Biochem Soc Trans 40(2), 297- 762 309. doi: 10.1042/BST20110766. 763 Bosak, T., Knoll, A.H., and Petroff, A.P. (2013). The Meaning of Stromatolites. Annual Review of 764 Earth and Planetary Sciences 41(1), 21-44. doi: 10.1146/annurev-earth-042711-105327. 765 Brownlee, C., Wheeler, G.L., and Taylor, A.R. (2015). Coccolithophore biomineralization: New 766 questions, new answers. Semin Cell Dev Biol 46, 11-16. doi: 10.1016/j.semcdb.2015.10.027. 767 Burroughs, A.M., Kaur, G., Zhang, D., and Aravind, L. (2017). Novel clades of the HU/IHF 768 superfamily point to unexpected roles in the eukaryotic centrosome, 769 partitioning, and biologic conflicts. Cell Cycle 16(11), 1093-1103. doi: 770 10.1080/15384101.2017.1315494. 771 Cagalinec, M., Liiv, M., Hodurova, Z., Hickey, M.A., Vaarmann, A., Mandel, M., et al. (2016). Role 772 of Mitochondrial Dynamics in Neuronal Development: Mechanism for Wolfram Syndrome. 773 PLoS Biol 14(7), e1002511. doi: 10.1371/journal.pbio.1002511. 774 Cai, X., Wang, X., Patel, S., and Clapham, D.E. (2015). Insights into the early evolution of animal 775 machinery: a unicellular point of view. Cell Calcium 57(3), 166-173. doi: 776 10.1016/j.ceca.2014.11.007. 777 Cavalier-Smith, T., Chao, E.E., and Lewis, R. (2015). Multiple origins of Heliozoa from flagellate 778 ancestors: New cryptist subphylum Corbihelia, superclass Corbistoma, and monophyly of 779 Haptista, Cryptista, Hacrobia and Chromista. Mol Phylogenet Evol 93, 331-362. doi: 780 10.1016/j.ympev.2015.07.004. 781 Chang, N.C., Nguyen, M., Germain, M., and Shore, G.C. (2010). Antagonism of Beclin 1-dependent 782 autophagy by BCL-2 at the endoplasmic reticulum requires NAF-1. Embo j 29(3), 606-618. 783 doi: 10.1038/emboj.2009.369. 784 Chang, N.C., Nguyen, M., and Shore, G.C. (2012). BCL2-CISD2: An ER complex at the nexus of 785 autophagy and calcium homeostasis? Autophagy 8(5), 856-857. doi: 10.4161/auto.20054. 786 Clapham, D.E. (2007). Calcium signaling. Cell 131(6), 1047-1058. doi: 10.1016/j.cell.2007.11.028. 787 Dantas, T.J., Daly, O.M., and Morrison, C.G. (2012). Such small hands: the roles of 788 centrins/caltractins in the centriole and in genome maintenance. Cell Mol Life Sci 69(18), 789 2979-2997. doi: 10.1007/s00018-012-0961-1. 790 Delwiche, C.F., and Cooper, E.D. (2015). The Evolutionary Origin of a Terrestrial Flora. Curr Biol 791 25(19), R899-910. doi: 10.1016/j.cub.2015.08.029. 792 Denessiouk, K., Permyakov, S., Denesyuk, A., Permyakov, E., and Johnson, M.S. (2014). Two 793 structural motifs within canonical EF-hand calcium-binding domains identify five different 794 classes of calcium buffers and sensors. PLoS One 9(10), e109287. doi: 795 10.1371/journal.pone.0109287. 796 Dominguez, D.C., Guragain, M., and Patrauchan, M. (2015). Calcium binding proteins and calcium 797 signaling in prokaryotes. Cell Calcium 57(3), 151-165. doi: 10.1016/j.ceca.2014.12.006. 798 Drozdetskiy, A., Cole, C., Procter, J., and Barton, G.J. (2015). JPred4: a protein secondary structure 799 prediction server. Nucleic Acids Res 43(W1), W389-394. doi: 10.1093/nar/gkv332. 800 Edel, K.H., Marchadier, E., Brownlee, C., Kudla, J., and Hetherington, A.M. (2017). The Evolution 801 of Calcium-Based Signalling in Plants. Curr Biol 27(13), R667-r679. doi: 802 10.1016/j.cub.2017.05.020. 803 Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. 804 Nucleic Acids Res 32(5), 1792-1797. doi: 10.1093/nar/gkh340. 805 Flynn, R.L., and Zou, L. (2010). Oligonucleotide/-binding fold proteins: a growing 806 family of genome guardians. Crit Rev Biochem Mol Biol 45(4), 266-275. doi: 807 10.3109/10409238.2010.488216. 808 Fruchterman, T.M.J., and Reingold, E.M. (1991). Graph drawing by force-directed placement. 809 Software: Practice and Experience 21(11), 1129-1164. doi: 10.1002/spe.4380211102.

18

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

810 Gertz, E.M., Yu, Y.K., Agarwala, R., Schaffer, A.A., and Altschul, S.F. (2006). Composition-based 811 statistics and translated nucleotide searches: improving the TBLASTN module of BLAST. 812 BMC Biol 4, 41. doi: 10.1186/1741-7007-4-41. 813 Gifford, J.L., Walsh, M.P., and Vogel, H.J. (2007). Structures and metal-ion-binding properties of the 814 Ca2+-binding helix-loop-helix EF-hand motifs. Biochem J 405(2), 199-221. doi: 815 10.1042/BJ20070255. 816 Guardino, K.M., Sheftic, S.R., Slattery, R.E., and Alexandrescu, A.T. (2009). Relative stabilities of 817 conserved and non-conserved structures in the OB-fold superfamily. Int J Mol Sci 10(5), 818 2412-2430. doi: 10.3390/ijms10052412. 819 Hildebrand, M.S., Sorensen, J.L., Jensen, M., Kimberling, W.J., and Smith, R.J. (2008). Autoimmune 820 disease in a DFNA6/14/38 family carrying a novel missense mutation in WFS1. Am J Med 821 Genet A 146A(17), 2258-2265. doi: 10.1002/ajmg.a.32449. 822 Hofmann, S., Philbrook, C., Gerbitz, K.D., and Bauer, M.F. (2003). Wolfram syndrome: structural 823 and functional analyses of mutant and wild-type wolframin, the WFS1 gene product. Hum 824 Mol Genet 12(16), 2003-2012. 825 Huang, J., and Gogarten, J.P. (2007). Did an ancient chlamydial endosymbiosis facilitate the 826 establishment of primary plastids? Genome Biol 8(6), R99. doi: 10.1186/gb-2007-8-6-r99. 827 Inoue, H., Tanizawa, Y., Wasson, J., Behn, P., Kalidas, K., Bernal-Mizrachi, E., et al. (1998). A gene 828 encoding a transmembrane protein is mutated in patients with diabetes mellitus and optic 829 atrophy (Wolfram syndrome). Nat Genet 20(2), 143-148. doi: 10.1038/2441. 830 Jekely, G. (2007). Origin of eukaryotic endomembranes: a critical evaluation of different model 831 scenarios. Adv Exp Med Biol 607, 38-51. doi: 10.1007/978-0-387-74021-8_3. 832 Jekely, G. (2008). Origin of the nucleus and Ran-dependent transport to safeguard ribosome 833 biogenesis in a chimeric cell. Biol Direct 3, 31. doi: 10.1186/1745-6150-3-31. 834 Jeong, H., Sim, H.J., Song, E.K., Lee, H., Ha, S.C., Jun, Y., et al. (2016). Crystal structure of SEL1L: 835 Insight into the roles of SLR motifs in ERAD pathway. Sci Rep 6, 20261. doi: 836 10.1038/srep20261. 837 Karpenahalli, M.R., Lupas, A.N., and Soding, J. (2007). TPRpred: a tool for prediction of TPR-, 838 PPR- and SEL1-like repeats from protein sequences. BMC Bioinformatics 8, 2. doi: 839 10.1186/1471-2105-8-2. 840 Kaufman, L., and Rousseeuw, P.J. (1990). Finding Groups in Data: An Introduction to Cluster 841 Analysis. New York: John Wiley & Sons. 842 Kawasaki, H., Nakayama, S., and Kretsinger, R.H. (1998). Classification and evolution of EF-hand 843 proteins. Biometals 11(4), 277-295. 844 Kim, H.S., Czymmek, K.J., Patel, A., Modla, S., Nohe, A., Duncan, R., et al. (2012). Expression of 845 the Cameleon calcium biosensor in fungi reveals distinct Ca(2+) signatures associated with 846 polarized growth, development, and pathogenesis. Fungal Genet Biol 49(8), 589-601. doi: 847 10.1016/j.fgb.2012.05.011. 848 Klimecka, M., and Muszynska, G. (2007). Structure and functions of plant calcium-dependent 849 protein kinases. Acta Biochim Pol 54(2), 219-233. 850 Kozlov, G., Bastos-Aristizabal, S., Maattanen, P., Rosenauer, A., Zheng, F., Killikelly, A., et al. 851 (2010). Structural basis of cyclophilin B binding by the calnexin/calreticulin P-domain. J Biol 852 Chem 285(46), 35551-35557. doi: 10.1074/jbc.M110.160101. 853 Krebs, J., Agellon, L.B., and Michalak, M. (2015). Ca(2+) homeostasis and endoplasmic reticulum 854 (ER) stress: An integrated view of calcium signaling. Biochem Biophys Res Commun 460(1), 855 114-121. doi: 10.1016/j.bbrc.2015.02.004. 856 Krishna, S.S., Aravind, L., Bakolitsa, C., Caruthers, J., Carlton, D., Miller, M.D., et al. (2010). The 857 structure of SSO2064, the first representative of Pfam family PF01796, reveals a novel two- 858 domain zinc-ribbon OB-fold architecture with a potential acyl-CoA-binding role. Acta

19

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

859 Crystallogr Sect F Struct Biol Cryst Commun 66(Pt 10), 1160-1166. doi: 860 10.1107/s1744309110002514. 861 Lance, G.N., and Williams, W.T. (1966). Computer Programs for Hierarchical Polythetic 862 Classification (“Similarity Analyses”). The Computer Journal 9(1), 60-64. doi: 863 10.1093/comjnl/9.1.60. 864 Lanner, J.T., Georgiou, D.K., Joshi, A.D., and Hamilton, S.L. (2010). Ryanodine receptors: structure, 865 expression, molecular details, and function in calcium release. Cold Spring Harb Perspect 866 Biol 2(11), a003996. doi: 10.1101/cshperspect.a003996. 867 Leipe, D.D., Wolf, Y.I., Koonin, E.V., and Aravind, L. (2002). Classification and evolution of P-loop 868 GTPases and related ATPases. J Mol Biol 317(1), 41-72. doi: 10.1006/jmbi.2001.5378. 869 Lespinet, O., Wolf, Y.I., Koonin, E.V., and Aravind, L. (2002). The role of lineage-specific gene 870 family expansion in the evolution of eukaryotes. Genome Res 12(7), 1048-1059. doi: 871 10.1101/gr.174302. 872 Lu, S., Kanekura, K., Hara, T., Mahadevan, J., Spears, L.D., Oslowski, C.M., et al. (2014). A 873 calcium-dependent as a potential therapeutic target for Wolfram syndrome. Proc 874 Natl Acad Sci U S A 111(49), E5292-5301. doi: 10.1073/pnas.1421055111. 875 Mackinder, L., Wheeler, G., Schroeder, D., von Dassow, P., Riebesell, U., and Brownlee, C. (2011). 876 Expression of biomineralization-related ion transport genes in Emiliania huxleyi. Environ 877 Microbiol 13(12), 3250-3265. doi: 10.1111/j.1462-2920.2011.02561.x. 878 Makioka, A., Kumagai, M., Kobayashi, S., and Takeuchi, T. (2002). Possible role of calcium ions, 879 calcium channels and calmodulin in excystation and metacystic development of Entamoeba 880 invadens. Parasitol Res 88(9), 837-843. doi: 10.1007/s00436-002-0676-6. 881 Mans, B.J., Anantharaman, V., Aravind, L., and Koonin, E.V. (2004). Comparative genomics, 882 evolution and origins of the nuclear envelope and nuclear pore complex. Cell Cycle 3(12), 883 1612-1637. doi: 10.4161/cc.3.12.1345. 884 Mittl, P.R., and Schneider-Brachert, W. (2007). Sel1-like repeat proteins in signal transduction. Cell 885 Signal 19(1), 20-31. doi: 10.1016/j.cellsig.2006.05.034. 886 Moreno, S.N., and Docampo, R. (2003). Calcium regulation in protozoan parasites. Curr Opin 887 Microbiol 6(4), 359-364. 888 Neuwald, A.F., and Altschul, S.F. (2016). Bayesian Top-Down Protein Sequence Alignment with 889 Inferred Position-Specific Gap Penalties. PLoS Comput Biol 12(5), e1004936. doi: 890 10.1371/journal.pcbi.1004936. 891 Nolan, D.P., Reverlard, P., and Pays, E. (1994). Overexpression and characterization of a gene for a 892 Ca(2+)-ATPase of the endoplasmic reticulum in Trypanosoma brucei. J Biol Chem 269(42), 893 26045-26051. 894 Ogris, C., Guala, D., and Sonnhammer, E.L.L. (2018). FunCoup 4: new species, data, and 895 visualization. Nucleic Acids Res 46(D1), D601-D607. doi: 10.1093/nar/gkx1138. 896 Ong, H.L., de Souza, L.B., and Ambudkar, I.S. (2016). Role of TRPC Channels in Store-Operated 897 Calcium Entry. Adv Exp Med Biol 898, 87-109. doi: 10.1007/978-3-319-26974-0_5. 898 Osman, A.A., Saito, M., Makepeace, C., Permutt, M.A., Schlesinger, P., and Mueckler, M. (2003). 899 Wolframin expression induces novel ion channel activity in endoplasmic reticulum 900 membranes and increases intracellular calcium. J Biol Chem 278(52), 52755-52762. doi: 901 10.1074/jbc.M310331200. 902 Peng, J., Ding, J., Tan, C., Baggenstoss, B., Zhang, Z., Lapolla, S.M., et al. (2009). Oligomerization 903 of membrane-bound Bcl-2 is involved in its pore formation induced by tBid. Apoptosis 904 14(10), 1145-1153. doi: 10.1007/s10495-009-0389-8. 905 Perez-Gordones, M.C., Serrano, M.L., Rojas, H., Martinez, J.C., Uzcanga, G., and Mendoza, M. 906 (2015). Presence of a thapsigargin-sensitive in Trypanosoma evansi:

20

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

907 Immunological, physiological, molecular and structural evidences. Exp Parasitol 159, 107- 908 117. doi: 10.1016/j.exppara.2015.08.017. 909 Plattner, H. (2015). Molecular aspects of calcium signalling at the crossroads of unikont and bikont 910 eukaryote evolution--the ciliated protozoan Paramecium in focus. Cell Calcium 57(3), 174- 911 185. doi: 10.1016/j.ceca.2014.12.002. 912 Plattner, H. (2017). Signalling in ciliates: long- and short-range signals and molecular determinants 913 for cellular dynamics. Biol Rev Camb Philos Soc 92(1), 60-107. doi: 10.1111/brv.12218. 914 Plattner, H., Flotenmeyer, M., Kissmehl, R., Pavlovic, N., Hauser, K., Momayezi, M., et al. (1999). 915 Microdomain arrangement of the SERCA-type Ca2+ pump (Ca2+-ATPase) in 916 subplasmalemmal calcium stores of paramecium cells. J Histochem Cytochem 47(7), 841- 917 854. doi: 10.1177/002215549904700701. 918 Plattner, H., and Verkhratsky, A. (2013). Ca2+ signalling early in evolution--all but primitive. J Cell 919 Sci 126(Pt 10), 2141-2150. doi: 10.1242/jcs.127449. 920 Plattner, H., and Verkhratsky, A. (2015). The ancient roots of calcium signalling evolutionary tree. 921 Cell Calcium 57(3), 123-132. doi: 10.1016/j.ceca.2014.12.004. 922 Ponting, C.P., Aravind, L., Schultz, J., Bork, P., and Koonin, E.V. (1999). Eukaryotic signalling 923 domain homologues in archaea and bacteria. Ancient ancestry and horizontal gene transfer. J 924 Mol Biol 289(4), 729-745. doi: 10.1006/jmbi.1999.2827. 925 Prakriya, M., and Lewis, R.S. (2015). Store-Operated Calcium Channels. Physiol Rev 95(4), 1383- 926 1436. doi: 10.1152/physrev.00020.2014. 927 Price, M.N., Dehal, P.S., and Arkin, A.P. (2010). FastTree 2--approximately maximum-likelihood 928 trees for large alignments. PLoS One 5(3), e9490. doi: 10.1371/journal.pone.0009490. 929 Prole, D.L., and Taylor, C.W. (2011). Identification of intracellular and plasma membrane calcium 930 channel homologues in pathogenic parasites. PLoS One 6(10), e26218. doi: 931 10.1371/journal.pone.0026218. 932 Qian, X., Qin, L., Xing, G., and Cao, X. (2015). Phenotype Prediction of Pathogenic 933 Nonsynonymous Single Nucleotide Polymorphisms in WFS1. Sci Rep 5, 14731. doi: 934 10.1038/srep14731. 935 Reiner, D.S., Hetsko, M.L., Meszaros, J.G., Sun, C.H., Morrison, H.G., Brunton, L.L., et al. (2003). 936 Calcium signaling in excystation of the early diverging eukaryote, Giardia lamblia. J Biol 937 Chem 278(4), 2533-2540. doi: 10.1074/jbc.M208033200. 938 Rigoli, L., Lombardo, F., and Di Bella, C. (2011). Wolfram syndrome and WFS1 gene. Clin Genet 939 79(2), 103-117. doi: 10.1111/j.1399-0004.2010.01522.x. 940 Rodrigues, F.A., Costa Lda, F., and Barbieri, A.L. (2011). Resilience of protein-protein interaction 941 networks as determined by their large-scale topological features. Mol Biosyst 7(4), 1263- 942 1269. doi: 10.1039/c0mb00256a. 943 Rouzier, C., Moore, D., Delorme, C., Lacas-Gervais, S., Ait-El-Mkadem, S., Fragaki, K., et al. 944 (2017). A novel CISD2 mutation associated with a classical Wolfram syndrome phenotype 945 alters Ca2+ homeostasis and ER-mitochondria interactions. Hum Mol Genet 26(9), 1599- 946 1611. doi: 10.1093/hmg/ddx060. 947 Sillanpaa, J.K., Sundh, H., and Sundell, K.S. (2018). Calcium transfer across the outer mantle 948 epithelium in the Pacific oyster, Crassostrea gigas. Proc Biol Sci 285(1891). doi: 949 10.1098/rspb.2018.1676. 950 Silverio, A.L., and Saier, M.H., Jr. (2011). Bioinformatic characterization of the trimeric intracellular 951 cation-specific channel protein family. J Membr Biol 241(2), 77-101. doi: 10.1007/s00232- 952 011-9364-8. 953 Smaili, S.S., Pereira, G.J., Costa, M.M., Rocha, K.K., Rodrigues, L., do Carmo, L.G., et al. (2013). 954 The role of calcium stores in apoptosis and autophagy. Curr Mol Med 13(2), 252-265.

21

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

955 Soding, J. (2005). Protein homology detection by HMM-HMM comparison. Bioinformatics 21(7), 956 951-960. doi: 10.1093/bioinformatics/bti125. 957 Strom, T.M., Hortnagel, K., Hofmann, S., Gekeler, F., Scharfe, C., Rabl, W., et al. (1998). Diabetes 958 insipidus, diabetes mellitus, optic atrophy and deafness (DIDMOAD) caused by mutations in 959 a novel gene (wolframin) coding for a predicted transmembrane protein. Hum Mol Genet 960 7(13), 2021-2028. 961 Trandinh, C.C., Pao, G.M., and Saier, M.H., Jr. (1992). Structural and evolutionary relationships 962 among the immunophilins: two ubiquitous families of peptidyl-prolyl cis-trans isomerases. 963 Faseb j 6(15), 3410-3420. doi: 10.1096/fasebj.6.15.1464374. 964 Unal, C.M., and Steinert, M. (2014). Microbial peptidyl-prolyl cis/trans isomerases (PPIases): 965 virulence factors and potential alternative drug targets. Microbiol Mol Biol Rev 78(3), 544- 966 571. doi: 10.1128/mmbr.00015-14. 967 Urano, F. (2016). Wolfram Syndrome: Diagnosis, Management, and Treatment. Curr Diab Rep 968 16(1), 6. doi: 10.1007/s11892-015-0702-6. 969 Ushioda, R., Miyamoto, A., Inoue, M., Watanabe, S., Okumura, M., Maegawa, K.I., et al. (2016). 970 Redox-assisted regulation of Ca2+ homeostasis in the endoplasmic reticulum by disulfide 971 reductase ERdj5. Proc Natl Acad Sci U S A 113(41), E6055-E6063. doi: 972 10.1073/pnas.1605818113. 973 Venancio, T.M., Balaji, S., Iyer, L.M., and Aravind, L. (2009). Reconstructing the ubiquitin network: 974 cross-talk with other systems and identification of novel functions. Genome Biol 10(3), R33. 975 doi: 10.1186/gb-2009-10-3-r33. 976 Verkhratsky, A., and Parpura, V. (2014). Calcium signalling and calcium channels: evolution and 977 general principles. Eur J Pharmacol 739, 1-3. doi: 10.1016/j.ejphar.2013.11.013. 978 Verma, R., Reichermeier, K.M., Burroughs, A.M., Oania, R.S., Reitsma, J.M., Aravind, L., et al. 979 (2018). Vms1 and ANKZF1 peptidyl-tRNA hydrolases release nascent chains from stalled 980 ribosomes. Nature 557(7705), 446-451. doi: 10.1038/s41586-018-0022-5. 981 Wang, Y.Y., Zhao, R., and Zhe, H. (2015). The emerging role of CaMKII in cancer. Oncotarget 982 6(14), 11725-11734. doi: 10.18632/oncotarget.3955. 983 Watson, E., Matousek, W.M., Irimies, E.L., and Alexandrescu, A.T. (2007). Partially folded states of 984 staphylococcal nuclease highlight the conserved structural hierarchy of OB-fold proteins. 985 Biochemistry 46(33), 9484-9494. doi: 10.1021/bi700532j. 986 Woo, J.S., Srikanth, S., and Gwack, Y. (2018). "Modulation of Orai1 and STIM1 by Cellular 987 Factors," in Calcium Entry Channels in Non-Excitable Cells, eds. J.A. Kozak & J.W. Putney, 988 Jr. (Boca Raton (FL): CRC Press/Taylor & Francis 989 Yamada, T., Ishihara, H., Tamura, A., Takahashi, R., Yamaguchi, S., Takei, D., et al. (2006). WFS1- 990 deficiency increases endoplasmic reticulum stress, impairs cell cycle progression and triggers 991 the apoptotic pathway specifically in pancreatic beta-cells. Hum Mol Genet 15(10), 1600- 992 1609. doi: 10.1093/hmg/ddl081. 993 Yin, X., Ziegler, A., Kelm, K., Hoffmann, R., Watermeyer, P., Alexa, P., et al. (2018). Formation and 994 mosaicity of coccolith segment calcite of the marine algae Emiliania huxleyi. J Phycol 54(1), 995 85-104. doi: 10.1111/jpy.12604. 996 Yurimoto, S., Hatano, N., Tsuchiya, M., Kato, K., Fujimoto, T., Masaki, T., et al. (2009). 997 Identification and characterization of wolframin, the product of the wolfram syndrome gene 998 (WFS1), as a novel calmodulin-binding protein. Biochemistry 48(18), 3946-3955. doi: 999 10.1021/bi900260y. 1000 Zaremba-Niedzwiedzka, K., Caceres, E.F., Saw, J.H., Backstrom, D., Juzokaite, L., Vancaester, E., et 1001 al. (2017). Asgard archaea illuminate the origin of eukaryotic cellular complexity. Nature 1002 541(7637), 353-358. doi: 10.1038/nature21031.

22

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Evolution of ER Ca2+ storage

1003 Zatyka, M., Da Silva Xavier, G., Bellomo, E.A., Leadbeater, W., Astuti, D., Smith, J., et al. (2015). 1004 Sarco(endo)plasmic reticulum ATPase is a molecular partner of Wolfram syndrome 1 protein, 1005 which negatively regulates its expression. Hum Mol Genet 24(3), 814-827. doi: 1006 10.1093/hmg/ddu499. 1007 Zatyka, M., Ricketts, C., da Silva Xavier, G., Minton, J., Fenton, S., Hofmann-Thiel, S., et al. (2008). 1008 Sodium-potassium ATPase 1 subunit is a molecular partner of Wolframin, an endoplasmic 1009 reticulum protein involved in ER stress. Hum Mol Genet 17(2), 190-200. doi: 1010 10.1093/hmg/ddm296. 1011 Zhou, X., Lin, P., Yamazaki, D., Park, K.H., Komazaki, S., Chen, S.R., et al. (2014). Trimeric 1012 intracellular cation channels and sarcoplasmic/endoplasmic reticulum calcium homeostasis. 1013 Circ Res 114(4), 706-716. doi: 10.1161/circresaha.114.301816. 1014 Zhou, Y., Yang, W., Kirberger, M., Lee, H.W., Ayalasomayajula, G., and Yang, J.J. (2006). 1015 Prediction of EF-hand calcium-binding proteins and analysis of bacterial EF-hand proteins. 1016 Proteins 65(3), 643-655. doi: 10.1002/prot.21139. 1017 Zucchi, R., and Ronca-Testoni, S. (1997). The Ca2+ channel/ryanodine 1018 receptor: modulation by endogenous effectors, drugs and disease states. Pharmacol Rev 1019 49(1), 1-51. 1020

23

A

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

B C A

B C D

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

E

F G A Universal distribution Pan-eukaryotic Archaeal Calcineurin-like Phosphatases Bacterial Metazoan ZNR+linker+FHA Calcineurin VWA+ β-barrel+α-helical Archaeal Other eukaryotic clade AKV76728, AKV77601, AKV76729 from Metallosphaera sedula

B Closest Bacterial Dynamins TM Dynamin TM Metazoan EHD Parabasalid expansions AHF04338 from Marichromatium purpuratum 984 (γ-proteobacteria) Fungal EHD } Viridiplantae EHD Stramenopile EHD BSU-like phosphatases

Metazoan Protein phosphatase 1 Sarcalumenin

Kinetoplastid Sarcalumenin

Parabasalid } expansions Diplomonad expansion Other Protein phosphatase 2 Bacterial Dynamins

0.6 Metazoan PP4/6-like CAMK2-like C Metazoan CAMK1-like Protein phosphatase 6 Fungal CMK1-like Stramenoplie CAMK

Protein phosphatase 4

Protein phosphatase 3 (Calcineurin A) CHEK2 }PP5/7-like Protein phosphatase 5

Protein phosphatase 7

MRE11/rad32/sbcD-like Alveolate phosphatases expansion Plant CDPK (land plant expansion) 0.6

0.8 D Alveolate ORAI Emiliania ORAI E

bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not

-

- - certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also - AB Glyox- Glyox-

Expansionmade available for use under a CC0 license.

TM

TM

TM

TM

TM TM

TM Tic110-like // //

EF

EF

EF

EF

hand

hand hand Hydrolase alase alase hand Alveolate ORAI Fer4_5 AKE65596 from Microcystis aeruginosa NIES-2549 (cyanobacteria) AKJ65436 from Kiritimatiella glycovorans (verrucomicrobia) Emiliania ORAI

Expansion

- - - Glyox-

Heme-Oxygenase Protein kinase

EF

EF

EF

hand hand hand alase

Metazoan AJC61110 from Streptomyces sp. 769 (actinobacteria) EJK50743 from Thalassiosira oceanica (stramenopiles) Jiraiya FAD- NAD-

Ferric FAD-dep NAD-dep - - dep- dep-

reductase oxidore- oxidore-

TM TM

// // NO synthase TM

BFD EF

EF oxidore oxidore

Fer2_ hand hand TM region ductase ductase ductase ductase

AFZ12720 from Crinalium epipsammum PCC 9333 (cyanobacteria) ABW28148 from Acaryochloris marina MBIC11017 (cyanobacteria) Oomycete

Ferric FAD-dep NAD-dep

-

-

-

- -

ORAI reductase oxidore- oxidore-

TM

TM TM

Sulfatase TM

EF

EF

EF

EF

EF

hand

hand

hand hand

hand TM region ductase ductase DUF4994

OYW31404 from Chthoniobacter sp. 12-60-6 (verrucomicrobia) ANZ34757 from Lentzea NP_001171708 from Homo sapiens (metazoa)

guizhouensis (actinobacteria)

-

-

-

- -

- Thiore- cNMP cNMP

TM

TM

TM

TM

TM TM

CytC // TM

EF

EF

EF

EF

EF

EF

hand

hand

hand

hand hand hand doxin binding binding

OYW74718 from Verrucomicrobia bacterium 12-59-8 (verrucomicrobia) NP_008819 from Homo AHF92076 from Opitutaceae bacterium TAV5 (verrucomicrobia)

sapiens (metazoa)

-

- - Viridiplantae ORAI cNMP - cNMP

Sulfatase Ion channel //

EF

EF

EF

hand hand

binding hand binding doxin Metazoan ORAI Thiore XP_005787975 from Emiliania huxleyi CCMP1516 (haptophyta) XP_013754210 from Thecamonas trahens ATCC 50062 (apusomonadida)

0.6 A B H1 S4

C-rich - - S2

EF TM region

hand

fold

OB

Sel1 Sel1 bioRxivSel1 preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder fordomain this preprint (which was not S5 certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. S3 S1 NP_005996 from Homo sapiens N C C Secondary Structure Hsapiens__NP_005996.2 56 D A A A P A E P Q A Q H T R S R E R A D G T G P T K G D M E I P F E E V L E R A K A G D P - - K A Q T E V G K H Y L Q L A G D T - - Mmusculus__NP_035846.1 58 E A A A P - E P R A P Q T G S R E E T D R A G P M K A D V E I P F E E V L E K A K A G D P - - K A Q T E V G K H Y L R L A N D A - - Ggallus__XP_420803.2 42 D A A T L S S S V P G Y S Q S R E K A E K N E T M K E E P E V L F E E L L E R A K A G E P - - K A Q T E V G K H F L R L A E E E - - Drerio__XP_695252.5 161 - A A Q K V M M Q E K T R K A E E Q A K A E E C D D M E D D L P F E E L Q K K A E A G D P - - R A Q S R L G R Y Y L K L A E E K - - Dmelanogaster__NP_651079.2 1 ------M A T W T Q N E P T G V T K R R R W N L E D R A S L N K L K H H I A E E G C P - - Q M Q Y D L A K E L L D N S I V E P N Lgigantea__XP_009046103.1 1 ------M A D Q E C N K Q G S E D T N D D T N V N L L K Q E - A E N G S G - - E H Q Y E L G K R Y L K L A D - - S E Hrobusta__XP_009012051.1 1 ------M L W Q K T T I Y T V I Y - - E I Y R I S G V V I L N A T N S A D F Epallida__KXJ15729.1 1 ------M S S Y Q S L L Q D K M E D I L A S E S K K N Y S T D I S K A K E L L H E D - - - - - Tadhaerens__XP_002114744.1 2006 - L K N I F Q P K V P A T P G D N P I D G K Q D R K T S K Q L - F K E E L Q K A R D G D A - - D A Q L Q I S N F Y F E G F G - - - - Aqueenslandica__XP_011409218.2 1 ------M A L A M A E P K G M T E I A P E D K P R A T E L I V K G K R L I D E A K E S S S consensus/90% ...... b b . . p . A . . s p . . . p . b . . . u p . h l p . s . . . . .

Secondary Structure Hsapiens__NP_005996.2 D E E L N S - C T A V D W L V L A A K Q G R R E A V K L L - R R C L A D R R - - G I T S E N E R E V R Q L - - - - S S E T D L E R A 175 Mmusculus__NP_035846.1 D E E L N S - C S A V A W L I L A A K Q G R R E A V K L L - R R C L A D R K - - G I T S E N E A E V K Q L - - - - S S E T D L E R A 176 Ggallus__XP_420803.2 D E E L N N - C S A V D W F I L A A K Q G R R E A V K L L - R R C L A D R R - - G I T S E N E Q E V K K L - - - - S S E T D L E R A 161 Drerio__XP_695252.5 D A E K N N - L T A V E W L M K A A K Q G R R D A A K L L - Q K C W M Q K K - - G I T A E N A Q E V R C L - - - - S S E S K F E Q A 279 Dmelanogaster__NP_651079.2 L A K G N Q S Q K A V N W L V S A A H N G H E D A V K L L - R Q C Y N D G S - - G I T A E N T D E V R R C - - - - L A M T P G E R A 117 Lgigantea__XP_009046103.1 I N R D E N I S F A L H W L V K S S K Q G N E D A T K E L - Q K C M E D D I - - G I N E E N G P D I R W C - - - - I Q T S S L E K Q 108 Hrobusta__XP_009012051.1 I N - Y G E A I S V V G M L L M M S E S P G S V I L S V L T L N C L I I K I F L G I T N K N G N D V H W C - - - - V N T S S W E K K 93 Epallida__KXJ15729.1 - - - C K K - E E A I K I L I D I S K T G D E B A T E I L - A Q C L A S G N - - G I T N D N R A S V E W C - - - - V K T S E A D K R 93 Tadhaerens__XP_002114744.1 - - V K R N R K E A L Q W L K L S S D S G N Q Q A A E M L - Q D Y E D E N H - - K I H V R R K A S I R W Q S A M G K A L E R Y V Q S 2124 Aqueenslandica__XP_011409218.2 S E K P H L V A Q G L K L F I R A A A E E S R D A V K R I N S L F S D L S N - - S V I N E L P S D L R K M A Y V L V K S T D R E K E 105 consensus/90% . . . . p p . . p u l . h h h . . u c p s p p . A s c b L . . p h b ...... u I p s c p . . p l + b h ...... o p . - b .

D EF motif 1 EF motif 2 Secondary Structure Hsapiens__NP_005996.2 176 V R K A A L V M Y W K L N P K K K K Q V A V A E L L E N V G Q V N E H D G 1 0 L Q K Q R R - M L E R L V S S E S K N Y I A L D D F V E I T K K Y A 255 Mmusculus__NP_035846.1 177 V R K A A L V M Y W K L N P K K K K Q V A V S E L L E N V G Q V N E Q D G 1 0 L Q K Q R R - M L E R L V S S E S K N Y I A L D D F V E L T K K Y A 256 Ggallus__XP_420803.2 162 V R K A A L V M Y W K L N P K K K K Q L A V S E L L E N V G Q V D N E D G 1 0 V Q K Q R R - M L E R L V S C E S K K F I A L D D F V E I T K K Y A 241 Drerio__XP_695252.5 280 V R R A A M T M Y W K L N P D R K K K V A M S E M L E N V S Q V N T V P G 1 0 I Q T Q R K - I L E T M V S 1 E S S S K Y V D V E D F V E M T K K F T 360 Dmelanogaster__NP_651079.2 118 A R K A A R E L F A C L S N G N E 2 T P K Q L E R K M R R I Y N L Q R K R R 1 3 E Q E P E C E P L E D V P T 5 V E R R R L I T E A H L V S A A S N Y S 208 Obimaculoides__XP_014767880.1 108 I R L A A R Q M F G S L N K T H K E V L S K D E Y M K A I K E I P G - - - D E Q A R K - L L A A A G K - K I G D A I S E D A F V K T I S K K I 173 Hrobusta__XP_009012051.1 94 G R N A A S K L F K K I C M G D - K H L T K E E Y K R R V A E L S E - - - N K V E R K - L L E K A I K - - - - E N V S E E D F V N R M M K Q L 155 Epallida__KXJ15729.1 94 M N H A M T E L F M S L K Q E G K Q N I T V K D I K N A L K A A K K E K E 1 4 E I G G L M G V L K S A M S I G G K D E L T L T E F L D S A M S Y A 178 Tadhaerens__XP_002114744.1 2125 K E D E N D S F L V G N K K L Q S 8 G A T T L H K L I Q D V Q Q N K K M V S 1 3 A K D E E V - F L D Q A T N 1 H N K E K L P I E E E V Q R K A M T K E 2216 Aqueenslandica__XP_011409218.2 106 V Y V V A K D I F E T M A E R R D 6 I G E A V E R L L A T K R E D S D S Y K 2 L R Q T M K K L L N S C L S 2 D K G D I I V T E D K F C L T A C L Y S 186 consensus/90% . b . h h . . h a . p h p . . p p . . . s . p c h b p . l . p . p p . . . . b . . b . . h L p p h . p . . . . p . l s b p c h h p . . . p b .

E C-rich domain Secondary Structure * * * Hsapiens__NP_005996.2 666 W Q Q Y G A L C G P R A W K E T N M A R T Q I L C S H L E G H R V T W T - G R F K Y V R V T D I D N S A E S A I - N M L P F F I G D W M R C L Y G E Mmusculus__NP_035846.1 668 W Q Q Y G F L C G P R A W K E T N M A R T Q I L C S H L E G H R V T W T - G R F K Y V R V T E I D N S A E S A I - N M L P F F L G D W M R C L Y G E Ggallus__XP_420803.2 651 W N Q Y A F L C G P R S W K E T N M A R T Q I L C S H L E G H R V T W T - G R F K Y V R V T E I D N S A E S A I - N M L P L F I G D W M R C L Y G E Drerio__XP_695252.5 818 W E E Y G T L C G P Q A W K E R G M A Q T Q L S C S H L E G H R V T W T - G I F R Y V R V A E K E N G A Q S V I - N M L P V F M G D W L R C L Y G E Skowalevskii__XP_002741405.1 619 W Q Q Y Y N L C G P V T G K S T - V A N T Q I V C D H L T G Q W V T W Q - G N V T G V K I S A V D N K A E A A L - N V F P A T L S N W L S C V Y G D Dmelanogaster__NP_651079.2 621 W D R F H A L C A Q P V H E Q P N K I K A Q L R C S L L N G M P V I W E - G S V T K V E I S R V S N F L E D T I A N Y L P V W L G R M L R C L H G E Dpulex__EFX90302.1 643 W E R Y H E F C S 2 P R S Y S S S S I A S V Q M A C F A L S G L T V K W E - G V V R Q V S V E S R N N V V E S L L - Q W L P P K W M E W L K C R I G E Lpolyphemus__XP_013784410.1 601 W D D Y I H V C H Q P A W D I T N M A E V E I G C S Q F E G V G I T G E - G I V N F V N V A R K V N R V K T L V - S L L P R C V A N Y I E C F I G N Tadhaerens__XP_002114744.1 2646 W S Q Y V A H C G P P A W H S S N Q A R V Q V Q C R Q L A D Q I V T W S - G E I D T I E V V D I D N K F Q D M F - S S L P W L I H Q W L T C V F G K Aqueenslandica__XP_011409218.2 605 F E E Y N H L C G V D T S I N D N L I Q N Q L D C L H L K G S V L S V E I A Q I K E V R I S E I T N S P L T T L - S N F P D S V R N T L T C L M G E consensus/90% W p p Y . . h C u . . s . p p s s b h p s Q l . C . . L p G . . l p h p . G . h p . V p l s p . s N . h b s . l . s . h P . . h . p h h p C h h G c Core OB-fold Secondary Structure * * * S1 S2 S3 H1 Hsapiens__NP_005996.2 A Y P A C S P G 3 T A E E E L C R L K L L A K H P C H I K K F D R Y K F E I T V G M P F 5 G S R S R E E 4 K D I V L R A S S E F K S V L L S L R Mmusculus__NP_035846.1 A Y P S C S S G 3 T A E E E L C R L K Q L A K H P C H I K K F D R Y K F E I T V G M P F 3 G N R G H E E 4 K D I V L R A S S E F K D V L L N L R Ggallus__XP_420803.2 T Y P L C D P K 3 M E E E E L C R L K Y L T K H N C H M K M F D R Y K F E I T V G M P F 4 G T K L V E E 4 K D I V L K A S N E F K K V L L N L R Drerio__XP_695252.5 V Y P K C E P Q 2 5 Q E E E E L C R I K A Y A T H Q C H V K R F D S Y R F E V T V G M P V 1 G V T K V D N 2 G D I L L M A S H E F R Q V L L N L N Skowalevskii__XP_002741405.1 P Y P E C N T T 5 N G T N K L C E I K Q I S G R D C H M S K Y D T L T F Q I S A K M K - S T D G K S - S V I N I V A S H S F R H T A L S L Q Dmelanogaster__NP_651079.2 3 Q H F K C D P K 3 Q C E E W R S V F K T F N A Q 2 s C T L Q R W N R Y E Y E L L V K V G T 3 G R L L G R S 2 T D V I L R A H H D F G N F T R L L S Dpulex__EFX90302.1 -K W P P C H N H 6 S L D S R R C N L F W K L N H 4 L C H V D R F S K F T L R L E V E S K I- 5 S W S G D -N N- 4 L T V E L L A G P R F Q K L A F R L S Lpolyphemus__XP_013784410.1 H F V C D E A 3 P M E F E R C H L F A S H N I 1 w C D L E N W T S Y Q F Q V T L K M S S V K M F T E L I L E V D N H C K D F I V N L R Tadhaerens__XP_002114744.1 K Y T A C D N P - - D D P I C R L R R Q D - - G C H L H N N D H Y K F Q I R T K M P I P Q R G N E - L D V L L E A E H D F A T Q I Y Q L R Aqueenslandica__XP_011409218.2 T K P F C G R Q - A D S K T C I F K - - - - - G C H F D S K N Q Y N V H I S T M I P S 3 S L R M T A V L S V W M G H Q Y L L D S N L L K L K consensus/90% . a . . C p . . . . - p b b C . h b ...... C c h p p b s p a p h p l p s . h . . u . p . . . . . s l . l . h p . p h . p . h h p L p

Secondary Structure S4 S5 * * Hsapiens__NP_005996.2 Q G S L I E F S T I L E G - R L G S K W P V F E L K A I S C L N C M A Q L S P T R R H V K I E H D W R S T V H G A V K F A F D F F F F P F L S A A 890 Mmusculus__NP_035846.1 Q G S L I E F S T I L E G - R L G S K W P V F E L K A I S C L N C M T Q L S P A R R H V K I E Q D W R S T V H G A L K F A F D F F F F P F L S A A 890 Ggallus__XP_420803.2 Q G S I I E F S T I L E G R L G S K W P V F E L K A I T C L N C M S K L L P A G R H V K I E H D W R S T V H K A I K F V F D F F F F P F L S A A 874 Drerio__XP_695252.5 P G S M V E F S T K L E G K L G S K L P A F E L K A I H C L N C S S N L V P E G R Q V K I E R N W R K S M L K A I K F A F D F F F A P F L C A R 1058 Skowalevskii__XP_002741405.1 I G N R I E F T G T L T T - N L G A P G L T L T L Y S L Q C L T C T S I Q A D L I T Q Q Q T I S E - - - Q L M E S L F F A L N F F I P P F F S T S 830 Dmelanogaster__NP_651079.2 E G D V V L F Y G I L H N S- R L L A D N V Q V K L K T I E C V E C R S R D L G T A S I E R V V A A 5 R L Q D L M R G I K Y L L N A L L N P L I T F K 853 Dpulex__EFX90302.1 P G Q E I R F H G T L S S P S V G G L K P E L V L N S L E C L T C Q P D S T F N S H A E S T S H Q 1 1 I D R A L Q G A V D F V L P H F L R L N V T Q L 888 Lpolyphemus__XP_013784410.1 R G H V L S I S G I L H T - N V G K R K P H L W L Y S A N C T N C Q K E L S C S K I T N G I Y A - - - - Q W N K S F E L L M A F F T F P I V S Y V 809 Tadhaerens__XP_002114744.1 Q G M L A D F K G I L K S - N L G G D T P V L E L I S I N S E G L K K Q S I Y Q ------D N T T L M V F H F L L Q F I F Y P M L T M S 2843 Aqueenslandica__XP_011409218.2 S N Q F I S F N A T L E D - G L G T Q T L N L K L L S Y Q L K D G S D E V F D S S E K G S L D K E S K G E I L D D F W S S V I S T A K F V T E I L 815 consensus/90% . G p . l p F p s . L p s . p l G s . . . . h . L b u h p h . s h . . p ...... p h . . s h . h h h . . h h . . h h s . . bioRxiv preprint doi: https://doi.org/10.1101/716472; this version posted January 15, 2020. The copyright holder for this preprint (which was not - Clearly-discernablecertified by peer archaeal review) origin is the author/funder. This article is a US Government work. It is Identifiednot subject to copyright gene under loss/expansion 17 USC 105 and is also made available for use under a CC0 license. - Clearly-discernable bacterial origin events within lineages Emergence in eukaryotes, derived - from a bacterial origin loss expansion * Eukaryotic provenance or innovation - of novel family in eukaryotes

Calstabin ERp57 CaMKs ERdj5 Calcineurin B TMX1 PPP2R3 * TRIC PPP2R3 calnexin * Homer Diplomonadida Calreticulin Calnexin Calcineurin A

Parabasalia archaea

bacteria Heterolobosea LECA PPP2R3 Calmodulin family * SERCA Kinetoplastids EHD/Sarcalumenin

IP3R

NCS1 Calcipressin Haptophyta

ORAI L-type VGCC STIM TRPC ERp44 calstabin * WFS2 RyR ORAI Sorcin SAR Rhizaria *

CREC family Stramenopiles SelN Archaeplastida Wolframin Neurabin Apicomplexa Tespa1 Calsequestrin S100 Ciliates

Bcl2 CaMK * calstabin * calnexin * Rhodophyta

Plants

Amoebozoa

Apusomonada Animals CaMK * Fungi calcineurin A *

Filasterea

Choanoflagellates

calcineurin A sorcin HOMER calsequestrin IP3R SelN ERdj5 sarcalumenin TRIC IP3R ORAI CaMK * CaMK * calmodulin * calmodulin * calmodulin * calcineurin A * SERCA *