bioRxiv preprint doi: https://doi.org/10.1101/2021.04.16.440192; this version posted April 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

Breakthrough Report

Florigen revisited: proteins of the FT/CETS/PEBP/PKIP/YbhB family may be the enzymes of small molecule metabolism

Short title: Florigen-family proteins may be enzymes

Olga Tsoy1,4 and Arcady Mushegian2,3*

1 Chair of Experimental Bioinformatics, TUM School of Life Sciences Weihenstephan, Technical University of Munich (TUM), 3, Maximus-von-Imhof-Forum, Freising, 85354, Germany

2 Molecular and Cellular Biology Division, National Science Foundation, 2415 Eisenhower Avenue, Alexandria, Virginia 22314, USA

3 Clare Hall College, University of Cambridge, Cambridge CB3 9AL, United Kingdom

4 current address: Chair of Computational Systems Biology, University of Hamburg, Notkestrasse, 9, 22607, Hamburg, Germany

* corresponding author: [email protected]

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantcell.org) is Arcady Mushegian ([email protected]).

Abstract

Flowering signals are sensed in plant leaves and transmitted to the shoot apical meristems, where the formation of flowers is initiated. Searches for a diffusible hormone-like signaling entity (“florigen”) went on for many decades, until in the 1990s a product of plant gene FT was identified as the key component of florigen, based on genetic evidence and protein localization studies. Sequence homologs of FT protein are found throughout prokaryotes and eukaryotes; some eukaryotic family members appear to bind phospholipids or interact with the components of the signal transduction cascades. We studied molecular features of the FT homologs in prokaryotes and analyzed their genome context, to find tentative evidence connecting the bacterial family members with small molecule metabolism, often involving sugar- or ribonucleoside-containing substrates. Most FT homologs share a constellation of five charged residues, three of which, i.e., two histidines and an aspartic acid, line the rim of a well-defined cavity on the protein surface. We argue that this conserved feature is more likely to be an enzymatic active center than a catalytically inactive ligand-binding site. We propose that most FT-related proteins are enzymes operating on small diffusible molecules, which may constitute an overlooked essential ingredient of the florigen signal.

Introduction

Flower production in plants occurs in response to environmental cues, most importantly changes in the day length. The role of photoperiodicity in all living forms from bacteria to higher eukaryotes is well established, and it has been shown in numerous studies that the perception of the photoperiodic signal in plants occurs primarily in the leaves. Flowers, however, are formed mostly by shoot apical meristems (SAM), which are typically shielded from direct light, and the mechanisms which convey the flowering signal from leaves to SAMs have been a matter of speculation and investigation for most of the 20th century.

Plant physiologists had established, by the 1930s, that angiosperms generally fall into three categories, i.e., long-day plants, in which blooming is turned on by day lengthening / night shortening; short-day plants, which initiate blooming upon day shortening / night lengthening; and day-neutral plants, which bloom in response to cues other than day length (Garner and Allard, 1920; Knott, 1934). Experiments on flowering in partially shaded plants and in grafts between light-induced and uninduced plants suggested the existence of a flower-promoting diffusible substance; the idea may have been first proposed by Julius Sachs (Sachs, 1865), but was consolidated in the modern form by M. Chailakhyan (1902-1991), who proposed the term “florigen” as the name for the flower-inducing chemical entity (Chailakhyan, 1936). The early research on the photoperiodic signal sensing in leaves is reviewed in (Zeevaart, 2006; Kobayashi and Weigel, 2007), and an account of Chailakhyan's remarkable life and research work can be found in (Romanov, 2012).
The transfer of a flowering signal from leaves to the shoot apex has been studied over the years in many species of flowering plants, and general similarity of its properties to those of small molecules was noted; for example, the florigen fraction appeared to move in the phloem at a speed comparable to that of other plant assimilates (King et al., 1968). It has also been established that grafts and extracts of induced plants could transfer flowering ability not only to uninduced, vegetatively developing plants of the same species, but sometimes also to other species or genera, suggesting the evolutionary conservation of the signaling pathways (Zeevaart, 1976).

Despite general acceptance of the idea of florigen as a conserved hormone-like substance, the attempts at the isolation and chemical characterization of the responsible entity did not succeed for several decades. Reviewing the state of affairs in 1976, J.A.D. Zeevaart listed the
factors that had been tested unsuccessfully for the ability to induce plant flowering; the list of failed candidates included sugars, amino acids, sterols, , , ethylene, , the photosynthetic capacity, initiation of protein synthesis, and others (Zeevaart, 1976). Several hypotheses have been put forward to explain the difficulties of identifying florigen: perhaps it was not one molecule but several distinct hormones or metabolites, acting jointly in a specific succession, or as a mix with a specific ratio of components; or, possibly, the identity of florigen was masked by the simultaneous presence of flowering inhibitors in the same samples (Lang et al., 1977); or, maybe, flowering was caused by propagation of electric, rather than chemical, signals between leaves and meristems, such as plasma membrane potential (Penel et al., 1985).

Despite all the work to test these and other hypotheses, there was little progress in the biochemical characterization of florigen until the early 1990s, when the search for a flower-inducing activity started employing the tools of molecular biology. For example, in what may have been the last published study by Chailakhyan, a radiolabeled protein band of ~27 kDa was observed in the induced, but not in naïve, leaves of Rudbeckia, followed by accumulation of a similar-sized band in the shoot apex tissues (Milyaeva et al., 1991); the work is largely unavailable to the Western reader because of the temporary interruption of translation and indexing of Russian-language journals upon the collapse of the Soviet Union. Around the same time, it was shown (Takeba et al., 1990) that aqueous extracts of Lemna, Pharbitis and Brassica contained a flower-inducing fraction dominated by a 20-30 kDa protein band; the authors noted, however, that the florigen activity was resistant to proteinase K digestion that removed the protein, perhaps suggesting a role for an associated small molecule.
Nearly simultaneously, the results of genetic screens for A. thaliana mutants with late flowering phenotypes were published (Koornneef et al., 1991). This was a watershed moment in the studies of the molecular determinants of flowering initiation, and the important discoveries that ensued in the following three decades have been reviewed extensively (Andrés and Coupland, 2012; Lifschitz et al., 2014; Kinoshita and Richter, 2020). The state of the knowledge may be summarized as follows.

Flowering in Arabidopsis is long-day (LD) dependent, and is enabled through the circadian clock-controlled transcriptional co-regulator CONSTANS (CO) and its target FLOWERING LOCUS T (FT). The protein product of the latter gene has emerged as the integrator of the environmental inputs,
relaying these signals into the gene regulatory network that controls flowering. FT protein is produced in the phloem companion cells of the leaves, enters the phloem sieve elements and is transported from the leaves to the base of the shoot apical meristem, where it has been detected experimentally. Considerable evidence exists that the FT gene is not expressed in SAM above its basal portion, and that the FT protein may move from cell to cell in plants. This is a key set of properties expected of florigen. Strong genetic and transgenic evidence suggests that FT is required for activating floral meristem identity genes and flowering time controllers, such as the MADS-box transcription factors SUPPRESSOR OF OVEREXPRESSION OF CONSTANS 1 (SOC1), FRUITFULL (FUL/AGL8) and AGL24, and ultimately the master regulator of flower development LEAFY (LFY). Orthologs of FT, such as SINGLE FLOWER TRUSS (SFT) in tomato, HEADING DATE 3a (HD3a) in rice, and their counterparts in many other species (sometimes called CETS proteins in the plant context, after the names of several better-studied representatives from Antirrhinum, Arabidopsis and Lycopersicon) share many of these properties with FT, with some variation in the precise wiring of the transcriptional networks. The paralogs of FT, found in varying numbers in all examined angiosperm species, also tend to be involved in flowering control, acting either synergistically or antagonistically with FT, though additional or exclusive roles in other aspects of plant development have been shown for a subset of them (Abelenda et al., 2019; Ahn et al., 2006; Conti and Bradley, 2007; Danilevskaya et al., 2011; Jin et al., 2021; Krieger et al., 2010; Pin and Nilsson, 2012; Pnueli et al., 2001; Tsuji et al., 2011).

Despite all effort, FT protein has not been detected in phloem-free tips of SAM in any plant, only in the basal cells of the meristem zones.
Protein localization studies at a single-cell resolution in vivo remain challenging, and it is not clear whether FT actually reaches the cells where the meristem identity genes are expressed. The argument that FT does just that (as opposed, for example, to activating an additional low molecular weight messenger) has been made in the literature, but the evidence is indirect, showing for example that some engineered fusions of FT to green fluorescent protein-based reporters are retained in the phloem at the bottom of the SAM zone, and in those cases they do not complement the ft recessive mutants (Corbesier et al., 2007). A study in rice (Tamaki et al., 2007) has made a more direct claim that Hd3a, the protein ortholog of FT in rice, does reach the tip of the SAM to induce flowering. However, their Fig. 3, cited as the key supporting evidence, does not seem to show the protein reporter activity at the tip, where it is supposed to be. Therefore it remains possible that
FT is but one component of the flower induction signal, and that a small molecule, possibly activated by or acting synergistically with FT, is also involved (Lifschitz and Eshed, 2006; Andrés and Coupland, 2012).

Another gene of Arabidopsis involved in flowering control, FD, has been identified as a recessive suppressor of FT: the ability of overexpressed FT to induce precocious flowering is impaired in plants with lowered production of FD (Abe et al., 2005). The FD gene encodes a bZIP-type transcription factor, which interacts with the FT protein in the yeast two-hybrid system, in bimolecular fluorescence complementation assays in vivo, and apparently in affinity-purified protein complexes (Abe et al., 2005; Wigge, 2005). In rice, the FT ortholog HD3a does not interact with the bZIP factors directly, but the two proteins co-precipitate in a tripartite complex that also includes a scaffolding 14-3-3 protein (Taoka et al., 2011). These interaction data are the major part of the evidence on the basis of which FT has been assigned the molecular function of a transcriptional co-activator, proposed to form a putative multisubunit complex that binds to the regulatory regions and controls the expression of the flowering identity genes. A corroboration of these protein interaction experiments is provided by the analysis of gene expression and ChIP-Seq data, which reveal that tagged FT binds, apparently mostly in an FD-dependent manner, near many of the genes involved in flowering control (Zhu et al., 2021).

Analysis of the amino acid sequence has shown that FT protein belongs to a widely conserved sequence family, members of which are encoded by genomes of many bacteria, archaea and nearly all eukaryotes.
The founding member of the family, isolated from bovine brain, is a cytoplasmic protein that can bind in vitro to many low molecular weight compounds of different chemical structure, including certain phospholipids (Bernier and Jollès, 1984). The name phosphatidylethanolamine-binding protein (PEBP) became attached to the family, though functional relevance of phosphatidylethanolamine binding has not been demonstrated for any homolog of this protein in any species. The genes or protein products of the FT/PEBP family turn up in many genetic screens and binding assays. This has resulted in a variety of putative properties assigned to these proteins, including, in addition to phospholipid binding in plants and animals, inhibition of carboxypeptidase Y and regulation of Ras GTPase in yeast, inhibition of Raf-1 kinase in mammals (hence an alternative name of the protein family, RKIP), suppression of transepithelial migration of the mammalian host neutrophils by a secreted, plasmid-encoded
PEBP homolog from uropathogenic E. coli (Lau et al., 2014), and an uncharacterized role in the modification of polyketide chains in Streptomyces (Li et al., 2008, 2009).

Structural studies of crystallized FT/PEBP-like proteins from diverse species of bacteria and eukaryotes have revealed a distinct spatial fold, with a structural core dominated by two beta-sheets and superficially resembling an immunoglobulin-like arrangement (entry 11.1.7 in the ECOD database (Cheng et al., 2015)). A prominent structural feature of this family is an evolutionarily conserved cavity on the surface of the molecule. The cavity sometimes accommodates anions included in the crystallization media, including the phosphoryl moiety of the co-crystallized phosphoethanolamine, but is neither hydrophobic nor large enough to bind lipids; on the other hand, several sites suitable for binding hydrophobic ligands have been inferred on the molecule by in silico studies, but those sites tend to be located in the least conserved regions of the molecule (Roderick et al., 2002; Ahn et al., 2006; Shemon et al., 2010; Rudnitskaya et al., 2012; Guo et al., 2018; Nakamura et al., 2019). Curiously, the site of interaction between FT and FD in A. thaliana has not been structurally characterized.

In this work, we argue that the patterns of sequence and structure conservation in the family, as well as the genomic context of the FT/PEBP proteins in prokaryotes, are most compatible with the idea that these proteins are enzymes involved in production, attachment or removal of low molecular weight ligands, and that some such small molecules could play a role in induction of flowering in angiosperms. Thus, rather than consisting only of the FT protein itself, florigen's active moiety may comprise a low molecular weight product of an enzymatic activity of FT.
Results

Sequence conservation in the FT homologs throughout the Tree of Life suggests shared ancestry and common molecular function

We collected the homologs of plant FT proteins by PSI-BLAST searches of the NCBI NR database, focusing mostly on completely sequenced genomes. We also consulted the NCBI COG resource that annotates conserved orthologous genes in bacteria and archaea. In the following, unless specified otherwise, we refer to the prokaryotic orthologs from this family as YbhB proteins, after the name of the chromosomal gene in E. coli, and use some of the FT/CETS/PEBP/PKIP name aliases for eukaryotic homologs. Complete genomes of the
unicellular and multicellular eukaryotes tend to encode at least one, or commonly more than one, FT homolog; rare exceptions are some parasitic eukaryotes with reduced genomes, such as microsporidia, which do not appear to have genes from this family. In bacteria and archaea, the distribution of YbhB orthologs is more complex. YbhB genes (NCBI COG1881) are found in almost all major lineages of bacteria and archaea. Among the clades of archaea, only methanogenic Euryarchaeota appear to lack the YbhB homologs, and among bacteria, only the species in the phyla Firmicutes, Mollicutes and in the order Spirochaetales tend to be YbhB-free. In most other clades of bacteria and archaea, between 30 and 80% of all species encode YbhB homologs. All told, there are 782 copies of YbhB family proteins found in 594 bacterial and archaeal genomes out of the 1309 genomes in the 2020 release of the COG database. YbhB homologs are also encoded by the genomes of many giant DNA viruses from the Megaviricetes class and from some related lineages.

We produced a multiple sequence alignment of the representative proteins from many of these clades and inferred a phylogenetic tree of these sequences. The alignment is shown in Figure 1, and the tree in the Newick format is available as the Supplemental Data Set 1. Many of the branches in the phylogenetic tree are long, and the position of a few clades is not well resolved.
The partitioning of prokaryotic sequences in the tree, however, shows close correspondence to the phylogenetic positions of the species in which they are found; the statistically supported clades generally do not mix bacterial and archaeal representatives, and it is only occasionally that a sequence from a phylogenetically distant bacterium is found within a group of more closely related homologs, as in the case of one sequence from the Gram-negative proteobacterium Helicobacter pylori nested inside a group of YbhB-like sequences from Gram-positive bacteria. All this indicates that the evolution of the YbhB family in prokaryotes must have included episodes of rapid evolution and differential gene loss, while horizontal gene transfer events may have been relatively rare in this family.

The eukaryotic sequences are separated from bacterial and archaeal sequences by a long internal branch in the unrooted tree. Within eukaryotes, nearly all plant FT/CETS homologs segregate as one clade, and fungal/metazoan homologs represent another assembly, occasionally intermingled with the representatives of unicellular eukaryotes; evolution and specialization of FT homologs in different plant lineages has been extensively discussed, e.g., (Klintenäs et al., 2012; Karlgren et al., 2011; Jin et al., 2021). Within fungal and metazoan proteins, a distinct
clade includes FT/PEBP homologs detected recently in an unexpected biological context, namely as constituents of the mitochondrial ribosomal large subunit, where they are known as mitochondrial ribosomal proteins (Mrp) L35/L38. These FT/PEBP/YbhB homologs have distinct N-terminal sequence extensions, found in one gene product within each completely sequenced animal, fungal and protist genome (many animal and fungal genomes also include additional, extension-free homologs). This domain is not included in the alignment in Figure 1; none of the plant FT homologs appear to contain such a region.

Taken together, these data suggest the ubiquity of the FT/CETS/PEBP/PKIP/YbhB-like proteins throughout the evolution of life and a single origin of all eukaryotic homologs from a prokaryotic ancestor. It is also likely that all or most of those proteins share an ancient conserved molecular function, reflected in the nearly universal sequence conservation of several key amino acid residues and the common structural scaffold of the entire protein family (Figure 1, and see the next section).

Structural analysis of PEBP/PKIP/FT/YbhB proteins supports the hypothesis of an enzymatic function

A common biochemical function of the entire PEBP/YbhB protein family is further suggested by mapping the conserved and variable sequence elements of the family onto the available three-dimensional structures (Figures 1 and 2). There are five nearly invariant polar amino acid residues in the alignment: two aspartic acids located at the C-terminus of strand 3, two histidines found at the N-termini of strands 4 and 6, and an arginine that adjoins the second conserved histidine.
Substitutions in these positions in prokaryotes are rare, and multiple replacements are seen mostly in the eukaryotic mitochondrial ribosomal proteins L35/L38, which is the clade that is most likely to have acquired a new molecular function during evolution. There are also several highly conserved residues, in particular prolines, in the loops around these signature amino acids (Figure 1). The five most conserved residues are also found close to each other in the spatial structures of the proteins (Figure 2).

Both histidines, the first of the two aspartates, and the proline-rich loop before strand 6 form the rim of a hydrophilic hole, which is the largest cavity conserved in all structurally characterized proteins in the family. The second of the conserved aspartates forms three
hydrogen bonds with the side chain of the conserved arginine, constituting a conserved “shoulder” on the surface of the molecule (Figure 2).

The polar cavity on the surface of all PEBP/PKIP/FT/YbhB protein molecules has been interpreted as a site that binds a ligand containing a polar group, such as phosphate; many of the described or suspected interactions of eukaryotic PEBP/PKIP/FT with phospholipids and phosphoproteins have been rationalized in this way. Recently, a report that FT from Arabidopsis may bind phosphatidylcholine (PC) more strongly and more specifically than other lipids has prompted the analysis of high-resolution structures of FT crystallized in the presence of PC. Unexpectedly, in the crystals obtained under various conditions no electron density for PC could be detected, and computational docking suggested four potential binding sites for PC, all located far from the conserved polar cavity (Nakamura et al., 2014a, 2014b). It is also notable that electrostatic calculations suggest that the overall charge of the cavity, at least in vacuum, is strongly negative, making it perhaps less conducive to direct interactions with phosphate, whereas the charge on the “shoulder” may be more positive (Figure 2B). On the whole, the attempts to relate the known biological properties of the FT homologs to their sequence and structure still do not provide a satisfactory explanation of the role of the most prominent, conserved elements of the family.

We suggest that these signature elements of sequence and structure are indicative of an enzymatic function of the FT/YbhB homologs. A search of the Mechanism and Catalytic Site Atlas (M-CSA) resource, which contains annotated information about the amino acid determinants of catalysis in conserved enzyme families, identifies 21 families of enzymes that have two His, two Asp and one Arg in their active centers.
These families represent all 7 of the top-level Enzyme Commission (EC) classes of activities, i.e., oxidoreductases, transferases, hydrolases, lyases, isomerases, ligases and translocases. If the selection criteria are relaxed to only two histidines and one aspartic acid surrounding the hydrophilic hole, the search retrieves 105 families, corresponding to almost 11% of the 964 entries in the database. The particular three-dimensional configuration of these residues in PEBP/PKIP/FT/YbhB proteins appears to be unique, as judged by the analysis with the PINTS program, which compares similar spatial arrangements of key residues in non-homologous proteins (Stark and Russell, 2003). Nonetheless, it is clear that a combination of two histidines, two aspartates and an arginine has been
repeatedly utilized in evolution to build enzymatic active centers, enabling catalytic conversions of different kinds.

These observations are in sharp contrast to what is known about sequence conservation in non-catalytic ligand-binding protein domains. A collection of such domains was extracted from the Pfam database, using keywords “ligand+bind” and “phosphate-binding”, resulting in 1044 conserved domains. Removal of the clearly annotated enzymatic domains that bind phosphate in their active centers, such as the protein kinases or ATPases, and selection of a non-redundant set of sequence families, produces 651 putative non-catalytic ligand-binding domains. Examination of the sequence profiles of these families suggests that the majority of the conserved positions in these domains tend to be occupied by non-polar residues, although conserved charged residues, such as lysine, histidine, arginine, glutamate or aspartate, are also found occasionally. We recorded the identity of all charged residues that were characterized by an information content of more than 50% of the maximum possible value for their position, and found only 25 domains that had more than two conserved charged residues; none of those domains simultaneously had two conserved histidines and a conserved aspartate (Supplemental File 2). Thus, the known conserved ligand-binding domains do not appear to use the combination of two histidines and an aspartic acid residue for their non-catalytic interactions with small molecules, suggesting again that the conserved feature in PEBP/FT/YbhB proteins is more likely to be a part of the catalytic center than to serve solely as a binding interface.
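The conservation filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that "information content" means the Shannon information of an alignment column relative to a uniform 20-letter alphabet (log2(20) bits minus the column entropy), it applies no small-sample correction, and the function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter

CHARGED = set("DEKRH")          # Asp, Glu, Lys, Arg, His
MAX_BITS = math.log2(20)        # assumed maximum for a 20-letter alphabet


def column_information(column):
    """Information content (bits) of one alignment column, ignoring gaps."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n)
                   for k in Counter(residues).values())
    return MAX_BITS - entropy


def conserved_charged_positions(alignment, threshold=0.5):
    """Return (column index, residue) for columns whose most common residue
    is charged and whose information exceeds threshold * MAX_BITS."""
    hits = []
    for i, column in enumerate(zip(*alignment)):
        residues = [c for c in column if c != "-"]
        if not residues:
            continue
        top = Counter(residues).most_common(1)[0][0]
        if top in CHARGED and column_information(column) > threshold * MAX_BITS:
            hits.append((i, top))
    return hits


def has_two_his_one_asp(hits):
    """True if the conserved charged positions include >= 2 His and >= 1 Asp."""
    residues = [r for _, r in hits]
    return residues.count("H") >= 2 and residues.count("D") >= 1


# Toy alignment (hypothetical sequences, not taken from the paper).
toy = ["ADHGLHK", "SDHALHR", "TDHGVHE", "ADHSIHQ",
       "GDHNMHS", "VDHTFHT", "LDHPAHN", "IDHQLHA"]
hits = conserved_charged_positions(toy)
print(hits, has_two_his_one_asp(hits))
```

In this toy case the invariant Asp and His columns pass the 50% cutoff, while the last column, where a charged residue is merely the first of several equally frequent ones, does not. With very shallow alignments almost every column passes such a cutoff, so a real analysis would need many sequences per family or a sample-size correction.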
Genomic context of FT homologs in bacteria and yeast suggests connections with small molecule metabolism

We took advantage of the broad taxonomic distribution of the YbhB homologs and tried to infer the putative functional linkages for these gene products on the basis of their genome context (Table 1). One kind of linkage between proteins may be revealed when two protein domains exist as separate open reading frames in some species, but are fused into multidomain proteins in others; such translational fusions, especially those that are evolutionarily conserved, are enriched in proteins that work in concert with one another (Huynen et al., 2000; Suhre and Claverie, 2004). The YbhB homologs in bacteria and archaea are most commonly encoded as stand-alone open reading frames and form domain fusions infrequently. In actinobacteria, however, they are found in the C-terminal portions of longer proteins, which also contain the
modules implicated in carbohydrate metabolism. For example, protein WP_014179790.1 in Streptomyces sp. consists of the N-terminal pectate lyase-like carbohydrate-binding module (domain ID in NCBI CDD database cd04082), followed by the FN3 repeat region (cd00063), a putative glucose/sorbosone dehydrogenase (GSDH) region with predicted beta-propeller structure (cl268170), and finally the C-terminal YbhB homology domain. This theme is partially preserved, albeit with domain rearrangement, in some species of evolutionarily distant proteobacteria, where the order of domains is GSDH-YbhB, or occasionally GSDH-FN3-YbhB; examples include WP_014747134.1 in an alphaproteobacterium Tistrella and proteins in gammaproteobacteria, such as WP_096298086.1 in Luteimonas, PYD93448.1 in Pseudomonas syringae pv. pisi, or WP_145513070.1 in Xanthomonas perforans. GSDH enzymes utilize quinone cofactors to convert hexoses and their derivatives into a variety of products in bacteria (Miyazaki et al., 2006).

We then surveyed the databases of conserved genomic neighborhoods and scanned the literature for evidence about the conservation of FT/YbhB chromosomal neighbors in different genomes. As with the protein fusions, the FT/YbhB family genes are not commonly found within predicted operons, nor do they commonly have conserved positional associations with other ORFs. In two species of Streptomyces, however, YbhB homologs are located within the large (20 genes) operons responsible for biosynthesis of polyketide-based small molecules, i.e., tautomycin in S. spiroverticillatus and tautomycetin in S. griseochromogenes. These polyketides are of pharmacological interest, because they are potent inhibitors of mammalian protein phosphatases.
344 The YbhB family genes, called, respectively, ttnL and ttmL, together with the adjacent seven 345 open reading frames, are required to manufacture a rare dialkylmaleic anhydride moiety of 346 tautomycin and tautomycetin and to conjugate it to the polyketide backbone, produced by the 347 remaining genes in the same operon. Dialkylmaleic anhydride is required for the activity of the 348 mature product, and is made de novo from propionate and an unidentified C5 compound, through 349 the succession of the steps that are incompletely understood (Li et al., 2008, 2009) . 350 Another approach of connecting genes into functionally related groups is to analyze their 351 phyletic vectors, i.e., the representations of the presences and absences of their orthologous 352 genes in different genomes (Glazko and Mushegian, 2004; Glazko et al., 2006). We have 353 searched the space of phyletic vectors associated with the 2014 release of NCBI COGs, using the 354 vector of YbhB (COG1881) as the query. After retaining the matching vectors with comparable

12 bioRxiv preprint doi: https://doi.org/10.1101/2021.04.16.440192; this version posted April 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license.

355 cardinality (i.e., the genes present in 300-400 species, to avoid spurious matching to almost- 356 ubiquitous proteins), we found two COGs that were most often co-inherited by the same 357 genomes as COG1881. One of those, COG1042, is annotated as Acyl-CoA synthetase (NDP 358 forming); it is found in 358 species, 220 of which encode both COG1881 and COG1042. The 359 enzyme, best studied in hyperthermophilic archaea and protists, is involved in the substrate-level 360 phosphorylation, by the equation acetyl-CoA + ADP + Pi ⇌ acetate + ATP + CoA (Musfeldt and 361 Schönheit, 2002). The roles of the bacterial homologs is less clear, as some of them appear to be 362 catalytically inactive and possibly play auxiliary roles in the acylation and deacylation of 363 proteins (Thao and Escalante-Semerena, 2012) . The other gene with matching phyletic pattern, 364 COG2835, annotated as “uncharacterized conserved protein YbaR, Trm112 family”, is found in 365 390 genomes, 233 of which also encode COG1881. YbaR/Trm112 family encodes activators of 366 several methyltransferases involved in modification of rRNA, tRNA and peptide release factors 367 (van Tran et al., 2018) 368 One more way to predict functional linkages between gene products is to examine their 369 putative regulatory regions, which in bacteria tend to be located in the intergenic regions, usually 370 to the 5' ends of the open reading frames or operons they regulate. We performed such analysis 371 of YbhB homologs in several bacterial taxa and detected a conserved site upstream of the YbhB 372 open reading frame in the family Enterobacteriaceae. Unlike most known binding sites for the 373 specialized transcription factors, this site lacks palindromic structure and has a consensus 374 sequence TACACTT. Scanning 41 Enterobacteriaceae species from 14 genera with a 375 probabilistic model of the site identifies additional intergenic regions where the variants of these 376 sites occur. 
(Table 1, and Supplemental File 3). Three genes that were found to contain a highly 377 similar conserved upstream site in 16-17 species, representing 8 genera are rlmA (23S rRNA 378 m(1)G745 methyltransferase), yobB (putative carbon-nitrogen hydrolase family protein), and 379 adrB (c-di-GMP phosphodiesterase). Sites matching this consensus in E.coli have been noticed 380 before and predicted to represent a modified form of the canonical -10 element sequence 381 TATAATT (Huerta and Collado-Vides, 2003) ; it is located in an appropriate position relative to 382 the ORFs starts, but the regions in which we found this upstream element do not appear to have 383 the recognizable -35 sequence nearby. The significance of the shared TACACTT element thus 384 remains unclear, though a common regulation mechanism for genes that share this site is a 385 possibility.
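
The scan described above, in which a probabilistic model of the TACACTT site is applied to intergenic regions, can be illustrated with a minimal position weight matrix sketch. This is not the Genome Explorer implementation used in this study: the training sites, the example region, the pseudocount, the uniform background and all function names are hypothetical choices made for illustration. The only element taken from Methods is the threshold rule, i.e., the lowest score observed in the training set.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    The pseudocount and the uniform background are illustrative assumptions.
    """
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all training sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def score(pwm, window):
    """Sum the per-position log-odds weights for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, window))

def scan(pwm, sequence, threshold):
    """Return (start, site, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if set(window) <= set(BASES):  # skip windows with ambiguous bases
            s = score(pwm, window)
            if s >= threshold:
                hits.append((start, window, s))
    return hits

# Hypothetical training set: variants of the TACACTT consensus (not real data).
training_sites = ["TACACTT", "TACACTT", "TACACTA", "TACATTT", "TAAACTT"]
pwm = build_pwm(training_sites)

# Threshold rule from Methods: the lowest score in the training set.
threshold = min(score(pwm, s) for s in training_sites)

# Hypothetical intergenic region containing one consensus site.
region = "GGCGTACACTTCAGGATATAATTCC"
hits = scan(pwm, region, threshold)
for start, site, s in hits:
    print(start, site, round(s, 2))
```

Setting the threshold to the weakest training site guarantees that every site used to build the matrix is recovered by the scan, at the cost of admitting any other window that scores at least as well.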

Finally, we consulted several curated databases of pre-computed functional linkages. PEBP/FT/YbhB homologs are not well represented among the modules reported by those databases, but two observations were of interest. The FunCoup database, which uses a naïve Bayesian approach to integrate the information from 10 different genomic and proteomic measurement spaces (Persson et al., 2021), shows a small network of interacting proteins in E. coli, which includes two chaperones, i.e., the heat shock 70-family protein DnaK and the protease Lon, as well as YbhB and YbcL. Interactions with chaperones, which are frequently observed in the protein interaction data, may reflect cellular proteostasis needs and a broad clientele of many chaperone systems, but are often not particularly informative about the protein function. The connection to the other protein in the group, YbcL, may be more revealing: that protein, recently renamed RlhA, appears to be a component of the 5-hydroxycytidine synthase enzymatic complex involved in modification of 23S rRNA (Kimura et al., 2017). In addition, the analysis of phyletic patterns across many eukaryotes, collected in the PhyloGene database (Sadreyev et al., 2015), revealed one high-scoring match co-inherited with the human PEBP1 gene, i.e., NUDT3, encoding an enzyme from the Nudix class. Nudix proteins are hydrolases noted for their affinity to the pyrophosphate moieties in their substrates, often lipid, nucleoside or oligonucleotide derivatives (McLennan, 2006).

Discussion

“Circumstantial evidence is a very tricky thing,” answered Holmes thoughtfully. “It may seem to point very straight to one thing, but if you shift your own point of view a little, you may find it pointing in an equally uncompromising manner to something entirely different.”

Arthur Conan Doyle. The Boscombe Valley Mystery.

We present computational evidence indicating that the molecular function of the FT proteins in flowering, as well as the biochemical roles of the FT/CETS/PEBP/PKIP/YbhB homologs, could have been misunderstood: the proteins from this family might be enzymes involved in the biochemical transformation of small molecules, either instead of, or in addition to, their postulated role as stoichiometric subunits within plant transcription complexes.

Much of the analysis reported here comes from the examination of prokaryotic genome sequences. Obviously, studies in archaea and bacteria cannot be expected to validate the role of FT in the transcriptional control of flowering in plants, as prokaryotes lack most of the plant signal transduction systems and downstream effectors of flowering. Instead, we asked two different questions, i.e., “What can be deduced about the molecular functions of the FT orthologs in prokaryotes?” and “Is there evidence that these functions may be conserved in eukaryotes, including plants?”

The first of these questions can be tentatively answered by taking into account the genomic context of FT/YbhB homologs, which hints at their possible functional connections to enzymes involved in specific metabolic pathways. Though no single gene could be linked to YbhB by several orthogonal approaches, a trend emerges when these data are considered jointly (Table 1). The evidence appears to point towards functional linkages of YbhB to sugar and ribonucleotide/RNA modifications, implying that FT/YbhB may be involved in the metabolism of a monosaccharide such as ribose or another pentose, or perhaps their nucleoside-like derivative. Relatedly, FT proteins could be involved in phospholipid metabolism that proceeds through lipid conjugates with nucleotides; indeed, recent genetic evidence has suggested the involvement of phosphorylethanolamine cytidylyltransferase (PECT1 gene product) in flowering in Arabidopsis plants (Susila et al., 2021). Finally, FT proteins could have a role in attaching or removing some low molecular weight moieties from larger molecules, such as proteins or nucleic acids; this could explain the association of FT and its homologs with nuclear proteins seen in plants and other eukaryotes.

The answer to the other question posed above, i.e., whether the putative enzymatic function could have been preserved through the evolution of the FT/CETS/PEBP/PKIP/YbhB family, appears to be clearly in the affirmative; the pattern of sequence conservation and spatial juxtaposition of the key charged residues in the family across a long phylogenetic span is striking (Figures 1 and 2), even though this constellation of conserved residues has not been definitively mapped to a specific biological function thus far. Bioinformatic analysis suggests that conserved sets of two histidines and an aspartic acid clustered in space are frequent in enzyme active centers but are virtually never found in non-catalytic ligand-binding domains. Finally, broad occurrence of the protein family in bacteria, archaea, giant viruses and nearly all eukaryotes would not be unusual for an enzyme, especially one involved in metabolism or signaling, but would be a much rarer occurrence for transcriptional regulators, as they are typically not shared between bacteria and eukaryotes (Babu et al., 2004).

There is ample precedent for the utilization of sugars, nucleotides and products of RNA breakdown for hormone biosynthesis; one class of plant hormones, cytokinins, are purine derivatives that can be produced either by isoprenylation of adenosine phosphate or by tRNA degradation (Kakimoto, 2003), whereas another class, gibberellins, are synthesized by transforming the pentose skeleton generated in the 1-deoxy-D-xylulose 5-phosphate pathway (Salazar-Cerezo et al., 2018). Recent studies have significantly expanded the repertoire of linear and cyclic oligonucleotides that serve as essential signaling messengers in bacteria and animals (Burroughs et al., 2015; Burroughs and Aravind, 2020), again suggesting that some nucleoside derivatives with regulatory properties may remain undiscovered in plants.

A critic might point out that the bioinformatic evidence of the enzymatic function of FT and its homologs presented in this manuscript, let alone of the proposed involvement of FT in specific biochemical pathways, is too circumstantial (some would say, speculative) and thus far has not been corroborated by wet-laboratory experiments. However, exactly the same can be said about the experimental support for the role of FT as a transcriptional co-activator, a hypothesis that has not been corroborated by computational evidence and is quite unmoored from the facts of comparative genomics.
In the spirit of considering the entire corpus of the available data, and using reciprocal illumination from different classes of evidence to improve the precision of scientific hypotheses (Iyer et al., 2001; Baker et al., 2014; Carr et al., 2015), it seems timely and urgent to start testing the hypothesis of a metabolic role of the FT homologs in various organisms. Comparison of metabolic profiles in the wild-type organisms and in FT knock-down mutants could be a first step in determining the identity of the substrates and products of the putative enzymatic activity that we predict to be conserved in most members of this protein family.

Methods

Sequence homologs were collected using the PSI-BLAST program (Altschul et al., 1997) with standard settings, run either against the NCBI NR protein sequence database or the databases restricted by taxon (e.g., viruses). Nucleotide and amino acid sequences were aligned using the program MUSCLE v. 3.8.31 (Edgar, 2004). Distribution of the YbhB homologs in bacterial and archaeal genomes was studied using the web interface for the 2020 release of the NCBI COG database (Galperin et al., 2021). The phylogenetic trees were constructed by the PhyML program implemented in the Phylogeny.fr server (Dereeper et al., 2008), with 1000 bootstrap replicates and approximate Likelihood Ratio Test. The three-dimensional structures of proteins were visualized, and approximate, charge-smoothed electrostatic surface representations were generated, using the open-source PyMOL environment (Schrödinger LLC; SciCrunch RRID SCR_000305), installed from source using Homebrew on macOS High Sierra 10.13.6. Gene adjacency on the chromosomes and protein domain fusions were analyzed by visual inspection of the sequence records in GenBank. Phyletic vectors were analyzed using the psi-square program with default settings, with vector similarity measured using Pearson correlation-based distance (Glazko et al., 2005); the NCBI COG database release of 2014 was used as the search space, because the digitized phyletic vector information was not available for the 2020 release at the time of writing. For promoter analysis, the complete bacterial genomes were downloaded from the Ensembl Bacteria database (release 47) (Howe et al., 2020) in March 2020; if several strains were available for a species, one strain was chosen at random. The regulatory regions of homologous genes were aligned with MUSCLE and the most conserved regions were chosen to build a positional weight matrix (PWM) by the SignalX routine of the Genome Explorer program (Mironov et al., 2000). The genome scan was performed by Genome Explorer with the threshold set equal to the lowest score in the training set, i.e., in the set of ybhB homologous regions. The usage of specific amino acid residues by the enzymatic active centers was analyzed with the aid of the Mechanism and Catalytic Site Atlas (M-CSA) online resource, using the Web interface to select the residues of interest (Ribeiro et al., 2018). The usage of specific amino acid residues by ligand-binding domains was studied by collecting all domains in PFAM 33.1 (El-Gebali et al., 2019) with keywords “ligand bind” or “phosphate-binding”. The known enzymatic domains were manually excluded based on their description, and for each remaining domain, the curated seed alignment was downloaded from PFAM and used as an input for the Skylign server (Wheeler et al., 2014) to build HMM profiles and to obtain the information content for the residues of interest.
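
The phyletic vector comparison described above can be sketched as follows. This is a minimal illustration and not the psi-square program: the toy vectors over eight genomes, the COG labels, the cardinality bounds and the function names are all hypothetical, and the distance is assumed here to be 1 minus the Pearson correlation coefficient between two presence/absence vectors.

```python
import math

def pearson_distance(u, v):
    """Return 1 - r, where r is the Pearson correlation of two 0/1 vectors."""
    if len(u) != len(v):
        raise ValueError("phyletic vectors must cover the same genomes")
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        # Correlation is undefined for a constant vector (e.g., a ubiquitous
        # gene); treating it as uncorrelated is an assumption of this sketch.
        return 1.0
    return 1.0 - cov / math.sqrt(var_u * var_v)

def rank_matches(query, vectors, min_card, max_card):
    """Rank candidate vectors by distance to the query.

    Only vectors whose cardinality (number of genomes encoding the gene)
    lies within [min_card, max_card] are retained, which mirrors the filter
    against almost-ubiquitous proteins described in the text.
    """
    kept = {
        name: vec for name, vec in vectors.items()
        if min_card <= sum(vec) <= max_card
    }
    return sorted(
        ((name, pearson_distance(query, vec)) for name, vec in kept.items()),
        key=lambda item: item[1],
    )

# Hypothetical presence (1) / absence (0) calls across eight genomes.
query = [1, 1, 1, 0, 0, 1, 0, 1]
vectors = {
    "COG_A": [1, 1, 1, 0, 0, 1, 0, 0],     # mostly co-inherited with the query
    "COG_B": [0, 0, 0, 1, 1, 0, 1, 0],     # exact complement of the query
    "COG_UBIQ": [1, 1, 1, 1, 1, 1, 1, 1],  # ubiquitous, removed by the filter
}

ranked = rank_matches(query, vectors, min_card=3, max_card=6)
for name, dist in ranked:
    print(name, round(dist, 3))
```

With real data, each vector would have one position per genome in the COG release and the cardinality window would correspond to the 300-400 species range used in the Results.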

Supplemental Data

Supplemental Data Set 1. Newick-formatted phylogenetic tree of FT/YbhB homologs in bacteria, archaea and eukaryotes.

Supplemental Data Set 2. List of ligand-binding domains in PFAM and their usage of conserved amino acid residues.

Supplemental Data Set 3. Occurrences of the motif with consensus sequence TACACTT in Enterobacteriaceae.

Acknowledgements and Authors' Contributions

OT is supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the e:Med research and funding concept (grant 01ZX1908A). AM is a program director at the National Science Foundation, an agency of the U.S. Government; his work was supported by the National Science Foundation’s Independent Research/Development and Long-Term Professional Development programs, but the statements and opinions expressed herein are made in the personal capacity and do not constitute the endorsement by NSF or the government of the United States.

AM designed the research; OT and AM performed research; OT and AM wrote the paper.

References

Abe, M., Kobayashi, Y., Yamamoto, S., Daimon, Y., Yamaguchi, A., Ikeda, Y., Ichinoki, H., Notaguchi, M., Goto, K., and Araki, T. (2005). FD, a bZIP protein mediating signals from the floral pathway integrator FT at the shoot apex. Science 309, 1052–1056.

Abelenda, J.A., Bergonzi, S., Oortwijn, M., Sonnewald, S., Du, M., Visser, R.G.F., Sonnewald, U., and Bachem, C.W.B. (2019). Source-Sink Regulation Is Mediated by Interaction of an FT Homolog with a SWEET Protein in Potato. Current Biology 29, 1178-1186.e6.

Ahn, J.H., Miller, D., Winter, V.J., Banfield, M.J., Lee, J.H., Yoo, S.Y., Henz, S.R., Brady, R.L., and Weigel, D. (2006). A divergent external loop confers antagonistic activity on floral regulators FT and TFL1. EMBO J 25, 605–614.

Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402.

Andrés, F., and Coupland, G. (2012). The genetic basis of flowering responses to seasonal cues. Nat Rev Genet 13, 627–639.

Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M., and Teichmann, S.A. (2004). Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14, 283–291.

Baker, P.A., Fritz, S.C., Dick, C.W., Eckert, A.J., Horton, B.K., Manzoni, S., Ribas, C.C., Garzione, C.N., and Battisti, D.S. (2014). The emerging field of geogenomics: Constraining geological problems with genetic data. Earth-Science Reviews 135, 38–47.

Bernier, I., and Jollès, P. (1984). Purification and characterization of a basic 23 kDa cytosolic protein from bovine brain. Biochim. Biophys. Acta 790, 174–181.

Burroughs, A.M., and Aravind, L. (2020). Identification of Uncharacterized Components of Prokaryotic Immune Systems and Their Diverse Eukaryotic Reformulations. J Bacteriol 202.

Burroughs, A.M., Zhang, D., Schäffer, D.E., Iyer, L.M., and Aravind, L. (2015). Comparative genomic analyses reveal a vast, novel network of nucleotide-centric systems in biological conflicts, immunity and signaling. Nucleic Acids Res 43, 10633–10654.

Carr, S.M., Marshall, H.D., Wareham, T., and Craig, D. (2015). The Big ORF Theory. In Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology, (Elsevier), pp. 265–274.

Chailakhyan, M. (1936). New facts supporting the hormonal theory of plant development. Dokl. Akad. Nauk SSSR 4, 77–81.

Cheng, H., Liao, Y., Schaeffer, R.D., and Grishin, N.V. (2015). Manual classification strategies in the ECOD database: ECOD Manual Classification Strategies. Proteins 83, 1238–1251.

Conti, L., and Bradley, D. (2007). TERMINAL FLOWER1 Is a Mobile Signal Controlling Arabidopsis Architecture. Plant Cell 19, 767–778.

Corbesier, L., Vincent, C., Jang, S., Fornara, F., Fan, Q., Searle, I., Giakountis, A., Farrona, S., Gissot, L., Turnbull, C., et al. (2007). FT Protein Movement Contributes to Long-Distance Signaling in Floral Induction of Arabidopsis. Science 316, 1030–1033.

Danilevskaya, O.N., Meng, X., McGonigle, B., and Muszynski, M.G. (2011). Beyond flowering time: pleiotropic function of the maize flowering hormone florigen. Plant Signal Behav 6, 1267–1270.

Dereeper, A., Guignon, V., Blanc, G., Audic, S., Buffet, S., Chevenet, F., Dufayard, J.-F., Guindon, S., Lefort, V., Lescot, M., et al. (2008). Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 36, W465-469.

Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792–1797.

El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C., Qureshi, M., Richardson, L.J., Salazar, G.A., Smart, A., et al. (2019). The Pfam protein families database in 2019. Nucleic Acids Res 47, D427–D432.

Galperin, M.Y., Wolf, Y.I., Makarova, K.S., Vera Alvarez, R., Landsman, D., and Koonin, E.V. (2021). COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 49, D274–D281.

Garner, W.W., and Allard, H.A. (1920). Effect of the relative length of day and night and other factors of the environment on growth and reproduction in plants. Monthly Weather Review 48, 415–415.

Glazko, G.V., and Mushegian, A.R. (2004). Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns. Genome Biol. 5, R32.

Glazko, G., Gordon, A., and Mushegian, A. (2005). The choice of optimal distance measure in genome-wide datasets. Bioinformatics 21 Suppl 3, iii3-11.

Glazko, G., Coleman, M., and Mushegian, A. (2006). Similarity searches in genome-wide numerical data sets. Biol. Direct 1, 13.

Guo, C., Chang, T., Sun, T., Wu, Z., Dai, Y., Yao, H., and Lin, D. (2018). Anti-leprosy drug Clofazimine binds to human Raf1 kinase inhibitory protein and enhances ERK phosphorylation. Acta Biochim. Biophys. Sin. (Shanghai) 50, 1062–1067.

Howe, K.L., Contreras-Moreira, B., De Silva, N., Maslen, G., Akanni, W., Allen, J., Alvarez-Jarreta, J., Barba, M., Bolser, D.M., Cambell, L., et al. (2020). Ensembl Genomes 2020-enabling non-vertebrate genomic research. Nucleic Acids Res 48, D689–D695.

Huerta, A.M., and Collado-Vides, J. (2003). Sigma70 Promoters in Escherichia coli: Specific Transcription in Dense Regions of Overlapping Promoter-like Signals. Journal of Molecular Biology 333, 261–278.

Huynen, M., Snel, B., Lathe, W., and Bork, P. (2000). Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 10, 1204–1210.

Iyer, L.M., Aravind, L., Bork, P., Hofmann, K., Mushegian, A.R., Zhulin, I.B., and Koonin, E.V. (2001). Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences. Genome Biol. 2, RESEARCH0051.

Jin, S., Nasim, Z., Susila, H., and Ahn, J.H. (2021). Evolution and functional diversification of FLOWERING LOCUS T/TERMINAL FLOWER 1 family genes in plants. Semin Cell Dev Biol 109, 20–30.

Kakimoto, T. (2003). Biosynthesis of cytokinins. J Plant Res 116, 233–239.

Karlgren, A., Gyllenstrand, N., Källman, T., Sundström, J.F., Moore, D., Lascoux, M., and Lagercrantz, U. (2011). Evolution of the PEBP Gene Family in Plants: Functional Diversification in Seed Plant Evolution. Plant Physiol. 156, 1967–1977.

Kimura, S., Sakai, Y., Ishiguro, K., and Suzuki, T. (2017). Biogenesis and iron-dependency of ribosomal RNA hydroxylation. Nucleic Acids Research 45, 12974–12986.

King, R.W., Evans, L.T., and Wardlaw, I.F. (1968). Translocation of the floral stimulus in Pharbitis nil in relation to that of assimilates. Zeitschrift Fur Pflanzenphysiologie 59, S377-388.

Kinoshita, A., and Richter, R. (2020). Genetic and molecular basis of floral induction in Arabidopsis thaliana. Journal of Experimental Botany 71, 2490–2504.

Klintenäs, M., Pin, P.A., Benlloch, R., Ingvarsson, P.K., and Nilsson, O. (2012). Analysis of conifer FLOWERING LOCUS T/TERMINAL FLOWER1-like genes provides evidence for dramatic biochemical evolution in the angiosperm FT lineage. New Phytol. 196, 1260–1273.

Knott, J.E. (1934). Effect of a localized photoperiod on spinach. In Proc Amer Soc Hortic Sci, pp. 152–154.

Kobayashi, Y., and Weigel, D. (2007). Move on up, it’s time for change--mobile signals controlling photoperiod-dependent flowering. Genes Dev. 21, 2371–2384.

Koornneef, M., Hanhart, C.J., and van der Veen, J.H. (1991). A genetic and physiological analysis of late flowering mutants in Arabidopsis thaliana. Molec. Gen. Genet. 229, 57–66.

Krieger, U., Lippman, Z.B., and Zamir, D. (2010). The flowering gene SINGLE FLOWER TRUSS drives heterosis for yield in tomato. Nat. Genet. 42, 459–463.

Lang, A., Chailakhyan, M.K., and Frolova, I.A. (1977). Promotion and inhibition of flower formation in a dayneutral plant in grafts with a short-day plant and a long-day plant. Proc. Natl. Acad. Sci. U.S.A. 74, 2412–2416.

Lau, M.E., Danka, E.S., Tiemann, K.M., and Hunstad, D.A. (2014). Bacterial lysis liberates the neutrophil migration suppressor YbcL from the periplasm of uropathogenic Escherichia coli. Infect. Immun. 82, 4921–4930.

Li, W., Ju, J., Rajski, S.R., Osada, H., and Shen, B. (2008). Characterization of the tautomycin biosynthetic gene cluster from Streptomyces spiroverticillatus unveiling new insights into dialkylmaleic anhydride and polyketide biosynthesis. J. Biol. Chem. 283, 28607–28617.

Li, W., Luo, Y., Ju, J., Rajski, S.R., Osada, H., and Shen, B. (2009). Characterization of the tautomycetin biosynthetic gene cluster from Streptomyces griseochromogenes provides new insight into dialkylmaleic anhydride biosynthesis. J. Nat. Prod. 72, 450–459.

Lifschitz, E., and Eshed, Y. (2006). Universal florigenic signals triggered by FT homologues regulate growth and flowering cycles in perennial day-neutral tomato. J. Exp. Bot. 57, 3405–3414.

Lifschitz, E., Ayre, B.G., and Eshed, Y. (2014). Florigen and anti-florigen – a systemic mechanism for coordinating growth and termination in flowering plants. Front. Plant Sci. 5.

McLennan, A.G. (2006). The Nudix hydrolase superfamily. Cell Mol Life Sci 63, 123–143.

Milyaeva, E.L., Golyanovskaya, S.A., Aksenova, N.P., and Chailakhyan, M.Kh. (1991). Changes of Protein Composition in Leaves and Stem Apices of Bicolored Coneflower with Transition to Flowering. Dokl. Akad. Nauk SSSR (Bot. Sciences) 11–15.

Mironov, A.A., Vinokurova, N.P., and Gelfand, M.S. (2000). Software for analysis of bacterial genomes. Mol Biol 34, 222–231.

Miyazaki, T., Sugisawa, T., and Hoshino, T. (2006). Pyrroloquinoline Quinone-Dependent Dehydrogenases from Ketogulonicigenium vulgare Catalyze the Direct Conversion of l-Sorbosone to l-Ascorbic Acid. AEM 72, 1487–1495.

Musfeldt, M., and Schönheit, P. (2002). Novel Type of ADP-Forming Acetyl Coenzyme A Synthetase in Hyperthermophilic Archaea: Heterologous Expression and Characterization of Isoenzymes from the Sulfate Reducer Archaeoglobus fulgidus and the Methanogen Methanococcus jannaschii. JB 184, 636–644.

Nakamura, Y., Andrés, F., Kanehara, K., Liu, Y., Dörmann, P., and Coupland, G. (2014a). Arabidopsis florigen FT binds to diurnally oscillating phospholipids that accelerate flowering. Nat Commun 5, 3553.

Nakamura, Y., Andrés, F., Kanehara, K., Liu, Y., Coupland, G., and Dörmann, P. (2014b). Diurnal and circadian expression profiles of glycerolipid biosynthetic genes in Arabidopsis. Plant Signal Behav 9, e29715.

Nakamura, Y., Lin, Y.-C., Watanabe, S., Liu, Y.-C., Katsuyama, K., Kanehara, K., and Inaba, K. (2019). High-Resolution Crystal Structure of Arabidopsis FLOWERING LOCUS T Illuminates Its Phospholipid-Binding Site in Flowering. IScience 21, 577–586.

Penel, C., Gaspar, Th., and Greppin, H. (1985). Rapid interorgan communications in higher plants with special reference to flowering. Biol Plant 27, 334–338.

Persson, E., Castresana-Aguirre, M., Buzzao, D., Guala, D., and Sonnhammer, E.L.L. (2021). FunCoup 5: Functional Association Networks in All Domains of Life, Supporting Directed Links and Tissue-Specificity. J Mol Biol 166835.

Pin, P.A., and Nilsson, O. (2012). The multifaceted roles of FLOWERING LOCUS T in plant development: FT, a multifunctional protein. Plant, Cell & Environment 35, 1742–1755.

Pnueli, L., Gutfinger, T., Hareven, D., Ben-Naim, O., Ron, N., Adir, N., and Lifschitz, E. (2001). Tomato SP-Interacting Proteins Define a Conserved Signaling System That Regulates Shoot Architecture and Flowering. Plant Cell 13, 2687–2702.

Ribeiro, A.J.M., Holliday, G.L., Furnham, N., Tyzack, J.D., Ferris, K., and Thornton, J.M. (2018). Mechanism and Catalytic Site Atlas (M-CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res 46, D618–D623.

Roderick, S.L., Chan, W.W., Agate, D.S., Olsen, L.R., Vetting, M.W., Rajashankar, K.R., and Cohen, D.E. (2002). Structure of human phosphatidylcholine transfer protein in complex with its ligand. Nat. Struct. Biol. 9, 507–511.

Romanov, G.A. (2012). Mikhail Khristoforovich Chailakhyan: The fate of the scientist under the sign of florigen. Russ J Plant Physiol 59, 443–450.

Rudnitskaya, A.N., Eddy, N.A., Fenteany, G., and Gascón, J.A. (2012). Recognition and reactivity in the binding between Raf kinase inhibitor protein and its small-molecule inhibitor locostatin. J Phys Chem B 116, 10176–10181.

Sachs, J. (1865). Wirkung des Lichtes auf die Blütenbildung unter Vermittlung der Laubblätter. Bot. Ztg 23, 117.

Sadreyev, I.R., Ji, F., Cohen, E., Ruvkun, G., and Tabach, Y. (2015). PhyloGene server for identification and visualization of co-evolving proteins using normalized phylogenetic profiles. Nucleic Acids Res 43, W154-159.

Salazar-Cerezo, S., Martínez-Montiel, N., García-Sánchez, J., Pérez-Y-Terrón, R., and Martínez-Contreras, R.D. (2018). Gibberellin biosynthesis and metabolism: A convergent route for plants, fungi and bacteria. Microbiol Res 208, 85–98.

Shemon, A.N., Heil, G.L., Granovsky, A.E., Clark, M.M., McElheny, D., Chimon, A., Rosner, M.R., and Koide, S. (2010). Characterization of the Raf kinase inhibitory protein (RKIP) binding pocket: NMR-based screening identifies small-molecule ligands. PLoS ONE 5, e10479.

Stark, A., and Russell, R.B. (2003). Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res 31, 3341–3344.

Suhre, K., and Claverie, J.-M. (2004). FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res 32, D273-276.

Susila, H., Nasim, Z., Gawarecka, K., Jung, J.-Y., Jin, S., Youn, G., and Ahn, J.H. (2021). PHOSPHORYLETHANOLAMINE CYTIDYLYLTRANSFERASE 1 modulates flowering in a florigen-independent manner by regulating SVP. Development 148, dev193870.


Takeba, G., Nakajima, Y., Kozaki, A., Tanaka, O., and Kasai, Z. (1990). A Flower-Inducing Substance of High Molecular Weight from Higher Plants. Plant Physiol. 94, 1677–1681.
Tamaki, S., Matsuo, S., Wong, H.L., Yokoi, S., and Shimamoto, K. (2007). Hd3a protein is a mobile flowering signal in rice. Science 316, 1033–1036.
Taoka, K., Ohki, I., Tsuji, H., Furuita, K., Hayashi, K., Yanase, T., Yamaguchi, M., Nakashima, C., Purwestri, Y.A., Tamaki, S., et al. (2011). 14-3-3 proteins act as intracellular receptors for rice Hd3a florigen. Nature 476, 332–335.
Thao, S., and Escalante-Semerena, J.C. (2012). A positive selection approach identifies residues important for folding of Salmonella enterica Pat, an Nε-lysine acetyltransferase that regulates central metabolism enzymes. Research in Microbiology 163, 427–435.
Tsuji, H., Taoka, K., and Shimamoto, K. (2011). Regulation of flowering in rice: two florigen genes, a complex gene network, and natural variation. Curr. Opin. Plant Biol. 14, 45–52.
van Tran, N., Muller, L., Ross, R.L., Lestini, R., Létoquart, J., Ulryck, N., Limbach, P.A., de Crécy-Lagard, V., Cianférani, S., and Graille, M. (2018). Evolutionary insights into Trm112-methyltransferase holoenzymes involved in translation between archaea and eukaryotes. Nucleic Acids Research 46, 8483–8499.
Wheeler, T.J., Clements, J., and Finn, R.D. (2014). Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15, 7.
Wigge, P.A. (2005). Integration of Spatial and Temporal Information During Floral Induction in Arabidopsis. Science 309, 1056–1059.
Zeevaart, J.A.D. (1976). Physiology of Flower Formation. Annu. Rev. Plant. Physiol. 27, 321–348.
Zeevaart, J.A.D. (2006). Florigen Coming of Age after 70 Years. Plant Cell 18, 1783–1789.
Zhu, Y., Klasfeld, S., and Wagner, D. (2021). Molecular regulation of plant developmental transitions and plant architecture via PEBP family proteins – an update on mechanism of action. Journal of Experimental Botany 72, 2301-2311.


| Type of contextual evidence | Functionally linked proteins / domains / COGs | Taxa in which these functions are putatively linked | COG functional categories / most relevant GO terms // Additional comments |
|---|---|---|---|
| Domain fusion | Glucose-sorbosone dehydrogenase, (Pectate Lyase-like carbohydrate-binding module, FN3-like domain) (COG1881, COG2133) | Actinobacteria, Gammaproteobacteria | Carbohydrate transport and metabolism/Hydrolase activity, acting on glycosyl bonds |
| Operon | Dialkylmaleic anhydride synthesis and conjugation module of tautomycin/tautomycetin biosynthesis operon | Actinobacteria | // five-carbon substrate – a pentose? |
| Phyletic vector (prokaryotes) | Acyl-CoA synthetase, NDP forming (COG1042) | 220 species | Energy production and conversion/ATP-binding, N-acetyltransferase activity // nucleoside-containing substrate |
| Phyletic vector (prokaryotes) | YbaR/Trm112 activator of RNA and protein methyltransferases; COG2835 | 233 species | Translation, ribosomal structure and biogenesis // RNA modification |
| Phyletic vector (eukaryotes) | NUDT3 and other NUDIX hydrolases | Metazoa | // Hydrolases preferring pyrophosphate-containing substrates (nucleoside phosphates or phospholipids) |
| Shared putative regulatory motif | RlmA 23S rRNA m(1)G745 methyltransferase (COG2226) | Enterobacteriaceae | Coenzyme transport and metabolism/rRNA base methylation |
| Shared putative regulatory motif | YobB putative carbon-nitrogen hydrolase family protein (COG0388) | Enterobacteriaceae | Energy production and conversion/Nitrogen compound metabolic process |
| Shared putative regulatory motif | AdrB c-di-GMP phosphodiesterase (COG2200) | Enterobacteriaceae | Signal transduction mechanisms/Cellular response to DNA damage stimulus |
| Integrated evidence | RlhA 23S rRNA 5-hydroxycytidine C2501 synthase (COG0826) | n/a | Signal transduction mechanisms/rRNA processing, rRNA modification |

Table 1. Genome-context information suggesting functional links between YbhB homologs and various enzymes of biosynthesis and salvage of small molecules. See text for a more detailed characterization of each putative functional link.
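
The "phyletic vector" evidence type in Table 1 rests on comparing presence/absence profiles of gene families across genomes. A minimal sketch of such a comparison is given here; the Jaccard similarity, the function names, and the toy profiles are assumptions made purely for illustration and are not taken from this work, whose actual distance measure may differ.

```python
from typing import Dict, List, Tuple


def jaccard(a: List[int], b: List[int]) -> float:
    """Jaccard similarity of two binary presence/absence vectors."""
    if len(a) != len(b):
        raise ValueError("phyletic vectors must cover the same set of genomes")
    both = sum(1 for x, y in zip(a, b) if x and y)      # present in both
    either = sum(1 for x, y in zip(a, b) if x or y)     # present in at least one
    return both / either if either else 0.0


def rank_by_similarity(
    query: str, profiles: Dict[str, List[int]]
) -> List[Tuple[str, float]]:
    """Rank all other families by similarity of their profile to the query."""
    q = profiles[query]
    scores = [
        (name, jaccard(q, vec)) for name, vec in profiles.items() if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy profiles: one column per genome, 1 = family present.
profiles = {
    "YbhB_like": [1, 1, 0, 1, 0, 1],
    "family_A": [1, 1, 0, 1, 0, 0],
    "family_B": [0, 0, 1, 0, 1, 0],
}
ranking = rank_by_similarity("YbhB_like", profiles)
print(ranking)
```

In this toy case the family whose profile mostly coincides with the query ranks first, which is the kind of co-occurrence signal that a phyletic-vector link summarizes.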


Secondary 1fjj sss sss sss sssss ssssssss sssssss sss sss sssssssssss hhhhhhhhhh ssssssssssss
Secondary 1beh ssss sss sss sssssssss sss sss ssssssssssss sss ssssssssss hhhhhhh ssssssss
Secondary 1wkp ssss sssss sssssssss sssssssssss ssss ssssssssss hhhhhhh sssssssss
>NP_001320342.1_FT_1wkp_AT1G654 KVTYGQ------REVTNGLDLRPSQV------QNKPRVEIGG------EDL-----RNFYTLVMVDPDVPSPSNPHLR------EYLHWLVTDIPATTGTT------FGNEIVCYENPSPT--AG-IHRVVFILFRQLGR-----QTVYA------PGWRQNFNTREFAEIYNLG-LPVAAVFYNCQRES
>NP_002558.1_pebp1_1beh_human HVTYAGAAV------DELGKVLTPTQV------KNRPTSISWDG------LDS----GKLYTLVLTDPDAPSRKDPKYR------EWHHFLVVNMKGNDIS------SGTVLSDYVGSGPPKGTG-LHRYVWLVYEQDRP-----LKCDEPI-----LSNRSGDHRGKFKVASFRKKYELR-APVAGTCYQAEWDD
>XP_022982617.1_LOC1114814 RLASSA------IDHEGRLPRKYTSE--GQG----AQKNKSPPLEWYN------VPQG----TKTLALVVQDIDAPNPDRPIV------PWTVWVVVNIPPSLKGLPEDFSGNQQG------LGKDYAAIQEGNNDDKIPGWRVPTLP--SL-GHRFEFKLYALDDE-----MNLGN------KPSKEKLLDEIEG--HVLGEAVLMAIF--
>NP_195750.1_AT5G01300 RLVSPT------IDNDGKLPRKYTMA--GQG----VKKDISPPLEWYN------VPEG----TKTLALVVEDIDAPDPSGPLV------PWTVWVVVDIPPEMKGLPEGYSGNE------DQTTGIREGNNDHKIPGWRGPLLP--SH-GHRFQFKLFALDDK-----PKIGH------TVTKERLLIAIEG--HVLGEAILTCLA--
>NP_193770.1_AT4G20370 KVTYGH------REVTNGLDLRPSQV------LNKPIVEIGG------DDF-----RNFYTLVMVDPDVPSPSNPHQR------EYLHWLVTDIPATTGNA------FGNEVVCYESPRPP--SG-IHRIVLVLFRQLGR-----QTVYA------PGWRQQFNTREFAEIYNLG-LPVAASYFNCQREN
>XP_022987455.1_HEADING_DA RVVYNS------REVNNGCELKPSQA------VNKPRVEIGG------TDL-----RTFFTLVMVDPDAPSPSDPNLR------EYLHWLVTDIPATTEAT------FGQEIVCYENPRPT--VG-IHRFVLVLFRQLGR-----QTVYA------PGWRQNFNTRHFAELYNLG-SPVAAVYFNCQREN
>XP_022973471.1_HEADING_DA RATYNN------REISNGCELKPSQV------VNQPRVEIGG------TDL-----RTFFTLVMVDPDAPSPSDPNLR------EYLHWLVTDIPATTGAN------FGQEIVCYESPRPT--VG-IHRLVLVLFRQLGR-----QTVYA------PGWRQNFNTRDFAELYNLG-LPVAAVYFNCQRES
>NP_180324.1_cen_AT2G27550 TVTYNSD------KQVYNGHELFPSVV------TYKPKVEVHG------GDM-----RSFFTLVMTDPDVPGPSDPYLR------EHLHWIVTDIPGTTDVS------FGKEIIGYEMPRPN--IG-IHRFVYLLFKQTRR-----GSVVSV------PSYRDQFNTREFAHENDLG-LPVAAVFFNCQRET
>XP_022984743.1_CEN-like_1 SVVYNST------KQVANGHELPPCFV------SSKPRVEVGG------DDM-----RSAFTLIMIDPDAPSPSDPFHK------EYLHWMVTDIPGTTDAS------FGKEMMSYESPNPL--IG-IHRYVFVLFKQHGR-----EMVRLS------SSSRQHFSTRKFSEANGLG-LPVAAVYFNAQREK
>NP_201010.1_BFT_AT5G62040 RVTFNSN------TIVSNGHELAPSLL------LSKPRVEIGG------QDL-----RSFFTLIMMDPDAPSPSNPYMR------EYLHWMVTDIPGTTDAS------FGREIVRYETPKPV--AG-IHRYVFALFKQRGR-----QAVKAA------PETRECFNTNAFSSYFGLS-QPVAAVYFNAQRET
>XP_022972799.1_TFL1-like_ CITFSN------KLVLNGYQFFPSSL------SFKPRVHISG------EDL-----RSLFTLIMVDPDVPGPSDPFLR------EHLHWLVTDIPGTTDAT------FGREEVSYEIPKPT--IG-IHRFVFVLFKQKHR-----CSVNS------PSSRDHFNTRRFAAENDLG-LPVAAVYFNAKRGT
>XP_022978007.1_TFL1-like_ CATFSN------KQVLNGHEFFPSSL------SFKPRVLIQG------EDM-----RSLFTLIMVDPDVPGPSDPYMR------EHLHWLVTDIPGTTDAT------FGKELMSYEIPKPT--IG-IHRFVFVLFKQKRR-----RSVNP------PSSRDRFNTRRFSAENDLG-LPVAAVYFNAQRET
>NP_196004.1_TFL1_AT5G0384 NVSYNK------KQVSNGHELFPSSV------SSKPRVEIHG------GDL-----RSFFTLVMIDPDVPGPSDPFLK------EHLHWIVTNIPGTTDAT------FGKEVVSYELPRPS--IG-IHRFVFVLFRQKQR-----RVIFPN------IPSRDHFNTRKFAVEYDLG-LPVAAVFFNAQRET
>XP_022985790.1_CEN-like_2 TVTYHSN------KKVSNGHELLPNFI------TLKPKVEVLG------GDL-----RSFFTLVMTDPDVPGPSDPYLR------EHLHWIVTDIPGTTDSS------FGKEVVKYEVPSPN--VG-IHRFVFLLFKQTRR-----QTVKPPA------ASASRDGFNARRFAEENGLG-LPVAAVYFNAQRAT
>XP_022983119.1_SELF-PRUNI TVSYNN------KQVFNGHEFFPSAV------AAKPKAEILG------GDL-----RSFFTLVMTDPDVPGPSDPYLR------EHLHWIVTDIPGTTDAT------FGREVISYETPKPN--IG-IHRFVFVLFKQKRR-----QTVSA------PSSRDHFSTRVFAAENDLG-NPVAAVYFNAQRET
>NP_173250.1_AT1G18100 SVYFGP------KHITNGCEIKPSTA------VNPPKVNISG------HS-----DELYTLVMTDPDAPSPSEPNMR------EWVHWIVVDIPGGTNPS------RGKEILPYMEPRPP--VG-IHRYILVLFRQNSP-----VGLMV------QQPPSRANFSTRMFAGHFDLG-LPVATVYFNAQKEP
>XP_023000789.1_MFT-like_C SVTYAS------KQVANGCEIKPSAA------ADAPTVVIQG------PISN----NDLYALVMVDPDAPSPSEPTFR------EWLHWIVVDIPEGSDAN------KGKEVVQYMGPQPP--TG-IHRYVLAVFKQNRA-----IGGSL------RPPNTRSNFSTQQFASQNALG-LPVAAVYFNSQKQP
>XP_022994708.1_MFT_Cucur_ SVYYHS------KHVTNGCNIRPSLT------DDPPRILMSS------LNL-----SDLYTLVMVDPDAPSPSEPYMR------EWVHWIVVDIPGTGNVN------LGREILAYTGPCPP--IG-IHRYILLLFKQKGP------IGMI------EQPATRANFNTRHFATHLELG-LPIAATYFNAQKEP
>NP_649642.1_Dmel_CG17917 NVTYHGH------LAAHCGKVLEPMQV------RDEPSVKWPS------APE------NYYALLMVDPDVPNAITPTHR------EFLHWMVLNIPGNLLA------LGDVRVGYMGATPLKGTG-THRFVFLLYKQRDY-----TKFDFPK-----LPKHSVKGRSGFETKRFAKKYRFG-HPVAGNFFTSQWSP
>NP_013280.1_YLR179C_yeast SVSYVDS------DDIKLGNPMPMEAT------QAAPTIKFTP------FDKSQLSAEDKLALLMTDPDAPSRTEHKWS------EVCHYIITDIPVEYGPGGDIAI------SGKGVVRNNYIGPGPPKNSG-YHRYVFFLCKQPKG-----ADSST-FTK-VENIISWGYGTPGAGAYDYIKENNL--QLVGANYYMVENTT
>NP_013279.1_Tfs1p_yeast AVEYSSS------APVAMGNTLPTEKA------RSKPQFQFTFNKQMQKSVPQANAYVPQDDDLFTLVMTDPDAPSKTDHKWS------EFCHLVECDLKLLNEATHETSGATE------FFASEFNTKGSNTLIEYMGPAPPKGSG-PHRYVFLLYKQPKG-----VDSSK-FSK-IKDRPNWGYGTPATGVGKWAKENNL--QLVASNFFYAETK-
>NP_001016825.1_Xenop_tro LVTYGSL------GIDELGQVLTPTQV------QSRPSSIEWEG------MDS----SKLYTLVLTDPDAPSRKNPKFR------EWHHFLVVNMKGNNIN------SGCVLSDYVGSGPPKGTG-LHRYVWLVYEQTEE-----LKCTERV-----LCNRSGEHRGMFKVASFRQKYKLG-TPVAGNCYQAEWDD
>NP_002558.1_pebp1_1beh_human HVTYAGA------AVDELGKVLTPTQV------KNRPTSISWDG------LDS----GKLYTLVLTDPDAPSRKDPKYR------EWHHFLVVNMKGNDIS------SGTVLSDYVGSGPPKGTG-LHRYVWLVYEQDRP-----LKCDEPI-----LSNRSGDHRGKFKVASFRKKYELR-APVAGTCYQAEWDD
>XP_002938131.1_Xenop_tro KVAFGSV------CVEELGQVLTPTQV------QHCPNIEWES------MDS----SKLYTVIFTDPDVPSRKECHLG------EWHHFLAVNVKGNDLS------SGCILTAYVGSGPGKGTG-LHRYTILVYEQAGR-----VQCTERI-----LGNTSAEHRGKFKASEFRKKYKLA-APIAGTCFQAEWDD
>NP_651050.1_Dmel_CG7054 KVIYGDD------LEVKQGNELTPTQV------KDQPIVSWSG------LEGK----SNLLTLLMVDPDAPTRQDPKYR------EILHWSVVNIPGSNENPS------GGHSLADYVGSGPPKDTG-LHRYIFLLYRQENK-----IEETPTI------SNTTRTGRLNFNARDFAAKHGLG-EPIAANYYQAQYDD
>NP_651051.1_PEBP1_Drome TITYPSG------VQVELGKELTPTQV------KDQPTVVFDA------EP----NSLYTILLVDPDAPSREDPKFR------ELLHWLVINIPGNKVS------EGQTIAEYIGAGPREGTG-LHRYVFLVFKQNDK-----ITTEKFV------SKTSRTGRINVKARDYIQKYSFG-GPVAGNFFQAQYDD
>NP_649644.1_Dmel_CG17919 KVTYSNN------LVAKDGVELTPTQV------KDQPVVEWDA------QP------GEFYTLIMTDPDAPSRAEPKFR------EFKHWILANIAGNDLA------SGEPIAEYIGSGPPQGTG-LHRYVFLLYKQSGK-----LEFDEER-----VSKRSRKDRPKFSAAKFAINHELG-NPIAGTFYQAQYDD
>NP_649643.1_Dmel_CG10298 TVTYGGG------QVVDVGGELTPTQV------QSQPKVKWDA------DP------NAFYTLLLTDPDAPSRKEPKFR------EWHHWLVVNIPGNQVE------NGVVLTEYVGAGPPQGTG-LHRYVFLVFKQPQK-----LTCNEPK-----IPKTSGDKRANFSTSKFMSKYKLG-DPIAGNFFQAQWDD
>NP_001023904.1_F40A3.3_Ca SVKFNSG------VEANLGNVLTPTQV------KDTPEVKWDA------EP------GALYTLIKTDPDAPSRKEPTYR------EWHHWLVVNIPGNDIA------KGDTLSEYIGAGPPPKTG-LHRYVYLIYKQSGR-----IEDAEHG----RLTNTSGDKRGGWKAADFVAKHKLG-APVFGNLFQAEYDD
>NP_502042.1_Y69E1A.5_Caee HLCWDG------IQVEPGMTMQVRNL------KNAPRWALPG------ADP-----ESIYTVLMIDPDNLSRKNPSVA------EWLHWLVCNIPASNIIDGI------NGGQHQMAYGSPAPGPRTD-LHRYVILMWEHAGR-----RISVPK------PSSRAKFNVKQFIEKNKLG-DPIAGNFFLAQHEG
>NP_498385.2_C56G2.4_2_Caeel TTLTPPSIS---SMRISSSQSNYINYHR------QTRNFVELSN------EKFSLAIIDAHHG------HLHWLEVDIPAANLN------AANGNGLTKADYVPLIPKKPSTCHSYLFVLLAQPAS-----MQTLESY------CEGMCETRKKFRLELFKQQHGL-RLSALSTVSSCYDL
>NP_498385.2_C56G2.4_1_Caeel EHDFFSGNTS--AVQADPDMDPVEPFRI------RKRPEITWSG------LQ------DEQYTVVITDVGYA------TINFLAVDFPKATKI------LKDYEPTENYRPSP--NPMVVLVFRNGRAD------IES------PKSEEDFNLPQFMLKHELE-DDLVGLSLIVASSD
>NP_609588.1_Dmel_CG6180 VVEYPGD------IVVKPGQVLTPTQV------KDEPCVKWEA------DA------NKLYTLCMTDPDAPSRKDPKFR------EWHHWLVGNIPGGDVA------KGEVLSAYVGSGPPPDTG-LHRYVFLIYEQRCK-----LTFDEKR-----LPNNSGDGRGGFKIAEFAKKYALG-NPIAGNLYQAEYDD
>NP_001350162.1_pebp4_huma EVFYPEL------GNIGCKVVPDCNNYRQKI------TSWMEPIVKFPG------AVD-----GATYILVMVDPDAPSRAEPRQR------FWRHWLVTDIKGADLKKGK------IQGQELSAYQAPSPPAHSG-FHRYQFFVYLQEGK-----VISLLP------KENKTRGSWKMDRFLNRFHLG-EPEASTQFMTQNYQ
>XP_002932751.1_Xenop_tro DVIYPDIGRVSCIYIPNCFDFPRSLIKV------WDYPLLRYSK------AQP-----GLKYVLIMVDPDAPSRWDPKYR------YWRHWVLTDIPGWQLLSGR------DLTGNDISAYRRPSPPPGTG-YHRYQFYLYEQPLW-----VILYFL------PEEIRRSTWDLKAFVQRNKLG-EPVATTQFLAMNPI
>NP_476998.1_antennal5_Dro RIKYDNT------IDIEEGKTYTPTEL------KFQPRLDWNA------DP------ESFYTVLMICPDAPNRENPMYR------SWLHWLVVNVPGLDIM------KGQPISEYFGPLPPKDSG-IQRYLILVYQQSDK-----LDFDEKK-----MELSNADGHSNFDVMKFTQKYEMG-SPVAGNIFQSRWDE
>NP_725293.1_Dmel_CG30060 SVLYPCD------IDIKPGIMVVINET------LKQPIIRFKA------DP------EHYHTLMMVDLDVPPDNNT------EWLIWMVGNIPGCDVA------MGQTLVAYDNRRTIHGSN-IHRIVFLAFKQYLE-----LDFDETF-----VPEGEEKGRGTFNCHNFARKYALG-NPMAANFYLVEWLW
>XP_002111463.1_TRIADDRAFT GVEYPN------YRVQWGNLISPSKT------AARPKISFAP------KYN-----DGLWTLIMINPDGVFIENQQES------EILHWLLINMAPSTAI------KGVEACKYLQPLPIRGTG-YHRLIFLLCKQSRE-----IPVPD------QITDIADRSFSTYNFLIQHQDSLKPVGLRFFQSDWDK
>XP_004341923.1_Acanthamoe QLKYGSK------GVTEGQELKPSEV------QHQPTVDWDA------DE------NALYTLAMVDPDAPSRDDPKDREVYVDTNVDVVVVIVGLTLWLSTGAIGCRIK------EGDLLTPYQGAAPPPGSG-EHRYVLVLFKQPDR-----IVPEK------MSNDTAGRKSFKIEAWAKKNYML-PALAATHFRAQPEE
>XP_001746551.1_Monosiga AVSWAD------TYTYRGNLVAPNDT------LVAPAFHFEP------EA------NVQYTVALLDADGRLDGSEG------QVAHALTCNLTGPT------ASGDEVLPYMPLTPFQGSG-YHRVVAVLLQQSGA-----LDAAALRDSLPAVSSQNPLSNRYMKLQDWARQHNL--QPVAFSFSQAQHDA
>NP_010608.1_mrpYmL35_3J6B_1_yea NIKFPFSTG---VNKWIEPGEFLSSNVT------SMRPIFKIQE------YELVN-VEKQLYTVLIVNPDVPDLSNDSFK------TALCYGLVNINLTYNDNLIDPRK------FHSSNIIADYLPPVPEKNAG-KQRFVVWVFRQPLI-----EDKQG-PNM--LEIDRKELSRDDFDIRQFTKKYNL--TAIGAHIWRSEWDA
>XP_004338458.1_mrpL38_Aca KNIRKKA--KLHKVFPDVLPEPFTPYFML------QLQTTRVSWPS------DPTKR-----WTLIMATPDDPDLVPDDVEIVK--PGWVVPHEEGRGKEVLHWLVTNIPGNDIS------QGQVLCDYLPPMPKKHGG-GHRYVFLLFEQLDG-----ETQFD-AFP--EGALKEFVQRKNWNTFKAQEKYGL--RPKALAFFKATWDA
>NP_001342831.1_mrpL35_Sch QLGFNPEN-----NDSITPGTILPSTVT------VKTPWLSVLP------FNCK----KNHYSVITLDLDVPNYETNRFE------THCNWLLTNIPIEASKRVP------IDTSKAFFQYRPPIVHRGED-KHRILTLVLRQKSS-----SISIPS------NALVRERFDLSEFCSIYDL--EPVGAHLWRSGWDS
>NP_115867.2_mrpL38_human HVAYAVGED---DLMPVYCGNEVTPTEA------AQAPEVTYEA------EE------GSLWTLLLTSLDGHLLEPDA------EYLHWLLTNIPGNRVA------EGQVTCPYLPPFPARGSG-IHRLAFLLFKQDQP-----IDFSEDA----RPSPCYQLAQRTFRTFDFYKKHQETMTPAGLSFFQCRWDD
>NP_001017146.1_mrpL38_Xen RVQYNKGDE---FLMPVYHGNLVTPAEA------SGPPEVTFEA------EE------GSLWTLLLTNPDGHLKETDS------EYVLWLVGNIPGNQVH------SGEQICHYFPPFPAKGTG-YHRHIFLLFKQDRR-----IEFKDEL----RPNPCHSLKLRTFKTLDFYRKYEESLTPAGLAFFQCAWDD
>NP_490808.1_Mrp_Caeel NVQNLQVNF---ENDIVVHSGNVITANST------LKRPEITIES------VGNG----GGFNTLLMINLDGNALDLGK------NGEIVQWMISNIPDGEAI------SAGSEIIDYLQPLPFYGTG-YHRVAFVLFRHEKP-----VDFQIQ------GNSLDTRIHEISKFYKKHEATITPSAIRFFQTSYDN
>NP_511152.2_mrpL38_Drome LNISYQLDG---DSLAPVYNGNVIKPTE------AAKAPQIDFDG------LVDPITGQAAGQDTYWTLVASNPDAHYTNGTA------ECLHWFIANIPNGKVS------EGQVLAEYLPPFPPRGVG-YQRMVFVLYKQQAR-----LDLGSYQ---LAAADYGNLEKRTFSTLDFYRQHQEQLTPAGLAFYQTNWDE
>NP_415294.1_1fjj_YbhB_Ecoli KLISND------LRDGDKLPHRHVFNGMGY-----DGDNISPHLAWDD------VPAG----TKSFVVTCYDPDAPTGS------GWWHWVVVNLPADTRVLPQGFGSGLV------AMPDGVLQTRTDFGKTGYDGAAPPKGE--THRYIFTVHALDIER----IDVDE------GASGAMVGFNVHF--HSLASASITAMFS-
>WP_011487351.1_Paraburkho RLWSDE------FPTNGFMPKAHEYHDKAFG---VDGENISPALKWEA------PPPD----TQSFALTVHDPDAPTGS------GFWHWVVVNIPADIRSLPRNAGKPDG------SLLPHGALQLRNDYGTVGFGGAAPPRGDK-AHRYIFRLHALRVPH----LPINA------DTTNAVARFMTHL--NELDSTTHTGLYEL
>NP_415077.1_1fux_YbcL_Ecoli QVTSNE------IKTGEQLTTSHVFSGFGC-----EGGNTSPSLTWSG------VPEG----TKSFAVTVYDPDAPTGS------GWWHWTVVNIPATVTYLPVDAGRRDG------TKLPTGAVQGRNDFGYAGFGGACPPKGDK-PHHYQFKVWALKTEK----IPVDS------NSSGALVGYMLNA--NKIATAEITPVYEI
>WP_013460673.1_Sulfuricur TLHSND------LSGQLTKTQEFSGFGCD------GKNISPELHWSD------APKG----TKSFALSVYDPDAPTGS------GWWHWIVFNIPASTVSLPSGFGDTAQ------KHLIQATQSITDYGKAGFGGACPPKGDR-PHRYMFTVHALNVAH----IDLDE------KASPALVGYMINA--HSIAKATIVSYYGR
>WP_012024064.1_Flavobacte TLTSKD------LGGETTKVQEFNGFGCS------GENQSPQLSWKN------APEG----TKSFAVTMYDPDAPTGS------GFWHWVVVDIPANVNELVTGAGTKN------LVPKGAVQSVTDFGIKGFGGPCPPVGHG-FHQYIITVYALKTDK----LGTDE------NTNPAVVGFNLWN--QTLAKASIIAYYKR
>ADH63402.1_Meiothermus_si RVSSPA------FTPGGTLKAEQVYNGFGC-----TGGNISPELSWTG------APAG----TKSYAVTLYDPDAPTGS------GWWHWVIYNIPASVTKLEAGAGDPKA------NKAPAGSVQGRTDFGTPGYGGPCPPQGDK-PHRYVFTVFALKVEK----LDLPE------GATAAFIGFNLNA--NALAKATLTAMYGR
>WP_023563027.1_Actinoplan TVTSTD------VADGRPLDELYAHTSVG------GKNLSPQLSWSG------FPAE----TRGFVVTCFDPDAPTGS------GFWHWVAVNLPASVTELECGADP------LPEGAFSVRNDYGDTGYGGAAPPPGDR-PHRYVFAVHALDVDS----LEVGA------DATPAYVGFNLAF--HTLARATIRPTYQV
>WP_023361376.1_Actinoplan TVTSST------ITDGGVLPAEQMSGLFGVP----GGKDISPQLSWTG------APDG----TRGYAVTVYDPDAPTGS------GFWHWAVADLPATVTDLPEGAGDDTG------SGLPEGAFQLRGDAGAARFVGAAPPAGHG-PHRYFVVVHALDVEN----IGIPA------DATPAVLGFTMFG--HVLGRAVLTATAEI
>NP_216656.1_Mycobacterium SLTSTS------ITDGQPLATPQVSGIMGA-----GGADASPQLRWSG------FPSE----TRSFAVTVYDPDAPTLS------GFWHWAVANLPANVTELPEGVGDGR------ELPGGALTLVNDAGMRRYVGAAPPPGHG-VHRYYVAVHAVKVEK----LDLPE------DASPAYLGFNLFQ--HAIARAVIFGTYEQ
>WP_003856119.1_Corynebact TLTSTD------LENGAKLAEAQLGG------TDISPQLSWSD------LPEG----TKSLAITCLDPDAPTGA------GFWHWAVFNIPTTVTEIPTGAGDETL------GGIEGVVSLKGDSGKRGFYGAQPPAGHA-PHRYLFAVHALDVEK----LDIAP------DATPTVLGFNLYF--HTLGRSIVWGWYEN
>WP_011490081.1_Paraburkho ALTSAS------FHAGGTVEAAQVFNQGDC-----KGGNRSPQLSWHD------APPG----TRSFAVTMFDQDAPGR------GWWHWAVAGIPATVVSLPENASSSGF------LSKVGALEARNDFDTDGYGGPCPPAGK--PHRYIITVYALSTAD----LRLAQ------GRPALMFDHEIST--ATLGSARMVVSYGR
>WP_012682169.1_Gemmatimon TLTSPD------FVDGGAMPAALAQTG------RDVSPALSWSG------VPDS----TRSFVLIVHDADAPMGDGLD------DLLHWMVWNIPATATSLPRGVAQG------TMADGMRQISVSGPYYRGPAAPSTGP-VHHYVFELYALDIPLNVAATGMSP------AATREAVMRAMAQ--HVRGKGVLVGTYRR
>ABJ85800.1_candSolibacter TLAIPG------FPDGGQIPVKYSQAAPGVA----PGEGTSPAMTWAN------APAG----TQSFLLHMHDLDLARNKTTD------DQVHWVFWNIPASATGLPEGVPKGS------QLPDGSYQISVTGPMYRGPGAAANGP-LHHYVFELYALDTKLD-VKPTSDA------FETRANVIKAIQG--HILAKAVYGGLFRR
>WP_012681554.1_Gemmatimon VLTVQG------FADGADIPVKFSQAAPGVA----PGEGMSPAINWTN------VPAG----TQSFVLNMRDLDVARNRTTE------DQAHWVVWNIPASSKGMTEGQPKGA------QLPDGSYQHSATGLVYRGPGAAATGP-KHHYIFEIYALDVKLD-VTPGADA------FESRTKVFQAMQG--HVLGKAVYSGLFRR
>NP_216427.1_Mycobacterium TIASPM------FADGAPIPVQFSCKG------ANVAPPLTWSS------PAG----AAELALVVDDPDAVGG------LYVHWIVTGIAPGSGSTADGQT------PAGGHSVPNSGGRQGYFGPCPPAGTG-THHYRFTLYHLPVA-----LQLPP------GATGVQAAQAIAQ--AASGQARLVGTFEG
>NP_216426.1_Mycobacterium TISSPA------FADGAPIPEQYTCKG------ANIAPPLTWSA------PFG------GALVVDDPDAPRE------PYVHWIVIGIAPGAGSTADGET------PGGGISLPNSSGQPAYTGPCPPAGTG-THHYRFTLYHLPAVP----P------LAGLAGTQAARVIAQ--AATMQARLIGTYEG
>WP_023363083.1_Actinoplan RLSSPD------FTDGGPLPATALASG------GNRPPVLYWNG------LPAG----TRSLVLTAFDADAPIPG------GLWHWAVKDVQPVHGE------HTTPLTNSLGVAGWSGANPPPGTG-THRMIFCATALDVAV----LDVPE------GASLALVQILMIP--HTLGRGLLTGTSEA
>WP_014747134.1_Tistrella_ MTVTAG------FEDGGSLPTAMLARG------HDASPALTWSG------APEG----TRSFAVIMEDPDAGDPK------PFVHWMIHDIPVGHQALRTGLPTAPV------LRDPEGARQATNSQGGIGYTGPKPPPGDR-RHAWHLVVLALDVPR----LEVGP------AADPDAVLKAMDG--HVIGAGRLVGHAGG
>AFK57437.1_Tistrella_mob_ RLTSPD------MPDGRMPAAQVLSAAYGFG---CDGDNRSPALTWEG------APAG----TRSFVLTVHDPDAPTGI------GWTHWVVANIPADAQGLDTGASGNAA------RLPAGAVETRTDFGLPGYGGACPPEGR--EHRYVFTLTALGMEA----LPVAT------PEAMPALIGFFANA--HALDKAVLEVRYGR
>XP_004344424.2_Capsaspora KVTYPE------TAIVHGNLVPAGLTSSA------PTVTFEP------ESGK----YYTLMMVGLDSNISSANH------QYMHWLVADISEAGI------SSGNTIYPYVQALPLKDTG-YHRVAFLLLEHAAPL----GMQAESI-----YNRID-LYARSFSTALFVNEFGV--RIVGASFCQTFHDD
>WP_015444179.1_Legionella TLESPA------FASNAAIPQKFTCSN------LNYSPPLIWHD------PNLG----TQSYVLIVHDPDAPTG------NWIHWVLFNIPAQVKQLTEGTTT------PAGATSGLNSWNTTGYSGPCPPS--G-THRYYFTLYALDTY-----LSLGP------EATAQDVVNAMKG--DIIDTAELVGHYSK
>AFK53295.1_Tistrella_mob TLRSPA------FADGERIPDRFARTG------DNLSPPLDWQD------APEG----TRSFLLIVEDPDAPGG------TFRHWAVFDLHADRTGLEEGAGR------PGSGVHHGTNDFGSAAYDGPEPPEGHG-PHHYYFRLAALDIDH----LDVDP------GRRVGEIWAAARP--HMLAEAGLVGVYER
>AAR39014.1_Nanoarchaeum SPPSNNMG----FAKIYIDSEFMPKKYTCDG------ENINPRIEFNE------PGIYAIIMHDPDAPAG------DWFHWGIIFWGKEILEGLPKTHKGENFI------QTYNDFYYYNYITGNIKGIGYDGPCPPKGK--PHRYVFEVYKLDSK------PDKD------IIEKTDIIKFVKE--HALEKQEFVYLYGR
>WP_000846523.1_Helicobact EVMIQT------DSKGYLDAKFGGNAPKAFLNSNGLPTYSPKISWQK------VEG----AQSYALELIDHDAQKVCGM------PFVHWVVGNIAHNVLEENASMMDKRIVQGVNSLTQGFIRSPLNESEKQRSNLNNSVYIGPMPPN--G-DHHYLIQVYALDIPK----LALKA------PFFLGDLHDKMRN--HIIAIGRKEFLYKQ
>WP_012860605.1_Sebaldella ILKSEG------IVNNIIEEKYGGKGADTI---KGVGSRSLPLEWSN------PPKG----TKSFVIIMEDHDAIPVTGF------TWIHWSVADIEPEKRKLNENESRSNPALI------QGLNSWNSSIGGFTSEEASFYGGPMPPDK---DHKYRFTIYALDKK-----LDLKN------GYYLNELYEKMDG--HVLDKSVLEGIYKK
>WP_010966668.1_Clostridiu NITSYG------IVNGVIQNKYGKRGEQFNE--VGIPTYSLPVKIKN------APDK----TVSYALVLEDKDAVPVVGF------SWIHWLASNITKEELNENESISA------KDFVQGINSWCSPIGN--TDRKFCIGYGGMTPPNA---PHIYELHVFALDVK-----LNLKD------GFYMNELYKAMEG--HILEKSTLKGIYNN
>WP_000700882.1_Staphyloco KINTEF------KGNNIPYEYAAGADISDL-INGNPIKSFPFEITE------LPKD----TKYLAWSLIDYDAIPVCGF------AWIHWSVANVSITGDSISIKADLSR---TKGDYVQGKNSFTSGLLA-EDFSEIENHYVGPTPPDK---DHQYELTVYALDHS-----LNLKN------GFYLNEFLKEVNQ--HKIDQTSINLIGRK
>NP_391766.1_Bacillus_subt YVEANPY------LHDKYSKYADEKYKREGN------PFVSFPIHFDD------IPSG----AKTLALTFIDHDAIPVCGF------SWIHWTAANIPAYIGELPEHASEE----RQDLMIQGQNSFASPLAG-SNDPKVIHQYCGPTPPDK---DHAYTLTVYALDAE-----LNLQP------GFYLNELYQEMKE--HILAETSIELLARV
>ADM27881.1_Ignisphaera RVESPE------FRPNETIPTAFTCDG------SDISPPLIIEG------VPSQ----TVSIAIIVYDPDAPGG------IFYHWLLYSIKPSSRIEIPASIPKL------AETKIGFQGFNDFGRIGYGGPCPPKGDK-PHRYIFMVLALDID-----IRTP------SLKPSEFLKEVNG--HIIAYGYTIGIYRR
>WP_014451090.1_Fervidobac MVVSSV------FKDKEFIPKKYTCEG------QDVNPELVISN------IPDG----AKTIAIICDDPDAPIG------TFVHWVLWNYPVNGPIVNIPEALPKV------EKLQNGAMQGYNDFGKVGYNGPCPPKGHG-VHHYHFKIYAVNSL-----IELKG------KVTKKELEKALSG--RILAQTEIVGLYER
>WP_015144672.1_Pleurocaps KLESTA------FIHNELIPSRYTCDG------EDISPPLNWND------PPAA----TKSLALICDDPDAPMK------TWVHWVVYNLPPETRSLPEKIPAGN------SLSNGGLQGKNDFGKIAYGGPCPPR--G-THRYFFKLYALDTM-----LDLKP------GATKAQLEAAMKG--HILAQAEFIGHYRR
>AAB89551.1_Archaeoglobus LTVKSA------FENGGKIPAKYTCDG------EDISPPLYIEG------LRED----VKSLVIICEDPDAPMG------VFTHWIAWNVEPTSEIPENVPKTKF------VDEPKMVQGRNDFGKVGYNGPCPPS--G-EHRYYFRIYAIDTLL------QGDYSRQELLRAIEG--HILQYGEIYGLYSR
>WP_011213736.1_Legionella LLTSPA------FEHNDFIPDEFTCHG------LDCPPPLAWEN------APPN----TKSFVIIMDDPDASGG------LWVHWVLFNIPASYNELDAAID------LPDSAISVKNSWGKTGYGGPCPPR--G-THRYFFRLYALDTF-----LSLDE------KATKKDVEYAMQG--HILESSELICKYTK
>AAC07255.1_Aquifex EVFSRS------FKNGEEIPKVYTCDG------KDISPHIGWED------VPEG----TKSFVLIMDDPDAPIG------TFTHWVVYDIPSQTRELLEDFPKVP------EVSGIKQGINDFGRVGYGGPCPPRGHG-YHRYFFKVFALSVES----LGLPP------GASRKDVELKMNG--KILAQAHIIGLYKR
>WP_013335469.1_Vulcanisae TLTSPA------FKYGERIPRKYTCDD------VDISPPLQWSN------TPAG----TKSLVLIMEDPDAPIG------VFTHWVLYNIPPDRTELPENVPKTP------VVEGIGVQGINDFGRIGYGGPCPPRGHK-PHRYFFRLYALNTV-----PNIKQ------RASKDEVLRIIKN--SIIGTTEYMGTYSR
>AAM04847.1_Methanosarcina KVFSSA------FESNGTIPRKYTCNG------ENINPPLEFTD------IPEG----TESLVLIMDDPDAPMN------PFTHWIVWNIEPVAKIEEDS------IPGIEGINNFREIGYGGPCPPS--G-THRFFFRVYALDRQ-----LELKA------GAGRKELESEMIG--HIIAEGELMGKYGE
>CAC12212.1_Thermoplasma EVKVEG------FGKGSQIPVEYTCDG------EDRMPAIAVEG------FPSG----TKSLALIVDDPDAPSG------LFTHCIIYNIPTSVKVIDSSIL------KQTGVLSGVNDFGNKGYGGPCPPPGK--PHRYYFKVLALDTT------LTKA------GMKRKDFDNAVRG--HVLGQAEYMGTYLR
>ABF91105.1_Myxococcus VLTSPR------FKDGDLIPIAYTGEG------EDISPPLQWTD------MPAG----TKSLALIVEDPDAPDPRNPQM------TFSHWVVYNIPPSAQGLPEGATPDV------LPEGARQGQNDFHRQNYGGPMPPV--G-MHRYYFRLFALDTV-----LPDLG------RVSRTQLRKAMEG--HIVGQAELIGLYTK
>WP_010936309.1_Dehalococc LLGSSA------FACGEYIPRRYSGLG------GNISPPLEWMK------LPEG----TKSLTLILEDPDASK------TFTHWLIYNIPSDREGLPEAVPAEP------NLPDGIRQGLNSAGETGYTGPYPPPNR--VHHYHFRLFALDTV-----LRLKA------GSHQADLIKAMEG--HIIDQTDLTGLFVA
>WP_011003345.1_Ralstonia TLTSPA------FAAGAEMPGRLTCEG------TDTSPPLAWTG------LPAG----TRSLALIVDDPDAPDPAAPRM------TWVHWVLYNIPPATAALAEDAARSH------LPSGTREGTNGWQRTGYGGPCPPI--G-RHRYFHKLYALDTE-----LPDLG------TPDKAALERAMHG--HVLAQAELVGTYQK
>WP_011239419.1_Aromatoleu TLSSTA------FAHGGAVPATYTCDG------RDVSPPLGWSG------IPEN----TRSLALIVDDPDAPDPAAPQR------TWVHWVLYNIPPTTAALPEAVAMAD------LPLGTREGTNDWKRTGYGGPCPPI--G-RHRYFHKLYALDVV-----LAPLG------APTKAELEHAMQG--HILAEAELVGTYQR
>AAM71842.1_Chlorobaculum ELTSTA------FRNMGAIPALYTCEG------KNISPPLTWKN------IPKG----TKSLVLIVDDPDAPDPAAPKF------TWVHWVLYNIPPGKTGFAEGAGNHP------AETEMQEGFNSWNRGGYGGPCPPI--G-THRYFFKLYALDTV-----IDDLL------SPLKADVEAAMQG--HILGETVLIGTYKK
>WP_015247495.1_Singulisph TIHSNA------FNDGRAIPRRYSEDG------VDVSPSLTWSG------SPKE----ARELALIVDDPDAPTYK------PWVHWVLYKIPPDVQTLPEGLPTAPT------LRTPPGCLQGKNSWGGVGYHGPAPPRGHG-VHHYHFHLYALDAP-----LRTAQ------GLDKEGLLRAMRG--HILAQAELVGTYER
>AGN02414.1_Salinarchaeum SLTSPA------FEDGGEIPAKYGLDG------ENVNPPLEIEN------VPND----AGSLALVVDDPDAVEPTGQ------VWLHWIVWNVAPERRAIPEDWT------PEDAEEGRNDSGEIGYSGPSPPDR---AHRYRFKLFALDAT-----LDLPV------ATDATALGEAMAG--HVLAQTQLDGTFPV
>WP_015300184.1_Halovivax RVSSPA------FDDGDPIPDAHAYEG------GNVNPPLEIAG------VPED----ATSLALVVDDPDAVDPAGK------IWVHWLVWNLDPERDGVPRAWD------PTDAVEGENDFGEVGYGGPKPPD--G-EHTYRFKLYALDTD-----LGLPT------GASAEALGNAIED--HVVAQTQVTGTYAP
>WP_020445774.1_Salinarcha TLTSPA------FADGEPIPDEYGYEA------RNVNPPLRIEG------APDA----AESIVLLMDDPDAVEPAGK------VWDHWTVFDLDPELTEIPEDW------DVEGTEGRNDYGEVGYGGPNPPD--G-EHTYRFRVYALDRS-----LDTAG------VETKDAVEAAMDG--HVLAQDTLTGTYAP
>WP_015144015.1_Pleurocaps KLTSPA------FAHEGKIPPKYTCDG------ENINPPLVISD------VPAE----AKSLVLIMDDPDVPKRLRADG------MWDHWVVFNIAPSLTEIQEGK------EPPGVHGIGTSKNLDYYGPCPPDR---EHRYFFKLYALDTR-----LNLPE------KATKQQVERAMEG--HILAKAELMGRYER
>NP_220255.1_Chlamydia QLTSQA------FSYGRPIPKKYSCQG------VGISPPLSFSD------VPRE----AKSLVLIVEDPDVPPSVREDG------LWIHWIVYNLSPVVSNLAEGA------QIFAVQGLNTAGEIGYCPPCPPDA---KHRYYFYAYALDVV-----LSDEE------GVTKEQLLEAMDG--HIIATAELMGTYEK
>AAK41064.1_Saccharolobus RVVSSA------FKNEDFIPIKYTCDG------QDLSPELEWDL------VTN----AKSYAIIVEDPDAPGG------TFIHWVIYNITTNRLPEGVPRLY------KSQYGVQGVNDFGNIGYNGPCPPKTHP-PHRYYFYVYAIDTI-----LLEIK------NINADKLKSLMEG--HGIERGFVMGKYKR
>WP_015232536.1_Caldisphae SLISDS------FKNGEVIPKRHTCEG------DNISPKLSWKF------DE----AKSYTLIVYDPDAPMK------TFIHWVIYDIPSSINSLYEAIPKNE------NVDVGKQGRNDFGEIGYDGPCPPRGHG-FHRYYFALHGLNVES----LDIRG------EANANKVLNSMKG--KVIGYAVLMGKYQR
>AEH45872.1_Thermodesulfat TLTSS------FTDKGRIPLKYVMPAAG------GQNISPPLKWEG------APSQ----TKSFALLCVDLHPIAN------HWVHWMVINIPKNVNFLPEGASRHK------MPKGSKELKNSFGFIGYGGPQPPPGTG-EHPYVFTIFALDVAR----IDLPE------DASLEMFSQTIKK--HVLAQANLTGYFSR
>WP_015906125.1_Desulfobac QVKSDV------FEHDTLIPSRFTGEG------EDVSPALSWEN------PPGG----TKAFAVICHDPDAPLVTSRGTY------GFVHWVLYNIPGDVTHLDEGT------HLYTSGITDFGKTGYNGPMPPNGHG-LHNYYFWVLALDKA-----VALEQ------GLNMWQLLEKVEP--YLLGMNRLTGRYKR
>ACZ10853.1_Sebaldella_ter TVTSPA------FENEAVIPIQYTGRG------EDISPELYLSA------IDGR----AKSLAVIMDDMNHPIP------AYNHWIIWNIPVMEIIPENIAYGAD------VAELGGAVQGRGYGKNRYRGPKPPFNW--LHLYQFNVYVLDCL-----LDLSF------KARKGNLIAAMQG--HILQKGALTGYFQS

Figure 1. Multiple sequence alignment of select members of the FT/CETS/PEBP/PKIP/YbhB family. Sequence identifiers in GenBank or PDB are shown before each sequence. In the Secondary structure lines above the alignment, s stands for a beta-strand and h stands for an alpha-helix. Conserved hydrophobic residues (I, L, M, V, F, Y, W) are indicated by yellow shading, conserved small-side-chain or turn/kink-prone residues (A, G, S, P) are indicated by bold red type, and the conserved constellation of charged residues discussed in the text is marked as follows: gray-shaded blue type, two conserved aspartates; cyan shading, two conserved histidines; and black-shaded white type, conserved arginine.

Parsed Citations
Abe, M., Kobayashi, Y., Yamamoto, S., Daimon, Y., Yamaguchi, A., Ikeda, Y., Ichinoki, H., Notaguchi, M., Goto, K., and Araki, T. (2005). FD, a bZIP protein mediating signals from the floral pathway integrator FT at the shoot apex. Science 309, 1052–1056.
Abelenda, J.A., Bergonzi, S., Oortwijn, M., Sonnewald, S., Du, M., Visser, R.G.F., Sonnewald, U., and Bachem, C.W.B. (2019). Source-Sink Regulation Is Mediated by Interaction of an FT Homolog with a SWEET Protein in Potato. Current Biology 29, 1178-1186.e6.
Ahn, J.H., Miller, D., Winter, V.J., Banfield, M.J., Lee, J.H., Yoo, S.Y., Henz, S.R., Brady, R.L., and Weigel, D. (2006). A divergent external loop confers antagonistic activity on floral regulators FT and TFL1. EMBO J 25, 605–614.
Altschul, S.F., Madden, T.I., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res 25, 3389–3402.
Andrés, F., and Coupland, G. (2012). The genetic basis of flowering responses to seasonal cues. Nat Rev Genet 13, 627–639.
Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M., and Teichmann, S.A. (2004). Structure and evolution of transcriptional regulatory networks. Curr Opin Struct Biol 14, 283–291.
Baker, P.A., Fritz, S.C., Dick, C.W., Eckert, A.J., Horton, B.K., Manzoni, S., Ribas, C.C., Garzione, C.N., and Battisti, D.S. (2014). The emerging field of geogenomics: Constraining geological problems with genetic data. Earth-Science Reviews 135, 38–47.
Bernier, I., and Jollès, P. (1984). Purification and characterization of a basic 23 kDa cytosolic protein from bovine brain. Biochim. Biophys. Acta 790, 174–181.
Burroughs, A.M., and Aravind, L. (2020). Identification of Uncharacterized Components of Prokaryotic Immune Systems and Their Diverse Eukaryotic Reformulations. J Bacteriol 202.
Burroughs, A.M., Zhang, D., Schäffer, D.E., Iyer, L.M., and Aravind, L. (2015). Comparative genomic analyses reveal a vast, novel network of nucleotide-centric systems in biological conflicts, immunity and signaling. Nucleic Acids Res 43, 10633–10654.
Carr, S.M., Marshall, H.D., Wareham, T., and Craig, D. (2015). The Big ORF Theory. In Emerging Trends in Computational Biology, Bioinformatics, and Systems Biology, (Elsevier), pp. 265–274.
Chailakhyan, M. (1936). New facts supporting the hormonal theory of plant development. Dokl. Akad. Nauk SSSR 4, 77–81.
Cheng, H., Liao, Y., Schaeffer, R.D., and Grishin, N.V. (2015). Manual classification strategies in the ECOD database: ECOD Manual Classification Strategies. Proteins 83, 1238–1251.
Conti, L., and Bradley, D. (2007). TERMINAL FLOWER1 Is a Mobile Signal Controlling Arabidopsis Architecture. Plant Cell 19, 767–778.
Corbesier, L., Vincent, C., Jang, S., Fornara, F., Fan, Q., Searle, I., Giakountis, A., Farrona, S., Gissot, L., Turnbull, C., et al. (2007). FT Protein Movement Contributes to Long-Distance Signaling in Floral Induction of Arabidopsis. Science 316, 1030–1033.
Danilevskaya, O.N., Meng, X., McGonigle, B., and Muszynski, M.G. (2011). Beyond flowering time: pleiotropic function of the maize flowering hormone florigen. Plant Signal Behav 6, 1267–1270.
Dereeper, A., Guignon, V., Blanc, G., Audic, S., Buffet, S., Chevenet, F., Dufayard, J.-F., Guindon, S., Lefort, V., Lescot, M., et al. (2008). Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res 36, W465-469.
Edgar, R.C. (2004). MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32, 1792–1797.
El-Gebali, S., Mistry, J., Bateman, A., Eddy, S.R., Luciani, A., Potter, S.C., Qureshi, M., Richardson, L.J., Salazar, G.A., Smart, A., et al. (2019). The Pfam protein families database in 2019. Nucleic Acids Res 47, D427–D432.
Galperin, M.Y., Wolf, Y.I., Makarova, K.S., Vera Alvarez, R., Landsman, D., and Koonin, E.V. (2021). COG database update: focus on microbial diversity, model organisms, and widespread pathogens. Nucleic Acids Res 49, D274–D281.
Garner, W.W., and Allard, H.A. (1920). Effect of the relative length of day and night and other factors of the environment on growth and reproduction in plants. Monthly Weather Review 48, 415–415.
Glazko, G.V., and Mushegian, A.R. (2004). Detection of evolutionarily stable fragments of cellular pathways by hierarchical clustering of phyletic patterns. Genome Biol. 5, R32.
Glazko, G., Gordon, A., and Mushegian, A. (2005). The choice of optimal distance measure in genome-wide datasets. Bioinformatics 21 Suppl 3, iii3-11.
Glazko, G., Coleman, M., and Mushegian, A. (2006). Similarity searches in genome-wide numerical data sets. Biol. Direct 1, 13.
Google Scholar: Author Only Title Only Author and Title Guo, C., Chang, T., Sun, T., Wu, Z., Dai, Y., Yao, H., and Lin, D. (2018). Anti-leprosy drug Clofazimine binds to human Raf1 kinase inhibitory protein and enhances ERK phosphorylation. Acta Biochim. Biophys. Sin. (Shanghai) 50, 1062–1067. Google Scholar: Author Only Title Only Author and Title Howe, K.L., Contreras-Moreira, B., De Silva, N., Maslen, G., Akanni, W., Allen, J., Alvarez-Jarreta, J., Barba, M., Bolser, D.M., Cambell, L., et al. (2020). Ensembl Genomes 2020-enabling non-vertebrate genomic research. Nucleic Acids Res 48, D689–D695. Google Scholar: Author Only Title Only Author and Title Huerta, A.M., and Collado-Vides, J. (2003). Sigma70 Promoters in Escherichia coli: Specific Transcription in Dense Regions of Overlapping Promoter-like Signals. Journal of Molecular Biology 333, 261–278. Google Scholar: Author Only Title Only Author and Title Huynen, M., Snel, B., Lathe, W., and Bork, P. (2000). Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res 10, 1204–1210. Google Scholar: Author Only Title Only Author and Title Iyer, L.M., Aravind, L., Bork, P., Hofmann, K., Mushegian, A.R., Zhulin, I.B., and Koonin, E.V. (2001). Quod erat demonstrandum? The mystery of experimental validation of apparently erroneous computational analyses of protein sequences. Genome Biol. 2, RESEARCH0051. Google Scholar: Author Only Title Only Author and Title Jin, S., Nasim, Z., Susila, H., and Ahn, J.H. (2021). Evolution and functional diversification of FLOWERING LOCUS T/TERMINAL FLOWER 1 family genes in plants. Semin Cell Dev Biol 109, 20–30. Google Scholar: Author Only Title Only Author and Title Kakimoto, T. (2003). Biosynthesis of cytokinins. J Plant Res 116, 233–239. Google Scholar: Author Only Title Only Author and Title Karlgren, A., Gyllenstrand, N., Källman, T., Sundström, J.F., Moore, D., Lascoux, M., and Lagercrantz, U. (2011). 
Evolution of the PEBP Gene Family in Plants: Functional Diversification in Seed Plant Evolution. Plant Physiol. 156, 1967–1977. Google Scholar: Author Only Title Only Author and Title Kimura, S., Sakai, Y., Ishiguro, K., and Suzuki, T. (2017). Biogenesis and iron-dependency of ribosomal RNA hydroxylation. Nucleic Acids Research 45, 12974–12986. Google Scholar: Author Only Title Only Author and Title King, R.W., Evans, L.T., and Wardlaw, I.F. (1968). Translocation of the floral stimulus in Pharbitis nil in relation to that of assimilates. Zeitschrift Fur Pflanzenphysiologie 59, S377-388. Google Scholar: Author Only Title Only Author and Title Kinoshita, A., and Richter, R. (2020). Genetic and molecular basis of floral induction in Arabidopsis thaliana. Journal of Experimental Botany 71, 2490–2504. Google Scholar: Author Only Title Only Author and Title Klintenäs, M., Pin, P.A., Benlloch, R., Ingvarsson, P.K., and Nilsson, O. (2012). Analysis of conifer FLOWERING LOCUS T/TERMINAL FLOWER1-like genes provides evidence for dramatic biochemical evolution in the angiosperm FT lineage. New Phytol. 196, 1260–1273. Google Scholar: Author Only Title Only Author and Title Knott, J.E. (1934). Effect of a localized photoperiod on spinach. In Proc Amer Soc Hortic Sci, pp. 152–154. Google Scholar: Author Only Title Only Author and Title Kobayashi, Y., and Weigel, D. (2007). Move on up, it's time for change--mobile signals controlling photoperiod-dependent flowering. Genes Dev. 21, 2371–2384. Google Scholar: Author Only Title Only Author and Title bioRxiv preprint doi: https://doi.org/10.1101/2021.04.16.440192; this version posted April 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Koornneef, M., Hanhart, C.J., and van der Veen, J.H. (1991). 
A genetic and physiological analysis of late flowering mutants in Arabidopsis thaliana. Molec. Gen. Genet. 229, 57–66. Google Scholar: Author Only Title Only Author and Title Krieger, U., Lippman, Z.B., and Zamir, D. (2010). The flowering gene SINGLE FLOWER TRUSS drives heterosis for yield in tomato. Nat. Genet. 42, 459–463. Google Scholar: Author Only Title Only Author and Title Lang, A., Chailakhyan, M.K., and Frolova, I.A. (1977). Promotion and inhibition of flower formation in a dayneutral plant in grafts with a short-day plant and a long-day plant. Proc. Natl. Acad. Sci. U.S.A. 74, 2412–2416. Google Scholar: Author Only Title Only Author and Title Lau, M.E., Danka, E.S., Tiemann, K.M., and Hunstad, D.A. (2014). Bacterial lysis liberates the neutrophil migration suppressor YbcL from the periplasm of uropathogenic Escherichia coli. Infect. Immun. 82, 4921–4930. Google Scholar: Author Only Title Only Author and Title Li, W., Ju, J., Rajski, S.R., Osada, H., and Shen, B. (2008). Characterization of the tautomycin biosynthetic gene cluster from Streptomyces spiroverticillatus unveiling new insights into dialkylmaleic anhydride and polyketide biosynthesis. J. Biol. Chem. 283, 28607–28617. Google Scholar: Author Only Title Only Author and Title Li, W., Luo, Y., Ju, J., Rajski, S.R., Osada, H., and Shen, B. (2009). Characterization of the tautomycetin biosynthetic gene cluster from Streptomyces griseochromogenes provides new insight into dialkylmaleic anhydride biosynthesis. J. Nat. Prod. 72, 450–459. Google Scholar: Author Only Title Only Author and Title Lifschitz, E., and Eshed, Y. (2006). Universal florigenic signals triggered by FT homologues regulate growth and flowering cycles in perennial day-neutral tomato. J. Exp. Bot. 57, 3405–3414. Google Scholar: Author Only Title Only Author and Title Lifschitz, E., Ayre, B.G., and Eshed, Y. (2014). Florigen and anti-florigen â€" a systemic mechanism for coordinating growth and termination in flowering plants. Front. 
Plant Sci. 5. Google Scholar: Author Only Title Only Author and Title McLennan, A.G. (2006). The Nudix hydrolase superfamily. Cell Mol Life Sci 63, 123–143. Google Scholar: Author Only Title Only Author and Title Milyaeva, E.L., Golyanovskaya, S.A., Aksenova, N.P., and Chailakhyan, M.Kh. (1991). Changes of Protein Composition in Leaves and Stem Apices of Bicolored Coneflower with Transition to Flowering. Dokl. Akad. Nauk SSSR (Bot. Sciences) 11–15. Google Scholar: Author Only Title Only Author and Title Mironov, A.A., Vinokurova, N.P., and Gelfand, M.S. (2000). Software for analysis of bacterial genomes. Mol Biol 34, 222–231. Google Scholar: Author Only Title Only Author and Title Miyazaki, T., Sugisawa, T., and Hoshino, T. (2006). Pyrroloquinoline Quinone-Dependent Dehydrogenases from Ketogulonicigenium vulgare Catalyze the Direct Conversion of l-Sorbosone to l-Ascorbic Acid. AEM 72, 1487–1495. Google Scholar: Author Only Title Only Author and Title Musfeldt, M., and Schönheit, P. (2002). Novel Type of ADP-Forming Acetyl Coenzyme A Synthetase in Hyperthermophilic Archaea: Heterologous Expression and Characterization of Isoenzymes from the Sulfate Reducer Archaeoglobus fulgidus and the Methanogen Methanococcus jannaschii. JB 184, 636–644. Google Scholar: Author Only Title Only Author and Title Nakamura, Y., Andrés, F., Kanehara, K., Liu, Y., Dörmann, P., and Coupland, G. (2014a). Arabidopsis florigen FT binds to diurnally oscillating phospholipids that accelerate flowering. Nat Commun 5, 3553. Google Scholar: Author Only Title Only Author and Title Nakamura, Y., Andrés, F., Kanehara, K., Liu, Y., Coupland, G., and Dörmann, P. (2014b). Diurnal and circadian expression profiles of glycerolipid biosynthetic genes in Arabidopsis. Plant Signal Behav 9, e29715. Google Scholar: Author Only Title Only Author and Title Nakamura, Y., Lin, Y.-C., Watanabe, S., Liu, Y.-C., Katsuyama, K., Kanehara, K., and Inaba, K. (2019). 
High-Resolution Crystal Structure of Arabidopsis FLOWERING LOCUS T Illuminates Its Phospholipid-Binding Site in Flowering. IScience 21, 577–586. Google Scholar: Author Only Title Only Author and Title Penel, C., Gaspae, Th., and Greppin, H. (1985). Rapid interorgan communications in higher plants with special reference to flowering. Biol Plant 27, 334–338. Google Scholar: Author Only Title Only Author and Title Persson, E., Castresana-Aguirre, M., Buzzao, D., Guala, D., and Sonnhammer, E.L.L. (2021). FunCoup 5: Functional Association Networks in All Domains of Life, Supporting Directed Links and Tissue-Specificity. J Mol Biol 166835. Google Scholar: Author Only Title Only Author and Title Pin, P.A., and Nilsson, O. (2012). The multifaceted roles of FLOWERING LOCUS T in plant development: FT, a multifunctional protein. bioRxiv preprint doi: https://doi.org/10.1101/2021.04.16.440192; this version posted April 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 and is also made available for use under a CC0 license. Plant, Cell & Environment 35, 1742–1755. Google Scholar: Author Only Title Only Author and Title Pnueli, L., Gutfinger, T., Hareven, D., Ben-Naim, O., Ron, N., Adir, N., and Lifschitz, E. (2001). Tomato SP-Interacting Proteins Define a Conserved Signaling System That Regulates Shoot Architecture and Flowering. Plant Cell 13, 2687–2702. Google Scholar: Author Only Title Only Author and Title Ribeiro, A.J.M., Holliday, G.L., Furnham, N., Tyzack, J.D., Ferris, K., and Thornton, J.M. (2018). Mechanism and Catalytic Site Atlas (M- CSA): a database of enzyme reaction mechanisms and active sites. Nucleic Acids Res 46, D618–D623. Google Scholar: Author Only Title Only Author and Title Roderick, S.L., Chan, W.W., Agate, D.S., Olsen, L.R., Vetting, M.W., Rajashankar, K.R., and Cohen, D.E. (2002). 
Structure of human phosphatidylcholine transfer protein in complex with its ligand. Nat. Struct. Biol. 9, 507–511. Google Scholar: Author Only Title Only Author and Title Romanov, G.A. (2012). Mikhail Khristoforovich Chailakhyan: The fate of the scientist under the sign of florigen. Russ J Plant Physiol 59, 443–450. Google Scholar: Author Only Title Only Author and Title Rudnitskaya, A.N., Eddy, N.A., Fenteany, G., and Gascón, J.A. (2012). Recognition and reactivity in the binding between Raf kinase inhibitor protein and its small-molecule inhibitor locostatin. J Phys Chem B 116, 10176–10181. Google Scholar: Author Only Title Only Author and Title Sachs, J. (1865). Wirkung des Lichtes auf die Blütenbildung unter Vermittlung der Laubblätter. Bot. Ztg 23, 117. Google Scholar: Author Only Title Only Author and Title Sadreyev, I.R., Ji, F., Cohen, E., Ruvkun, G., and Tabach, Y. (2015). PhyloGene server for identification and visualization of co-evolving proteins using normalized phylogenetic profiles. Nucleic Acids Res 43, W154-159. Google Scholar: Author Only Title Only Author and Title Salazar-Cerezo, S., Martínez-Montiel, N., García-Sánchez, J., Pérez-Y-Terrón, R., and Martínez-Contreras, R.D. (2018). Gibberellin biosynthesis and metabolism: A convergent route for plants, fungi and bacteria. Microbiol Res 208, 85–98. Google Scholar: Author Only Title Only Author and Title Shemon, A.N., Heil, G.L., Granovsky, A.E., Clark, M.M., McElheny, D., Chimon, A., Rosner, M.R., and Koide, S. (2010). Characterization of the Raf kinase inhibitory protein (RKIP) binding pocket: NMR-based screening identifies small-molecule ligands. PLoS ONE 5, e10479. Google Scholar: Author Only Title Only Author and Title Stark, A., and Russell, R.B. (2003). Annotation in three dimensions. PINTS: Patterns in Non-homologous Tertiary Structures. Nucleic Acids Res 31, 3341–3344. Google Scholar: Author Only Title Only Author and Title Suhre, K., and Claverie, J.-M. (2004). 
FusionDB: a database for in-depth analysis of prokaryotic gene fusion events. Nucleic Acids Res 32, D273-276. Google Scholar: Author Only Title Only Author and Title Susila, H., Nasim, Z., Gawarecka, K., Jung, J.-Y., Jin, S., Youn, G., and Ahn, J.H. (2021). PHOSPHORYLETHANOLAMINE CYTIDYLYLTRANSFERASE 1 modulates flowering in a florigen-independent manner by regulating SVP. Development 148, dev193870. Google Scholar: Author Only Title Only Author and Title Takeba, G., Nakajima, Y., Kozaki, A., Tanaka, O., and Kasai, Z. (1990). A Flower-Inducing Substance of High Molecular Weight from Higher Plants. Plant Physiol. 94, 1677–1681. Google Scholar: Author Only Title Only Author and Title Tamaki, S., Matsuo, S., Wong, H.L., Yokoi, S., and Shimamoto, K. (2007). Hd3a protein is a mobile flowering signal in rice. Science 316, 1033–1036. Google Scholar: Author Only Title Only Author and Title Taoka, K., Ohki, I., Tsuji, H., Furuita, K., Hayashi, K., Yanase, T., Yamaguchi, M., Nakashima, C., Purwestri, Y.A., Tamaki, S., et al. (2011). 14-3-3 proteins act as intracellular receptors for rice Hd3a florigen. Nature 476, 332–335. Google Scholar: Author Only Title Only Author and Title Thao, S., and Escalante-Semerena, J.C. (2012). A positive selection approach identifies residues important for folding of Salmonella enterica Pat, an Nε-lysine acetyltransferase that regulates central metabolism enzymes. Research in Microbiology 163, 427–435. Google Scholar: Author Only Title Only Author and Title Tsuji, H., Taoka, K., and Shimamoto, K. (2011). Regulation of flowering in rice: two florigen genes, a complex gene network, and natural variation. Curr. Opin. Plant Biol. 14, 45–52. Google Scholar: Author Only Title Only Author and Title van Tran, N., Muller, L., Ross, R.L., Lestini, R., Létoquart, J., Ulryck, N., Limbach, P.A., de Crécy-Lagard, V., Cianférani, S., and Graille, M. (2018). 
Evolutionary insights into Trm112-methyltransferase holoenzymes involved in translation between archaea and eukaryotes. Nucleic Acids Research 46, 8483–8499. Google Scholar: Author Only Title Only Author and Title bioRxiv preprint doi: https://doi.org/10.1101/2021.04.16.440192; this version posted April 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. This article is a US Government work. It is not subject to copyright under 17 USC 105 Google Scholar: Author Only Title Only Author andand Title is also made available for use under a CC0 license. Wheeler, T.J., Clements, J., and Finn, R.D. (2014). Skylign: a tool for creating informative, interactive logos representing sequence alignments and profile hidden Markov models. BMC Bioinformatics 15, 7. Google Scholar: Author Only Title Only Author and Title Wigge, P.A. (2005). Integration of Spatial and Temporal Information During Floral Induction in Arabidopsis. Science 309, 1056–1059. Google Scholar: Author Only Title Only Author and Title Zeevaart, J.A.D. (1976). Physiology of Flower Formation. Annu. Rev. Plant. Physiol. 27, 321–348. Google Scholar: Author Only Title Only Author and Title Zeevaart, J.A.D. (2006). Florigen Coming of Age after 70 Years. Plant Cell 18, 1783–1789. Google Scholar: Author Only Title Only Author and Title Zhu, Y., Klasfeld, S., and Wagner, D. (2021). Molecular regulation of plant developmental transitions and plant architecture via PEPB family proteins – an update on mechanism of action. Journal of Experimental Botany 72, 2301-2311. 
| Type of contextual evidence | Functionally linked proteins / domains / COGs | Taxa in which these functions are putatively linked | COG functional categories / most relevant GO terms // Additional comments |
|---|---|---|---|
| Domain fusion | Glucose-sorbosone dehydrogenase (Pectate Lyase-like carbohydrate-binding module, FN3-like domain) (COG1881, COG2133) | Actinobacteria, Gammaproteobacteria | Carbohydrate transport and metabolism / Hydrolase activity, acting on glycosyl bonds |
| Operon | Dialkylmaleic anhydride synthesis and conjugation module of tautomycin/tautomycetin biosynthesis operon | Actinobacteria | // five-carbon substrate – a pentose? |
| Phyletic vector (prokaryotes) | Acyl-CoA synthetase, NDP forming (COG1042) | 220 species | Energy production and conversion / ATP-binding, N-acetyltransferase activity // nucleoside-containing substrate |
| Phyletic vector (prokaryotes) | YbaR/Trm112 activator of RNA and protein methyltransferases (COG2835) | 233 species | Translation, ribosomal structure and biogenesis // RNA modification |
| Phyletic vector (eukaryotes) | NUDT3 and other NUDIX hydrolases | Metazoa | // Hydrolases preferring pyrophosphate-containing substrates (nucleoside phosphates or phospholipids) |
| Shared putative regulatory motif | RlmA 23S rRNA m(1)G745 methyltransferase (COG2226) | Enterobacteriaceae | Coenzyme transport and metabolism / rRNA base methylation |
| Shared putative regulatory motif | YobB putative carbon-nitrogen hydrolase family protein (COG0388) | Enterobacteriaceae | Energy production and conversion / Nitrogen compound metabolic process |
| Shared putative regulatory motif | AdrB c-di-GMP phosphodiesterase (COG2200) | Enterobacteriaceae | Signal transduction mechanisms / Cellular response to DNA damage stimulus |
| Integrated evidence | RlhA 23S rRNA 5-hydroxycytidine C2501 synthase (COG0826) | n/a | Signal transduction mechanisms / rRNA processing, rRNA modification |
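The "phyletic vector" rows of the table rest on comparing presence/absence patterns of protein families across genomes. A minimal sketch of such a comparison is given below, under stated assumptions: the Jaccard coefficient is used purely for illustration and is not necessarily the measure applied in this study, and the family names and genome labels (`query_family`, `family_A`, `g1`, and so on) are hypothetical placeholders rather than data from the analysis.

```python
from typing import Dict, Set


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """Number of genomes carrying both families divided by the number
    of genomes carrying at least one of them (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical phyletic vectors, stored as the set of genomes in which
# each protein family has at least one member (illustrative data only).
presence: Dict[str, Set[str]] = {
    "query_family": {"g1", "g2", "g3", "g5"},
    "family_A": {"g1", "g2", "g3", "g4"},
    "family_B": {"g6", "g7"},
}

query = presence["query_family"]

# Score every other family against the query family.
scores = {
    name: jaccard_similarity(query, genomes)
    for name, genomes in presence.items()
    if name != "query_family"
}

# Families with the most similar phyletic vectors come first.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name}\t{score:.2f}")
```

Families that rank highly in such a comparison are only candidates for a functional link; a similar phyletic vector is circumstantial evidence and would need to be weighed together with the other lines of contextual evidence listed in the table.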