doi.org/10.26434/chemrxiv.12317480.v1

A Cyclic Di-GMP Network Is Present in Gram-Positive Streptococcus and Gram-Negative Species

Ying Liu, Changhan Lee, Fengyang Li, Janja Trček, Heike Bähre, Rey-Ting Guo, Chun-Chi Chen, Alexey Chernobrovkin, Roman Zubarev, Ute Romling

Submitted date: 16/05/2020 • Posted date: 18/05/2020 Licence: CC BY-NC-ND 4.0 Citation information: Liu, Ying; Lee, Changhan; Li, Fengyang; Trček, Janja; Bähre, Heike; Guo, Rey-Ting; et al. (2020): A Cyclic Di-GMP Network Is Present in Gram-Positive Streptococcus and Gram-Negative Proteus Species. ChemRxiv. Preprint. https://doi.org/10.26434/chemrxiv.12317480.v1

Cyclic di-GMP is a ubiquitous second messenger in . This work describes the occurrence of a cyclic di-GMP signaling network in Gram-positive Streptococcus species and Gram-negative Proteus. After identification of candidate diguanylate cyclases by homology search in the respective species, the open reading frames were cloned and proteins expressed. Production of cyclic di-GMP was demonstrated by riboswitch assays, detection of cyclic di-GMP in cell lysates by MALDI-FTMS and in cell extracts by standard LC-MS/MS. Expression of the diguanylate cyclases in the heterologous host Salmonella typhimurium showed the expected physiological activity, namely up regulation of biofilm formation and down regulation of motility. The co-localisation of both sole diguanylate cyclases with cellulose or cellulose-like synthases indicates exopolysaccharide biosynthesis to be a conserved trait of cyclic di-GMP signaling.

File list (1)

CdiGMPManuscript_PS.pdf (17.47 MiB) view on ChemRxiv download file 1 2 3 4 5 A cyclic di-GMP network is present in Gram-positive Streptococcus and Gram- 6 negative Proteus species 7 8 9 10 Ying Liu1,#, Changhan Lee1##, Fengyang Li1, Janja Trček2, Heike Bähre3, Rey-Ting Guo4, 11 Chun-Chi Chen4, Alexey Chernobrovkin5, Roman Zubarev5,7 and Ute Römling1 12 13 14 15 1Department of Microbiology, Tumor and Cell Biology, and 5Department of Medical 16 Biochemistry and Biophysics, Biomedicum, Karolinska Institutet, SE-171 65 Stockholm, 17 Sweden 2 18 Faculty of Natural Sciences and Mathematics, Department of Biology, University of 19 Maribor, 2000 Maribor, Slovenia 20 3 Research Core Unit Metabolomics, Hannover Medical School, Hannover, Germany 21 4State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative 22 Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of 23 Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, P.R. 24 China 25 5Department of Pharmacological & Technological Chemistry, I.M. Sechenov First Moscow 26 State Medical University, Moscow, 119146, Russia 27 28 #Current address: Department of Microbiology, Philips University Marburg, Marburg, 29 Germany 30 31 ##Current address: Department of Molecular, Cellular and Developmental Biology, 32 University of Michigan, Ann Arbor, MI 48109, USA

1 33 ABSTRACT 34 The ubiquitous cyclic di-GMP (c-di-GMP) network is highly redundant with numerous 35 GGDEF domain containing proteins as diguanylate cyclases and EAL domain containing 36 proteins as c-di-GMP specific phosphodiesterases comprising those domains as two of the 37 most abundant bacterial protein superfamilies. One hallmark of the c-di-GMP network is its 38 outmost plasticity as c-di-GMP turnover proteins can rapidly vanish from species within a 39 genus and possess an above average transmissibility. With the long-term goal to address the 40 evolutionary forces of c-di-GMP turnover protein diversity, maintenance and conservation, 41 we investigated as an example a Gram-positive and a Gram-negative species, which 42 preserved only one single clearly identifiable GGDEF domain protein. Species 43 representatives of most sequenced genera of the family of the order 44 exceptionally show disappearance of the c-di-GMP signaling network, but 45 some species such as Proteus spp. still retain one candidate diguanylate cyclase. As another 46 example, in species of one of the two major branches in the genus Streptococcus, one 47 candidate diguanylate cyclase was frequently identified. We demonstrate that both proteins 48 encompass PAS (Per-ARNT-Sim)-GGDEF domains, possess diguanylate cyclase catalytic 49 activity and are suggested to signal via a PilZ receptor domain at the C-terminus of a type 2 50 glycosyltransferase. Specifically, one of them being the cellulose synthase BcsA. 51 Preservation of the ancient link between production of cellulose-like exopolysaccharides and 52 c-di-GMP signaling indicates that this functionality is even of high ecological importance 53 upon maintenance of the last remnants of the c-di-GMP network in some of today’s free- 54 living bacteria. 55 56 Keywords: Streptococcus gallolyticus subsp. gallolyticus, , GGDEF 57 domain, EAL domain, cyclic di-GMP signaling, cellulose biosynthesis

2 58 INTRODUCTION 59 Signaling systems couple sensing and information transmission and amplification in 60 order to adapt physiology and metabolism to changing external and internal stimuli. Thus, 61 those modules are highly prone to mutational adaptation and/or horizontal gene transfer. The 62 cyclic dinucleotide (CDN) molecule bis-(3’, 5’)-cyclic diguanosine monophosphate (c-di- 63 GMP), identified in 1987 as an allosteric activator of the cellulose synthase in the bacterium 64 Komagataeibacter xylinus (previously Gluconacetobacter (Acetobacter) xylinus (G. xylinus)) 65 is the most abundant CDN based second messenger signaling system in bacteria1-2. Cyclic- 66 di-GMP regulates a multitude of fundamental physiological and metabolic processes, such 67 as single cell motility-to-sessility transition leading to biofilm formation, chronic versus 68 acute virulence, antimicrobial and detergent tolerance, cell cycle progression and cell 69 morphology2. Essential signaling modules of this pathway comprise the GG(D/E)EF domain 70 with diguanylate cyclase (DGC) activity and the EAL and HD-GYP domain with 71 phosphodiesterases (PDE) activity. The GG(D/E)EF domain synthesizes c-di-GMP in a two- 72 step reaction with 5’-pppGpG as intermediate and two molecules of pyrophosphate as 73 byproducts3. The EAL- and HD-GYP domain hydrolyzes c-di-GMP into linear 5’-pGpG and 74 GMP, respectively4-5. Numerous proteins are bifunctional through a combination of GGDEF 75 with EAL/HD-GYP domains. 76 In these three domain superfamilies, catalytic domains have evolved into receptors or act 77 though protein-protein interactions6. Although an intact GG(D/E)EF motif is usually an 78 indicator for catalytic activity, however due to the requirement of extended consensus motifs, 79 the presence of such a motif and even the presence of extended consensus signature motif(s), 80 including divalent ion binding ligands required for catalytic activity, is not a guarantee for 81 the preservation of catalytic activity7-8 or substrate specificity9-10 and vice versa11-12. 82 The activity of DGCs and PDEs is controlled by a diversity of N-terminal sensory 83 domains that can receive various ligands, such as oxygen, nucleotide based small molecules, 84 and light13-17. Thereby, the most frequently used N-terminal signaling domain in this context 85 is the versatile PAS domain18-20. The compact PAS domains with diverse primary sequences 86 of less than 150 amino acids in size, possess an interior pocket flanked characteristically by 87 five antiparallel beta-strands and a few alpha helices to host a variety of prosthetic groups 88 and ligands, with few functional amino acids to determine the binding specificity21-22. 89 Furthermore, signaling of c-di-GMP is translated through protein and RNA-based 90 receptors such as the PilZ domain, MshEN domain, the inhibitory I-site of GGDEF domains, 91 inactive EAL/HD-GYP domains, various classes of transcription regulators and distinct

3 92 RNA aptamers6, 23-25. PilZ domains, the first c-di-GMP receptors discovered, are widespread 93 among bacteria26-27. For example, the catalytic activity of the cellulose synthase BcsA and 94 other exopolysaccharide synthases is regulated by PilZ domains28-29. Riboswitches, 95 consisting of a high affinity CDN binding RNA aptamer and an expression platform located 96 within the 5’-untranslated region (5’-UTR) respond to c-di-GMP binding with 97 conformational changes that alter downstream transcriptional termination and translation23-24. 98 Two classes of c-di-GMP responsive riboswitches, type I and type II, with Genes for the 99 Environment, for Membranes and for Motility (GEMM) motifs have been identified in 100 Gram-negative and Gram-positive species, such as Vibrio cholerae, Geobacter 101 metallireducens and Clostridium difficile. Compared with protein receptors, which have

102 dissociation constants (Kd) in the low µM range for c-di-GMP, RNA aptamers have 103 dissociation constants in the nanomolar range. 104 In many instances, cyclic di-GMP activates biofilm formation, a ubiquitous multicellular 105 sessile life style of bacteria and represses motility30. This physiological regulation by c-di- 106 GMP is evolutionary conserved from ancient thermophiles to human pathogens where 107 biofilm formation contributes to chronic infections31-32. 108 In this study, we identified novel DGCs in the Gram-positive pathogen Streptococcus 109 gallolyticus subsp. gallolyticus UCN34 and Gram-negative Proteus mirabilis UEB50. 110 Previously, c-di-GMP signaling network has not been reported in those two species. 111 Moreover, we verified their catalytic activity in vivo by diverse experimental approaches, for 112 instance, chromosomally integrated c-di-GMP specific Vc1 and Vc2 translational 113 riboswitches; a rapid cell-lysate-based matrix-assisted laser desorption/ionization Fourier 114 transform mass spectrometry (MALDI-FTMS)-based screening approach and detection of 115 cyclic di-GMP by standard liquid chromatography-mass spectrometry (LC-MS/MS) of cell 116 extracts. Combined with phenotypic analyses to promote rdar (red, dry, and rough) biofilm 117 formation and motility in the heterologous host Salmonella typhimurium, our results 118 demonstrate STRGAL_UCN34_GGDEF and PMI3101_v to be active DGCs. 119 Bioinformatics and gene synteny analyses predicted that c-di-GMP produced by 120 STRGAL_UCN34_GGDEF and PMI3101_v most likely regulate exopolysaccharide 121 biosynthesis in their native hosts by binding to the C-terminal PilZ domains of type 2 122 glycosyltransferases. Collectively, our results demonstrate the presence of a functional c-di- 123 GMP signaling network predominantly in species of a phylogenetically distinct branch of 124 the genus Streptococcus and maintenance of c-di-GMP regulated cellulose biosynthesis in

4 125 Proteus spp., which poses the question of the ecological importance of the conservation of 126 those signaling pathways.

5 127 RESULTS 128 Phylogeny of the c-di-GMP signaling network in selected Gram-negative and Gram- 129 positive genera. Species within a genus can possess a highly variable c-di-GMP network 130 with variable number of GGDEF domain proteins 131 (https://www.ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html)33. In addition, c-di- 132 GMP signaling components have a statistically significant higher likelihood to be 133 horizontally transferred22, 34. These observations indicate that the c-di-GMP signaling 134 network can rapidly be altered on a short evolutionary scale. Within the class of gamma- 135 proteobacteria2, the Enterobacterales order with families of , Erwiniaceae, 136 Yersiniaceae and Pectobacteriaceae with the type genera Escherichia, Erwinia, Yersinia and 137 Dickeya, respectively, usually possesses a high density of c-di-GMP turnover proteins 138 (examples are Escherichia coli K-12 MG1655 (6.3/Mbp; genome size: 4.64 Mbp); 139 Salmonella typhimurium ATCC14028 (4.4/Mbp; genome size: 4.96 Mbp); Klebsiella 140 pneumoniae subsp. pneumonia (5.3/Mbp; genome size: 5.33 Mbp), Erwinia amylovora 141 ATCC49946 (3.16/Mbp; genome size: 3.8 Mbp), Yersinia enterocolitica subsp. 142 enterocolitica 8081 (4.62/Mbp; genome size: 4.55 Mbp), Serratia marcescens subsp. 143 marcescens Db11 (4/Mbp; genome size: 5.11 Mbp) and Dickeya dadantii DSM18020 144 (5.6/Mbp; genome size: 4.82 Mbp)). Although the less investigated genera of the families 145 and Budviviaceae possess c-di-GMP turnover proteins, within the family 146 Morganellaceae, most representative sequenced species isolates of the genera are 147 consistently missing c-di-GMP turnover proteins (Figure 1A). The family of 148 Morgenellaceae consists presently of eight genera, , Cosenzaea, Moellerella, 149 Morganella, , Proteus, and (Figure 1A; 150 Supplementary Table S1) encompassing predominantly environmental bacteria. Although 151 symbionts occur within e.g. the genus Arsenophonus with highly reduced genome size, the 152 genome size of other members is 3.8 Mbp or higher. While a reason for disappearance of 153 signaling systems including the c-di-GMP signaling network can be a substantial life style 154 change towards a parasitic invasive intracellular life style as exemplified in the human 155 pathogens Shigella spp. and Yersinia pestis, which in its extreme is manifested as a dramatic 156 reduction of genome size with concomitant reduction of all signaling networks33, 35-36, other 157 reasons for a dramatic deterioration of signaling systems are unknown. 158 159

6 160 161 162 163 Figure 1. Occurrence of cyclic di-GMP turnover proteins in representative species of the 164 order Enterobacteriales and genus Streptococcus. A) Occurrence of cyclic di-GMP turnover 165 proteins in representative species of the order Enterobacteriales. Phylogenetic tree based on 166 the relatedness of conserved core genome protein of representative species from all families 167 Budviviaceae, Enterobacteriaceae, Erwiniaceae, Hafniaceae. Morganellaceae, 168 Pectobacteriaceae and Yersiniaceae of the order Enterobacteriales. Proteus spp. contain a 169 single GGDEF domain protein. The number of GGDEF/EAL/GGDEF+EAL domain 170 proteins and (in the case of a single GGDEF domain protein), the domain structure of the 171 GGDEF domain proteins is indicated. The ML tree is based on the relatedness of 33 core 172 genome proteins of the representative species with >90% amino acid identity. B) 173 Occurrence of cyclic di-GMP turnover proteins in representative species of the genus 174 Streptococcus. The domain structure of the GGDEF domain protein(s) encoded by the 175 respective genomes is indicated. A fraction of Streptococcus pneumoniae genomes encode a 176 GGDEF domain protein, which is not considered in the phylogenetic tree. The ML 177 phylogenetic tree of Streptococcus species is based on the relatedness of 77 core genome 178 proteins of representative species with >90% amino acid identity. 179 180 Of note, bioinformatic analyses, however, indicated that species of the genus Proteus and 181 Cosenzaea represented by, for example, Proteus mirabilis HI4320, Proteus hauseri 182 ATCC700826, ATCC49132 and Cosenzaea myxofaciens ATCC19692 183 (WP_066749622.1) still retained a single GGDEF domain protein (Figure 1A, 184 Supplementary Table S1). As the human pathogen P. mirabilis is the most well investigated

7 185 species, we decided to assess the GGDEF domain protein, a potential DGC, of this species. 186 We used P. mirabilis UEB5037, an isolate from a urinary catheter to investigate PMI3101_v, 187 a homolog of PMI3101 from P. mirabilis HI4320. PMI3101_v differs from PMI3101 in the 188 exchange of one amino acid, N133T. However, proteins identical to PMI3101_v are present 189 in P. mirabilis strains deposited in the NCBI database. 190 As another example, upon a reduction of genome size due to habitat restriction such as in 191 the human-adapted species Staphylococcus aureus and Staphylococcus epidermidis, the 192 remaining GGDEF domain proteins have lost their catalytic activity, although they provide 193 physiological functionalities, which manifest, for instance, through protein-protein 194 interactions7-8. The family of Streptococcae is composed of three different genera, 195 Streptococcus, Lactococcus and Lactovum. To our knowledge, reports of a functional c-di- 196 GMP signaling system within the family Streptococcae are missing32. Using BLAST search, 197 we discovered that strains of several streptococcal species, among them Streptococcus 198 gallolyticus, Streptococcus infantarius, Streptococcus equinus, Streptococcus lutetiensis, 199 Streptococcus salivarius, Streptococcus henryi, Streptococcus vestibularis and 200 Streptococcus agalactolyticus (Figure 1B) encode one GGDEF domain containing 201 diguanylate cyclase candidate. Those streptococcal species mainly belong to one of two 202 distinct phylogenetic lineages of the genus Streptococcus, which includes the Pyogenic, 203 Bovis, Mutans and Salivarius groups (Figure 1B;38). We investigated the GGDEF domain 204 protein (WP_012961431.1; named STRGAL_UCN34_GGDEF) of S. gallolyticus supbsp. 205 gallolyticus UCN34, which is identical to F5WZ28_STRG1 from S. gallolyticus supbsp. 206 gallolyticus ATCC43143. 207 208 Basic characteristics of GGDEF domain proteins from S. gallolyticus subspecies 209 gallolyticus and P. mirabilis. The two newly identified GGDEF proteins were aligned with 210 well-characterized DGCs and with the closest relatives inside and outside of the genus. 211 Alignment of the GGDEF domain of STRGAL_UCN34_GGDEF with previously 212 characterized DGC GGDEF domains showed that STRGAL_UCN34_GGDEF invariantly 213 possesses the RxGGDEF motif and the other conserved characteristic signature amino acids 214 suggesting that it is a functional DGC (Figure 2A;39). The full-length protein possesses two 215 transmembrane domains (TM1, residues 16-34; TM2, 44-63) and a group 4 PAS sensory 216 domain (82-185; pfam08448) N-terminal of the GGDEF domain (135-292; pfam00990) 217 (Figure 2B). STRGAL_UCN34_GGDEF homologues are present in S. salivarius, 218 Streptococcus rubneri, Streptococcus thermophiles, Streptococcus vestibularis S. infantarius,

8 219 S. lutetiensis, S. equinus and Streptococcus spp., while more distant homologues lacking one 220 or both transmembrane helices are found in S. henryi, S. suis and Streptococcus parauberis 221 (Blast search; Figure 2C). Outside of the Streptococcus genus proteins homologous over the 222 entire length are found in Weissella soli (WP_070230170.1) and Lactococcus spp. 223 (WP_096819183) (data not shown). 224

225

226 227

9 228 229 230 Figure 2. Characterization of the GGDEF domain protein STRGAL_UCN34_GGDEF of S. 231 gallolyticus subspecies gallolyticus. A) Alignment of the GGDEF domain of 232 STRGAL_UCN34_GGDEF and PMI3101_v with class I, class II and class III diguanylate 233 cyclases as previously documented39. The sequences used in the alignment are 234 PleD_CAUCR, Q9A5I5 (Caulobacter crescentus); WspR_PSEAE, Q9HXT9 (Pseudomonas 235 aeruginosa); YdeH_ECOLI, P31129 (E. coli); AdrA_SALTY, Q9L401 (S. typhimurium); 236 PA4332_PSEAE, Q9HW69 (P. aeruginosa); DgcB_CAUCR, A0A0H3CAN8 (C. 237 crescentus); CD1420_Cdif, Q18BU4 (Clostridium difficile); VCA0965_VIBCH, Q9KKY5 238 (Vibrio cholerae); ECA3270_PECAS, Q6D226 (Pectobacterium atrosepticum); 239 GSU1658_GEOSL, Q74CL4 (Geobacter sulfurreducens); MXAN_2643, Q1D911 240 (Myxococcus xanthus); YciR_SALTY, A0A0F6B1Y8; DGC1_KOMXY, O87374 241 (Komagataeibacter xylinus); Y1354_MYCTU, Rv1354c/P9WM13 (Mycobacterium 242 tuberculosis); SE_0528_STAES, Q8CTF5 (S. epidermidis); BifA_PSEAE, Q9HW35 (P. 243 aeruginosa), B3F0R5 (Proteus mirabilis). In VCA0965, the sequence ‘RATNQHDY’ was 244 deleted to optimize the alignment. The secondary structure is based on PDB: 2WB4, crystal 245 structure of activated diguanylate cyclase PleD. B) Domain structure of 246 STRGAL_UCN34_GGDEF and selected GGDEF domain proteins STRGAL 247 (WP_012961431.1; S. gallolyticus subsp. gallolyticus UCN34), STRSUI (WP_079269016.1; 248 Streptococcus suis), STRPAR (WP_037620421.1; Streptococcus parauberis), STRHEN 249 (WP_018163948.1; S. henryi). C) Alignment of STRGAL_UCN34_GGDEF with selected 250 GGDEF domain proteins from other Streptococcus species. STRPAR (WP_037620421.1; S.

10 251 parauberis), STRUEB (WP_115242244.1; Streptococcus uberis), STRGAL 252 (WP_012961431.1; S. gallolyticus subsp. gallolyticus UCN34), STRSUI (WP_079269016.1; 253 Streptococcus suis), STR_KCJ4950 (WP_132869219.1; Streptococcus sp. KCJ4950), 254 STRALA (WP_154455231.1; Streptococcus alactolyticus), STRHEN (WP_018163948.1; S. 255 henryi), STRDYS (VTS57701.1; Streptococcus dysgalactiae subsp. equisimilis). D) Domain 256 alignment of STRGAL_UCN34_GGDEF with homologous GGDEF domain proteins 257 outside the genus Streptococcus. Order and identity as in E. E) Sequence alignment of 258 STRGAL_UCN34_GGDEF with homologous GGDEF domain proteins outside the genus 259 Streptococcus. Protein identity is indicated in the figure. SPIPER_WP_149568496.1 of 260 Spirochaeta perfilievii; ESU32445.1/G3A_11215 of Bacillus sp. 17376; 261 GAMMAP_OGT59688.1, A3E01_04295 of bacterium 262 RIFCSPHIGHO2_12_FULL_63_22; HAJ95480.1 of Actinobacteria bacterium; 263 HBT59617.1 of Acholeplasmataceae bacterium. A, C, E: Alignments displayed with 264 ESPript 3.0. TTT and TT indicate strict α- and β-turns. Residues are colored according to 265 physic-chemical properties. Framed residues show more than 70% similarity. Hashtag 266 indicates any amino acid of N/D/Q/E/B/Z, dollar indicates any amino acid of L/M and 267 percentage indicates any amino acid of F/Y. Relative accessibility values are displayed in A 268 below the consensus sequence. 269 270 PMI3101_v is composed of a GGDEF domain (residues 135-293) with the conserved 271 RxGGDEF motif linked to a N-terminal class 9 PAS domain (residues 25-123). The same 272 architecture is also found in P. vulgaris, Proteus columbae, Proteus alimentorum, P. hauseri 273 and Proteus genomosp. 6 (Figure 2A, 3). Outside of the genus Proteus, homologous 274 proteins are found in C. myxofaciens (Figure 3C). Structural models of the PAS-GGDEF 275 domains showed that both PAS domains have the highest structural homology to the PAS 276 domain of the P. aeruginosa PAS-GGDEF-EAL domain protein PA0861 (PDB: c5xgdA) 277 and contain a S-helix like linker region connecting the PAS and GGDEF domain 278 (Supplementary Figure S1;40). Of note, some close homologues of both proteins are more 279 complex, implying that a modulation of domain composition or convergent evaluation of 280 GGDEF domain proteins has already occurred (Figure 2D and E, 3C and D). 281

11 282

283 284 Figure 3. Characterization of the GGDEF domain protein PMI3101_v of Proteus mirabilis. 285 A) Domain structure of PMI3101_v. B) Alignment of the GGDEF protein of PMI3101_v 286 with homologous GGDEF domain proteins from Proteus spp. PROCOL (WP_160230836.1; 287 Proteus columbae), PROGEN6 (WP_109373336.1, Proteus genomosp. 6), PROALI 288 (WP_099073873.1; Proteus alimentorum), PROVUL (WP_072070504.1; P. vulgaris), 289 PROHAU (WP_129038785.1; P. hauseri), PROMIR_UEB50_GGDEF (PMI3101_v; this 290 work). C) Domain alignment of PMI3101_v with homologous GGDEF domain proteins 291 outside the genus Proteus. Order as in D. D) Sequence alignment of PMI3101_v with 292 homologous GGDEF domain proteins outside the genus Proteus. 293 PROMIR_UEB50_GGDEF (PMI3101_v; this work), HERBIAS_GGDEF (RZI42720.1 of 294 Herbaspirillum sp. HC18), METAKO_GGDEF (BBE75897.1/MRY16398_09530 of 295 Metakosakonia sp. MRY16-398), BACMEG_GGDEF (AEN88583.1 of Bacillus 296 megaterium WSH-002), THIDEN_GGDEF (WP_092998343.1 of Thiohalomonas 297 denitrificans). B, D: Alignments displayed with ESPript 3.0. TTT and TT indicate strict α- 298 and β-turns. Residues are colored according to physic-chemical properties. Framed residues 299 show more than 70% similarity. Hashtag indicates any amino acid of N/D/Q/E/B/Z, dollar

12 300 indicates any amino acid of L/M, percentage indicates any amino acid of F/Y and 301 exclamation mark indicates any amino acid of I/V. 302 303 Assessment of riboswitch-based screening system. In order to assess the functionality of 304 these candidate DGCs, we used a previously developed riboswitch based system23, 41 to 305 detect alterations in c-di-GMP concentrations in vivo. The genome of the pathogenic 306 bacterium V. cholerae encodes two c-di-GMP specific riboswitches, Vc1 and Vc223, located 307 upstream of the gbpA and VC1722 gene, respectively. The Vc1 and Vc2 riboswitches have 308 been characterized as ‘off’ and ‘on’ riboswitches with c-di-GMP to promote and repress the 309 expression of the downstream genes, respectively23. Nevertheless, both riboswitches 310 consistently functioned as ‘off’ riboswitches in response to elevated concentrations of c-di- 311 GMP in E. coli TOP10 (41-42, Supplementary Figure S2). This is probably due to a 312 remodeled conformation of the riboswitch upon binding to cellular components in E. coli. 313 Downregulation of beta-galactosidase expression by the Vc1 and Vc2 riboswitches upon 314 overexpression of PMI3101_v and STRGAL_UCN34_GGDEF indicated DGC activity of 315 both proteins (Figure 4). 316

317 318 Figure 4. Detection of alterations in c-di-GMP levels by Vc1 and Vc2 riboswitches for 319 candidate diguanylate cyclases PMI3101_v and STRGAL_UCN34_GGDEF in E. coli 320 TOP10. (A) The Vc-riboswitch based c-di-GMP sensors monitor the in vivo level of c-di- 321 GMP. The riboswitch has been engineered at the 5’-UTR of the lacZ gene encoding β- 322 galactosidase, which allows monitoring c-di-GMP level by changing the level of the 323 production of β-galactosidase resulting in enhanced catalytic activity that transforms into a

13 324 change of the colony color in the presence of the substrate X-gal. Vc1 and Vc2 riboswitches 325 behave as off riboswitches in the E. coli TOP10 strain. Without c-di-GMP, production of β- 326 galactosidase is turned on resulting in a blue colony. Elevated c-di-GMP level by expression 327 of DGCs downregulates production of β-galactosidase, resulting in a white colony. In E. coli 328 TOP10 wild type strain, there are residual amounts of c-di-GMP, resulting in a light blue 329 colony. Response to expression of the GGDEF domain proteins was evaluated by the Vc1 330 (B) or by the Vc2 riboswitch (C) with the assessment of the beta-galactosidase activity on 331 the substrate X-Gal (blue dye precipitate as final output). The recombinant E. coli TOP10 332 strain containing the constructs was grown in the presence or absence of L-arabinose as 333 indicated. All plates were incubated at 28 °C for 24 h. 334 335 In contrast, there was no effect upon overexpression of the catalytic mutants,

336 PMI3101_vD215A and STRGAL_UCN34_GGDEFD273A. Furthermore, we observed that 337 overexpression of STRGAL_UCN34_GGDEF upon induction by L-arabinose at a 338 concentration of 0.01% or higher had a substantial cytotoxic effect on E. coli TOP10.

339 Cytotoxicity was not observed upon expression of the STRGAL_UCN34_GGDEFD273A 340 catalytic mutant, indicating that potentially the high amounts of c-di-GMP produced by 341 STRGAL_UCN34_GGDEF, which does not have an RxxD I-site motif N-terminal of the 342 GGDEF motif43 is toxic to the cells. Moreover, the wildtype protein was expressed at a 343 lower level than inactive mutant (Supplementary Figure S3) demonstrating again that c-di- 344 GMP and/or the catalytic activity of STRGAL_UCN34_GGDEF are responsible for the 345 cytotoxic effect. Cytotoxicity upon overexpression of DGCs has been observed previously 346 in the E. coli BL21 background43. The selective cytotoxicity upon overexpression of 347 STRGAL_UCN34_GGDEF, but not PMI3101_v in E. coli TOP10 indicates a distinct 348 mechanism of action by STRGAL_UCN34_GGDEF. 349 350 The effect of STRGAL_UCN34_GGDEF and PMI3101_v on intracellular c-di-GMP. 351 The riboswitch assay is an indirect approach to assess c-di-GMP levels in vivo. To confirm 352 that STRGAL_UCN34_GGDEF and PMI3101_v can produce c-di-GMP in vivo, we 353 developed a MALDI-FTMS based screen to provide a first evaluation. We overexpressed 354 STRGAL_UCN34_GGDEF, PMI3101_v and their catalytic mutants in E. coli TOP10 and 355 subsequently used the crude lysates to assess c-di-nucleotide production by MALDI-FTMS 356 mass spectrometry. MALDI-FTMS determines the weight-to-charge ratio by measuring the 357 flying time of the ionized molecules in the electric field. Because major molecules have a

14 358 single positive charge, ions are actually separated by their mass. As no extraction of the 359 molecule or isolation of the protein is necessary, this experimental approach can be applied 360 as a rapid screen for DGC activity. 361

362 363 364 Figure 5. Mass spectrometric analysis of cyclic di-nucleotides produced in cell lysates of E. 365 coli TOB10 upon expression of PMI3101_v and STRGAL_UCN34_GGDEF. A) MALDI- 366 FTMS analysis of cell lysates from E. coli TOP10 overexpressing GGDEF domain proteins 367 PMI3101_v, STRGAL_UCN34_GGDEF and their catalytic mutants. The m/z spectrum 368 from 685 to 700 (inlet from 600 to 700) is shown. The ion with the m/z of 691.104 was 369 detected from chemically synthesized c-di-GMP and the control lysate with overexpression 370 of the diguanylate cyclase AdrA44. A minor peak was detected in the lysate from E. coli 371 TOP10 vector control (pBAD28). Enhanced peaks were seen in the lysates derived from E. 372 coli TOP10 expressing STRGAL_UCN34_GGDEF and PMI3101_v, but not in lysates from

373 cells expressing the catalytic mutants STRGAL_UCN34_GGDEFD273A and PMI3101_vD215A. 374 Cyclic di-AMP with a m/z of 659.114 is undetectable in all samples. All the intensities are 375 normalized to the most intense peak with its y-value set to 100. B) Cyclic di-GMP 376 concentrations as measured by LC-MS/MS. PMI3101_v, STRGAL_UCN34_GGDEF and 377 their catalytic mutants were overexpressed in S. typhimurium UMR1.

15 378 379 The ionized c-di-GMP, c-di-AMP and c-GMP-AMP generally have a mass/charge (m/z) 380 ratio [M + H+] + of 691.104 (Figure 5), 659.114 and 675.10710, respectively. A signal 381 corresponding to c-di-GMP was detectable in the samples from cells expressing 382 STRGAL_UCN34_GGDEF or PMI3101_v whereas no signal was measurable when the 383 catalytic mutants and the vector control were expressed (Figure 5), which suggested that c- 384 di-GMP could be synthesized by STRGAL_UCN34_GGDEF and PMI3101_v in E. coli 385 TOP10. Nonetheless, a mass-to-charge of 691.1040 could be created by other molecules 386 with the same molecular weight. 387 To this end, we monitored the cyclic di-nucleotide activity of PMI3101_v and 388 STRGAL_UCN34_GGDEF conventionally by LC-MS/MS after extraction of the molecules. 389 Indeed, again, we observed high concentrations of c-di-GMP upon expression of 390 PMI3101_v and STRGAL_UCN34_GGDEF, but concentrations remained unaltered upon 391 expression of the catalytic mutants (Figure 5). Proteins were expressed in the S. 392 typhimurium UMR1 background (see below) due to the elevated cytotoxic effect of 393 STRGAL_UCN34_GGDEF in E. coli TOP1. 394 We also purified PMI3101_v and attemped to assess its enzymatic activity by in vitro 395 enzymatic assay and thin-layer chromatography (TLC), however, purified PMI3101_v did 396 not show the expected catalytic activity. We concluded that certain co-factors or signals to 397 stimulate activity are missing in vitro. 398 399 The diguanylate cyclases STRGAL_UCN34_GGDEF and PMI3101_v promote the 400 rdar biofilm morphotype of S. typhimurium. Cyclic di-GMP regulates a multi-cellular 401 behavior of S. typhimurium, rdar biofilm morphotype formation, which is characterized by a 402 distinct red, dry and rough colony morphology on Congo Red agar plate characterized by 403 the expression of extracellular components cellulose and curli fimbriae, is positively 404 regulated by c-di-GMP45. This behaviour serves as a biologically relevant read-out for c-di- 405 GMP levels. To investigate whether DGCs STRGAL_UCN34_GGDEF and PMI3101_v 406 affect rdar biofilm formation, we expressed the proteins in S. typhimurium UMR1 that 407 produces necessary components, cellulose and curli fimbriae, for rdar morphotype at 28 °C. 408 The native STRGAL_UCN34_GGDEF and PMI3101_v, but not their catalytic mutants up- 409 regulated the rdar morphotype (Figure 6A), which was another evidence supporting that 410 these two proteins are catalytically active in vivo. 411

16 412 413 414 Figure 6. Effect on biofilm formation and motility upon overexpression of 415 STRGAL_UCN34_GGDEF and PMI3101_v. (A) Over-expression of 416 STRGAL_UCN34_GGDEF or PMI3101_v up-regulated the rdar morphotype of S. 417 typhimurium UMR1. The wild-type strain S. typhimurium UMR1 displays an up-regulated 418 rdar morphotype upon overexpression of STRGAL_UCN34_GGDEF or PMI3101_v, but 419 not upon overexpression of the catalytic mutants. The mutant strain S. typhimurium MAE50 420 serves as the negative control for the rdar morphotype. VC represents the vector control 421 pBAD28. The plate was incubated at 28 °C for 72 h. Over-expression of 422 STRGAL_UCN34_GGDEF (B) and PMI3101_v (C) down-regulated apparent swimming 423 motility of S. typhimurium UMR1. STRGAL_UCN34_GGDEF significantly inhibits 424 swimming of UMR1 compared with the vector control (VC) and its mutant

425 STRGAL_UCN34_GGDEFD271A. Similarly, UMR1 expressing PMI3101_v displayed

426 declined swimming motility compared the vector control and PMI3101_vD215A. Bars 427 represent means of three independent experiments with standard deviation analyzed by 428 Student test (t-test). **, p < 0.01 and ***, p < 0.001, respectively.

429 430 The diguanylate cyclases STRGAL_UCN34_GGDEF and PMI3101_v suppress 431 motility of S. typhimurium. The transition from motility to sessility contributes to the 432 development of multicellular behavior with motility to be inhibited by c-di-GMP30. Thus, 433 we assessed the effect of STRGAL_UCN34_GGDEF and PMI3101_v, on flagellar-based 434 motility of S. typhimurium UMR1 as observed by the apparent motility in a semi-solid LB 435 agar plate at 37 °C (similar results were obtained at 28 °C). Wild type 436 STRGAL_UCN34_GGDEF (Figure 6B) and PMI3101_v (Figure 6C) consistently down- 437 regulated the swimming ability compared to the positive control S. enterica serovar 438 Typhimurium UMR1, while the corresponding mutants had only a minor effect, which 439 suggested that the two novel GGDEF domain proteins affect flagella-mediated motility by

17 440 producing c-di-GMP. Noteworthy, the catalytic mutant of STRGAL_UCN34_GGDEF 441 slightly promoted apparent swimming. This marginal up-regulation of swimming motility

442 by STRGAL_UCN34_GGDEFD273A is probably caused by an alternative binding site for c- 43 443 di-GMP, although an I-site is lacking . However, STRGAL_UCN34_GGDEFD271A might 444 possess an alternative binding site for c-di-GMP. The mutant of PMI3101_v still suppressed 445 swimming of UMR1 to a small extent compared with the vector control, which can be 446 explained by residual catalytic activity of the mutated GGDEF domain. 447 448 Genomic context of the diguanylate cyclase STRGAL_UCN34_GGDEF in 449 Streptococcus gallolyticus. Our experiments showed that STRGAL_UCN34_GGDEF of S. 450 gallolyticus subsp. gallolyticus and PMI3101_v of P. mirabilis are bona fide DGCs. Being 451 the sole DGC encoded by the respective chromosome, we were wondering about the 452 genomic context of the gene products and their physiological targets. 453 STRGAL_UCN34_GGDEF is the first gene of a five-gene cluster that is inserted into the 454 serine tRNA locus flanked by the trmB locus encoding tRNA (guanosine (46)-N7) 455 methyltransferase and rimP encoding ribosome maturation factor (Figure 7A). Blast search 456 showed that this islet is present in S. gallolyticus, S. infantarium, S. luteniensis, S. equinus 457 and Streptococcus sp. (Figure 7A and data not shown), however, not all strains within a 458 species possess this insertion. In Streptococcus macedonicus ACA-DC198, a part of the 459 island is missing, while Streptococcus pasteurianus does not possess the islet at all. 460 Downstream of STRGAL_UCN34_GGDEF are genes coding for two type 2 461 glycosyltransferases. SGGB_RS02225, which overlaps with SGGB_RS02230, codes for a 462 DPM1-GtrA hybrid protein. DPM1 is the catalytic subunit of eukaryotic dolichol-phosphate 463 mannose synthase to which the N-terminal part is homologous (Figure 7B). The C-terminal 464 part consists of a GtsA-like protein with four transmembrane helices of unknown function. 465 Furthermore, SGGB_RS02230 codes for a cellulose synthase-like protein CelA. Intriguingly, 466 CelA has a C-terminal PilZ domain, which suggests that it binds c-di-GMP to regulate the 467 catalytic activity. The SGGB_RS02235 gene downstream of SGGB_RS02230 encodes for a 468 membrane protein with 16 predicted transmembrane helices of unknown function followed 469 by a short coding sequence. Besides CelA, none of these proteins has any identifiable c-di- 470 GMP binding site. 471

18 472 473

474 Figure 7. Genomic context of the SGGB_RS02220 diguanylate cyclase in Streptococcus 475 gallolyticus subsp. gallolyticus UCN34. A) Operon structure in S. gallolyticus subsp. 476 gallolyticus UCN34 and S. equinus. A serine tRNA locus is flanked by the trmB locus 477 encoding tRNA (guanosine (46)-N7) methyltransferase and rimP encoding ribosome 478 maturation factor. The GGDEF diguanylate cyclase is the first gene of a five gene cluster

479 genomic islet inserted into a tRNAserine locus. The two downstream overlapping open 480 reading frames encode two type 2 glycosyltransferases. SGGB_RS02225, which overlaps 481 with SGGB_RS02230 encodes for a DPM1-GtrA hybrid protein and SGGB_RS02230 codes 482 for a cellulose synthase-like protein CelA. The SGGB_RS02235 gene encodes for a 483 membrane protein with 16 predicted transmembrane helices of unknown function. B) 484 Localisation and functionality of gene products of the genomic islet of S. gallolyticus 485 UCN34. Two type 2 glycosyltransferases, a DPM1-GtrA hybrid protein and a cellulose 486 synthase-like protein CelA are membrane proteins involved in (exo)polysaccharide 487 synthesis. The C-terminal PilZ domain of CelA suggests binding of c-di-GMP for regulation 488 of the catalytic activity. The membrane protein with 16 predicted transmembrane helices 489 and the 60 amino acid protein is of unknown function.

490 491 In summary, CelA is a potential target for the c-di-GMP signaling pathway in S. gallolyticus 492 (Figure 7B). 493 494 Genomic context of GGDEF domain in Proteus mirabilis. Investigating the gene synteny 495 in P. mirabilis, we found that PMI3101_v is embedded into a type IB-like hybrid cellulose 496 biosynthesis operon consisting of a bcsOABCD-dgcPMI3101_v-galU-bcsZ structure (Figure 497 8A). Such an operon is invariantly present in representatives of Proteus species such as P. 498 vulgaris, P. hauseri, P. cibarium and P. columbae, Specifically, BcsA and BcsB are an inner 499 membrane-embedded cellulose synthase and an N-terminal membrane-anchored periplasmic

19 500 catalytic subunit, respectively, which constitute the core of cellulose secretion system 28, 46. 501 In addition, there are a few accessory proteins that do not contribute to the synthesis but the 502 translocation and packing of cellulose including an outer membrane pore BcsC, the 503 periplasmic factor BcsD influencing the crystallinity of cellulose macromolecules, BcsZ a 504 periplasmic cellulase and the uncharacterized component BcsO47. Intriguingly, a GalU 505 encoding ORF has been unconventionally inserted into this operon together with the 506 diguanylate cyclase ORF. GalU reversibly catalyzes the synthesis of UTP-glucose, which 507 suggests close proximity of synthesis of the UDP-glucose substrate with the cellulose 508 synthase to produce the 1,4-beta-glucan cellulose macromolecule. Thus, although cellulose 509 biosynthesis has not been observed in Proteus spp. and we did not observe cellulose 510 production upon plasmid-based expression of PMI3010_v in P. mirabilis, P. mirabilis and 511 other Proteus spp. are predicted to synthesize cellulose (Figure 8B;47). 512

513 514 515 Figure 8. Genomic context of the PMI3101 diguanylate cyclase in P. mirabilis. A) Operon 516 structure in P. mirabilis HI4320. The type IB-like hybrid cellulose biosynthesis operon 517 consists of bcsOABCD-dgcPMI3101-galU-bcsZ. B) Localisation and functionality of gene 518 products of the modified class IB cellulose biosynthesis operon in P. mirabilis HI4320. The 519 type IB-like hybrid cellulose biosynthesis operon consisting of bcsOABCD-dgcPMI3101- 520 galU-bcsZ codes for a seemingly functional cellulose biosynthesis complex. The core 521 component of the complex is the functional cellulose synthase consisting of BcsA (catalytic 522 subunit) and BcsB (inner membrane anchored periplasmic protein), the outer membrane 523 pore BcsC, the BcsD periplasmic factor required for crystallinity and the cellulase BcsZ. 524 GalU reversibly catalyses the synthesis of UTP-glucose suggesting direct delivery of the 525 UDP-glucose substrate to produce the 1,4-beta-glucan cellulose macromolecule. The

20 526 cytoplasmic diguanylate cyclase potentially directly delivers the product c-di-GMP for 527 cellulose biosynthesis. 528

21 529 DISCUSSION AND CONCLUSION 530 In this work, we have initially verified and characterized two novel DGCs from a Gram- 531 positive and a Gram-negative species that have not been experimentally demonstrated to 532 possess a c-di-GMP signaling network. Despite being active enzymes, the functionality of 533 STRGAL_UCN34_GGDEF and PMI3101_v in the original organism is still undefined.

534 Although, P. mirabilis is well-known for the flagella-mediated sswimming and warming 535 motility48, it possesses an obviously functional cellulose biosynthesis operon with the 536 catalytic subunit of the cellulose synthase BcsA encompassing a C-terminal PilZ domain 537 receptor for c-di-GMP28, 47 which can be involved in cell aggregation in one of the diverse 538 habitats where P. mirabilis forms biofilms49. Nevertheless, overexpression of PMI3101_v in 539 P. mirabilis UEB50 had no effect on motility and cellulose production (unpublished data).

540 Interestingly, in S. gallolyticus subsp. gallolyticus UCN34, the DGC colocalizes with 541 genes coding for type 2 glycosyltransferases, one of them a cellulose-like synthase CelA 542 harboring a PilZ domain. Whether the PilZ domain can bind to c-di-GMP requires deeper 543 investigation, but bioinformatic analyses indicates that the c-di-GMP binding motif is 544 present (data not shown). Of note, no EAL or HD-GYP c-di-GMP specific PDE was 545 identified in S. gallylyticus subsp. gallolyticus and P. mirabilis. This finding is in line with 546 observations in other bacteria, which possess also a sole functional diguanylate cyclase, but 547 no identified c-di-GMP hydrolyzing enzyme36, 50. Although conventional c-di-GMP specific 548 EAL and HD-GYP domain phosphodiesterases are not present, other ubiquitous 549 phosphodiesterase domains such as the HDc domain of the bifunctional ppGpp 550 synthase/hydrolase SpoT are candidates for such a residual functionality.

551 To investigate the c-di-GMP regulatory network in S. gallylyticus subsp. gallolyticus and 552 P. mirabilis will improve our understanding about the ecological impact of c-di-GMP 553 signaling in bacteria. S. gallolyticus subsp. gallolyticus has been associated with colorectal 554 cancer and causes endocarditis and bacteremia predominantly in the aged population51-53. 555 Cyclic di-GMP mediated biofilm formation by expressing cellulose-like exopolysaccharides 556 could contribute to the pathogenesis of S. gallolyticus subsp. gallolyticus and related 557 bacterial species which possess the c-di-GMP signaling system. However, whether a c-di- 558 GMP signaling system is found in all strains of the mentioned species is unknown. Of note, 559 a complex GGDEF protein is also present in few strains of Streptococcus pneumoniae (data 560 not shown). In P. mirabilis, a frequent cause of catheter associated urinary tract infection, c-

22 561 di-GMP mediated cellulose production might be involved in the modulation of virulence as 562 observed for other bacteria42, 54-55. Further functional analysis of the DGCs and their 563 networks will provide insights into the role of the c-di-GMP signaling network in the 564 pathogenesis of infection caused by those bacterial species.

565 An initial characterization of the DGC activity came from the assessment of a c-di-GMP 566 responsive riboswitch23. Under our experimental conditions, the system seems to be 567 especially robust to detect diguanylate cyclases. Previously, riboswitch-based fluorescent 568 biosensor consisting of dual aptamers, a c-di-GMP binding aptamer and the spinach aptamer 569 that can bind to fluorophore 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI) 570 were designed to visualize the intracellular CDN level to detect c-di-GMP turnover 571 proteins22. Another biosensor for monitoring changes in intracellular c-di-GMP level is 572 based on modulation of fluorescence resonance energy transfer (FRET) upon binding to the 573 PilZ domain protein containing cyan CFP and yellow YFP fluorescent protein fusions. 574 Furthermore, we developed a MALDI-FTMS based approach to detect intracellular c-di- 575 GMP levels without isolating the compound. To initially assess the catalytic activity is 576 especially useful for difficult to purify proteins. Both approaches also do not require cutting- 577 edge facilities, such as fluorescence microscopy and fluorescence activated cell sorting.

578 As expected, when STRGAL_UCN34_GGDEF and PMI3101_v were over-expressed in S. 579 typhimurium UMR1, they can promote rdar morphotype expression and inhibit motility30. 580 Even though STRGAL_UCN34_GGDEF does not possess an RxxD I-site43, the catalytic 581 mutant slightly up-regulated the swimming motility of S. typhimurium UMR1 indicative for 582 a depletion of the second messenger molecule. However, the GGDEF domain DGC 583 XCC4471 lacking the I-site can bind a semi-intercalated c-di-GMP dimer56. We speculate 584 that the GGDEF domain protein STRGAL_UCN34_GGDEF with a mutated GGAEF site 585 has the ability to bind c-di-GMP.

586 In conclusion, the cyclic di-GMP network is more widespread than previously anticipated 587 with the production of a c-di-GMP activated cellulose or cellulose-like macromolecule as a 588 fundamental physiological trait in distantly related bacteria.

23 589 METHODS 590 Bacterial strains, plasmids and growth conditions. E. coli TOP10 was grown either in 591 Luria-Bertani (LB) medium or on an LB agar plate, while S. typhimurium UMR1 592 (ATCC14028), MAE50 (UMR1 ΔcsgD; biofilm negative control), MAE108 (UMR1 ΔfliC 593 ΔfljB; motility negative control) were growth on LB without salt agar at 30 °C or 37 °C. S. 57 594 gallolyticus subsp. gallolyticus UCN34 was grown at 37 °C with 5% carbon dioxide (CO2) 595 in brain heart infusion (BHI) broth (Oxide) or on a BHI agar plate. P. mirabilis UEB5037 596 was grown in nutrient broth or on a nutrient agar plate (Difco) at 37 °C. Supplements were 597 ampicillin (100 µg/mL) (Sigma) for the selection for recombinant strains and L-arabinose at 598 the indicated concentration for induction of protein expression. All strains and plasmids 599 used in this study are listed in Table 1. 600 601 Bioinformatic analysis. Basic Local Alignment Search Tool (BLAST; 602 https://blast.ncbi.nlm.nih.gov/Blast.cgi) was used for database search of homologous protein 603 and nucleotide sequences58. Protein domains were identified consulting the Pfam 604 (http://pfam.xfam.org), COG (Clusters of Orthologous Groups of proteins), Interpro 605 (https://www.ebi.ac.uk/interpro/) and UniProt (https://www.uniprot.org) databases. PAS 606 domains were further characterized using the Phyre259 and I-TASSER60 server. Proteins 607 were aligned using Clustal61. Alignments were processed with ESPript 3.062. 608 For genome-based phylogenetic analysis, the FASTA genomic sequence files of all species 609 mentioned below were downloaded from the NCBI website. Gene detection and genome 610 annotation were performed using the annotation package Prokka63. The core genomes from 611 the target data set were calculated by Roary software64. Alignments from all proteins were 612 concatenated and used for maximum-likelihood genome-based phylogenetic tree 613 calculations by SeaView software65. If not otherwise indicated, the respective reference 614 genomes were used. Streptococcus files were for d: Streptococcus gallolyticus subsp. 615 gallolyticus UCN34, S. gallolyticus subsp. gallolyticus ATCC 43143, Streptococcus 616 salivarius SK126, S. salivarius K12, Streptococcus equinus ATCC 9812 and ATCC 700338, 617 Streptococcus infantarius subsp. infantarius ATCC BAA-102 and NCTC 13760, 618 Streptococcus parauberis KRS-02083, S. parauberis NCFD 2020, Streptococcus lutetiensis 619 033, S. lutetiensis NCTC 8796, Streptococcus henryi DSM19005 and A4, Streptococcus sp. 620 FDAARGOS-192, Streptococcus sp. CNU77-61, Streptococcus sp. CNUG3, Streptococcus 621 macedonicus ACA-DC198, Streptococcus pasteurianus NCTC13784, S. pasteurianus 622 ATCC43144, Streptococcus agalactiae 2603V/R, Streptococcus pneumoniae R6,

24 623 Streptococcus pyogenes M1 GAS, Streptococcus mutans UA159, Streptococcus suis 624 BM407, Streptococcus mitis B6, Streptococcus thermophilus JIM 8232, Streptococcus equi 625 subsp. zooepidemicus H70, Streptococcus oralis Uo5, Streptococcus uberis 0140J, 626 Streptococcus anginosus C238, Streptococcus dysgalactiae subsp. equisimilis AC-2713, 627 Streptococcus parasanguinis ATCC 15912, Streptococcus intermedius B196, Streptococcus 628 salivarius NCTC8618 and Streptococcus pseudopneumoniae IS7493. 629 FASTA files for proteins of the order of Enterobacterales were representative species from 630 the different families and genera. FASTA files were from Enterobacteriaceae: Escherichia 631 coli MG1655, Salmonella enterica ATCC14028, Citrobacter freundii ATCC8090, 632 Klebsiella pneumoniae HS11286, Cedecea davisae DSM4568, Cronobacter sakazakii 633 ATCC29544, Leclercia adecarboxylata NCTC13032, Yokenella regensburgei W13, 634 Enterobacter cloacae subsp. cloacae ATCC 13047, Enterobacter hormaechei subsp. 635 steigerwaltii DSM16691. Erwiniaceae: Erwinia amylovora ATCC49946, Pantoea stewartii 636 DC283. Pectobacteriaceae: Dickeya dadantii DSM18020, Pectobacterium carotovorum PC1, 637 Pectobacterium atrosepticum JG1008. Yersiniaceae: Yersinia enterocolitica subsp. 638 enterocolitica 8081, Yersinia pseudotuberculosis NTCT20175, Serratia marcescens subsp. 639 marcescens Db11 and CAV1761, Serratia plymuthica S13, Ewingella americana 640 ATCC33852. Budviviaceae: Budvivica aquatic DSM5075, Budvivica diplodum 1D9, Pragia 641 fontium NCTC12284. Morganellaceae: ATCC29999, 642 Xenorhabdus nematophila ATCC19061, MRSN2154, 643 Providencia alcalifaciens DSM30120, Providencia rustigianii NCTC6933, Providencia 644 sneebia DSM 19967, Providencia heimbachae NCTC12003, Providencia alcalifaciens 645 DSM30120, Proteus vulgaris NCTC13145, Proteus mirabilis NCTC4199, Proteus hauseri 646 15H5D-4a, ATCC25830, Morganella psychrotolerans GCSLMp3, 647 Arsenophonus FIN, Cosenzaea myxofaciens ATCC19692, Moellerella wisconsensis 648 ATCC35017. Hafniaceae: Hafnia alvei FB1, Hafnia paralvei FDAARGOS158, 649 Obesumbacterium proteus DSM2777; tarda EIB202, Edwardsiella 650 anguillarum ET080813, Edwardsiella ictaluri 93146, Edwardsiella piscicida C07087, 651 Edwardsiella hoshinae ATCC35051. FASTA files of protein sequences from entire 652 genomes were screened by ScanProsite (https://prosite.expasy.org/scanprosite/) for the 653 occurrence of GGDEF, EAL and HD-GYP domains and/or data were taken from the 654 literature (https://www.ncbi.nlm.nih.gov/Complete_Genomes/c-di-GMP.html). 655

25 656 Riboswitch construction. The Vc1 c-di-GMP riboswitch was amplified from the genomic 657 DNA of V. cholerae strain C6706 comprising from -240 bp to +20 bp with respect to the 658 ORF. The lac promoter was amplified from pUC19. Vc1 and lac promoter were ligated by 659 overlapping PCR, the fragment digested with SmaI and BamHI (New England Biolabs) and

660 ligated into the translational reporter vector pRS414 in frame upstream the 9th codon of the 661 lacZ reporter gene (kindly received from Prof. R. R. Breaker). The genomic DNA was 662 extracted by GenEluteTM Bacterial Genomic DNA kit (Sigma-Aldrich). The Vc2 riboswitch 663 cloned into pRS414 vector was also a gift from Prof. Ronald R. Breaker. 664 Subsequently, riboswitches along with lacZY were amplified from the pRS414 vector. 665 The PCR products and pGRG25 containing the inducible highly efficient recombination 666 system for integration into the attTn7 site were digested with PacI and NotI (New England 667 Biolabs). Digested pGRG25 was dephosphorylated by Antarctic phosphatase (New England 668 Biolabs) and ligated with the PCR product by T4 DNA ligase (Roche Life Science). The 669 resulting product was transformed into chemo-competent E. coli TOP10 cells grown at 670 30 °C. Insertion of genes into the chromosomal attTn7 attachment site was performed as 671 described66. Primers used for cloning and sequencing are listed in Table 2.

672 The cytoplasmic alteration of the c-di-GMP level affects the expression of the beta- 673 galactosidase encoded by lacZ. An ‘on’ riboswitch, promotes the expression of the lacZ 674 gene upon elevated level of ligand, whereas an ‘off’ riboswitch behaves in the opposite way 675 and decreases the expression of beta-galactosidase upon binding of the ligand. The alteration 676 of beta-galactosidase expression is visualized on plates containing 5-bromo-4-chloro-3- 677 indolyl-β-D-galactopyranoside (X-gal), a substrate for beta-galactosidase. The reaction 678 products are galactose and 5-bromo-4-chloro-3-hydroxyindole, which in subsequent 679 oxidation steps develops into an insoluble dye.

680 681 Cloning of GGDEF domain proteins. STRGAL_UCN34_GGDEF and PMI3101_v were 682 PCR amplified from the genomic DNA of S. gallolyticus subsp. gallolyticus UCN3467 and P. 37 683 mirabilis UEB50 , respectively. A hexahistidine-tag (His6-tag) was introduced at the C- 684 terminus of the open reading frames (ORFs) during PCR. The amplified fragments were 685 digested by XmaI and XbaI (New England Biolabs) and subsequently cloned into the 686 corresponding sites of the pBAD28 vector. The site-directed mutagenesis of the GGDEF 687 domain to GGAEF was performed using QuickChange II Site-Directed Mutagenesis Kit 688 (Agilent). Primers are listed in Table 2.

26 689 Assessment of c-di-GMP synthesis by riboswitch-regulated β-galactosidase activity. 690 Vectors harboring cyclic dinucleotides cyclases and respective mutants were transformed 691 into chemically competent E. coli TOP10 cells containing the monitoring riboswitch 692 plasmid on the chromosome. Individual colonies were grown overnight at 37 °C with 200 693 rpm shaking in LB medium supplemented with 100 µg/mL ampicillin. Cultures were diluted

694 to an OD600 of 0.1 and subsequently grown to an OD600 around 0.6. 5 µl of culture was 695 spotted onto an LB agar plate with 100 µg/mL ampicillin, 80 µg/mL X-gal (Roche), 0%- 696 0.1% (wt/vol) L-arabinose and 0.25 mM isopropyl β-D-1-thiogalactopyranoside (IPTG) 697 upon containment of the Vc1 riboswitch. The plates were incubated at 28 °C and color 698 development was monitored up to 72 h. 699 Phenotypic assays. The swimming assay was performed in 1% tryptone, 0.5% NaCl, 0.3%

700 (wt/vol) agar plates. 3 µl of an overnight culture re-suspended in water adjusted to OD600 = 5 701 was injected into the agar and the plates were incubated at 37 °C for 6 h. Afterwards pictures 702 of plates were taken with a Gel Doc XR+ system (Bio-Rad) and the diameter of the 703 swimming zone was measured. 704 The rdar biofilm morphotype was assessed on Congo Red (CR)-LB without salt agar 705 plates68. A single colony was picked from an LB agar plate incubated overnight, re-

706 suspended in water at an OD600 = 5 and 3 µl spotted onto the CR plate. After 72 h incubation 707 at 28 °C, the morphotype of the colony was observed. 708 709 MALDI Fourier Transform mass spectrometry. To prepare the cell lysate for rapid c-di- 710 GMP detection, a single colony was picked and suspended in 450 µl LB medium with 711 ampicillin (100 µg/mL). 50 µl suspension was transferred into 5 ml LB with ampicillin and 712 grown overnight in 30 °C with shaking at 200 rpm. To induce the overexpression of proteins, 713 0.01% arabinose was added to the overnight culture, which was further cultured for 4 h at 714 30 °C. 4 ml culture was harvested and re-suspended in 100 µl LC-MS CHROMASOLV®

715 water (Sigma-Aldrich) of OD600 = 3 supplemented with 0.5 mg/ml lysozyme. The re- 716 suspension was incubated at 24 °C for 1 h followed by two rounds of freeze and thaw (- 717 80 °C 1 h, room temperature 1 h). The lysate was stored at -80 °C until further used. 718 The α-cyano-4-hydroxycinnamic acid (α-cyano, CHCA) (Sigma-Aldrich) matrix for 719 MALDI-TOF mass spectrometry was prepared according to the manufacturer’s instruction. 720 2 µl lysate was mixed with 2 µl matrix, the mixture spotted on a metal plate of the 721 atmospheric pressure MALDI interface (MassTech, Columbia, MD, USA) and measured by 722 Q ExactiveTM (Thermo Scientific) Fourier Transform mass spectrometer.

27 723

724 Protein purification. The PMI3101_v was purified as soluble C-terminal His6-tag fusion 725 using nickel affinity chromatography. Briefly, 0.001% L-arabinose was added to

726 exponentially growing (OD600~0.5) heat shocked (42 °C, 30 min) E. coli TOP10 cells 727 containing pBAD28::PMI3101_v vector. After 3 h induction at 37 °C, cells were pelleted

728 and re-suspended in lysis buffer (50 mM NaH2PO4, 300 mM NaCl, 10 mM imidazole) 729 supplemented with 5 mg/ml lysozyme. Cells were disrupted by Soniprep 150 (MSE Sanyo), 730 crude cell extracts centrifuged at 20,000 rpm for 30 min at 4°C, the cleared lysate mixed 731 with Ni-NTA agarose (Qiagen) and shaken at 70 rpm for 2 h at 4°C. The mixture was then 732 loaded onto the Ni2+-column for affinity purification. The purified protein was assessed after 733 SDS-PAGE gel separation and the protein concentration measured by the Bradford assay 734 (Bio-Rad). 735 736 Estimation of c-di-GMP concentrations by LC-MS/MS. The extraction of c-di-GMP 737 from bacterial cells was performed as reported 69. Overnight cultures from agar-grown 738 colonies with protein expression induced by the 0.1% L-arabinose were suspended 500 µl 739 ice cold extraction solvent (acetonitrile/methanol/water/formic acid=2/2/1/0.02, v/v/v/v), 740 pelleted and resuspended, followed by boiling for 10 min. Three subsequent extracts were 741 combined and frozen at -20 oC overnight. The extract was centrifuged for 10 min at 20,800 g, 742 evaporated to dryness in a Speed-Vac (Savant) and analyzed by LC-MS/MS. 743

28 744 Acknowledgements

745 We express our gratitude to Prof. Ronald R. Breaker, Yale University, USA who kindly 746 provided the Vc2 riboswitch and the vector pRS414. S. gallolyticus subspecies gallolyticus 747 was kindly received from Prof. Jens Dreier, Ruhr University Bochum/Bad Oeynhausen, 748 Germany. Janja Trček appreciates the experimental input from Tomaz Accetto, University 749 of Ljubljana, Slovenia.

29 750 Table 1. Bacterial strains and plasmids used in this study Strain Description Source Escherichia coli

E. coli TOP10 F– mcrA Δ(mrr-hsdRMS-mcrBC) Laboratory Φ80lacZΔM15 ΔlacX74 recA1 collection araD139 Δ(ara leu) 7697 galU galK rpsL (StrR) endA1 nupG

TOP10 attTn7::Vc2lacZY PVc2lacZY inserted into the Tn7 This study attachment site

TOP10 attTn7::PlacVc1lacZY PlacVc1lacZY inserted into the Tn7 This study attachment site

Other bacterial strains

S. typhimurium UMR1 ATCC14028–1s, Nalr; rdar 28 °C 68

S. typhimurium MAE50 UMR1 ∆csgD101; saw 28 °C 70

S. typhimurium MAE108 UMR1 ∆fljB ∆fliC::Cmr 71

V. cholerae C6706 Wild type Inaba; Strr; source of Vc1 Laboratory riboswitch collection

S. gallolyticus subsp. gallolyticus UCN34 wild type, infective endocarditis 57

P. mirabilis UEB50 wild type isolate from the urine of a 37 catheterized patient

Plasmids

pBAD28 L-arabinose-regulated expression 72 vector; pACYC184/p15A ori, PBAD, Ampr, Cmr

pBAD28::ydeH diguanylate cyclase YdeH cloned into 41 XbaI and HindIII site

41 pBAD28::ydeHG206A G207A catalytic mutant of diguanylate cyclase YdeHG206A G207A cloned into XbaI and HindIII site

pBAD30 L-arabinose-regulated expression 72 vector with PBAD promoter; pACYC184/p15A ori, Ampr

pWJB30 diguanylate cyclase AdrA-8xHis 70 cloned into XbaI and HindIII site of

30 pBAD30

pBAD30::yhjH phosphodiesterase YhjH-8xHis 30 cloned into XbaI and SphI site

pBAD30::yhjHE136A catalytic mutant phosphodiesterase Laboratory YhjHE136A -8xHis collection

p4551 diguanylate cyclase STM4551 cloned 73 into pBAD30

73 p4551E267A diguanylate cyclase STM4551E267A cloned into pBAD30

pRS414 pBR322 ori, Ampr; promoter-, RBS- 74 and ATG-less lacZYA operon

pRS414::Vc2 Vc2 riboswitch, promoter and the 23 first 20 bp of downstream gene fused with the 9th codon of lacZ

pGRG25 pSC101 ori, Ampr; to insert genes 66 into attTn7 site in Gram-negative bacterial chromosomes

pGRG25::Vc2lacZY Vc2-lacZY cloned into PacI and NotI This study sites

pRS414::PlacVc1 Vc1 riboswitch, lac promoter and 20 This study bp of the downstream gene cloned into pRS414 and fused with the 9th codon of lacZ

pGRG25::PlacVc1lacZY Vc1-lacZY cloned between PacI and This study NotI sites

pUC19 mutated pMB1ori, Plac, Ampr; used Laboratory for the cloning of the lac promoter collection

pBAD28::STRGAL_UCN34_GGDEF encoding GGDEF domain protein This study STRGAL_UCN34_GGDEF

pBAD28::STRGAL_UCN34_GGDEFD273A encoding This study STRGAL_UCN34_GGDEF with D273A amino acid mutation

pBAD28::PMI3101_v encoding GGDEF domain protein This study PMI3101_v

pBAD28::PMI3101_vD215A encoding PMI3101_v with D215A This study amino acid mutation

751 31 752 Table 2. Primers used in this study No. Nucleotides sequence1 (5’-3’) Description 1 GCGTTAATTAAATAACGCCTAT cloning Vc2lacZY into pGRG25 from ATTTGAAAGCTTG pRS414::Vc2 2 GATCAGCGGCCGCTTAAGCGAC cloning Vc2lacZY and PlacVc1lacZY into TTCATTCACCTG pGRG25 3 CACTTATCTGGTTGGTCGACAC control forward primer in pGRG25 vector T 4 TCTTCATTTTCTGAAGTGCAAA control reverse primer in pGRG25 vector 5 CTGCAAGGCGATTAAGTTGG control primer in lacZ, 79-97 nt 6 GATGCTGGTGGCGAAGCTGT control primer flanking attTn7 site 7 GATGACGGTTTGTCACATGGA control primer flanking attTn7 site 8 CGCCCCGGGCGCAACGCAATTA cloning of lac promoter from pUC19 ATGTG 9 GACCGAGAGCAAATTTACTAAT cloning of lac promoter from pUC19 TGTTATCCGCTCACAA 10 TTGTGAGCGGATAACAATTAGT cloning Vc1 riboswitch from V. cholerae AAATTTGCTCTCGGTC 11 CGCGGATCCATTTTAGGTTGTT cloning Vc1 riboswitch from V. cholerae TTTTCATCACAG 12 GCGTTAATTAACGCAACGCAAT amplification of PlacVc1lacZY from pRS414:: TAATGTGA PlacVc1lacZY for cloning into pGRG25 13 TCCCCCCGGGAGAATGGTGAG forward primer for cloning of GCTGTAATGGTATTAGG STRGAL_UCN34_GGDEF into pBAD28 14 GCTCTAGATTAGTGATGATGAT reverse primer for cloning of GATGATGTCCTTTCTGACTTTTA STRGAL_UCN34_GGDEF with C-terminal TTTTCGTACATT 6xHis-tag into pBAD28 15 TCCCCCCGGGATAGGAGGAAA forward primer for cloning of PMI3101_V into AAATGAGTGAATTAG pBAD28 16 GCTCTAGATTAGTGATGATGAT reverse primer for cloning of PMI3101_V with GATGATGATACTCTGAATAAAA C-terminal 6xHis-tag into pBAD28 ACCGTATT 17 TAATAGCAAGGTTAGGTGGAG mutagenesis of STRGAL_UCN34_GGDEF CTGAATTTGCGGTTTTATTACC 18 GGTAATAAAACCGCAAATTCA mutagenesis of STRGAL_UCN34_GGDEF GCTCCACCTAACCTTGCTATTA 19 GGACGACTTGGTGGCGCTGAAT mutagenesis of PMI3101_v TTGCGGTACTT 20 AAGTACCGCAAATTCAGCGCCA mutagenesis of PMI3101_v CCAAGTCGTCC 21 CCATAGCATTTTTATCCATAAG binds to pBAD28 vector 22 CTGTTTTATCAGACCGCTTCTG binds to pBAD28 vector C 753 1The enzyme restriction site is underlined 754 755

32 756 References 757 1. Ross, P.; Weinhouse, H.; Aloni, Y.; Michaeli, D.; Weinberger-Ohana, P.; Mayer, R.; 758 Braun, S.; de Vroom, E.; van der Marel, G. A.; van Boom, J. H.; Benziman, M., Regulation of 759 cellulose synthesis in Acetobacter xylinum by cyclic diguanylic acid. Nature 1987, 325 760 (6101), 279-81. 761 2. Römling, U.; Galperin, M. Y.; Gomelsky, M., Cyclic di-GMP: the first 25 years of a 762 universal bacterial second messenger. Microbiology and molecular biology reviews : MMBR 763 2013, 77 (1), 1-52. 764 3. Schirmer, T.; Jenal, U., Structural and mechanistic determinants of c-di-GMP 765 signalling. Nature reviews. Microbiology 2009, 7 (10), 724-35. 766 4. Ryan, R. P.; Fouhy, Y.; Lucey, J. F.; Crossman, L. C.; Spiro, S.; He, Y. W.; Zhang, L. H.; 767 Heeb, S.; Camara, M.; Williams, P.; Dow, J. M., Cell-cell signaling in Xanthomonas 768 campestris involves an HD-GYP domain protein that functions in cyclic di-GMP turnover. 769 Proceedings of the National Academy of Sciences of the United States of America 2006, 103 770 (17), 6712-7. 771 5. Schmidt, A. J.; Ryjenkov, D. A.; Gomelsky, M., The ubiquitous protein domain EAL is 772 a cyclic diguanylate-specific phosphodiesterase: enzymatically active and inactive EAL 773 domains. Journal of bacteriology 2005, 187 (14), 4774-81. 774 6. Krasteva, P. V.; Sondermann, H., Versatile modes of cellular regulation via cyclic 775 dinucleotides. Nat Chem Biol 2017, 13 (4), 350-359. 776 7. Holland, L. M.; O'Donnell, S. T.; Ryjenkov, D. A.; Gomelsky, L.; Slater, S. R.; Fey, P. D.; 777 Gomelsky, M.; O'Gara, J. P., A staphylococcal GGDEF domain protein regulates biofilm 778 formation independently of cyclic dimeric GMP. Journal of bacteriology 2008, 190 (15), 779 5178-89. 780 8. Shang, F.; Xue, T.; Sun, H.; Xing, L.; Zhang, S.; Yang, Z.; Zhang, L.; Sun, B., The 781 Staphylococcus aureus GGDEF domain-containing protein, GdpS, influences protein A gene 782 expression in a cyclic diguanylic acid-independent manner. Infection and immunity 2009, 783 77 (7), 2849-56. 784 9. Nelson, J. W.; Sudarsan, N.; Phillips, G. E.; Stav, S.; Lunse, C. E.; McCown, P. J.; 785 Breaker, R. R., Control of bacterial exoelectrogenesis by c-AMP-GMP. Proceedings of the 786 National Academy of Sciences of the United States of America 2015, 112 (17), 5389-94. 787 10. Hallberg, Z. F.; Wang, X. C.; Wright, T. A.; Nan, B.; Ad, O.; Yeo, J.; Hammond, M. C., 788 Hybrid promiscuous (Hypr) GGDEF enzymes produce cyclic AMP-GMP (3', 3'-cGAMP). 789 Proceedings of the National Academy of Sciences of the United States of America 2016, 113 790 (7), 1790-5. 791 11. Hunter, J. L.; Severin, G. B.; Koestler, B. J.; Waters, C. M., The Vibrio cholerae 792 diguanylate cyclase VCA0965 has an AGDEF active site and synthesizes cyclic di-GMP. BMC 793 Microbiol 2014, 14, 22. 794 12. Oliveira, M. C.; Teixeira, R. D.; Andrade, M. O.; Pinheiro, G. M.; Ramos, C. H.; Farah, 795 C. S., Cooperative substrate binding by a diguanylate cyclase. J Mol Biol 2015, 427 (2), 415- 796 32. 797 13. Barends, T. R.; Hartmann, E.; Griese, J. J.; Beitlich, T.; Kirienko, N. V.; Ryjenkov, D. A.; 798 Reinstein, J.; Shoeman, R. L.; Gomelsky, M.; Schlichting, I., Structure and mechanism of a 799 bacterial light-regulated cyclic nucleotide phosphodiesterase. Nature 2009, 459 (7249), 800 1015-8. 801 14. Delgado-Nixon, V. M.; Gonzalez, G.; Gilles-Gonzalez, M. A., Dos, a heme-binding PAS 802 protein from Escherichia coli, is a direct oxygen sensor. Biochemistry 2000, 39 (10), 2685-91.

33 803 15. Cadby, I. T.; Basford, S. M.; Nottingham, R.; Meek, R.; Lowry, R.; Lambert, C.; 804 Tridgett, M.; Till, R.; Ahmad, R.; Fung, R.; Hobley, L.; Hughes, W. S.; Moynihan, P. J.; Sockett, 805 R. E.; Lovering, A. L., Nucleotide signaling pathway convergence in a cAMP-sensing bacterial 806 c-di-GMP phosphodiesterase. EMBO J 2019, 38 (17), e100772. 807 16. Enomoto, G.; Kamiya, A.; Okuda, Y.; Narikawa, R.; Ikeuchi, M., Tlr0485 is a cAMP- 808 activated c-di-GMP phosphodiesterase in a cyanobacterium Thermosynechococcus. J Gen 809 Appl Microbiol 2020. 810 17. Römling, U.; Gomelsky, M.; Galperin, M. Y., C-di-GMP: the dawning of a novel 811 bacterial signalling system. Molecular microbiology 2005, 57 (3), 629-39. 812 18. Taylor, B. L.; Zhulin, I. B., PAS domains: internal sensors of oxygen, redox potential, 813 and light. Microbiology and molecular biology reviews : MMBR 1999, 63 (2), 479-506. 814 19. Zhulin, I. B.; Taylor, B. L.; Dixon, R., PAS domain S-boxes in Archaea, Bacteria and 815 sensors for oxygen and redox. Trends Biochem Sci 1997, 22 (9), 331-3. 816 20. Galperin, M. Y.; Nikolskaya, A. N.; Koonin, E. V., Novel domains of the prokaryotic 817 two-component signal transduction systems. FEMS Microbiol Lett 2001, 203 (1), 11-21. 818 21. Dayton H, S. M., Forouhar F, Harrison JJ, Price-Whelan A, Dietrich LEP, Sensory 819 domains that control cyclic di-GMP-modulating proteins: A critical frontier in bacterial 820 signal transduction. In Microbial cyclic di-nucleotide signaling, Chou SH, G. N., Lee VT, 821 Römling U, Ed. Springer Nature Switzerland AG: Cham, 2020; pp 137-158. 822 22. Chou SH, G. N., Lee VT, Römling U, Microbial Cyclic Di-Nucleotide Signaling. Springer 823 Nature: Cham, 2020. 824 23. Sudarsan, N.; Lee, E. R.; Weinberg, Z.; Moy, R. H.; Kim, J. N.; Link, K. H.; Breaker, R. 825 R., Riboswitches in eubacteria sense the second messenger cyclic di-GMP. Science 2008, 826 321 (5887), 411-3. 827 24. Lee, E. R.; Baker, J. L.; Weinberg, Z.; Sudarsan, N.; Breaker, R. R., An allosteric self- 828 splicing ribozyme triggered by a bacterial second messenger. Science 2010, 329 (5993), 829 845-848. 830 25. Wang, Y. C.; Chin, K. H.; Tu, Z. L.; He, J.; Jones, C. J.; Sanchez, D. Z.; Yildiz, F. H.; 831 Galperin, M. Y.; Chou, S. H., Nucleotide binding by the widespread high-affinity cyclic di- 832 GMP receptor MshEN domain. Nature communications 2016, 7, 12481. 833 26. Ryjenkov, D. A.; Simm, R.; Römling, U.; Gomelsky, M., The PilZ domain is a receptor 834 for the second messenger c-di-GMP: the PilZ domain protein YcgR controls motility in 835 enterobacteria. The Journal of biological chemistry 2006, 281 (41), 30310-4. 836 27. Amikam, D.; Galperin, M. Y., PilZ domain is part of the bacterial c-di-GMP binding 837 protein. Bioinformatics 2006, 22 (1), 3-6. 838 28. Morgan, J. L.; McNamara, J. T.; Zimmer, J., Mechanism of activation of bacterial 839 cellulose synthase by cyclic di-GMP. Nat Struct Mol Biol 2014, 21 (5), 489-96. 840 29. Perez-Mendoza, D.; Sanjuan, J., Exploiting the commons: cyclic diguanylate 841 regulation of bacterial exopolysaccharide production. Curr Opin Microbiol 2016, 30, 36-43. 842 30. Simm, R.; Morr, M.; Kader, A.; Nimtz, M.; Römling, U., GGDEF and EAL domains 843 inversely regulate cyclic di-GMP levels and transition from sessility to motility. Molecular 844 microbiology 2004, 53 (4), 1123-1134. 845 31. Johnson, M. R.; Montero, C. I.; Conners, S. B.; Shockley, K. R.; Bridger, S. L.; Kelly, R. 846 M., Population density-dependent regulation of exopolysaccharide formation in the 847 hyperthermophilic bacterium Thermotoga maritima. Molecular microbiology 2005, 55 (3), 848 664-74.

34 849 32. Arato, V.; Gasperini, G.; Giusti, F.; Ferlenghi, I.; Scarselli, M.; Leuzzi, R., Dual role of 850 the colonization factor CD2831 in Clostridium difficile pathogenesis. Sci Rep 2019, 9 (1), 851 5554. 852 33. Bobrov, A. G.; Kirillina, O.; Ryjenkov, D. A.; Waters, C. M.; Price, P. A.; Fetherston, J. 853 D.; Mack, D.; Goldman, W. E.; Gomelsky, M.; Perry, R. D., Systematic analysis of cyclic di- 854 GMP signalling enzymes and their role in biofilm formation and virulence in Yersinia pestis. 855 Molecular microbiology 2011, 79 (2), 533-51. 856 34. Madsen, J. S.; Hylling, O.; Jacquiod, S.; Pecastaings, S.; Hansen, L. H.; Riber, L.; 857 Vestergaard, G.; Sorensen, S. J., An intriguing relationship between the cyclic diguanylate 858 signaling system and horizontal gene transfer. ISME J 2018, 12 (9), 2330-2334. 859 35. Galperin, M. Y.; Higdon, R.; Kolker, E., Interplay of heritage and habitat in the 860 distribution of bacterial signal transduction systems. Mol Biosyst 2010, 6 (4), 721-8. 861 36. Lai, T. H.; Kumagai, Y.; Hyodo, M.; Hayakawa, Y.; Rikihisa, Y., The Anaplasma 862 phagocytophilum PleC histidine kinase and PleD diguanylate cyclase two-component 863 system and role of cyclic Di-GMP in host cell infection. Journal of bacteriology 2009, 191 (3), 864 693-700. 865 37. Wang, X.; Lünsdorf, H.; Ehren, I.; Brauner, A.; Römling, U., Characteristics of biofilms 866 from urinary tract catheters and presence of biofilm-related components in Escherichia coli. 867 Curr Microbiol 2010, 60 (6), 446-53. 868 38. Gao, X. Y.; Zhi, X. Y.; Li, H. W.; Klenk, H. P.; Li, W. J., Comparative genomics of the 869 bacterial genus Streptococcus illuminates evolutionary implications of species groups. PloS 870 one 2014, 9 (6), e101229. 871 39. Römling, U.; Liang, Z. X.; Dow, J. M., Progress in Understanding the Molecular Basis 872 Underlying Functional Diversification of Cyclic Dinucleotide Turnover Proteins. Journal of 873 bacteriology 2017, 199 (5). 874 40. Liu, C.; Liew, C. W.; Wong, Y. H.; Tan, S. T.; Poh, W. H.; Manimekalai, M. S. S.; Rajan, 875 S.; Xin, L.; Liang, Z. X.; Gruber, G.; Rice, S. A.; Lescar, J., Insights into Biofilm Dispersal 876 Regulation from the Crystal Structure of the PAS-GGDEF-EAL Region of RbdA from 877 Pseudomonas aeruginosa. Journal of bacteriology 2018, 200 (3). 878 41. Liu, Y., Kim, H., Römling, U., In vivo analysis of cyclic di-GMP cyclase and 879 phosphodiesterase activity in Escherichia coli using a Vc2 riboswitch-based assay. Bio- 880 protocols 2018, 8 (5), DOI: 10.21769/BioProtoc.2753. 881 42. Pontes, M. H.; Lee, E. J.; Choi, J.; Groisman, E. A., Salmonella promotes virulence by 882 repressing cellulose production. Proceedings of the National Academy of Sciences of the 883 United States of America 2015, 112 (16), 5183-8. 884 43. Christen, B.; Christen, M.; Paul, R.; Schmid, F.; Folcher, M.; Jenoe, P.; Meuwly, M.; 885 Jenal, U., Allosteric control of cyclic di-GMP signaling. The Journal of biological chemistry 886 2006, 281 (42), 32015-24. 887 44. Simm, R.; Morr, M.; Remminghorst, U.; Andersson, M.; Romling, U., Quantitative 888 determination of cyclic diguanosine monophosphate concentrations in nucleotide extracts 889 of bacteria by matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. 890 Anal Biochem 2009, 386 (1), 53-8. 891 45. Kader, A.; Simm, R.; Gerstel, U.; Morr, M.; Römling, U., Hierarchical involvement of 892 various GGDEF domain proteins in rdar morphotype development of Salmonella enterica 893 serovar Typhimurium. Molecular microbiology 2006, 60 (3), 602-16. 894 46. Zogaj, X.; Nimtz, M.; Rohde, M.; Bokranz, W.; Römling, U., The multicellular 895 morphotypes of Salmonella typhimurium and Escherichia coli produce cellulose as the

35 896 second component of the extracellular matrix. Molecular microbiology 2001, 39 (6), 1452- 897 63. 898 47. Römling, U.; Galperin, M. Y., Bacterial cellulose biosynthesis: diversity of operons, 899 subunits, products, and functions. Trends in microbiology 2015. 900 48. Harshey, R. M.; Partridge, J. D., Shelter in a Swarm. J Mol Biol 2015, 427 (23), 3683- 901 94. 902 49. Jacobsen, S. M.; Shirtliff, M. E., Proteus mirabilis biofilms and catheter-associated 903 urinary tract infections. Virulence 2011, 2 (5), 460-5. 904 50. Romling, U., Cyclic Di-GMP (c-Di-GMP) goes into host cells--c-Di-GMP signaling in 905 the obligate intracellular pathogen Anaplasma phagocytophilum. Journal of bacteriology 906 2009, 191 (3), 683-6. 907 51. Shanan, S.; Gumaa, S. A.; Sandstrom, G.; Abd, H., Significant Association of 908 Streptococcus bovis with Malignant Gastrointestinal Diseases. Int J Microbiol 2011, 2011, 909 792019. 910 52. Isenring, J.; Kohler, J.; Nakata, M.; Frank, M.; Jans, C.; Renault, P.; Danne, C.; Dramsi, 911 S.; Kreikemeyer, B.; Oehmcke-Hecht, S., Streptococcus gallolyticus subsp. gallolyticus 912 endocarditis isolate interferes with coagulation and activates the contact system. Virulence 913 2018, 9 (1), 248-261. 914 53. Grimm, I.; Vollmer, T., Streptococcus gallolyticus subsp. gallolyticus pathogenesis: 915 current state of play. Future microbiology 2018, 13, 731-735. 916 54. Ahmad, I.; Rouf, S. F.; Sun, L.; Cimdins, A.; Shafeeq, S.; Le Guyon, S.; Schottkowski, 917 M.; Rhen, M.; Römling, U., BcsZ inhibits biofilm phenotypes and promotes virulence by 918 blocking cellulose production in Salmonella enterica serovar Typhimurium. Microb Cell Fact 919 2016, 15 (1), 177. 920 55. Raterman, E. L.; Shapiro, D. D.; Stevens, D. J.; Schwartz, K. J.; Welch, R. A., Genetic 921 analysis of the role of yfiR in the ability of Escherichia coli CFT073 to control cellular cyclic 922 dimeric GMP levels and to persist in the urinary tract. Infection and immunity 2013, 81 (9), 923 3089-98. 924 56. Chou, S. H.; Galperin, M. Y., Diversity of Cyclic Di-GMP-Binding Proteins and 925 Mechanisms. Journal of bacteriology 2016, 198 (1), 32-46. 926 57. Grimm, I.; Dumke, J.; Dreier, J.; Knabbe, C.; Vollmer, T., Biofilm formation and 927 transcriptome analysis of Streptococcus gallolyticus subsp. gallolyticus in response to 928 lysozyme. PloS one 2018, 13 (1), e0191705. 929 58. Coordinators, N. R., Database resources of the National Center for Biotechnology 930 Information. Nucleic Acids Res 2018, 46 (D1), D8-D13. 931 59. Kelley, L. A.; Mezulis, S.; Yates, C. M.; Wass, M. N.; Sternberg, M. J., The Phyre2 web 932 portal for protein modeling, prediction and analysis. Nature protocols 2015, 10 (6), 845-58. 933 60. Zheng, W.; Zhang, C.; Bell, E. W.; Zhang, Y., I-TASSER gateway: A protein structure 934 and function prediction server powered by XSEDE. Future Gener Comput Syst 2019, 99, 73- 935 85. 936 61. Chenna, R.; Sugawara, H.; Koike, T.; Lopez, R.; Gibson, T. J.; Higgins, D. G.; 937 Thompson, J. D., Multiple sequence alignment with the Clustal series of programs. Nucleic 938 Acids Res 2003, 31 (13), 3497-500. 939 62. Robert, X.; Gouet, P., Deciphering key features in protein structures with the new 940 ENDscript server. Nucleic Acids Res 2014, 42 (Web Server issue), W320-4. 941 63. Seemann, T., Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014, 30 942 (14), 2068-9.

36 943 64. Page, A. J.; Cummins, C. A.; Hunt, M.; Wong, V. K.; Reuter, S.; Holden, M. T.; Fookes, 944 M.; Falush, D.; Keane, J. A.; Parkhill, J., Roary: rapid large-scale prokaryote pan genome 945 analysis. Bioinformatics 2015, 31 (22), 3691-3. 946 65. Gouy, M.; Guindon, S.; Gascuel, O., SeaView version 4: A multiplatform graphical 947 user interface for sequence alignment and phylogenetic tree building. Mol Biol Evol 2010, 948 27 (2), 221-4. 949 66. McKenzie, G. J.; Craig, N. L., Fast, easy and efficient: site-specific insertion of 950 transgenes into enterobacterial chromosomes using Tn7 without need for selection of the 951 insertion event. BMC Microbiol 2006, 6, 39. 952 67. Grimm, I.; Weinstock, M.; Birschmann, I.; Dreier, J.; Knabbe, C.; Vollmer, T., Strain- 953 dependent interactions of Streptococcus gallolyticus subsp. gallolyticus with human blood 954 cells. BMC Microbiol 2017, 17 (1), 210. 955 68. Römling, U.; Sierralta, W. D.; Eriksson, K.; Normark, S., Multicellular and aggregative 956 behaviour of Salmonella typhimurium strains is controlled by mutations in the agfD 957 promoter. Molecular microbiology 1998, 28 (2), 249-64. 958 69. Burhenne, H.; Kaever, V., Quantification of cyclic dinucleotides by reversed-phase 959 LC-MS/MS. Methods Mol Biol 2013, 1016, 27-37. 960 70. Römling, U.; Rohde, M.; Olsen, A.; Normark, S.; Reinköster, J., AgfD, the checkpoint 961 of multicellular and aggregative behaviour in Salmonella typhimurium regulates at least 962 two independent pathways. Molecular microbiology 2000, 36 (1), 10-23. 963 71. Rochon, M.; Römling, U., Flagellin in combination with curli fimbriae elicits an 964 immune response in the gastrointestinal epithelial cell line HT-29. Microbes and infection / 965 Institut Pasteur 2006, 8 (8), 2027-33. 966 72. Guzman, L. M.; Belin, D.; Carson, M. J.; Beckwith, J., Tight regulation, modulation, 967 and high-level expression by vectors containing the arabinose PBAD promoter. Journal of 968 bacteriology 1995, 177 (14), 4121-30. 969 73. Ahmad, I.; Cimdins, A.; Beske, T.; Römling, U., Detailed analysis of c-di-GMP 970 mediated regulation of csgD expression in Salmonella typhimurium. BMC Microbiol 2017, 971 17 (1), 27. 972 74. Simons, R. W.; Houman, F.; Kleckner, N., Improved single and multicopy lac-based 973 cloning vectors for protein and operon fusions. Gene 1987, 53 (1), 85-96. 974 975

37 976 Supporting Information 977 978 A cyclic di-GMP network is present in Gram-positive Streptococcus and Gram- 979 negative Proteus species 980 981 982 Ying Liu1,#, Changhan Lee1,##, Fengyang Li1, Janja Trček2, Heike Bähre3, Rey-Ting Guo4, 983 Chun-Chi Chen4, Alexey Chernobrovkin5,###, Roman Zubarev5,7 and Ute Römling1,* 984 985 986 1Department of Microbiology, Tumor and Cell Biology, and 5Department of Medical 987 Biochemistry and Biophysics, Biomedicum, Karolinska Institutet, SE-171 65 Stockholm, 988 Sweden 2 989 Faculty of Natural Sciences and Mathematics, Department of Biology, University of 990 Maribor, 2000 Maribor, Slovenia 991 3 Research Core Unit Metabolomics, Hannover Medical School, Hannover, Germany 992 4State Key Laboratory of Biocatalysis and Enzyme Engineering, Hubei Collaborative 993 Innovation Center for Green Transformation of Bio-Resources, Hubei Key Laboratory of 994 Industrial Biotechnology, School of Life Sciences, Hubei University, Wuhan, 430062, P.R. 995 China 996 5Department of Pharmacological & Technological Chemistry, I.M. Sechenov First Moscow 997 State Medical University, Moscow, 119146, Russia 998 999 #Current address: Department of Microbiology, Philips University Marburg, Marburg, 1000 Germany 1001 1002 ##Current address: Department of Molecular, Cellular and Developmental Biology, 1003 University of Michigan, Ann Arbor, MI 48109, USA 1004 1005 *Corresponding author: Department of Microbiology, Tumor and Cell Biology, 1006 Biomedicum C8, Karolinska Institutet, Solnavägen 9, SE-171 65 Stockholm, Sweden. Ute 1007 Römling: ID: 0000-0003-3812-6621. Phone: +46-8-524 87319. E-mail: [email protected]

38 1008 Supporting Information 1009

1010 1011 1012 Supporting Figure S1. The protein models of PMI3101 and UCN34 are built using PA0861 1013 from Pseudomonas aeruginosa ((PDB ID, 5XGB);40) as a template by Phyre2 algorithm59. 1014 The models are displayed in cartoon models. The color schemes for PAS-like domain, S- 1015 helix linker, and GGDEF domain are red, gray, and green, respectively. 1016 1017

39 1018

1019 Supporting Figure S2. Detection of alterations in cyclic di-GMP levels by Vc1 and Vc2 1020 riboswitches using blue/white selection through downstream beta-galactosidase activity in E. 1021 coli TOP10. The upper row shows detection of cyclic di-GMP concentration by the Vc1 1022 riboswitch, while the lower row shows the results for the Vc2 riboswitch. Previously 1023 identified cyclic di-GMP turnover proteins, such as the di-guanylate cyclases AdrA, 1024 STM4551 and YdeH and the c-di-GMP phosphodisterase YhjH were expressed. All plates 1025 were supplemented with 0.1% arabinose to induce protein overexpression and incubated at 1026 28 °C. 0.25 mM IPTG was added into plates monitoring activity of the Vc1 riboswitch.

1027

40 1028

1029 1030 1031 Supporting Figure S3. Expression of STRGAL_UEB50_GGDEF and 1032 STRGAL_UEB50_GGDEFD273A in riboswitch-bearing E. coli TOP10. Recombinant

1033 TOP10 containing STRGAL_UEB50_GGDEF, STRGAL_UEB50_GGDEFD273A and the 1034 vector control were grown in the presence of 0.001% L-arabinose. The whole cell lysates 1035 were analyzed by SDS-PAGE with Western blot. All lanes were loaded with same amount 1036 of samples. 1037 1038

41 1039 Supporting Table S1. GGDEF domain proteins in selected species of the family 1040 Morganellaceae

Genus Species Strain Cyclic di-GMP turnover protein(s) GGDEF/EAL/- GGDEF+EAL/HD-GYP Proteus Proteus alimentorum 08MAS0041 1/-/-/- Proteus cibarius ZF2 1/-/-/- Proteus columbae T60 1/-/-/- Proteus hauseri 15H5D-4a 1/-/-/- ATCC700826 Proteus mirabilis HI4320 1/-/-/- Proteus vulgaris NCTC_3000 1/-/-/- ATCC49132 Proteus terrae 1/-/-/-

Proteus ATCC 51471 1/-/-/- genomospecies 6 Cosenzaea Cosenzaea ATCC19692 1/-/-/- myxofaciens Moellerella Moellerella ATCC35017 -/1(partial)/-/- wisconsensis Morganella Morganella KT -/-/-/- morganii subsp morganii Morganella FDA-CFSAN -/-/-/- psychrotolerans Providencia Providencia alcalifaci DSM30120 -/-/-/- ens Providencia P12672 -/-/-/1(?) heimbachae Providencia NCTC6933 -/-/-/- rustigianii DSM19967 -/-/-/- Providencia stuartii MRSN2154 1 (partial)/-/-/- Providencia DSM19967 burkhodogranariea Photorhabdus Photorhabdus DSM3368 -/-/-/- luminescens subsp luminescens Xenorhabdus Xenorhabdus AN6/1 1/3/-/- nematophila Arsenophonus Arsenophonus FIN -/-/-/- nasoniae 1041

42 CdiGMPManuscript_PS.pdf (17.47 MiB) view on ChemRxiv download file