RÉGULATION DU C-DI-GMP ET RÔLE DE CE MESSAGER SECONDAIRE DANS LA FORMATION DE PILI DE TYPE IV CHEZ CLOSTRIDIUM DIFFICILE

par

Eric Bordeleau

thèse présentée au Département de biologie en vue de l’obtention du grade de docteur ès science (Ph.D.)

FACULTÉ DES SCIENCES UNIVERSITÉ DE SHERBROOKE

Sherbrooke, Québec, Canada, juillet 2014 Le 15 juillet 2014

le jury a accepté la thèse de Monsieur Eric Bordeleau dans sa version finale.

Membres du jury

Professeur Vincent Burrus Directeur de recherche Département de biologie, Université de Sherbrooke

Professeur Josée Harel Évaluatrice externe Département de biomédecine vétérinaire, Université de Montréal

Professeur Daniel Lafontaine Évaluateur interne Département de biologie, Université de Sherbrooke

Professeur Louis-Charles Fortier Évaluateur interne Département de microbiologie, Université de Sherbrooke

Professeur François Malouin Président-rapporteur Département de biologie, Université de Sherbrooke SOMMAIRE

Malgré la découverte du c-di-GMP en 1987, ce n’est que durant la dernière décennie que l’importance de ce messager secondaire dans la régulation des phénotypes bactériens a été exposée. Synthétisé par des diguanylate cyclases (DGC) et dégradé par des phosphodiestérases spécifiques (PDE), le c-di-GMP est prédit pour être un messager secondaire très répandu chez les bactéries et pratiquement exclusif à celles-ci. Le c-di-GMP est particulièrement reconnu pour son rôle dans la transition des bactéries motiles et planctoniques vers la formation de biofilm chez les bactéries à Gram négatif telles qu’Escherichia coli, Pseudomonas aeruginosa et Vibrio cholerae. De plus, le c-di-GMP est impliqué dans la régulation de l’expression de certains facteurs de virulence chez certaines bactéries. Ainsi, il est possible de révéler les mécanismes de régulation de certains phénotypes importants par l’étude de la signalisation à c-di-GMP dans une bactérie donnée. Clostridium difficile est une bactérie pathogène causant des diarrhées nosocomiales, des colites et pouvant causer des décès chez l’Homme. Les phénotypes impliqués dans la pathogenèse de C. difficile et leur régulation demeurent en grande partie méconnus. Le génome de C. difficile 630 était prédit être capable de coder pour 37 DGC et PDE putatives, un nombre en apparence élevé pour une bactérie à Gram positif.

L’objectif global de mon doctorat était de déterminer si la signalisation à c-di-GMP était fonctionnelle chez C. difficile puis de déterminer le rôle du c-di-GMP chez cette bactérie.

Dans un premier projet, mes travaux de doctorat ont permis de démontrer que la majorité des 37 DCG et PDE putatives chez C. difficile 630 sont fonctionnelles. Les 31 DCG et PDE les plus conservées dans les différentes souches de C. difficile ont été exprimées dans V. cholerae afin d’évaluer indirectement leur capacité de synthèse et de dégradation du c-di-GMP en

i mesurant leur impact sur motilité et la formation de biofilm de V. cholerae. La surexpression d’une DGC chez V. cholerae réduit la motilité par flagelle et augmente la formation de biofilm, alors que l’inverse est observé lors de la surexpression d’une PDE. De plus, l’activité d’une DCG, CD1420 (renommée DccA, CD630_14200), et une PDE, CD0757 (renommée CD630_07570) a été démontrée plus explicitement par des essais enzymatiques in vitro. Ainsi, ce projet a exposé l’important potentiel de la signalisation à c-di-GMP chez C. difficile, jusqu’alors étudiée presque exclusivement chez les bactéries à Gram négatif.

Dans un deuxième projet, mes travaux de doctorat ont permis de démontrer le rôle des pili de type IV (T4P) dans l’agrégation de C. difficile et la régulation de leur expression par un riborégulateur à c-di-GMP. Les riborégulateurs sont des structures ARN situées dans la région 5’UTR des gènes capables de réguler l’expression des gènes en aval en fonction de la liaison d’un métabolite spécifique. Parmi les 16 riborégulateurs à c-di-GMP prédits dans le génome de C. difficile 630, le riborégulateur c-di-GMP-II Cdi2_4 est situé en amont du locus principal de synthèse de T4P. Mes travaux ont permis de montrer que l’augmentation de la concentration de c-di-GMP intracellulaire se traduit par une augmentation de l’expression des gènes de T4P, la formation de T4P à la surface des cellules et l’agrégation dépendante des T4P. De plus, le mécanisme de régulation du riborégulateur Cdi2_4 a été démontré in vitro. La liaison du c-di-GMP au riborégulateur Cdi2_4 prévient la formation d’un terminateur transcriptionnel et favorise ainsi la transcription des gènes de T4P en aval.

Depuis la mise en évidence de la signalisation à c-di-GMP chez C. difficile dans la première partie de mon doctorat, un certain nombre de phénotypes régulés par c-di-GMP chez cette bactérie ont pu être déterminés ou prédits. Notamment, le c-di-GMP inhibe la transcription des gènes des flagelles en se liant au riborégulateur c-di-GMP-I Cd1 en amont et inhibe indirectement la production des toxines TcdA et TcdB. La démonstration de l’effet positif du c-di-GMP sur l’agrégation des cellules via les T4P, dans la deuxième partie de mon doctorat,

ii contribue à notre compréhension de la signalisation à c-di-GMP chez C. difficile. Il apparait que le c-di-GMP inhibe la motilité et favorise la formation de structures pluricellulaires chez C. difficile à l’instar de plusieurs bactéries, néanmoins par des mécanismes de régulation distincts.

Mots-clés : régulation, messager secondaire, c-di-GMP, Clostridium difficile, pili de type IV, agrégation, riborégulateur.

iii REMERCIEMENTS

Je tiens à remercier tout d’abord mon directeur de thèse, Vincent Burrus, pour m’avoir donné l’opportunité de travailler sur des projets qui m’ont passionné durant mon doctorat. Sa disponibilité remarquable et ses conseils m’ont permis de mener à terme cette thèse et de quitter avec un bagage scientifique inestimable. Je tiens aussi à remercier les Prs Louis- Charles Fortier, Daniel Lafontaine et François Malouin, qui ont contribué au bon déroulement de mes projets de doctorat par leurs suggestions éclairées à titre de membres de mon comité de conseillers et de collaborateurs sur divers projets. Je remercie également, Josée Harel pour le temps consacré à la lecture de cette thèse et d’avoir accepter de participer à l’évaluation de celle-ci à titre membre externe du jury.

Je remercie sincèrement tous les membres du laboratoire qui ont participé de près ou de loin aux projets, leur présence fut un atout essentiel. Je remercie aussi les membres des laboratoires de Louis-Charles Fortier et Daniel Lafontaine pour leur accueil et pour m’avoir transmis leur expertise lors de mon passage dans ces laboratoires.

Pour leur soutien financier, je remercie le Fonds de recherche du Québec – Nature et technologies et le Conseil de recherches en sciences naturelles et en génie du Canada.

Je tiens également à remercier ma famille pour leur soutien durant ma thèse, mais également toutes mes études. Enfin, je partage l'accomplissement de cette thèse avec ma conjointe, Mireille, pour son support inconditionnel, sa compréhension et son amour.

iv TABLE DES MATIÈRES

SOMMAIRE ...... i REMERCIEMENTS ...... iv LISTE DES ABRÉVIATIONS ...... viii LISTE DES TABLEAUX ...... x LISTE DES FIGURES ...... xi PRÉAMBULE ...... xiii CHAPITRE 1 ...... 1 INTRODUCTION ...... 1 1.1 — Clostridium difficile ...... 1 1.1.2 — Cycle infectieux ...... 2 1.1.3 — Épidémiologie ...... 4 1.1.4 — Résistance aux antibiotiques ...... 5 1.1.5 — Pathogenèse ...... 6 1.1.5.1 — Toxines ...... 6 1.1.5.2 — Autres facteurs de virulence ...... 7 1.2 — Le c-di-GMP, messager secondaire bactérien ...... 9 1.2.1 — Synthèse et dégradation du c-di-GMP ...... 10 1.2.1.1 — Distribution des DGC et PDE ...... 12 1.2.2 — Phénotypes bactériens régulés par le c-di-GMP ...... 13 1.2.3 — Effecteurs du c-di-GMP ...... 14 1.2.3.1 — Protéines avec domaines PilZ ...... 15 1.2.3.2 — Protéines avec domaines GGDEF ou EAL non catalytiques ...... 16 1.2.3.4 — Autres effecteurs protéiques du c-di-GMP ...... 17 1.2.3.5 — Effecteurs protéiques eucaryotes ...... 18 1.2.3.6 — Riborégulateurs c-di-GMP ...... 18 1.3 — Pili de type IV (T4P) ...... 25

v 1.3.1 — Distinction des T4P parmi les différents types de pili ...... 25 1.3.2 — Rôles des T4P ...... 27 1.3.3 — Pilines des T4P ...... 28 1.3.4 — Autres composantes des T4P et homologie avec d’autres systèmes ...... 30 1.4 Projets de doctorat ...... 33 1.4.1 Objectifs généraux du doctorat: ...... 33 CHAPITRE 2 ...... 34 SYNTHÈSE ET DÉGRADATION DU C-DI-GMP CHEZ C. DIFFICILE ...... 34 Résumé de l’article ...... 34 Contribution à l’article ...... 35 CHAPITRE 3 ...... 90 RÔLE DES PILI DE TYPE IV DANS l’AGRÉGATION DE ...... 90 C. DIFFICILE INDUITE PAR LE C-DI-GMP ...... 90 Résumé de l’article ...... 91 Contribution à l’article ...... 92 CHAPITRE 4 ...... 147 DISCUSSION ...... 147 4.1 — Rôles et régulation des T4P par le c-di-GMP chez C. difficile ...... 147 4.2 — Les multiples rôles du c-di-GMP chez C. difficile ...... 151 4.2.1 — Régulation de la motilité et des gènes des flagelles ...... 151 4.2.2 — Régulation des toxines TcdA et TcdB ...... 153 4.2.3 — Régulation de la métalloprotéase Zmp1 ...... 155 4.2.4 — Régulation de la protéine de surface putative CD630_32460 ...... 157 4.2.5 — Autres cibles du c-di-GMP ...... 157 4.3 — Multiplicité des DGC et PDE ...... 164 CONCLUSION ...... 169 ANNEXE 1 ...... 171 ANNEXE 2 ...... 186 ANNEXE 3 ...... 203

vi BIBLIOGRAPHIE ...... 210

vii LISTE DES ABRÉVIATIONS

5’UTR : 5’untranslated region ; région 5’ non traduite des ARNm ADN : Acide désoxyribonucléique ADNc : ADN complémentaire ADP : Adénosine diphosphate AMPc : Adénosine monophosphate cyclique ARN : Acide ribonucléique ARNm : ARN messager ATP : Adénosine triphosphate ATPase : Hydrolase de l’ATP CDI : Clostridium difficile Infection ; Infection à Clostridium difficile c-di-GMP : Diguanosine monophosphate cyclique c-di-GMP-I : c-di-GMP de classe I, en référence aux riborégulateurs c-di-GMP-II : c-di-GMP de classe II, en référence aux riborégulateurs CDT : C. difficile toxin ou C. difficile transferase, toxine binaire de C. difficile DACD : Diarrhées associées à C. difficile DGC : Diguanylate cyclase GEMM : Genes for the Environment, Membranes and Motility; nom precedent des riborégulateurs c-di-GMP-I GMP : Guanosine monophosphate GTP : Guanosine triphosphate GTPase : Hydrolase du GTP HMW SLP : High Molecular weight Surface Layer Protein ICE : Integrating conjugative element ; Élément intégratif et conjugatif

Kd : Constante de dissociation

Ki : Constante d’inhibition

Km : Constante de réaction

viii MSCRAMM : Microbial surface components recognizing adhesive matrix molecules NCBI : National Center for Biotechnology Information PaLoc : Pathogenicity locus ; locus de pathogenèse de Clostridium difficile PDE : phosphodiestérase du c-di-GMP PNPase : polynucléotide phosphorylase RBS : Ribosome Binding Site ; Site de liaison du ribosome SARM : Staphylococcus aureus résistant à la méthiciline Site-A : Site actif des domaines GGDEF et EAL Site-I : Site d’inhibition (de l’activité des domaines GGDEF par la liaison de c-di-GMP) LMW SLP : Low Molecular weight Surface Layer Protein T2SS : Type II Secretion System ; Système de sécrétion de type II T3SS : Type III Secretion System ; système de sécrétion de type III T4aP : Type IVa pilus ; Pilus de type IVa T4bP : Type IVb pilus ; Pilus de type IVb T4P : Type IV pilus ; Pilus de type IV

T4SS : Type IV Secretion System ; Système de sécrétion de type I

ix LISTE DES TABLEAUX

CHAPITRE 2 Table S1. Predicted and observed activities of the 31 most conserved putative c-di- GMP turn-over proteins...... 75 Table S2. Accession numbers of genomes of C. difficile strains used in this study. .... 77 Table S3. Clostridiaceae-encoded proteins containing GGDEF, EAL or HD-GYP domains...... 78 Table S4. Bacterial strains and plasmids used in this study...... 81 Table S5. DNA sequences of the oligonucleotides used in this study...... 83

CHAPITRE 3 Table S1. Putative type IV pilus proteins encoded by C. difficile 630...... 138 Table S2. Primers used in this study...... 139

CHAPITRE 4 Tableau 1. Homologues des composantes liées à la compétence chez C. difficile...... 150 Tableau 2. Riborégulateur c-di-GMP prédits et gènes en aval chez C. difficile 630. .... 160 Tableau 3. Domaines conservés des protéines de C. difficile 630 encodés par des gènes de fonction inconnue en aval de riborégulateurs à c-di-GMP-I prédits...... 161

x LISTE DES FIGURES

CHAPITRE 1 Figure 1. Cycle infectieux de C. difficile...... 2 Figure 2. Modèle de la formation des pseudomembranes lors d’infections à C. difficile...... 3 Figure 3. Représentation schématique du c-di-GMP...... 9 Figure 4. Diversité des composantes de la signalisation à c-di-GMP...... 10 Figure 5. Mécanismes de régulation de la transcription et de la traduction par les riborégulateurs...... 19 Figure 6. Structure cristallographique des aptamères c-di-GMP de classe I et II...... 22 Figure 7. Reconnaissance du c-di-GMP par les aptamère c-di-GMP de classe I et II. 23 Figure 8. Représentation des différents types de pili des bactéries à Gram négatif. ... 26 Figure 9. Superposition des structures 3D de trois pilines de T4P...... 29 Figure 10. Comparaison des composantes des T2SS, T4P et de la transformation naturelle...... 31

CHAPITRE 2 Figure 1. Domain composition and organization of the 37 putative c-di-GMP- signalling proteins encoded by C. difficile 630...... 43 Figure 2. Conservation of c-di-GMP-signalling components of C. difficile 630 in other strains of C. difficile...... 45 Figure 3. Phenotypic changes triggered by the expression of C. difficile 630 genes encoding predicted DGCs or PDEAs in V. cholerae N16961...... 48 Figure 4. Alteration of V. cholerae N16961 motility and biofilm formation upon expression of wild-type or mutated putative DGCs and PDEAs from C. difficile 630...... 49 Figure 5. TLC analysis of the enzymatic activities of CD1420 and CD0757...... 53

xi Figure S1. Domain composition and organization of the 3 putative c-di-GMP- signalling proteins encoded by C. difficile strains other than strain 630...... 86 Figure S2. Analysis of intracellular c-di-GMP by HPLC...... 87 Figure S3. Phylogenetic tree of selected Clostridiaceae species...... 88

CHAPITRE 3 Figure 1. Schematic representation of the primary T4P gene cluster...... 98 Figure 2. C-di-GMP induces T4P formation...... 101 Figure 3. T4P are required for c-di-GMP-dependent aggregation of C. difficile planktonic cells...... 103 Figure 4. Time-course monitoring of cell aggregation reveals differences in aggregation rate...... 106 Figure 5. C-di-GMP increases the expression of the T4P primary cluster genes...... 108 Figure 6. Cdi2_4 is a transcriptional c-di-GMP-II riboswitch...... 110 Figure 7. Cdi2_4 riboswitch transcription regulation requires specific c-di-GMP binding...... 112 Figure S1. Southern blot hybridization confirming single intron insertion in either pilA1 or pilB2...... 141 Figure S2. Absorbance data of the 16-h cell aggregation assays and cell viability...... 142 Figure S3. Analysis of intracellular c-di-GMP by HPLC...... 143 Figure S4. Map of pDcca-RB-pilA1...... 144 Figure S5. Predicted transcription terminator downstream pilA1...... 144

CHAPITRE 4 Figure 11. Gènes de synthèse et de modification des flagelles chez C. difficile 630. ... 152 Figure 12. Modèle de régulation de la motilité et des toxines par le c-di-GMP...... 154 Figure 13. Contexte génétique des gènes CD630_28300 et CD630_28310...... 156 Figure 14. Représentation schématique du contexte génétique de bcsA...... 158

xii PRÉAMBULE

Le c-di-GMP est un messager secondaire important dans la régulation de plusieurs phénotypes bactériens. L’étude de la signalisation à c-di-GMP a connu un essor important dans la dernière décennie et l’étendue de son rôle ainsi que la diversité des mécanismes de régulation employés chez différentes bactéries n’ont cessé de croitre. Bien que le thème de recherche principal du laboratoire de Vincent Burrus porte sur des éléments génétiques mobiles retrouvés principalement chez les bactérie du genre Vibrio dont Vibrio cholerae, la caractérisation de deux enzymes de synthèse du c-di-GMP codées par certains de ces éléments génétiques au début de mon doctorat (voir Annexe 1) m’a amené à travailler plus spécifiquement sur la signalisation à c-di-GMP chez Clostridium difficile.

L’objectif initial de ce projet était de déterminer la capacité de synthèse et de dégradation du c-di-GMP pour ensuite déterminer le rôle de ce messager secondaire chez C. difficile. De nouvelles informations ont ensuite permis d’émettre l’hypothèse que le c-di-GMP est impliqué dans l’agrégation de C. difficile en régulant l’expression de pili de type IV. Ainsi, afin de mettre en contexte les travaux de cette de thèse par article, l’introduction (Chapitre 1) portera sur différents aspects de C. difficile, de la signalisation à c-di-GMP et les pili de type IV. Les résultats sont présentés sous la forme d’un article publié (Chapitre 2 – Synthèse et dégradation du c-di-GMP chez C. difficile) et un article soumis (Chapitre 3 – Rôle des pili de type IV dans l’agrégation de C. difficile induite par le c-di-GMP). Enfin, certains aspects non abordés dans les articles présentés seront discutés (Chapitre 4 – Discussion) et les résultats seront intégrés à la littérature la plus récente afin de tirer un portrait global de notre compréhension actuelle de la signalisation à c-di-GMP chez C. difficile.

Enfin, j’ai participé de façon significative à la publication d’un article (Annexe 2) et d’un

xiii commentaire (Annexe 3) qui ne seront pas abordés, car leur sujet est éloigné de ma thèse. Ces articles portent sur la distribution des éléments intégratifs et conjugatifs chez les actinobactéries. L’intérêt que j’ai développé durant mon doctorat pour la prédiction de domaines conservés m’a permis de participer à ces travaux.

xiv CHAPITRE 1

INTRODUCTION

1.1 — Clostridium difficile

Identifié pour la première fois en 1935 dans la microflore intestinale de nouveau-nés, ce n’est toutefois qu’en 1978 que Clostridium difficile a été reconnu comme l’agent causal de la colite pseudomembraneuse, la forme sévère caractéristique d’infections à C. difficile (CDI) (Bartlett et al., 1978). Première cause de diarrhée nosocomiale, ce pathogène intestinal est responsable, en plus des diarrhées associées à C. difficile (DACD), d’inflammations importantes du côlon (mégacôlons toxiques), de septicémies et peut entrainer le décès du patient. C. difficile est une bactérie anaérobie stricte à Gram positif et à bas G+C qui parvient tout de même à survivre en présence d’oxygène sous sa forme dormante et très résistante aux conditions adverses, la spore. L’expulsion des spores dans les fèces des patients constitue un mode de dissémination efficace en milieu hospitalier. En effet, les CDI sont un problème majeur depuis la dernière décennie entrainant fréquemment la complication et la prolongation des hospitalisations. Le taux des CDI aurait surpassé celui des infections causées par le SARM (Staphylococcus aureus résistant à la méthiciline) en milieux hospitaliers, devenant ainsi la cause la plus fréquente d’infection nosocomiale (Miller et al., 2011).

À noter que la bactérie C. difficile est en voie d’être renommée Peptoclostridium difficile afin de souligner sa classification récente parmi la nouvelle famille des Peptostreptococcaceae (Yutin et Galperin, 2013). Néanmoins, le nom Clostridium difficile a été employé dans cette thèse.

1 1.1.2 — Cycle infectieux

C. difficile est transmis par voie fécale-orale (Figure 1). Malgré l’ingestion de bactéries à la fois sous forme de spores et de cellules végétatives, les spores sont principalement la forme atteignant l’intestin selon des observations dans le modèle d’infection du hamster (Wilson et al., 1985). La résistance supérieure des spores à l’acidité gastrique explique cette observation. Une fois dans l’intestin, les spores germent en réponse à l’exposition à certains composés de cet environnement, dont les sels biliaires (Giel et al., 2010; Wilson et al., 1982).

1 Ingestion Spores et 3 cellules végétatives Colonisation

Côlon Estomac 4 Infection Toxines 2 Germination 5 Expulsion

Figure 1. Cycle infectieux de C. difficile. Les bactéries sous forme de cellules végétatives ou de spores sont ingérées. Les spores résistent à l’acidité gastrique et parviennent à atteindre le petit intestin. La germination des spores permet aux cellules végétatives de coloniser le côlon. La production de toxines crée des dommages tissulaires et une inflammation. L’expulsion des bactéries avec les fèces permet la dissémination dans l’environnement. Figure modifiée de l’originale de Poutanen et Simor (2004).

2 La colonisation par C. difficile et les dommages tissulaires sont généralement observés au niveau du côlon bien que plusieurs cas de CDI dans le petit intestin aient été rapportés (Kim et Muder, 2011). Les CDI surviennent généralement suite à une perturbation de la flore intestinale, communément suite à la prise d’antibiotiques à large spectre, favorisant la prolifération de C. difficile. La production des exotoxines A et B provoque des dommages tissulaires et une perte de rétention d’eau entrainant la diarrhée (Poutanen et Simor, 2004). Les toxines induisent également une réponse inflammatoire contribuant à l’inflammation des tissus et menant à la formation de pseudomembranes qui sont des agrégats de cellules inflammatoires, de fibrine et de débris cellulaires (Figure 2) (Kelly et Kyne, 2011). Les bactéries expulsées avec les fèces d’un individu infecté peuvent persister dans l’environnement pendant plusieurs mois sous forme de spores et peuvent infecter de nouveaux individus ou encore la même personne (Poutanen et Simor, 2004).

Figure 2. Modèle de la formation des pseudomembranes lors d’infections à C. difficile. Les toxines TcdA et TcdB sont produites par C. difficile dans la lumière intestinale. La toxine TcdA est internalisée par les cellules épithéliales et provoque le réarrangement du cytosquelette d’actine ainsi que la rupture des jonctions serrées. TcdA et TcdB peuvent ensuite traverser l’épithélium. TcdA et TcdB sont toutes deux cytotoxiques et entrainent la production de médiateurs de la réponse inflammatoire qui vont attirer des cellules immunitaires au site d’infection, majoritairement des neutrophiles. Figure tirée de Rupnik et al. (2009).

3 1.1.3 — Épidémiologie

Le nombre et la sévérité des cas de CDI observés ont augmenté de façon importante lors de la dernière décennie. En 2000, les taux de colites et de colites avec symptômes sévères dues à des CDI ont doublé selon une étude menée dans le centre hospitalier universitaire de Pittsburgh, Pennsylvanie (Dallal et al., 2002). Depuis, une multitude d’épidémies avec recrudescence des cas sévères ont été rapportées au Canada, aux États-Unis et en Europe (Kuijper et al., 2006). Ces épidémies ont largement été attribuées à l’émergence de souches « hypervirulentes » de C. difficile NAP1/BI/027 (North American Pulsed-field Gel Electrophoresis Type 1, Restriction Endonuclease Analysis Group BI, PCR ribotype 027). Ces souches de toxinotype III produisent les toxines TcdA et TcdB ainsi que la toxine binaire CdtA/CdtB en plus de contenir une délétion en phase de lecture de 18 pb et une délétion d’une paire de base à la position 117 dans le gène tcdC menant à son inactivation (MacCannell et al., 2006; Warny et al., 2005). La protéine TcdC est un facteur anti-sigma réduisant l’expression des toxines TcdA et TcdB (Dupuy et al., 2008; Matamouros et al., 2007). Cela suggère une association entre l’inactivation du gène tcdC, l’augmentation de la production des toxines et la sévérité des cas de CDI. Toutefois, les études basées sur la comparaison de différentes souches cliniques sont contradictoires et il semble qu’il n’y ait pas nécessairement de corrélation significative entre l’inactivation du gène tcdC dans les souches NAP1/BI/027 et leur production de toxines ou même la sévérité des CDI comparativement à d’autres souches cliniques (Merrigan et al., 2010; Murray et al., 2009; Sirard et al., 2011; Walk et al., 2012). Le rôle de TcdC en tant que régulateur négatif de la production des toxines est par ailleurs réfuté dans deux études n’ayant pu observer de différences significatives dans des souches isogéniques (Bakker et al., 2012; Cartman et al., 2012). En somme, le caractère hypervirulent des souches NAP1/BI/027 est critiqué et ne semble pas absolu. Il semble par ailleurs que les souches de ribotype 027 soient de moins en moins prédominantes en Europe, alors que des souches de ribotype 078, déjà associées à des cas de CDI acquis en communauté et isolées de nourriture et d’animaux, sont émergentes (O'Donoghue et Kyne, 2011). En effet, bien que les CDI soient reconnues comme la première cause de diarrhées en milieux hospitaliers, de 10 à

4 40 % des cas de CDI serait acquis en communauté, bien que leur nombre soit difficile à évaluer (Gupta et Khanna, 2014).

1.1.4 — Résistance aux antibiotiques

Enfin, en plus de résister sous la forme de spores aux conditions stressantes telles l’acidité gastrique, la présence d’oxygène, l’action de nombreux désinfectants et antibiotiques, C. difficile est capable de résister à divers antibiotiques sous sa forme végétative. Les souches de C. difficile épidémiques sont en effet bien souvent multirésistantes aux antibiotiques. Jusqu’à récemment, seuls deux antibiotiques, le métronidazole et la vancomycine, ont été couramment utilisés pour le traitement des CDI depuis l’identification de C. difficile comme agent causal des CDI en 1978 (Pepin, 2006). Les recommandations dans l’usage de ces antibiotiques peuvent différer, mais il est courant d’utiliser le métronidazole pour les infections modérées et la vancomycine pour les infections sévères et récurrentes (O'Donoghue et Kyne, 2011). Or, ces deux antibiotiques ne permettent pas de traiter efficacement tous les cas de CDI, les taux de récurrences étant d’environ 20 % (Pepin et al., 2007). De nouveaux antibiotiques ayant un spectre d’action plus étroit ont été, ou sont actuellement développés afin de réduire les perturbations du microbiote intestinal qui a un rôle important dans la protection aux CDI (Taur et Pamer, 2014; Walk et al., 2012). La fidaxomycine a été approuvée en 2011 par la Food and Drug Administration pour le traitement des CDI (Traynor, 2011). Avec une efficacité de traitement comparable à la vancomycine, le traitement à la fidaxomicine altère moins le microbiote intestinal et les taux de récurrence sont plus faibles (Poxton, 2010; Tannock et al., 2010). Dans cette même optique, la transplantation fécale est une avenue prometteuse afin de rétablir le microbiote intestinal et prévenir la récurrence des CDI avec un taux de succès d’environ 90 % (Bakken, 2009; Cammarota et al., 2014).

5 1.1.5 — Pathogenèse

1.1.5.1 — Toxines

Les facteurs de virulence de C. difficile les plus étudiés sont sans aucun doute l’entérotoxine TcdA (308 kDa) et la cytotoxine TcdB (270 kDa). Les gènes tcdA et tcdB font partie d’un locus de pathogenèse d’environ 19 kb, PaLoc, codant également le régulateur positif TcdR et le régulateur négatif de l’expression des toxines TcdC (Dupuy et al., 2008). La protéine TcdE, également encodée par le PaLoc, présente des similitudes avec les holines des phages et serait impliquée dans la sécrétion des toxines A et B (Govind et al., 2012; Tan et al., 2001). TcdA et TcdB sont des glucosyl transférases de très grande taille inactivant les protéines Rho, Rac et Cdc42 à l’intérieur des cellules de l’hôte par l’ajout d’un glucose (Just et al., 1995a; Just et al., 1995b). Ces GTPases de la famille Rho sont d’importants régulateurs pour le maintien et le réarrangement du cytosquelette d’actine, une structure dynamique impliquée dans une multitude de processus tels que la morphogénèse, la migration, la polarité, la phagocytose, la division cellulaire et le transport de vésicules dans les cellules eucaryotes (Heasman et Ridley, 2008). L’action des toxines A et B sur des cellules en cultures cause une déformation des cellules pouvant mener à la lyse de celles-ci. TcdB semble avoir une activité jusqu’à 1000 fois plus importante que TcdA et, bien que des souches toxines A-/B+, mais non pas A+/B- soient isolées, une controverse existe quant à la nécessité et au pouvoir de la toxine A dans la pathogenèse de C. difficile (Kuehne et al., 2010; Lyras et al., 2009). En plus des toxines A et B, la toxine binaire CDT (C. difficile toxin ou C. difficile transferase) est aussi encodée par certaines souches de C. difficile. Comme pour les autres toxines binaires connues, CdtB facilite l’entrée et la translocation d’une protéine indépendante dans le cytoplasme, CdtA (Barth et al., 2004). CdtA est une ADP-ribosyltransférase qui agit directement sur l’actine-G (forme monomérique) ainsi qu’indirectement sur l’extrémité + de l’actine-F (forme polymérisée) et provoque la dépolymérisation de l’actine (Barth et al., 2004). La toxine CDT provoque la formation de protubérance de microtubules à la surface des cellules, augmentant

6 ainsi d’environ 4,5 fois l’adhérence des bactéries in vitro (Schwan et al., 2009). Récemment, il a été démontré que CDT a pour effet de rediriger la sécrétion de fibronectine vers les protubérances de microtubules du côté apical des cellules intestinales plutôt que du côté basal, et que cette fibronectine relâchée augmente l’adhésion de C. difficile (Schwan et al., 2014). Cette découverte suggère que les composantes de la matrice extracellulaire telle que la fibronectine peuvent être accessibles aux autres facteurs de virulence de C. difficile sans qu’il y ait nécessairement bris de l’intégrité de la couche de cellules épithéliales de la muqueuse intestinale.

1.1.5.2 — Autres facteurs de virulence

Outre les toxines, C. difficile compte sur une multitude de protéines de surface et de facteurs de virulence putatifs pouvant contribuer à la pathogenèse (Sebaihia et al., 2006). Ceux-ci incluent un certain nombre d’adhésines, les flagelles, la production d’exopolysaccharides, une capsule, la formation de pili et la sécrétion d’une collagénase. Néanmoins, la fonctionnalité et l’importance de plusieurs des facteurs de virulence putatifs de C. difficile n’ont pas été confirmées. La protéine Fbp68 de la souche 79-685 (homologue de FnbA de C. difficile 630) se lie à la fibronectine in vitro (Hennequin et al., 2003). Quoiqu’un mutant fnbA de C. difficile 630 montre une adhésion accrue aux cellules épithéliales de côlon en culture (Caco-2 et HT29-MTX), sa colonisation du cæcum est moindre que la souche isogénique sauvage dans un modèle d’infection de souris monoxénique (Barketi-Klai et al., 2011). La surface de C. difficile est recouverte d’une couche protéique (S-layer) composée principalement de la protéine de haut poids moléculaire HMW SLP (High Molecular weight Surface Layer Protein) et de la protéine de faible poids moléculaire LMW SLP (Low Molecular weight Surface Layer Protein) (Calabi et al., 2001). La HMW SLP et la LMW SLP sont dérivées du clivage du précurseur SlpA par la protéase à cystéine associée à la paroi, Cwp84 (Kirby et al., 2009). Les flagelles sont impliqués dans l’adhésion in vitro, mais semblent avoir un effet négatif sur la colonisation et la virulence in vivo selon une étude avec la souche 630 (Dingle et al., 2011;

7 Tasteyre et al., 2001). Enfin, il semble que contrairement à la souche 630, les flagelles de la souche R20291 sont importants pour la colonisation (Baban et al., 2013).

Au début de mon doctorat, encore peu d’études portaient sur la régulation des facteurs impliqués dans la pathogenèse de C. difficile. Parmi les régulateurs majeurs, la protéine CodY est l’une des mieux étudiées jusqu’à présent chez C. difficile. Découverte dans un premier temps chez Bacillus subtilis, puis dans plusieurs autres Firmicutes, CodY a pour rôle général la répression de gènes du métabolisme dont l’expression est superflue dans un milieu riche en nutriments (Sonenshein, 2005). Son affinité pour l’ADN est accrue par la liaison de deux activateurs synergiques, le GTP et les acides aminés branchés (Ratnayake-Lecamwasam et al., 2001; Shivers et Sonenshein, 2004). CodY régule également l’expression de différents gènes de virulence de façon spécifique, entre autres chez C. difficile, Staphyloccocus aureus, Listeria monocytogenes et Bacillus anthracis (Stenz et al., 2011). Chez C. difficile, CodY réprime la production des toxines TcdA et TcdB en se liant au promoteur du régulateur positif TcdR (Dineen et al., 2007). Une étude parue durant mon doctorat a permis d’identifier 165 gènes différemment exprimés dans un mutant codY dans C. difficile dont 60 % semblent être la cible directe de CodY (Dineen et al., 2010). L’étendue des phénotypes régulés par CodY pourrait être encore plus importante, car il semble réguler l’expression et cibler directement certains gènes liés au métabolisme d’un messager secondaire encore peu étudié chez les Firmicutes, et qui est au centre de mon sujet de thèse, le c-di-GMP.

8 1.2 — Le c-di-GMP, messager secondaire bactérien

Le diguanosine monophosphate cyclique (c-di-GMP) est une molécule composée de deux GMP joints par deux liaisions phosphodiesters entre les riboses (3’O) et les phosphates (5’P) (Figure 3). Le c-di-GMP a été découvert il y a plus de 25 ans grâce aux travaux menés dans le laboratoire de Moshe Benziman sur les bases biochimiques de la biosynthèse de la cellulose chez Komagataeibacter xylinus (anciennement, Gluconacetobacter xylinus ou Acetobacter xylinum) (Ross et al., 1987). Ross et al. (1987) avaient alors identifié une molécule de faible poids moléculaire augmentant de plus de 200 fois l’activité de la cellulose synthétase BcsA et démontré que l’ajout de c-di-GMP produisait le même effet. Cette même équipe a également identifié les premières protéines responsables de la synthèse et la dégradation du c-di-GMP (Tal et al., 1998). Dans la dernière décennie, l’intérêt pour cette molécule a augmenté considérablement grâce à la caractérisation des enzymes de régulation du c-di-GMP. La détermination des domaines catalytiques a permis de prédire que ce second messager est très répandu dans les diverses espèces bactériennes et pratiquement restreint à celles-ci.

Figure 3. Représentation schématique du c-di-GMP. Le c-di-GMP est composé de deux molécules de GMP liées entre elles par deux liaisons phosphodiesters.

9 1.2.1 — Synthèse et dégradation du c-di-GMP

La concentration intracellulaire de c-di-GMP est contrôlée par les activités antagonistes des diguanylate cyclases (DGC) et des phosphodiestérases (PDE) (Figure 4).

Figure 4. Diversité des composantes de la signalisation à c-di-GMP. Le c-di-GMP est synthétisé par des diguanylate cyclases et dégradé par des phosphodiestérases, enzymes contenant les domaines conservés GGDEF, EAL et HD-GYP. L’initiation de la transcription est régulée par différents facteurs de transcription liant le c-di-GMP (FleQ, PelD et Clp). Les deux classes de riborégulateurs c- di-GMP régulent la transcription et la traduction de certains gènes. L’activité, la localisation ou la protéolyse de certaines protéines sont régulées par la liaison du c-di-GMP à leur domaine PilZ, GGDEF dégénéré ou EAL dégénéré. R : A ou G; Y : C ou U. Figure modifiée de l’originale de Hengge (2010).

10 Les DGC catalysent la formation du c-di-GMP à partir de 2 molécules de GTP et sont caractérisées par la présence d’un domaine GGDEF (Ryjenkov et al., 2005). Le domaine GGDEF tire son nom du motif conservé en acides aminés GG[D/E]EF du site actif (site-A), initialement identifié comme étant GGDEF (Hecht et Newton, 1995). Plusieurs DGC contiennent également un site d’inhibition allostérique par le c-di-GMP (site-I) (Christen et al., 2006). Les PDE les plus étudiées contiennent un domaine EAL et dégradent le c-di-GMP en diguanosine monophosphate linéaire (pGpG) (Schmidt et al., 2005). Les PDE du second type contiennent un domaine HD-GYP et dégradent le c-di-GMP directement en 2 molécules de GMP (Ryan et al., 2006). Les domaines EAL et HD-GYP tirent également leur nom des motifs en acides aminés hautement conservés de leur site actif respectif (Galperin et al., 1999; Tal et al., 1998).

Plus de la moitié des DGC et PDE avec un domaine GGDEF ou EAL contiennent également des domaines capteurs pour réguler leur activité, dont les plus fréquents sont les domaines REC et PAS (Seshasayee et al., 2010). Un certain nombre d’études ont démontré une régulation post-traductionnelle de l’activité des DGC et PDE. Notamment, la DGC PleD de Caulobacter crescentus (Paul et al., 2007), la DGC WspR de P. aeruginosa (Hickman et al., 2005), la DGC DgcL encodée par des éléments génétiques mobiles de bactéries du genre Vibrio (Annexe 1) (Bordeleau et al., 2010) et la PDE VieA de V. cholerae (Martinez-Wilson et al., 2008) sont toutes activées par la phosphorylation d’un ou plusieurs domaines REC situés en N-terminal. Les domaines PAS sont répandus tant chez les eucaryotes que les procaryotes et certains de ceux-ci sont capables de percevoir spécifiquement certaines molécules directement tel le citrate, ou encore percevoir indirectement la lumière visible, le potentiel redox et l’oxygène via un cofacteur tels que des flavines et l’hème (Revue dans (Henry et Crosson, 2011). Par exemple, la PDE DosP de E. coli contient un domaine PAS ayant une molécule d’hème et est activée par l’oxygène (Tuckerman et al., 2009). Enfin, la transcription de certaines DGC et PDE et leur protéolyse peuvent également être contrôlées afin de répondre à des signaux spécifiques ou restreindre leur expression à une phase de

11 croissance spécifique (Perry et al., 2004; Pesavento et al., 2008; Weber et al., 2006). Cette régulation à plusieurs niveaux est nécessaire afin de coordonner l’activité des DGC et PDE qui peuvent être très nombreuse chez certaines bactéries.

1.2.1.1 — Distribution des DGC et PDE

L’analyse des génomes bactériens permet de prédire la présence de protéines avec des domaines GGDEF, EAL et HD-GYP parmi une très grande diversité d’espèces bactériennes (Romling et al., 2005). Néanmoins, la distribution de ces enzymes prédites n’est pas uniforme parmi les différentes espèces. Plusieurs espèces de -protéobactéries telle que Pseudomonas aeruginosa, Escherichia coli et Vibrio vulnificus, codent de 29 à près de 100 de ces gènes (Banque de données SignalCensus du NCBI; (Romling et al., 2013)). À l’opposé, plusieurs espèces ne comptent que très peu, voire aucune de ces protéines et les génomes séquencés de certains phyla entiers, tels que les Bacteroidetes et Chlamydiae, ne codent pratiquement aucune de ces protéines (Romling et al., 2013; Romling et al., 2005). Des protéines de régulation du c-di-GMP sont aussi encodées par des éléments génétiques mobiles tels que des plasmides et des éléments intégratifs et conjugatifs (en anglais, Integrating Conjugative Elements) (Annexe 1) (Bordeleau et al., 2010), des éléments mobiles d’ADN double brin s’intégrant dans le chromosome bactérien et qui se transfèrent entre bactéries par conjugaison (Burrus et al., 2002).

Enfin, des protéines liées à la signalisation à c-di-GMP ont été prédites dans un très faible nombre de génome de plantes, et d’animaux inférieurs tels que des amibes, et au moins une partie de ces protéines prédites résulte de contaminations bactériennes (Romling et al., 2013). Néanmoins, une espèce eucaryote, l’amibe Dictyostelium discoideum, code pour une DGC dont l’activité a été confirmée (Chen et Schaap, 2012). Chez D. discoideum, le c-di-GMP agit comme signal extracellulaire favorisant la différenciation des cellules formant la tige (en

12 anglais, stalk-cells) à la base du sporocarpe, une structure pluricellulaire aboutissant à la production des spores (Chen et Schaap, 2012). Le c-di-GMP régule ainsi la transition du stade pluricellulaire motile, le pseudo-plasmode (en anglais, slug), au stade pluricellulaire statique, le sporocarpe. Ceci est non sans rappeler le rôle principal du c-di-GMP chez les bactéries.

1.2.2 — Phénotypes bactériens régulés par le c-di-GMP

Les phénotypes régulés par le c-di-GMP commencent à être bien définis chez plusieurs bactéries à Gram négatif et tout particulièrement chez certaines -protéobactéries sur lesquelles portent la grande majorité des études sur le c-di-GMP. L’augmentation de la concentration intracellulaire en c-di-GMP favorise le passage des cellules motiles planctoniques à un mode plus statique se traduisant par une augmentation de la formation de biofilm et de l’adhésion chez plusieurs espèces telles que V. cholerae, E. coli, P. aeruginosa, et Yersinia pestis (revue dans (Hengge, 2009)). En utilisant des voies de signalisation parfois très diverses en aval, le c-di-GMP régule communément cette transition en réduisant l’expression ou l’activité des flagelles et en augmentant l’expression ou l’activité des protéines impliquées dans la formation d’exopolysaccharides et dans l’adhésion. Un certain nombre des divers mécanismes de régulation de ces phénotypes par le c-di-GMP seront abordés dans la section suivante (1.2.3 — Effecteurs du c-di-GMP).

D’autres phénotypes plus directement liés à la virulence sont aussi régulés par le c-di-GMP comme la résistance à l’oxydase des phagocytes chez Salmonella typhimurium (Hisert et al., 2005) ou la production de toxine chez certaines souches de V. cholerae (Tischler et al., 2002). Le c-di-GMP est aussi impliqué dans la régulation de l’expression de facteurs de virulence chez certaines -protéobactéries phytopathogènes telles que Xanthomonas campestris (Ryan et al., 2006) et Dickeya dadantii (Yi et al., 2010).

13 La signalisation à c-di-GMP a aussi fait l’objet de plusieurs travaux chez une espèce appartenant au phylum des -protéobactéries, C. crescentus. Cette bactérie aquatique particulière est un modèle d’étude important du cycle cellulaire et de la différenciation (Curtis et Brun, 2010). Chez C. crescentus le c-di-GMP est impliqué dans le passage des cellules motiles non réplicatives à des cellules adhérées réplicatives (Duerig et al., 2009).

Enfin, plusieurs espèces, genres et familles de bactéries n’ont pas ou ont été très peu étudiés relativement à la signalisation à c-di-GMP. C’est le cas notamment des Firmicutes, formés de l’ensemble des bactéries à Gram positif à bas G+C. Les études portant sur la signalisation à c- di-GMP chez S. aureus et Staphylococcus epidermidis, laissaient initialement croire que cette signalisation n’y était pas fonctionnelle. Il n’y a qu’une seule enzyme de synthèse du c-di- GMP putative et celle-ci ne semble pas active in vitro (Holland et al., 2008; Shang et al., 2009). Aucune autre étude ne s’était penchée sur la régulation à c-di-GMP chez les Firmicutes au début de mon doctorat. C’est dans ce contexte que nous avons entrepris de déterminer la fonctionnalité de la signalisation à c-di-GMP chez la bactérie Firmicutes C. difficile (Chapitre 2) et d’étudier son rôle chez cette bactérie (chapitre 3). La littérature concernant le rôle du c-di-GMP chez C. difficile parue au cours de mon doctorat sera abordée dans la discussion (Chapitre 3).

1.2.3 — Effecteurs du c-di-GMP

L’une des difficultés importantes dans l’étude du rôle du c-di-GMP réside dans la prédiction des cibles directes du second messager. En effet, bien que plusieurs récepteurs protéiques du c-di-GMP soient maintenant décrits, ceux-ci sont très divers et ne présentent pas toujours de domaines conservés permettant leurs prédictions (Figure 4).

14 1.2.3.1 — Protéines avec domaines PilZ

Bien que les effecteurs protéiques du c-di-GMP soient très diversifiés, plusieurs d’entre eux ont en commun la présence d’un domaine protéique responsable de la liaison du c-di-GMP, un domaine PilZ. Le domaine PilZ a été nommé à partir du nom de la protéine du même nom impliquée dans la biogenèse de pili de type IV chez P. aeruginosa (Amikam et Galperin, 2006). Fait intéressant, la liaison du c-di-GMP à la protéine PilZ n’a pas été démontrée. Au contraire, parmi les 8 protéines contenant des domaines PilZ chez P. aeruginosa, il s’agit de la seule protéine ne liant pas le c-di-GMP in vitro (Merighi et al., 2007). Les domaines PilZ liant le c-di-GMP contiennent les motifs en acides aminées conservés RxxxR et D/NxSxxG impliqués directement dans la liaison d’un dimère de molécules de c-di-GMP intercalées (Benach et al., 2007; Morgan et al., 2014; Ryjenkov et al., 2006). Une analyse subséquente de la séquence du domaine PilZ de la protéine PilZ de P. aeruginosa confirme l’ironie du sort. En effet, en raison de l’absence de ces motifs, il est peu probable que le domaine de la protéine PilZ de P. aeruginosa, à l’origine du nom du domaine de liaison du c-di-GMP, soit capable d’accomplir cette tâche. Par contre, il existe plusieurs exemples de protéines contenant un domaine de liaison du c-di-GMP PilZ fonctionnel. Par exemple, la protéine YcgR, retrouvée chez plusieurs entérobactéries dont E. coli a été déterminée comme agissant, suite à la liaison de c-di-GMP à son domaine PilZ, comme un frein moléculaire de la motilité en se liant aux protéines FliG et FliM impliquées dans la rotation du flagelle (Boehm et al., 2010; Fang et Gomelsky, 2010; Paul et al., 2010). Un autre exemple d’effecteur c-di-GMP contenant un domaine PilZ est la sous unité BcsA de la cellulose synthétase de K. xylinus, qui n’est nul autre que l’effecteur ayant mené à la découverte de la signalisation à c-di-GMP (Ross et al., 1987). Une liaison du c-di-GMP au domaine PilZ de BcsA, libère le site actif du domaine catalytique de BcsA qui peut alors catalyser la synthèse de cellulose en complexe avec BcsB (Morgan et al., 2014). Bien que la découverte des domaines PilZ ait permis d’identifier plusieurs effecteurs du c-di-GMP, plusieurs espèces n’ont aucune protéine prédite contenant de tels domaines et utilisent donc d’autres effecteurs.

15 1.2.3.2 — Protéines avec domaines GGDEF ou EAL non catalytiques

En plus d’être impliqués dans la synthèse et la dégradation du c-di-GMP, les domaines GGDEF et EAL peuvent servir de récepteur du c-di-GMP. Ces domaines, dits dégénérés, ne contiennent pas certains des acides aminés nécessaires à l’activité catalytique, mais sont capables de lier le c-di-GMP. Ainsi, certaines protéines contentant un domaine GGDEF avec un site-A dégénéré mais un site-I intact peuvent agir comme récepteurs du c-di-GMP. C’est le cas de la protéine PopA de C. crescentus qui régule la progression du cycle cellulaire par des interactions protéine-protéine suite à la liaison du c-di-GMP au site-I de son domaine GGDEF dégénéré (Duerig et al., 2009). De plus, le site-I des domaines GGDEF est présent dans la protéine PelD de P. aeruginosa qui ne contient en apparence pas de domaine GGDEF prédit (Lee et al., 2007). Néanmoins, le c-di-GMP lie bel et bien la protéine PelD pour induire la formation de polysaccarides pel chez P. aeruginosa, fort probablement, par des interactions protéine-protéine avec des composantes membranaires du complexe de biosynthèse de ces exopolysaccharides (Franklin et al., 2011; Lee et al., 2007). Bien qu’un domaine GGDEF ne puisse être prédit par analyse de la séquence de la protéine PelD, l’analyse de la structure de PelD par cristallographie a permis de révéler que le domaine de liaison du c-di-GMP a un repliement similaire à un domaine GGDEF (Whitney et al., 2012). Ceci suggère que la protéine PelD contient un domaine GGDEF dont la séquence en acides aminés a fortement divergé des domaines GGDEF catalytiques tout en conservant la structure et les acides aminés pour la liaison du c-di-GMP. Certains domaines GGDEF non catalytiques ont la capacité de lier le GTP à leur site-A et agissent comme domaines capteurs afin de réguler l’activité d’un domaine EAL catalytique (Christen et al., 2005).

Certains domaines EAL dégénérés peuvent lier le c-di-GMP à leur site-A non catalytique. C’est le cas de la protéine membranaire LapD de Pseudomonas fluorescens qui, par la séquestration de la protéase périplasmique LapG, contrôle la dégradation de l’adhésine LapA (Newell et al., 2011; Newell et al., 2009). LapD contient à la fois un domaine GGDEF

16 dégénérés et un domaine EAL dégénéré situés dans le cytoplasme (Newell et al., 2009). La liaison du c-di-GMP aux domaines EAL d’un homodimère de LapD provoque un changement de conformation jusque dans la portion périplasmique du dimère et permet la séquestration de la protéase LapG (Navarro et al., 2011; Newell et al., 2011; Newell et al., 2009). Ainsi, le c- di-GMP favorise l’adhésion de P. fluorescens en prévenant la dégradation de l’adhésine LapA située dans la membrane externe.

1.2.3.4 — Autres effecteurs protéiques du c-di-GMP

D’autres effecteurs protéiques du c-di-GMP ne présentant pas d’homologie particulière avec les effecteurs décrits ci-haut ont été identifiés. Le régulateur transcriptionnel VpsT de V. cholerae, est impliqué dans l’inhibition de la transcription des gènes de flagelles et l’induction de la transcription des gènes de synthèse des exopolysaccharides (vps) suite à la liaison du c-di-GMP (Krasteva et al., 2010). La protéine VpsT contient un domaine N- terminal REC et un domaine C-terminal hélice-tour-hélice de liaison de l’ADN. Le domaine REC particulier de VpsT favorise la dimérisation de VpsT en liant le c-di-GMP plutôt que par la phosphorylation par une kinase comme c’est le cas pour les domaines REC canoniques (Gao et Stock, 2009; Krasteva et al., 2010).

Trois équipes de recherche ont identifié presque simultanément un nouvel effecteur du c-di- GMP chez X. campestris, soit le régulateur transcriptionnel Clp (CRP-like protein) (Chin et al., 2010; Leduc et Roberts, 2009; Tao et al., 2010). Les protéines CRP (cyclic AMP receptor protein), aussi appelées CAP (catabolite activator protein), sont des régulateurs transcriptionnels positifs qui se lient à l’ADN suite à la liaison d’AMPc (Won et al., 2009). La protéine Clp lie le c-di-GMP plutôt que l’AMPc et régule directement ou indirectement l’expression de plusieurs gènes impliqués dans la virulence de ce pathogène de plante (Revue dans (Ryan, 2013).

17 Enfin, il existe également d’autres effecteurs du c-di-GMP dont le site de liaison n’a pas été déterminé, tels que le régulateur transcriptionnel FleQ de P. aeruginosa (Hickman et Harwood, 2008) et la polynucléotide phosphorylase (PNPase) de E. coli (Tuckerman et al., 2011). La liaison du c-di-GMP lève la répression de FleQ sur l’expression des gènes de synthèse de polysaccharides pel (Hickman et Harwood, 2008). La PNPase est une enzyme généralement retrouvée dans le dégradosome et qui est impliquée dans la dégradation et le métabolisme des ARN (Carpousis, 2007). La dégradation des ARN par un complexe ribonucléoprotéique d’E. coli contenant la PNPase, la RNase E, une DGC et PDE est stimulée in vitro par le c-di-GMP (Tuckerman et al., 2011).

1.2.3.5 — Effecteurs protéiques eucaryotes

Bien que le c-di-GMP ne soit pas utilisé comme messager secondaire chez les mammifères, un effet immunostimulant a été découvert pour cette molécule (Karaolis et al., 2007). Depuis, le c-di-GMP fait l’objet de plusieurs études pour ses capacités à agir comme adjuvant pour la vaccination (revue dans (Dubensky et al., 2013)). Le mécanisme d’action du c-di-GMP a été révélé récemment. Le c-di-GMP est lié à l’intérieur des cellules par au moins deux protéines, STING et DDX41, pour stimuler la production d’interférons de type I et une réponse inflammatoire associée à la détection de pathogènes intracellulaires (Burdette et al., 2011; Parvatiyar et al., 2012; Sauer et al., 2011). Ces protéines ne présentent cependant pas d’homologie avec les récepteurs du c-di-GMP identifiés chez les bactéries.

1.2.3.6 — Riborégulateurs c-di-GMP

En plus des récepteurs du c-di-GMP de nature protéique, il existe des récepteurs de type ARN, appelés riborégulateurs. Localisés dans la région non traduite des ARNm (5’UTR), les

18 riborégulateurs bactériens modulent l’expression de certains gènes en fonction de la liaison d’un ligand spécifique (Mandal et al., 2003). Les riborégulateurs se composent typiquement de deux structures, l’aptamère et la plate-forme d’expression. L’aptamère est la partie servant à la liaison spécifique d’une variété de métabolites. La liaison d’un ligand à l’aptamère induit un changement de structure dans l’aptamère et entraîne aussi une restructuration de la plate- forme d’expression permettant de moduler l’expression des gènes en aval, généralement associés à la voie métabolique du ligand. Les deux mécanismes de régulation de l’expression des gènes en aval les plus répandus agissent sur: 1) la transcription en empêchant ou en favorisant la formation d’une tige terminatrice ou 2) la traduction par la libération ou la séquestration du site de liaison du ribosome (RBS), plus précisément aussi appelé Shine- Dalgarno chez les procaryotes (Figure 5).

Figure 5. Mécanismes de régulation de la transcription et de la traduction par les riborégulateurs. (A) La liaison du ligand à l’aptamère favorise la formation d’un terminateur transcriptionnel dans la plate-forme d’expression et réduit la transcription du gène en aval. (B) La liaison du ligand favorise la séquestration du RBS et réduit la traduction du gène en aval. Figure modifiée de l’originale de Kim et Breaker (2008).

Il existe plus d’une vingtaine de classes de riborégulateurs reconnaissant une vaste gamme de ligands connus, mais aussi certaines classes de riborégulateurs dont le ligand demeure inconnu (Garst et Batey, 2009). En 2007, grâce à des modèles de prédictions bioinformatiques, une

19 étude a permis la prédiction de 22 nouveaux motifs ARN dont 6 seraient des aptamères non décrits appartenant à de nouveaux riborégulateurs putatifs (Weinberg et al., 2007). L’un des riborégulateurs putatifs était remarquable par sa localisation en 5’UTR de gènes associés à des fonctions très diversifiées et non impliqués dans la synthèse d’un métabolite précis. Sur la base de ces informations, le ligand ne pouvait pas être prédit et l’hypothèse d’un messager secondaire comme ligand fut lancée. À cause des fonctions prédites de la majorité des gènes en aval de ces motifs ARN, ceux-ci ont été nommés motifs GEMM pour Genes for the Environment, Membranes and Motility. En 2008, le ligand des motifs GEMM a été déterminé comme étant le c-di-GMP et les premiers riborégulateurs à c-di-GMP (c-di-GMP-I) ont été démontrés (Sudarsan et al., 2008). Les riborégulateurs c-di-GMP constituent la première classe de riborégulateurs dont le ligand est un second messager, et, par conséquent, à être impliquée dans la signalisation plutôt que dans la régulation des gènes de la voie métabolique de leur ligand (Sudarsan et al., 2008).

Plus de 500 riborégulateurs c-di-GMP-I ont été prédits dans l’ensemble des espèces bactériennes séquencées (Sudarsan et al., 2008). La structure cristallographique du riborégulateur c-di-GMP-I Vc2 de V. cholerae (localisé en amont d’un gène codant pour un facteur de transcription putatif lié à la compétence, tfoY), et son interaction avec le c-di-GMP ont été étudiées plus en profondeur (Kulshina et al., 2009; Smith et al., 2009). Initialement, l’affinité de Vc2 pour le c-di-GMP avait été estimée à ~ 1 nM (Kd, constante de dissociation) (Sudarsan et al., 2008). Cette affinité est environ 50-1000 fois plus importante que les protéines liant le c-di-GMP qui ont plutôt des Kd d’environ 50 nM à quelques M (Hengge, 2009). De plus, selon une estimation en fonction de la taille des cellules d’E. coli en phase de croissance exponentielle, une concentration de 1 nM correspondrait à une molécule de c-di- GMP par cellule (Hengge, 2009). Par contre, la concentration estimée dans E. coli, en fonction de la constante d’inhibition par le produit de plusieurs DGC (Ki = 0.5-1 M) (Christen et al.,

2006; Wassmann et al., 2007) et la constante de réaction des PDE (Km de 60 nM à plusieurs M) (Christen et al., 2005; Fouhy et al., 2006; Tamayo et al., 2005), varierait du bas nM au

20 bas M (Hengge, 2009). Ce riborégulateur aurait donc une affinité remarquable permettant la détection des plus fines quantités de c-di-GMP dans les bactéries. Une autre équipe a par la suite rapporté une valeur de Kd d’environ 10 pM, ce qui laisse croire que tout c-di-GMP libre dans une bactérie pourrait être détecté par l’aptamère Vc2 (Smith et al., 2009). Par contre, les constantes (Kd) sont, par définition, mesurées à l’équilibre et ne sont pas nécessairement les concentrations auxquelles le riborégulateur influence l’expression du gène en aval en cours de transcription. La cinétique de liaison du ligand avec l’aptamère et de repliement du riborégulateur doit être compatible avec la vitesse de transcription pour influencer l’expression de riborégulateurs transcriptionnels cinétiquement contrôlés (Coppins et al., 2007; Garst et Batey, 2009). En d’autres mots, la vitesse de liaison du ligand est critique, car la liaison doit avoir lieu avant que l’ARN polymérase ne dépasse le terminateur. Selon une estimation, on pourrait donc s’attendre à avoir une sensibilité moindre, plutôt dans l’ordre des concentrations prédites en c-di-GMP dans les cellules, soit de quelques nM à quelques M pour le riborégulateur Vc2, prédit pour être régulé de façon cinétique (Smith et al., 2009).

Une seconde classe de riborégulateur ayant pour ligand le c-di-GMP a été plus récemment définie (c-di-GMP-II) (Lee et al., 2010). Les riborégulateurs c-di-GMP de classe II prédits jusqu’à présent sont beaucoup moins répandus que ceux de classe I, avec seulement 45 représentants retrouvés en grande partie chez des espèces de la classe Clostridia et au genre Deinococcus (Lee et al., 2010). Les affinités respectives pour le c-di-GMP du riborégulatuer c-di-GMP-II en amont du gène CD630_32460 chez C. difficile (Kd de ~ 200 pM) et du riborégulateur c-di-GMP-II Cac-1-2 de Clostridium acetobutilycum (Kd de 2,2 nM) sont aussi très fortes bien qu’un peu plus faibles que celle de Vc2 plus récemment déterminée (Lee et al., 2010; Simm et al., 2009; Smith et al., 2009).

Bien que les riborégulateurs c-di-GMP-I et c-di-GMP-II aient le même ligand, leurs structures et les interactions avec le c-di-GMP sont bien différentes (Figure 4, Figure 6 et Figure 7).

21 L’aptamère c-di-GMP-I est formé de trois hélices, P1, P2 et P3, adoptant une forme en « Y », dont la jonction est le site de liaison du c-di-GMP (Kulshina et al., 2009; Smith et al., 2009) (Figure 6A et 6B).

Figure 6. Structure cristallographique des aptamères c-di-GMP de classe I et II. (A) Structure de l’aptamère c-di-GMP-I de Vc2 de V. cholerae lié au c-di-GMP (PDB : 3MXH). (B) Site de liaison dans l’aptamère c-di-GMP-I. (C) Structure de l’aptamère c-di-GMP-II de Cac-1-2 de C. acetobutylicum lié au c-di-GMP (PDB : 3Q3Z). (D) Site de liaison dans l’aptamère c-di-GMP-II. Le c-di-GMP est représenté en rouge, les nucléotides en contact direct avec le ligand sont en bleu et les nucléotides empilés directement au- dessus ou en dessous du ligand sont en vert. Les liaisons entre les bases sont indiquées selon les symboles standard (Leontis et Westhof, 2001). Figure modifiée de l’originale de Shanahan et al. (2011).

22

Les guanines du ligand sont reconnues particulièrement par deux nucléotides. La G20 fait une liaison trans Watson-Crick/Hoogstenn avec la face Hoogsteen de la G du c-di-GMP

(Figure 7A) alors que C92 est impliqué dans une paire Watson-Crick canonique avec la G

(Figure 7B). La G fait également des liaisons hydrogènes simples avec C46 et A48, alors que la G en fait avec C17, A18 et avec le résidu A47 intercalé entre les deux bases du ligand (Figure 6B et Figure 7A et 7B).

Figure 7. Reconnaissance du c-di-GMP par les aptamère c-di-GMP de classe I et II.

Interaction de l’aptamère c-di-GMP-I avec (A) la G du c-di-GMP et (B) la G du c-di-GMP. Interaction de l’aptamère c-di-GMP-II avec (C) la G du c-di-GMP et (D) la G du c-di-GMP. Le c-di-GMP est coloré en blanc (carbone), bleu (azote) et rouge (oxygène). Les nucléotides en contact direct avec le ligand sont de couleur bleue. Les liaisons hydrogène sont indiquées par des traits pointillés. Un ion magnésium (sphère verte) hydraté (sphères rouges) est représenté. Figure modifiée de l’originale de Shanahan et al. (2011).

23 L’aptamère c-di-GMP-II est quant à lui plus compact et formé de 4 hélices, P1, P2, P3 et P4, dont la tige P4 est un pseudonœud entre des nucléotides des boucles P3 et P1/P2 (Smith et al., 2011) (Figure 6C). Le c-di-GMP est incorporé au sein d’une triple hélice à la jonction de P1, P2 et P4 (Figure 6C et 6D). Tout comme pour l’aptamère c-di-GMP-I, les guanines du c-di- GMP sont reconnues de manière asymétrique par l’aptamère c-di-GMP-II mais via des interactions distinctes (Smith et al., 2009). La G du c-di-GMP fait une liaison trans Watson- Crick/Sugar-edge avec la face sucre (Sugar-edge) d’A69 qui fait également une liaison Watson-Crick canonique avec U37, formant ainsi un triplet de bases (Figure 6D et Figure 7C).

La G fait une liaison hydrogène avec un ion magnésium hydraté ainsi qu’avec G73 et le résidu A70 intercalé entre les deux bases du ligand (Figure 6D et Figure 7D). La reconnaissance du c-di-GMP par les deux classes d’aptamères aurait ainsi pour seule similitude l’adénine intercalée entre les G et G.

La prédiction de 16 riborégulateurs c-di-GMP chez C. difficile, dont un situé en amont d’un locus de synthèse de pili de type IV, a permis d’orienter plus précisément mon second projet de recherche sur le rôle du c-di-GMP dans la régulation des pili de type IV (Chapitre 3).

24 1.3 — Pili de type IV (T4P)

1.3.1 — Distinction des T4P parmi les différents types de pili

Les pili sont des appendices non flagellaires localisés à la surface des bactéries pouvant porter plusieurs autres noms tels que fimbriae, fimbrillae, cilia et filament (Duguid et Anderson, 1967). De façon générale, ces structures bactériennes jouent toutes un rôle dans l’adhésion en liant directement la surface ou indirectement, en favorisant l’agrégation des bactéries et la formation de biofilm (Proft et Baker, 2009). Chez les bactéries à Gram négatif, il existe 5 classes majeures de pilus, basées sur la voie de biosynthèse employée : i) pilus chaperon- usher, ii) curli, iii) pilus de type IV (T4P), iv) pilus de sécrétion de type III (pilus T3SS) ou injectisome et v) pilus de sécrétion de type IV (pilus T4SS) (Figure 8).

Les pili chaperon-usher et les curli sont assemblés par des complexes de sécrétion simples localisés dans la membrane externe tandis que les pili T4P, T3SS et T4SS sont assemblés par de larges complexes de plusieurs sous-unités occupant toutes les couches de l’enveloppe cellulaire (Fronzes et al., 2008). Les pili T3SS et T4SS forment en quelque sorte une classe à part et sont souvent exclues des pili classiques du fait qu’ils ont un rôle dans la sécrétion de facteurs de virulence (protéines et/ou ADN) dans des cellules hôtes ou encore dans la conjugaison bactérienne plutôt que dans l’adhésion.

25

Figure 8. Représentation des différents types de pili des bactéries à Gram négatif. Les noms des différentes composantes des pili exposées sont spécifiques aux bactéries les encodant et la nomenclature peut différer chez d’autres microorganismes. La nomenclature de deux différents pili chaperon- usher est indiquée : pilus P (Pap) et pilus type I (Fim). Figure originale de Fronzes et al. (2008).

Les bactéries à Gram positif exposent à leur surface un autre type de pilus, décrit pour la première fois en 1968, par la visualisation de ces structures à la surface de Corynebacterium renale par microscopie électronique (Yanagawa et al., 1968). Depuis, ces pili particuliers et restreints aux bactéries à Gram positif ont été observés chez diverses espèces, dont plusieurs des genres Actinomyces, Corynebacterium et Streptococcus (Girard et Jacius, 1974; Lauer et al., 2005; Mora et al., 2005; Yanagawa et Honda, 1976). Ces pili, contrairement aux pili des bactéries à Gram négatif mentionnés précédemment, sont assemblés par des liaisons covalentes successives des différentes sous-unités (pilines) et ancrés au peptidoglycane grâce à une sortase spécifique aux pilines (classe C) et la sortase générale (classe A), respectivement (Hendrickx et al., 2011). Ainsi, ces pili impliqués dans l’adhésion sont ancrés tels que les

26 autres protéines d’adhésion des bactéries à Gram positif, les MSCRAMM (Microbial surface components recognizing adhesive matrix molecules) (Marraffini et al., 2006). Néanmoins, des T4P ont été identifiés chez une bactérie à Gram positif, C. perfringens, où ils ont un rôle dans la motilité par contraction (en anglais, twitching motility ou gliding motility) et la formation de biofilm (Varga et al., 2006; Varga et al., 2008). Il s’agissait seulement du deuxième exemple de T4P décrit chez des bactéries à Gram positif avec ceux retrouvés chez Ruminococcus albus qui jouent un rôle dans l’adhésion à la cellulose (Rakotoarivonina et al., 2002). L’analyse des neuf génomes de Clostridium spp. séquencés suggérait aussi la présence de T4P chez ces neuf espèces, incluant C. difficile (Varga et al., 2006). Malgré la prédiction de certaines différences dans la structure des T4P, il s’agit du seul type de pilus retrouvé à la fois chez les bactéries à Gram négatif et à Gram positif et serait le type de pilus le plus répandu (Pelicic, 2008).

1.3.2 — Rôles des T4P

Tout comme les autres types de pilus, les T4P facilitent l’adhésion des bactéries à des surfaces biotiques et abiotiques. En effet, les T4P favorisent l'adhésion de plusieurs pathogènes humains aux cellules hôtes, tels qu'E. coli entéropathogène, P. aeruginosa et Neisseria meningitidis, et peuvent aussi promouvoir l’adhésion à la cellulose et à l’acier inoxydable de R. albus et P. aeruginosa, respectivement (Pelicic, 2008). De plus, les T4P sont connus depuis longtemps pour leur capacité à conférer d’autres propriétés, notamment l’acquisition d’ADN exogène par transformation, la transduction par certains phages et la motilité par contraction (en anglais, twitching motility) (Biswas et al., 1977; Bradley, 1980). La motilité par contraction est rendue possible par la capacité particulière de rétracter les pili de ce type (Mattick, 2002). Ce mode de déplacement est distinct de l’essaimage (en anglais, swarming), où des flagelles sont principalement responsables de la motilité sur milieu solide, tel qu’observé pour Proteus mirabilis, bien que les termes utilisés pour décrire les deux types de mouvements bactériens soient régulièrement confondus (Henrichsen, 1972).

27 1.3.3 — Pilines des T4P

Les T4P se distinguent des autres pili par les pilines caractéristiques qui les constituent. Les T4P sont des polymères synthétisés à partir de plusieurs sous-unités de pilines transportées à travers la membrane cytoplasmique et clivées par une prépiline peptidase spécifique grâce à la séquence caractéristique N-terminale des prépilines des T4P. Malgré une grande variabilité sur la majorité de leur séquence et de leur taille, les prépilines des T4P partagent toutes une certaine homologie et un motif de clivage conservé dans la partie N-terminale (Pugsley, 1993). Le motif N-terminal conservé GFxxxE est reconnu par la prépiline peptidase cytoplasmique qui procède à la maturation des prépilines en deux temps, soit le clivage après le résidu glycine et l’ajout d’un méthyle au nouveau résidu N-terminal (Lory et Strom, 1997). Les pilines matures ont aussi une structure globale similaire rappelant la forme d’une louche où la tête globulaire est formée de l’hélice-1 enroulée d’un feuillet- (structure appelée /-roll, en anglais) sur sa portion C-terminale et le manche correspond à la partie N-terminale hydrophobe de l’hélice- (Figure 9) (Craig et al., 2004). À ces structures conservées s’ajoutent les parties variables, soient celles les plus exposées à la surface des pili. La région hypervariable localisée en C-terminal (région D) contient néanmoins une paire de cystéines conservées formant un pont disulfure (Craig et Li, 2008). L’autre région, correspondant à la boucle entre l’hélice-1 et le feuillet- (boucle /), peut de plus être la cible de modifications post-traductionnelles de certaines pilines de N. meningitidis et P. aeruginosa (Stimson et al., 1995; Voisin et al., 2007).

28

Figure 9. Superposition des structures 3D de trois pilines de T4P. La superposition des pilines PAK (bleue), GC (blanche) et TcpA tronquée (rouge) permet de voir les éléments structurels conservés. La superposition de l’hélice-1 et les trois premières chaînes du feuillet- des trois pilines est exposée (A) de face et (B) de dos. (C) Vue de face de la superposition des trois pilines incluant les régions variables (boucle- et région D). PAK : PilA, P. aeruginosa K; GC : PilE, N. gonorrhoeae MS11; TcpA : TcpA, V. cholerae RT4236. Figure modifiée de l’originale de Craig et al. (2004).

Les caractéristiques des pilines permettent de les associer spécifiquement aux T4P, mais servent aussi, en partie, à la division de ces pili en deux classes, les pili de type IVa et IVb (T4aP et T4bP). Les prépilines des T4aP ont une séquence leader clivée relativement courte de moins de 10 acides aminés et les pilines matures ont une taille de 150-160 acides aminées. Les prépilines des T4bP ont des séquences leader de 15-30 acides aminés et les pilines sont très longues (180-200 acides aminés) ou très courtes (40-50 acides aminées) (Proft et Baker, 2009). De plus, le résidu méthylé N-terminal est invariablement une phénylalanine dans le cas des T4aP alors qu’il est variable pour les T4bP (Craig et al., 2004). Outre les différences au niveau de leurs pilines, les deux classes de T4P se distinguent par leur distribution. Alors que les T4aP sont très répandus, incluant même des exemples chez des bactéries à Gram positif (Varga et al., 2006), il semble que les T4bP soient généralement associées aux bactéries entéropathogènes à Gram négatif telles qu’E. coli et V. cholerae (Craig et al., 2004). Finalement, les T4bP n’ont généralement pas la protéine nécessaire pour effectuer la

29 dépolymérisation des pilines, l’ATPase PilT. Ainsi les T4bP ne confèreraient pas tous la capacité de se mouvoir par rétraction, bien que les T4bP aient des composantes principales similaires aux T4aP pour la synthèse des pili (Ayers et al., 2010).

1.3.4 — Autres composantes des T4P et homologie avec d’autres systèmes

Bien que la biogenèse des T4P ne soit pas entièrement comprise, leurs composantes principales ont été identifiées. Pour simplifier le texte, seule la nomenclature des T4P de N. gonorrhoeae a été employée dans cette section. Outre la piline, les T4P nécessitent une prépiline peptidase (PilD, discutée ci-dessus), une ATPase nécessaire à la polymérisation (PilF), une ATPase pour la dépolymérisation des pilines (PilT) et la motilité par contraction, une protéine plateforme enchâssée dans la membrane interne (PilG) et une sécrétine (PilQ) formant un pore dans la membrane externe des bactéries à Gram négatif (Ayers et al., 2010) (Figure 10 b). Plusieurs des protéines qui composent les T4P partagent une certaine homologie avec la machinerie de la compétence naturelle et les systèmes de sécrétion de type II (T2SS) (Figure 10) (Aas et al., 2002b). Les T2SS permettent la translocation à travers la membrane externe de protéines initialement dans le périplasme chez les bactéries à Gram négatif (Figure 10a) (Johnson et al., 2006). Alors que les bactéries à Gram positif n’ont pas de T2SS, certaines ont un système apparenté permettant l’acquisition d’ADN exogène et la compétence naturelle.

30 a Type II secretion b Type IV pilus c Competence d Competence Klebsiella oxytoca Neisseria Neisseria gonorrhoeae Bacillus subtilis gonorrhoeae

PilC DR PulD PilQ

Outer membrane PulS PilP

Periplasm Assembly Disassembly ComE

ComEA Inner membrane Com PulF Sec PilG ComA ComEC PulO PilD Com GB C PulE PilF PilT ComFA ComGA Pullulanase

Figure 10 . Comparaison des composantes des T2SS, T4P et de la transformation naturelle.

Les composantes des systèmes nécessaires à la formation des T2SS (a), des T4P (b), de la machinerie de compétence de Neisseria gonorrhoeae (c) et de Bacillus subtilis (d) présentent des similarités. Figure originale de

Chen et Dubnau (2004) .

Les pilines liées à la compétence et aux T2SS sont simila ires à celles des T4P, mais ces

systèmes ne permettent pas la formation de longs pili. Ils forment plutôt des pseudopili limités

à l’intérieur du périplasme (Gram négatif) ou à la paroi cellulaire (Gram positif). Conséquemment, les pilines de ces pseudopili sont nommées pseudopilines. Une autre différence majeure réside dans le fait que les T4P ont une seconde ATPase (PilT) permettant la rétraction des pili par la dépolymérisation des sous -unités de piline contrairement aux deux autres systèmes (Douzi et al., 2012).

Bien que le système prototype de la compétence naturelle soit dérivé de la bactérie à Gram positif B. subtilis, certaines bactéries à Gram négatif ont des systèmes similaires (Figure 10 c et

10d). Chez Neisseria gonorrhoeae , les composantes du T4P forment le pseudopilus de compétence et participent à la transformation naturelle de concert avec les protéines qui

31 forment un complexe similaire à celui de B. subtilis permettant la liaison et la translocation à travers la membrane interne (Figure 10c et 10d) (Hamilton et Dillard, 2006). Alors que la piline majeure PilE est essentielle à la formation de T4P et à la transformation, d’autres pilines semblent avoir des rôles spécifiques. Du point de vue des T4P, ces pilines sont dites mineures en raison de leur abondance relativement faible dans le pilus en comparaison de la piline majeure. Les pilines mineures et les pseudopilines semblent moduler l’initiation du T4P ou du pseudopilus de compétence. En effet, chez N. gonorrhoeae, l’expression de la piline ComP ou une mutation abolissant l’expression de PilV favorise la transformation au détriment de la formation de T4P (Aas et al., 2002a; Aas et al., 2002b).

Les pilines mineures ont un rôle important dans la biogenèse des pili et leurs fonctions associées telles que la compétence (voir ci-dessus). Chez P. aeruginosa et N. gonorrhoeae, la délétion de l’ensemble des pilines mineures entraîne une quasi-abolition de la formation de T4P (Giltner et al., 2010; Winther-Larsen et al., 2005). Chez N. gonorrhoeae, la piline mineure PilX est incorporée dans le pilus et joue un rôle dans l’agrégation et l’adhésion des bactéries (Helaine et al., 2007). Finalement, chez P. aeruginosa les pilines mineures FimU, PilV, PilW, PilX et PilE sont toutes incorporées dans le pilus et nécessaires à sa formation bien que leur rôle individuel demeure inconnu (Giltner et al., 2010).

32 1.4 Projets de doctorat

L’hypothèse initiale du projet de doctorat était la suivante: la signalisation à c-di-GMP est fonctionnelle chez C. difficile et joue un rôle dans la pathogenèse de cette bactérie. Suite, à la confirmation de l’activité des DGC et PDE de C. difficile (Chapitre 2) et la prédiction d’un riborégulateur c-di-GMP-II en amont de gènes de synthèse de pili de type IV, j’ai émis l’hypothèse suivante: le c-di-GMP régule l’expression des pili de type IV, une structure potentiellement impliquée dans la pathogenèse de cette bactérie (Chapitre 3).

1.4.1 Objectifs généraux du doctorat:

1) Déterminer la capacité de synthèse et de dégradation du c-di-GMP de C. difficile

a) Prédire les DGC et PDE putatives codées dans les génomes de C. difficile b) Évaluer l’activité de l’ensemble des DGC et PDE de C. difficile c) Démontrer explicitement l’activité d’une DCG et une PDE

2) Démontrer la régulation de la formation des T4P par le c-di-GMP et leur rôle chez C. difficile

a) Démontrer l’activité du riborégulateur c-di-GMP-II en amont de CD630_35130 b) Démontrer l’effet du c-di-GMP sur l’expression des T4P chez C. difficile c) Évaluer le rôle du pilus de type IV chez C. difficile

33 CHAPITRE 2

SYNTHÈSE ET DÉGRADATION DU C-DI-GMP CHEZ C. DIFFICILE

La prédiction de 37 gènes codant des protéines contenant des domaines associées à la régulation du c-di-GMP (GGDEF et/ou EAL) dans le génome de C. difficile 630, un nombre en apparence particulièrement élevé pour une bactérie à Gram positif selon la littérature rapportée jusqu’alors, nous a mené à évaluer la conservation et la fonctionnalité des enzymes de régulation du c-di-GMP chez C. difficile. Les résultats de ce projet on été publiés en 2011 dans la revue PLoS Genetics, dont la référence complète est la suivante :

Bordeleau, E., Fortier, L.C., Malouin, F., et Burrus, V. (2011). c-di-GMP turn-over in Clostridium difficile is controlled by a plethora of diguanylate cyclases and phosphodiesterases. PLoS Genet 7: e1002039.

Résumé de l’article

L’alignement des domaines GGDEF et EAL prédits a permis de prédire l’activité des protéines selon la conservation de certains acides aminés essentiels. La conservation des 37 protéines identifiées dans le génome de C. difficile 630 parmi 19 autres souches de C. difficile en cours de séquençages a été évaluée également par alignements de séquences. La vérification de l’activité de 31 des protéines conservées parmi la majorité des souches a été effectuée par clonage et surexpression hétérologue de ces enzymes putatives dans V. cholerae. Chez V. cholerae, les variations de la concentration de c-di-GMP se traduisent par une modification de la motilité et du biofilm, deux phénotypes faciles à mesurer. La fonctionnalité des enzymes putatives a alors été évaluée en observant l’effet de la surexpression de ces

34 protéines de C. difficile sur la motilité et la formation de biofilm de V. cholerae. Pour un grand nombre de protéines, une corrélation était évidente entre la fonction prédite (DGC ou PDE) et les résultats du modèle. Afin de confirmer l’activité d’une DGC (CD1420) et d’une PDE (CD0757), des essais enzymatiques in vitro de production et de dégradation de c-di-GMP ont été effectués. CD1420 est capable de former du c-di-GMP à partir de GTP. CD0757 est une PDE dont l’activité de dégradation de c-di-GMP en pGpG est augmentée par la présence de GTP. Le GTP activerait CD0757 de façon analogue à d’autres PDE contenant un domaine EAL actif et un domaine GGDEF inactif liant le GTP.

Les résultats obtenus dans cette étude ont permis de mettre en lumière la présence d’un nombre important de protéines de signalisation à c-di-GMP chez C. difficile. Ceci laisse présager un rôle important du c-di-GMP chez cette bactérie en plus de dissiper un peu l’impression de la faible participation du c-di-GMP dans la signalisation chez les bactéries à Gram positif.

À noter que les noms des ORF correspondent à ceux qui étaient en vigueur au moment de la rédaction de l’article. Dans les sections suivantes, les noms ont été mis à jour (p. ex., CD1420 est maintenant CD630_14200).

Contribution à l’article

J’ai élaboré les expériences sous la supervision de mon directeur de thèse, Vincent Burrus. J’ai effectué la grande majorité des manipulations ayant permis d’obtenir les résultats de l’article. J’ai fait la rédaction initiale de l’article, qui a ensuite été corrigé avec mon directeur et révisé par tous les auteurs.

35 c-di-GMP turn-over in Clostridium difficile is controlled by a plethora of diguanylate cyclases and phosphodiesterases

Eric Bordeleau1,3, Louis-Charles Fortier2,3, François Malouin1,3 and Vincent Burrus1,3*

1Département de biologie, Faculté des sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada 2Département de microbiologie et d’infectiologie, Faculté de médecine et sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada 3Centre d'étude et de valorisation de la diversité microbienne (CEVDM), Université de Sherbrooke, Sherbrooke, Québec, Canada * Corresponding author

36 ABSTRACT

Clostridium difficile infections have become a major healthcare concern in the last decade during which the emergence of new strains has underscored this bacterium’s capacity to cause persistent epidemics. c-di-GMP is a bacterial second messenger regulating diverse bacterial phenotypes, notably motility and biofilm formation, in proteobacteria such as Vibrio cholerae, Pseudomonas aeruginosa and Salmonella. c-di-GMP is synthesized by diguanylate cyclases (DGCs) that contain a conserved GGDEF domain. It is degraded by phosphodiesterases (PDEs) that contain either an EAL or an HD-GYP conserved domain. Very little is known about the role of c-di-GMP in the regulation of phenotypes of Gram-positive or fastidious bacteria. Herein, we exposed the main components of c-di-GMP signalling in 20 genomes of C. difficile, revealed their prevalence and predicted their enzymatic activity. Ectopic expression of 31 of these conserved genes was carried out in V. cholerae to evaluate their effect on motility and biofilm formation, two well-characterized phenotype alterations associated with intracellular c-di-GMP variation in this bacterium. Most of the predicted DGCs and PDEs were found to be active in the V. cholerae model. Expression of truncated versions of CD0522, a protein with two GGDEF domains and one EAL domain, suggests that it can act alternatively as a DGC or a PDE. The activity of one purified DGC (CD1420) and one purified PDE (CD0757) was confirmed by in vitro enzymatic assays. GTP was shown to be important for the PDE activity of CD0757. Our results indicate that, in contrast to most Gram-positive bacteria including its closest relatives, C. difficile encodes a large assortment of functional DGCs and PDEs revealing that c-di-GMP signalling is an important and well conserved signal transduction system in this human pathogen.

37 AUTHOR SUMMARY

c-di-GMP is a bacterial intracellular signalling molecule regulating motility, biofilm formation, cell cycle control or virulence in Gram-negative bacteria. The function and importance of this molecule still remain unknown in Gram-positive bacteria, even in important emerging pathogens such as Clostridium difficile, which causes from mild to deadly intestinal infections, and has lately been on the rise in the healthcare setting and in the community. Here, we expose in the genomes of C. difficile strains a large number of conserved genes encoding proteins involved in the synthesis, degradation or sensing of c-di-GMP, in contrast with most other Gram-positive bacteria including C. difficile’s closest relatives. We confirmed the activity of most of these well-conserved proteins in a microorganism for which typical behavior alterations associated with variation of the intracellular c-di-GMP pools are known. We further confirmed the c-di-GMP synthesis and degradation activities of two purified C. difficile proteins. Our results indicate that c-di-GMP signalling is important in the lifecycle of this pathogen. This finding is particularly exciting because the c-di-GMP signalling network could serve as a target for the development of new drugs against C. difficile-associated diseases that are commonly associated with antibiotic usage.

38 INTRODUCTION

Clostridium difficile is a Gram-positive, anaerobic, spore-forming bacterium causing mild diarrhea to fulminant colitis in humans. Due to spreading of hypervirulent and high toxin- producing strains, C. difficile has caused in the last decade several epidemics in Europe and North America where it is now the leading cause of nosocomial diarrhea [1-3]. Its ability to sporulate allows this bacterium to remain dormant for years and survive to harsh conditions such as gastric acid after ingestion or the presence of oxygen in the environment. C. difficile- associated diseases are commonly associated with antibiotic usage, which creates a favorable niche for C. difficile to grow and cause infection in part by disrupting the gut microflora.

Bis-(3’-5’)-cyclic dimeric guanosine monophosphate (c-di-GMP) is a bacterial second messenger controlling diverse bacterial phenotypes mostly known to be involved in the transition from free- in Gram-negative bacteria [4,5]. c-di- GMP has also been shown to be involved in the development and cell cycle control in Caulobacter crescentus [6,7], and the modulation of virulence in several pathogens such as Vibrio cholerae, Vibrio vulnificus, Bordetella pertussis or Pseudomonas aeruginosa [8-11]. c- di-GMP is synthesized from 2 GTP molecules by enzymes named diguanylate cyclases (DGCs) that contain a GGDEF domain [12]. It is degraded respectively into pGpG or 2 GMP by phosphodiesterases (PDEs) that contain an EAL (PDEA) or a HD-GYP domain [13,14]. GGDEF domains were named by Hecht and Newton according to their conserved amino acid motif GG[D/E]EF [12]. EAL and HD-GYP domains are also named based on conserved amino motifs within these domains [13,14]. Whole genome analysis of a large number of bacterial species has revealed that the number of genes coding for enzymes involved in c-di- GMP turn-over varies widely between different species [15]. Genomes of proteobacteria generally encode a much wider array of such enzymes compared to those of Gram-positive bacteria. For instance, 66 genes coding for predicted enzymes involved in c-di-GMP turn-over are found in Shewanella oneidensis, 62 in V. cholerae, 41 in Pseudomonas aeruginosa, 29 in

39 Escherichia coli, 6 in Bacillus subtilis and 1 in Staphylococcus aureus. In fact, very little is known about c-di-GMP’s input in the regulation of phenotypes within Gram-positive bacteria. GdpS, the sole predicted staphylococcal GGDEF domain-containing protein, positively regulates biofilm formation in both S. aureus and S. epidermidis, and expression of protein A, a major virulence factor in S. aureus. However, GdpS does not appear to be an active DGC in vitro and its C-terminal GGDEF domain is not involved in these two phenotypes [16,17].

Although the enzymes producing and degrading c-di-GMP share customary domains in all bacteria, when known, the downstream effectors and pathways regulating the different phenotypical responses are usually different. c-di-GMP-sensing proteins containing the characterized c-di-GMP-binding PilZ domain [18] and some non-PilZ proteins are known c- di-GMP binding receptors. In E. coli and related bacteria, the PilZ domain-containing protein YcgR was found to decrease motility by interacting with the flagellar motor to control its direction and rotation speed upon binding of c-di-GMP [19,20]. In V. cholerae, VpsT, a key transcription regulator that inversely regulates the expression of genes associated with motility and biofilm formation, was only recently found to be active following binding of c-di-GMP to its atypical receiver domain [21]. Recently, the first c-di-GMP-binding riboswitches (c-di- GMP-I) have been discovered in the genomes of V. cholerae, C. difficile and other bacteria [22]. Riboswitch Cd1 of C. difficile is located upstream of the large operon coding for the synthesis of the flagellum and was found to have an “off” switch action on transcription in an in vitro transcription assay and in -galactosidase assays in B. subtilis [22]. Additionally, self- splicing of an unusual group I intron in C. difficile genome was found to be allosterically controlled by a c-di-GMP-II riboswitch aptamer, likely enabling the translation of a putative surface protein upon c-di-GMP binding [23]. Furthermore, 37 genes of C. difficile 630 that encode putative proteins containing GGDEF and/or EAL domains are available in the SignalCensus database [24]. Together, these observations suggest that c-di-GMP is a key signalling component in this emerging pathogen; yet studies on proteins regulating the intracellular c-di-GMP level, i.e. DGCs and PDEs, are still lacking.

40 The identification and characterization of enzymes producing or degrading c-di-GMP is a critical step to determine and understand the relevance of this second messenger in C. difficile’s lifecycle and virulence. In this study, we analyzed the prevalence and conservation of genes coding for putative DGCs and PDEAs in the genomes of 20 C. difficile isolates. Thirty-one conserved genes were assayed for their ability to encode functional DGC or PDEAs by evaluating the effect of their expression in V. cholerae. Most of these proteins conferred phenotypes that were consistent with their predicted function in our heterologous expression model. Our results indicate that, unlike the vast majority of Gram-positive bacteria including the Clostridiaceae, C. difficile regulates its intracellular c-di-GMP pools via a plethora of functional DGCs and PDEAs.

41 RESULTS

Domain composition and activity prediction of putative DGCs and PDEAs encoded by C. difficile 630.

Initial examination of the Pfam database 24.0 [25] for proteins involved in c-di-GMP turn- over in C. difficile 630, correlated to data provided by the NCBI’s SignalCensus database [24], reveals a total of 37 proteins containing a GGDEF (Pfam PF00990) and/or an EAL (Pfam PF00563) domain: 18 proteins have a GGDEF domain, one protein contains an EAL domain and 18 proteins have both GGDEF and EAL domains. Proteins containing both GGDEF and EAL domains have been shown to act either as DGCs or as PDEAs, or to exhibit both activities [14,26,27]. Furthermore, not all proteins containing a GGDEF or an EAL domain have been shown to exhibit DGC or PDEA activity. Several conserved amino acids in the GGDEF and EAL domains are predicted to be important to confer enzymatic activity [28,29], among which the highly conserved motifs GG[D/E]EF and EXLR, respectively. These motifs and other conserved amino acid residues were sought for in the primary sequences of the 37 putative c-di-GMP-signalling proteins encoded by the genome of C. difficile 630 by multiple sequence alignment (Figure 1). Briefly, among those 37 proteins, 15 are most likely active DGCs, 18 could be active PDEAs, 1 protein (CD0522) could either be an active DGC and/or PDEA, and 3 contain a predicted catalytically inactive GGDEF domain (Table S1). Except for CD0522, all the proteins with both GGDEF and EAL domains were predicted to be PDEAs having an inactive GGDEF partner domain since these have a degenerated GG[D/E]EF motif. Most of the 37 proteins are predicted to have at least one sensor domain, transmembrane regions or a signal peptide region.

42

Figure 1. Domain composition and organization of the 37 putative c-di-GMP-signalling proteins encoded by C. difficile 630. Proteins names are listed on the left and predicted sizes in amino acids are indicated on the right. GGDEF domains are shown in green and EAL domains are shown in red. The DGC PleD from C. crescentus (YP_002517919) and the PDEA RocR from P. aeruginosa (NP_252636) are shown as references to highlight amino acids important for enzymatic activity. The putative functions and positions of important amino

43 acid residues [28,29] are listed above each domain in black (conserved) or grey (non-conserved) and their position in PleD and RocR are indicated. Blue boxes are predicted signal peptide regions. Black boxes represent predicted transmembrane regions. Grey boxes represent predicted coiled-coil motifs. Additional sensor domains predicted by Pfam 24.0 are also shown in white. Proteins are not drawn to scale. Cache 1, Calcium channels and chemotaxis receptor family 1; PAS 3, PAS fold family 3; PTS EIIC, Phosphotransferase system EIIC; REC, Response regulator receiver; SBP bac 3, Bacterial extracellular solute-binding proteins, family 3; IP, primary inhibitory site; IS, secondary inhibitory site.

Distribution of the putative c-di-GMP signalling components in C. difficile

To assess whether c-di-GMP signalling components are conserved within the species C. difficile, orthologs of the proteins identified in strain 630 were exhaustively sought for in 19 other partially or completely sequenced C. difficile genomes using tblastn. Most genes found in C. difficile 630 are conserved in all the strains examined with 31 of the 37 GGDEF and/or EAL proteins having an ortholog in at least 15 strains and only CD2753 and CD2754 being unique to strain 630 (Figure 2A). The putative glycosyltransferase CD2545, the sole protein predicted to contain a PilZ domain (Pfam PF07238), has orthologs in all the other analyzed strains (Figure 2A). Furthermore, two RNA targets (named herein Cd1-a and 84Cd, respectively) that have been shown to bind c-di-GMP, the riboswitch Cd1 located upstream of the flagellum synthesis operon, and the self-splicing group I intron (tandem riboswitch- ribozyme) are conserved in a subset of strains [22,23]. The conservation pattern of the c-di- GMP regulatory proteins and downstream effectors clearly follows the phylogenetic distribution of the strains (Figure 2A and 2B). Interestingly, the cluster of 10 hypervirulent NAP1/BI/027 strains (cluster CD196/2007855 in Figure 2B) regroups the strains encoding the lowest number of c-di-GMP signalling components.

44

Figure 2. Conservation of c-di-GMP-signalling components of C. difficile 630 in other strains of C. difficile. (A) Protein orthologs exhibiting 90% identity are indicated by dark grey boxes. More divergent proteins sharing between 85 and 89% identity or containing major gaps are indicated by light grey boxes. Proteins that are absent are indicated by empty boxes. The c-di-GMP-sensing riboswitch Cd1 ((RS) Cd1) and 84Cd aptamer from the tandem riboswitch-ribozyme ((RZ) 84Cd) are indicated by a dark gray box when they shared 90% and were in the same genetic context as in strain 630. CdifQ2682, CdifA2673, and CdifA6827 were used to refer to

45 proteins CdifQCD-2_020200002682 (ZP_05400016) from C. difficile QCD-23m63, CdifA_020200002673 (ZP_05349638) and CdifA_020200006827 (ZP_05350459) from C. difficile ATCC 43255, respectively. (B) Phylogenetic tree of C. difficile isolates based on rpoB nucleotide sequences. Bootstrap values are indicated at branch point in percent. Only values 70% are shown.

Like CD2753 and CD2754 that are specific to C. difficile 630, we assumed that other strains could also encode c-di-GMP signalling proteins that were absent from all of the other strains.

The Microbial Signal Transduction database (MiST2) [30], which currently provides data for 14 out of our 19 strains, contains 3 putative c-di-GMP turn-over proteins that are absent from C. difficile 630’s proteome. Notably, CdifA_020200002673, is unique to strain ATCC43255 (Figure 2A), and is the only HD-GYP domain-containing protein detected in C. difficile. Based on analysis of amino acid conservation, it is predicted to have both DGC and PDE enzymatic activities (Figure S1). To identify additional strain-specific genes encoding c-di- GMP signalling proteins, we performed profile hidden-Markov model (profile HMM) searches with the Pfam HMMs for GGDEF, EAL, HD and PilZ conserved domains using HMMER3 software against the proteomes of strains 2007855, BI1, BI9, CF5, M120 and M68 as predicted by GeneMark.hmm (Table S2). No additional c-di-GMP-signalling proteins were detected. Besides CD2545, no other PilZ domain-containing protein was found. Finally, no GGDEF, EAL, HD-GYP or PilZ domain-containing proteins were found to be encoded by C. difficile plasmids pCD630, pCD6, pCDBI1 or the 300 kb putative phage or plasmid from BI1 (Table S2).

The remarkable prevalence of c-di-GMP-signalling components among C. difficile strains suggests that c-di-GMP is an important second messenger in this bacterium. The functionality of the 31 most conserved GGDEF and/or EAL domain proteins among C. difficile strains (Figure 2A) was therefore assessed to confirm their biological activity.

46 Heterologous expression of DGC and PDEA encoding genes in V. cholerae.

To the best of our knowledge, no model of Gram-positive bacteria is currently available to efficiently and reliably evaluate in vivo the enzymatic activity of proteins regulating the intracellular levels of c-di-GMP. Moreover, to circumvent the tedious laboratory procedures associated with working with C. difficile and the lack of information regarding the phenotypes regulated by c-di-GMP in this bacterium, assessment of the enzymatic activity of the 31 putative DGCs and PDEAs (cloned from C. difficile 630) was carried out by heterologous expression in V. cholerae. Characteristic phenotype alterations associated with variations of the intracellular c-di-GMP concentration of V. cholerae are easily observable and measurable. High levels of intracellular c-di-GMP concentrations increase biofilm formation and decrease motility whereas lower c-di-GMP concentrations cause the opposite effects [31]. As expected, ectopic expression of most of the predicted DGCs containing the canonical GG[D/E]EF motif decreased cell motility (Figure 3A) and increased biofilm formation (Figure 3C), supporting the hypothesis that these C. difficile proteins are genuine and functional DGCs (Table S1).

47

Figure 3. Phenotypic changes triggered by the expression of C. difficile 630 genes encoding predicted DGCs or PDEAs in V. cholerae N16961. Relative motility of cells over-expressing (A) predicted DGCs or (B) predicted PDEAs. Relative biofilm formation of mutant cells over-expressing (C) predicted DGCs or (D) predicted PDEAs. V. cholerae over- expressing lacZ was used as a reference control. The means and standard deviations obtained from at least three independent assays are shown. One-way ANOVA with a Dunnett multiple comparison post-test was used to compare the strains over-expressing putative c-di-GMP signalling proteins and the control strain (LacZ) (*, P<0.05; **, P<0.01; ***, P<0.001).

Replacement of the second glycine of the GGDEF motif by site-directed mutagenesis has been shown to greatly reduce the enzymatic activity of other DGCs [32,33]. Such a mutation in CD1420 (mutant G204E) abolished alterations of both biofilm and motility phenotypes (Figure 4A, 4B and 4C) confirming that the phenotypical alterations observed in V. cholerae are linked to the enzymatic activity of this functional C. difficile DGC.

48

Figure 4. Alteration of V. cholerae N16961 motility and biofilm formation upon expression of wild-type or mutated putative DGCs and PDEAs from C. difficile 630. (A) Relative motility upon expression of wild-type or mutated CD0757 or CD1420. (B) Relative biofilm expression upon expression of wild-type or mutated CD0757 or CD1420. (C) Motility of the strain expressing the wild-type (WT) and mutated proteins on LB soft agar plates at 30°C. The picture shown is representative of four independent experiments. (D) Schematic representation of CD0522 protein fragments expressed. The amino acids in the signature motifs of the GGDEF or EAL domains are indicated above each domain. Positions of the mutations in the GGDEF or EAL domains are indicated by asterisks. (E) Relative motility upon expression of full-length, mutated or truncated CD0522. (F) Relative biofilm formation upon expression of full-length, mutated or truncated CD0522. The means and standard deviations obtained from four independent assays are shown. One- way ANOVA with a Dunnett multiple comparison post-test was used to compare the cells over-expressing putative c-di-GMP signalling proteins and the control strain (LacZ) (*, P<0.05; **, P<0.01; ***, P<0.001).

Moreoever, the expression of CD1420 in V. cholerae N16961 led to a dramatic increase of the c-di-GMP level compared to the same strain expressing LacZ (Figure S2B and S2D, Text S1). Furthermore, we observed that CD1015, CD1185 and CD1420, the DGCs causing the strongest phenotypical shifts (Figure 3A and 3C), lacked the amino acid motif RXXD that is

49 part of the retro-inhibition site (I-site) of the GGDEF domain (Figure 1) [34]. The putative DGC CD2887 significantly enhanced biofilm formation without affecting the motility of V. cholerae. Interestingly, CD2887 contains a GGEEY motif instead of the canonical GG[D/E]EF motif suggesting that it might not be a functional DGC. However, Malone and colleagues [35] showed that upon substitution of the F amino acid residue for a Y residue in the A-site, the DGC WspR of Pseudomonas fluorescens retains its activity. Unlike the putative DGCs presented above, several predicted DGCs that appear to possess all the conserved amino acid residues necessary for c-di-GMP synthesis (e.g. CD2385 and CD3365) did not modulate biofilm formation or motility of V. cholerae in our experimental conditions. Yet we cannot conclude that these putative DGCs are not functional since they might not have been produced, may have been unstable or may lack the appropriate activating signal in the heterologous host. Unexpectedly, CD0537, which also has a canonical A-site, did not enhance biofilm formation but enhanced cell motility by ~60%. While this increase is notable compared to other DGCs, it remained very modest compared to the increase promoted by PDEAs (see below) suggesting that in our experimental setting, this putative DGC was not functional. The unexpected result on motility could result from partial sequestration of intracellular c-di-GMP by CD0537’s I-site upon overexpression of the protein. Additionally, the lack of apparent DGC activity of CD0537’s might simply be due to a lack of phosphorylation of its phosphoreceptor REC domain, a modification that could be catalyzed by the putative kinase cheA (CD0539) that was absent in our assay. Other DGCs have been shown to be activated by phosporylation of their phosphoreceptor REC domain [29,32,36]. As expected, CD1028 and CD2384, which respectively possess the degenerated QKDMI and GGEEI motifs, did not alter V. cholerae’s phenotypes. Consistent with our observation, the substitution of the F residue of the A-site for an I residue alone is known to eliminate the activity of WspR of P. fluorescens [35].

Expression of several putative PDEAs (CD0757, CD1515, CD1616, CD1840 and CD2663) significantly enhanced cell motility on soft agar by 2 to 4 fold (Figure 3B and Table S1).

50 While five putative PDEAs exhibited significant activity (CD0204, CD1421, CD1515, CD2134 and CD2663), in most cases, the impact of ectopic expression of these proteins on biofilm formation was modest and not significant (Figure 3D). The weak response of V. cholerae N16961 to PDEA activity could be due to its low basal level of biofilm formation in our assays. However, we indirectly confirmed that the V. cholerae motility response to overexpression of these proteins was linked to PDEA activity by mutating the glutamic acid residue of the EVLxR motif of CD0757, which is critical for enzymatic activity. Indeed, unlike the wild-type protein, overexpression of CD0757-E339A did not enhance the motility of V. cholerae N16961 on soft agar (Figure 4A and 4C).

CD0522 has dual enzymatic activity

Since CD0522 has a particular combination of GGDEF and EAL domains suggesting that it could have both DGC and PDEA activities, it was analyzed apart from those in Figure 3. CD0522 contains two predicted N-terminal GGDEF domains and one predicted C-terminal EAL domain (Figure 1 and 4D). Our analysis indicates that the first GGDEF domain and the EAL domain should be catalytically active due to the conservation of the A-sites. However, the second GGDEF domain contains the strongly degenerated motif YADVF suggesting that it is catalytically inactive (Figure 1 and 4D). CD0522 or its individual domains were tested in our V. cholerae heterologous expression model to verify these predictions (Figure 4C, 4E and 4F). Since the variations of cell motility and biofilm formation we observed upon ectopic expression of CD0522 were not statistically significant, we could not clearly establish a DGC activity for the complete CD0522. On the other hand, expression of the N-terminal fragment N-448, which encompasses the first GGDEF domain, reduced motility by more than half and increased biofilm by ~7 fold, suggesting that this fragment of CD0522 acts as a functional DGC (Figure 4E and 4F). The observed phenotypes correlate with a marked increase of intracellular c-di-GMP upon expression of N-448 in V. cholerae N16961 compared to the same strain expressing LacZ (Figure S2B and S2C, Text S1). DGC catalytic activity of

51 CD0522 was also indirectly confirmed by mutating the glutamic acid residue of the EAL motif to abolish any possible PDEA activity (Figure 4D). Expression of CD0522-E814A altered phenotypes as expected for a DGC, as observed for N-448 (Figure 4C, 4E and 4F). Expression of the C-terminal fragment C-307 led to a modest but significant increase of motility. As expected, expression of the degenerated GGDEF-containing central fragment M-305 did not lead to any significant change of phenotype compared to the control. The degenerated GGDEF domain of M-305 could be involved in the regulation of the PDEA activity of the EAL domain of CD0522. We observed that motility of cells expressing MC-591, which contains the central and C-terminal fragments, increased by ~50% (Figure 4E), which is consistent with a diminution of intracellular c-di-GMP. Increased motility was also observed when we overexpressed CD0522-G366E, a protein containing a substitution of the second glycine residue of the first GGDEF to abolish any DGC activity (Figure 4C and 4E). Complex proteins such as CD0522 that are composed of several GGDEF and EAL domains suggest a possible two-way c-di-GMP control and must be studied in detail to reveal what stimuli switches their enzymatic activity between the DGC or PDEA state.

In vitro enzymatic assays.

CD1420 and CD0757 enzymatic activities were further assessed in vitro to corroborate the results obtained in the V. cholerae expression model and confirm that the C. difficile proteins are genuine DGC and PDEA, respectively. These proteins were chosen for their strong activity in V. cholerae and the simplicity of the structure of the N-terminal sensor domain that suggested little requirements for in vitro assays (Figure 1). Purified CD1420 in its native form was able to produce c-di-GMP from GTP as substrate and the accumulation of the product increased with time (Figure 5A). Conversely, purified CD0757 did not produce c-di-GMP from GTP even after 1h incubation. Therefore, we confirmed that CD1420 is a functional DGC. The absence of DGC activity of CD0757 suggests that it contains an inactive GGDEF domain and acts as a PDEA onlyz

52

Figure 5. TLC analysis of the enzymatic activities of CD1420 and CD0757. (A) DGC activity of purified CD1420. (B) PDEA activity of purified CD0757. Snake venom phosphodiesterase (SVPD) was used as a positive control.

CD0757 was then assessed for PDEA activity on c-di-GMP. c-di-GMP hydrolysis by PDEAs is known to yield the linear diguanylate pGpG [37]. We incubated purified CD0757 with radiolabeled c-di-GMP, yielding small amounts of pGpG. This characteristic PDEA activity was abolished by denaturing the protein prior to the assay (Figure 5B). Inactive GGDEF domains have been shown to enhance PDEA activity of an adjacent EAL domain by binding GTP [26]. Addition of GTP to the enzymatic reaction increased noticeably the PDEA activity of CD0757 presumably through binding to the GGDEF domain like for PdeA (CC3396) from C. crescentus. After a 30-min incubation period, virtually all the c-di-GMP was converted to pGpG. Marginal degradation of c-di-GMP to GMP by CD0757 was detected as previously shown to occur with another PDEA [37].

53 DISCUSSION

Studies on c-di-GMP have addressed with some depth many aspects regarding the proteins involved in its synthesis (DGCs and PDEAs) and the molecular targets of c-di-GMP such as proteins and riboswitches in several bacteria. While c-di-GMP signalling has been extensively studied in many Gram-negative bacteria like C. crescentus, E. coli, V. cholerae, Salmonella and Pseudomonas, very few studies have been carried out on Gram-positive bacteria. To the best of our knowledge, the staphylococcal GGDEF domain protein GdpS has been the only c- di-GMP regulatory protein studied to date in low G+C Gram-positive bacteria. GdpS does not seem to have any measurable DGC activity [16]. The recent discovery of a functional c-di- GMP binding riboswitch in C. difficile and Bacillus cereus, as well as the prediction of several other similar riboswitches in other Gram-positive bacteria has revived the interest in studying c-di-GMP metabolism in these microorganisms. The recent characterization of a c-di-GMP- dependent self-splicing group I ribozyme in C. difficile further reinforces the role of c-di-GMP in Gram-positive bacteria.

In this work, we have shown that many of the genes encoding putative DGCs and PDEAs of C. difficile behave like genuine DGCs and PDEAs in heterologous expression experiments (Figure 3 and 4, Table S1). This number of c-di-GMP regulatory proteins encoded by C. difficile is high compared to what is found in its closest relatives (Table S3), and also among the Firmicutes in general (median=1) [38]. Analysis of the genomes of 49 strains of Clostridiaceae representing 27 species revealed that most contain less than 20 of such genes (Table S3). Only 2 species of Clostridium were found to encode more putative DGCs/PDEAs than C. difficile, Clostridium asparagiforme DSM15981 and Clostridium bolteae ATCC BAA-613, two newly characterized yet barely studied species isolated from human fecal samples [39,40]. The disparity in the occurrence of c-di-GMP signalling proteins is remarkable. The two species coding for the lowest number of c-di-GMP regulatory proteins, Clostridium hiranonis and Clostridium bartlettii, are the closest phylogenetically related

54 species to C. difficile (Figure S3 and Table S3). On the opposite, the two species coding for the highest number of GGDEF, EAL or HD-GYP protein, C. asparagiforme and C. bolteae, are among the most distant species from C. difficile. Additionally, C. difficile, which encodes with one exception no HD-GYP domain proteins, seems to be an exception among the Clostridiaceae and contrasts with Clostridium beijerinckii which encodes 14 HD-GYP domain proteins (Table S3), while retaining the same number of c-di-GMP regulatory proteins as C. difficile. In addition, C. difficile does not seem to carry any gene encoding c-di-GMP- signalling proteins that could have been recently exchanged by horizontal transfer with the 3 other Clostridium species containing the highest number of such proteins (Table S3 and data not shown). The Clostridiaceae seem to have a high number of c-di-GMP-signalling proteins among the Firmicutes in general, but it remains similar to other Firmicutes of comparable size (3000-4000 genes, median=10) [38].

The high number of c-di-GMP turn-over proteins in C. difficile is likely indicative of the importance of this second messenger in the bacterium’s lifecycle and suggests a major role in regulating different phenotypes. The diversity of N-terminal structures suggests that their function is not redundant. Instead, these proteins could individually act in a functionally or spatially sequestered way, in addition to being temporally regulated through differential expression. It has been shown that DGCs usually are not interchangeable and can contribute to very specific and distinct phenotypes for a unique microorganism. For example, while the DGC YddV of E. coli impacts poly-N-acetylglucosamine production, other DGCs like AdrA do not [41]. Instead AdrA controls the production of cellulose, another exopolysaccharide, in E. coli and Salmonella [42,43]. Furthermore, the prevalence of DGCs and PDEAs in C. difficile could also indicate the importance of these proteins in sensing and relaying a diversified array of environmental conditions through their sensor domains or their eventual differential expression. In V. cholerae, the PDEA CdpA is not expressed in vivo until the late stage of infection in a mouse colonization model [44]. C. difficile vegetative cells encounter various environmental conditions during their journey through the gastrointestinal tract, during

55 which c-di-GMP signalling might play a role in regulating diverse phenotypes.

Despite the current lack of experimental data, it is reasonable to assume that c-di-GMP regulates at least two phenotypes in C. difficile: flagella synthesis/motility and polysaccharide synthesis. A putative c-di-GMP-binding PilZ domain is located in the putative glycosyltransferase CD2545, which is predicted to be a cellulose synthase. Interestingly, although motility is commonly controlled by c-di-GMP in bacteria, the c-di-GMP-responsive effectors that likely regulate this phenotype in C. difficile appear to differ from those found in V. cholerae, E. coli and related bacteria. The c-di-GMP-sensing riboswitch Cd1 appears to control the transcription of the large operon of genes essential for assembling the flagellum apparatus [22]. Bacterial flagella are obviously important for motility but can also be involved in adhesion. Adhesion to mouse cecal mucus of the flagellin FliC and of the flagellum cap protein FliD of C. difficile has been demonstrated in vitro [45]. Moreover, FliD has been shown to specifically adhere to cultured cells [45]. Therefore, c-di-GMP signalling might impact both cell motility and adhesion of C. difficile to mucosal surface through the regulation of flagellum assembly. Additionally, c-di-GMP signalling may play a significant role in the excessive inflammation caused by C. difficile infection since flagellin is a very potent immunogenic protein recognized as a proinflammatory ligand by toll-like receptor 5 (TLR-5) located at the baso-lateral surface of intestinal cells (reviewed in [46]).

Except for one EAL protein (CD3650), all of C. difficile’s PDEAs contain a GGDEF domain predicted to be non-catalytically active as shown in vitro for CD0757 (Figure 1 and 5). Composite proteins containing both GGDEF and EAL domains are relatively frequent, representing approximately one third of proteins with such domains [47]. Some of these composite proteins act as DGCs, PDEAs, or have both activities (reviewed in [47]). Inactive GGDEF or EAL domains can act as sensor domains rather than catalytic domains. GGDEF domain proteins with degenerated active sites have been reported to bind c-di-GMP at their

56 conserved I-site, as for the C. crescentus protein PopA [48], or to retain the ability to bind GTP at their degenerated active site, as for the C. crescentus protein PdeA [26]. Christen and colleagues [26] have demonstrated that binding of GTP to the inactive GGDEF domain of PdeA of C. crescentus, strongly enhanced the PDEA activity of the C-terminal EAL domain. The authors formulated two hypotheses to explain why the PDEA activity is linked to GTP intracellular concentrations: (i) to prevent GTP pools to drop by the uncontrolled successive activities of DGCs and PDEAs and (ii) to sense physiological changes. Intracellular GTP levels have been shown to impact the activity of CodY, a major transcriptional regulator in many low G+C Gram-positive bacteria such as C. difficile, S. aureus, Streptococcus pneumoniae, Streptococcus pyogenes, Streptococcus mutans, Listeria monocytogenes, B. cereus and Bacillus anthracis in which it affects virulence gene expression ([49] and references therein). CodY is known to have greater affinity to target DNA promoter regions upon binding of two synergistic effectors, GTP and branched-chain amino acids [50,51]. C. difficile CodY has been shown to repress the expression of the toxin A and B genes (tcdA and tcdB), through binding to the promoter region of the positive transcriptional regulator TcdR [52]. A recent study aimed at identifying all DNA promoter regions targeted by C. difficile CodY as well as genes differentially expressed in a codY null mutant [49]. Interestingly, among the 165 genes identified with altered expression, PDEA genes CD0757 and CD1476 were highly derepressed. Additionally, DNA regions containing CD1476, CD2385, CD2873, CD2965 and CD3650 were identified as CodY binding-sites. These data suggest a probable interplay between the c-di-GMP and CodY signalling pathways, known to be important in the regulation of many metabolic genes and of the major virulence factors, toxins A and B [49,52].

To the best of our knowledge, no model of Gram-positive bacteria is currently available to efficiently and reliably evaluate the enzymatic activity of proteins regulating the intracellular levels of c-di-GMP. With the recent availability of molecular tools for C. difficile genetic manipulation [53-55], it will finally be possible to study in detail the many genes involved in

57 c-di-GMP signalling and turn-over in this bacterium and to identify the phenotypes associated with the variation of intracellular c-di-GMP pools. The need to decipher the regulatory mechanisms underlying C. difficile’s behaviors is imperative to the development of new therapeutics and treatment strategies. Particularly, the bacterial signalling pathways and phenotypes involved at the colon mucosal interface ought to be addressed.

58 MATERIALS AND METHODS

Bioinformatics

Proteins containing the c-di-GMP-associated conserved domains (GGDEF, EAL, HD for HD- GYP and PilZ) were searched for in the Clostridiaceae proteomes on the Pfam 24.0 server [25]. HD domains were further analyzed to identify HD-GYP domains by looking for the HD- GYP amino acid motif by multiple alignment with the HD-GYP domain of Rpfg from Xanthomonas campestris 8004 (Accession number AAY49388) using ClustalW version 2.0.12 [56]. Other conserved domains, signal peptides, transmembrane regions, and coiled-coil motifs annotations are as determined on Pfam 24.0 [25]. Identification of proteins with c-di-GMP- associated conserved domains in C. difficile strains other than 630 was achieved with the hmmsearch program of the HMMER 3.0 software (http://hmmer.org/). C. difficile annotated protein sequences were retrieved for the 13 other strains and 2 plasmids available in the NCBI Refseq database (Table S2) [57]. Protein sequences from C. difficile 2007855, BI1, BI9, CF5, M120 and M68 genomes and extrachromosomal sequences (Table S2) were predicted using GeneMark.hmm for Prokaryotes version 2.4 [58]. Profile hidden Markov models (profile- HMMs) of c-di-GMP-associated conserved domains were downloaded from Pfam 24.0. The bit score threshold values used in every search were the “trusted cutoff” values for the Pfam profile-HMMs. Proteins containing the c-di-GMP-related conserved domains identified using HMMER 3.0 software were further analyzed to identify other conserved domains (Pfam 24.0), signal peptides and transmembrane regions (Phobius [59]), and coiled-coil motifs (ncoils [60]). Nucleotide and amino acid conservation of selected C. difficile 630 genes and proteins were assessed with the appropriate BLAST algorithms [61]. Since most of the genomes are drafts, pseudogenes were ignored and assumed to be the results of sequencing errors. As a matter of fact, pseudogenes are found in many of these strains even for important, unique and well-conserved genes such as the gene encoding DNA polymerase I (data not shown). Phylogenetic trees were generated using the neighbor-joining method as implemented by

59 ClustalX version 2.012 [56] from gapless alignments of nucleotide sequences. Nucleotide sequences were aligned using ClustalW version 2.0.12 [56] and gap columns were removed using Jalview version 2.5 multiple alignment editor [62]. The reliability of each tree was subjected to a bootstrap test with 1000 replications. Trees were edited using FigTree version 1.3.1 (http://tree.bio.ed.ac.uk/software/figtree/).

Growth conditions

Bacterial strains were routinely grown in Luria–Bertani (LB) broth at 37°C in an orbital shaker and maintained at 80°C in LB broth containing 15% (v/v) glycerol. Ampicillin (Ap) was used at 100 g ml1 when needed. For induction of gene expression in the strains carrying arabinose-inducible vectors (pBAD series), L-arabinose was added to the growth medium at a final concentration of 0.02% (w/v).

Bacterial strains and plasmid construction

The bacterial strains and plasmids used in this study are described in Table S4. The oligonucleotides used for plasmid constructions are described in Table S5. For expression of putative DGCs and PDEAs in V. cholerae, genes cloned in pBAD-TOPO were amplified by PCR with their native Shine-Dalgarno sequence using C. difficile 630 genomic DNA as a template. Truncated versions of CD0522 were cloned to include the native Shine-Dalgarno sequence of CD0522. DNA was amplified to express CD0522 protein fragments N-448, M- 305, C-307 and MC-591 respectively containing the N-terminal 448 amino acids (aa), 305 aa encompassing the middle domain, the 307 aa in C-terminal and 591 aa encompassing the middle and C-terminal domains (Figure 4D). Plasmids pCD0522-G366E, pCD0522-E814A, pCD0757-E339A and pCD1420-G204E, which

60 accordingly contain amino acid substitutions in their respective conserved GG[D/E]EF or EXL A-sites, were created by site-directed mutagenesis of pCD0522, pCD0757 and pCD1420 using the QuickChange Lightning Site-directed Mutagenesis Kit (Stratagene) using primer pairs listed in Table S5. The mutations introduced were designed to create new EarI or PvuII restriction sites for initial screening of the mutated plasmids. Mutated genes were verified by sequencing.

For CD0757 and CD1420 proteins purification, the corresponding genes were amplified by PCR from C. difficile 630 genomic DNA and cloned into BamHI/SalI-digested pGEX6P-1 in frame with the glutathione S-transferase (GST) coding sequence.

Molecular biology methods

All the enzymes used in this study were obtained from New England BioLabs and were used according to the manufacturer's instructions. Plasmid DNA was prepared with a Qiaprep Spin miniprep kit (Qiagen). Genomic DNA of C. difficile 630 was extracted using the illustra bacteria GenomicPrep mini spin kit (GE Healthcare). PCR assays were performed with the primers described in Table S4 in 50 l of PCR mixtures with 1 U of Pfu Ultra DNA polymerase (Agilent). PCR conditions were as follows: (i) 3 min at 94 °C, (ii) 30 cycles of 30 s at 94 °C, 30 s at suitable annealing temperature, and 30-300 s at 72 °C, and (iii) 5 min at 72 °C. When needed PCR products were purified using a QIAquick PCR Purification Kit (Qiagen) according to the manufacturer's instructions. E. coli was transformed by electroporation according to Dower and colleagues [63]. V. cholerae was transformed by electroporation according to Occhino and colleagues [64]. In both cases, transformation was carried out in 0.1 cm electroporation cuvettes using a Bio-Rad GenePulser Xcell apparatus set at 25 F, 200 and 1.8 kV.

61

Motility and biofilm formation assays

Motility and biofilm assays were performed as described before [32]. Briefly, a semi-solid medium composed of 1% tryptone, 0.5% NaCl, 0.3% agar supplemented with ampicillin and L-arabinose was used to evaluate motility of V. cholerae mutant strains during over- expression assays at 30°C. Motility was assessed from the comparison of the surface area (mm2) of the colonies from plate images captured and analyzed using a Gel Doc XR system and Quantity One software (Bio-Rad). The capacity of V. cholerae mutant strains to form biofilm was determined after 6h static growth in LB broth containing ampicillin and L- arabinose at 30°C. Bound crystal violet was solubilized with 200 l of 95% ethanol and quantified by absorbance at 595 nm in a Model 680 microplate reader (Bio-Rad). Motility and biofilm formation assays were carried in triplicate and data were normalized as fold expression compared with the control LacZ over-expressing bacteria. Data from at least three independent experiments were combined.

Production of recombinant proteins

Overnight-grown cultures of E. coli BL21 bearing pGCD0757 or pGCD1420 were diluted 1:100 in fresh 2 YTA broth and incubated at 37°C with agitation. Protein expression was induced with 0.1 mM IPTG (isopropyl 1-thio--D-galactopyranoside) at mid-exponential phase (OD600 of 0.6) for CD0757 or at late-exponential phase (OD600 of 1.2) for CD1420. The cultures were grown for an additional 4 h at 37°C for CD0757 or 2 h at 25°C for CD0757. Cells were collected by centrifugation, re-suspended in PBS containing 1% Triton X-100 and protease inhibitors (Protease Inhibitor Cocktail, Sigma), and lysed by sonication. CD0757 and CD1420 were recovered by affinity chromatography using the GST purification module (GE Healthcare) with the PreScission protease (GE Healthcare) according to the manufacturer's

62 instructions. After elution, proteins samples were dialyzed against the conservation buffer

(50 mM Tris-HCl pH 7.8, 250 mM NaCl, 25 mM KCl, 10 mM MgCl2, 30% glycerol) for 18 h in D-Tube Dialyzer Maxi (MWCO 12-14, Novagen), concentrated by centrifugation on Amicon Ultra-15 columns (MWCO 10, Millipore), and stored at 20°C. Protein concentration was estimated using a BCA Protein Assay Kit (ThermoScientific) and purity was determined by SDS-PAGE analysis.

Assays for enzymatic activity and TLC analysis

Diguanylate cyclase and phosphodiesterase activities were measured according to previously described procedures [32,37] with the following modifications. Diguanylate cyclase assays were performed with approximately 1-2 g of purified proteins in a final volume of 50 l. Reaction mixtures were pre-incubated for 5 min at 30°C in the reaction buffer (50 mM Tris-

HCl pH 7.8, 250 mM NaCl, 25 mM KCl, 10 mM MgCl2). DGC reactions were initiated by adding 33.3 nM [-33P]-GTP (0.1 Ci l1) and incubated at 30°C. Samples were taken at various times, and the reactions were stopped by addition of one volume 0.5 M EDTA. Radiolabeled c-di-GMP for phosphodiesterase activity assays was synthesized using purified DgcK [32]. Purified DgcK (30 g) was incubated 8 h at 30°C in the reaction buffer to completely convert [-33P]-GTP into c-di-GMP. Reactions were stopped by denaturing at 99°C for 15 min, centrifuged for 2 min at 16,000g to elimate DgcK and recover the supernatant containing the radiolabeled c-di-GMP. Phosphodiesterase assays were performed with approximately 1-2 g of purified proteins in a final volume of 50 l of reaction buffer containing 20 nM prepared radiolabeled c-di-GMP (0.1 Ci l1) with or without 100 M GTP. One unit of snake venom phosphodiesterase (Phosphodiesterase I, Worthington) suspended in SVPD conservation buffer (100 mM Tris-HCl pH 8.0, 100 mM NaCl, 14 mM

MgCl2, 50% glycerol) was used as a positive control in PDEA assays. Proteins denatured at 99°C for 15 min were used as negative controls in both DGC and PDEA assays. Reaction products were analyzed by TLC as described before [32]. Briefly aliquots (2-4 l) were

63 spotted on polyethyleneimine-cellulose TLC plates (Sigma) previously washed in 0.5 M LiCl and air dried. Plates were then soaked for 5 min in methanol, dried, and developed in 2:3 (v/v) saturated (NH4)2SO4/1.5 M KH2PO4 (pH 3.5). Plates were allowed to dry prior to exposition to a phosphor imaging screen (Molecular Dynamics). Data were collected and analyzed using a FX molecular imager and the Quantity One software (Bio-Rad).

64 ACKNOWLEDGMENTS

The authors are grateful to Alain Lavigueur and Daniela Ceccarelli for helpful comments on the manuscript, to Francis Gaudreault for his help with bioinformatics, and Michel Trottier and Patrick Allard for their technical help with HPLC experiments.

65 REFERENCES

1. Gerding DN (2010) Global epidemiology of Clostridium difficile infection in 2010. Infect Control Hosp Epidemiol 31 Suppl 1: S32-34. 2. Kuijper EJ, Coignard B, Tull P (2006) Emergence of Clostridium difficile-associated disease in North America and Europe. Clin Microbiol Infect 12 Suppl 6: 2-18. 3. Loo VG, Poirier L, Miller MA, Oughton M, Libman MD, et al. (2005) A predominantly clonal multi-institutional outbreak of Clostridium difficile-associated diarrhea with high morbidity and mortality. N Engl J Med 353: 2442-2449. 4. Jenal U, Malone J (2006) Mechanisms of cyclic-di-GMP signaling in bacteria. Annu Rev Genet 40: 385-407. 5. Romling U, Amikam D (2006) Cyclic di-GMP as a second messenger. Curr Opin Microbiol 9: 218-228. 6. Aldridge P, Paul R, Goymer P, Rainey P, Jenal U (2003) Role of the GGDEF regulator PleD in polar development of Caulobacter crescentus. Mol Microbiol 47: 1695-1708. 7. Paul R, Weiser S, Amiot NC, Chan C, Schirmer T, et al. (2004) Cell cycle-dependent dynamic localization of a bacterial response regulator with a novel di-guanylate cyclase output domain. Genes Dev 18: 715-727. 8. Kim YR, Lee SE, Kim CM, Kim SY, Shin EK, et al. (2003) Characterization and pathogenic significance of Vibrio vulnificus antigens preferentially expressed in septicemic patients. Infect Immun 71: 5461-5471. 9. Kulasakara H, Lee V, Brencic A, Liberati N, Urbach J, et al. (2006) Analysis of Pseudomonas aeruginosa diguanylate cyclases and phosphodiesterases reveals a role for bis-(3'-5')-cyclic-GMP in virulence. Proc Natl Acad Sci U S A 103: 2839-2844. 10. Merkel TJ, Stibitz S, Keith JM, Leef M, Shahin R (1998) Contribution of regulation by the bvg locus to respiratory infection of mice by Bordetella pertussis. Infect Immun 66: 4367- 4373. 11. Tischler AD, Camilli A (2005) Cyclic diguanylate regulates Vibrio cholerae virulence gene expression. Infect Immun 73: 5873-5882.

66 12. Hecht GB, Newton A (1995) Identification of a novel response regulator required for the swarmer-to-stalked-cell transition in Caulobacter crescentus. J Bacteriol 177: 6223-6229. 13. Galperin MY, Natale DA, Aravind L, Koonin EV (1999) A specialized version of the HD hydrolase domain implicated in signal transduction. J Mol Microbiol Biotechnol 1: 303- 305. 14. Tal R, Wong HC, Calhoon R, Gelfand D, Fear AL, et al. (1998) Three cdg operons control cellular turnover of cyclic di-GMP in Acetobacter xylinum: genetic organization and occurrence of conserved domains in isoenzymes. J Bacteriol 180: 4416-4425. 15. Romling U, Gomelsky M, Galperin MY (2005) C-di-GMP: the dawning of a novel bacterial signalling system. Mol Microbiol 57: 629-639. 16. Holland LM, O'Donnell ST, Ryjenkov DA, Gomelsky L, Slater SR, et al. (2008) A staphylococcal GGDEF domain protein regulates biofilm formation independently of cyclic dimeric GMP. J Bacteriol 190: 5178-5189. 17. Shang F, Xue T, Sun H, Xing L, Zhang S, et al. (2009) The Staphylococcus aureus GGDEF domain-containing protein, GdpS, influences protein A gene expression in a cyclic diguanylic acid-independent manner. Infect Immun 77: 2849-2856. 18. Benach J, Swaminathan SS, Tamayo R, Handelman SK, Folta-Stogniew E, et al. (2007) The structural basis of cyclic diguanylate signal transduction by PilZ domains. EMBO J 26: 5153-5166. 19. Paul K, Nieto V, Carlquist WC, Blair DF, Harshey RM (2010) The c-di-GMP binding protein YcgR controls flagellar motor direction and speed to affect chemotaxis by a "backstop brake" mechanism. Mol Cell 38: 128-139. 20. Ryjenkov DA, Simm R, Romling U, Gomelsky M (2006) The PilZ domain is a receptor for the second messenger c-di-GMP: the PilZ domain protein YcgR controls motility in enterobacteria. J Biol Chem 281: 30310-30314. 21. Krasteva PV, Fong JC, Shikuma NJ, Beyhan S, Navarro MV, et al. (2010) Vibrio cholerae VpsT regulates matrix production and motility by directly sensing cyclic di-GMP. Science 327: 866-868. 22. Sudarsan N, Lee ER, Weinberg Z, Moy RH, Kim JN, et al. (2008) Riboswitches in

67 eubacteria sense the second messenger cyclic di-GMP. Science 321: 411-413. 23. Lee ER, Baker JL, Weinberg Z, Sudarsan N, Breaker RR (2010) An allosteric self-splicing ribozyme triggered by a bacterial second messenger. Science 329: 845-848. 24. Galperin MY, Higdon R, Kolker E (2010) Interplay of heritage and habitat in the distribution of bacterial signal transduction systems. Mol Biosyst 6: 721-728. 25. Finn RD, Mistry J, Tate J, Coggill P, Heger A, et al. (2010) The Pfam protein families database. Nucleic Acids Res 38: D211-222. 26. Christen M, Christen B, Folcher M, Schauerte A, Jenal U (2005) Identification and characterization of a cyclic di-GMP-specific phosphodiesterase and its allosteric control by GTP. J Biol Chem 280: 30829-30837. 27. Kumar M, Chatterji D (2008) Cyclic di-GMP: a second messenger required for long-term survival, but not for biofilm formation, in Mycobacterium smegmatis. Microbiology 154: 2942-2955. 28. Rao F, Yang Y, Qi Y, Liang ZX (2008) Catalytic mechanism of cyclic di-GMP-specific phosphodiesterase: a study of the EAL domain-containing RocR from Pseudomonas aeruginosa. J Bacteriol 190: 3622-3631. 29. Wassmann P, Chan C, Paul R, Beck A, Heerklotz H, et al. (2007) Structure of BeF3-- modified response regulator PleD: implications for diguanylate cyclase activation, catalysis, and feedback inhibition. Structure 15: 915-927. 30. Ulrich LE, Zhulin IB (2010) The MiST2 database: a comprehensive genomics resource on microbial signal transduction. Nucleic Acids Res 38: D401-407. 31. Beyhan S, Tischler AD, Camilli A, Yildiz FH (2006) Transcriptome and phenotypic responses of Vibrio cholerae to increased cyclic di-GMP level. J Bacteriol 188: 3600- 3613. 32. Bordeleau E, Brouillette E, Robichaud N, Burrus V (2010) Beyond antibiotic resistance: integrating conjugative elements of the SXT/R391 family that encode novel diguanylate cyclases participate to c-di-GMP signalling in Vibrio cholerae. Environ Microbiol 12: 510- 523. 33. Kirillina O, Fetherston JD, Bobrov AG, Abney J, Perry RD (2004) HmsP, a putative

68 phosphodiesterase, and HmsT, a putative diguanylate cyclase, control Hms-dependent biofilm formation in Yersinia pestis. Mol Microbiol 54: 75-88. 34. Christen B, Christen M, Paul R, Schmid F, Folcher M, et al. (2006) Allosteric control of cyclic di-GMP signaling. J Biol Chem 281: 32015-32024. 35. Malone JG, Williams R, Christen M, Jenal U, Spiers AJ, et al. (2007) The structure- function relationship of WspR, a Pseudomonas fluorescens response regulator with a GGDEF output domain. Microbiology 153: 980-994. 36. Hickman JW, Tifrea DF, Harwood CS (2005) A chemosensory system that regulates biofilm formation through modulation of cyclic diguanylate levels. Proc Natl Acad Sci U S A 102: 14422-14427. 37. Tamayo R, Tischler AD, Camilli A (2005) The EAL domain protein VieA is a cyclic diguanylate phosphodiesterase. J Biol Chem 280: 33324-33330. 38. Seshasayee AS, Fraser GM, Luscombe NM (2010) Comparative genomics of cyclic-di- GMP signalling in bacteria: post-translational regulation and catalytic activity. Nucleic Acids Res. 39. Mohan R, Namsolleck P, Lawson PA, Osterhoff M, Collins MD, et al. (2006) Clostridium asparagiforme sp. nov., isolated from a human faecal sample. Syst Appl Microbiol 29: 292-299. 40. Song Y, Liu C, Molitoris DR, Tomzynski TJ, Lawson PA, et al. (2003) Clostridium bolteae sp. nov., isolated from human sources. Syst Appl Microbiol 26: 84-89. 41. Tagliabue L, Antoniani D, Maciag A, Bocci P, Raffaelli N, et al. (2010) The diguanylate cyclase YddV controls production of the exopolysaccharide poly-N-acetylglucosamine (PNAG) through regulation of the PNAG biosynthetic pgaABCD operon. Microbiology 156: 2901-2911. 42. Garcia B, Latasa C, Solano C, Garcia-del Portillo F, Gamazo C, et al. (2004) Role of the GGDEF protein family in Salmonella cellulose biosynthesis and biofilm formation. Mol Microbiol 54: 264-277. 43. Zogaj X, Nimtz M, Rohde M, Bokranz W, Romling U (2001) The multicellular morphotypes of Salmonella typhimurium and Escherichia coli produce cellulose as the

69 second component of the extracellular matrix. Mol Microbiol 39: 1452-1463. 44. Tamayo R, Schild S, Pratt JT, Camilli A (2008) Role of cyclic Di-GMP during el tor biotype Vibrio cholerae infection: characterization of the in vivo-induced cyclic Di-GMP phosphodiesterase CdpA. Infect Immun 76: 1617-1627. 45. Tasteyre A, Barc MC, Collignon A, Boureau H, Karjalainen T (2001) Role of FliC and FliD flagellar proteins of Clostridium difficile in adherence and gut colonization. Infect Immun 69: 7937-7940. 46. Vijay-Kumar M, Gewirtz AT (2009) Flagellin: key target of mucosal innate immunity. Mucosal Immunol 2: 197-205. 47. Schirmer T, Jenal U (2009) Structural and mechanistic determinants of c-di-GMP signalling. Nat Rev Microbiol 7: 724-735. 48. Duerig A, Abel S, Folcher M, Nicollier M, Schwede T, et al. (2009) Second messenger- mediated spatiotemporal control of protein degradation regulates bacterial cell cycle progression. Genes Dev 23: 93-104. 49. Dineen SS, McBride SM, Sonenshein AL (2010) Integration of metabolism and virulence by Clostridium difficile CodY. J Bacteriol 192: 5350-5362. 50. Ratnayake-Lecamwasam M, Serror P, Wong KW, Sonenshein AL (2001) Bacillus subtilis CodY represses early-stationary-phase genes by sensing GTP levels. Genes Dev 15: 1093- 1103. 51. Shivers RP, Sonenshein AL (2004) Activation of the Bacillus subtilis global regulator CodY by direct interaction with branched-chain amino acids. Mol Microbiol 53: 599-611. 52. Dineen SS, Villapakkam AC, Nordman JT, Sonenshein AL (2007) Repression of Clostridium difficile toxin gene expression by CodY. Mol Microbiol 66: 206-219. 53. Heap JT, Cartman ST, Kuehne SA, Cooksley C, Minton NP (2010) ClosTron-targeted mutagenesis. Methods Mol Biol 646: 165-182. 54. Heap JT, Kuehne SA, Ehsaan M, Cartman ST, Cooksley CM, et al. (2010) The ClosTron: Mutagenesis in Clostridium refined and streamlined. J Microbiol Methods 80: 49-55. 55. Heap JT, Pennington OJ, Cartman ST, Minton NP (2009) A modular system for Clostridium shuttle plasmids. J Microbiol Methods 78: 79-85.

70 56. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947-2948. 57. Pruitt KD, Tatusova T, Maglott DR (2007) NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res 35: D61-65. 58. Lukashin AV, Borodovsky M (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res 26: 1107-1115. 59. Kall L, Krogh A, Sonnhammer EL (2007) Advantages of combined transmembrane topology and signal peptide prediction--the Phobius web server. Nucleic Acids Res 35: W429-432. 60. Lupas A, Van Dyke M, Stock J (1991) Predicting coiled coils from protein sequences. Science 252: 1162-1164. 61. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410. 62. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ (2009) Jalview Version 2-- a multiple sequence alignment editor and analysis workbench. Bioinformatics 25: 1189- 1191. 63. Dower WJ, Miller JF, Ragsdale CW (1988) High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res 16: 6127-6145. 64. Occhino DA, Wyckoff EE, Henderson DP, Wrona TJ, Payne SM (1998) Vibrio cholerae iron transport: haem transport genes are linked to one of two sets of tonB, exbB, exbD genes. Mol Microbiol 29: 1493-1507.

71 Supporting information c-di-GMP turn-over in C. difficile is controlled by a plethora of diguanylate cyclases and phosphodiesterases

Eric Bordeleau1,3, Louis-Charles Fortier2,3, François Malouin1,3 and Vincent Burrus1,3*

1 Département de biologie, Faculté des sciences, Université de Sherbrooke, Sherbrooke, Québec, Canada 2 Département de microbiologie et d’infectiologie, Faculté de médecine et sciences de la santé, Université de Sherbrooke, Sherbrooke, Québec, Canada 3 Centre d'étude et de valorisation de la diversité microbienne (CEVDM), Université de Sherbrooke, Sherbrooke, Québec, Canada * Corresponding author

72 MATERIALS AND METHODS

Nucleotide extraction and HPLC analysis

Intracellular c-di-GMP amounts were analysed as previously described [1] with the following modifications. Briefly, overnight-grown cultures were diluted 1/200 in fresh LB and incubated at 37°C with agitation to OD600 of 0.9. Protein expression was induced with 0.02% arabinose and the cultures were grown for one additional hour. Formaldehyde was added to a final concentration of 0.18% to the cultures which were then chilled on ice for 10 min before pelleting the cells by centrifugation. Cell pellets were resuspended in water and heated to 95°C for 15 min. Samples were cooled on ice for 10 min and cold ethanol was added to a final 65% concentration to extract the nucleotides. Samples were centrifuged for 10 min at 16 000 g at 4 °C to pellet cell debris. The pellet was extracted a second time with 65 % ethanol and supernatants were pooled for lyophilisation. Lyophilised samples were dissolved in 300 l of 50 mM triethyl ammonium bicarbonate (TEAB) buffer (pH 7.0, pH adjusted using acetic acid). Filtered samples were analyzed by ion-pair reversed-phase high-performance chromatography (HPLC). Analysis was performed using an LC1260 infinity HPLC system equipped with a diode array detector (Agilent). Separation of 2 l of each sample was achieved on a Zorbax SB-C18 column (4.6 x 150 mm, 3.5 m) with a combination of 50 mM TEAB (eluent A) and acetonitrile (eluent B) at a constant 0.8 mL/min flow rate with monitoring at 254nm. 100% eluent A was used from 0 to 3 min followed by a linear gradient from 100% to 85% eluent A until 15 min. A second linear gradient from 85% to 50 % eluent A, from 15 to 18 min, followed by a 6-min post-run was used for re-equilibration of the column. Synthetic c-di-GMP (BIOLOG, Bremen, Germany) was used as a standard.

73 REFERENCES

1. Ueda A, Wood TK (2009) Connecting quorum sensing, c-di-GMP, pel polysaccharide, and biofilm formation in Pseudomonas aeruginosa through tyrosine phosphatase TpbA (PA3885). PLoS Pathog 5: e1000483.

74 Table S1. Predicted and observed activities of the 31 most conserved putative c-di-GMP turn-over proteins.

Protein GGDEF EAL Predicted Observed activitya name motif motif activity Motility Biofilm CD0537 GGEEF - DGC + 0 CD0707 GGDEF - DGC -- + CD0997 GGDEF - DGC -- 0 CD1015 GGDEF - DGC -- ++ CD1028 QKDMI - none 0 0 CD1185 GGEEF - DGC -- ++ CD1419 GGDEF - DGC - + CD1420 GGDEF - DGC -- ++ CD1841 GGDEF - DGC -- 0 CD2244 GGEEF - DGC -- 0 CD2384 GGEEI - none 0 0 CD2385 GGEEF - DGC 0 0 CD2887 GGEEY - none 0 ++ CD3365 GGDEF - DGC 0 0 CD0204 DADLF ESLxR PDEA 0 - CD0522 GGDEF EALxR DGC/PDEA 0 b 0 b YADVF CD0748 SGEEF EALxR PDEA 0 0 CD0757 DGDEF EVLxR PDEA ++ 0 CD0811 GGDIF EALxR PDEA 0 0 CD1421 KGEGF EALxR PDEA 0 - CD1515 DGDEM EVLxR PDEA ++ - CD1538 DNDHF EALxR PDEA 0 - CD1616 SADNF EALxR PDEA ++ 0

75 CD1651 IADRF EALxR PDEA + 0 CD1840 SADNF EALxR PDEA + 0 CD2134 GSDNF EALxR PDEA - -- CD2663 SNDNF EALxR PDEA ++ - CD2760 KSDVF EALxR PDEA + 0 CD2873 SNDGF EALxR PDEA 0 0 CD2965 SSDYF EALxR PDEA 0 0 CD3650 - EGLxR PDEA 0 0 a Observed activity upon over-expression in V. cholerae N16961. + indicates an increase, 0 indicates no significant variation and - indicates a decrease. b Activity observed for the complete wild-type protein.

76 Table S2. Accession numbers of genomes of C. difficile strains used in this study.

Strain Sequence Accession # 630 complete genome NC_009089 630 pCD630 NC_008226 CD6 pCD6 NC_005326 ATCC 43255 whole genome shotgun sequence NZ_ABKJ00000000 CD196 complete genome NC_013315 CIP 107932 whole genome shotgun sequence NZ_ABKK00000000 NAP07 whole genome shotgun sequence NZ_ADVM00000000 NAP08 whole genome shotgun sequence NZ_ADNX00000000 QCD-23m63 whole genome shotgun sequence NZ_ABKL00000000 QCD-32g58 whole genome shotgun sequence NZ_AAML00000000 QCD-37x79 whole genome shotgun sequence NZ_ABHG00000000 QCD-63q42 whole genome shotgun sequence NZ_ABHD00000000 QCD-66c26 whole genome shotgun sequence NZ_ABFD00000000 QCD-76w55 whole genome shotgun sequence NZ_ABHE00000000 QCD-97b34 whole genome shotgun sequence NZ_ABHF00000000 R20291 complete genome NC_013316 2007855 complete genome FN665654a BI1 complete genome FN668941a BI1 Plasmid pCDBI1 FN668942 a BI1 BI1 putative phage or plasmid FN668943a BI9 complete genome FN668944a CF5 complete genome FN665652a M68 complete genome FN668375a M120 complete genome FN665653a

a Gene annotation of these genomes was not available at the time of our analysis.

77 Table S3. Clostridiaceae-encoded proteins containing GGDEF, EAL or HD-GYP domains.

EAL

-

GGDEF EAL GGDEF - - -

GGDEF EAL GGDEF GGDEF HDGYP - - - - -

Strains GGDEF EAL GGDEF GGDEF GGDEF HDGYP HDGYP GGDEF Total proteins Clostridium asparagiforme DSM 15981 21 2 14 4 - 1 4 - 46 Clostridium bolteae ATCC BAA-613 20 3 14 2 1 1 2 - 43 Clostridium beijerinckii ATCC 51743 13 1 9 - - 12 2 - 37 Clostridium difficile 630 18 1 16 2 - - - - 37 Clostridium hylemonae DSM 15053 14 3 10 5 - 1 1 - 34 Alkaliphilus metalliredigens QYMF 6 1 4 - - 9 4 - 24 Clostridium acetobutylicum 7 1 3 - - 9 - - 20 Clostridium butyricum E4 BoNT E BL5262 8 1 3 - - 5 3 - 20 Clostridium butyricum 5521 7 1 3 - - 5 3 - 19 Clostridium phytofermentans ISDg 8 - 5 - - 6 - - 19 Clostridium botulinum E3 - Alaska E43 7 - 4 2 - 4 1 - 18 Clostridium botulinum B - Eklund 17B 7 - 3 2 - 3 - - 15 Clostridium botulinum A3 - Loch Maree 6 1 1 - - 4 2 - 14 Clostridium botulinum A - Hall 6 1 2 - - 2 2 - 13 Clostridium botulinum B1 - Okra 6 1 1 - - 3 2 - 13 Clostridium botulinum Ba4 - 657 5 1 1 - - 4 2 - 13 Clostridium botulinum A2 - Kyoto 6 1 1 - - 2 2 - 12 Clostridium botulinum Bf 5 1 1 - - 3 2 - 12

78 Clostridium botulinum F - Langeland 6 1 1 - - 2 2 - 12 Clostridium botulinum NCTC 2916 6 1 1 - - 2 2 - 12 Clostridium botulinum A - ATCC 19397 6 1 1 - - 1 2 - 11 Clostridium botulinum C - Eklund 2 - 3 - - 2 - - 7 Clostridium kluyveri ATCC 8527 5 4 3 - - 4 2 - 18 Clostridium kluyveri NBRC 12016 4 3 4 - - 4 2 - 17 Alkaliphilus oremlandii OhILAs 9 1 4 - - 2 - 16 Clostridium thermocellum ATCC27405 6 - 1 - - 5 3 - 15 Clostridium thermocellum DSM 4150 5 - 1 - - 4 2 - 12 Clostridium methylpentosum DSM 5476 4 2 3 1 - 1 - - 11 Clostridium sp. 7_2_43FAA 2 3 4 - - 1 1 - 11 Clostridium novyi NT 3 1 3 - - 3 - - 10 Clostridium sp. L2-50 3 1 3 1 - 1 - 1 10 Clostridium sp. SS2-1 6 - 1 1 - 1 1 - 10 Clostridium perfringens 13 5 2 1 - - - 1 - 9 Clostridium perfringens A - ATCC 13124 3 2 2 - - - 2 - 9 Clostridium perfringens C - JGS1495 3 2 2 - - - 2 - 9 Clostridium perfringens CPE - F4969 3 2 2 - - - 2 - 9 Clostridium perfringens D - JGS1721 2 3 2 - - - 2 - 9 Clostridium perfringens E - JGS1987 3 2 2 - - - 2 - 9 Clostridium perfringens NCTC 8239 3 2 2 - - - 2 - 9 Clostridium perfringens A - SM101 3 2 1 - - - 2 - 8 Clostridium perfringens B - ATCC 3626 2 2 2 - - - - - 6 Clostridium cellulolyticum ATCC 35319 3 - 1 - - 4 - - 8 Clostridium scindens ATCC 35704 5 1 1 - - 1 - - 8 Clostridium sp. M62-1 5 - 2 - - 1 - - 8 Clostridium tetani 6 1 - - - 1 - - 8 Clostridium nexile DSM 1787 3 - 1 - - 2 - - 6 Clostridium leptum DSM 753 1 - 2 - - 1 1 - 5

79 Clostridium hiranonis DSM 13275 1 ------1 Clostridium bartletii DSM16795 ------

80 Table S4. Bacterial strains and plasmids used in this study.

Strain or plasmid Description Reference or source Strains C. difficile 630 epidemic type X [1] V. cholerae N16961 O1 El Tor strain [2] E. coli TOP10 F- mcrA (mrr-hsdRMS-mcrBC) 80lacZM15 Invitrogen lacX74 recA1 araD139 (ara-leu)7697 galU galK rpsL (StrR) endA1 nupG - - E.coli BL21 F- ompT hsdS (rB mB ) gal dcm GE Healthcare E.coli XL10-Gold TetR (mcrA)183 (mcrCB-hsdSMR-mrr)173 endA1 Stratagene supE44 thi-1 recA1 gyrA96 relA1 lac Hte [F’ proAB lacIqZM15 Tn10 (TetR) AmyR CamR]

Plasmids pBAD-TOPO ApR, arabinose-inducible expression vector, high-copy Invitrogen pBAD- pBAD-TOPO expressing lacZ Invitrogen TOPO/lacZ/V5-His pCD0204 pBAD-TOPO - CD0204 This study pCD0522 pBAD-TOPO - CD0522 This study pCD0522-(N-448) pBAD-TOPO containing the first GGDEF domain of This study CD0522 pCD0522-(M-305) pBAD-TOPO containing the second GGDEF domain of This study CD0522 pCD0522-(C-307) pBAD-TOPO containing the EAL domain of CD0522 This study pCD0522-(MC-591) pBAD-TOPO containing the second GGDEF domain This study and the EAL domain of CD0522 pCD0522-G366E pCD0522 with G to E and D to E substitutions at amino This study acids 366 and 367 pCD0522-E814A pCD0522 with a E to A substitution at amino acid 814 This study pCD0537 pBAD-TOPO - CD0537 This study pCD0707 pBAD-TOPO - CD0707 This study pCD0748 pBAD-TOPO - CD0748 This study pCD0757 pBAD-TOPO - CD0757 This study pCD0757-E339A pCD0757 with a E to A substitution at amino acid 339 This study pCD0811 pBAD-TOPO - CD0811 This study pCD0997 pBAD-TOPO - CD0997 This study pCD1015 pBAD-TOPO - CD1015 This study pCD1028 pBAD-TOPO - CD0204 This study pCD1185 pBAD-TOPO - CD1185 This study

81 pCD1419 pBAD-TOPO - CD1419 This study pCD1420 pBAD-TOPO - CD1420 This study pCD1420-G204E pCD1420 with G to E and D to E substitutions at amino This study acids 204 and 205 pCD1421 pBAD-TOPO - CD1421 This study pCD1515 pBAD-TOPO - CD1515 This study pCD1538 pBAD-TOPO - CD1538 This study pCD1616 pBAD-TOPO - CD1616 This study pCD1651 pBAD-TOPO - CD1651 This study pCD1840 pBAD-TOPO - CD1840 This study pCD1841 pBAD-TOPO - CD1841 This study pCD2134 pBAD-TOPO - CD2134 This study pCD2244 pBAD-TOPO - CD2244 This study pCD2384 pBAD-TOPO - CD2384 This study pCD2385 pBAD-TOPO - CD2385 This study pCD2663 pBAD-TOPO - CD2663 This study pCD2760 pBAD-TOPO - CD2760 This study pCD2873 pBAD-TOPO - CD2873 This study pCD2887 pBAD-TOPO - CD2887 This study pCD2965 pBAD-TOPO - CD2965 This study pCD3365 pBAD-TOPO - CD3365 This study pCD3650 pBAD-TOPO - CD3650 This study pGEX-6P1 ApR, IPTG-inducible glutathione S-transferase gene GE Healthcare fusion vector pGCD0757 pGEX-6P1 - CD0757 This study pGCD1420 pGEX-6P1 - CD1420 This study

REFERENCES

1. Sebaihia M, Wren BW, Mullany P, Fairweather NF, Minton N, et al. (2006) The multidrug- resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat Genet 38: 779-786. 2. Heidelberg JF, Eisen JA, Nelson WC, Clayton RA, Gwinn ML, et al. (2000) DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406: 477-483.

82 Table S5. DNA sequences of the oligonucleotides used in this study.

Primer name Nucleotide sequence (5 to 3)a CD0204F TAATTTGGAGGAAAAAAATGAAA CD0204R ACATTTAATAATGGATATTTTTAAAC CD0522F TAAAAGGAGATATATATATAAATGGA CD0522R TTATTTTTGATAAAGTAACTTCTCA CD0522-1DR TTATAAGTCACTAATTACTTTCATTTCA CD0522-2DF TAAAAGGAGATATATATATAAATGAAAGTAATTAGTGACTTAGCTG CD0522-2DR TTACGCATTTGTAATTGGAATA CD0522-EALF TAAAAGGAGATATATATATAAATGTTAGTACCATGTTTTGGG CD0537F TAAGAGAGGTTTGTATATGAAAAAT CD0537R TTATATCTTTTTTAATCCTTTCATT CD0707F TAACTATATGGAGGATTGTGTAG CD0707R AAAATTAATTCTTATACTATTTTAAA CD0748F TAGGTGGAATGGGTATGGAA CD0748R TTTTTCCTCTATTAAACCACATCA CD0757F TAGGGGGAAAAGATGAGAGA CD0757R TCACTTTGATAGTTGAAATTTATAAA CD0811F TAGCGAGGGAAGTTTATGGTA CD0811R ATAAAGGGTGTTAACTTGTAAAAG CD0997F TAAACAATGAGGATGTGAGTATG CD0997R CACAGTATTTTAAGTATTTTCTTCA CD1015F TAAAAGAAGGCTTTGAAATAATG CD1015R AAATTCAAATTCACTGTACATATAG CD1028F TAATTGGAGGTGTAAATTTATA CD1028R AATTTTGACATCTATATTTACATA CD1185F TAGAAAAGGAGGACTAAAATTTTG CD1185R AATTACTTATATACTACTTTATTTCTACCAAG CD1419F TAAAGGAGGAATATCATGAGCA CD1419R TTATTCGTATGTACTATAACAATTTTTG CD1420F TAATTTGGAGAAATTAATATGTTTA CD1420R ACAATGTTAATAATCATTTTTATCA CD1421F TAACTTATGGAGGAATACCATG

83 CD1421R AATACTTTGTACTAGATTTTTGTTAAA CD1515F TAACTATGGAGGAGATAAGTATATG CD1515R ATTGATAAATTAGAATTAATTATCTAG CD1538F TAATTAATGGTATTTTAATATG CD1538R TAATGATTAATCAAATTTAAA CD1616F TAATGGGAGGAAATTTTTATG CD1616R ACAGATTAATTACCCTCTTACCA CD1651F TAGGATAAAGGTGTTTTATGAAG CD1651R TTAGCTATTATAAACAAACTCTTTATC CD1840F TAATTGCTAAAGAGAGTGAGG CD1840R CGCTGACTATATTTTATAGACTAA CD1841F TAAATGGGAGGTGGTGATTAAGTG CD1841R ATAGTCCTTTTTTATTTATTTTTGTATAAAGAAG CD2134F TAAAAATGAGGTAACAATAATGA CD2134R AAAATAATTATATTATTTTTTAGTTACA CD2244F TAATGGAGGCTTATATGAACTTA CD2244R AACCTTTTAATAAAATGCAACA CD2384F TGAAATAGGATTGGTGATTTA CD2384R CACCTCAATTTTAAATTATAATG CD2385F TAAGTCGGAGGTGAGTTACAATA CD2385R CAACAGTTAAAAATTCTTAAAATCA CD2663F TAGAAAGGCAGGAAGCATAATG CD2663R ACAATTTTATTTAATTTGTTTCTTTTCTA CD2760F TAGGGGAATTTATATGAATAA CD2760R TTACTCCTCCTCAACTGTCA CD2873F TAAAGGATGGAGGATGAAA CD2873R TATTTAAACGGAACTATCCATA CD2887F TAACAAGACATTTAAGCTATGA CD2887R AATTTATTATTGATATTGTTTATTAA CD2965F TAATTAGGGGAGGAGTCGAATG CD2965R TTAAAGTAAATCTAAAAAATTCTTAGCTG CD3365F TAAGGGGGTAATCTATGGAA CD3365R CTTTTGTACTAAAACCAACTAAAG CD3650F TAGAGGTGAATAACTTTGCTAAAG CD3650R TTATATCTCTCTTTCTTGATTTGC

84 0757GF NNNNNGGATCCAGAGAAGAATCGAACTCTT 0757GR NNNNNGTCGACTCACTTTGATAGTTGAAA 1420 GF NNNNNGGATCCTTTAAAGAAATTTTTTTAAGAAC 1420GR NNNNNGTCGACTTAATAATCATTTTTATC CD0522CmutF CATTCTTATAATGTTTACCGTATTGGAGAAGAGGAGTTTGTTGTATTATGTGAAGATATAT CD0522CmutR ATATATCTTCACATAATACAACAAACTCCTCTTCTCCAATACGGTAAACATTATAAGAATG CD0522NmutF GCAAATAAAATTGTTGGTGCAGCTGCATTGGTTAGATGGGTTCATC CD0522NmutR GATGAACCCATCTAACCAATGCAGCTGCACCAACAATTTTATTTGC CD0757NmutF CAAAAGAGGTAATAGGGGCAGCTGTACTTTTAAGATGGCATTCG CD0757NmutR CGAATGCCATCTTAAAAGTACAGCTGCCCCTATTACCTCTTTTG CD1420NmutF ATAAATTCTACTTCTATTATAAGATTAGGAGAAGAGGAATTTATTGTTATATTCTCTAGTG ATGTTG CD1420NmutR CAACATCACTAGAGAATATAACAATAAATTCCTCTTCTCCTAATCTTATAATAGAAGTAGA ATTTAT

a Boldface text indicates restriction sites used to clone in pGEX6P-1 or to screen site-directed mutagenesis mutants.

85

Figure S1. Domain composition and organization of the 3 putative c-di-GMP-signalling proteins encoded by C. difficile strains other than strain 630. Domain composition and organization of the proteins are depicted essentially as in Figure 1. The HD-GYP domain is colored in orange.

86

Figure S2. Analysis of intracellular c-di-GMP by HPLC. 10M synthetic c-di-GMP (A), nucleotide extracts from V. cholerae N16961 expressing lacZ from pBAD-TOPO/lacZ/V5-His (negative control) (B), the DGC N- 448 from pCD0522-(N-448) (C), or the DGC CD1420 from pCD1420 (D). Spectra of synthetic c-di-GMP (E), and of peaks with corresponding retention times in the nucleotide extracts from V. cholerae N16961 expressing the DGC N-448 (F) or the DGC CD1420 (G). 100mg of cells were used for all nucleotide extractions.

87

Figure S3. Phylogenetic tree of selected Clostridiaceae species. The tree was generated by alignment of the rpoB sequences using the neighbor-joining method. The support for each branch is indicated by the value at each node (in percent) as determined by 1,000 bootstrap samples. Only values 70% are shown. Staphylococcus aureus COL and Bacillus subtilis 168 were taken as outgroup strains.

88 Colored boxes highlight the Clostridium difficile strains (cyan), the two Clostridiaceae strains with the most c-di- GMP regulatory proteins (yellow) and the two Clostridiaceae strains with the least c-di-GMP regulatory proteins (magenta). Total c-di-GMP regulatory proteins per species are shown in parenthesis next to colored boxes.

89 CHAPITRE 3

RÔLE DES PILI DE TYPE IV DANS L’AGRÉGATION DE C. DIFFICILE INDUITE PAR LE C-DI-GMP

Après avoir démontré la conservation et la fonctionnalité des enzymes de synthèse et de dégradation du c-di-GMP chez C. difficile, je me suis penché sur le rôle de ce second messager chez C. difficile. Des études ont permis d’orienter mon projet plus spécifiquement sur le rôle du c-di-GMP dans la régulation de l’expression des T4P et de l’agrégation cellulaire.

Le génome de C. difficile 630 contient 16 riborégulateurs prédits : 12 riborégulateurs c-di- GMP-I et 4 riborégulateurs c-di-GMP-II (Lee et al., 2010; Sudarsan et al., 2008). La fonctionnalité d’un des 12 riborégulateurs c-di-GMP-I prédits, Cd1, a été déterminée in vitro (Sudarsan et al., 2008). Le riborégulateur Cd1 se situe devant un opéron de gènes des flagelles. En augmentant la concentration de c-di-GMP intracellulaire, une réduction de l’expression des gènes en aval du riborégulateur Cd1 et une baisse de motilité en milieux semi-solide (0,3 % agar) ont été observées (Purcell et al., 2012). De plus, l’augmentation du niveau de c-di-GMP intracellulaire s’est traduite par l’agrégation des cellules en milieu liquide, par un mécanisme par contre inconnu (Purcell et al., 2012). Le riborégulateur c-di- GMP-II prédit, Cdi2_4, est situé en amont d’un locus de 11 gènes putatifs de synthèse de T4P (CD630_35130 à CD630_35030). Nous avons vérifié l’hypothèse que l’expression des T4P était régulée par le c-di-GMP et que ces structures étaient impliquées dans le phénotype d’auto-agrégation des cellules de C. difficile. Les résultats de ce projet ont été soumis pour publication dans la revue Molecular Microbiology sous le titre suivant : A cyclic-di-GMP riboswitch controls type IV pili-mediated aggregation in Clostridium difficile (numéro de soumission: MMI-2014-14222).

90 Résumé de l’article

L’augmentation du niveau intracellulaire de c-di-GMP par l’expression de la DGC DccA a permis d’observer l’apparition de filaments pouvant correspondre à des T4P, ainsi que la perte de la synthèse de flagelles à la surface des cellules. L’inactivation des gènes situés directement en aval du riborégulateur c-di-GMP-II Cdi2_4, pilA1 (CD630_35130) et pilB2 (CD630_35120), abolit la formation de T4P induite par le c-di-GMP. De plus, alors que l’augmentation du c-di-GMP intracellulaire par l’expression de DccA dans la souche sauvage C. difficile 630erm provoque une agrégation d’environ 90 % des cellules après 16 heures de culture statique en bouillon, l’agrégation est réduite significativement à environ 40-45% dans les souches mutantes pilA1 et pilB2. La transcription des gènes de T4P en aval du riborégulateur Cdi2_4 est augmentée par le c-di-GMP et ce, jusqu’à 40 fois pour le gène pilA1 situé directement en aval du riborégulateur. Enfin, des tests de transcription in vitro ont permis de démontrer que la liaison spécifique du c-di-GMP à l’aptamère du riborégulateur Cdi2_4 inhibe la terminaison précoce de la transcription en prévenant la formation d’un terminateur transcriptionnel dans la plate-forme d’expression du riborégulateur.

Ce projet a permis de déterminer l’importance des T4P dans l’agrégation de C. difficile, et par le fait même, de démontrer un premier rôle pour ces structures encore peu étudiées chez cette bactérie. De plus, un nouveau mécanisme de régulation d’un phénotype par le c-di-GMP a été exposé, en l’occurrence, l’induction de l’expression des T4P par la liaison du c-di-GMP à un riborégulateur c-di-GMP-II.

91 Contribution à l’article

J’ai élaboré et effectué la majorité des expériences sous la supervision de mon directeur de thèse, Vincent Burrus et des Prs Daniel Lafontaine et Louis-Charles Fortier. La microscopie électronique a été effectuée par l’étudiante postdoctorale Erin B. Purcell dans le laboratoire de Rita Tamayo. J’ai fait la rédaction initiale de l’article, qui a ensuite été corrigé avec mon directeur et révisé par tous les auteurs.

92 A cyclic-di-GMP riboswitch controls type IV pili-mediated aggregation in Clostridium difficile

Eric Bordeleau1, Erin B. Purcell2, Daniel A. Lafontaine1, Louis-Charles Fortier3, Rita Tamayo2 and Vincent Burrus1*

1 Département de biologie, Faculté des sciences, Université de Sherbrooke, QC, Canada 2 Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC, USA 3 Département de microbiologie et d’infectiologie, Faculté de médecine et sciences de la santé, Université de Sherbrooke, QC, Canada * Corresponding author

93 SUMMARY

Clostridium difficile is an anaerobic Gram-positive bacterium that causes intestinal infections with symptoms ranging from mild diarrhea to fulminant colitis. Cyclic diguanosine monophosphate (c-di-GMP) is a bacterial second messenger that typically regulates the switch from motile, free-living to sessile and multicellular behaviors in Gram-negative bacteria. Increased intracellular c-di-GMP concentration in C. difficile was recently shown to reduce flagellar motility and to increase cell aggregation. In this work, we investigated the role of the primary type IV pili (T4P) locus in c-di-GMP-dependent cell aggregation. Inactivation of two T4P genes, pilA1 (CD630_35130) and pilB2 (CD630_35120), abolished pili formation and significantly reduced cell aggregation in high c-di-GMP conditions. pilA1 is preceded by a putative c-di-GMP riboswitch, predicted to be transcriptionally active upon c-di-GMP binding. Consistent with our prediction, high intracellular c-di-GMP concentration increased transcript levels of T4P genes. In addition, single-round in vitro transcription assays confirmed that transcription downstream of the predicted transcription terminator was dose-dependent and specific to c-di-GMP binding to the riboswitch aptamer. These results support a model in which T4P gene transcription is up-regulated by c-di-GMP as a result of its binding to an upstream transcriptionally activating riboswitch, promoting cell aggregation in C. difficile.

94 INTRODUCTION

The Gram-positive spore-forming bacterium Clostridium difficile is the leading cause of nosocomial diarrhea and antibiotic associated-colitis in hospital settings (Rupnik et al., 2009). Illness caused by C. difficile infections (CDI) may range from mild-diarrhea to life- threatening, fulminant colitis. Throughout its lifecycle, C. difficile has to cope with multiple changing environments. In the early steps of CDI, in order to colonize the colon and produce toxins, C. difficile spores need first to germinate in the small intestine. Vegetative cells need to reach the colon, likely attaching to and colonizing the gut mucosa where they proliferate and produce toxins. Hence, C. difficile arguably needs to sense and integrate multiple environmental stimuli in order to coordinate the expression of its colonization and virulence factors during its journey through the gastrointestinal tract. The mechanisms allowing this adaptability in C. difficile have been the focus of many recent studies including those on the agr quorum-sensing system, alternative sigma factors and two-component systems, the global transcription regulator CodY and the sigma factor TcdR that promotes TcdA and TcdB toxin expression (Dineen et al., 2007, El Meouche et al., 2013, Ho & Ellermeier, 2011, Mani & Dupuy, 2001, Martin et al., 2013, Saujet et al., 2011).

One way by which bacteria sense and respond to changes in their environment is through signal transduction involving secondary messenger molecules. The bacterial second messenger 3’-5’-cyclic diguanosine monophosphate (c-di-GMP) plays multiple key roles in bacterial physiology and virulence (Hengge, 2009, Romling et al., 2013, Tamayo et al., 2007). C-di- GMP has been shown to antagonistically control the motility of planktonic cells and biofilm formation in several bacterial species. An increase in the intracellular c-di-GMP concentration typically induces exopolysaccharide synthesis and adhesion while inhibiting flagellar motility, thereby favoring biofilm formation. A low concentration of the second messenger has an opposite effect. C-di-GMP also controls cell differentiation in Caulobacter crescentus (Aldridge et al., 2003) and virulence of important human pathogens including Vibrio cholerae

95 (Tischler & Camilli, 2005), Borrelia burgdorferi (Pitzer et al., 2011), and Pseudomonas aeruginosa (Kulasakara et al., 2006). C-di-GMP turn-over is controlled by two groups of enzymes with antagonistic activities. The diguanylate cyclases (DGC) contain a GGDEF domain and synthesize c-di-GMP from two GTP molecules, whereas the c-di-GMP phosphodiesterases (PDE) contain an EAL domain or an HD-GYP domain and hydrolyze c-di- GMP into linear pGpG or two GMP molecules, respectively (Ryan et al., 2006, Ryjenkov et al., 2005, Schmidt et al., 2005, Simm et al., 2004, Tamayo et al., 2005). There are several c- di-GMP effectors responsible for the regulatory function of c-di-GMP through transcriptional, post-transcriptional, translational and protein-protein interaction mechanisms. A major class of c-di-GMP effector proteins and the first ever described are the PilZ domain c-di-GMP-binding proteins (Amikam & Galperin, 2006). Although PilZ effectors are numerous, there are also other c-di-GMP-binding effectors such as the unrelated transcription factors FleQ in P. aeruginosa and VpsT in V. cholerae (Hickman & Harwood, 2008, Krasteva et al., 2010). In addition to protein effectors, c-di-GMP is also sensed by RNA effectors, the type-I and type-II c-di-GMP riboswitches (Lee et al., 2010, Sudarsan et al., 2008). Riboswitches are structured RNA regions typically located in the 5’ untranslated region of diverse bacterial mRNAs. Riboswitches are composed of two modular regions: an aptamer that recognizes and binds a specific metabolite and an expression platform that in most cases controls the expression of the downstream gene(s) (for recent review see, (Breaker, 2012, Serganov & Nudler, 2013)). Ligand binding typically leads to repression of gene expression, most commonly through transcription termination or inhibition of translation initiation.

The genome of C. difficile 630 codes for 18 predicted DGCs and 19 predicted PDEs, many of which have confirmed enzymatic activity (Bordeleau et al., 2011). Sudarsan et al. (2008) revealed the presence of a c-di-GMP-I riboswitch, Cd1, upstream of the flgB gene, which is part of one of two loci encoding the flagella genes in C. difficile. In vitro studies have shown that binding of c-di-GMP to Cd1 promotes premature termination of transcription, thereby suggesting that the expression of flagella in C. difficile would be repressed by c-di-GMP.

96 Purcell et al. (2012) demonstrated that overexpression of DccA (CD630_14200) in C. difficile increases the intracellular c-di-GMP level, which was also associated with a decreased transcription of flagella genes and reduced bacterial motility. In addition to decreasing motility, DccA overexpression seemed to promote aggregation of C. difficile cells by an unknown mechanism (Purcell et al., 2012).

One of the four predicted c-di-GMP-II riboswitches in C. difficile 630 is located less than 200 bp upstream of a predicted type IV pili (T4P) locus (Lee et al., 2010) (Fig. 1A). T4P are surface appendages involved in many phenotypes in Gram-negative bacteria such as P. aeruginosa and Neisseria sp., including but not limited to adhesion, surface motility (twitching and gliding), biofilm formation and development, cell aggregation, DNA uptake, and phage absorption (for reviews see, (Giltner et al., 2012, Proft & Baker, 2009). T4P are thin fibers (6-9 nm) that can extend several µm in length (Craig & Li, 2008). These fibers are made by the T4P secretion machinery, a type II-related secretion system that promotes pilus elongation by addition of pilin subunits, and may also retract pili by depolymerization of pilin subunits, both at the proximal end of the pilus. T4P gene loci have been predicted in many other Gram-positive bacteria, including several Clostridium species (Sebaihia et al., 2006, Varga et al., 2006, Imam et al., 2011). C. difficile T4P genes are mainly located in one primary cluster of genes extending from CD630_35130 to CD630_3503 (Fig. 1A and Table S1) (Melville & Craig, 2013). Herein, we demonstrate a role for T4P in cell aggregation of C. difficile and the role of the second messenger c-di-GMP in their expression through binding to a transcriptional riboswitch upstream of the primary T4P locus.

97 RESULTS

T4P biogenesis requires pilA1 and pilB2 and is induced by c-di-GMP

Since the putative T4P primary locus in C. difficile (ORFs CD630_35130 to CD630_35030) is located downstream of a predicted c-di-GMP-II riboswitch (Fig. 1A), we speculated that T4P are involved in the previously observed aggregation of C. difficile cells in liquid culture upon elevation of intracellular c-di-GMP levels (Purcell et al., 2012).

Figure 1. Schematic representation of the primary T4P gene cluster. (A) Genes associated with T4P functions (white arrows) are downstream of a predicted c-di-GMP-II riboswitch (Cdi2_4). ORF numbers (bottom) have been abbreviated for clarity and gene annotations (top) are indicated, as proposed in the text, based on previously proposed annotations (Maldarelli et al., 2014, Melville & Craig, 2013) (Table S1). A transcriptional terminator (TT) is predicted 36 bp downstream of pilA1 (see Fig. S5). (B) The pilA1 and pilB2 mutant strains were generated using the Clostron system and introns were retargeted to insert in the sense orientation in pilA1 between nucleotides 474 and 475 (CD630_35130-474/475s intron) or in pilB2 between nucleotides 786 and 787 (CD630_35120-786/787s intron).

98 CD630_35130 is the first gene of the T4P primary locus and likely codes for a major pilin subunit (Maldarelli et al., 2014, Melville & Craig, 2013). As such, it is named pilA1 as proposed by Maldarelli et al. (2014). The second gene of the T4P primary locus, CD630_35120, encodes a homolog of PilB, a predicted hexameric ATPase thought to supply the mechanical energy required for pilus assembly through the successive addition of pilin subunits. PilB is essential for T4P secretion in P. aeruginosa and Neisseria gonorrhoeae (named PilF in N. gonorrhoeae) (Freitag et al., 1995, Turner et al., 1993). CD630_35120 and the downstream genes are part of the primary T4P locus and are named after their homologs in Clostridium perfringens as proposed by Melville and Craig (2013) for consistency. Therefore, CD630_35120 is named pilB2 and given a designation of “2” although it is part of the primary T4P locus.

Reasoning that the gene located immediately downstream of the c-di-GMP-II riboswitch Cdi2_4 is most likely the gene directly regulated by c-di-GMP, at the transcriptional or translational level, we disrupted the putative pilin gene pilA1 using the group II intron retargeting approach adapted to Clostridia (Clostron) (Fig. 1B and S1). We also disrupted pilB2 to verify the role of T4P independently of PilA1 pilin (Fig. 1B and S1). The pilB2 gene is likely essential for pilus biogenesis in C. difficile and could also be regulated by c-di-GMP, as it is directly downstream of pilA1. The intracellular c-di-GMP level in C. difficile 630erm (WT), or its pilA1 and pilB2 mutants, was manipulated as described before by introducing pDccA to express the C. difficile DGC DccA under a nisin-inducible promoter (Purcell et al., 2012). The empty vector (pMC-Pcpr) was used as a negative control (Table 1).

99 Table 1. Bacterial strains and plasmids used in this study

Strain or plasmid Description Reference or source Strains Clostridium difficile 630 epidemic type X (Sebaihia et al., 2006) 630erm Erythromycin-sensitive mutant of C. difficile 630 (Hussain et al., 2005) pilA1 630erm CD630_35130-474 s::CT This study pilB2 630erm CD630_35120-786 s::CT This study E.coli CA434 HB101 carrying plasmid R702 (Purdy et al., 2002)

Plasmids pCR2.1-TA TA cloning vector Life Technologies pCR2.1-Rb3513sr pCR2.1-TA with transcription template for Cdi2_4 c- This study di-GMP-II riboswitch pCR2.1-Rb3513sr- pCR2.1-Rb3513 with A to C substitution at nucleotide This study a70c 70 pCR2.1-Rb3513sr- pCR2.1 with with A to G substitution at nucleotide 70 This study a70g pMTL007C-E5- pMTL007C-E5 with the Ll.ltrB intron retargeted to This study CD3513-474/475s insert after base 474 in the 630erm CD630_35130 ORF pMTL007C-E5- pMTL007C-E5 with the Ll.ltrB intron retargeted to This study CD3512-786/787s insert after base 786 in the 630erm CD630_35120 ORF pMC-Pcpr cprK promoter cloned into pMC123 E. coli- C. difficile (Purcell et al., 2012) shuttle vector, bla, catP pDccA pMC-Pcpr with DccA downstream of Pcpr (Purcell et al., 2012) pDccAmut pMC-Pcpr with DccA(AADEF) downstream of Pcpr Purcell, 2012 pDccA-RB-pilA1 pDccA with CD630_35130 and 497 bp upstream This study including the Cdi2_4 riboswitch pDccAmut -RB- pDccAmut with CD630_35130 and 497 bp upstream This study pilA1 including the c-di-GMP riboswitch pDccA-RBA70C- pDccA-RB-pilA1 with A to C substitution at nucleotide This study pilA1 70 in Cdi2_4 pDccA-RBA70G- pDccA-RB-pilA1 with A to G substitution at nucleotide This study pilA1 70 in Cdi2_4 pDccA-RB- pDccA-RB-pilA1 with G to Y substitution at amino acid This study pilA1G9Y 9 in PilA1

100 Examination of the cells by transmission electron microscopy revealed that upon expression of DccA in the WT strain, c-di-GMP induces the formation of pili-like structures whilst inhibiting biosynthesis of flagella (Fig. 2A). In contrast, the WT strain with the empty vector displayed many flagella but no pili at its surface under the same inducing conditions.

Figure 2. C-di-GMP induces T4P formation. (A) Transmission electron microscopy of C. difficile 630erm (WT) with pDccA or the empty vector, and the pilB2 and pilA1 mutants with pDccA from 12-h static liquid cultures in BHIS-tm with nisin 1 µg ml-1. Black scale bars = 1 µm. Open arrowheads show flagella and closed arrowheads show pili. (B) Mean number of pili for each cell in 15 (WT) or 20 (pilA1 and pilB2) randomly selected grid squares were photographed at 8000 magnification. The images were blinded for quantification in which the number of cells and pili in each image were recorded by three observers. Data were analyzed by one- way ANOVA and Dunnett’s multiple comparisons test comparing values to WT pDccA (ns, P >0.05; **, P < 0.01). (C) Comparison of the distribution of cell populations harboring pili.

Interestingly, the average number of pili/cell was only 2.4, as many cells did not display any pili (Fig. 2B and 2C). Disruption of either pilA1 or pilB2 virtually abolished the formation of

101 pilus-like structures at the surface of the cells despite DccA expression sufficient to inhibit flagella formation. These results indicate that pilB2 plays a key role in the formation of these appendages. While pilA1 also seemed to be required, we cannot rule out that its disruption had a polar effect on the expression of pilB2, leading to the comparable pilB2 mutant strain phenotype.

T4P are involved in c-di-GMP-dependent aggregation

To investigate further the role of pilA1 and pilB2 and of pilus-like structures, we monitored cell aggregation in cultures of C. difficile 630erm (WT) and the pilA1 and pilB2 mutants bearing pDccA or pDccAmut (Table 1), which expresses a catalytically inactive mutant of DccA unable to produce c-di-GMP. Extensive cell aggregation occurred for the WT strain with pDccA upon nisin induction with approximately 90% of the cells pelleting to the bottom of the culture tube after a 16-h incubation period (Fig. 3A and S2A). In the same conditions, less than 2.5 % cell aggregation was observed for the WT strain with pDccAmut, and less than 0.05 % cell aggregation was observed for the uninduced WT strain with pDccA (Fig. 3A and S2A). Like the WT strain, neither the pilA1 nor the pilB2 mutant strains aggregated upon expression of dccAmut or without induction. Interestingly, both mutant strains with pDccA exhibited a 2- fold reduction (pilA1, P = 0.005; pilB2, P = 0.0028) compared to the WT strain with pDccA, displaying 40-45% cell aggregation upon nisin induction. These observations suggest that pilA1 and pilB2 contribute significantly to c-di-GMP-regulated cell aggregation, but additional c-di-GMP-regulated factor(s) are likely involved.

102

Figure 3. T4P are required for c-di-GMP-dependent aggregation of C. difficile planktonic cells. Cell aggregation was monitored by measuring the absorbance (OD600nm) of supernatants before (ODpre) and after vortexing (ODpost) (see Fig. S2A) for 16-h static liquid cultures in BHIS-tm. Aggregation was expressed as a percentage of cells at the bottom of culture tubes ((ODpost – ODpre) / ODpost 100)). (A) Cell aggregation of C. difficile 630erm (WT) or its pilB2 and pilA1 insertion mutants upon expression of the diguanalyte cyclase DccA (pDccA) or inactive DccA mutant (pDccAmut) by induction with 0 or 1 µg ml-1 nisin. (B) Cell aggregation complementation assays using the pilA1 gene and its upstream Cdi2_4 riboswitch cloned into a plasmid expressing DccA (pDccA-RB-pilA1) or expressing DccAmut (pDccAmut-RB-pilA1). Complementation of the pilA1 mutant was also carried out using plasmids expressing PilA1G9Y (pDccA-RB-pilA1G9Y) or containing the nucleotide substitutions A70C or A70G in the Cdi2_4 riboswitch (pDccA-RBA70C-pilA1 and pDccA-RBA70G- pilA1). The result of the WT strain with pDccA from panel A was duplicated to ease interpretation. The means

103 and standard deviations obtained from three independent assays are shown. One-way ANOVA with a Dunnett multiple comparisons post-test was used to compare the strains to the WT strains with pDccA (ns, P > 0.05; **, P

< 0.01).

We noticed that the absorbances of cultures of aggregated cells after resuspension were slightly lower than those of uninduced control cultures (Fig. S2A), suggesting that the aggregates could have resulted from the accumulation of dead cells. We verified that the aggregated cells were viable and, most importantly, that observation of the optical density reflected the viable cell counts by comparing the CFU ml-1 in samples collected before (pre) and after (post) suspension of the induced and uninduced cells carrying pDccA (Fig. S2C). As expected, culture samples collected before and after resuspension of the aggregates exhibited concentrations of viable cells (CFU ml-1) in agreement with the observed culture absorbance measurements (Fig. S2A and S2C). This observation also confirmed that the observed c-di- GMP-dependent aggregation phenotype does not result from cell death and subsequent pelleting of cell debris over time.

The PilA1G9Y prepilin leader peptide mutant is unable to complement pilA1 disruption

To rule out a possible polar effect of pilA1 disruption on the expression of the downstream genes that could explain the lack of cell aggregation in this mutant, we carried out complementation experiments. The wild-type pilA1 gene and its upstream promoter and regulatory region including the c-di-GMP-II riboswitch Cdi2_4 (RB) were introduced into pDccA and pDccAmut to generate pDccA-RB-pilA1 and pDccAmut-RB-pilA1, respectively (Fig. S3). Complementation of the pilA1 mutant using pDccA-RB-pilA1 increased cell aggregation by 2 fold from about 40 to 80% when compared to the same mutant bearing pDccA (Fig. 3B). Aggregation of the pilA1 mutant complemented with pDccA-RB-pilA1 was indistinguishable from the WT strain bearing pDccA (P = 0.88). Furthermore, complementation of the pilA1 mutant using a plasmid expressing a PilA1G9Y prepilin leader peptide substitution mutant,

104 pDccA-RB-pilA1G9Y, did not restore cell aggregation back to WT levels. G9 of PilA1 corresponds to a highly conserved glycine residue located at the -1 position of the predicted cleavage site in the prepilin subunit. This residue was shown to be necessary for the cleavage and subsequent assembly of pilin subunits into pili in P. aeruginosa (Strom & Lory, 1991). As expected, none of the strains containing pDccAmut-RB-pilA1 (i.e., complemented for pilA1 but expressing DccAmut) were able to aggregate (Fig. 3B). Finally, the additional copies of the pilA1 gene in the pilB2 mutant with pDccA-RB-pilA1 did not restore cell aggregation back to WT levels (Fig. 3B). Altogether these observations support that pilA1 disruption has no noticeable polar effect on the downstream genes and that the PilA1 pilin itself is involved in c- di-GMP-regulated cell aggregation.

Cell aggregation occurs during exponential growth

C-di-GMP-regulated aggregation was also monitored more closely for all strains by measuring the absorbance of cultures over a 10-h incubation time. Upon nisin induction, cell aggregates of the WT strain with pDccA could be visually observed in suspension after ~5-6 h as the cells in suspension reached an OD600nm of ~ 0.8-0.9 (Fig. 4A). From this breakpoint, longer incubation time resulted in decreased absorbance of the supernatant down to an OD600nm of ~0.3 after 10 h and in cells accumulating at the bottom of the tube after only 6 h (Fig. 4A and 4C). Interestingly, cell aggregation was clearly impaired for both the pilA1 and pilB2 mutants compared to the WT strain; in the presence of nisin, the absorbance of the culture in suspension continued to increase after 6 h to reach an OD600nm of ~ 1.2-1.4 after 10 h, although a much reduced cell pellet could also be observed (Fig. 4A and 4C). In contrast, no cell aggregation could be observed and the absorbance of the cultures increased steadily to reach

mut stationary phase at an OD600nm of ~2.0 after 10 h for all strains containing DccA induced with nisin or containing pDccA but cultured without induction (Fig. S4A and S4B).

105

Figure 4. Time-course monitoring of cell aggregation reveals differences in aggregation rate. Cell aggregation from static liquid cultures was monitored by measuring the optical densities (600 nm) at 30-min intervals. The expression of dccA (A) or dccAmut (B) was induced with 1 µg ml-1 nisin. The means and standard deviations obtained from three independent assays are shown. Differences in the duration of the lag-phase, independent of the strains, were attenuated by adjusting the time point of the beginning of the exponential phase as in the uninduced control cultures (see Fig. S4).

Interestingly, observation of aggregation over shorter incubation times revealed that while complementation of the pilA1 mutant using pDccA-RB-pilA1 seemed to restore the cell aggregation phenotype in overnight cultures (Fig. 3B), it promoted an earlier and more drastic cell aggregation phenotype (Fig. 4A and 4C). In fact, the absorbance of cells in suspension of

106 both the WT and the pilA1 mutant strains bearing pDccA-RB-pilA1 barely increased past 4-h incubation time, stabilizing to an OD600nm of ~0.3. Cell aggregates in suspension were already apparent and a pellet became increasingly visible at the bottom of the culture tubes (Fig. 4A and 4C). Increased cell aggregation is likely caused by the increased pilA1 copy number provided by pDccA-RB-pilA1. Indeed real-time quantitative PCR (qPCR) experiments confirmed that pDccA-RB-pilA1 maintains at about 34.2 ± 2.0 and 17.3 ± 0.2 copies per cell in exponential and stationary phase, respectively. Again, complementation of the pilA1 mutant by expressing the pilA1G9Y mutant from pDccAmut-RB-pilA1G9Y did not restore cell aggregation back to WT levels and was undistinguishable from the pilA1 mutant strain containing pDccA (Fig. 4B), thereby confirming that the mature PilA1 pilin is involved in cell aggregation.

C-di-GMP induces the expression of the primary T4P gene cluster

Typically, riboswitches control either mRNA transcription termination through the formation of a Rho-independent terminator stem-loop or translation initiation through the sequestration of the Shine-Dalgarno sequence or start codon. To assess the effect of increased intracellular c-di-GMP levels on the transcription of the genes downstream of the riboswitch, we carried out real-time quantitative reverse transcription PCR (qRT-PCR) experiments on cDNAs obtained from total RNAs isolated from C. difficile 630erm with either pDccA or pDccAmut. The relative expression of the gene directly downstream of the riboswitch Cd2_4, pilA1, increased nearly 40 fold upon high intracellular c-di-GMP levels (Fig. 5A). Likewise, the expression of all genes downstream of pilA1 tested by qRT-PCR was more than 5 fold higher in induced conditions. The lower increase in transcript levels for genes downstream of pilA1 could be explained by the presence of a predicted Rho-independent terminator located in the intergenic region between pilA1 and pilB2, 36 bp downstream of pilA1 stop codon (Fig 1 and S5), although these genes could also be regulated from their own promoter(s). Expression of the three additional predicted pilin genes pilJ (CD630_07550), pilW (CD630_23450) and

107 pilA2 (CD630_32940), located in three distinct chromosomal loci remote from the primary T4P gene cluster, was not significantly modulated by c-di-GMP as their corresponding transcript levels remained virtually unchanged by increased intracellular levels of c-di-GMP (Fig. 5B).

Figure 5. C-di-GMP increases the expression of the T4P primary cluster genes. Relative transcript levels of (A) T4P primary cluster genes and (B) alternative T4P pilin genes. Bars represent the transcript levels (Fold) relative to C. difficile 630erm containing the pDccAmut control plasmid induced with 1 µg ml-1 nisin as determined by qRT-PCR using adk, gyrA and rpoA as internal references for normalization. The means and standard deviations obtained from three independent assays are shown. One-way ANOVA with a Dunnett multiple comparisons post-test was used to compare the values to that of the strain with the pDccAmut control plasmid induced with 1 µg ml-1 nisin (a, P < 0.0001; b, P < 0.01; c, P < 0.05).

108 The T4P primary locus is located downstream of a functional transcriptional c-di-GMP- II riboswitch

Our analysis of T4P gene expression suggested that Cdi2_4 is a positive transcriptional riboswitch. This prompted us to search for a possible terminator structure in the vicinity of Cdi2_4, as it was not initially predicted to have a transcriptional terminator as an expression platform (Lee et al., 2010). The predicted secondary structure of the Cdi2_4 riboswitch contains features of a functional c-di-GMP-II aptamer (i.e., P1, P2, P3 and P4/pseudoknot structures). For this reason, we inferred that formation of the P1 stem upon c-di-GMP binding to the aptamer could act as an antiterminator stem that sequesters the proximal-half of a putative Rho-independent terminator stem (Fig. 6A).

To test this hypothesis, a single-round in vitro transcription assay was developed to differentiate premature transcription termination (Terminated) from full-length transcription (Read-through) upon c-di-GMP binding. Using this assay, we first demonstrated that c-di- GMP increased transcription past the predicted terminator (Fig. 6B). Indeed, the percentage of read-through transcripts increased with the c-di-GMP concentration (Fig. 6C). Moreover, the amount of c-di-GMP that achieved half of the transcription elongation change, defined as the

T50 (Wickiser et al., 2005), was determined to be 72 ±12 nM, which is a physiologically relevant concentration in C. difficile (Purcell et al., 2012). These results support that binding of c-di-GMP to the Cdi2_4 riboswitch promotes transcription elongation.

109

Figure 6. Cdi2_4 is a transcriptional c-di-GMP-II riboswitch. (A) Predicted Cdi2_4 c-di-GMP-II riboswitch- mediated transcription regulation. C-di-GMP is predicted to increase transcription past a predicted rho- independent transcription terminator. Upon c-di-GMP binding to the aptamer, the P1 stem folds and sequesters nucleotides needed for the terminator formation and therefore acts as an antiterminator. Positions 231 and 146 correspond to the last nucleotide of the full-length transcript or the predicted terminated transcript, respectively. Nucleotides GC (underlined) were added to the template for transcription priming using GC dinucleotides. The predicted Cdi2_4 aptamer is illustrated based on its predicted secondary structure (Lee et al., 2010). The

110 transcription terminator folding was predicted using RNAfold web server (http://rna.tbi.univie.ac.at/cgi- bin/RNAfold.cgi) (Gruber et al., 2008). (B) Single-round in vitro transcription of the Cdi2_4 riboswitch template. Transcription reactions were carried out as a function of c-di-GMP concentration. Read-through (RT) and terminated (T) products are indicated on the right. Transcripts of 231 nt (RT) and 146 nt (T) were labeled and used as molecular weight makers (MWM). (C) The percentages of read-through products are plotted as a function of c-di-GMP concentration as quantified from (B). The plot shows a two-state model from which a T50 value of 72 ± 12 nM c-di-GMP was calculated.

The Cdi2_4 aptamer specifically binds c-di-GMP

Smith and colleagues (Smith et al., 2011) determined the structure and binding pocket nucleotides of a c-di-GMP-II aptamer from Clostridium acetobutylicum, Cac-1-2. Modeling of the Cdi2_4 aptamer tertiary structure revealed the conservation of nucleotides involved in ligand recognition through non-canonical base pairing (A69 and G73) with c-di-GMP guanines and base stacking interactions (A13, A61 and A70) (Fig. 7A) (Smith et al., 2011). This led us to test whether increased read-through transcription in the presence of c-di-GMP was triggered by its binding to the riboswitch aptamer or by unforeseen interaction of c-di- GMP in single-round transcription assays. Reasoning that mutations in the aptamer that decrease c-di-GMP binding should in return affect the ability of the whole riboswitch to regulate read-through transcription, we generated two single-nucleotide substitutions in Cdi2_4, A70G and A70C, that were shown to reduce the binding affinity of Cac-1-2 by 25,000 and 2 fold, respectively (Smith et al., 2011). Similarly, the A70G substitution in Cdi2_4 abolished the effect of c-di-GMP on promoting transcription elongation, while the A70C substitution did not impact considerably the percentage of read-through transcripts (Fig. 7B). To further test the specificity of Cdi2-4 for c-di-GMP, we carried out the same assay using c- di-AMP, a close c-di-GMP analog, and observed that c-di-AMP does not increase the percentage of read-through transcripts. Together, these results support that increased transcription elongation is promoted by specific interactions between c-di-GMP and nucleotides of the Cdi2_4 aptamer binding pocket and supports a model in which c-di-GMP promotes T4P genes transcription through binding to the upstream c-di-GMP-II riboswitch.

111

Figure 7. Cdi2_4 riboswitch transcription regulation requires specific c-di-GMP binding. (A) Predicted secondary structure and tertiary interactions of the Cdi2_4 aptamer upstream of pilA1. The cdi2_4 aptamer structure is based on the 3D structure of the Cac-1-2 c-di-GMP-II aptamer from Clostridium acetobutilicum (Smith et al., 2011). For clarity, P1 (blue), P2 (grey), P3 (purple) and P4/pseudo-knot (green) are shown in color. The c-di-GMP molecule is shown in red. Positions 69, 70 and 73 correspond to nucleotides interacting with c-di- GMP in Cac-1-2. Bases pairs are indicated with standard symbols (Leontis & Westhof, 2001). (B) Single-round in vitro transcription of the Cdi2-1-4 riboswitch Transcription reactions were performed on DNA templates carrying no mutation (WT) or a single-nucleotide substitution (A70C and A70G) without or with 10 mM c-di- GMP or c-di-AMP. Read-through (RT) and terminated (T) products are indicated on the right. The percentages of read-through (RT %) products are indicated under every lane.

112 To assess the importance of the c-di-GMP riboswitch for regulating the expression of pilA1, complementation of the pilA1 mutant by adding plasmids pDccA-RBA70C-pilA1 and pDccA- RBA70G-pilA1 was attempted. As expected, the A70C substitution did not impair Cdi2_4 riboswitch activity, as the pilA1 mutant with pDccA-RBA70C-pilA1 displayed a very similar cell aggregation phenotype in both overnight cultures and shorter incubation times compared to the pilA1 mutant with pDccA-RB-pilA1 (Fig. 3B and 3A). Surprisingly, the A70G substitution did not completely prevent complementation of the pilA1 mutant by plasmid-borne copies of pilA1 in aggregation assays. Indeed, cell aggregation of the pilA1 mutant containing pDccA-RBA70G- pilA1 after 16 h incubation was about 80%, i.e. not significantly lower than the WT strain with pDccA or the pilA1 mutant with pDccA-RB-pilA1 (Fig. 3B). Closer examination over shorter incubation times also revealed increased cell aggregation of the pilA1 mutant with pDccA- RBA70G-pilA1 compared to the non-complemented strain (pilA1 pDccA) as the optical density of the cells in suspension remained at about 0.5 for the pilA1 mutant with pDccA-RBA70G-pilA1 compared to about 1.3 for pilA1 with pilA1 pDccA (Fig. 4A). Nevertheless, the A70G substitution seemed to impair the function of the c-di-GMP riboswitch and the expression of pilA1. Indeed, cell aggregation of the pilA1 mutant with pDccA-RBA70G-pilA1 was lower than for the same pilA1 mutant complemented with pDccA-RB-pilA1. After 5 to 6 h of incubation, the absorbance of the cells in suspension of the pilA1 mutant complemented with pDccA-RB- pilA1, or the control riboswitch substitution mutant plasmid pDccA-RBA70C-pilA1, did not continue to increase past about 0.3 while the pilA1 mutant with pDccA-RBA70G-pilA1 reached and remained at OD600nm ~ 0.5 as the cell aggregates accumulated at the bottom of the tubes (Fig. 4A and 4C, and data not shown). This observation suggests that while c-di-GMP binding to the A70G riboswitch mutant was impaired, thereby leading to high rate of early transcription termination at the transcriptional terminator, there was enough transcription of pilA1 through the terminator to induce aggregation. This phenomenon was likely boosted by the multiple plasmid-borne copies of pilA1.

113 DISCUSSION

Insightful studies have revealed the versatility of functions conferred by T4P as well as the complexity of their assembly and regulation in Gram-negative bacteria. Recently, T4P were predicted to be also encoded by many Gram-positive bacteria (Sebaihia et al., 2006, Varga et al., 2006, Imam et al., 2011). Yet, the role of T4P in Gram-positive bacteria has only been determined for the cellulolytic bacterium Ruminococcus albus, for which T4P are important for adhesion to cellulose (Rakotoarivonina et al., 2002), and C. perfringens, for which T4P pilins appear at the bacterial surface and T4P are involved in gliding motility and biofilm formation (Varga et al., 2006, Varga et al., 2008).

In this work, we have unraveled the contribution of T4P in the aggregation of C. difficile and the regulation of their synthesis by a c-di-GMP-dependent transcriptional riboswitch activator. Although our study represents the first genetic and functional characterization of T4P in C. difficile, these structures were most likely observed much earlier. Borriello et al (1988) reported filaments measuring 4 to 9 nm in diameter and up to 6 µm long located at the poles of agar-grown cells, mostly gathered in bundles, although only 0 to 6% of the cells presented pili-like structures depending on the strains. This is similar to the size of the T4P on the liquid- grown planktonic cells reported herein, which we only observed as single filaments. And while most cells displayed T4P at their surface, we also observed a noticeable population of cells devoid of T4P upon high c-di-GMP levels. While this observation could simply result from pilus breakages during sample preparation, we cannot rule out that T4P synthesis is subject to additional stochastic regulation mechanisms.

Recently Piepenbrink et al (2014) reported the display of T4P at the surface of C. difficile R20291 after anaerobic growth at 37°C on Columbia Agar plates and prolonged incubation at room temperature. Composition analysis of these appendages by double immunogold staining

114 revealed the presence of both PilA1 and PilJ in C. difficile R20291 pili, and tends to suggest that the pilin PilJ is a major component of T4P (Piepenbrink et al., 2014). While the role of PilJ in the biogenesis of T4P in C. difficile 630 is unknown, we clearly show that disruption of either pilA1 or pilB2 abolished pilus formation and significantly reduced cell aggregation. Furthermore ectopic expression of pilA1 complemented the pilA1 mutant of C. difficile 630 in cell aggregation experiments confirming that PilA1 is a key factor for both T4P biogenesis and aggregation. No c-di-GMP-dependent riboswitch is predicted upstream of pilJ (Lee et al., 2010, Sudarsan et al., 2008) and no significant increase of expression of PilJ was observed upon elevated c-di-GMP level (Fig. 5B). We speculate that if PilJ is the main component of C. difficile T4P, PilA1 might play a key regulatory role in the assembly of the pilus structure. However, additional studies are required to understand the relative contribution of PilJ and PilA1 to T4P formation in C. difficile as well as the possible contribution of the three other predicted pilins encoded by the primary T4P locus, PilK, PilU and PilV, as well as the two predicted pilins PilW and PilA2.

T4P were previously shown to be regulated by c-di-GMP in other bacteria. Most interestingly, the c-di-GMP-binding PilZ domain retains its name from the PilZ protein from P. aeruginosa, which is required for twitching motility and T4P biogenesis in this species (Alm et al., 1996, Amikam & Galperin, 2006). Ironically, PilZ is the only one of the eight proteins containing a PilZ-domain in P. aeruginosa that was unable to bind c-di-GMP in in vitro assays (Merighi et al., 2007). The inability of PilZ to bind c-di-GMP is also supported by the lack of essential residues for c-di-GMP binding and structural dissimilarity in the N-terminal motif of the orthologous PilZ protein from Xanthomonas campestris pv campestris (Li et al., 2009). However, FimX, a protein localizing to the cell pole and also required for T4P biogenesis and twitching motility, connects c-di-GMP signaling with T4P in P. aeruginosa (Huang et al., 2003, Jain et al., 2012, Kazmierczak et al., 2006). Although FimX was initially described as a c-di-GMP phosphodiesterase (Kazmierczak et al., 2006), it was later shown to contain a degenerated EAL domain (and a degenerated GGDEF domain) that retained its capacity to

115 bind c-di-GMP but cannot hydrolyze it (Navarro et al., 2009, Rao et al., 2008). In Xanthomonas axonopodis pv citri, the orthologous PilZ protein is also involved in the regulation of twitching motility and was shown to bind to both the FimX c-di-GMP-binding protein and the PilB ATPase required for pilin polymerization (Guzzo et al., 2009). However, the FimX-PilZ interaction does not seem to be conserved in P. aeruginosa (Qi et al., 2012). In these species, current evidence suggests that c-di-GMP regulates T4P biogenesis and function by mechanisms relying on protein-protein interactions. In addition, no c-di-GMP riboswitches were predicted in these species (Lee et al., 2010, Sudarsan et al., 2008). Riboswitch-mediated transcription regulation reported herein is a novel mechanism of T4P biogenesis regulation by c-di-GMP signaling. Likewise, multiple species-specific mechanisms account for c-di-GMP inhibition of flagella-driven motility in diverse bacteria such as Escherichia coli and Salmonella, in which the c-di-GMP effector YcgR interacts directly with the FliG/FliM switch, V. cholerae, in which the transcription factor VpsT responds to c-di-GMP, and C. difficile, in which the c-di-GMP-I riboswitch Cd1 senses c-di-GMP (Boehm et al., 2010, Fang & Gomelsky, 2010, Krasteva et al., 2010, Purcell et al., 2012, Sudarsan et al., 2008).

While elevating intracellular c-di-GMP in C. difficile increased pilA1 transcript levels by 30 to 40 fold, c-di-GMP increased Cdi2_4 riboswitch read-through transcription from 50% to 80% in vitro. Yet, it is not uncommon to observe a relatively modest change in transcription in single-round in vitro transcription assays compared to the observed cellular transcript levels (Lemay et al., 2006, Lemay et al., 2011, Heppell et al., 2011, Winkler et al., 2003). As for other in vitro assays, single-round transcription assays provide clear evidence of a biological process. Nevertheless, the results obtained may suffer from the inherent lack of complexity and/or from inaccuracy compared to biological systems. For instance, the transcription rate is crucial because ligand binding to the riboswitch has to occur co-transcriptionally. However, it may not be strictly reflected in vitro due to differences in RNA polymerase, NTP concentration and the absence of other components promoting transcription pause sites (Breaker, 2012). Moreover, transcripts are likely to be stabilized by active translation, which

116 may also contribute to the observation of greater transcript level changes compared to in vitro assays.

Nonetheless, the hypothesized transcriptional termination switch of riboswitch Cdi2_4 confirmed in vitro echoes recent unexplained results. Recently, Soutourina et al. (2013) have identified regulatory RNAs in the whole genome of C. difficile 630 and confirmed the expression of the 12 c-di-GMP-I and 4 c-di-GMP-II predicted riboswitches. The authors reported increased pilA1 (CD630_35130) transcript levels and observed only full-length transcripts upon increasing c-di-GMP levels. They reasonably hypothesized that the full- length transcripts were stabilized only in elevated c-di-GMP conditions because of active translation of PilA1, assuming that all 4 c-di-GMP-II riboswitches have the same expression platform as Cdi2_1, which contains a 600-nt self-splicing group I intron promoting translation of the CD630_32460 putative surface protein upon binding of c-di-GMP to the aptamer (Chen et al., 2011, Lee et al., 2010). However, this self-splicing group I ribozyme is the only one among the 10 representative ribozymes predicted in strain 630 to be located near a c-di-GMP riboswitch (Lee et al., 2010). Although translation of the full-length transcripts upon increasing c-di-GMP levels may still contribute to pilA1 transcripts stability, our results suggest that c-di-GMP increases pilA1 transcript levels through binding to a transcriptionally activating c-di-GMP riboswitch rather than by promoting pilA1 translation.

Riboswitches typically control the expression of genes involved in the uptake, synthesis or degradation of the respective ligand (Breaker, 2012). In contrast, the vast majority of c-di- GMP riboswitches are not involved in c-di-GMP homeostasis (Lee et al., 2010, Sudarsan et al., 2008). Although Cdi2_1 and Cdi2_4 both have a c-di-GMP-II aptamer, their respective expression platforms are different, thereby leading to translational and transcriptional regulation of the downstream genes, respectively. This observation has already been reported in two different microorganisms but is unusual in a single bacterium. For instance adenine-

117 binding riboswitches regulate the translation of add in Vibrio vulnificus and the transcription of pbuE in Bacillus subtilis (Lemay et al., 2011, Mandal & Breaker, 2004, Rieder et al., 2007).

The exact mechanism leading to T4P-dependent cell aggregation and its role in the physiology and pathogenesis of C. difficile remains to be determined. Interestingly, Reynolds et al. (2011) reported that the cell wall protein CwpV also induces cell aggregation by a yet unknown mechanism. Presumably, CwpV may interact with T4P at the cell surface in order to contribute to aggregation. Alternatively, CwpV and T4P may promote aggregation independently. In Neisseria meningitidis, cell aggregation also requires T4P formation and more specifically the PilX minor pilin, increasing the amount of adherent bacteria to human cells by promoting cell-cell interactions (Helaine et al., 2005). A hook-like structure in the D- region domain of PilX minor pilin, protruding from the pili, was hypothesized to promote aggregation by countering the retraction of pili from adjacent cells (Helaine et al., 2007).

It is also reasonable to assume that T4P may be involved in other phenotypes in C. difficile such as biofilm formation, which was recently reported to be induced by c-di-GMP (Soutourina et al., 2013). C. difficile cells can grow in vitro on abiotic surfaces embedded within a biofilm matrix containing exopolysaccharides, DNA and proteins, which increases resistance of vegetative cells to oxidative stress and vancomycin (Dawson et al., 2012, Ethapa et al., 2013). Hypothetically, auto-aggregation could promote biofilm formation in its early stages. However, preliminary experiments did not show a clear contribution of T4P or c-di- GMP in biofilm formation as the phenotype was weak and variable in our experimental conditions (data not shown). Biofilm formation in C. difficile was reported to be weaker with strain 630 than with strain R20291 (Ethapa et al., 2013) and perhaps the latter, and other strains, should be used in future studies to ascertain the role of T4P in C. difficile.

Nevertheless, we speculate that T4P are involved in the pathogenicity of C. difficile.

118 Multicellular structures of C. difficile cells, described as cell aggregates, were observed in a mouse model of colonization and suggest aggregation and/or biofilm could be important for colonization (Lawley et al., 2009). Goulding et al. (2009) also detected the presence of pilus- like structures at the surface of C. difficile cells within the intestinal crypts of infected hamsters and detected the predicted pilin CD630_35070 in vivo. These observations and our results confirmed that T4P are produced at the cell surface and suggest that these appendages are important for host colonization and pathogenesis of C. difficile.

119 EXPERIMENTAL PROCEDURES

Bacterial strains and growth conditions

The bacterial strains and plasmids used in this study are described in Table 1. C. difficile strains were grown in brain-heart infusion (BHI) medium (BD, Bacto) or in BHI medium supplemented with yeast extract (5 g l-1) and 0.1% L-cysteine-HCl (BHIS) (Smith et al., 1981) at 37°C in an atmosphere of 90% N2, 10% CO2 and 5% H2 in an anaerobic chamber (Coy Laboratory Products). Media for growth of C. difficile were supplemented with 10 µg ml-1 thiamphenicol (BHIS-tm), 10 µg ml-1 norfloxaxin and/or 2.5 µg ml-1 erythromycin, as appropriate. E. coli strains were routinely grown at 37°C under aerobic conditions in Luria- Bertani (LB) medium supplemented with 25 µg ml-1 chloramphenicol or 100 µg ml-1 ampicillin as needed. For induction of gene expression in the strains carrying vectors with genes under the control of the cationic antimicrobial peptide inducible cpr promoter (Pcpr), BHIS-tm was supplemented with 1 µg ml-1 nisin (2.5% w/w, MP Biomedicals).

Plasmid and strain construction

The pilA1 and pilB2 genes were insertionally inactivated using a group II intron retargeting approach (Karberg et al., 2001) adapted for Clostridium, the ClosTron, as previously described (Heap et al., 2010). Briefly, the Ll.ltrB intron derivative in plasmid pMTL007C-E5 was retargeted to insert after position 474 of CD630_35130 (pMTL007C-E5-CD3513- 474/475s) or to insert after position 786 of CD630_35120 (pMTL007C-E5-CD3512-786/787s) using the Perutka algorithm (Perutka et al., 2004) implemented at Clostron.com to predict suitable intron insertion sites and determine the corresponding intron sequence modifications required. Both pMTL007C-E5 derivatives were synthesized by DNA 2.0 (Menlo Park, US). E.

120 coli CA434 was transformed by electroporation and used as a donor strain to transfer plasmids by conjugation into C. difficile 630erm. Transconjugants were plated on BHIS agar plates with thiamphenicol to select for the plasmid and norfloxacin to select against E. coli. Integrants could be isolated by subsequently culturing transconjugants on BHIS agar plates supplemented with erythromycin. Loss of the plasmid after overnight growth in BHIS broth was then confirmed by sensitivity to thiamphenicol. Intron insertion and orientation in pilA1 or pilB2 was confirmed by PCR screening using primer pairs CD3513-282s-F/LCF439 or CD3512-795s-F/LCF439. Insertion of only one intron copy was confirmed by Southern blot hybridization (see supplementary material).

For complementation studies, pilA1 and the upstream 497-bp region encompassing the c-di- GMP-II riboswitch Cdi2_4 and putative promoter region, were amplified by PCR using primer pair RB3513-F2-EcoRI/CD3513-R3-EcoRI, digested with EcoRI and ligated into the EcoRI- digested pDccA and pDccAmut plasmids. The resulting pDccA-RB-pilA1 and pDccAmut-RB- pilA1 plasmids were verified by DNA sequencing. In both plasmids, the Pcpr promoter and the promoter of pilA1 are divergent (Fig. S3). Plasmid pDccA-RB-pilA1G9Y, which codes for the PilA1 pilin mutant with a glycine to tyrosine amino acid substitution at position 9, and plasmids pDccA-RBA70C-pil1A and pDccA-RBA70C-pilA1, which contain adenine to cytosine and adenine to guanine substitutions at position 70 of Cdi2_4, respectively, were generated by site-directed mutagenesis of pDccA-RB-pilA1 using the QuickChange Lightning Site-Directed Mutagenesis Kit (Agilent Technologies) using primer pairs listed in Table S2. Mutated plasmids were verified by DNA sequencing.

For in vitro transcription assays, a synthetic transcription template containing the B. subtilis glyQS promoter sequence, a GC dinucleotide transcription start site, and the Cdi2_4 riboswitch sequence including 17 bp upstream of the predicted P1 stem and 93 bp downstream of the last base of the predicted transcription attenuator stem was generated in three steps.

121 First, the 135-bp glyQS promoter sequence, followed by a GC sequence, was generated by polymerase cycling assembly (PCA) using oligonucleotides GlyQS-F1 (0.5 µM), GlyQS-R1 (40 pM), GlyQS-F2 (40 pM) and GlyQS-R2 (0.5 µM) as previously described (Stemmer et al., 1995). The 229-bp extended Cdi2_4 sequence was amplified by PCR using primers GlyQS- RB3513-F2 and GlyQS-RB3513-R. The final template sequence was then generated by PCR amplification with primers GlyQS-RB3513-F and GlyQS-RB3513-R using the purified and 1:1000 diluted PCA and PCR products as templates, and cloned in pCR2.1-TA (Life Technologies) to generate plasmid pCR2.1-Rb3513sr. Plasmids pCR2.1-Rb3513sr-a70c and pCR2.1-Rb3513sr-a70g, which contain respectively adenine to cytosine and adenine to guanine substitutions at position 70 of the c-di-GMP-II riboswitch, were generated by site- directed mutagenesis of pCR2.1-Rb3513sr using the QuickChange Lightning Site-Directed Mutagenesis Kit using primer pairs listed in Table S2.

Transmission electron microscopy

Bacteria were inoculated 1:50 from saturated overnight cultures into filter-sterilized BHIS-tm with 1 µg ml-1 nisin and grown anaerobically at 37°C for 12 h. Bacteria from 4 ml of culture were centrifuged 2 min, resuspended in 1 ml of PBS, 4% paraformaldehyde and fixed in the anaerobic chamber for 45 min. 2.5 µl of cell suspension were adsorbed onto formar/carbon copper grids (Ted Pella, Inc., Redding CA) for 2 min, washed twice in filtered deionized water, and stained for 30 seconds in two sequential drops of 0.1% aqueous uranyl acetate. Cells were observed on a LEO EM 910 Transmission electron microscope (Carl Zeiss Microscopy, LLC, Thornword NY) and recorded with a Gatan Orius SC1000 digital camera with Digital Micrograph 3.11.0 software (Gatan, Inc., Pleaston, CA). All of the cells in 15 (WT) or 20 (pilA1 and pilB2) randomly selected grid squares were photographed at 8000 magnification. The images were blinded for quantification in which the number of cells and pili in each image were recorded by three observers who were unaware of the experimental conditions.

122 Aggregation assays

Cell aggregation in liquid cultures was monitored as previously reported (Purcell et al., 2012). Briefly, borosilicate tubes containing 10 ml BHIS-tm with or without nisin were inoculated with colonies from 24 to 48-h old BHIS-tm plates, and incubated 16 h at 37°C. Aggregation was assessed by measuring the cell density (OD600nm) on the culture tubes, before (pre) and after (post) vortexing to resuspend the cell aggregates. Monitoring of cell aggregation over time was carried out in fresh BHIS-tm broth with or without nisin inoculated 1/50 with 16-h old cultures in BHIS-tm, and incubated at 37°C. Cell density (OD600nm) was measured on static cultures over a 10-h incubation time after which the tubes were vortexed to resuspend the aggregates and measure the cell density of the total cell population in each tube.

Molecular biology methods

All the enzymes used in this study were obtained from New England Biolabs and were used according to the manufacturer's instructions. Plasmid DNA from E. coli was prepared with an EZ-10 Spin Column Plasmid DNA Minipreps Kit (Bio Basic). Genomic DNA for PCR verification of C. difficile clones was crudely extracted by boiling cell pellets of overnight cultures in 5 % Chelex-100, 200-400 mesh (Bio-Rad). Genomic DNA isolation for qPCR and Southern blot hybridization was performed on 10 ml overnight cultures using the Marmur’s method with increased lysozyme concentration (15 mg ml-1) and incubation time (2 h) (Colmin et al., 1991). PCR assays were performed with the primers described in Table S2 in 25-µl PCR reactions with 1 U of Phusion High Fidelity DNA polymerase. PCR conditions were as follows: (i) 30 s at 98 °C, (ii) 30 cycles of 10 s at 98 °C, 20 s at suitable annealing temperature, and 15-300 s at 72 °C, and (iii) 5 min at 72 °C. When needed PCR products were purified using an EZ-10 Spin Column PCR products Purification Kit. E. coli was transformed by electroporation according to Dower and colleagues (Dower et al., 1988). Transformation

123 was carried out in 0.1 cm electroporation cuvettes using a Bio-Rad GenePulser Xcell apparatus set at 25 µF, 200 and 1.8 kV.

RNA isolation and qRT-PCR

RNA was isolated from C. difficile cultures grown in BHIS-tm to an OD600nm of 0.7 without or with 0.1 and 1µg ml-1 nisin. Samples were mixed with an equal volume of cold 1:1 ethanol- acetone, and stored at -80°C. Total RNA extraction was achieved using an RNeasy minikit (Qiagen) following the manufacturer's instructions and glass bead lysis. Cells resuspended in 1 ml of RLT buffer (Qiagen) were submitted to 2 cycles of 40 s at 4 m s-1 on a Fastprep 24 apparatus (MP Bimedicals) with 0.5 g of 0.1 mm glass beads. Purified RNA samples were then subjected to gDNA digestion using Turbo DNase (Ambion) following the manufacturer's instructions. qRT-PCR assays were performed on the RNomics platform of the Laboratoire de Genomique Fonctionnelle de l’Universite de Sherbrooke (http://lgfus.ca). RNA integrity was assessed using an Agilent 2100 Bioanalyzer (Agilent Technologies). Reverse transcription was performed on 2.2 µg total RNA with Transcriptor reverse transcriptase, random hexamers, dNTPs (Roche Diagnostics), and 10 units of RNAseOUT (Invitrogen Life Technologies) following the manufacturer’s protocol in a total volume of 20 µl. All forward and reverse primers were individually resuspended to 20-100 µM stock solutions in Tris-EDTA buffer (IDT) and diluted as a primer pair to 1 µM in RNase DNase-free water (IDT). Quantitative PCR (qPCR) reactions were performed in 10-µl volumes in 96 well plates on a CFX-96 thermocycler (Bio-Rad) with 5 µL of 2 iTaq Universal SYBR Green Supermix (Bio-Rad), 10 ng (3 µl) cDNA, and 200 nM final (2 µl) primer pair solutions. The following cycling conditions were used: 3 min at 95 °C; 50 cycles: 15 s at 95 °C, 30 s at 60 °C, 30 s at 72 °C. Relative expression levels were calculated using a model taking into account multiple stably expressed reference genes (Hellemans et al., 2007) and housekeeping genes adk, rpoA and gyrA evaluated by geNorm (Vandesompele et al., 2002). Primer design (see Table S2) and validation were evaluated as described elsewhere (Brosseau et al., 2010). In every qPCR run, a

124 no-template control was performed for each primer pair and a no-reverse transcriptase control was performed for each cDNA preparation. Experiments were carried out three times on three biological replicates and combined.

Plasmid copy number quantification

Genomic DNA was isolated from C. difficile cultures grown in BHIS to mid-exponential phase (OD600nm of 0.6) or stationary phase (16 h). Gene copy number was measured by qPCR using 10 ng (3 µl) of gDNA as described herein for cDNA quantification (see qRT-PCR) using primers listed in Table S2. CD630_35130 (pilA1) was targeted to measure its relative gene copy number in the WT strain harboring plasmids pDccA-RB-pilA1 or devoid of vector. The data were analyzed using the Ct method using CD630_35120 (pilB2) as the chromosomal copy number reference. Experiments were carried out three times on two biological replicates and combined.

Single-round in vitro transcription assays

Transcription reactions were conducted as previously described (Blouin & Lafontaine, 2007) with minor modifications. Briefly, DNA templates for in vitro transcription maintained in plasmids pCR-2.1-Rb3513sr, pCR-2.1-Rb3513sr-a70c and pCR-2.1-Rb3513sr-a70g were amplified by PCR using primers GlyQS-RB3513-F and GlyQS-RB3513-R. The dinucleotide GpC, fused with the 17 nt upstream of the predicted P1 stem, was used to initiate transcription. Transcription reactions were expected to stop at the predicted transcription terminator for terminated RNA species (approx. 146 nt) or to stop at the end of the template for read-through species (231 nt). Reaction mixtures containing 300 fmol of DNA template in a buffer containing 150 mM GpC, 2.5 mM rATP and rCTP, 0.75 mM rUTP, 2 mCi [-32P]-UTP, 50

125 mM Tris-HCl (pH 8.0 at 23 °C), 20 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 10 % glycerol, and c-di-GMP or c-di-AMP (BIOLOG Life Science Institute, Bremen) when needed, were incubated 5 min at 37 °C. Transcription was initiated by addition of 0.25 U E. coli RNA polymerase holoenzyme (Epicentre) and incubation for 15 min at 37°C. Reactions were completed by the addition of each rNTP to a final concentration of 2.5 µM and 0.45 mg ml-1 heparin to avoid transcription reinitiation. After an additional 15-min incubation period at 37 °C, reactions were stopped by addition of an equal volume of 95% formamide. Reaction products were separated on 8 % denaturing polyacrylamide gels (PAGE). Exposed phosphor imager screens were scanned and analyzed using an FX molecular imager and the Quantity

One software (Bio-Rad). The c-di-GMP concentration needed to obtain half of the change, T50, was determined as described previously (Blouin & Lafontaine, 2007).

RNA molecular marker

In order to verify the length of transcription reaction products, two RNA molecular markers corresponding to the length of the predicted terminated (146 nt) and read-thought (231 nt) RNA species were generated using T7 RNA polymerase (Life Technologies). Templates for T7 RNA transcription were prepared by PCR amplification on C. difficile 630 genomic DNA using primer pairs T7RB3513-F/GlyQS-RB3513term-R2 and T7RB3513-F/GlyQS-RB3513- R, and then purified by denaturing PAGE and electro-elution. Both transcripts start with a 5’- GCG sequence to minimize 5’ end heterogeneity of the RNA population (Pleiss et al., 1998). [5-32P]-RNA was obtained by dephosphorylation with shrimp alkaline phosphatase (Affymetrix), and phosphorylation with T4 polynucleotide kinase (NEB) and [-32P]-ATP, according to manufacturer’s instructions. Reactions products were separated by denaturing PAGE, electro-eluted, ethanol precipitated and dissolved in water.

126 ACKNOWLEDGEMENTS

We thank Robert McKee, Adam Spaulding and Ankunda Kariisa for their assistance in enumerating pili for the EM studies. This work was supported by a grant from the FRQNT Research Team program to V.B. and L.-C.F., an Alexander Graham Bell Canada Graduate Scholarship from the NSERC to E.B. and NIH awards T32-DK007737 to E.B.P. and U54- AI057157 and R01-AI107029 to R.T. VB holds a Canada Research Chair in Bacterial Molecular Genetics. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

127 REFERENCES

Aldridge, P., R. Paul, P. Goymer, P. Rainey & U. Jenal, (2003) Role of the GGDEF regulator PleD in polar development of Caulobacter crescentus. Mol. Microbiol. 47: 1695-1708. Alm, R.A., A.J. Bodero, P.D. Free & J.S. Mattick, (1996) Identification of a novel gene, pilZ, essential for type 4 fimbrial biogenesis in Pseudomonas aeruginosa. J. Bacteriol. 178: 46- 53. Amikam, D. & M.Y. Galperin, (2006) PilZ domain is part of the bacterial c-di-GMP binding protein. Bioinformatics 22: 3-6. Blouin, S. & D.A. Lafontaine, (2007) A loop loop interaction and a K-turn motif located in the lysine aptamer domain are important for the riboswitch gene regulation control. RNA 13: 1256-1267. Boehm, A., M. Kaiser, H. Li, C. Spangler, C.A. Kasper, M. Ackermann, V. Kaever, V. Sourjik, V. Roth & U. Jenal, (2010) Second messenger-mediated adjustment of bacterial swimming velocity. Cell 141: 107-116. Bordeleau, E., L.C. Fortier, F. Malouin & V. Burrus, (2011) c-di-GMP turn-over in Clostridium difficile is controlled by a plethora of diguanylate cyclases and phosphodiesterases. PLoS Genet 7: e1002039. Borriello, S.P., H.A. Davies & F.E. Barclay, (1988) Detection of fimbriae amongst strains of Clostridium difficile. FEMS Microbiol. Lett. 49: 65-67. Breaker, R.R., (2012) Riboswitches and the RNA world. Cold Spring Harb Perspect Biol 4. Brosseau, J.P., J.F. Lucier, E. Lapointe, M. Durand, D. Gendron, J. Gervais-Bird, K. Tremblay, J.P. Perreault & S.A. Elela, (2010) High-throughput quantification of splicing isoforms. RNA 16: 442-449. Chen, A.G., N. Sudarsan & R.R. Breaker, (2011) Mechanism for gene control by a natural allosteric group I ribozyme. RNA 17: 1967-1972. Colmin, C., M. Pebay, J.M. Simonet & B. Decaris, (1991) A species-specific DNA probe obtained from Streptococcus salivarius subsp. thermophilus detects strain restriction polymorphism. FEMS Microbiol. Lett. 65: 123-128.

128 Craig, L. & J. Li, (2008) Type IV pili: paradoxes in form and function. Curr. Opin. Struct. Biol. 18: 267-277. Dawson, L.F., E. Valiente, A. Faulds-Pain, E.H. Donahue & B.W. Wren, (2012) Characterisation of Clostridium difficile biofilm formation, a role for Spo0A. PLoS One 7: e50527. Dineen, S.S., A.C. Villapakkam, J.T. Nordman & A.L. Sonenshein, (2007) Repression of Clostridium difficile toxin gene expression by CodY. Mol. Microbiol. 66: 206-219. Dower, W.J., J.F. Miller & C.W. Ragsdale, (1988) High efficiency transformation of E. coli by high voltage electroporation. Nucleic Acids Res 16: 6127-6145. El Meouche, I., J. Peltier, M. Monot, O. Soutourina, M. Pestel-Caron, B. Dupuy & J.L. Pons, (2013) Characterization of the SigD regulon of C. difficile and its positive control of toxin production through the regulation of tcdR. PLoS One 8: e83748. Ethapa, T., R. Leuzzi, Y.K. Ng, S.T. Baban, R. Adamo, S.A. Kuehne, M. Scarselli, N.P. Minton, D. Serruto & M. Unnikrishnan, (2013) Multiple factors modulate biofilm formation by the anaerobic pathogen Clostridium difficile. J. Bacteriol. 195: 545-555. Fang, X. & M. Gomelsky, (2010) A post-translational, c-di-GMP-dependent mechanism regulating flagellar motility. Mol. Microbiol. 76: 1295-1305. Freitag, N.E., H.S. Seifert & M. Koomey, (1995) Characterization of the pilF-pilD pilus- assembly locus of Neisseria gonorrhoeae. Mol. Microbiol. 16: 575-586. Giltner, C.L., Y. Nguyen & L.L. Burrows, (2012) Type IV pilin proteins: versatile molecular modules. Microbiol. Mol. Biol. Rev. 76: 740-772. Goulding, D., H. Thompson, J. Emerson, N.F. Fairweather, G. Dougan & G.R. Douce, (2009) Distinctive profiles of infection and pathology in hamsters infected with Clostridium difficile strains 630 and B1. Infect. Immun. 77: 5478-5485. Gruber, A.R., R. Lorenz, S.H. Bernhart, R. Neubock & I.L. Hofacker, (2008) The Vienna RNA websuite. Nucleic Acids Res 36: W70-74. Guzzo, C.R., R.K. Salinas, M.O. Andrade & C.S. Farah, (2009) PILZ protein structure and interactions with PILB and the FIMX EAL domain: implications for control of type IV pilus biogenesis. J. Mol. Biol. 393: 848-866.

129 Heap, J.T., S.A. Kuehne, M. Ehsaan, S.T. Cartman, C.M. Cooksley, J.C. Scott & N.P. Minton, (2010) The ClosTron: Mutagenesis in Clostridium refined and streamlined. J. Microbiol. Methods 80: 49-55. Helaine, S., E. Carbonnelle, L. Prouvensier, J.L. Beretti, X. Nassif & V. Pelicic, (2005) PilX, a pilus-associated protein essential for bacterial aggregation, is a key to pilus-facilitated attachment of Neisseria meningitidis to human cells. Mol. Microbiol. 55: 65-77. Helaine, S., D.H. Dyer, X. Nassif, V. Pelicic & K.T. Forest, (2007) 3D structure/function analysis of PilX reveals how minor pilins can modulate the virulence properties of type IV pili. Proc. Natl. Acad. Sci. U. S. A. 104: 15888-15893. Hellemans, J., G. Mortier, A. De Paepe, F. Speleman & J. Vandesompele, (2007) qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol 8: R19. Hengge, R., (2009) Principles of c-di-GMP signalling in bacteria. Nat Rev Microbiol 7: 263- 273. Heppell, B., S. Blouin, A.M. Dussault, J. Mulhbacher, E. Ennifar, J.C. Penedo & D.A. Lafontaine, (2011) Molecular insights into the ligand-controlled organization of the SAM-I riboswitch. Nat Chem Biol 7: 384-392. Hickman, J.W. & C.S. Harwood, (2008) Identification of FleQ from Pseudomonas aeruginosa as a c-di-GMP-responsive transcription factor. Mol. Microbiol. 69: 376-389. Ho, T.D. & C.D. Ellermeier, (2011) PrsW is required for colonization, resistance to antimicrobial peptides, and expression of extracytoplasmic function sigma factors in Clostridium difficile. Infect. Immun. 79: 3229-3238. Huang, B., C.B. Whitchurch & J.S. Mattick, (2003) FimX, a multidomain protein connecting environmental signals to twitching motility in Pseudomonas aeruginosa. J. Bacteriol. 185: 7068-7076. Hussain, H.A., A.P. Roberts & P. Mullany, (2005) Generation of an erythromycin-sensitive derivative of Clostridium difficile strain 630 (630erm) and demonstration that the conjugative transposon Tn916E enters the genome of this strain at multiple sites. J. Med. Microbiol. 54: 137-141.

130 Imam, S., Z. Chen, D.S. Roos & M. Pohlschroder, (2011) Identification of surprisingly diverse type IV pili, across a broad range of Gram-positive bacteria. PLoS One 6: e28919. Jain, R., A.J. Behrens, V. Kaever & B.I. Kazmierczak, (2012) Type IV pilus assembly in Pseudomonas aeruginosa over a broad range of cyclic di-GMP concentrations. J. Bacteriol. 194: 4285-4294. Karberg, M., H. Guo, J. Zhong, R. Coon, J. Perutka & A.M. Lambowitz, (2001) Group II introns as controllable gene targeting vectors for genetic manipulation of bacteria. Nat. Biotechnol. 19: 1162-1167. Kazmierczak, B.I., M.B. Lebron & T.S. Murray, (2006) Analysis of FimX, a phosphodiesterase that governs twitching motility in Pseudomonas aeruginosa. Mol. Microbiol. 60: 1026-1043. Krasteva, P.V., J.C. Fong, N.J. Shikuma, S. Beyhan, M.V. Navarro, F.H. Yildiz & H. Sondermann, (2010) Vibrio cholerae VpsT regulates matrix production and motility by directly sensing cyclic di-GMP. Science 327: 866-868. Kulasakara, H., V. Lee, A. Brencic, N. Liberati, J. Urbach, S. Miyata, D.G. Lee, A.N. Neely, M. Hyodo, Y. Hayakawa, F.M. Ausubel & S. Lory, (2006) Analysis of Pseudomonas aeruginosa diguanylate cyclases and phosphodiesterases reveals a role for bis-(3'-5')-cyclic- GMP in virulence. Proc. Natl. Acad. Sci. U. S. A. 103: 2839-2844. Lawley, T.D., S. Clare, A.W. Walker, D. Goulding, R.A. Stabler, N. Croucher, P. Mastroeni, P. Scott, C. Raisen, L. Mottram, N.F. Fairweather, B.W. Wren, J. Parkhill & G. Dougan, (2009) Antibiotic treatment of Clostridium difficile carrier mice triggers a supershedder state, spore-mediated transmission, and severe disease in immunocompromised hosts. Infect. Immun. 77: 3661-3669. Lee, E.R., J.L. Baker, Z. Weinberg, N. Sudarsan & R.R. Breaker, (2010) An allosteric self- splicing ribozyme triggered by a bacterial second messenger. Science 329: 845-848. Lemay, J.F., G. Desnoyers, S. Blouin, B. Heppell, L. Bastet, P. St-Pierre, E. Masse & D.A. Lafontaine, (2011) Comparative study between transcriptionally- and translationally-acting adenine riboswitches reveals key differences in riboswitch regulatory mechanisms. PLoS Genet 7: e1001278.

131 Lemay, J.F., J.C. Penedo, R. Tremblay, D.M. Lilley & D.A. Lafontaine, (2006) Folding of the adenine riboswitch. Chem. Biol. 13: 857-868. Leontis, N.B. & E. Westhof, (2001) Geometric nomenclature and classification of RNA base pairs. RNA 7: 499-512. Li, T.N., K.H. Chin, J.H. Liu, A.H. Wang & S.H. Chou, (2009) XC1028 from Xanthomonas campestris adopts a PilZ domain-like structure without a c-di-GMP switch. Proteins 75: 282-288. Maldarelli, G.A., L. De Masi, E.C. von Rosenvinge, M. Carter & M.S. Donnenberg, (2014) Identification, immunogenicity, and cross-reactivity of type IV pilin and pilin-like proteins from Clostridium difficile. Pathog Dis: In press (DOI: 10.1111/2049-1632X.12137). Mandal, M. & R.R. Breaker, (2004) Adenine riboswitches and gene activation by disruption of a transcription terminator. Nat Struct Mol Biol 11: 29-35. Mani, N. & B. Dupuy, (2001) Regulation of toxin synthesis in Clostridium difficile by an alternative RNA polymerase sigma factor. Proc. Natl. Acad. Sci. U. S. A. 98: 5844-5849. Martin, M.J., S. Clare, D. Goulding, A. Faulds-Pain, L. Barquist, H.P. Browne, L. Pettit, G. Dougan, T.D. Lawley & B.W. Wren, (2013) The agr locus regulates virulence and colonization genes in Clostridium difficile 027. J. Bacteriol. 195: 3672-3681. Melville, S. & L. Craig, (2013) Type IV pili in Gram-positive bacteria. Microbiol. Mol. Biol. Rev. 77: 323-341. Merighi, M., V.T. Lee, M. Hyodo, Y. Hayakawa & S. Lory, (2007) The second messenger bis- (3'-5')-cyclic-GMP and its PilZ domain-containing receptor Alg44 are required for alginate biosynthesis in Pseudomonas aeruginosa. Mol. Microbiol. 65: 876-895. Navarro, M.V., N. De, N. Bae, Q. Wang & H. Sondermann, (2009) Structural analysis of the GGDEF-EAL domain-containing c-di-GMP receptor FimX. Structure 17: 1104-1116. Perutka, J., W. Wang, D. Goerlitz & A.M. Lambowitz, (2004) Use of computer-designed group II introns to disrupt Escherichia coli DExH/D-box protein and DNA helicase genes. J. Mol. Biol. 336: 421-439. Piepenbrink, K.H., G.A. Maldarelli, C.F. de la Pena, G.L. Mulvey, G.A. Snyder, L. De Masi, E.C. von Rosenvinge, S. Gunther, G.D. Armstrong, M.S. Donnenberg & E.J. Sundberg,

132 (2014) Structure of Clostridium difficile PilJ exhibits unprecedented divergence from known type IV pilins. J. Biol. Chem. 289: 4334-4345. Pitzer, J.E., S.Z. Sultan, Y. Hayakawa, G. Hobbs, M.R. Miller & M.A. Motaleb, (2011) Analysis of the Borrelia burgdorferi cyclic-di-GMP-binding protein PlzA reveals a role in motility and virulence. Infect. Immun. 79: 1815-1825. Pleiss, J.A., M.L. Derrick & O.C. Uhlenbeck, (1998) T7 RNA polymerase produces 5' end heterogeneity during in vitro transcription from certain templates. RNA 4: 1313-1317. Proft, T. & E.N. Baker, (2009) Pili in Gram-negative and Gram-positive bacteria - structure, assembly and their role in disease. Cell. Mol. Life Sci. 66: 613-635. Purcell, E.B., R.W. McKee, S.M. McBride, C.M. Waters & R. Tamayo, (2012) Cyclic diguanylate inversely regulates motility and aggregation in Clostridium difficile. J. Bacteriol. 194: 3307-3316. Purdy, D., T.A. O'Keeffe, M. Elmore, M. Herbert, A. McLeod, M. Bokori-Brown, A. Ostrowski & N.P. Minton, (2002) Conjugative transfer of clostridial shuttle vectors from Escherichia coli to Clostridium difficile through circumvention of the restriction barrier. Mol. Microbiol. 46: 439-452. Qi, Y., L. Xu, X. Dong, Y.H. Yau, C.L. Ho, S.L. Koh, S.G. Shochat, S.H. Chou, K. Tang & Z.X. Liang, (2012) Functional divergence of FimX in PilZ binding and type IV pilus regulation. J. Bacteriol. 194: 5922-5931. Rakotoarivonina, H., G. Jubelin, M. Hebraud, B. Gaillard-Martinie, E. Forano & P. Mosoni, (2002) Adhesion to cellulose of the Gram-positive bacterium Ruminococcus albus involves type IV pili. Microbiology 148: 1871-1880. Rao, F., Y. Yang, Y. Qi & Z.X. Liang, (2008) Catalytic mechanism of cyclic di-GMP-specific phosphodiesterase: a study of the EAL domain-containing RocR from Pseudomonas aeruginosa. J. Bacteriol. 190: 3622-3631. Reynolds, C.B., J.E. Emerson, L. de la Riva, R.P. Fagan & N.F. Fairweather, (2011) The Clostridium difficile cell wall protein CwpV is antigenically variable between strains, but exhibits conserved aggregation-promoting function. PLoS Pathog 7: e1002024. Rieder, R., K. Lang, D. Graber & R. Micura, (2007) Ligand-induced folding of the adenosine

133 deaminase A-riboswitch and implications on riboswitch translational control. Chembiochem 8: 896-902. Romling, U., M.Y. Galperin & M. Gomelsky, (2013) Cyclic di-GMP: the first 25 years of a universal bacterial second messenger. Microbiol. Mol. Biol. Rev. 77: 1-52. Rupnik, M., M.H. Wilcox & D.N. Gerding, (2009) Clostridium difficile infection: new developments in epidemiology and pathogenesis. Nat Rev Microbiol 7: 526-536. Ryan, R.P., Y. Fouhy, J.F. Lucey, L.C. Crossman, S. Spiro, Y.W. He, L.H. Zhang, S. Heeb, M. Camara, P. Williams & J.M. Dow, (2006) Cell-cell signaling in Xanthomonas campestris involves an HD-GYP domain protein that functions in cyclic di-GMP turnover. Proc. Natl. Acad. Sci. U. S. A. 103: 6712-6717. Ryjenkov, D.A., M. Tarutina, O.V. Moskvin & M. Gomelsky, (2005) Cyclic diguanylate is a ubiquitous signaling molecule in bacteria: insights into biochemistry of the GGDEF protein domain. J. Bacteriol. 187: 1792-1798. Saujet, L., M. Monot, B. Dupuy, O. Soutourina & I. Martin-Verstraete, (2011) The key sigma factor of transition phase, SigH, controls sporulation, metabolism, and virulence factor expression in Clostridium difficile. J. Bacteriol. 193: 3186-3196. Schmidt, A.J., D.A. Ryjenkov & M. Gomelsky, (2005) The ubiquitous protein domain EAL is a cyclic diguanylate-specific phosphodiesterase: enzymatically active and inactive EAL domains. J. Bacteriol. 187: 4774-4781. Sebaihia, M., B.W. Wren, P. Mullany, N.F. Fairweather, N. Minton, R. Stabler, N.R. Thomson, A.P. Roberts, A.M. Cerdeno-Tarraga, H. Wang, M.T. Holden, A. Wright, C. Churcher, M.A. Quail, S. Baker, N. Bason, K. Brooks, T. Chillingworth, A. Cronin, P. Davis, L. Dowd, A. Fraser, T. Feltwell, Z. Hance, S. Holroyd, K. Jagels, S. Moule, K. Mungall, C. Price, E. Rabbinowitsch, S. Sharp, M. Simmonds, K. Stevens, L. Unwin, S. Whithead, B. Dupuy, G. Dougan, B. Barrell & J. Parkhill, (2006) The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat. Genet. 38: 779-786. Serganov, A. & E. Nudler, (2013) A decade of riboswitches. Cell 152: 17-24. Simm, R., M. Morr, A. Kader, M. Nimtz & U. Romling, (2004) GGDEF and EAL domains

134 inversely regulate cyclic di-GMP levels and transition from sessility to motility. Mol. Microbiol. 53: 1123-1134. Smith, C.J., S.M. Markowitz & F.L. Macrina, (1981) Transferable tetracycline resistance in Clostridium difficile. Antimicrob. Agents Chemother. 19: 997-1003. Smith, K.D., C.A. Shanahan, E.L. Moore, A.C. Simon & S.A. Strobel, (2011) Structural basis of differential ligand recognition by two classes of bis-(3'-5')-cyclic dimeric guanosine monophosphate-binding riboswitches. Proc. Natl. Acad. Sci. U. S. A. 108: 7757-7762. Soutourina, O.A., M. Monot, P. Boudry, L. Saujet, C. Pichon, O. Sismeiro, E. Semenova, K. Severinov, C. Le Bouguenec, J.Y. Coppee, B. Dupuy & I. Martin-Verstraete, (2013) Genome-wide identification of regulatory RNAs in the human pathogen Clostridium difficile. PLoS Genet 9: e1003493. Stemmer, W.P., A. Crameri, K.D. Ha, T.M. Brennan & H.L. Heyneker, (1995) Single-step assembly of a gene and entire plasmid from large numbers of oligodeoxyribonucleotides. Gene 164: 49-53. Strom, M.S. & S. Lory, (1991) Amino acid substitutions in pilin of Pseudomonas aeruginosa. Effect on leader peptide cleavage, amino-terminal methylation, and pilus assembly. J. Biol. Chem. 266: 1656-1664. Sudarsan, N., E.R. Lee, Z. Weinberg, R.H. Moy, J.N. Kim, K.H. Link & R.R. Breaker, (2008) Riboswitches in eubacteria sense the second messenger cyclic di-GMP. Science 321: 411- 413. Tamayo, R., J.T. Pratt & A. Camilli, (2007) Roles of cyclic diguanylate in the regulation of bacterial pathogenesis. Annu. Rev. Microbiol. 61: 131-148. Tamayo, R., A.D. Tischler & A. Camilli, (2005) The EAL domain protein VieA is a cyclic diguanylate phosphodiesterase. J. Biol. Chem. 280: 33324-33330. Tischler, A.D. & A. Camilli, (2005) Cyclic diguanylate regulates Vibrio cholerae virulence gene expression. Infect. Immun. 73: 5873-5882. Turner, L.R., J.C. Lara, D.N. Nunn & S. Lory, (1993) Mutations in the consensus ATP- binding sites of XcpR and PilB eliminate extracellular protein secretion and pilus biogenesis in Pseudomonas aeruginosa. J. Bacteriol. 175: 4962-4969.

135 Vandesompele, J., K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe & F. Speleman, (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 3: RESEARCH0034. Varga, J.J., V. Nguyen, D.K. O'Brien, K. Rodgers, R.A. Walker & S.B. Melville, (2006) Type IV pili-dependent gliding motility in the Gram-positive pathogen Clostridium perfringens and other Clostridia. Mol. Microbiol. 62: 680-694. Varga, J.J., B. Therit & S.B. Melville, (2008) Type IV pili and the CcpA protein are needed for maximal biofilm formation by the Gram-positive anaerobic pathogen Clostridium perfringens. Infect. Immun. 76: 4944-4951. Wickiser, J.K., W.C. Winkler, R.R. Breaker & D.M. Crothers, (2005) The speed of RNA transcription and metabolite binding kinetics operate an FMN riboswitch. Mol. Cell 18: 49- 60. Winkler, W.C., A. Nahvi, N. Sudarsan, J.E. Barrick & R.R. Breaker, (2003) An mRNA structure that controls gene expression by binding S-adenosylmethionine. Nat. Struct. Biol. 10: 701-707.

136 Supporting information

A cyclic-di-GMP riboswitch controls type IV pili-mediated aggregation in Clostridium difficile

Eric Bordeleau1, Erin B. Purcell2, Daniel A. Lafontaine1, Louis-Charles Fortier3, Rita Tamayo2 and Vincent Burrus1*

1 Département de biologie, Faculté des sciences, Université de Sherbrooke, QC, Canada 2 Department of Microbiology and Immunology, University of North Carolina, Chapel Hill, NC, USA 3 Département de microbiologie et d’infectiologie, Faculté de médecine et sciences de la santé, Université de Sherbrooke, QC, Canada * Corresponding author

Supplemental Table S1. Supplemental Table S2. Supplemental Table S3. Supplemental Figure S1. Supplemental Figure S2. Supplemental Figure S3. Supplemental Figure S4. Supplemental Figure S5. Supplemental Experimental Procedures. Supplemental References.

137 Table S1. Putative type IV pilus proteins encoded by C. difficile 630.

b Proposed annotation Size Melville and Protein (aa) Genbank function annotationa Craig Maldarelli et al. Cluster 1 putative type IV prepilin leader CD630_35030 220 peptidase PilD1 PilD2 CD630_35040 260 putative type IV prepilin peptidase PilD2 PilD1 putative twitching motility protein CD630_35050 360 PilT PilT PilT CD630_35060 512 conserved hypothetical protein minor pilin PilK, minor pilin CD630_35070 175 putative type IV pilin minor pilin PilU, minor pilin CD630_35080 189 putative type IV pilin minor pilin PilV, minor pilin putative type IV pilus assembly CD630_35090 302 protein PilO PilO CD630_35100 570 putative membrane protein PilM2 PilMN putative type IV pilus secretion CD630_35110 402 protein PilC2 PilC1 CD630_35120 558 putative type IV pilus transporter PilB2 PilB1 CD630_35130 171 putative pilin protein major pilin PilA1,major pilin Cluster 2 CD630_32930 290 putative pilus assembly protein PilM1 PilM CD630_32940 119 putative type IV pilin PilA major pilin PilA2, major pilin putative type II secretion system CD630_32950 361 protein PilC1 PilC2 CD630_32960 449 putative pilus assembly ATPase PilB1 PilB2 Other pilins CD630_07550 267 putative cell surface protein major pilin PilJ, minor pilin CD630_23050 164 putative pilin protein major pilin PilW, minor pilin a Genbank database protein annotations for C. difficile 630 (AM180355.1) (Benson et al., 2013). b Annotations proposed by Melville and Craig (2013) in C. difficile 630 and annotations proposed by Maldarelli et al. (2014) for homologous proteins in C. difficile R20291. Major and minor pilins are indicated as predicted in the proposed annotations.

138 Table S2. Primers used in this study.

Primer name Nucleotide sequence (5 to 3)a CD3513-282 s-F CAGTAGTGGCAGTTCCAGCTTTAT CD3512-795s-F AAGAGAATATATGAGGATGAACTCAGCACC LCF439 GCATCGCTTTCGTTTCGTTCCCAT Rb3513-F2-EcoRI NNNNNNGAATTCAATATTAGCAAAGGATGGAA CD3513-R3-EcoRI NNNNNNGAATTCTTAAGCTCCTTGTTGATCCA Ll.ltrB-F CACAGGAAACAGCTATGA Ll.ltrB-R TATAGGAACTTCACGCGT GlyQS-F1 ATTGATTTATATTACGAAGAATATTCGGGATTGTATTTAAAATCAAAGCGCTTT GlyQS-R1 AAGATGTTTCATGCTTTCCATTTGATCTAAAAAGCGCTTTGATTTTAAATACAA GlyQS-F2 CAAATGGAAAGCATGAAACATCTTATGGGTGAAAACAAAAGTTGACATTTGGTC GlyQS-R2 GCAAATGATCATATAAAAAGATGGACCAAATGTCAACTTTTGTTTTC GlyQS-RB3513-F ATTGATTTATATTACGAAGAATA GlyQS-RB3513-F2 TCTTTTTATATGATCATTTGCTTATATAATTTTGAATTTAATAAA GlyQS-RB3513-R CTTATAATATATCTGGTTAAAAATC GlyQS-RB3513term-R2 ATAAAAAAAGCTATGAAAATAATAGTATC RbCD3513-a70c-F GGAGCATATTGAGTTAGTGGTGCACCCGGCTATGAAATTG RbCD3513-a70c-R CAATTTCATAGCCGGGTGCACCACTAACTCAATATGCTCC RbCD3513-a70g-F GGAGCATATTGAGTTAGTGGTGCAGCCGGCTATGAAATTG RbCD3513-a70g-R CAATTTCATAGCCGGCTGCACCACTAACTCAATATGCTCC GGGAATAAAAATGAAGTTAAAAAAGAATAAAAAATATTTCACTTTAGTGGAATTA CD3513-G9Y-F2 TTGGTAGTAATTGCA TGCAATTACTACCAATAATTCCACTAAAGTGAAATATTTTTTATTCTTTTTTAAC CD3513-G9Y-R2 TTCATTTTTATTCCC adk.CD0091.qcd630.F1 TCCAAGAAATGTAGCACAAGGAGAACAT adk.CD0091.qcd630.R1 ACTCAACATGGTAAGTAGCTCCACAAG CD0755.qcd.F2 TGGCTGAGGAATCAAGTAAGGGCAAG CD0755.qcd.R2 TCCATCATAACCATCATCAAACGTTGCT CD2305.qcd.F1 TGGTGGCTTTAAAGATGACCCATTAGC CD2305.qcd.R1 ACCACTTTTGCTTATTTCTCCTTCTGGA CD3294.qcd.F1 AACCGCAGCTAGCATGGCAA CD3294.qcd.R1 TGTTCCGCTATCATTTATGACAAGTTCG CD3507.qcd.F1 GGTGATGTGAGCTTATATGATACTCCA

139 CD3507.qcd.R1 AGTCATTCATGCTTTTTCCAGTTACAG CD3508.qcd.F1 TCTACAATCAACCACTAGTTCAAAACAA CD3508.qcd.R1 TTGCTATTTTTATGACTTTCTTGCACTG CD3510.qcd.F1 CGTCAAGAAACTTTGGTAAGGTTACGGA CD3510.qcd.R1 GCCTTGAGAAACATAGCTTGAAAATGCA CD3511.qcd.F1 TGACTCTGTGTTGGTACAGATGATAGCT CD3511.qcd.R1 CAGGCTCTATCATTGAGGCCATCC CD3512.qcd.F2 GCCAACTAACAGGGATGAAAAGATAGCT CD3512.qcd.R2 TTTTCCCACTTCCAGTAGGGCCA CD3513.qcd.F2 AGCAGTAGTGGCAGTTCCAGC CD3513.qcd.R1 CCACCTATATCAGCTTTATCAGGCAGAG gyrA.CD0006.qcd630.F1 ACAGAAATAAGACATGCTGAAGGCGA gyrA.CD0006.qcd630.R1 GTGCTGAAATACCTCTTCCACCCC rpoA.CD0098.qcd630.F1 TCATTACCAGGTGTAGCAGTGAATGC rpoA.CD0098.qcd630.R1 TGATAGAGCATGGTCCTTGAGCTTCT

Boldface text indicates EcoRI restriction sites.

140

Figure S1. Southern blot hybridization confirming single intron insertion in either pilA1 or pilB2. HindIII- digested gDNA isolated from C. difficile 630erm (WT) or the pilB2 and pilA1 insertion mutants with pDccA, pDccAmut or the empty vector was probed for the presence of the Clostron (intron). The probe generated by PCR amplification also contained a fragment of 205 bp corresponding to bases from pMTL007C-E5-CD3513- 474/475s that hybridizes with the vectors introduced in each strain (vector). HindIII-digested l DNA was used as molecular weight marker (MWM).

141

Figure S2. Absorbance data of the 16-h cell aggregation assays and cell viability. Cell aggregation was

142 monitored by measuring the absorbance (OD600nm) of supernatants before (Pre) and after vortexing (Post) from 16-h static liquid cultures in BHIS-tm with and without 1 mg ml-1 nisin. The absorbance values used to calculate the aggregation percentages shown in figures 3A (A) and 3B (B) are represented. (C) The amount of viable (CFU ml-1) was determined by serial dilution and on BHIS-tm agar plates from 200-ml samples collected from the supernatant of static liquid cultures in BHIS-tm with and without 1 mg ml-1 nisin, before (Pre) and after vortexing (Post).

Figure S3. Map of pDcca-RB-pilA1. The pilA1 gene, together with its upstream regulatory region including the Cdi2_4 c-di-GMP-II riboswitch (RB), was cloned into pDccA at the unique EcoRI restriction site located between the nisin-inducible promoter Pcpr and ori69 fragment for replication in C. difficile. The region encompassing ORF B, repA and the repeats (most likely the origin of replication) shares 100% nucleotide identity with pCD6 replicon except for a 70-bp deletion in the repeat region.

143

Figure S4. Growth curves of control cultures for time-course aggregation monitoring. The growth of non- aggregating control cultures was monitored for all strains as in figure 4, but without inducing the expression of dccA or dccAmut with nisin. The means and standard deviations obtained from three independent assays are shown. Differences in the duration of the lag phase, independent of the strains, were attenuated by correcting the start point of the plots.

Figure S5. Predicted transcription terminator downstream pilA1. The predicted transcription terminator located 36 bp downstream of the pilA1 stop codon was retrieved from WebGester DB (http://pallab.serc.iisc.ernet.in/gester) (Mitra et al., 2011). The predicted transcription terminator structure has a calculated G of -10.8 kcal mol-1.

144 EXPERIMENTAL PROCEDURES

Southern Blot

Southern blotting with radiolabelled probes was performed as previously described (Bordeleau et al., 2010). Genomic DNA was digested with HindIII-HF. The probe for the Ll.ltrB intron derivatives was generated by PCR using primers Ll.ltrB-F and Ll.ltrB-R on pMTL007C-E5- CD3513-474/475s. The PCR product and HindIII-digested lambda DNA were separately radiolabelled by dephosphorylation with Antarctic Phosphatase (NEB) and phosphorylation with T4 polynucleotide kinase (NEB) and [-32P]-ATP, according to manufacturer’s instructions. The exposed phosphor imager screen was scanned and analyzed using an FX molecular imager and the Quantity One software (Bio-Rad).

145 REFERENCES

Benson, D.A., M. Cavanaugh, K. Clark, I. Karsch-Mizrachi, D.J. Lipman, J. Ostell & E.W. Sayers, (2013) GenBank. Nucleic Acids Res 41: D36-42. Bordeleau, E., E. Brouillette, N. Robichaud & V. Burrus, (2010) Beyond antibiotic resistance: integrating conjugative elements of the SXT/R391 family that encode novel diguanylate cyclases participate to c-di-GMP signalling in Vibrio cholerae. Environ Microbiol 12: 510-523. Maldarelli, G.A., L. De Masi, E.C. von Rosenvinge, M. Carter & M.S. Donnenberg, (2014) Identification, immunogenicity, and cross-reactivity of type IV pilin and pilin-like proteins from Clostridium difficile. Pathog Dis: In press (DOI: 10.1111/2049- 1632X.12137). Melville, S. & L. Craig, (2013) Type IV pili in Gram-positive bacteria. Microbiol. Mol. Biol. Rev. 77: 323-341. Mitra, A., A.K. Kesarwani, D. Pal & V. Nagaraja, (2011) WebGeSTer DB—a transcription terminator database. Nucleic Acids Res 39: D129-135.

146 CHAPITRE 4

DISCUSSION

4.1 — Rôles et régulation des T4P par le c-di-GMP chez C. difficile

Les travaux présentés dans le chapitre 3 ont permis d’établir un rôle des T4P dans l’agrégation des cellules de C. difficile induite par le c-di-GMP. Une augmentation du c-di-GMP extracellulaire par l’expression de la DGC DccA mène à l’agrégation des cellules en suspension. Cette agrégation induite par le c-di-GMP étant significativement réduite dans un mutant de la piline PilA1 ou de l’ATPase PilB2, il semble que les T4P soient nécessaires à l’agrégation mais ne sont pas le seul facteur impliqué. Pour expliquer l’abolition partielle du phénotype d’agrégation dans les deux mutants, on pourrait envisager que d’autres pilines et l’ATPase PilB1 puissent partiellement palier à l’absence des protéines PilA1 et PilB2, respectivement. Par contre, cela semble peu probable puisqu’aucun pilus n’est observé à la surface des deux mutants par microscopie électronique. Néanmoins, il reste la possibilité que les pili (c.-à-d., les filaments) ne soient pas directement responsables de l’agrégation observée. L’appareil de sécrétion des T4P pourrait en fait être un simple T2SS servant à la sécrétion de facteurs impliqués dans l’agrégation. Par contre, les pili observés à la surface sont beaucoup plus longs que les pseudopili des T2SS qui se limitent normalement au périplasme des bactéries à Gram négatif (Korotkov et al., 2012). De plus, ce locus code aussi pour une ATPase de rétraction des pili PilT qui n’est pas retrouvée normalement chez les T2SS. Alternativement, il est probable que l’expression ou l’activité d’autres protéines impliquées directement ou indirectement dans l’agrégation soient régulées par le c-di-GMP et expliquent l’agrégation partielle observée pour les mutants pilA1 et pilB2. Notamment, l’expression des flagelles est réprimée par le c-di-GMP (Purcell et al., 2012). Hypothétiquement, l’absence de flagelles et de motilité pourrait provoquer une sédimentation des cellules en culture liquide statique. Une souche non flagellée et non motile, C. difficile M120, présente d’ailleurs une

147 sédimentation importante en culture liquide statique (Émilie St-Pierre, laboratoire de Louis- Charles Fortier, communication personnelle) et dans des tests d’autoagglutination (Stabler et al., 2009; Valiente et al., 2012). Par contre, l’interruption du gène de la flagelline (FliC) dans la souche R20291 ne provoque pas d’agrégation visible, bien que la souche ne produise plus de flagelles à sa surface et que sa motilité soit fortement réduite en gélose semi-solide (Émilie St-Pierre, laboratoire de Louis-Charles Fortier, communication personnelle). D’autre part, la cellulose synthétase putative BcsA pourrait être activée par le c-di-GMP et contribuer à l’agrégation de C. difficile par la production d’exopolysaccharides. Toutes les cibles maintenant connues ou putatives du c-di-GMP seront discutées plus en détails dans la section suivante (4.2 — Les multiples rôles du c-di-GMP chez C. difficile).

En plus de jouer un rôle dans l’agrégation de C. difficile, les T4P pourraient être impliqués dans d’autres phénotypes. Les pili pourraient être impliqués dans la formation de biofilm en favorisant l’adhésion des cellules entre elles (agrégation) ou encore à des surfaces. Des études récentes ont démontré la capacité des cellules de C. difficile à adhérer à des surfaces abiotiques dans une matrice formée de polysaccharides, d’ADN et de protéines (Dawson et al., 2012; Ethapa et al., 2013). Des résultats préliminaires obtenus récemment indiquent que la mutation de pilA1 ou pilB2 provoque une baisse de formation de biofilm normalement induite par l’augmentation de la concentration en c-di-GMP intracellulaire (Rita Tamayo, communication personnelle). Ainsi, les T4P pourraient être impliqués dans la formation de biofilm chez C. difficile.

Les T4P pourraient également permettre la motilité sur des surfaces solides par la rétraction des pili (en anglais, twitching motility ou gliding motility). Trois équipes ont observé en microscopie des mouvements des cellules de C. difficile pouvant correspondre à de la motilité par les T4P (Mackin et al., 2013; Melville et Craig, 2013; Reynolds et al., 2011). Pour valider le rôle des pili dans la motilité observée sur milieux solides, il faudrait néanmoins tester la

148 motilité de souches mutantes pour des protéines de T4P, telle que la protéine ATPase PilT nécessaire à la rétraction des pili et la motilité de surface chez d’autres espèces (Ayers et al., 2010). Des essais préliminaires avec les mutants pilA1 et pilB2 ainsi que la souche parente C. difficile 630erm n’ont pour l’instant pas permis de conclure avec certitude que les T4P permettent à C. difficile de se mouvoir sur milieux solides (données non présentées).

Enfin, la participation de certaines composantes de la machinerie des T4P de C. difficile dans la compétence naturelle, à l’image de chez certaines bactéries à Gram négatif telle que N. gonorrhoeae (Chapitre 1, Figure 10) (Aas et al., 2002b), ne peut être entièrement exclue. La compétence naturelle n’est pas connue chez C. difficile. Sur la base de ce qui est connu chez N. gonorrhoeae et B. subtilis, au moins deux protéines seraient essentielles pour permettre aux T4P de C. difficile de participer dans la compétence : 1) un homologue de la protéine de liaison de l’ADN (ComE, N. gonorrhoeae; ComEA, B. subtilis) et 2) un homologue de la protéine de translocation de l’ADN à travers la membrane cytoplasmique (ComA, N. gonorrhoeae; ComEC, B. subtilis (Chen et Dubnau, 2004). Une protéine ComEA (CD630_24970) et une protéine homologue de ComA/ComEC sont prédites dans le génome de C. difficile 630 (Tableau 1). Bien que ces deux protéines aient une faible homologie avec les protéines de N. gonorrhoeae et B. subtilis, elles semblent contenir les domaines conservés associés à la fonction de chacune d’elles. Ainsi, il serait envisageable que ces protéines, avec les protéines des T4P de C. difficile, permettent la compétence naturelle chez cette espèce.

149 Tableau 1. Homologues des composantes liées à la compétence chez C. difficile.

N. gonorrhoeae B. subtilis C. difficile Protéine de liaison d'ADN ComE ComEA ComEA (CD630_24970) taille (acides aminés) 99 205 234 Identité (%) 42 37 100 Similarité (%) 66 52 100 Couverture(%) 59 81 100 Domaine conservé1 ComE ComE ComE

Protéine de translocation de l'ADN ComA ComEC CD630_24780 taille (acides aminés) 691 776 278 Identité (%) 28 32 100 Similarité (%) 48 54 100 Couverture(%) 31 32 100 Domaine conservé1 ComEC ComEC ComEC Les protéines ComE (Refseq, YP_208252.1) et ComA (Refseq, YP_207438.1) de N. gonorrhoeae FA 1090 et les protéines ComEA (Refseq, NP_390437.1) et ComEC (Refseq, NP_390435.1) de B. subtilis 168 ont été utilisées pour identifier des homologues putatifs par BlastP (Altschul et al., 1997). 1: Domaines conservés indiqués sur CDD (Conserved Domain Database) (Marchler-Bauer et al., 2011).

150 4.2 — Les multiples rôles du c-di-GMP chez C. difficile

La démonstration du rôle des T4P dans l’agrégation des cellules induite par le c-di-GMP lors de mon doctorat s’inscrit dans un objectif plus global visant ultimement la détermination de l’ensemble des phénotypes régulés par le c-di-GMP chez C. difficile. Les travaux récents de plusieurs équipes de recherche ont permis de prédire de nouveaux effecteurs du c-di-GMP chez C. difficile, les riborégulateurs c-di-GMP (Lee et al., 2010; Sudarsan et al., 2008), de déterminer l’effet du c-di-GMP sur la transcription des gènes en aval de ces riborégulateurs (Soutourina et al., 2013) et finalement, de déterminer un rôle du c-di-GMP dans l’inhibition de la motilité par les flagelles et la production de toxines ainsi que dans l’induction de l’agrégation et la formation de biofilm (McKee et al., 2013; Purcell et al., 2012; Soutourina et al., 2013).

4.2.1 — Régulation de la motilité et des gènes des flagelles

Récemment, Purcell et al. (2012) ont démontré que le c-di-GMP inhibe la motilité par flagelles chez C. difficile. En effet, l’augmentation de la concentration de c-di-GMP intracellulaire par l’expression de la DGC DccA (codée par l’ORF CD630_14200) diminue la motilité de C. difficile en milieu semi-solide (0,3 % d’agar). Cette diminution de motilité est attribuée à la diminution de la transcription des gènes de flagelles suite à la liaison du c-di- GMP au riborégulateur c-di-GMP-I, nommé Cd1, situé en amont du gène flgB (CD630_02450). Par des essais de transcription in vitro et des essais -galactosidases avec le riborégulateur Cd1, Sudarsan et al. (2008) ont d’abord exposé l’effet négatif du c-di-GMP sur la transcription, suggérant ainsi un rôle du c-di-GMP dans la régulation des flagelles et de la motilité de C. difficile. Purcell et al. (2012) ont confirmé que la transcription des gènes en aval de Cd1 était bel et bien inhibée par le c-di-GMP en démontrant une réduction de l’expression des gènes flgB et fliA (CD630_02660) qui font partie de la région F3 de gènes des flagelles qui

151 comprend les gènes de flagelles de la phase précoce (early-stage) (CD630_02450 à CD630_02720) (Figure 11). De plus, le c-di-GMP inhibe la transcription du gène flgM (CD630_02290) se situant dans la région F1 des gènes de flagelles qui comprend les gènes de flagelles de la phase tardive (late-stage) (CD630_02260 à CD630_02240).

Figure 11. Gènes de synthèse et de modification des flagelles chez C. difficile 630. Les gènes de flagelles retrouvés dans 3 régions (F1, gènes de flagelle de la phase tardive (late-stage); F2, glycosylation des flagelles; F3, gènes de la phase précoce (early-stage) sont représentés par des flèches noires. Les flèches pointillées indiquent les sites prédits de liaison du facteur sigma alternatif FliA/SigD. Les régions non codantes sont indiquées en gris. Les triangles blancs indiquent les sites d’insertions de l’intron de groupe II utilisé pour inactiver certains gènes. Figure tirée de Aubry et al. (2012).

Il apparaissait alors fort probable que le c-di-GMP inhibe indirectement la transcription de gènes de la région F1 en inhibant la transcription de sigD (auparavant nommé fliA). SigD est

152 un homologue des facteurs sigma alternatifs de la famille sigma 28, sigma D de B. subtilis et FliA (également nommé sigma F) de Salmonella et E. coli, qui sont connus pour être nécessaire à la transcription des gènes de flagelle de la phase tardive et à la motilité (Liu et Matsumura, 1995; Mirel et Chamberlin, 1989; Ohnishi et al., 1990). Aubry et al. (2012) ont par ailleurs prédit des sites de liaison de SigD en amont de flgM, CD630_02260 et fliS2 (CD630_02360) (Figure 11). Enfin, El Meouche et al. (2013) ont observé une diminution de l’expression de plusieurs gènes de flagelles dans un mutant d’interruption du gène sigD, dont 5 gènes de la phase tardive (CD630_02260, flgM, CD630_02300, flgK, et fliC) précédés de sites de liaison putatif de SigD nouvellement prédits par ces auteurs. Ainsi, le c-di-GMP inhiberait la transcription des gènes en aval de Cd1, mais aussi d’autres gènes de flagelles dont la transcription nécessiterait SigD.

4.2.2 — Régulation des toxines TcdA et TcdB

En plus d’assurer la régulation de l’expression des gènes de flagelles de façon coordonnée, les régulateurs de l’expression des flagelles telle que SigD semblent réguler l’expression d’une variété de gènes dont des facteurs de virulence chez diverses espèces. Chez V. cholerae, l’expression de 6 facteurs de virulence dont la toxine cholérique est augmentée dans un mutant du gène fliA (Syed et al., 2009). Les travaux de Aubry et al. (2012) ont permis de démontrer que certains gènes des flagelles de C. difficile sont impliqués dans la régulation des toxines TcdA et TcdB. Notamment, l’inactivation de sigD diminue la transcription des gènes tcdA et tcdB ainsi que tcdR qui codent pour un facteur sigma qui agit comme un régulateur positif de l’expression des toxines (Aubry et al., 2012). Ceci suggère que le facteur sigma alternatif SigD serait un régulateur positif de l’expression des toxines et que le c-di-GMP, en inhibant la transcription de fliA, pourrait réguler l’expression des toxines. Cette hypothèse a été vérifiée récemment dans les travaux de McKee et al. (2013). Une augmentation du niveau intracellulaire de c-di-GMP par la surexpression de la DGC DccA diminue la transcription de des gènes tcdA, tcdB et tcdR et réduit la quantité de toxines détectables dans les surnageants de

153 cultures et les effets cytotoxiques de ceux-ci (McKee et al., 2013). De plus, l’expression de sigD conduit à des effets inverses, soit une augmentation de l’expression de tcdA, tcdB et tcdR, et l’augmentation de la production de toxines (McKee et al., 2013). Parmi les gènes testés codant pour des régulateurs de l’expression des toxines (tcdR, tcdC, codY et sigH), seul le gène tcdR était moins exprimé lorsque la concentration en c-di-GMP intracellulaire était augmentée. Des essais rapporteurs avec les promoteurs en amont de tcdR, tcdA et tcdB, suggèrent d’ailleurs que SigD augmente la transcription de tcdA et tcdB indirectement en augmentant l’expression du régulateur positif des toxines, TcdR (McKee et al., 2013) (Figure 12). Ce modèle a pu être renforcé récemment par les travaux de El Meouche et al. (2013) qui ont démontré la liaison directe de SigD au promoteur situé en amont de tcdR.

Cd1

Gènes des flagelles (région F3)

flgB flgC sigD c-di-GMP

tcdR Gènes des flagelles (région F1)

tcdA, tcdB (toxines)

Figure 12. Modèle de régulation de la motilité et des toxines par le c-di-GMP. La localisation du riborégulateur c-di-GMP-I Cd1 est indiquée sur cette portion de l’ADN génomique par la structure secondaire de l’aptamère dans l’ARN correspondant. L’effet négatif du c-di-GMP sur la transcription des gènes de flagelles en aval de Cd1 (région F3) est indiqué par une ligne avec une barre. L’effet positif de SigD (FliA) sur la transcription tcdR et de TcdR sur la transcription de tcdA et tcdB est indiqué par des flèches avec une pointe. Seuls les deux premiers gènes directement en aval de Cd1 (flgB et flgC) et le gène sigD (fliA), situé 21 gènes plus en aval sont représentés dans la région F3. Pour une représentation complète des gènes de flagelles, se référer à la Figure 11.

154 Finalement, l’analyse du transcriptome d’un mutant sigD suggère que le c-di-GMP, en régulant l’expression de sigD, pourrait réguler l’expression d’autres gènes que ceux des flagelles et des toxines (El Meouche et al., 2013). L’inactivation de sigD, entraîne un changement d’expression de 103 gènes codant pour différentes fonctions, dont des protéines de surface telle que la protéine de liaison du collagène CbpA (Tulli et al., 2013), des protéines putatives associées au transport membranaire (CD630_35250 à CD630_35270, ABC transporteur putatif; CD630_35730 et CD630_35750, transport du magnésium), mais aussi de régulateurs transcriptionnels prédits (CD630_22140, CD630_06180 et CD630_06160).

4.2.3 — Régulation de la métalloprotéase Zmp1

Parmi les gènes en aval de riborégulateurs c-di-GMP prédits, les gènes CD630_28310 et CD630_28300 (Figure 13) ont particulièrement retenu notre attention et ont été l’objet du projet de maîtrise de Maxime Paquette-D’Avignon, étudiante au laboratoire de Vincent Burrus en codirection avec François Malouin. Le gène CD630_28310, situé en aval du riborégulateur c-di-GMP-II Cdi2_3, coderait pour une adhésine putative (Figure 13). Le gène CD630_28300 est en aval du riborégulateur c-di-GMP-I Cdi1_12 et selon l’annotation initiale, code pour une protéine sécrétée contenant un domaine prédit de zinc métalloprotéase. Les données d’une étude sur le transcriptome de C. difficile en fonction de différents stress suggéraient une régulation de l’expression opposée pour ces deux gènes (Emerson et al., 2008). Sur la base de ces informations, nous avions émis l’hypothèse que le c-di-GMP régule la transcription des deux gènes en se liant aux riborégulateurs en amont (Figure 13) et que ces deux gènes étaient impliqués dans l’adhésion de C. difficile.

155 Cdi2_3 Cdi1_12

zmp1

CD630_28310 CD630_28300 Adhésine putative Zinc métalloprotéase sécrétée

c-di-GMP

Figure 13. Contexte génétique des gènes CD630_28300 et CD630_28310. La localisation des riborégulateurs c-di-GMP-I Cdi1_12 et c-di-GMP-II Cdi2_3 est indiquée sur cette portion de l’ADN génomique par les structures secondaires prédites des aptamères dans l’ARN correspondant. L’effet opposé du c-di-GMP sur la transcription des gènes est indiqué par une flèche (augmentation) et une ligne avec barre (diminution). Le gène zmp1 (CD630_28300) est annoté tel que proposé par Cafardi et al. (2013). La fonction du gène CD630_28310 prédite dans Genbank (AM180355.1) est indiquée.

Tel que prédit, l’expression de DccA afin d’augmenter le c-di-GMP intracellulaire augmente l’expression de CD630_28310 et diminue l’expression de zmp1 (CD630_28300) (Maxime Paquette-D’Avignon, résultats non-publiés; Soutourina et al., 2013). Ces résultats suggèrent que les protéines codées par ces deux gènes ont des fonctions opposées ou, à tout le moins, qui ne sont pas requises au même moment. La fonction de l’adhésine putative CD630_28310 n’a pu être confirmée à ce jour par les travaux menés par Maxime Paquette -D’avignon. Par contre, la fonction zinc métalloprotéase de la protéine CD630_CD28300 a été confirmée récemment (Cafardi et al., 2013; Hensbergen et al., 2014). Renommée Zmp1, cette protéine clive le fibrinogène et la fibronectine in vitro en présence de zinc (Cafardi et al., 2013). De plus, Zmp1 clive aussi in vitro l’adhésine putative CD630_28310. Bien que la fonction de CD630_28310 reste à prouver, les deux protéines pourraient constituer un module de régulation d’adhésion où un niveau élevé de c-di-GMP favoriserait l’adhésion par l’adhésine putative CD630_28310 et un faible niveau de c-di-GMP favoriserait le détachement de C. difficile via le clivage par la protéase Zmp1 de l’adhésine CD630_28310, le fibrinogène et la fibronectine. Finalement,

156 Zmp1 clive également la protéine de surface putative CD630_32460 dont l’expression est également sous le contrôle d’un riborégulateur c-di-GMP.

4.2.4 — Régulation de la protéine de surface putative CD630_32460

Le gène codant pour la protéine de surface putative CD630_32460 est situé en aval d’un riborégulateur c-di-GMP-II ayant une plate-forme d’expression particulière. La liaison du c- di-GMP au riborégulateur Cdi2_1 permet l’épissage alternatif d’un intron de groupe I dans la plate-forme d’expression (Lee et al., 2010). Lors de cet épissage alternatif en présence de c-di- GMP, il y a formation d’un RBS à la jonction des exons, ce qui permet la traduction du gène en aval (Lee et al., 2010). L’expression du gène en aval serait contrôlée également par la séquestration ou l’exposition de ce RBS nouvellement formé dans la structure du riborégulateur lors de la dissociation ou l’association du c-di-GMP au riborégulateur, respectivement (Chen et al., 2011). Ainsi, un niveau élevé de c-di-GMP favoriserait la traduction de la protéine de surface CD630_32460, mais, un niveau faible favoriserait l’expression de Zmp1 qui en retour cliverait CD630_32460 (voir section 4.2.3 — Régulation de la métalloprotéase Zmp1). La protéine de surface putative CD630_32460 et l’adhésine putative CD630_28310 pourraient être exprimées dans les mêmes conditions et sont toutes deux des cibles de la protéase Zmp1 (Hensbergen et al., 2014). Hypothétiquement, elles pourraient donc avoir des fonctions analogues ou complémentaires dans l’adhésion.

4.2.5 — Autres cibles du c-di-GMP

En plus des 3 riborégulateurs c-di-GMP-II discutés ci-dessus, 14 autres récepteurs putatifs du c-di-GMP chez C. difficile (souche 630) sont prédits. Parmi ceux-ci, se trouvent un autre riborégulateur c-di-GMP-II, 12 riborégulateurs c-di-GMP-II et la cellulose synthétase putative

157 BcsA (CD630_25450). La protéine BcsA contient un domaine prédit de liaison du c-di-GMP PilZ. Le c-di-GMP pourrait augmenter l’activité de synthèse de la cellulose (ou un polysaccharide apparenté) de BcsA telle que rapportée pour la protéine homologue BscA de K. xylinus (31 % d’identité en acides aminés) (Ross et al., 1987). De plus, le contexte génétique de BcsA appuie la prédiction de la fonction prédite de cette protéine. En effet, plusieurs gènes liés au métabolisme des sucres se trouvent à proximité de bcsA (Figure 14). Les gènes CD630_25490 et CD630_25480 codent pour des perméases putatives impliquées dans le transport de sucres, le gène CD630_25460, à 4 pb en amont de bscA, coderait pour une endoglucanase (cellulase) et CD630_25430 coderait pour une O-acétylase de sucres de surface.

CD630_25480 CD630_25420 CD630_25490 CD630_25470 CD630_25460 bcsA (CD630_25450) CD630_25440 CD630_25430

Figure 14. Représentation schématique du contexte génétique de bcsA.

Ainsi, le c-di-GMP pourrait jouer un rôle dans la formation de biofilm en augmentant la synthèse d’exopolysaccharides par activation allostérique de BcsA. Cette hypothèse est appuyée par l’observation récente d’une formation de biofilm accrue lors de l’augmentation de la concentration de c-di-GMP intracellulaire (Soutourina et al., 2013). Ce résultat n’étant pas le point central de l’article, les auteurs n’ont pas démontré si BcsA ou encore les T4P (voir section 4.1 — Rôles et régulation des T4P par le c-di-GMP chez C. difficile) sont responsables de cette augmentation de la formation de biofilm. En revanche, les résultats de cet article démontrent que les 16 riborégulateurs c-di-GMP sont transcrits et que l’expression des gènes en aval est modulée par le c-di-GMP (Soutourina et al., 2013).

Les 13 riborégulateurs c-di-GMP non discutés ci-dessus pourraient également contribuer à la régulation de phénotypes par le c-di-GMP chez C. difficile. La majorité de ces riborégulateurs

158 c-di-GMP se situent en amont de gènes prédits (Tableau 2). Le riborégulateur c-di-GMP-II Cdi2_2 contrôlerait l’expression de trois gènes qui semblent former un système à deux composantes dont le stimulus en amont et les cibles en aval n’ont pu être prédits. Ces gènes codent pour une histidine kinase putative (CD630_32660) n’ayant pas de domaine capteur prédit et pour deux facteurs de transcription putatifs (CD630_32650 et CD630_32670) ayant un domaine de phosphorylation REC et un domaine de liaison de l’ADN winged helix-turn- helix. Les gènes précédés d’un riborégulateur c-di-GMP-I sont cependant pour la plupart de fonction inconnue. Une recherche des domaines conservés a été effectuée afin de vérifier si une fonction pouvait être déduite pour certaines des protéines avec InterProScan (Quevillon et al., 2005) (Tableau 3). InterProScan permet d’effectuer un nombre important d’analyses de façon simultanée sur les serveurs majeurs d’analyse de séquences protéiques tels que Gene3D (Yeats et al., 2006), Pfam (Finn et al., 2010), PROSITE (Hulo et al., 2006), SUPFAM (Gough et al., 2001) et SMART (Letunic et al., 2006). Néanmoins, ces analyses n’ont pas permis de prédire de fonction évidente pour ces protéines.

La protéine CD630_19900 contiendrait 2 domaines conservés «Bacterial SH3». Les domaines «Bacterial SH3» sont peu informatifs sur la fonction de CD630_19900. Les domaines SH3 eucaryotes sont des domaines d’interaction protéine-protéine se liant typiquement à des régions riches en proline. Ces interactions pourraient être impliquées dans le recrutement et la localisation de certaines protéines dans la cellule ou encore dans l’assemblage de complexes multiprotéiques (Morton et Campbell, 1994). Chez les bactéries, les protéines contenant ces domaines contiennent souvent aussi des domaines de protéase ou de glycosyl hydrolase dans la banque de données Pfam (Finn et al., 2010). Ceci est d’ailleurs le cas des sept autres protéines de C. difficile 630 avec des domaines SH3 prédits. Par contre, il n’y a aucun domaine de protéase ou de glycosyl hydrolase détecté dans CD630_19900. De plus, il n’y a pas de portion de protéine sans domaine prédit suffisamment grande pour y contenir un domaine de glycosyl hydrolase. La prédiction de la fonction de CD630_19900 demeure donc hasardeuse.

159 Tableau 2. Riborégulateur c-di-GMP prédits et gènes en aval chez C. difficile 630

Coordonnées2 Distance Riborégulateur1 5’ 3’ Gène3 (pb) Annotation5 c-di-GMP-I Cdi1_1 2296528 2296435 CD630_19900 82 Protéine putative avec domaine SH3 Cdi1_2 3266964 3266879 CD630_27970 123 Protéine d’adhésion putative Cd1 (Cdi1_3) 308777 308867 flgB (CD630_02450) 405 Protéine de la base du flagelle Cdi1_4 3379980 3380075 AS : CD630_28890 202 Protéine de phage hypothétique Cdi1_5 1142665 1142570 AS : CD630_09771 202 Protéine de phage hypothétique Cdi1_6 2285922 2286011 VBICloDif38397_20784 52 Protéine hypothétique Cdi1_7 2907225 2907322 AS: CD630_25171 852 Protéine de phage hypothétique Cdi1_8 2297491 2297588 CD630_19903 54 Protéine hypothétique Cdi1_9 2671808 2671897 CD630_23090 53 Protéine hypothétique Cdi1_10 1653917 1653821 CD630_14240 229 Protéine hypothétique Cdi1_11 3936239 3936335 CD630_33682 53 Protéine hypothétique Cdi1_12 3303460 3303375 zmp1 (CD630_28300) 99 Zinc métalloprotéase c-di-GMP-II Cdi2_1 3801149 3801064 CD630_32460 581 Protéine de surface Cdi2_2 3826693 3826610 CD630_32670 à 32650 580 Système à 2 composantes Cdi2_3 3306767 3306682 CD630_28310 117 Adhésine putative Cdi2_4 4105882 4105797 pilA1 (CD630_35130) 161 Piline de pili de type IV 1 : Les riborégulateurs sont nommés tel que proposé par Soutourina et al. (2013) et Sudarsan et al. (2008) (Cd1). 2 : Coordonnées des aptamères des riborégulateurs c-di-GMP telles que répertoriées dans la banque Rfam 11.0 (Burge et al., 2013) dans le génome de C. difficile 630 (Genbank, AM180355.1) (Benson et al., 2013). 3 : Seuls les gènes à moins de 1000 pb de l’aptamère sont indiqués. AS : antisens par rapport au 3’ de l’aptamère. 4 : Gène annoté dans la banque PATRIC (Snyder et al., 2007) (absent de Genbank AM180355.1) 5 : Annotations respectives dans Genbank et PATRIC avec mises à jour selon la littérature discutée.

160 Tableau 3. Domaines conservés des protéines de C. difficile 630 encodés par des gènes de fonction inconnue en aval de riborégulateurs à c-di-GMP-I prédits. Protéine Taille (aa) Domaines1 (banque) CD630_14240 58 aucun CD630_19900 161 Bacterial SH3 domain (PF, SM) Bacterial SH3 domain (PF, SM) CD630_19903 58 aucun CD630_23090 58 aucun CD630_27970 1987 Type I dockerin domain (G3D, SF) Fibronectin type III domain-like (PS, SM) Fibronectin type III domain-like (PF, PS, SF, Galactose -binding domain-like (G3D, SF) Galactose-binding domain-like (G3D, SF) CD630_33682 58 aucun VBICloDif38397_2078 58 aucun 1 : Les domaines sont indiqués dans l’ordre de leur localisation de N- à C-terminale. Abréviations : aa, acides aminés; PF, Pfam; SM, SMART; SF, SUPFAM; G3D, Gene3D; PS, PROSITE.

La protéine CD630_27970 est annotée comme une adhésine putative et contient potentiellement 5 domaines conservés (Tableau 3). Les domaines Fibronectin type-II domain- like (FN3) sont assez répandus dans les protéines eucaryotes, mais les domaines FN3 dans les protéines bactériennes sont plutôt rares. Chez les procaryotes, ces domaines ont été associés exclusivement à des glycosyl hydrolases (Kataeva et al., 2002). Les « Type I dockerin domains » sont présents dans les glycosyl hydrolases où ils permettent l’attachement au complexe multiprotéique de dégradation de polysaccharides, le cellulosome (Lytle et al., 2001). Étonnamment CD630_27970 est la seule protéine chez C. difficile à contenir un tel domaine alors que les bactéries reconnues comme ayant un cellulosome ont plusieurs protéines de ce type (p. ex. Clostridium cellulotycum en contiendrait 62 selon SUPFAM). Les séquences protéiques « Galactose-binding domain-like » sont des domaines de liaison spécifiques aux sucres (SUPFAM). Bien que les domaines détectés laissent présager que CD630_27970 pourrait être une glycosyl hydrolase, étonnamment aucun domaine catalytique n’a été détecté. Par contre, de longues portions de la protéine ne contiennent pas de domaine prédit et pourraient éventuellement contenir un domaine catalytique non prédit. Cette protéine serait extracellulaire d’après les prédictions de LocateP (Zhou et al., 2008) mais pourrait être

161 fixée à la surface des bactéries par le « Type I dockerin domain ». Le gène CD630_27970 se retrouve d’ailleurs dans le groupe de gènes codant des protéines de surface (CD630_27820- CD630_27990) (Karjalainen et al., 2001; Sebaihia et al., 2006). Cette région de gènes encode entre autres l’adhésine Cwp66 (Varga et al., 2006; Waligora et al., 2001) et la cystéine protéase Cwp84 impliquée dans la dégradation de matrices extracellulaires protéiques (Janoir et al., 2007) et dans la maturation de la «surface layer protein A» SlpA, aussi encodée dans cette région (Janoir et al., 2007; Kirby et al., 2009). Hypothétiquement, CD630_27970 pourrait être une glycosyl hydrolase ou agir comme adhésine, par la liaison de sucres de surface.

Quatre protéines putatives très similaires de 58 acides aminées (identité de plus de 95 % en acides aminés) sont également codées par un gène en aval de riborégulateurs c-di-GMP-I (CD630_14240, CD630_19903, CD630_23090, CD630_33682, VBICloDif38397_2078) (Tableau 2 et Tableau 3). Ces protéines ne contiennent aucun domaine conservé. Une prédiction de la structure secondaire de CD630_14240 avec PSIPRED (Buchan et al., 2010) suggère que la structure de ces protéines de 58 acides aminées consiste en une longue hélice- contenant 21 résidus hydrophobes distribués sur toute la longueur du peptide ainsi que 5 résidus chargés positivement (Arginine, lysine et histidine) situés dans les deux extrémités. Les protéines présentent donc des caractéristiques similaires à certains peptides antimicrobiens, soit une structure en forme hélice- hydrophobe ainsi qu’une charge positive (Huang et al., 2010). Bien que l’analyse de la séquence des quatre peptides permette difficilement de prédire qu’il s’agit véritablement de peptides antimicrobiens, il serait intéressant d’explorer la possibilité que ces peptides aient une activité sur les membranes de bactéries d’autres espèces ou encore des cellules de l’hôte.

Enfin, une protéine avec un domaine GGDEF dégénérés, CD630_10280, pourrait également être un effecteur du c-di-GMP chez C. difficile (Chapitre 2, Figure 1). En effet, tel que

162 mentionné ci-dessus, certaines protéines GGDEF n’ont pas d’activité DGC en raison de substitutions au niveau de leur site-A mais on tout de même la capacité de lier le c-di-GMP au niveau de leur site-I (voir section 1.2.3.2 — Protéines avec domaines GGDEF ou EAL non catalytiques). En plus de ces caractéristiques, la protéine CD630_10280 contiendrait quatre régions transmembranaires et un motif coiled-coil qui pourraient respectivement localiser cet effecteur du c-di-GMP putatif au niveau de la membrane cytoplasmique et lui permettre de former un dimère et/ou d’agir sur l’activité d’autres protéines par des interactions protéine- protéine.

Ainsi, l'ampleur des cibles potentielles du c-di-GMP chez C. difficile est bien vaste et quelque peu déroutante, bien que des progrès notables aient été effectués durant les dernières années. En plus de déterminer les effecteurs du c-di-GMP et leurs rôles respectifs, il apparait crucial de s’interroger sur l’importance individuelle des multiples DGC et PDE afin de tenter de comprendre comment le c-di-GMP permet une régulation cohérente des phénotypes de C. difficile.

163 4.3 — Multiplicité des DGC et PDE

Tel que mentionné ci-dessus, le nombre de DGC et PDE peut varier grandement selon les espèces (voir section 1.2.1.1 — Distribution des DGC et PDE). L’identification et la caractérisation d’un grand nombre de DGC et de PDE chez C. difficile (Chapitre 2) ont permis d’exposer le grand potentiel de régulation du c-di-GMP chez cette espèce. Bien que plusieurs protéines avec des domaines GGDEF, EAL et HD-GYP étaient prédites chez une variété de bactéries à Gram positif, nos travaux constituaient la première étude démontrant leur fonctionnalité en tant que DGC et PDE régulant le c-di-GMP chez une Firmicutes. La prédiction des protéines avec des domaines GGDEF, EAL et HD-GYP dans les génomes des Clostridiaceae séquencés à ce moment a révélé que même les espèces les plus proches phylogénétiquement telle que C. hiranonis (1 domaine GGDEF) et C. bartletti (aucun domaine) ont très peu de DGC et PDE potentielles en comparaison de C. difficile (Chapitre 2, Figure S3 et Table S3). Depuis, C. difficile a été classé parmi la nouvelle famille des Peptostreptococcaceae (Yutin et Galperin, 2013). Néanmoins, les deux espèces les plus proches de C. difficile dans l’arbre phylogénétique des Clostridiaceae, C. hiranonis et C. bartletti, ont également été placées parmi la nouvelle famille des Peptostreptococcaceae. En raison de la disponibilité de nouveaux génomes séquencés et la nouvelle classification de C. difficile, il pourrait être intéressant de refaire ces analyses.

Il n’en demeure pas moins que plusieurs DGC et PDE fonctionnelles semblent participer à la régulation par le c-di-GMP chez C. difficile. En effet, j’ai pu démontré clairement la fonctionnalité d’une DGC, DccA (CD630_14200, anciennement CD1420) et d’une PDE, CD630_07570 (anciennement CD0757) par des essais enzymatiques in vitro (Chapitre 2, Figure 5). Les travaux de Purcell et al. (2012) ont également permis de confirmer l’activité DGC de DccA et de démontrer l’activité PDE de PdcA (CD630_15150, anciennement CD1515) par des essais enzymatiques in vitro. De plus, l’activité DGC et PDE de 31 des 37 protéines avec un domaine GGDEF et/ou EAL de C. difficile 630 a été évaluée indirectement.

164 Forts de notre expérience avec l’étude du c-di-GMP chez V. cholerae (Annexe 1), nous avons exprimé les protéines de C. difficile dans V. cholerae et leur impact opposé sur la motilité flagellaire et la formation de biofilm a été mesuré afin de pouvoir évaluer facilement l’activité des 31 protéines les plus conservées. Une majorité de DGC et de PDE ont modifié la motilité et la formation de biofilm de V. cholerae tel qu’attendu pour des DGC et des PDE actives (Chapitre 2, Figure 3). Bien que cette approche ait permis d’évaluer rapidement l’activité d’un grand nombre de DGC et de PDE, les conclusions pouvant être tirées sont sujettes à certaines précautions. Tel que discuté au Chapitre 2, les DGC et PDE prédites n’ayant pas démontré d’activité détectable par cette approche ne sont pas pour autant nécessairement inactives. Une expression inadéquate et l’absence de cofacteur ou de stimulus dans nos conditions de culture de cette hôte hétérologue sont des hypothèses plausibles qui pourraient expliquer l’absence de phénotype observable lors de l’expression hétérologue de certaines de ces protéines. Pour ces mêmes raisons, il serait hasardeux de conclure qu’une protéine produit ou dégrade le c-di- GMP plus efficacement qu’une autre seulement sur la base des résultats obtenus avec cette approche d’expression hétérologue. De plus cette approche repose en partie sur la présomption que : 1) les DGC et les PDE participent à la régulation d’un pool unique et homogène de c-di- GMP et que 2) la concentration intracellulaire de c-di-GMP est proportionnelle à l’effet de ce second messager sur les phénotypes qu’il régule. Or, cette conception est sujette à discussion et ne fait pas l’unanimité. Une étude n’a pu démontrer une corrélation claire entre l’augmentation de la formation de biofilm de V. cholerae et la concentration intracellulaire de c-di-GMP issue de l’expression de huit DGC différentes de V. cholerae (Massie et al., 2012). Par contre, les concentrations croissantes de c-di-GMP issues de l’expression croissante de chacune des DGC individuellement ont pu être corrélées avec l’augmentation de la formation de biofilm (Massie et al., 2012). Ces observations peuvent être expliquées par des différences dans la localisation des protéines. On peut imaginer que le c-di-GMP produit par une DGC localisée à un pôle de la bactérie pourrait mettre plus de temps à atteindre un effecteur du c-di- GMP localisé à l’autre pôle de la cellule que le c-di-GMP produit par une DGC avoisinante. Par contre, selon une estimation de la vitesse de diffusion de petites molécules telle que l’AMPc (Zaccolo et al., 2006), le c-di-GMP pourrait théoriquement atteindre une

165 concentration intracellulaire homogène en l’espace de quelques millisecondes (Mills et al., 2011). Néanmoins, il a été proposé que des PDE cytoplasmiques puissent limiter la quantité de c-di-GMP localement et ainsi permettre des gradients de c-di-GMP intracellulaire. Ainsi, plusieurs hypothèses raisonnables, autres qu’une différence d’activité DGC ou PDE, peuvent expliquer les différences observées entre les différentes protéines dans notre modèle d’expression.

Dans le même ordre d’idée, la majorité des DGC et les PDE de C. difficile ne sont probablement pas redondantes et ne participeraient pas nécessairement toutes à la régulation des mêmes phénotypes. Tel que mentionné au Chapitre 2, les 37 DGC et PDE de C. difficile, ont des domaines capteurs (sensor domains) variés qui pourraient permettre une régulation post-traductionnelle spécifique de certaines DGC et PDE afin de répondre directement ou indirectement à la présence d’un stimulus environnemental donné (voir section 1.2.1 — Synthèse et dégradation du c-di-GMP). Par exemple, la DGC CD630_05370 (anciennement CD0537) contient un domaine REC et pourrait être activée par la phosphorylation de ce domaine par la kinase putative CheA, à l’instar de la DGC WspR chez P. aeruginosa (Hickman et al., 2005). En effet, le gène codant cette DGC est situé parmi une région de gènes potentiellement liés au chimiotactisme. Ainsi, un signal extracellulaire pourrait agir sur l’activité des flagelles par la voie normale du chimiotactisme (c.-à-d., la phosphorylation de CheY par CheA pour moduler son interaction avec le moteur du flagelle) et/ou en régulant l’activité de la DGC putative CD630_05370. Le c-di-GMP produit par la DGC activée pourrait alors réduire l’expression des gènes de flagelles par la liaison au riborégulateur Cd1.

Plusieurs des DGC et PDE de C. difficile ont des régions transmembranaires ou un peptide signal prédits. Tel que discuté ci-dessus, certaines DGC et PDE pourraient avoir un impact distinct sur différents effecteurs en raison de leur localisation relative. Alors que cette hypothèse est facilement concevable si le c-di-GMP interagit avec des effecteurs protéiques,

166 elle semble a priori beaucoup moins attrayante si les effecteurs sont des riborégulateurs. Par contre, la localisation des ARNm, incluant ceux contentant des riborégulateurs, n’est pas aussi aléatoire que l’on pourrait imaginer. Une étude a permis de démontrer que les ARNm de C. crescentus et E. coli ne diffusent pas librement dans toute la cellule, mais qu’ils demeurent plutôt localisés à proximité de l’ADN matrice de ceux-ci (Montero Llopis et al., 2010). De plus, suite à la compaction du chromosome ou à la liaison de régulateurs à l’ADN, des gènes (et les ARNm et protéines exprimés par ceux-ci) a priori éloignés sur le chromosome pourraient se retrouver à proximité (Cournac et Plumbridge, 2013; Rocha, 2008). Ainsi, on ne peut totalement exclure que certains riborégulateurs c-di-GMP soient régulés par des DGC et PDE spécifiques par leur colocalisation.

Le contrôle de l’expression de certaines DGC et PDE pourrait également leur donner une certaine spécificité d’action. La caractérisation récente de plusieurs régulateurs transcriptionnels (CodY, Spo0A, SigH et AgrA) et des études portant sur les variations du transcriptome en fonction de la phase de croissance ou de différents stress ont permis de démontrer que la transcription de plusieurs des gènes codant pour des DGC et des PDE est régulée et ne se fait pas de façon constitutive (Dineen et al., 2010; Emerson et al., 2008; Martin et al., 2013; Pettit et al., 2014; Saujet et al., 2011). L’analyse rapide de cette masse de données ne permet pas pour la plupart d’émettre des hypothèses sur l’impact de ces régulateurs ou d’un stress particulier sur la régulation globale du c-di-GMP. Par contre, l’étude récente du locus de quorum sensing agr de C. difficile révèle néanmoins des informations qui méritent d’être soulevées (Martin et al., 2013). L’expression des gènes de trois PDE dans la souche R20291 (homologues de CD630_ 07570, CD630_14210, et CD630_16160) est diminuée dans un mutant du gène agrA, le régulateur transcriptionnel du locus de quorum sensing agr (Martin et al., 2013). Par ailleurs, la transcription de tcdA et des gènes de flagelles est réduite chez ce mutant. Il semble donc plausible que le locus agr puisse augmenter la transcription des gènes codant pour la toxine TcdA et les flagelles en diminuant la quantité de c-di-GMP intracellulaire via l’expression des trois PDE (Figure12).

167

Finalement, il serait donc intéressant de déterminer le rôle individuel de certaines des DGC et des PDE de C. difficile. L’inactivation de chacune des DGC et PDE individuelle est susceptible de révéler la spécificité de ces enzymes dans la régulation de phénotypes particuliers et en réponse à des stimuli spécifiques.

168 CONCLUSION

La signalisation par le c-di-GMP est maintenant un mécanisme incontournable de régulation des phénotypes chez les bactéries. En effet, ce diribonucléotide serait une molécule largement répandue chez les diverses espèces bactériennes selon la prédiction des enzymes de synthèse et de dégradation du c-di-GMP, les DGC (domaine GGDEF) et les PDE (domaines EAL ou HD-GYP), respectivement (Romling et al., 2013). Particulièrement étudié chez de multiples bactéries à Gram négatif, le c-di-GMP régule de multiples effecteurs impliqués principalement dans la motilité, la formation de biofilm, l’adhésion et parfois la production de toxines. C. difficile est une bactérie pathogène responsable de diarrhées, de colites qui peuvent être accompagnées d’une inflammation importante pouvant mener à une septicémie et au décès du patient. Les phénotypes important dans la pathogenèse et leur régulation demeurent en grande partie méconnus chez cette espèce pour laquelle peu d’outils de biologie moléculaire étaient disponibles jusqu’à très récemment. Intrigués par le haut nombre de gènes codant pour des DGC et des PDE putatives (37 protéines avec des domaines conservées GGDEF et EAL) dans le génome de C. difficile 630, nous avons entrepris d’étudier ce messager secondaire chez cette espèce à Gram positif dans le but de déterminer ultimement les rôles du c-di-GMP chez cette bactérie.

Le premier objectif de mon doctorat était d’évaluer le potentiel de régulation du c-di-GMP chez C. difficile en évaluant l’activité DGC et PDE des protéines contenants des domaines GGDEF et/ou EAL prédits. L’activité d’une majorité de DGC et PDE de C. difficile a été observée indirectement de façon exhaustive grâce à un modèle de surexpression hétérologue de ces protéines dans V. cholerae. De plus, l’activité d’une DCG et d’une PDE a été confirmée par des essais enzymatiques in vitro. Ces résultats ont permis de confirmer l’importance du c- di-GMP chez C. difficile et constituent du même coup la première étude démontrant la fonctionnalité de la signalisation à c-di-GMP chez les Firmicutes.

169 La découverte des riborégulateurs c-di-GMP-I et c-di-GMP-II et la prédiction de 16 de ces riborégulateurs chez C. difficile 630 a ensuite permis de préciser mon deuxième projet de doctorat (Lee et al., 2010; Sudarsan et al., 2008). Je me suis penché sur le riborégulateur c-di- GMP-II Cdi2_4 et les gènes en aval de ce dernier, le locus principal de synthèse des T4P, des structures qui n’étaient pas encore étudiées chez cette espèce. Mes travaux de doctorat ont permis de démontrer le rôle des pili de T4P dans l’agrégation de C. difficile et la régulation de leur expression par le riborégulateur c-di-GMP-II Cdi2_4. L’augmentation de la concentration de c-di-GMP intracellulaire entraine une augmentation de la transcription des gènes de T4P, la formation de T4P à la surface des cellules et l’agrégation dépendante des T4P. De plus, la liaison du c-di-GMP au riborégulateur Cdi2_4 prévient la formation d’un terminateur transcriptionnel et favorise ainsi la transcription des gènes de T4P en aval.

Les travaux récents de plusieurs équipes et mes résultats de doctorat ont permis de mettre en lumière certains aspects de la signalisation à c-di-GMP chez C. difficile. Il apparaît maintenant évident que la signalisation à c-di-GMP chez C. difficile est impliquée globalement dans la transition entre les états mobiles et adhérés ainsi que dans la production de toxines, à l’instar de chez plusieurs autres espèces bactériennes. Néanmoins, la régulation de ces phénotypes repose en grande partie sur des effecteurs du c-di-GMP distincts de ceux décrits jusqu’à présent chez les autres bactéries étudiées, soit une quinzaine de riborégulateurs c-di-GMP. En effet, bien que des riborégulateurs c-di-GMP-I soient prédits chez certaines bactéries à Gram négatif, celles-ci semblent plutôt compter sur une variété déconcertante d’effecteurs protéiques du c-di-GMP. Ces différences d’effecteurs entre les espèces alimentent l’intérêt d’approfondir la caractérisation de la signalisation à c-di-GMP chez C. difficile mais aussi, dans d’autres espèces afin de mieux comprendre certains aspects plus généraux de cette signalisation. Notamment, l’étude de divers effecteurs est susceptible d’amener un éclairage nouveau concernant le débat à savoir si l’ensemble des DGC et des PDE participe à un pool global de c-di-GMP et/ou si certaines de ces enzymes sont seulement dédiées à la régulation d’effecteurs spécifiques.

170 ANNEXE 1

171 Environmental Microbiology (2010) 12(2), 510–523 doi:10.1111/j.1462-2920.2009.02094.x

Beyond antibiotic resistance: integrating conjugative elements of the SXT/R391 family that encode novel diguanylate cyclases participate to c-di-GMP signalling

in Vibrio choleraeemi_2094 510..523

Eric Bordeleau,† Eric Brouillette,† originating from Asia, Africa and Central America. We Nathaniel Robichaud and Vincent Burrus* propose that besides conferring usual antibiotic Centre d’étude et de valorisation de la diversité resistances, dgcKL-bearing SXT/R391 ICEs could microbienne (CEVDM), Département de biologie, enhance the survival of vibrios in aquatic environ- Faculté des sciences, Université de Sherbrooke, QC, ments by increasing c-di-GMP level. Canada, J1K 2R1. Introduction Summary Vibrio cholerae, the causative agent of cholera, is a In Vibrio cholerae, the second messenger bis-(3Ј-5Ј)- natural inhabitant of aquatic environments that can tran- cyclic dimeric guanosine monophosphate (c-di-GMP) siently colonize the human small intestine. One of the key increases exopolysaccharides production and biofilm factors that enable the survival of the bacterium in aquatic formation and decreases virulence and motility. As habitats is its capacity to form biofilms (rugose variant) such, c-di-GMP is considered an important player in (Faruque et al., 2006). On the contrary, the motile pheno- the transition from the host to persistence in the type (smooth variant) is of major importance for the intes- environment. c-di-GMP level is regulated through tinal colonization of the host (Butler and Camilli, 2004). a complex network of more than 60 chromosomal Therefore, part of the epidemic potential of V. cholerae is genes encoding predicted diguanylate cyclases a direct outcome of its ability to switch between these two (DGCs) and phosphodiesterases. Herein we report phenotypes. In V. cholerae, and in many other bacteria, the characterization of two additional DGCs, DgcK the transition from free-living, motile to biofilm lifestyle is and DgcL, encoded by integrating conjugative ele- controlled by the ubiquitous second messenger molecule ments (ICEs) belonging to the SXT/R391 family. SXT/ bis-(3′-5′)-cyclic dimeric guanosine monophosphate R391 ICEs are self-transmissible mobile elements that (c-di-GMP) (Jenal and Malone, 2006; Romling and are widespread among vibrios and several species of Amikam, 2006). enterobacteria. We found that deletion of dgcL Intracellular levels of c-di-GMP are regulated by the increases the motility of V. cholerae, that overexpres- antagonistic activities of diguanylate cyclases (DGCs) sion of DgcK or DgcL modulates gene expression, and phosphodiesterases (PDEs), which, respectively, biofilm formation and bacterial motility, and that a produce and degrade the cyclic compound in response to single amino acid change in the active site of either various environmental stimuli. Although the gathered enzyme abolishes these phenotypes. We also show information regarding the implication of DGCs, PDEs and that DgcK and DgcL are able to synthesize c-di-GMP c-di-GMP in cellular processes for several species of bac- in vitro from GTP. DgcK was found to co-purify with teria is rapidly growing, the pathways followed from stimu- non-covalently bound flavin mononucleotide (FMN). lus sensing to regulation of gene expression only begin to DgcL’s enzymatic activity was augmented upon be understood. Diguanylate cyclases have a conserved phosphorylation of its phosphorylatable response- GGDEF domain, whereas PDEs contain EAL or HD-GYP regulator domain suggesting that DgcL is part of a domains (Ryjenkov et al., 2005; Schmidt et al., 2005; two-component signal transduction system. Interest- Ryan et al., 2006). Each one of these domains is usually ingly, we found orthologues of dgcK and dgcL in associated with one or several input receiver or signal several SXT/R391 ICEs from two species of Vibrio transduction domain(s). Vibrio cholerae was predicted to contain 62 genes encoding proteins with GGDEF, EAL or Received 28 July, 2009; accepted 17 September, 2009. *For HD-GYP domains (Galperin et al., 2001), few of which correspondence. E-mail [email protected]; Tel. (+1) 819 821 8000 ext. 65223; Fax (+1) 819 821 8049. †E.B. and E.B. have been extensively studied. Proteins containing PilZ contributed equally to this work. domains that bind c-di-GMP, non-PilZ domain c-di-GMP

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd Diguanylate cyclases on ICEs 511 receptor proteins such as PelD or CdgG and newly dis- Mexico (Burrus et al., 2006b). Unlike most SXT/R391 covered c-di-GMP binding riboswitches are downstream ICEs, ICEVchMex1 does not confer resistance to any effectors that relay signals to cellular processes and gene known antibiotic, raising the possibility that SXT/R391 expression (Ryjenkov et al., 2006; Christen et al., 2007; ICEs could confer other advantages to the host bacterium Lee et al., 2007; Pratt et al., 2007; Beyhan et al., 2008; during its life cycle. Sudarsan et al., 2008). Here, we show that ICEVchMex1 and several other Besides regulating biofilm formation, motility and ICEs of the SXT/R391 family encode two proteins with smooth-to-rugose phase variation, c-di-GMP has also DGC activity. When overexpressed in V. cholerae, these been shown to regulate genes involved in virulence of two proteins, DgcK and DgcL, modulate gene expression, V. cholerae, suggesting that it is critical to the bacterium motility and biofilm formation. In addition, we found that natural life cycle (Tischler and Camilli, 2005). The high DgcK binds non-covalently flavin mononucleotide (FMN), number of genes involved in c-di-GMP metabolism and its whereas DgcL requires phosphorylation for DGC activity. pleiotropic effect are a hallmark of the complexity of c-di- Together, our data suggest that ICEs of the SXT/R391 GMP pathways in V. cholerae. This complexity is further family can potentially modulate the behaviour of V. chol- enhanced by our discovery of genes encoding predicted erae by altering its c-di-GMP signalization pathways. DGCs borne by integrating conjugative elements (ICEs) Additionally, given their intrinsic ability to disseminate to a of the SXT/R391 family that thrive among vibrios. Integrat- broad range of hosts, these ICEs could also play a similar ing conjugative elements are self-transmissible mobile role in other species of vibrios and in enterobacteria. elements that usually reside integrated into the bacterial chromosome and eventually disseminate via conjugation (Seth-Smith and Croucher, 2009). Integrating conjugative Results elements can excise from the donor cell chromosome and Two genes that encode putative DGCs are found in form circular molecules that are thought to be the sub- members of the SXT/R391 family strates for the conjugative machinery that they encode. Integrating conjugative elements are widespread among Nucleotide sequence analysis of the variable region diverse taxonomic groups of bacterial species and are located between the conserved genes s073 and traF of able to transfer between genetically unrelated bacteria ICEVchMex1 (Accession No. DQ180352) revealed five (Burrus et al., 2002; Burrus and Waldor, 2004; Seth-Smith open reading frames (ORFs) (Fig. 1A). Two of these and Croucher, 2009). The SXT/R391 family of ICEs, one ORFs, s074 and s075, are nearly identical to those found of the largest and most diverse set of ICEs studied to in the corresponding locus of SXTMO10 (Accession No. date, includes elements that have been detected in clini- AY055428). s074 and s075 were predicted to encode a cal and environmental isolates of several species of response regulator and a histidine kinase respectively g-proteobacteria from four continents over the past (Beaber et al., 2002). The three other ORFs of ICEVch- 40 years (Burrus et al., 2006a). The 99.5 kb ICE SXTMO10 Mex1, provisionally named mex03, mex04 and mex05, was originally discovered in one of the initial V. cholera are absent from SXTMO10 and encode proteins of unknown O139 clinical isolates from India in late 1992 (Waldor function (Burrus et al., 2006b). Protein BLAST analysis and et al., 1996). SXTMO10 and closely related ICEs confer comparison with the NCBI CDD and Pfam databases of resistance to sulfamethoxazole, trimethoprim, streptomy- the translation products of mex05 and s074 predicted that cin and chloramphenicol. SXTMO10-related ICEs have both proteins have a putative GGDEF domain (PF00990). become widespread in clinical V. cholerae isolates in Asia Our DNA sequence alignment analysis showed that both and Africa in the past 15 years (Burrus et al., 2006a). GGDEF domains carry GGEEF residues in the active site Based on the analysis of several SXT/R391 ICEs (Beaber (A-site) (Fig. 1C), which have been shown to be essential et al., 2002; Boltner et al., 2002; Pembroke and Piterina, for the function of the DGC enzymes (Malone et al., 2006; Osorio et al., 2008), these elements all contain a 2007). Both domains also carry a predicted inhibitory site conserved set of ~25 genes that mediate their common (I-site) with a conserved RXXD motif (Fig. 1C) that has functions that include excision/integration, conjugative been shown to bind c-di-GMP (Christen et al., 2006). transfer and regulation. Distinct variable regions that From this point on, for reasons that will become clear confer element-specific phenotypes, such as resistance below, mex05 and s074 are referred to as dgcK and dgcL, to antibiotics or heavy metals are interspersed within this respectively, for diguanylate cyclase. conserved and syntenous SXT/R391 backbone; yet most To assess whether dgcK and dgcL genes are frequently of the genes found in these variable regions remain of found on ICEs of the SXT/R391 family, we carried out unknown function. In 2001, an ICE of the SXT/R391 PCR and Southern blot experiments designed to detect family, ICEVchMex1, was identified in an environmental dgcK, dgcL and s075 in the genome of 12 strains of V. cholerae isolate from sewage in San Luis Potosi, clinical and environmental V. cholerae bearing SXT/R391

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 512 E. Bordeleau, E. Brouillette, N. Robichaud and V. Burrus

Fig. 1. Sequence analysis of the s073-traF region of several SXT/R391 ICEs reveals genes encoding predicted DGCs. A. Schematic representation of the s073-traF variable region of ICEVchMex1 and of the predicted protein domains of DgcK, DgcL and S074. Previous gene annotations are shown in parenthesis. B. Detection of dgcK, dgcL and s075 in SXT/R391 ICEs. Numbers indicate the percentage of identity relatively to the corresponding genes of ICEVchMex1 in (A). The plus symbol (+) indicates that the gene was detected but not sequenced and the minus symbol (-) indicates that the gene was absent. C. Multiple sequence alignment of Pseudomonas aeruginosa Pa3702 protein WspR (3BRE_A), Caulobacter crescentus CB15 protein PleD (AAA87378), DgcK and DgcL from ICEVchMex1, and Orf53 from R391 (AAM08056) at the predicted I-site and A-site of the GGDEF domain. Conserved residues are shown.

ICEs, isolated from India, Bangladesh, Angola and free strain, to investigate the effect of each deletion on cell Mozambique (Table 2). In addition, we also analysed motility and biofilm formation. Based on previous studies Vibrio parahaemolyticus 21AMOZ (R391-like ICE), Vibrio regarding DGCs and PDEs, deletion of dgcK or dgcL was fluvialis H-08942 (ICEVflInd1) and Providencia alcalifa- expected to alter V. cholerae colony appearance, diminish ciens BI731 (ICEPalBan1). This screening revealed that the ability of the bacterium to form biofilm and enhance its besides ICEVchMex1, two other ICEs, ICEVflInd1 and motility. We did not observe any change in biofilm forma- ICEVchMoz3, also contain dgcL and dgcK (Fig. 1B). This tion using any of the mutants compared with wild-type indicates that members of the SXT/R391 family harbour- ICEVflInd1, in both static and agitated condition, at 25°C, ing genes encoding potential DGCs are not limited to 30°C or 37°C (data not shown). V. cholerae from India or Mexico, but are also found in However, we observed that motility of V. cholerae was other species of Vibrio from diverse geographic area. increased for DdgcL or DdgcKL mutations compared with wild type (Fig. 2A and B). This result was consistent with our hypothesis that dgcL encodes a DGC. We were Deletion of dgcL in ICEVflInd1 augments the motility unable to observe any significant change in motility or of V. cholerae morphology for DdgcK compared with the wild-type ICE, To characterize the function of dgcK and dgcL and their suggesting either that dgcK was not expressed or that the contribution to complex c-di-GMP signalling pathways of enzyme was not active in our experimental conditions. V. cholerae, we created in-frame deletions of each indi- vidual gene and a deletion of both genes in ICEVflInd1. Overexpression of dgcK or dgcL alters V. cholerae ICEVflInd1 was preferred to ICEVchMex1 as this latter biofilm formation and motility ICE is downregulated and requires the addition of mito- mycin C for efficient transfer (Burrus et al., 2006b). In Given the conservation of the GGEEF catalytic residues addition, ICEVchMex1 transferred 2 and 3 log less fre- in the GGDEF domains found in DgcK and DgcL, we quently than SXTMO10 and ICEVflInd1 between Escheri- hypothesized that both proteins were functional DGCs. To chia coli strains, and conjugative transfer from E. coli to test this hypothesis, we constructed an arabinose- V. cholerae was undetectable (data not shown). Wild-type inducible version of either gene in the pBAD24 vector and ICEVflInd1 and its deletion mutants were transferred into the resulting plasmids, pDgcK and pDgcL, were intro- V. cholerae N16961, a well-studied, sequenced and ICE- duced into V. cholerae N16961.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 Diguanylate cyclases on ICEs 513

Fig. 2. Effect of DdgcK, DdgcL and DdgcKL mutations in ICEVflInd1 on motility of V. cholerae N16961. A. Motility of the wild-type (WT) and mutant strains on LB soft agar plates at 30°C. The picture shown is representative of four independent experiments. B. Comparison of the motility of the same strains in the same conditions. Error bars represent the standard deviation. One-way ANOVA with a Tukey–Kramer post-test was used to compare the means of surface area of the colonies. The confidence intervals for the comparisons of the mutants relatively to WT were P < 0.01 (**) and P < 0.05 (*). ns, non-significant.

As predicted, and in concordance with the expression of functional and active DGC enzymes, colony morphology was clearly affected under inducing conditions in the pres- ence of either pDgcK or pDgcL but not in the presence of pBAD24 (empty vector) (Fig. 3A). Colonies of V. cholerae expressing dgcK or dgcL appeared denser and more

Fig. 3. Phenotypic changes triggered by the overexpression of dgcK and dgcL in V. cholerae N16961. A. Rugose colony morphology and biofilm formation on borosilicate glass upon expression of dgcK or dgcL. B. Comparison of biofilm formation upon expression of dgcK and dgcL in microplates. Error bars represent the standard deviation. C. Effect of expression of dgcK and dgcL on motility of the strain on soft agar plates. The data shown are representative of three independent experiments. D. Biofilm-, virulence- and motility-related genes are modulated by the expression of dgcK and dgcL. The expression of genes VC918 (biofilm), VC936 (biofilm), VC0834, VC1450, VC1888 (heamolysin-related), VC1008 (motility) and VC2188 (motility) was quantified by quantitative PCR upon overexpression of dgcK or dgcL. The graph shows differential gene expression values (DDCt) compared with V. cholerae N16961 transformed with empty pBD24 also grown in the presence of arabinose.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 514 E. Bordeleau, E. Brouillette, N. Robichaud and V. Burrus opaque. Moreover, the colony surface appeared granu- biofilm-associated protein 1, has been shown to be lated, similarly to rugose phenotype frequently observed induced by crude bile (Hung et al., 2006), to be more when V. cholerae secretes exopolysaccharides. expressed in a rugose colony variant compared with the When V. cholerae N16961 was incubated with agitation smooth version isolate of the same strain (Fong and in LB broth supplemented with L-arabinose, expression of Yildiz, 2007) and to be required for biofilm accumulation dgcK or dgcL led to the production of a thick biofilm on the (Moorthy and Watnick, 2005). wall of borosilicate glass tubes (Fig. 3A) in striking con- When expression of dgcL was induced, no significant trast with the same strain containing the empty vector. change of expression of VC0834 or VC1450 was detected. Quantification of the capacity of the same strains to form However, the expression of the biofilm markers VC0918 biofilm under static conditions in polystyrene microplates and VC0936 increased by more than 112- and 26-fold, in LB broth showed that the bacteria expressing dgcK and respectively (Fig. 3D), whereas the biofilm-coregulated dgcL formed up to five times more biofilm than the control. gene VC1888 increased 42-fold when compared with the We also observed that the amount of biofilm formed cor- control. Furthermore, the expression of VC1008 and related with the concentration of inducer (Fig. 3B) and that VC2188, genes involved in motility, was reduced by 2.9- the bacteria expressing dgcL formed slightly more biofilm and 2.6-fold respectively. These results are consistent with than those expressing dgcK. the phenotypical observations described above, where The rugose phenotype usually correlates with biofilm more biofilm was produced while the bacterium’s capacity production. Both phenotypes are mutually exclusive to the to move through soft agar was reduced. smooth phenotype that correlates with enhanced motility A similar gene expression pattern was obtained when of V. cholerae cells. Expecting to observe a reduced motil- dgcK was expressed; however, differences in expression ity, we conducted motility assays on soft agar plates with level were slightly less marked than for dgcL, especially V. cholerae N16961 overexpressing dgcK or dgcL. Over- for genes encoding proteins involved in motility. expression of either gene led to a markedly reduced motil- Our results are comparable to those obtained by Beyhan ity on soft agar compared with empty pBAD24 (Fig. 3C). and colleagues (2006) during the whole-genome expres- No significant difference of motility could be observed sion profiling carried out after overexpression of the DGC between the cells harbouring the empty vector and pDgcK enzyme VCA0956 in a V. cholerae El Tor biotype strain. or pDgcL when expression was not induced. Taken together, these results demonstrate that overexpression of dgcK or dgcL has an influence on biofilm formation and A single amino acid replacement in the GGEEF motif motility. abolishes the effects of DgcK and DgcL It has been previously shown that replacement of the Effect of dgcK and dgcL overexpression on expression second glycine in the GGDEF domain is sufficient to of biofilm- and motility-related genes greatly reduce the DGC enzymatic activity (Kirillina et al., 2004). Based on these results, we hypothesized that the The apparent loss of motility observed upon overexpres- substitution of the second glycine residue of the GGEEF sion of dgcK or dgcL could be a direct consequence of an motif by a glutamic acid residue in both DgcK (pDgcK- increased production of exopolysaccharides. Conversely, G333E) and DgcL (pDgcL-G342E) (Fig. 4A) would inhibit loss of motility caused by events such as low expression c-di-GMP synthesis and ultimately prevent their overex- of flagellum structural genes could also favour biofilm pression from affecting the biofilm and motility pheno- formation. To investigate the specific effects of dgcK and types. Such a result would virtually confirm that the two dgcL expression at the transcriptional level, the expres- genes under investigation encode genuine DGCs. sion of seven V. cholerae genes selected for their respon- As expected, the capacity of V. cholerae N16961 siveness to c-di-GMP concentration in the cell (Beyhan expressing dgcKG333E or dgcLG342E to swim on soft agar as et al., 2006) was measured by quantitative PCR: two well as its ability to form biofilm was similar to the control genes implicated in biofilm formation [VC0936, a polysac- (Fig. 4B and C). These results clearly indicate that the charide export-related protein; VC0918, a UDP-N-acetyl- DGC activity became marginal for both proteins when the D-mannosaminuronic acid dehydrogenase (espD)], two second glycine of the GGEEF A-site was replaced by a genes related to motility [VC1008, a sodium-type flagellar glutamate. protein (motY); VC2188, the flagellin core protein A (flaA)] as well as three genes related to pathogenesis [VC1450, the Rtx toxin activating protein (rtxC); VC1888, an DgcK is a flavoprotein that has DGC activity in vitro haemolysin-related protein; VC0834, the toxin- coregulated pilus biosynthesis protein S (tcpS)]. VC1888, The purification of DgcK and DgcL from E. coli BL21 was also called vcp for vps-coregulated protein or Bap1 for carried out in order to show indubitably that both proteins

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 Diguanylate cyclases on ICEs 515

potential (Christie et al., 1999; Taylor and Zhulin, 1999). Since flavins have a typical yellow colour, we postulated that a flavin cofactor was likely bound to DgcK in its native form and released during denaturation. The emission spectrum of flavin fluorescence is known to peak at 525 nm and the maxima of the excitation (absorbance) spectrum are reached at 360–390 and 440–470 nm (Munro and Noble, 1999). Interestingly, when a flavin is bound to form a holoprotein, fluorescence is usually strongly quenched, leading to a marked reduction of emis- sion and excitation intensities (Munro and Noble, 1999). We purified DgcK from V. cholerae, and analysed by fluo- rescence spectroscopy both the native and denatured form after thermal treatment. Typical flavin spectra (exci- tation and emission) were observed for the denatured DgcK apoprotein while nearly no fluorescence was detected for the native holoprotein (Fig. 5A). These results strongly suggested that a flavin was non- covalently bound to native DgcK. To identify the flavin, we carried out thin-layer chromatography (TLC) analysis of a sample concentrated from the supernatant of boiled DgcK run in parallel with commercially available riboflavin (RF), FMN and flavin adenine dinucleotide (FAD). Based on the retention factor (Rf) values calculated, the bound flavin was found to be FMN (Fig. 5B). In the presence of GTP, purified DgcK was able to produce c-di-GMP in vitro as revealed by TLC (Fig. 5C). The quantity of c-di-GMP produced increased with longer incubation times. Enzymatic activity was abolished when protein samples were inactivated by boiling before the assay, confirming the direct implication of DgcK in the formation of c-di-GMP.

Phosphorylation of DgcL is required to augment its DGC activity in vitro Unlike DgcK, purified DgcL exhibited a very weak DGC Fig. 4. Effect of a single amino acid replacement in the predicted active sites (A-sites) of DgcK and DgcL. activity in vitro (Fig. 6 and data not shown). In addition to A. Mutations introduced into the A-site of the predicted GGDEF the GGDEF catalytic domain, DgcL was predicted to have domains of DgcK and DgcL are shown in bold. a phosphorylatable REC (Response regulator) domain B. Measurement of biofilm formation upon expression of mutated dgcK and dgcL in microplates. Error bars represent the standard (Fig. 1A). REC domains (PF00072) are known to contain a deviation. phosphoacceptor site that is phosphorylated by the kinase C. Colony morphology and motility on soft agar plate of V. cholerae activity of a sensor partner in a two-component signal N16961 expressing mutated dgcK or dgcL. transduction system. The predicted REC domain on DgcL raised the possibility that phosphorylation was required for have a DGC activity using in vitro enzymatic assays for protein activation. Therefore, in vitro phosphorylation of production of c-di-GMP from GTP. DgcL was attempted using acetyl phosphate. The pre- Interestingly, in the course of the purification of DgcK treatment of DgcL by addition of acetyl phosphate prior to from E. coli BL21, lysis of the bacteria in denaturing con- c-di-GMP formation assays confirmed that phosphoryla- ditions produced a bright yellow colour in solution (data tion was required to augment the DGC activity of this not shown). In addition to the GGDEF domain, DgcK was enzyme (Fig. 6). Like DgcK, the quantity of c-di-GMP also predicted to have two putative PAS domains synthesized by DgcL increased over time, whereas no (Fig. 1A). PAS domains (PF00989) are known to bind activity was detected when the protein was boiled before flavin, and act as sensors for light, oxygen and redox proceeding to the c-di-GMP formation assay.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 516 E. Bordeleau, E. Brouillette, N. Robichaud and V. Burrus

Fig. 5. Characterization of the flavin bound to native DgcK and c-di-GMP formation assay. A. Excitation and emission spectrum of native and denatured DgcK. B. Above, TLC analysis of purified flavin obtained from native DgcK denatured by boiling (Flavin sample), and for commercial flavin adenine dinucleotide (FAD), flavin mononucleotide (FMN) and riboflavin (RF) used as controls. Below, retention factor (Rf) values obtained by TLC for the different commercial flavins and for the flavin bound to DgcK. C. Analysis of the enzymatic activity of DgcK. Purified DgcK is capable to transform GTP into c-di-GMP, as shown by the migration of nucleotide products on TLC plates after the enzymatic assay.

Discussion Vibrio cholerae has the ability to survive both in aquatic environments and in the human gastrointestinal tract by switching between sessile and motile phenotypes. This phenomenon relies mostly on c-di-GMP signalling path- ways to regulate the expression of specific sets of genes. The fine tuning of these pathways enables V. cholerae to modify its behaviour in response to environmental signals and even allows the bacterium to anticipate its transition from the human intestine to the outside world by altering its metabolism (Schild et al., 2007). All of this is achieved by relying on a complex network of chromosomally encoded DGCs, PDEs and c-di-GMP binding effector pro- teins, and of genes under control of c-di-GMP sensing riboswitches. We have reported here that several SXT/ R391 ICEs bear between the conserved genes s073 and traA, a variable region that contains genes encoding func- tional DGCs. Our results show that DgcK or DgcL over- expression produces phenotypes that are consistent with elevated intracellular levels of c-di-GMP in V. cholerae, i.e. increased biofilm formation and reduced motility (Tis- Fig. 6. Analysis of the enzymatic activity of DgcL. Purified chler and Camilli, 2004). Moreover, purified recombinant untreated DgcL is unable to transform free GTP in c-di-GMP, as observed on the TLC plate shown on the left. DgcL is active DgcK and DgcL proteins possess the capacity to synthe- following phosphorylation by acetyl phosphate prior to the c-di-GMP size c-di-GMP from GTP as a substrate. formation assay as shown on the right.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 Diguanylate cyclases on ICEs 517

Only a slight but significant increase of motility was LOV (Light Oxygen Voltage) domain subfamily. Analysis observed when V. cholerae bore DdgcL or DdgcKL ICEV- of the sequence of both PAS domains of DgcK revealed flInd1 compared with the wild-type ICE, and no significant that a conserved cystein amino acid residue that forms a effect was detected on biofilm formation. This subtle phe- covalent bond with FMN upon blue light excitation is notypic alteration suggests that the contribution of dgcK missing (Briggs, 2007). Moreover, native DgcK does not and/or dgcL to the life cycle of V. cholerae is only mar- exhibit the two broad absorption bands at 450 and ginal. However, this interpretation could be erroneous. In 370 nm with secondary spectral peaks that are charac- fact, it appears that individual deletions of most of the teristic LOV domains (Briggs, 2007); instead light absorp- V. cholerae genes encoding GGDEF and/or EAL domains tion by FMN seems to be quenched. Therefore, a redox- have no detectable phenotypes under standard laboratory dependent regulation of the DGC activity of DgcK is more conditions. This was clearly illustrated by Beyhan and likely. colleagues (2008) who produced in-frame deletion DgcL, whose DGC activity is augmented in vitro by mutants of V. cholerae for 42 genes encoding proteins phosphorylation, is likely the effector protein of a two- that have predicted GGDEF and/or EAL domains to find component signal transduction system in which S075 genes contributing to the rugose phenotype. Surprisingly, would be the sensor partner. dgcL is located only 6 bp deletion of only two of these genes caused a significant downstream of s075 suggesting that both genes form an observable phenotypic change. This astonishing result operon structure (Fig. 1A). s075 is predicted to encode a could be interpreted as the possible inability for most of multidomain protein containing a periplasmic ligand- these proteins to synthesize or degrade c-di-GMP sug- binding sensor (CHASE) domain, a PAS domain, a histi- gesting that they are not genuine DGCs or PDEs. Alter- dine kinase domain (HisKA), a histidine ATPase, two natively, a more plausible explanation is that specific phosphoacceptor (REC) domains and a histidine phos- environmental conditions might be required to express photransferase domain (HPT) (Fig. 1A). Based on this these proteins or to trigger their enzymatic activity. structure, it is plausible that S075 is able to sense one or VC1593, VC2370 and VC2697, three genes encoding several extracellular signals and relays them to its effector GGDEF domain-containing proteins that are expressed partner DgcL through a phosphorylation cascade. Further late during the passage through the intestinal tract, illus- studies to discover the signals that modulate the DGC trate perfectly such requirements. Schild and colleagues activity of DgcK and trigger the kinase activity of S075 are (2007) reported that deletion of these three genes did not ongoing. attenuate the survival of V. cholerae in rice stools, pond Our discovery of genes encoding DGCs in several water or LB medium after in vitro passage in LB broth; yet members of the SXT/R391 family raises the possibility the same mutants exhibited a significant fitness defect in that in suitable conditions enabling the expression of dgcK rice stools and pond water but not LB medium after in vivo or dgcL, these mobile elements may ultimately impact the passage in the intestinal tract (Schild et al., 2007). The survival of their hosts in the environment by enhancing complexity of the c-di-GMP signalling systems in V. chol- their capacity to form a biofilm. Since biofilms promote erae, where different input signals (e.g. oxygen and cell-to-cell contacts, it is expected that expression of dgcK redox potential, light, starvation, phosphorylation, various or dgcL would in return benefit to the ICE that bears them, ligands) ultimately affect various targets (e.g. gene tran- by increasing the opportunities for conjugation events with scription, direct enzymatic function alteration, direct inter- the same species or with other species or genera in action with cellular structures) (see Hengge, 2009 for complex microbial communities. We tested this hypoth- review), could account for the difficulty to isolate a clear esis by performing preliminary intergenera mating experi- effect for a single DGC-encoding gene deletion. We are ments between V. cholerae N16961 bearing ICEVflInd1 or currently investigating the environmental conditions that its deletion mutants and E. coli K12 but failed to show that regulate the expression of dgcK and dgcL. the presence of dgcK and dgcL increased the transmissi- The organization of the functional domains in DgcK bility of ICEVflInd1 (data not shown). However, our failure suggests that this enzyme is a stand-alone protein that was likely due to the experimental setting that did not acts both as a sensor through its PAS domains, and as allow us to show any improvement in biofilm formation. an effector through its GGDEF domain. We propose that Finally, we cannot rule out that dgcK and dgcL could the DGC activity of DgcK’s GGDEF domain is modulated have little or no effect in V. cholerae in its natural habitat. in a redox- or light-dependent manner, as sensed through It must be pointed out that members of the SXT/R391 the oxidation state of the FMN moiety possibly bound to family have a fairly broad host range and that V. cholerae the PAS domains of the native protein. However, two is only one of the numerous species able to host these major pieces of evidence suggest that light does not act mobile elements (Burrus et al., 2006a). Since this micro- as a regulator of DgcK enzymatic activity and that these organism already has a large number of genes involved in two PAS domains do not belong to the photosensitive regulation of intracellular c-di-GMP level, it is possible that

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 518 E. Bordeleau, E. Brouillette, N. Robichaud and V. Burrus

Table 1. GGDEF domain-containing proteins encoded by mobile elements.

Element Microorganism Protein name Size (aa) Associated domains I/A-site Source

ICEs R391 Providencia rettgeri Orf53 149 – ReeDtasRisGDEF AAM08056 PFGI-1 Pseudomonas fluorescens PFL_4715 525 Cache_1 RpgDlaaRfGGEEF YP_261798 pSE211 Saccharopolyspora erythraea SACE_7027 202 – lrgeravRlhGDEF YP_001109113 pSAM2 Streptomyces ambofaciens Orf183 183 – grhgtagRlGGDEF CAA06454 Phages YuA Pseudomonas aeruginosa gp44 156 – fgaDwvfRvsGDEF YP_001595867 Plasmidsa pACRY01 Acidiphilium cryptum Acry_3252 620 EAL ResDmiaRiGGDEF YP_001220011 pACRY02 Acidiphilium cryptum Acry_3334 321 – RrnDvvaRlGGDEF YP_001220088 pBVIE01 Burkholderia vietnamiensis Bcep1808_6801 884 PAS EAL RpgDtvaRfGGDEF YP_001110491 Bcep1808_6800 901 PAS EAL RdsDslasdGsgrF YP_001110490 Plasmid 1 Deinococcus geothermalis Dgeo_2370 336 – qqdetayRlGGDEy YP_593879 Dgeo_2479 531 GAF geagnvyRvGGDEF YP_593987 Dgeo_2740 555 HAMP GAF RteDivcRyGGEEF YP_594247 pMFLV01 Mycobacterium gilvum Mflv_5498 371 – grqaliaRiGGEEF YP_001136747 pMFLV02 Mycobacterium gilvum Mflv_5583 370 – sptalvaRvGGEEF YP_001136832 Plasmid 3 Nitrobacter hamburgensis Nham_4588 613 CHASE3 HAMP RdtD-afRyGGEEF YP_571993 pNL2 Novosphingobium aromaticivorans Saro_3652 656 EAL paharlaRvGGDEF YP_001166037 pOANT01 Ochrobactrum anthropi Oant_4599 231 – RegDlfgRlGGEEF YP_001373115 pDC3000A Pseudomonas syringae PSPTOA0034 649 EAL sdtvhvsRlGGDEF NP_808692 pMOL30 Ralstonia metallidurans pMOL30_130 705 PAS EAL keaDfvgRiGGDEF YP_145661 p42d Rhizobium etli RHE_PD00105 852 (3¥)PAS EAL RdlDtvaRlGGDEF NP_659931 Plasmid B Rhodobacter sphaeroides RSP_4018 311 PAS RsvDlfgRfGGEEF YP_345187 Plasmid C Rhodobacter sphaeroides RSP_4111 931 GAF EAL RphDlvvRlGGDEF YP_345327 Plasmid D Rhodobacter sphaeroides RSP_4191 980 PAS GAF EAL RphDlvvRlGGDEF YP_345384 Plasmid 1 Rhodoferax ferrireducens Rfer_4387 718 PAS EAL RqtDfisRlsGDEF YP_516071 Plasmid 1 Shewanella sp. Shewana3_4330 325 GAF RnyDffaRyGGEEF YP_863843 TM1040 Silicibacter sp. TM1040_3136 700 HAMP EAL gpdDmtaRvGGDEF YP_611372 TM1040_3132 677 EAL RpgitigRiGGDEF YP_611368 pSMED01 Sinorhizobium medicae Smed_3683 564 EAL RdrefayRlaGDEF YP_001312429 Smed_3728 341 GAF RagDipvRmGGDEF YP_001312474 pSMED02 Sinorhizobium medicae Smed_5113 1071 (4¥)PAS EAL RttetvaRlGGDEF YP_001313823 Smed_5153 729 CHASE4-EAL asagvisRiGGDEF YP_001313863 Smed_5323 422 – RnrDhvaRyGGEEF YP_001314029 Smed_5672 685 COG3300-EAL dsdefvaRlGGDEF YP_001314335 Smed_6138 551 EAL geggiayRlaGDEF YP_001314701 pSWIT01 Sphingomonas wittichii Swit_5300 710 CHASE4-EAL ageafiaRaGGDEF YP_001260175 Swit_5213 813 EAL RenehvaRlGGDEF YP_001260089 a. Plasmids bearing genes that encode predicted GGDEF domains were found in the ACLAME database (version 0.3). dcgK and/or dgcL are redundant. However, they could nylate cyclase-encoding genes have been identified in have a more significant role in other hosts capable of other ICEs, conjugative plasmids and bacteriophages receiving SXT/R391 ICEs. (Table 1). A rapid survey of the ACLAME database Besides dgcK and dgcL, SXT/R391 ICEs, of which (Leplae et al., 2004) revealed a cluster of 176 plasmid- only a small subset of genomes have been entirely borne genes encoding proteins (family ID 35) that are sequenced, may carry additional genes involved in c-di- predicted to be involved in c-di-GMP metabolism. To date, GMP formation or degradation. For instance, the putative the role and the advantages that these genes may confer translation product of orf53 in R391 from Providencia to their host bacteria or to the mobile elements that bear rettgeri has a predicted GGDEF domain. However, while them remain unexplored. Investigation of this unknown the I-site of the GGDEF domain in Orf53 is conserved, not aspect of mobile genetic elements could reveal useful all the amino acid residues of the A-site are conserved, since expression of virulence has been shown to be regu- suggesting that it is not a functional DGC (Fig. 1C). lated in part by intracellular c-di-GMP concentration in Instead, it could act as a c-di-GMP sensor like PelD or several pathogens. However, based on our and previ- CdgG (Lee et al., 2007; Beyhan et al., 2008). Interest- ously published work, these investigations are likely to be ingly, members of the SXT/R391 family are not the only extremely challenging considering the large number of mobile elements bearing genes that may influence the endogenous genes modulating intracellular c-di-GMP intracellular concentration of c-di-GMP in bacteria. In fact, levels in the microorganisms between which these mobile such mobile elements seem to be rather common. Digua- elements are exchanged.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 Diguanylate cyclases on ICEs 519

Table 2. Bacterial strains and plasmids used in this study.

Strain or plasmid Description Reference or source

Vibrio cholerae N16961 O1 El Tor Heidelberg et al. (2000) 1-010118-075 Non-O1/O139, 2001 Mexico, environmental, ICEVchMex1 Burrus et al. (2006b) MO10 O139, 1992 India, clinical, SXTMO10 Waldor et al. (1996) BI142 ICEVchInd1 of clinical origin, India, 1994 Hochhut et al. (2001) 13571 O1, 2000 India, clinical, ICEVchInd5 M.M. Colombo AC1923 O1, 2005 Bangladesh, clinical, ICEVchBan2 Marrero and Waldor (2007) AC1924 O1, 2005 Bangladesh, clinical, ICEVchBan3 Marrero and Waldor (2007) 698 Non-O1, 1991 Angola, environmental, ICEVchAng2 Ceccarelli et al. (2006a) 582 O1, 1992 Angola, clinical, ICEVchAng1 Ceccarelli et al. (2006a) 175 O1, 2006 Angola, clinical, ICEVchAng3 M.M. Colombo 5556 O1, 1997 Mozambique, clinical, ICEVchMoz8 Ceccarelli et al. (2006b) 5594 O1, 1997 Mozambique, clinical, ICEVchMoz7 Ceccarelli et al. (2006b) 7698 O1, 1997 Mozambique, clinical, ICEVchMoz9 Ceccarelli et al. (2006b) 7AMOZ Non-O1, 2002 Mozambique, environmental, ICEVchMoz3 Taviani et al. (2008) Vibrio parahaemolyticus 21AMOZ 2003, Mozambique, environmental, R391-like Taviani et al. (2008) Vibrio fluvialis H-08942 2002, India, clinical, ICEVflInd1 Ahmed et al. (2005) Providencia alcalifaciens BI731 1999, Bangladesh, clinical, ICEPalBan1 Hochhut et al. (2001) Escherichia coli - - - BL21 F ompT hsdS (rB mB ) gal dcm Stratagene CAG18439 MG1655 lacZU118 lacI42::Tn10 (TcR) Singer et al. (1989) DH5a F- D(lacZYA-argF)U169 recA1 endA1 hsdR17 supE44 thi-1 gyrA96 relA1 Kolter et al. (1978); Hanahan (1983) Plasmids pBAD24 ApR, arabinose-inducible expression vector, high copy Guzman et al. (1995) pDgcK pBAD24 containing dgcK from ICEVchMex1 This study pDgcL pBAD24 containing dgcL from ICEVchMex1 This study pDgcK-G333E pDgcK with a G to E substitution at amino acid 333 This study pDgcL-G342E pDgcL with a G to E substitution at amino acid 342 This study pGEX6P-1 ApR, IPTG-inducible glutathione S-transferase gene fusion vector GE Healthcare pGDgcK pGEX6P-1 containing dgcK from ICEVchMex1 This study pGDgcL pGEX6P-1 containing dgcL from ICEVchMex1 This study pKD13 KnR PCR template for one-step chromosomal gene inactivation Datsenko and Wanner (2000)

Experimental procedures chromosomal gene inactivation technique (Datsenko and Wanner, 2000). All deletions were designed to be non-polar. Growth conditions The DdgcK, DdgcL and DdgcLK mutations were introduced using primer pairs mex05WF/mex05WR, mex06WF/ Bacterial strains were routinely grown in Luria–Bertani (LB) mex06WR and mex05WF/mex06WR, respectively, and broth at 37°C in an orbital shaker and maintained at -80°C in plasmid pKD13 as a template. All deletion mutations were LB broth containing 15% (v/v) glycerol. Antibiotics were used verified by PCR amplification using primers flanking the dele- at the following concentrations: ampicillin (Ap), 100 mg ml-1; tion, cloning and sequencing. kanamycin (Kn), 50 mg ml-1; streptomycin (Sm), 200 mg ml-1; Plasmids pDgcL and pDgcK were constructed by ligating sulfamethoxazole (Su), 160 mg ml-1; trimethoprim (Tm), dgcK and dgcL into pBAD24, both under control of the P 32 mg ml-1. For induction of gene expression in the strains BAD arabinose-inducible promoter. The 1272 bp dgcK and 930 bp carrying arabinose-inducible vectors (pBAD series), dgcL fragments were amplified by PCR from V. cholerae L-arabinose was added to the growth medium at a final con- 1-010118-075 genomic DNA using primers mex5eb5/ centration ranging from 0.002% to 0.2% (w/v). mex5eb11 and mex6eb7/mex6eb12 respectively. dgcK was cloned into the NcoI site of pBAD24 blunted using the Klenow Strain and plasmid construction polymerase while dgcL was digested by XmaI and cloned into pBAD24 digested by NcoI, blunted with the Klenow poly- The bacterial strains and plasmids used in this study are merase and then digested by XmaI. described in Table 2. The oligonucleotides primers used for Plasmids pDgcL-G333E and pDgcK-G342E, which contain construction of plasmids, chromosomal deletions and site- a single amino acid substitution in the GGEEF conserved directed mutagenesis are described in Table S1. A-site, were created by site-directed mutagenesis of pDgcK Deletion mutants were constructed in E. coli CAG18439 and pDgcL using primer pairs mutmex5eb39/mutmex5eb40 containing wild-type ICEVflInd1 by using the one-step and mutmex6eb41/mutmex6eb42, respectively, and the

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 520 E. Bordeleau, E. Brouillette, N. Robichaud and V. Burrus

QuickChangeII Site-directed Mutagenesis Kit (Stratagene). Biofilm formation assays The mutations replaced the glycine residues at position 333 of DgcK and 342 of DgcL by a glutamate residue. In both The capacity of V. cholerae strains to form biofilm was deter- cases, a new EarI restriction site was created and used for mined using 96-well microplates. Overnight-grown cultures screening of the mutated plasmids. were diluted 1:200 in LB broth containing ampicillin and For DgcK and DgcL purification, dgcK and dgcL were L-arabinose, in a final volume of 200 ml per well. Biofilms were amplified by PCR from V. cholerae 1-010118-075 genomic formed in static conditions at 30°C for 4 h, washed twice with DNA using primer pairs gmex5eb55/gmex5eb56 and 300 ml of phosphate-buffered saline (PBS) and allowed to dry gmex5eb57/gmex5eb58, respectively, and cloned into the for 30 min. Biofilms were then stained with 200 ml of crystal BamHI site of pGEX6P-1 in frame with the glutathione violet (0.1% in water) for 30 min, washed with water and S-transferase (GST) coding sequence. allowed to dry for 30 min. Bound crystal violet was solubilized All plasmids were constructed and propagated in E. coli with 200 ml of 95% ethanol and quantified by absorbance at DH5a. All constructs were confirmed by enzyme digestions, 570 nm in a Model 680 microplate reader (Bio-Rad). Assays PCR and sequencing. Sequencing was performed by DNA were carried in triplicate and data were normalized as fold LandMarks (Saint-Jean-sur-Richelieu, QC, Canada). expression compared with pBAD24. On every graph, data from three independent experiments were combined.

Molecular biology methods Motility assay

All the enzymes used in this study were provided by New A semi-solid medium composed of 1% tryptone, 0.5% NaCl, England BioLabs and were used according to the manufac- 0.3% agar and 100 mg ml-1 ampicillin, with or without turer’s instructions. Plasmid DNA was prepared with a L-arabinose, was used to evaluate motility of V. cholerae Qiaprep Spin miniprep kit (Qiagen), and chromosomal DNA strains during overexpression assays. Motility of the deletion was prepared with a Wizard Genomic DNA purification Kit mutants was tested on semi-solid LB medium. Inoculation (Promega) as described in the manufacturer’s instructions. was made by stabbing agar with pipette tips previously Southern blotting was performed with probes conjugated to dipped into overnight-grown bacterial cultures diluted 1/200. horseradish peroxidase and detected with a chemilumines- Plates were then incubated overnight at 30°C. Assays were cent substrate (GE Healthcare) according to the manufactur- performed four times in triplicates. Motility was assessed er’s instructions. Probes were obtained from pDgcK and from the comparison of the surface area (mm2) of the colo- pDgcL digests. Genomic DNA was digested with PciI and the nies from plate images captured and analysed using a Gel fragments were separated by electrophoresis in 1% agarose. Doc XR system and Quantity One software (Bio-Rad). DNA was then transferred onto nylon membranes (Magma) using a vacuum blotter (Bio-Rad) according to the manu- RNA isolation and cDNA synthesis facturer’s instructions. Detection was achieved on Biomax light films (Kodak). Additionally, detection of dgcK, dgcL and For RNA isolation, V. cholerae was grown in LB with aeration s075 and their localization were carried out by PCR using at 37°C to mid-exponential phase (OD600 of 0.5). Induction of primer pairs mex5eb15/mex5eb11, mex6eb16/mex6eb12 dgcK or dgcL expression was initiated by addition of 0.2% and mex7Feb20/mex7Feb21, and mex5eb15/HotS3F2 and L-arabinose. Forty minutes after induction, aliquots of bacte- mex7Feb22/TraF2R respectively. rial cultures were directly mixed with RNA Protect Bacteria PCR assays were performed with the primers described in Reagent (Qiagen) and treated according to the manufactur- Table S1 in 50 ml of PCR mixtures with 1 U of Taq DNA er’s instructions. Bacterial RNA was isolated after treating the polymerase. PCR conditions were as follows: (i) 3 min at cells with lysozyme and proteinase K (Qiagen), in conjunction 94°C, (ii) 30 cycles of 30 s at 94°C, 30 s at suitable annealing with the use of the RNeasy Mini kit (Qiagen). In addition, RNA temperature, and 30–90 s at 72°C, and (iii) 2 min at 72°C. samples were treated with DNase (RNase-free DNase set, When needed PCR products were purified using a QIAquick Qiagen) during purification. To ensure that there was no con- PCR Purification Kit (Qiagen) according to the manufactur- tamination by genomic DNA, the procedure was repeated a er’s instructions. second time on the purified RNA. RNA purity and concentra- Escherichia coli was transformed by electroporation tion were evaluated with a ND-1000 NanoDrop Spectropho- according to Dower and colleagues (1988). Vibrio cholerae tometer (Thermo Fisher Scientific/NanoDrop Products). RNA was transformed by electroporation according to Occhino samples were stored at -80°C. and colleagues (1998). In both cases, transformation was cDNA was prepared using the SuperScript III First-Strand carried out in 0.1 cm electroporation cuvettes using a Bio- Synthesis kit (Invitrogen) following the manufacturer’s recom- Rad GenePulser Xcell apparatus set at 25 mF, 200 W and mendations. Fifty nanograms of random hexamer primers 1.8 kV. and 2.5 mg of total bacterial RNA were used in each reaction. After cDNA synthesis, template RNA was digested by RNase H (Amersham). cDNA sample mixtures were purified with the Colony morphology PCR Purification Kit (Qiagen) and stored at -80°C.

For colony morphology assays, 10 ml of overnight-grown cul- Quantitative PCR tures of V. cholerae strains diluted 1:200 were spotted on LB agar plates supplemented with ampicillin and L-arabinose. Reactions were performed with Quantifast SYBR Green PCR Incubation was carried out overnight at 30°C. master Mix (Qiagen) using a Mastercycler ep gradient S

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 Diguanylate cyclases on ICEs 521 realplex4 system (Eppendorf AG) for data acquisition and washed in 0.5 M LiCl and air dried. Plates were then soaked analysis. Reaction mixtures contained approximately 40 ng for 5 min in MeOH, dried, and developed in 2:3 (v/v) satu- of cDNA template and 1 mM of primers (Table S1). The PCR rated (NH4)2SO4/1.5 M KH2PO4 (pH 3.5). Plates were soaked conditions were (i) 95°C for 5 min, followed by (ii) 40 cycles at 15 min in MeOH and allowed to dry prior to exposition to a 95°C for 10 s and 60°C for 30 s. Melting curves were carried phosphor imaging screen (Molecular Dynamics). Data were out on the final reaction products (165–190 bp) to verify that collected and analysed using a FX molecular imager and the amplification was specific to targets. All primer pairs showed Quantity One software (Bio-Rad). efficiencies comprised between 96% and 103% (data not shown). For normalization, the gyrase subunit A gene Fluorescence spectroscopy (VC1258) was used and results expressed as relative expression based on the DDCt calculation method. Experi- Fluorescence emission and excitation spectra for DgcK ments were carried out three times in triplicate and combined. protein samples (100 mg ml-1) were performed on a Quanta Master spectrofluorometer (Photon Technology Interna- tional). All data were collected at 25°C in the PreScission Production of recombinant proteins protease cleavage buffer and spectra were corrected for background. Fluorescence emission and excitation spectra Overnight-grown cultures of E. coli BL21 bearing pGDgcK or were obtained by excitation at 380 nm and by monitoring the pGDgcL were diluted 1:100 in fresh 2¥ YTA broth and incu- emission at 520 nm respectively. bated at 37°C with agitation. Protein expression was induced at mid-exponential phase (OD600 of 0.6) with 0.1 mM (for TLC analysis of flavoprotein DgcK) or 0.5 mM (for DgcL) IPTG (isopropyl 1-thio-b-D- galactopyranoside), and the cultures were grown for an addi- Determination of protein-bound flavin was performed simi- tional 4 h at 37°C (for DgcK) or 6 h at 25°C (for DgcL). Cells larly to previously described protocols (Christie et al., 1999; were collected by centrifugation, re-suspended in PBS con- Laan et al., 2004) with the following modifications. Protein taining 1% Triton X-100 and protease inhibitors (Protease samples were dialysed overnight against dH2O using D-Tube Inhibitor Cocktail, Sigma), and lysed by sonication. DgcK and Dialyser Maxi (MWCO 12-14) and concentrated by centrifu- DgcL were recovered by affinity chromatography using the gation on Amicon Ultra-15 columns (MWCO 10) to obtain GST purification module (GE Healthcare) with the PreScis- salt-free protein samples that were re-suspended in a sion protease (GE Healthcare) according to the manufactur- minimal volume of 70% EtOH. The flavin was released by er’s instructions. After elution, proteins samples were boiling the protein samples for 2 min. After centrifugation at dialysed against the conservation buffer (50 mM Tris-HCl 16 000 g for 15 min at 4°C, the supernatant was lyophilized pH 7.8, 250 mM NaCl, 25 mM KCl, 10 mM MgCl2, 30% glyc- and re-suspended in 35% EtOH. Aliquots were spotted on erol) for 12 h in D-Tube Dialyser Maxi (MWCO 12-14, silica TLC plates (Merck), separated in 3:1:1 (v/v) n-butanol/ Novagen), concentrated by centrifugation on Amicon Ultra-15 acetic acid/water and visualized at 365 nm on a GelDoc columns (MWCO 10, Millipore), and stored at -20°C. Protein system (Bio-Rad). Riboflavin, FMN and FAD standards concentration was estimated using a BCA Protein Assay Kit (200 mM in 35% EtOH) were run along with the concentrated (ThermoScientific) and purity was determined by SDS-PAGE flavin-containing solution. analysis. Acknowledgements

Assays for enzymatic activity and TLC analysis The authors would like to thank Benoît Heppell for helpful hints regarding spectrofluorescence analysis. We thank M.K. Diguanylate cyclase activity of DgcK and DgcL was assessed Waldor and M.M. Colombo for the kind gift of many strains. We according to previously described procedures (Paul et al., are grateful to Alain Lavigueur, Daniela Ceccarelli and Gabriel 2004; Christen et al., 2005; Tamayo et al., 2005) with the Mitchell for helpful comments on the manuscript. E. Bordeleau following modifications. Enzymatic assays were performed is the recipient of anAlexander Graham Bell Canada Graduate with approximately 10 mg of purified proteins in a final volume Scholarship from the Natural Sciences and Engineering of 50 ml. Reaction mixtures were pre-incubated for 5 min at Research Council of Canada and a Master Research schol- 30°C in the reaction buffer (50 mM Tris-HCl pH 7.8, 250 mM arship from FQRNT. V.B. holds a Canada Research Chair in NaCl, 25 mM KCl, 10 mM MgCl2) and DGC reactions were molecular biology, impact and evolution of bacterial mobile initiated by adding 100 mM GTP/[a-33P]-GTP (0.1 mCi ml-1) elements and is a member of the FRSQ-funded Centre and incubated at 30°C. Samples were taken at various times, de Recherche Clinique Étienne-Le Bel, Université de and the reaction was stopped by addition of one volume Sherbrooke. The complete sequences for ICEVchMex1 and 0.5 M EDTA. DgcL phosphorylation by acetyl phosphate was ICEVflInd1 are now available in GenBank under the accession achieved as described previously (McCleary and Stock, numbers GQ463143 and GQ463144, respectively. 1994; Hickman et al., 2005), by incubating the proteins in the reaction buffer supplemented with 25 mM acetyl phosphate References at 30°C for 30 min prior to enzymatic assay. Reaction prod- ucts were analysed by TLC based on previously published Ahmed, A.M., Shinoda, S., and Shimamoto, T. (2005) A procedures (Bochner and Ames, 1982; Christen et al., 2005; variant type of Vibrio cholerae SXT element in a multidrug- Hickman et al., 2005). Briefly, aliquots (8 ml) were spotted on resistant strain of Vibrio fluvialis. FEMS Microbiol Lett 242: polyethyleneimine-cellulose TLC plates (Sigma) previously 241–247.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 522 E. Bordeleau, E. Brouillette, N. Robichaud and V. Burrus

Beaber, J.W., Hochhut, B., and Waldor, M.K. (2002) Genomic Christie, J.M., Salomon, M., Nozue, K., Wada, M., and and functional analyses of SXT, an integrating antibiotic Briggs, W.R. (1999) LOV (light, oxygen, or voltage) resistance gene transfer element derived from Vibrio chol- domains of the blue-light photoreceptor phototropin (nph1): erae. J Bacteriol 184: 4259–4269. binding sites for the chromophore flavin mononucleotide. Beyhan, S., Tischler, A.D., Camilli, A., and Yildiz, F.H. (2006) Proc Natl Acad Sci USA 96: 8779–8783. Transcriptome and phenotypic responses of Vibrio chol- Datsenko, K.A., and Wanner, B.L. (2000) One-step inactiva- erae to increased cyclic di-GMP level. J Bacteriol 188: tion of chromosomal genes in Escherichia coli K-12 3600–3613. using PCR products. Proc Natl Acad Sci USA 97: 6640– Beyhan, S., Odell, L.S., and Yildiz, F.H. (2008) Identification 6645. and characterization of cyclic diguanylate signaling Dower, W.J., Miller, J.F., and Ragsdale, C.W. (1988) High systems controlling rugosity in Vibrio cholerae. J Bacteriol efficiency transformation of E. coli by high voltage elec- 190: 7392–7405. troporation. Nucleic Acids Res 16: 6127–6145. Bochner, B.R., and Ames, B.N. (1982) Complete analysis of Faruque, S.M., Biswas, K., Udden, S.M., Ahmad, Q.S., Sack, cellular nucleotides by two-dimensional thin layer chroma- D.A., Nair, G.B., and Mekalanos, J.J. (2006) Transmissibil- tography. J Biol Chem 257: 9759–9769. ity of cholera: in vivo-formed biofilms and their relationship Boltner, D., MacMahon, C., Pembroke, J.T., Strike, P., and to infectivity and persistence in the environment. Proc Natl Osborn, A.M. (2002) R391: a conjugative integrating Acad Sci USA 103: 6350–6355. mosaic comprised of phage, plasmid, and transposon ele- Fong, J.C., and Yildiz, F.H. (2007) The rbmBCDEF gene ments. J Bacteriol 184: 5158–5169. cluster modulates development of rugose colony morphol- Briggs, W.R. (2007) The LOV domain: a chromophore ogy and biofilm formation in Vibrio cholerae. J Bacteriol module servicing multiple photoreceptors. J Biomed Sci 189: 2319–2330. 14: 499–504. Galperin, M.Y., Nikolskaya, A.N., and Koonin, E.V. (2001) Burrus, V., and Waldor, M.K. (2004) Shaping bacterial Novel domains of the prokaryotic two-component signal genomes with integrative and conjugative elements. Res transduction systems. FEMS Microbiol Lett 203: 11–21. Microbiol 155: 376–386. Guzman, L.M., Belin, D., Carson, M.J., and Beckwith, J. Burrus, V., Pavlovic, G., Decaris, B., and Guedon, G. (2002) (1995) Tight regulation, modulation, and high-level expres- Conjugative transposons: the tip of the iceberg. Mol Micro- sion by vectors containing the arabinose PBAD promoter. biol 46: 601–610. J Bacteriol 177: 4121–4130. Burrus, V., Marrero, J., and Waldor, M.K. (2006a) The current Hanahan, D. (1983) Studies on transformation of Escherichia ICE age: biology and evolution of SXT-related integrating coli with plasmids. J Mol Biol 166: 557–580. conjugative elements. Plasmid 55: 173–183. Heidelberg, J.F., Eisen, J.A., Nelson, W.C., Clayton, R.A., Burrus, V., Quezada-Calvillo, R., Marrero, J., and Waldor, Gwinn, M.L., Dodson, R.J., et al. (2000) DNA sequence of M.K. (2006b) SXT-related integrating conjugative element both chromosomes of the cholera pathogen Vibrio chol- in New World Vibrio cholerae. Appl Environ Microbiol 72: erae. Nature 406: 477–483. 3054–3057. Hengge, R. (2009) Principles of c-di-GMP signalling in bac- Butler, S.M., and Camilli, A. (2004) Both chemotaxis and net teria. Nat Rev Microbiol 7: 263–273. motility greatly influence the infectivity of Vibrio cholerae. Hickman, J.W., Tifrea, D.F., and Harwood, C.S. (2005) A Proc Natl Acad Sci USA 101: 5018–5023. chemosensory system that regulates biofilm formation Ceccarelli, D., Salvia, A.M., Sami, J., Cappuccinelli, P., and through modulation of cyclic diguanylate levels. Proc Natl Colombo, M.M. (2006a) New cluster of plasmid-located Acad Sci USA 102: 14422–14427. class 1 integrons in Vibrio cholerae O1 and a dfrA15 Hochhut, B., Lotfi, Y., Mazel, D., Faruque, S.M., Woodgate, cassette-containing integron in Vibrio parahaemolyticus R., and Waldor, M.K. (2001) Molecular analysis of antibiotic isolated in Angola. Antimicrob Agents Chemother 50: resistance gene clusters in Vibrio cholerae O139 and O1 2493–2499. SXT constins. Antimicrob Agents Chemother 45: 2991– Ceccarelli, D., Bani, S., Cappuccinelli, P., and Colombo, 3000. M.M. (2006b) Prevalence of aadA1 and dfrA15 class 1 Hung, D.T., Zhu, J., Sturtevant, D., and Mekalanos, J.J. integron cassettes and SXT circulation in Vibrio cholerae (2006) Bile acids stimulate biofilm formation in Vibrio chol- O1 isolates from Africa. J Antimicrob Chemother 58: 1095– erae. Mol Microbiol 59: 193–201. 1097. Jenal, U., and Malone, J. (2006) Mechanisms of cyclic-di- Christen, M., Christen, B., Folcher, M., Schauerte, A., and GMP signaling in bacteria. Annu Rev Genet 40: 385–407. Jenal, U. (2005) Identification and characterization of a Kirillina, O., Fetherston, J.D., Bobrov, A.G., Abney, J., and cyclic di-GMP-specific phosphodiesterase and its allosteric Perry, R.D. (2004) HmsP, a putative phosphodiesterase, control by GTP. J Biol Chem 280: 30829–30837. and HmsT, a putative diguanylate cyclase, control Hms- Christen, M., Christen, B., Allan, M.G., Folcher, M., Jeno, P., dependent biofilm formation in Yersinia pestis. Mol Micro- Grzesiek, S., and Jenal, U. (2007) DgrA is a member of a biol 54: 75–88. new family of cyclic diguanosine monophosphate receptors Kolter, R., Inuzuka, M., and Helinski, D.R. (1978) Trans- and controls flagellar motor function in Caulobacter cres- complementation-dependent replication of a low molecular centus. Proc Natl Acad Sci USA 104: 4112–4117. weight origin fragment from plasmid R6K. Cell 15: 1199– Christen, B., Christen, M., Paul, R., Schmid, F., Folcher, M., 1208. Jenoe, P., et al. (2006) Allosteric control of cyclic di-GMP Laan, W., Bednarz, T., Heberle, J., and Hellingwerf, K.J. signaling. J Biol Chem 281: 32015–32024. (2004) Chromophore composition of a heterologously

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 Diguanylate cyclases on ICEs 523

expressed BLUF-domain. Photochem Photobiol Sci 3: molecule in bacteria: insights into biochemistry of the 1011–1016. GGDEF protein domain. J Bacteriol 187: 1792–1798. Lee, V.T., Matewish, J.M., Kessler, J.L., Hyodo, M., Hay- Ryjenkov, D.A., Simm, R., Romling, U., and Gomelsky, M. akawa, Y., and Lory, S. (2007) A cyclic-di-GMP receptor (2006) The PilZ domain is a receptor for the second mes- required for bacterial exopolysaccharide production. Mol senger c-di-GMP: the PilZ domain protein YcgR controls Microbiol 65: 1474–1484. motility in enterobacteria. J Biol Chem 281: 30310–30314. Leplae, R., Hebrant, A., Wodak, S.J., and Toussaint, A. Schild, S., Tamayo, R., Nelson, E.J., Qadri, F., Calderwood, (2004) ACLAME: a CLAssification of Mobile genetic Ele- S.B., and Camilli, A. (2007) Genes induced late in infection ments. Nucleic Acids Res 32: D45–D49. increase fitness of Vibrio cholerae after release into the McCleary, W.R., and Stock, J.B. (1994) Acetyl phosphate environment. Cell Host Microbe 2: 264–277. and the activation of two-component response regulators. Schmidt, A.J., Ryjenkov, D.A., and Gomelsky, M. (2005) The J Biol Chem 269: 31567–31572. ubiquitous protein domain EAL is a cyclic diguanylate- Malone, J.G., Williams, R., Christen, M., Jenal, U., Spiers, specific phosphodiesterase: enzymatically active and inac- A.J., and Rainey, P.B. (2007) The structure–function rela- tive EAL domains. J Bacteriol 187: 4774–4781. tionship of WspR, a Pseudomonas fluorescens response Seth-Smith, H., and Croucher, N.J. (2009) Genome watch: regulator with a GGDEF output domain. Microbiology 153: breaking the ICE. Nat Rev Microbiol 7: 328–329. 980–994. Singer, M., Baker, T.A., Schnitzler, G., Deischel, S.M., Goel, Marrero, J., and Waldor, M.K. (2007) The SXT/R391 family of M., Dove, W., et al. (1989) A collection of strains containing integrative conjugative elements is composed of two exclu- genetically linked alternating antibiotic resistance elements sion groups. J Bacteriol 189: 3302–3305. for genetic mapping of Escherichia coli. Microbiol Rev 53: Moorthy, S., and Watnick, P.I. (2005) Identification of novel 1–24. stage-specific genetic requirements through whole Sudarsan, N., Lee, E.R., Weinberg, Z., Moy, R.H., Kim, J.N., genome transcription profiling of Vibrio cholerae biofilm Link, K.H., and Breaker, R.R. (2008) Riboswitches in development. Mol Microbiol 57: 1623–1635. eubacteria sense the second messenger cyclic di-GMP. Munro, A.W., and Noble, M.A. (1999) Fluorescence analysis Science 321: 411–413. of flavoproteins. Methods Mol Biol 131: 25–48. Tamayo, R., Tischler, A.D., and Camilli, A. (2005) The EAL Occhino, D.A., Wyckoff, E.E., Henderson, D.P., Wrona, T.J., domain protein VieA is a cyclic diguanylate phosphodi- and Payne, S.M. (1998) Vibrio cholerae iron transport: esterase. J Biol Chem 280: 33324–33330. haem transport genes are linked to one of two sets of tonB, Taviani, E., Ceccarelli, D., Lazaro, N., Bani, S., Cappuccinelli, exbB, exbD genes. Mol Microbiol 29: 1493–1507. P., Colwell, R.R., and Colombo, M.M. (2008) Environmen- Osorio, C.R., Marrero, J., Wozniak, R.A.F., Lemos, M.L., tal Vibrio spp., isolated in Mozambique, contain a polymor- Burrus, V., and Waldor, M.K. (2008) Genomic and func- phic group of integrative conjugative elements and class 1 tional analysis of ICEPdaSpa1, a fish-pathogen-derived integrons. FEMS Microbiol Ecol 64: 45–54. SXT-related integrating conjugative element that can Taylor, B.L., and Zhulin, I.B. (1999) PAS domains: internal mobilize a virulence plasmid. J Bacteriol 190: 3353– sensors of oxygen, redox potential, and light. Microbiol Mol 3361. Biol Rev 63: 479–506. Paul, R., Weiser, S., Amiot, N.C., Chan, C., Schirmer, T., Tischler, A.D., and Camilli, A. (2004) Cyclic diguanylate (c-di- Giese, B., and Jenal, U. (2004) Cell cycle-dependent GMP) regulates Vibrio cholerae biofilm formation. Mol dynamic localization of a bacterial response regulator with Microbiol 53: 857–869. a novel di-guanylate cyclase output domain. Genes Dev Tischler, A.D., and Camilli, A. (2005) Cyclic diguanylate regu- 18: 715–727. lates Vibrio cholerae virulence gene expression. Infect Pembroke, J.T., and Piterina, A.V. (2006) A novel ICE in the Immun 73: 5873–5882. genome of Shewanella putrefaciens W3-18-1: comparison Waldor, M.K., Tschape, H., and Mekalanos, J.J. (1996) A new with the SXT/R391 ICE-like elements. FEMS Microbiol Lett type of conjugative transposon encodes resistance to sul- 264: 80–88. famethoxazole, trimethoprim, and streptomycin in Vibrio Pratt, J.T., Tamayo, R., Tischler, A.D., and Camilli, A. (2007) cholerae O139. J Bacteriol 178: 4157–4165. PilZ domain proteins bind cyclic diguanylate and regulate diverse processes in Vibrio cholerae. J Biol Chem 282: Supporting information 12860–12870. Romling, U., and Amikam, D. (2006) Cyclic di-GMP as a Additional Supporting Information may be found in the online second messenger. Curr Opin Microbiol 9: 218–228. version of this article: Ryan, R.P., Fouhy, Y., Lucey, J.F., Crossman, L.C., Spiro, S., Table S1. DNA sequences of the oligonucleotides used in He, Y.W., et al. (2006) Cell–cell signaling in Xanthomonas this study. campestris involves an HD-GYP domain protein that func- tions in cyclic di-GMP turnover. Proc Natl Acad Sci USA Please note: Wiley-Blackwell are not responsible for the 103: 6712–6717. content or functionality of any supporting materials supplied Ryjenkov, D.A., Tarutina, M., Moskvin, O.V., and Gomelsky, by the authors. Any queries (other than missing material) M. (2005) Cyclic diguanylate is a ubiquitous signaling should be directed to the corresponding author for the article.

©2009SocietyforAppliedMicrobiologyandBlackwellPublishingLtd,Environmental Microbiology, 12,510–523 ANNEXE 2

186 Uncovering the Prevalence and Diversity of Integrating Conjugative Elements in Actinobacteria

Mariana Gabriela Ghinet., Eric Bordeleau., Julie Beaudin, Ryszard Brzezinski, Se´bastien Roy*, Vincent Burrus* Centre d’e´tude et de valorisation de la diversite´ microbienne, De´partement de biologie, Universite´ de Sherbrooke, Sherbrooke, Que´bec, Canada

Abstract Horizontal gene transfer greatly facilitates rapid genetic adaptation of bacteria to shifts in environmental conditions and colonization of new niches by allowing one-step acquisition of novel functions. Conjugation is a major mechanism of horizontal gene transfer mediated by conjugative plasmids and integrating conjugative elements (ICEs). While in most bacterial conjugative systems DNA translocation requires the assembly of a complex type IV secretion system (T4SS), in Actinobacteria a single DNA FtsK/SpoIIIE-like translocation protein is required. To date, the role and diversity of ICEs in Actinobacteria have received little attention. Putative ICEs were searched for in 275 genomes of Actinobacteria using HMM- profiles of proteins involved in ICE maintenance and transfer. These exhaustive analyses revealed 144 putative FtsK/SpoIIIE- type ICEs and 17 putative T4SS-type ICEs. Grouping of the ICEs based on the phylogenetic analyses of maintenance and transfer proteins revealed extensive exchanges between different sub-families of ICEs. 17 ICEs were found in Actinobacteria from the genus Frankia, globally important nitrogen-fixing microorganisms that establish root nodule symbioses with actinorhizal plants. Structural analysis of ICEs from Frankia revealed their unexpected diversity and a vast array of predicted adaptive functions. Frankia ICEs were found to excise by site-specific recombination from their host’s chromosome in vitro and in planta suggesting that they are functional mobile elements whether Frankiae live as soil saprophytes or plant endosymbionts. Phylogenetic analyses of proteins involved in ICEs maintenance and transfer suggests that active exchange between ICEs cargo-borne and chromosomal genes took place within the Actinomycetales order. Functionality of Frankia ICEs in vitro as well as in planta lets us anticipate that conjugation and ICEs could allow the development of genetic manipulation tools for this challenging microorganism and for many other Actinobacteria.

Citation: Ghinet MG, Bordeleau E, Beaudin J, Brzezinski R, Roy S, et al. (2011) Uncovering the Prevalence and Diversity of Integrating Conjugative Elements in Actinobacteria. PLoS ONE 6(11): e27846. doi:10.1371/journal.pone.0027846 Editor: Niyaz Ahmed, University of Hyderabad, India Received September 9, 2011; Accepted October 26, 2011; Published November 16, 2011 Copyright: ß 2011 Ghinet et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: Financial support for this work was provided by Centre d’e´tude et de valorisation de la diversite´ microbienne (Universite´ de Sherbrooke), Centre SE`VE (Fonds que´becois de la recherche sur la nature et les technologies), and Discovery Grants from the Natural Sciences and Engineering Research Council of Canada (NSERC) to RB, SR, and VB. EB is the recipient of an Alexander Graham Bell Canada Graduate Scholarship from NSERC. VB holds a Canada Research Chair in molecular biology, impact and evolution of bacterial mobile elements. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: The authors have declared that no competing interests exist. * E-mail: [email protected] (VB); [email protected] (SR) . These authors contributed equally to this work.

Introduction Frankiae have circular chromosomes, in contrast to streptomy- cetes. Genome sizes are highly variable, as exemplified by strains Actinobacteria are found in many different ecological niches. Cci3, ACN14a and EAN1pec (5.4, 7.5, and 9.4 Mbp, respectively) These high G+C Gram-positive bacteria are soil and aquatic [4]. This diversity suggests the existence of a spectrum of lifestyles inhabitants (e.g., Streptomyces, Micromonospora, Rhodococcus), plant ranging from quasi-obligatory symbiosis (small genome size) to symbionts (e.g., Frankia), plant and animal pathogens (e.g., free-living status with the possibility of symbiosis in a wide array of Corynebacterium, Mycobacterium, Nocardia), or gastrointestinal com- host plants (large genome size). The size of the larger genomes of mensals (e.g., Bifidobacterium). Nitrogen-fixing actinobacteria Frankiae is also reminiscent of that of related, GC-rich, belonging to genus Frankia live as soil saprophytes and as biotechnologically important genus Streptomyces. However, in great endophytic symbionts in over 200 plant species [1]. These host contrast to what is known of streptomycete genetics, much remains plants (actinorhizal plants) are found on nearly all continents, and to be learned about the horizontal exchange of genetic material in the contribution of Frankia and actinorhizal plants to global Frankiae. Such knowledge is essential to decipher the evolution of nitrogen fixation is estimated at 25% [2,3]. Frankiae are members of this genus and to achieve stable genetic manipulation understudied compared to other actinomycetes and this is largely in these strains – a goal that has eluded researchers to date. attributable to difficulties in isolation and maintenance of actively A growing number of reports indicate that horizontal gene proliferating cultures. Nevertheless, the global environmental transfer is an essential mechanism by which some actinobacterial importance of Frankiae command more research to better species acquired pathogenesis-related functions [5,6,7]. Mobile understand their interactions with other microorganisms, and genetic elements are also responsible for extensive genomic host plants. rearrangements revealed by structural genomics and observed even

PLoS ONE | www.plosone.org 1 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements among related actinobacterial species [8]. Actinomycete integrative data extracted from the RefSeq database [21]. The predicted and conjugative elements (AICEs), aka. integrating conjugative proteomes encoded by 275 genomes of Actinobacteria were plasmids, are particularly prevalent in the genomes of several species screened using HHM profiles of protein orthologs associated with of Streptomyces [9]. AICEs belong to the broader class of integrative the functions of core modules of ICEs previously identified in and conjugative elements (ICEs) usually described in Gram-negative actinomycetes [9] (Figure 1 and Table 1): (i) integrases (Int) of the bacteria and in the Firmicutes. Like temperate bacteriophages, ICEs tyrosine or serine family of recombinases that are involved in ICE integrate into and replicate with the chromosome of the bacterial integration and excision, (ii) replication initiator proteins (Rep) of host [10,11,12]. The site-specific integration of ICEs in the host the RepSA- and RepAM-type, and putative polymerases of the chromosome is usually mediated by a tyrosine recombinase or Prim-Pol-type (bifunctional DNA primase/polymerase), and (iii) occasionally by a serine recombinase, encoded by an int gene. Site- FtsK/SpoIIIE domain-containing proteins (Tra) that are involved specific recombination occurs between two short identical or nearly in translocation of double-stranded chromosomal DNA. This identical sequences located in attachment sites, one on the element analysis revealed a large number of possible AICE-associated (attP) and the other one on the chromosome (attB). Upon various proteins (Table 2); yet many of these hits may be part of other conditions, ICEs can excise from the chromosome to form circular types of mobile elements (transposons, prophages, integrated covalently closed molecules. Like most conjugative plasmids, ICEs plasmids) or have functions unrelated to mobile elements. For disseminate via conjugation. In Gram-negative bacteria and instance, FtsK/SpoIIIE domains are known to be found in Firmicutes, conjugative DNA transfer typically requires the assembly proteins rescuing the remainder of chromosomal DNA into the of a type IV secretion system (T4SS) which is involved in appropriate daughter-cell compartments during constriction of the translocation of the DNA to recipient cells [13]. Biochemical septal membranes in prokaryotic cell division (FtsK) [22,23,24] processing of the DNA molecule to transfer is initiated at a specific and in asymmetric division of sporulating Bacillus subtilis (SpoIIIE) cis-acting site called the origin of transfer (oriT), which is bound by a [25]. Similarly, subsets of tyrosine recombinases such as MrpA DNA relaxase (Mob protein) and other auxiliary proteins, forming encoded by the Streptomyces coelicolor A3(2) plasmid SCP2 [26] altogether a nucleoprotein complex called the relaxosome [14]. The catalyze the resolution of plasmid multimers into the monomeric phosphodiesterase activity of the relaxase mediates a strand-specific form after replication. Therefore, to limit our investigation to cleavage within oriT, allowing the unwinding and 59 to 39 transfer of genuine putative AICEs, 30- to 60-kb chromosomal regions a single-stranded DNA (T-strand) to the recipient cell. A coupling containing genes encoding at least one of each of the three protein (T4CP) links the relaxosome to the T4SS and is thought to components (Int, Tra and Rep) were further considered in our form the conjugative pore through which the T-strand is analyses (Figure 1). However, we excluded the very few cases for translocated to the recipient cell. While in the donor cell the which the int gene was located in between the rep and tra genes as complementary strand is used as a template for replacement strand recombinase genes are typically located adjacent to one of the two synthesis, within the recipient cell host the T-strand is converted into att sites flanking an integrated ICE or prophage. The chromo- double-stranded DNA that can be either re-circularized and/or somal fragments coding for two or less of these components were recombined into the recipient chromosome. also excluded and likely correspond to AICE remnants, prophages, Interestingly, inter-mycelial conjugative transfer of ICEs transposons and other genomic islands. A total of 139 putative identified in actinomycetes significantly differs from this general AICEs were identified in chromosomal sequences (Table 2 and mechanism as it proceeds via translocation of unprocessed double- S1). In addition, the analysis of 176 actinobacterial plasmids stranded DNA. This mechanism is reminiscent of chromosome revealed 4 additional ICEs: the mega plasmid pNDAS01 contains partitioning during sporulation and cell division (reviewed by te a putative ICE, while plasmids pSA1.1, pSLS and pWTY27 likely Poele et al. [9]). Furthermore, once excised from the chromosome, correspond to ICEs in their excised form (Table S1). actinomycete ICEs replicate autonomously like genuine plasmids This first in silico analysis unambiguously showed that based on [15,16]. They encode a replication initiator protein and carry an the criteria indicated above, AICEs are constrained to members of origin of replication. Mutations in either of these two elements has the Actinomycetales order (Figure 2A). On the contrary, the closely been shown to lead to loss of transfer of the prototypical related Bifidobateriales order, and the other Actinobacteria sub- actinomycete ICE pSAM2, indicating that autonomous replication classes (Acidimicrobidae, Coriobacteridae and Rubrobacteridae) appeared is required for conjugative transfer to recipient cells [17,18,19]. to be completely devoid of AICEs. This observation echoes a AICEs have been developed into useful tools for genetic similar conclusion that was drawn by te Poele et al. [9] from a engineering of actinobacteria [20]. smaller genome sample. However, since this initial analysis was In this study, we investigated the prevalence and diversity of ICEs based on specific assumptions concerning the nature and gene in Actinobacteria, and focused particularly on ICEs in six Frankia conservation among known AICEs, it was inherently biased. To strains, EAN1pec, ACN14a, Cci3, EuI1c, EUN1f and Frankia circumvent this bias, we extended our analysis to include 15 other symbiont of Datisca glomerata. Prior to this study, three AICEs, families of replication initiator proteins of plasmid or viral origin. Fean5323, Fean6303 and Faln5456 originating from two members Surprisingly, using this approach, we were able to identify only of the genus Frankia, have been briefly described [9]. Our study one additional putative ICE, which relies on a Rep2-type reports the frequent occurrence of AICEs in Frankia genomes. Their replication initiator protein, in a strain of Bifidobacterium longum ability to excise from the chromosome as circular molecules both in (Table S1). vitro and in planta was examined as excision is the initial step We then analyzed the distribution of AICEs among Actino- preceding their eventual transfer to a recipient cell via conjugation. bacteria as a function of their host’s environmental niche. In general, species isolated from plant, soil and water samples tend to Results and discussion contain a significantly larger number of AICEs per genome than those isolated from dairy products, animal, insects or sediments Identification of putative AICEs in the genomes of (Figure 2B). This observation suggests that AICEs could confer Actinobacteria specific advantages to their respective bacterial hosts in these To address the prevalence, repartition and diversity of AICEs in specialized ecological niches. However, additional extensive Actinobacteria, we carried out a large-scale in silico analysis using sequence analyses and functional characterization of the gene

PLoS ONE | www.plosone.org 2 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

Table 2. Pfam HMM profiles used in this study.

Number of hits

Domains completea draftb

Tyrosine integrase 1029 931 Serine integrase 143 139 FtsK/SpoIIIE 882 1012 RepSA-like 76 64 RepAM (DUF3631) 24 69 Figure 1. Schematic representation of canonical AICEs modules and corresponding protein combinations. Possible combination Prim-Pol 99 179 of integration/excision, replication and conjugative transfer modules Rep2 7 6 are represented. For each functional module, specific domain-contain- AICEs 73 67 ing proteins where searched for as follow: Tra (FtsK/SpoIIIE, PF01580), Int-Tyr (Phage integrase, PF00589), Int-Ser (Recombinase, PF07508), a130 complete genomes RepSA (this study), RepAM (DUF3631, PF12307) and Prim-pol (Prim-pol, b145 draft genomes PF09250). AICEs Prim-pol proteins are encoded along with putative doi:10.1371/journal.pone.0027846.t002 replication proteins (RepPP) or can be associated with RepAM replication initiator proteins. constrained by specific barriers, such as the impossibility for two doi:10.1371/journal.pone.0027846.g001 different AICEs to occupy the same integration site and form tandem arrays, or by the existence of specific AICE-encoded cargo borne by AICEs will be necessary to determine conclusively immunity- or exclusion-like systems. For example, Pif, a NUDIX what benefits these mobile elements confer to their host. As hydrolase domain-containing protein encoded by pSAM2, has expected, large actinomycete genomes tend to bear significantly been shown to act in the recipient cell to prevent redundant more AICEs than small genomes (Figure 2C). It is not clear to date exchange between two cells harboring the same conjugative whether the acquisition of multiple AICEs by a single genome is element [27]. According to our analysis, AICEs often bear genes

Table 1. Pfam HMM profiles used in this study.

Function Domain name Accession number Description

Integration Phage_integrase PF00589 Tyrosine recombinase Recombinase PF07508 Serine recombinase Replication RepSA This study Replication initiator protein, pSAM2 RepAM, DUF3631 PF12307 Replication initiator protein, pMEA300 Prim-Pol PF09250 Bifunctional DNA primase/polymerase, N-terminal Rep_1 PF01446 RCR replication protein Rep_2 PF01719 Plasmid replication protein Rep_3 PF01051 Initiator of plasmid replication RepA_C PF04796 Plasmid encoded RepA protein Replicase PF03090 Replication initiator protein Rep_trans PF02486 Replication initiation factor Phage_rep_O PF04492 Bacteriophage replication protein O RepL PF05732 Firmicute plasmid replication protein Phage_CRI PF05144 Phage replication protein CRI Phage_GPA PF05840 Bacteriophage replication gene A protein IncFII_repA PF02387 IncFII RepA protein RP-C PF03428 Viral replication protein C, N-terminal domain RepA_N PF06970 Replication initiator protein A (RepA), N-terminus RepC PF06504 Replication protein C RPA PF10134 Replication initiator protein A DNA transfer FtsK_SpoIIIE PF01580 Intercellular chromosomal DNA transfer T4SS-DNA_transf PF02534 Coupling protein (VirD4-like T4CP) AAA_10 PF12846 ATPases Associated with diverse cellular Activities (VirB4-like T4SS component) TrwC PF08751 TrwC relaxase

doi:10.1371/journal.pone.0027846.t001

PLoS ONE | www.plosone.org 3 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

encoding NUDIX hydrolase domain-containing proteins, suggest- ing that conjugal immunity is widespread (data not shown). However, the molecular mechanism of conjugal immunity mediated by proteins such as Pif and the range of AICEs that a single Pif-like protein can target for immunity remain unknown.

AICE recombination functions and integration sites Among the 144 AICEs that we detected in our analysis, only 19 (,14%) were found to rely on an integrase of the serine recombinase family for their integration into and excision from the chromosome (Figure 3A and Table S1). All the other AICEs appear to rely on integrases of the tyrosine recombinase family instead (Figure 3B and Table S1). This imbalanced proportion correlates rather well with what is generally observed for ICEs and temperate bacteriophages; yet surprisingly, more than half of the AICEs encoding serine recombinases were found in the order Corynebacterineae. In fact, all Mycobacterium AICEs but Mkan24688 encode serine recombinases, which clustered in two groups. The first group contains the recombinases encoded by Mmar2129 and Mav3790 as well as Srot1958 from Segniliparus rotundus, another member of Corynebacterineae. These AICEs are integrated into protein-encoding genes. The second group has 8 recombinases which originate from different species of Mycobacterium yet exhibit high sequence similarities, and catalyze the integration into the same tRNA Leu gene (Figure 3A). Serine recombinase-mediated integration into tRNA genes is rare [28]. In fact to our knowledge, only QRSM1, a filamentous phage infecting Ralstonia solanacearum, was found to encode a serine recombinase catalyzing its integration into the 39 end of a tRNA Ser gene [29]. Interestingly, no AICE was detected in any of the 30 genomes of Mycobacterium tuberculosis or in the 3 genomes of Mycobacterium bovis that were part of our analysis. Phylogenetic analyses of 125 AICE integrases of the tyrosine recombinase family revealed several groups of proteins clustering into subfamilies which possess more extensive amino acid sequence similarities (Figure 3B). Eight such groups containing four or more integrases were identified in the phylogenic tree of tyrosine recombinases. One of these subfamilies, pSE211, was previously identified by Williams [28]. Eight new subfamilies clearly emerge from our analysis, SLP1, pSLS, pMR2, pSAM2, Fcci3390, pSA1.1, Feun3577 and Fean5518. No single subfamily was found to be exclusive of a particular species, genus or even sub-order. One of the possible benefits for AICEs of the availability of such large pool of different integrase subfamilies is that they are more likely to target different integration sites, allowing them to stably coexist within the same genome. Different mobile genetic elements sharing the same integrase and integra- tion site have been shown to form tandem arrays that are inherently unstable, leading to recombination between elements composing the array, incompatibility and/or loss [30,31,32,33,34]. We observed that the vast majority of AICE encoding tyrosine recombinases were integrated in the 39 end of a tRNA gene, with the exception of all AICEs encoding integrases belonging to the pSLS sub-family. Several past reports revealed a strong preference for tRNA and tmRNA gene sequences as prophage integration sites [28,35,36]. In at least two instances, i.e. Sslg03435 and Ssrg2924, no tRNA gene could be readily identified as the Figure 2. Distribution of AICEs in completed genomes. (A) integration site by tRNAscan-SE; yet the att sequences flanking Distribution per suborder of actinobacteria. (B) Distribution per these two AICEs were found to be identical to the att sites flanking environmental niche of the original host. (C) Distribution as a function Ssbg03435 and pSAM2 respectively, which encode closely related of genome size. (A and B) Median, standard deviation (rectangles) and integrases and integrate into the 39 end of tRNA genes (Figure 3B). extreme values (brackets) are shown. (B and C) Only the Actinomycetales This observation indicates that probable tRNA gene remnants can are shown. doi:10.1371/journal.pone.0027846.g002 be suitable integration sites. As expected, closely related integrases seem to target the same tRNA gene in their respective hosts. For

PLoS ONE | www.plosone.org 4 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

PLoS ONE | www.plosone.org 5 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

Figure 3. Phylogenetic trees of the site-specific recombinases found in AICEs. (A) Serine recombinases. (B) Tyrosine recombinases. Sub- families were defined as clades containing at least 4 members with a distance cut-off of 1 and with a .80 bootstraps support. Label colors indicate the suborders of Actinomycetales within which the AICEs were found (see legend). Colors in the outer wheel in panel A and the dark green strip in panel B indicate the type of tRNA genes into which each AICE is integrated (see Table S1 for details). Black dots in panel A for Ssrg02924 and Sslg03435 indicate that while no tRNA-encoding gene was detected as their integration sites, the att sites nearest to their respective int genes were identical to the ones of their closest relatives, i.e. pSAM2 and Ssbg00935, respectively. doi:10.1371/journal.pone.0027846.g003 example, the 6 closely related pSE211-type integrases encoded by hypothesis is supported by motif 1 of the RepSAMR2 group which the group Ssqg04320/Sslg03699 catalyze integration into a tRNA- clearly differs from motif 1 of QX174 or pC194, while motif 1 of Thr gene. Closely related integrases can also catalyze integration RepSASAM2 group is very similar to these RCR initiators into different tRNA genes, as observed for Fcci1033/Fean5534 (Figure 4B). A sequence analysis of other RCR initiator proteins (tRNA-Lys) and Faln5456 (tRNA-Met). Mobile elements such as and their putative DNA binding sequences, albeit from the distant prophages and ICEs likely take advantage of the frequent plant infecting geminiviruses family, suggests that motif 1 is association between tyrosine recombinase with tRNA genes as involved in the dso binding region specificity [41,42]. Consequent- integration sites. While the benefits of such associations have not ly, we predict that RepSA proteins cut the dso nick region using a yet been fully elucidated, several suitable hypotheses have been similar mechanism but the RepSAMR2 would recognize discrete proposed including sequence reliability over protein-coding dso binding regions. Such recognition differences may facilitate the sequences, transcriptional coupling and so on [28]. coexistence of similar yet different AICE within the same genome. Other less-well characterized proteins likely involved in AICE Replication functions of AICEs replication were also sought for in order to unearth additional Replication of the first described AICE, pSAM2, was found to putative AICEs. The RepAM proteins, first demonstrated to be be dependent upon the protein RepSA [17,18]. As other responsible for autonomous replication in AICE pMEA300 from replication initiator proteins involved in rolling-circle replication Amycolatopsis methanolica [9], are predicted to function as RCR (RCR), RepSA was shown to bind to the binding region of the double- initiator proteins. Although they lack similarity to other known stranded origin (dso) and to introduce a site-specific nick in the nick replication initiator protein, their putative nick sites were shown to region of dso, leaving a 39-OH end used for priming the elongation be similar to those of the pC194 family of RCR plasmids [9]. by host encoded replication proteins (DNA polymerase III, Identification of 14 new AICEs relying on RepAM proteins reveals helicase) [37]. Surprisingly, phylogenetic analysis of replication such AICEs are not constrained to the Pseudonocardinae unlike initiator proteins detected with the RepSA HMM profile that we previously reported [9] (Figure 5A). Phylogenetic analysis revealed generated, revealed clustering in two major distantly related 3 clades of RepAM initiators, one of which grouping AICEs only subfamilies that were not previously reported as phylogenetically found in Streptomyces. Interestingly, RepAM initiator genes of all distinct by te Poele et al. [9]. The first subfamily, RepSASAM2, three clades are occasionally associated with a gene coding for a counts 69 proteins and is represented by the RepSA protein of Prim-Pol domain-containing protein (Figure 5B and data not pSAM2. The second subfamily, RepSAMR2, counts 30 proteins shown). and is represented by the RepSA protein of pMR2 (Figure 4A). Prim-Pol proteins could be involved in AICE replication as well Interestingly, all RepSA proteins encoded by Streptomycineae AICEs and were identified within one of the two region important for the belong to the RepSASAM2 subfamily. RepSA proteins encoded by replication of AICE SLP1 from S. coelicolor A3(2) [43]. This region Mmu2220/Mmul0108 and Avis01490 do not seem to belong to contains a small operon consisting of a Prim-Pol gene, SCO4617, either subfamily; instead they cluster into a distant monophyletic followed by a putative replication gene, SCO4618, different from clade of Actinomycineae replication initiator proteins. repAM. Our analysis revealed 40 AICEs encoding a Prim-Pol Replication initiator proteins from distant RCR replicons (i.e., domain-protein (Figure 3A). 12 of these AICEs were found to bear bacteriophages, plasmids, eukaryotic viruses) contain three con- a repAM gene immediately adjacent and 16 were found to carry a served amino acids motifs [38,39]. Motif 1 was initially of SCO4618-like gene (Figure 3B). Two ICEs, Sgha02332 and unknown function. Motif 2 contains a HUH motif (histidine- Fdat2245, were found to respectively encode a Poxvirus_D5 hydrophobic-histidine) important for metal-binding whereas motif (PF03288, PF08706) domain- and a DNA_pol_A (PF00476) 3 contains the catalytic tyrosine residue. Motif 3 was used by domain-containing protein. Other proteins encoded by genes Koonin and Ilyina [39] to define 2 superfamilies of RCR associated to Prim-Pol genes in the remaining AICEs do not replication initiator proteins; superfamily I contains two conserved exhibit any similarity with any replication or DNA processing tyrosine residues whereas superfamily II contains only one. Based proteins known to date, nor contain domains related to this on the conservation of these motifs, the pSAM2 RepSA protein function (data not shown). Yet together with the Prim-Pol genes, was initially found to be related to replication initiator protein these genes likely are unforeseen AICE replication modules. RepA of coliphage QX174 by Hage`ge et al. [17]. Later te Poele AICEs replication modules are diverse. Interestingly, none of et al. [9] reported that the replicon pSAM2 belongs to the pC194 them seem to be exclusive to phylogenetically related bacteria. family based on the conserved motifs in RepSA and nick site at dso. Consequently, some of these mobile elements may have been To further examine the functional relatedness of the RepSA-like exchanged between distant bacteria. Additionally, AICEs from proteins of the AICEs detected in our analysis, we generated distant bacteria containing similar replication modules may have amino acid logo sequences for the 3 conserved motifs. While the arisen from distinct recombination events with other mobile RepSASAM2 and RepSAMR2 protein motifs 2 and 3 exhibit some elements (e.g., plasmids, phages), although this seems less likely. similarity with the QX174 Rep family, they clearly diverge at specific positions (Figure 4B). Solely based on motif 3, RepSASAM2 AICE conjugation proteins are clearly related to the QX174-group of RCR initiators Unlike most known conjugation systems, AICEs and Streptomyces defined by Wigel and Seitz [40] while the RepSAMR2 proteins conjugative plasmids transfer as double stranded circular DNA would define an entirely new group of RCR initiators. This molecules and do not require a genuine T4SS [9]. AICEs

PLoS ONE | www.plosone.org 6 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

PLoS ONE | www.plosone.org 7 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

Figure 4. Evolutionary relationship of RepSA-like rolling-circle initiator proteins found in AICEs. (A) Phylogenetic tree of RepSA proteins. Sub-families were defined as clades containing at least 4 members with a distance cut-off of 1 and with a .80 bootstraps support. Label colors indicate the suborders of Actinomycetales within which the AICEs were found (see legend). (B) Comparison of the conserved motifs found in RepSA- like proteins from AICEs with the conserved motifs found in the catalytic domain of rolling-circle initiator proteins of plasmids and single-stranded DNA coliphages. The consensus sequences shown are as previously reported by Del Solar et al. [37]. Motif 2 (HUH or His-hydrophobic-His) corresponds to the putative divalent cation-binding site and motif 3 contains the tyrosine catalytic residue. Mmu2220, Mmul0108 and Avis01496 were not included in the alignment used to generate the logo sequence of the RepSASAM2 conserved motifs. doi:10.1371/journal.pone.0027846.g004 conjugative transfer seems to only require a single transfer protein, [48]. VirB4/TraC-like T4SS components usually have ATPase Tra or TraB, which recognize and bind the cis-acting locus (clt) activity and are involved in the assembly of the pilus in Gram- [44]. Recently, these Tra proteins were shown to be structurally negative bacteria [13]. Relaxases have been shown to be related to and functionally related to septal DNA translocators of the FtsK/ several types of DNA-processing proteins including RCR proteins SpoIIIE family, gathering into hexameric channel structures and [14,49]. For this study, we limited our analysis to relaxases of the forming pores in lipid bilayers [44]. MOBF family (TrwC relaxase domain) whose members are well AICEs Tra proteins identified by our analysis clustered in represented in the Actinobacteria [49]. With a few known several monophyletic clades (Figure 6). The pSAM2 and pMR2 exceptions [50,51], it is generally assumed that such ICEs do clades, named after the eponym elements, contain the largest not replicate autonomously. Therefore the presence of a number of AICEs, which interestingly all encode RepSA-like RCR replication initiator protein was not considered as a requirement initiators. Three other clades, pMEA100, pSE211 and SLP1, each for this identification as this role can be eventually played by the groups a smaller number of AICEs, which exclusively rely on Mob protein [50]. Based on these criteria, we identified 17 RepAM and/or Prim-Pol proteins for replication. The pSAM2 putative ICEs in all major Actinobacteria suborders (Table S2). and pMR2 clades are devoid of Tra protein encoded by Among them, one was found in Frankia sp. Cci3 stain. Analysis of Streptomyces conjugative plasmids (pSG5, pSVH1, pJV1, pIJ101, the molecular structure of Fcci3350 revealed that this ICE encodes pJV1, pFP1 and SLP2), which were also included in our a lantibiotic dehydratase-like protein. Proteins from this family phylogenetic analyses. This suggests that exchanges of genetic participate to the biosynthesis of lantibiotic, a ribosomally material between AICE and Streptomyces conjugative plasmids are synthesised antimicrobial agent [52]. This T4SS-like ICE also rare events. A noticeable exception is the Tra protein of pJV1, encodes genes that may participate or limit inter-bacterial transfer which belong to the SLP1 clade. pJV1 Tra is very similar to the of mobile genetic elements (Table S3). Several components of a Tra protein of AICE pSLS (76% identity). In addition to putative CRISPR-Cas system, an RNA-based immune system harboring orthologous intermycelial transfer genes, pJV1 and targeting bacteriophages and plasmids in bacteria [53], are pSLS also carry similar intramycelial spread genes (spdB123) encoded by Fcci3350. However, this system is likely not functional contiguous to tra (Figure S1). However, plasmid pJV1 does not as shown for the linear plasmid pSHK1 from Streptomyces [54]. code for an integrase and does not code for Rep proteins similar to Additional characterization of these new ICEs is beyond the scope those encoded by pSLS. This example clearly indicates that some of this article and will be the topic of another publication. of the AICEs and conjugative plasmids could have arisen from common ancestors and/or have exchanged functional modules. AICEs and related elements in Frankia: general structure, Conjugative plasmids in Streptomyces have been shown to mediate synteny and gene conservation chromosomal gene transfer. pIJ101-induced mobilization of Frankiae and their symbioses are present on all continents chromosomal genes in Streptomyces lividans has been shown to except Antarctica, and it is estimated that Frankia sp. is responsible occur without prior plasmid integration into the chromosome [45]. for 15–25% of global nitrogen fixation [2,3]. The critical role of Recently, Vogelmann et al [44] reported that TraB of the this microorganism and its symbiosis in the global nitrogen cycle Streptomyces venezuelae conjugative plasmid pSVH1 binds several clt- underscores the importance of deciphering their genetic inheri- like regions in the whole chromosome of S. coelicolor suggesting a yet tance, diversity, genetic plasticity and evolution. Research to date elusive mechanism that likely involves Tra binding onto clt-like has barely touched these questions, often limited by the fastidious sequences on the chromosome eventually promoting transfer of nature of this microorganism in the laboratory and the complete chromosomal DNA fragments into the recipient cells. Conse- lack of genetic tools. AICEs as a whole and their building blocks, quently, the numerous Tra-dependent AICEs identified herein used to generate new genetic tools, could significantly facilitate the represent a huge potential for exchange of chromosomal genes study of Frankia and their symbiotic interactions with plants. within the Actinomycetales order in addition to the AICE-borne Frankia strains are among the species of Actinomycetales counting cargo of auxiliary genes. the highest number of AICEs per genome on average (Figure 2A). In this study 13 new AICEs were found in six genomes of Frankia Identification of putative T4SS-based ICEs in the in addition to the 3 AICEs identified by te Poele et al. [9] (Figure 7 genomes of actinobacteria and Table S1). The 16 Frankia AICEs have an average G+C Furthermore, we also looked for putative ICEs relying on a content of 68%; yet a subset carries genes with a G+C content T4SS-like conjugative machinery similar to the one encoded by below 55% (dashed black arrows in Figure 7) suggesting that conjugative elements of Gram-negative bacteria [14] and of significant amounts of genetic material have been acquired outside Firmicutes such as ICEBs1 from Bacillus subtilis [46] or the the Actinobacteria. The synteny between the 16 Frankia AICEs enterococcal plasmid pIP501 [47]. For that purpose, in addition to was examined to better understand the relationships linking an int gene, we sought for (i) a gene coding for a putative T4CP, (ii) AICEs found in the same or different Frankia strains. This a gene coding for a VirB4/TraC-like T4SS component, and (iii) a sequence analysis revealed a clear segregation of AICEs into three gene coding for a putative DNA relaxase (Mob) (Table 1). T4CP families (Figure 7 and S2). proteins are VirD4 homologs of the Agrobacterium T-DNA transfer The first family regroups ten AICEs (Fean5323, Fean5518, system, which is a distant relative of the FtsK/SpoIIIE domain Fean5534, Fean6303, Faln1739, Faln5456, Fcci1033, FeuI0027,

PLoS ONE | www.plosone.org 8 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

Figure 5. Phylogenetic trees of putative replication-associated proteins identified in putative AICEs. (A) RepAM (DUF3631) domain- containing proteins. (B) Prim-Pol (bifunctional DNA primase/polymerase) domain-containing proteins Label colors indicate the suborders of

PLoS ONE | www.plosone.org 9 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

Actinomycetales within which the AICEs were found (see legend). The color strip in panel B indicates the type of associated replication protein: dark blue, RepAM; light blue, RepPP. doi:10.1371/journal.pone.0027846.g005

Feun0941 and Feun3577) that are characterized by a conserved protein and a RepSAMR2 RCR initiator protein (Figure 4 and 6). structural module containing pra, tra, rep, xis and int (Table S4). Although int was found to be the most divergent gene (Figure 3B), Phylogenetic analysis showed that all encode a pMR2-type Tra these observations suggest that these 10 AICEs all derive from a

Figure 6. Phylogenetic tree of proteins containing an FtsK/SpoIIIE domain (Tra) found in AICEs. Sub-families were defined as clades containing at least 4 members with a distance cut-off of 2 and with a .80 bootstraps support. Label colors indicate the suborders of Actinomycetales within which the AICEs were found (see legend). Outer color strips indicate AICEs encoding Prim-Pol (red), RepAM (dark blue), RepPP (light blue), and RepSASAM2 (green), RepSAMR2 (light green) and other RepSA-like (dark green) replication proteins. Black dots indicate AICEs encoding serine site- specific serine recombinases. doi:10.1371/journal.pone.0027846.g006

PLoS ONE | www.plosone.org 10 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

Figure 7. Genetic organization of putative AICEs identified in 6 Frankia genomes. The name, size and type of tRNA gene within which AICEs are inserted are indicated. Colour coding: orange, recombination; red, recombination directionality; yellow and gray, replication; blue, intermycelial transfer; lime green, regulation; dark green, transcriptional regulator LacI; pink, IS elements; mauve and fuchsia, conserved hypothetical proteins; dashed black arrows, ORFs with G+C contents ,55%. Arrows indicate the noncoding sequence regions that were deleted in order to simplify the schematic representation of Fean6303. pept, peptidase C14 caspase catalytic subunit P20; HD, metal dependent phosphohydrolase; hap, Peptidase S1 and S6 chymotrypsin/Hap; LigT,29–59 RNA-ligase; PhyH, phytanoyl-CoA dioxygenase; MTase_11, methyltransferase, domain MTase_11; NodU, carbamoyltransferase; PurN, phosphoribosylglycinamide formyltransferase; akr, aldo/keto reductase; amt, aminotransferase class I and II ; P450, cytochrome P450; hap, 2-alkenal reductase; NB-ARC, NB-ARC domain protein; p-kin, putative protein kinase; pcmt, protein-L-isoaspartate(D-aspartate)

PLoS ONE | www.plosone.org 11 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

O-methyltransferase; php, PHP -like protein; MT_19, putative S-adenosyl-L-methionine-dependent methyltransferase, pks, Putative modular polyketide synthase; cbiA, CobQ/CobB/MinD/ParA nucleotide binding domain protein; hth, helix-turn-helix domain protein; phz, phenazine biosynthesis PhzC/PhzF protein; nbp, nucleotide-binding protein; ssb, single-strand binding protein; C2H2-lp, zinc finger - C2H2-like protein; if-2, translation initiation factor IF-2; whiB, transcription factor WhiB; cutA1, divalent ion tolerance protein CutA1; C5-MT, C-5 cytosine-specific DNA methylase; NTP-ase, putative signal transduction protein with Nacht domain – NTPase; DNA_pol3_beta_2, DNA polymerase III beta subunit, central domain. doi:10.1371/journal.pone.0027846.g007 common ancestor. Interestingly, Xis and Pra encoded by AICEs belonging to two different sub-families (Figure 3B). Another originating from the same strain share up to 97% identity anomalous organization is found for Fean6303 which contains suggesting potential cross-talks between elements of the same three direct repeated sequences extending beyond the 39 end of family. For instance, transactivation of excision and conjugative the tRNA-Arg gene. Two of these repeated sequences correspond transfer was reported in cells harbouring multiple copies of the to two different attL sites, attL and attL2, and the third one is attR ICE Tn916 from Enterococcus faecalis [55]. Likewise, coordinated (Figure 7). The first attL (81 bp) defines, together with attR, an induction of the lytic cycle of Gifsy prophages was recently AICE 5 kb shorter than the structure previously described by te reported in polylysogenic strains of Salmonella [56]. Poele et al. [9]. These findings suggest that Fean6303 is able to The second family of AICEs includes the two smallest AICEs excise as two different circular molecules of 15.3 kb and 32.3 kb. A identified in Frankia, Fcci4274 and Fdat4298. While both encode a similar organization was reported for ICEs found in the lactic acid pMR2-type Tra protein, unlike AICEs of the previous family, both bacteria Streptococcus thermophilus [59]. Finally, Fcci3390 bears two encode a RepSASAM2 RCR initiator (Figure 4 and 6) suggesting different int genes separated by an attL site and a gene coding for a that module exchange took place between AICEs. hypothetical protein. Interestingly, unlike most AICEs, these two The third family includes Faln2929 and Fcci3390. Like AICEs int genes are located upstream of the tra gene instead of of the second family, both elements retain pMR2-type tra genes downstream of rep. and RepSASAM2 RCR initiator rep genes. However, they encode Excision of ICEs from the chromosome is the first step completely different int genes (Figure 3, 4 and 6). In Faln2929, int preceding their conjugative transfer. Together, the integrase and encodes a serine recombinase, which is convergent with tra and rep. the RDF Xis catalyze a site-specific recombination event between Fcci3390 appears to correspond to a circular permutation of the the attL and attR attachment sites flanking the ICE to generate an classical AICE att-tra-rep-xis-int-att structure into an att-int-tra-rep-xis- attB site on the chromosome and an attP site on the circularized att structure. ICE. The latter can readily be detected using a nested PCR assay. Finally, FeuI6863 and Fdat2245 do not seem to share any To investigate whether the putative AICEs detected in the Frankia extended sequence identities with each other or with any AICEs of genomes were functional mobile elements, we carried out nested the 3 previous families. While the Tra proteins encoded by these PCR experiments to detected attP for all of the putative AICEs two ICEs are distantly related to those encoded by pMEA300 and detected in Frankia strains ACN14a, Ean1pec and Cci3. Nine out pSE222, they encode different replication initiator proteins and of ten tested AICEs were found to excise when the strains were Fdat2245 is the only Frankia AICE encoding a Prim-pol replication grown in liquid cultures. Excision of Fcci4274 could not be protein (Figure 4, 5 and 6). Altogether, these observations suggest detected in our assay. that FeuI6863 and Fdat2245 are recent acquisitions in Frankia Interestingly, Fean5518 and Fean5534, which are inserted in a species. tandem fashion, were found to excise both independently and Fean1457, Fcci0407, Fcci1144 and Feul5809 form a group associatively within a large ,31.8 kb circular molecule. Fean6303 apart as they lack either a tra or rep gene (Figure S3 and Table S5). was found to be able to excise by recombination between attR and Therefore, this group may be considered as remnant AICEs or either attL (15.3 kb form) or attL2 (32.3 kb form) (Figure 8A, lanes 6 mobilizable integrating elements. and 7). Similar observations have been reported for ICESt1 and related ICEs from S. thermophilus [59] as well as for the High Integration and excision of Frankia AICEs Pathogenicity Island (HPI) in ICEEc1 from Escherichia coli [60]. With the exception of Faln2929, which encodes a serine Surprisingly, excision of Faln1739 was detectable despite the recombinase and disrupts a gene encoding a putative endonucle- absence of a functional recombinase (Figure 8A, lane 8). The int ase (Fraal2928, data not shown), all Frankia AICEs are site- gene of Faln1739 encodes a predicted truncated protein due to the specifically integrated into the 39 end of tRNA genes, a process likely mediated by the tyrosine recombinase that they encode (Figure 3 and 7). These integrases grouped within pSE211, Fean5518, pSA1.1, Fcci3330 and SLP1 sub-families (Figure 3B). While integration of AICEs is mediated by the integrase alone, excision requires in most cases the concerted expression of a recombination directionality factor (RDF), usually called excisio- nase or Xis protein [57,58]. All the AICEs identified but Faln 2929 carry a gene coding for a putative RDF (Figure 7). Most Frankia AICEs exhibit the typical att-tra-rep-int-att structure, with a few exceptions. Fean5518 and Fean5534 are integrated in a Figure 8. Detection of excision of putative Frankia AICEs in vitro tandem fashion into a tRNA-Lys gene in Frankia sp. EAN1pec. and in planta. Formation of attP sites was detected by nested PCR in Such organization is also found in Streptomyces avermitilis for the circular excised elements. (A) Detection in vitro. 1, Fean1457; 2, Fean5323; 3, Fean5518; 4, Fean5534; 5, Fean5518–5534; 6, Fean6303; AICEs Sav3708 and Sav3728 (te Poele et al. 2008 and this study), 7, Fean6303–6336; 8, Faln1739; 9, Faln5456; 10, Fcci0407; 11, Fcci1033; which are also integrated in tandem fashion in a tRNA-Arg gene. 12, Faln2929; 13, Fcci3388. (B) Detection in planta. 1, Faln1739; 2, Interestingly, while Fean5518 and Fean5534 encode two closely Faln5456. M: marker DNA fragments. related integrases, Sav3708 and Sav3728 encode integrases doi:10.1371/journal.pone.0027846.g008

PLoS ONE | www.plosone.org 12 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements presence of a stop codon at position 291. Another integrase transcription factor. WhiB has an essential role in sporulation in S. encoded by the genome of ACN14a could mediate the integration coelicolor [65]. and excision of this AICE; yet we failed to detect any closely The presence of such transcriptional regulators in AICEs related int gene in this genome. Furthermore, no mutation was suggests that acquisition of specific AICEs by Frankia strains could detected in the tRNA genes for Tyr or Gln codons or other tRNA eventually alter particular pathways or functions in Frankia such as genes, ruling out the involvement of a suppressor tRNA in the secondary metabolite production, establishment of symbiosis or translation of this gene into a functional protein. We hypothesize nitrogen fixation. that slippage of the ribosome through the TAA stop codon during translation of Faln17399s int allows sufficient production of a Auxiliary functions encoded by AICEs in Frankia functional integrase. Examination of the general molecular structure of AICEs from Excision of Fcci3390 was observed only for the segment located the six strains of Frankia revealed the presence of an impressive between attL and attR (Figure 8A, lane 13). Phylogenetic analysis number of genes coding for hypothetical proteins. A search with showed that the two integrases encoded by this AICE (see batch Blast combined with an HMMsearch against all Pfam-A Fcci3388 and Fcci3390 in figure 3B) belong to two different sub- families from Pfam 25.0 database allowed us to attribute putative families. Excision of the segment located between attL2 and attR2 functions to a significant number of hypothetical genes (Table S6). which should be mediated by int2 is likely prevented by the lack of Two main groups of genes carried by Frankia AICEs can be a cognate RDF. distinguished based on the predicted functions: (i) genes coding for Excision of Faln2929, which encodes a serine recombinase but proteins involved in nucleic acids and amino acids metabolism and no identifiable RDF, was also detected. This suggests that this (ii) genes coding for adaptive functions. recombinase is able to catalyze both integration and excision of In the first group, a cluster of 3 genes is shared by four Frankia this AICE without any requirement for an auxiliary AICE- AICEs and by Sco5349, pMEA100 and pSE211. This gene cluster encoded Xis protein. A similar observation has been reported for contains mdp (metal-dependent phosphohydrolase), nud (Nudix Streptomyces phage BT1 [61]. hydrolase) and xre (XRE-family transcriptional regulator). Metal- Finally, excision of Faln1739 and Faln5456 was also detected in dependent phosphohydrolases are HD hydrolases usually involved Alnus viridis ssp. crispa nodules infected by Frankia alni ACN14a and in the nucleic acid metabolism and signal transduction in bacteria grown in controlled laboratory conditions (Figure 8B, lanes 1 and and archaea [66]. Nudix hydrolases such as MutT in E. coli have 2). These results demonstrate the possibility of genetic exchange in the ability to degrade potentially mutagenic oxidised nucleotides planta involving Frankia and other endophytes occupying this same, [67,68]. The Nudix hydrolase Pif of pSAM2 is involved in narrow, ecological niche. conjugal immunity [27]. Fcci1033 encodes a predicted phosphor- ibosylglycinamide formyltransferase (EC:2.1.2.2). Proteins from this family are involved in the biosynthesis of secondary Putative regulatory functions in Frankia ICEs metabolites and nucleotide metabolism. Conjugative transfer of most natural ICEs identified to date is The second group of genes code for adaptive functions such as tightly regulated and can be induced by ICE-specific environ- heavy metals resistance, antibiotic biosynthesis and response to mental stimuli, such as the presence of antibiotics (tetracycline), different environmental stressors. Fean6303 encodes a carbamoyl- DNA-damaging agents (UV light, mitomycin C) or quorum transferase belonging to the same family as the Rhizobium protein sensing [11]. These diverse mechanisms of induction employ a vast NodU, which is involved in the synthesis of nodulation factors, and array of ICE-encoded regulator proteins. Some of these proteins CmcH, a protein from Nocardia lactamdurans, which is involved in can interact positively or negatively with other mobile genetic the biosynthesis of the antibiotic cephamycin (Coque et al. 1995; elements [62]. Frankia AICEs also include putative regulators such Jabbouri et al. 1995). as orthologs of Pra encoded by pSAM2 as well as putative proteins Faln5456 encodes a 150-amino acid protein with predicted containing XRE or GntR domains. In pSAM2 from Streptomyces modular polyketide synthase activity. In streptomycetes, polyketide ambofaciens, int, xis and repSA form an operon that is positively synthases participates to the synthesis of molecules with antibiotic regulated by the product of pra [16]. All but three Frankia AICEs and antifungal properties [69,70]. Interestingly, Faln5456 and carry a pra ortholog. Faln5323 encode two distantly related putative S-adenosyl-L- GntR-family transcriptional regulators encoded by Faln1739, methionine-dependent methyltransferases. Such enzymes are Fcci4274, Fdat4298, and Feul5809 are related (35%–48% known to be important for polyketide activation [69]. Finally, identity) to KorSA, a protein encoded by pSAM2. KorSA belongs FeuI6863 encodes a putative CutA1-like divalent ion tolerance to a kil-kor system and negatively regulates the replication and protein, which may be involved in resistance toward heavy metals. transfer of pSAM2 by repressing pra [63]. Interestingly, the gene korSA of Fdat4298 has a low G+C content compared to the Concluding remarks neighbouring ORFs suggesting that this gene had been acquired Together with other mobile elements such as plasmids and from bacteria with lower G+C content. prophages, ICEs are important catalysts for bacterial genomes Eleven out of the sixteen studied AICEs contain two different evolution. In addition to their own dissemination, ICEs of different types of ORFs coding for XRE-family transcriptional regulators families are also mobilizing chromosomal DNA by an increasing (Figure 7). XRE proteins have an N-terminal helix-turn-helix number of ways such as hfr-like transfer [71], mobilization of DNA-binding motif and belong to the xenobiotic response element related genomic islands [62,72] and possibly by recognition of clt- family of transcriptional regulators. The first type of gene, mostly like regions by Tra proteins in Streptomyces [44,45]. In contrast to located near pra encodes small (129 to 180 amino acid residues) T4SS ICEs which were recently determined to be the most XRE proteins. The second type encodes larger proteins (281 to abundant conjugative elements in prokaryotes [73], AICEs have 441 amino acid residues) sharing 27–36% identity with NsdA received little attention. The systematic in silico identification of proteins from various Streptomyces strains, such as SCO5582 from S. 144 AICEs in the whole Actinobacteria phylum reveals the coelicolor. SCO5582 is a negative regulator of antibiotic production diversity of possible canonical AICEs modules (integration, and sporulation [64]. FeuI6863 encodes a putative WhiB-like replication, transfer) and but mostly their surprising combination

PLoS ONE | www.plosone.org 13 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements plasticity. As an example, Int-Tyr family proteins cluster in several Predictions of domains and functions of Frankia ICE-encoded sub-families which are not necessarily associated to the same proteins were determined using HMMsearch against all Pfam-A replication protein families. Both tyrosine and serine integrase families from Pfam 25.0 database. Comparative analyses of AICEs family proteins promote the specific integration in a vast array of were carried out using Mauve [82]. different sites, mostly tRNA genes. These elements are promising substrates for the development of molecular genetic tools for gene Bacterial strain and culture methods expression as well as gene interruption. Moreover, the observation Frankia alni sp. ACN14a was isolated in 1982 from nodules of of many AICEs in the same genome, as well as putative T4SS Alnus viridis subsp. crispa plants in Tadoussac, QC, Canada [83] based ICEs in Actinobacteria, further expands the possibility for (catalog registry number ULQ010201401) [84]. Frankia sp. large genome modification by these possible molecular tools in EAN1pec was isolated in 1978 from nodules of Elaeagnus augustifolia diverse species, many of which, such as Frankia spp, are currently growing in Ohio, U.S.A. [85] (catalog registry number lacking effective genetic tools for their study. ULQ13100144). Frankia sp. CcI3 was isolated in 1983 from A total of sixteen different AICEs were identified in the genomes nodules of Casuarina cunninghamiana (MA, U.S.A.) (catalog registry of six Frankia strains. Nine of these AICEs were shown to excise in number HFP020203) [86]. Frankia alni ACN14a and Frankia sp. vitro from the chromosome by site-specific recombination, the Cci3 were grown at 30uC (obscurity, static) in BAPS medium: initial prerequisite step for their conjugation. Additionally, excision propionate-BAP medium [87] supplemented with 5 g/L sodium in planta of two of these AICEs was shown to occur while in succinate. Frankia sp. EAN1pec was grown at 25uC (obscurity, symbiotic association in root nodules of Alnus viridis. Frankia AICEs static) in MP medium supplemented with 20 mM fructose as harbor diverse sets of genes that do not seem to be involved in described by Tisa and Ensign [88]. their mobility, but rather encode for putative proteins likely involved in AICEs compatibility and exclusion, nodulation or DNA extraction from liquid culture and nodules secondary metabolism. Although the possible advantages con- Total genomic DNA from Frankia strains grown in liquid cultures ferred by AICE-borne genes have yet to be determined, our or present in Alnus viridis subsp. crispa nodules grown in controlled findings suggest that AICEs play an adaptive role and establishes laboratory conditions was purified using REDExtract-N-AmpTM new grounds for evolutionary and molecular study of Actinobac- Plant or Seeds PCR Kit (Sigma-Aldrich; St-Louis, MO, USA). teria. ICE excision assays Materials and methods ICE excision was determined by nested PCR [62]. Primers used for the detection of circularized forms of ICEs are described in Bioinformatic analyses Table S9. REDExtract-N-AmpTM PCR ReadyMixTM (Sigma- The predicted proteomes of 275 genomes (Table S7) and 176 Aldrich) was used to perform amplification in 20 ml reactions in (Table S8) plasmids of Actinobacteria available in June 2011 were the following conditions: 95uC for 5 min, 35 cycles of 95uC for 30 recovered from the RefSeq database [21] and analyzed to identify s, 57–67uC for 1 min (depending on the set of primers used) and putative ICEs. Integrases, and putative transfer and replication 72uC for 30 s, with a final elongation step at 72u for 10 min. Then proteins were sought for in these proteomes using profile hidden 1 ml of amplification product, obtained following the first round of Markov models (HMMs) with HMMsearch from the HMMER PCR, was used for a second round of PCR using different sets of v3.0 software package [74]. The HMM profiles that were used for primers identified as inner primers in Table S9. PCR conditions data mining were recovered from the Pfam 25.0 database [75] and were the same as described above. Amplicons from the second are summarized in Table 1. Since no HMM profile for the RepSA- round of PCR were sequenced by Genome Quebec Innovation like replication proteins was available at the time of this study, it was Center of McGill University (Montreal, QC, Canada). built using HMMbuild from the HMMER v3.0 software package from a C- and N-terminal trimmed multiple sequence alignment of Supporting Information the RepSA proteins described by te Poele et al [9] using MUSCLE multiple sequence alignment software [76] (File S1). Maximum- Figure S1 Selected AICES and plasmids synteny. The likehood phylogenies of AICE conserved proteins were generated synteny maps were generated by Mauve. Colored blocks correspond using the PhyML v3.0 program [77] with the LG substitution depict collinear and homologous sequences or regions between model. Tree topologies were optimized by PhyML using both the AICEs DNA sequences. A similarity profile is represented inside each NNI and SPR methods and the starting tree was estimated using block. The height of the similarity profile corresponds to the average BioNJ. Branch support of the phylogenies was estimated using non- level of conservation in that region of AICEs sequence. Regions parametric bootstrap (100 replicates). Phylogenetic analyses were outside blocks lack detectable homology among the studied AICEs. computed from reliable amino acid alignments built by MUSCLE. (TIFF) Removal of poorly aligned regions from amino acid alignments was Figure S2 Synteny of Frankia AICEs. Synteny between carried out by trimAl v1.2 software using the automated heuristic Frankia AICEs was determined as indicated in Figure S2. attL and approach [78] prior to phylogenetic analyses. Phylogenetic trees attR sites are indicated by flanking red vertical lines. were viewed using iTOL v2 [79] and are available as a shared (TIFF) project on the iTOL website (http://itol.embl.de/shared/Vincent Burrus). The search for tRNA genes, located near integrase genes, Figure S3 Genetic organization of putative Frankia was performed using tRNAscan-SE webserver [80]. AICE remnants. The genetic organisation of the elements is Large genome segments (30 to 60 kbp) containing at least a site- depicted as in Figure 7. GT2, putative glycosyl transferase family 2; specific recombination gene and a DNA translocation and HATP-ase, putative signal transduction histidine kinase; GP49, replication gene were analysed with Yass [81] to identify the GP49-like protein; apc5, anaphase-promoting complex subunit 5; direct repeats of the attL and attR attachment sites flanking Frankia s-kin, serine/threonine protein kinase; lt, lytic transglycosylase; ICEs. The direct repeat sequence corresponding to the attR site PALP, pyridoxal-59-phosphate-dependent protein subunit beta. was usually found to be the 39 end of a tRNA-encoding gene. (TIFF)

PLoS ONE | www.plosone.org 14 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

File S1 RepSA HMM. Table S7 Actinobacterial genomes analysed in this (HMM) study. (XLS) Table S1 AICEs identified in this study. (XLS) Table S8 Actinobacterial plasmids analysed in this Table S2 T4SS ICEs identified in this study. study. (XLS) (XLS) Table S3 Frankia T4SS ICE gene content. Table S9 Primers used in this study. (DOC) (DOC) Table S4 Frankia AICEs orthologs. Acknowledgments (XLS) The authors acknowledge the support of Universite´ Laval for granting Table S5 Predicted functions of putative proteins access to the Frankia collection of Pr. Maurice Lalonde. encoded by Frankia remnant AICEs or mobilizable integrating elements. Author Contributions (DOC) Conceived and designed the experiments: MG EB VB. Performed the Table S6 Predicted functions of putative proteins experiments: MG EB VB. Analyzed the data: MG EB VB. Contributed encoded by Frankia AICEs. reagents/materials/analysis tools: JB RB SR VB. Wrote the paper: MG EB (DOC) RB SR VB.

References 1. Roy S, Khasa DP, Greer CW (2007) Combining alders, frankiae, and 22. Biller SJ, Burkholder WF (2009) The Bacillus subtilis SftA (YtpS) and SpoIIIE mycorrhizae for the revegetation and remediation of contaminated ecosystems. DNA translocases play distinct roles in growing cells to ensure faithful Can J Bot 85: 237–251. chromosome partitioning. Mol Microbiol 74: 790–809. 2. Dawson JO (2008) Ecology of actinorhizal plants. In: Pawlowski K, Newton WE, 23. Kaimer C, Gonzalez-Pastor JE, Graumann PL (2009) SpoIIIE and a novel type eds. Nitrogen-fixing actinorhizal symbioses. Dordrecht: Springer. pp 199–234. of DNA translocase, SftA, couple chromosome segregation with cell division in 3. Dixon ROD, Wheeler CT (1986) Nitrogen Fixation in Plants. Glasgow: Biackie. Bacillus subtilis. Mol Microbiol 74: 810–825. 4. Normand P, Lapierre P, Tisa LS, Gogarten JP, Alloisio N, et al. (2007) Genome 24. Begg KJ, Dewar SJ, Donachie WD (1995) A new Escherichia coli cell division characteristics of facultatively symbiotic Frankia sp. strains reflect host range and gene, ftsK. J Bacteriol 177: 6211–6222. host plant biogeography. Genome Res 17: 7–15. 25. Burton BM, Marquis KA, Sullivan NL, Rapoport TA, Rudner DZ (2007) The 5. Lerat S, Simao-Beaunoir AM, Beaulieu C (2009) Genetic and physiological ATPase SpoIIIE transports DNA across fused septal membranes during determinants of Streptomyces scabies pathogenicity. Mol Plant Pathol 10: 579–585. sporulation in Bacillus subtilis. Cell 131: 1301–1312. 6. Rosas-Magallanes V, Deschavanne P, Quintana-Murci L, Brosch R, Gicquel B, 26. Warth L, Haug I, Altenbuchner J (2011) Characterization of the tyrosine et al. (2006) Horizontal transfer of a virulence operon to the ancestor of recombinase MrpA encoded by the Streptomyces coelicolor A3(2) plasmid SCP2*. Mycobacterium tuberculosis. Mol Biol Evol 23: 1129–1135. Arch Microbiol 193: 187–200. 7. Kers JA, Cameron KD, Joshi MV, Bukhalid RA, Morello JE, et al. (2005) A 27. Possoz C, Gagnat J, Sezonov G, Guerineau M, Pernodet JL (2003) Conjugal large, mobile pathogenicity island confers plant pathogenicity on Streptomyces immunity of Streptomyces strains carrying the integrative element pSAM2 is due to species. Mol Microbiol 55: 1025–1033. the pif gene (pSAM2 immunity factor). Mol Microbiol 47: 1385–1393. 8. Jayapal KP, Lian W, Glod F, Sherman DH, Hu WS (2007) Comparative 28. Williams KP (2002) Integration sites for genetic elements in prokaryotic tRNA genomic hybridizations reveal absence of large Streptomyces coelicolor genomic and tmRNA genes: sublocation preference of integrase subfamilies. Nucleic islands in Streptomyces lividans. BMC Genomics 8: 229. Acids Res 30: 866–875. 9. te Poele EM, Bolhuis H, Dijkhuizen L (2008) Actinomycete integrative and 29. Askora A, Kawasaki T, Fujie M, Yamada T (2011) Resolvase-like serine conjugative elements. Van Leeuwenhoek 94: 127–143. recombinase mediates integration/excision in the bacteriophage phiRSM. 10. Burrus V, Waldor MK (2004) Shaping bacterial genomes with integrative and J Biosci Bioeng 111: 109–116. conjugative elements. Res Microbiol 155: 376–386. 30. Hochhut B, Beaber JW, Woodgate R, Waldor MK (2001) Formation of 11. Wozniak RA, Waldor MK (2010) Integrative and conjugative elements: mosaic chromosomal tandem arrays of the SXT element and R391, two conjugative mobile genetic elements enabling dynamic lateral gene flow. Nat Rev Microbiol chromosomally integrating elements that share an attachment site. J Bacteriol 8: 552–563. 183: 1124–1132. 12. Burrus V, Pavlovic G, Decaris B, Guedon G (2002) Conjugative transposons: the 31. Garriss G, Waldor MK, Burrus V (2009) Mobile antibiotic resistance encoding tip of the iceberg. Mol Microbiol 46: 601–610. elements promote their own diversity. PLoS Genet 5: e1000775. 13. Christie PJ, Atmakuri K, Krishnamoorthy V, Jakubowski S, Cascales E (2005) 32. Burrus V, Waldor MK (2004) Formation of SXT tandem arrays and SXT-R391 Biogenesis, architecture, and function of bacterial type IV secretion systems. hybrids. J Bacteriol 186: 2636–2645. Annu Rev Microbiol 59: 451–485. 33. Pembroke JT, Murphy DB (2000) Isolation and analysis of a circular form of the 14. de la Cruz F, Frost LS, Meyer RJ, Zechner EL (2010) Conjugative DNA IncJ conjugative transposon-like elements, R391 and R997: implications for IncJ metabolism in Gram-negative bacteria. FEMS Microbiol Rev 34: 18–40. incompatibility. FEMS Microbiol Lett 187: 133–138. 15. Sezonov G, Hagege J, Pernodet JL, Friedmann A, Guerineau M (1995) 34. Murphy DB, Pembroke JT (1999) Monitoring of chromosomal insertions of the IncJ Characterization of pra, a gene for replication control in pSAM2, the integrating elements R391 and R997 in Escherichia coli K-12. FEMS Microbiol Lett 174: 355–361. element of Streptomyces ambofaciens. Mol Microbiol 17: 533–544. 35. Fouts DE (2006) Phage_Finder: automated identification and classification of 16. Sezonov G, Duchene AM, Friedmann A, Guerineau M, Pernodet JL (1998) prophage regions in complete bacterial genome sequences. Nucleic Acids Res Replicase, excisionase, and integrase genes of the Streptomyces element pSAM2 34: 5839–5851. constitute an operon positively regulated by the pra gene. J Bacteriol 180: 36. Panis G, Duverger Y, Courvoisier-Dezord E, Champ S, Talla E, et al. (2010) 3056–3061. Tight regulation of the intS gene of the KplE1 prophage: a new paradigm for 17. Hagege J, Boccard F, Smokvina T, Pernodet JL, Friedmann A, et al. (1994) integrase gene regulation. PLoS Genet 6. Identification of a gene encoding the replication initiator protein of the 37. del Solar G, Giraldo R, Ruiz-Echevarria MJ, Espinosa M, Diaz-Orejas R (1998) Streptomyces integrating element, pSAM2. Plasmid 31: 166–183. Replication and control of circular bacterial plasmids. Microbiol Mol Biol Rev 18. Hagege J, Pernodet JL, Friedmann A, Guerineau M (1993) Mode and origin of 62: 434–464. replication of pSAM2, a conjugative integrating element of Streptomyces 38. Ilyina TV, Koonin EV (1992) Conserved sequence motifs in the initiator ambofaciens. Mol Microbiol 10: 799–812. proteins for rolling circle DNA replication encoded by diverse replicons from 19. Smokvina T, Boccard F, Pernodet JL, Friedmann A, Guerineau M (1991) Functional eubacteria, eucaryotes and archaebacteria. Nucleic Acids Res 20: 3279–3285. analysis of the Streptomyces ambofaciens element pSAM2. Plasmid 25: 40–52. 39. Koonin EV, Ilyina TV (1993) Computer-assisted dissection of rolling circle DNA 20. Raynal A, Karray F, Tuphile K, Darbon-Rongere E, Pernodet JL (2006) replication. Biosystems 30: 241–268. Excisable cassettes: new tools for functional analysis of Streptomyces genomes. Appl 40. Weigel C, Seitz H (2006) Bacteriophage replication modules. FEMS Microbiol Environ Microbiol 72: 4839–4844. Rev 30: 321–381. 21. Pruitt KD, Tatusova T, Klimke W, Maglott DR (2009) NCBI Reference 41. Orozco BM, Hanley-Bowdoin L (1998) Conserved sequence and structural Sequences: current status, policy and new initiatives. Nucleic Acids Res 37: motifs contribute to the DNA binding and cleavage activities of a geminivirus D32–36. replication protein. J Biol Chem 273: 24448–24456.

PLoS ONE | www.plosone.org 15 November 2011 | Volume 6 | Issue 11 | e27846 Actinomycete Integrating Conjugative Elements

42. Orozco BM, Gladfelter HJ, Settlage SB, Eagle PA, Gentry RN, et al. (1998) 64. Li W, Ying X, Guo Y, Yu Z, Zhou X, et al. (2006) Identification of a gene Multiple cis elements contribute to geminivirus origin function. Virology 242: negatively affecting antibiotic production and morphological differentiation in 346–356. Streptomyces coelicolor A3(2). J Bacteriol 188: 8368–8375. 43. Omer CA, Cohen SN (1989) SLP1: a paradigm for plasmids that site-specifically 65. Davis NK, Chater KF (1992) The Streptomyces coelicolor whiB gene encodes a small integrate in the actinomycetes. In: Berg DE, Howe MM, eds. Mobile DNA. transcription factor-like protein dispensable for growth but essential for Washington, DC: American Society for Microbiology. pp 289–296. sporulation. Mol Gen Genet 232: 351–358. 44. Vogelmann J, Ammelburg M, Finger C, Guezguez J, Linke D, et al. (2011) 66. Aravind L, Koonin EV (1998) The HD domain defines a new superfamily of Conjugal plasmid transfer in Streptomyces resembles bacterial chromosome metal-dependent phosphohydrolases. Trends Biochem Sci 23: 469–472. segregation by FtsK/SpoIIIE. EMBO J 30: 2246–2254. 67. Bessman MJ, Frick DN, O’Handley SF (1996) The MutT proteins or ‘‘Nudix’’ 45. Pettis GS, Cohen SN (1994) Transfer of the plJ101 plasmid in Streptomyces lividans hydrolases, a family of versatile, widely distributed, ‘‘housecleaning’’ enzymes. requires a cis-acting function dispensable for chromosomal gene transfer. Mol J Biol Chem 271: 25059–25062. Microbiol 13: 955–964. 68. Kamiya H, Ishiguro C, Harashima H (2004) Increased A:T--.C:G mutations in 46. Berkmen M, Laurer SJ, Giarusso BK, Romero R (2011) The Integrative and the mutT strain upon 8-hydroxy-dGTP treatment: direct evidence for MutT Conjugative Element ICEBs1 of Bacillus subtilis. In: Roberts AP, Mullany P, eds. involvement in the prevention of mutations by oxidized dGTP. J Biochem 136: Integrative Mobile Gnetic Elements: Landes Bioscience, in press. 359–362. 47. Abajy MY, Kopec J, Schiwon K, Burzynski M, Doring M, et al. (2007) A type 69. Zotchev S, Haugan K, Sekurova O, Sletta H, Ellingsen TE, et al. (2000) IV-secretion-like system is required for conjugative DNA transport of broad- Identification of a gene cluster for antibacterial polyketide-derived antibiotic host-range plasmid pIP501 in gram-positive bacteria. J Bacteriol 189: biosynthesis in the nystatin producer Streptomyces noursei ATCC 11455. 2487–2496. Microbiology 146(Pt 3): 611–619. 48. Gomis-Ruth FX, Sola M, de la Cruz F, Coll M (2004) Coupling factors in 70. Hopwood DA (1997) Genetic Contributions to Understanding Polyketide macromolecular type-IV secretion machineries. Curr Pharm Des 10: Synthases. Chem Rev 97: 2465–2498. 1551–1565. 71. Hochhut B, Marrero J, Waldor MK (2000) Mobilization of plasmids and 49. Garcillan-Barcia MP, Francia MV, de la Cruz F (2009) The diversity of chromosomal DNA mediated by the SXT element, a constin found in Vibrio conjugative relaxases and its application in plasmid classification. FEMS cholerae O139. J Bacteriol 182: 2043–2047. Microbiol Rev 33: 657–687. 72. Osorio CR, Marrero J, Wozniak RA, Lemos ML, Burrus V, et al. (2008) 50. Lee CA, Babic A, Grossman AD (2010) Autonomous plasmid-like replication of Genomic and functional analysis of ICEPdaSpa1, a fish-pathogen-derived SXT- a conjugative transposon. Mol Microbiol 75: 268–279. related integrating conjugative element that can mobilize a virulence plasmid. 51. Mohd-Zain Z, Turner SL, Cerdeno-Tarraga AM, Lilley AK, Inzana TJ, et al. J Bacteriol 190: 3353–3361. (2004) Transferable antibiotic resistance elements in Haemophilus influenzae share a 73. Guglielmini J, Quintais L, Garcillan-Barcia MP, de la Cruz F, Rocha EP (2011) common evolutionary origin with a diverse family of syntenic genomic islands. The Repertoire of ICE in Prokaryotes Underscores the Unity, Diversity, and J Bacteriol 186: 8114–8122. Ubiquity of Conjugation. PLoS Genet 7: e1002222. 74. Eddy SR (2009) A new generation of homology search tools based on 52. Mavaro A, Abts A, Bakkes PJ, Moll GN, Driessen AJ, et al. (2011) Substrate probabilistic inference. Genome Inform 23: 205–211. Recognition and Specificity of the NisB Protein, the Lantibiotic Dehydratase 75. Finn RD, Mistry J, Tate J, Coggill P, Heger A, et al. (2010) The Pfam protein Involved in Nisin Biosynthesis. J Biol Chem 286: 30552–30560. families database. Nucleic Acids Res 38: D211–222. 53. Terns MP, Terns RM (2011) CRISPR-based adaptive immune systems. Curr 76. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy Opin Microbiol 14: 321–327. and high throughput. Nucleic Acids Res 32: 1792–1797. 54. Guo P, Cheng Q, Xie P, Fan Y, Jiang W, et al. (2011) Characterization of the 77. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, et al. (2010) New multiple CRISPR loci on Streptomyces linear plasmid pSHK1. Acta Biochim algorithms and methods to estimate maximum-likelihood phylogenies: assessing Biophys Sin (Shanghai) 43: 630–639. the performance of PhyML 3.0. Syst Biol 59: 307–321. 55. Flannagan SE, Clewell DB (1991) Conjugative transfer of Tn916 in Enterococcus 78. Capella-Gutierrez S, Silla-Martinez JM, Gabaldon T (2009) trimAl: a tool for faecalis: trans activation of homologous transposons. J Bacteriol 173: 7136–7141. automated alignment trimming in large-scale phylogenetic analyses. Bioinfor- 56. Lemire S, Figueroa-Bossi N, Bossi L (2011) Bacteriophage crosstalk: coordina- matics 25: 1972–1973. tion of prophage induction by trans-acting antirepressors. PLoS Genet 7: 79. Letunic I, Bork P (2011) Interactive Tree Of Life v2: online annotation and e1002149. display of phylogenetic trees made easy. Nucleic Acids Res. 57. Raynal A, Tuphile K, Gerbaud C, Luther T, Guerineau M, et al. (1998) 80. Schattner P, Brooks AN, Lowe TM (2005) The tRNAscan-SE, snoscan and Structure of the chromosomal insertion site for pSAM2: functional analysis in snoGPS web servers for the detection of tRNAs and snoRNAs. Nucleic Acids Escherichia coli. Mol Microbiol 28: 333–342. Res 33: W686–689. 58. Lewis JA, Hatfull GF (2001) Control of directionality in integrase-mediated 81. Noe L, Kucherov G (2005) YASS: enhancing the sensitivity of DNA similarity recombination: examination of recombination directionality factors (RDFs) search. Nucleic Acids Res 33: W540–543. including Xis and Cox proteins. Nucleic Acids Res 29: 2205–2216. 82. Darling AE, Mau B, Perna NT (2010) progressiveMauve: multiple genome 59. Pavlovic G, Burrus V, Gintz B, Decaris B, Guedon G (2004) Evolution of alignment with gene gain, loss and rearrangement. PLoS One 5: e11147. genomic islands by deletion and tandem accretion by site-specific recombina- 83. Normand P, Lalonde M (1982) Evaluation of Frankia strains isolated from tion: ICESt1-related elements from Streptococcus thermophilus. Microbiology 150: provenances of two Alnus species. Can J Microbiol 28: 1133–1142. 759–774. 84. LeChevalier MP (1985) Catalog of Frankia strains, second edition. Actinomycetes 60. Schubert S, Dufke S, Sorsa J, Heesemann J (2004) A novel integrative and 19: 131–162. conjugative element (ICE) of Escherichia coli: the putative progenitor of the Yersinia 85. Lalonde M, Calvert HE, Pine S (1981) Isolation and use of Frankia strains in high-pathogenicity island. Mol Microbiol 51: 837–848. actinorhizae formation. In: Gibson AH, Newton WE, eds. Current perspectives 61. Zhang L, Ou X, Zhao G, Ding X (2008) Highly efficient in vitro site-specific in nitrogen fixation. Canberra: Australian Academy of Science. pp 296–299. recombination system based on Streptomyces phage phiBT1 integrase. J Bacteriol 86. Zhang Z, Lopez MF, Torrey JG (1984) A comparison of cultural characteristics 190: 6392–6397. and infectivity of Frankia isolates from root nodules of Casuarina species. Plant Soil 62. Daccord A, Ceccarelli D, Burrus V (2010) Integrating conjugative elements of 78: 79–90. the SXT/R391 family trigger the excision and drive the mobilization of a new 87. Igual JM, Dawson JO (1999) Stimulatory effects of aluminum on in vitro growth class of Vibrio genomic islands. Mol Microbiol 78: 576–588. of Frankia. Can J Bot 77: 1321–1326. 63. Sezonov G, Possoz C, Friedmann A, Pernodet JL, Guerineau M (2000) KorSA 88. Tisa LS, Ensign JC (1987) Comparative physiology of nitrogenase activity and from the Streptomyces integrative element pSAM2 is a central transcriptional vesicle development for Frankia strains CpI1, ACN1Ag, EAN1pec and EUN1f. repressor: target genes and binding sites. J Bacteriol 182: 1243–1250. Arch Microbiol 147: 383–388.

PLoS ONE | www.plosone.org 16 November 2011 | Volume 6 | Issue 11 | e27846 ANNEXE 3

203 COMMENTARY Mobile Genetic Elements 2:2, 119–124; March/April 2012; G 2012 Landes Bioscience

Diversity of integrating conjugative elements in actinobacteria Coexistence of two mechanistically different DNA-translocation systems

Eric Bordeleau, Mariana Gabriela Ghinet and Vincent Burrus* Département de biologie; Faculté des sciences; Université de Sherbrooke; QC, Canada

onjugation is certainly the most integrating conjugative elements (ICEs) Cwidespread and promiscuous mech- have the ability to integrate within the anism of horizontal gene transfer in host’schromosometobeverticallyinherited bacteria. During conjugation, DNA (for reviews see refs. 1 and 2). translocation across membranes of two Consequently, ICEs need to excise from distribute. cells forming a mating pair is mediated a donor cell’s chromosome into a circular by two types of mobile genetic elements: form prior to transfer (Fig. 1). Integration conjugative plasmids and integrating and excision of ICEs are recombination not conjugative elements (ICEs). The vast events catalyzed by serine or tyrosine

majority of conjugative plasmids and integrases (Int) between short homolog- Do ICEs employ a sophisticated protein ous sequences called attachment sites secretion apparatus called type IV secre- (att), on the circular element (attP) and tion system to transfer to a recipient cell. the chromosome (attB), or flanking the Yet another type of conjugative DNA integrated element (attL and attR), translocation machinery exists and to date respectively (Fig. 1). Although they share appears to be unique to conjugative the same preliminary step, the mechan- Keywords: actinobacteria, integrating plasmids and ICEs of the Actinomycetales isms of conjugative transfer of ICEs and conjugative elements, conjugation, DNA order, a sub-group of high G + C Gram- actinomycete ICEs (AICEs) fundament-

translocation, horizontal gene transfer positive bacteria. This conjugative system is ally differ. Conjugative transfer of ICEs is Bioscience. reminiscent of the machinery that allows presumed to be mechanistically similar to Abbreviations: ICE, integrating segregation of chromosomal DNA during conjugative transfer of prototypical Gram- conjugative element; AICE, actinomycete bacterial cell division and sporulation, and negative bacteria conjugative plasmids, integrating conjugative element; relies on a single FtsK-homolog protein while the mechanism of AICEs transfer T4SS, type IV secretion system; T4CP, to translocate double-stranded DNA is rather reminiscent of the one used by type IV coupling protein; MGE, mobile 3,4

molecules to the recipient cell. Recent Streptomyces conjugative plasmids. Landes genetic element; HMM, Hidden Markov thorough sequence analyses reveal that model; clt, cis-acting locus of transfer; while this latter strategy appears to be T4SS-Mediated Translocation clcs, clt-like chromosomal sequences used by the majority of ICEs in of Single-Stranded DNA (ICEs) Actinomycetales, the former is also Submitted: 03/28/12 predicted to be important in exchange of Widespread in Gram-negative bacteria, Revised: 04/23/12 genetic material in actinobacteria. conjugation mediated by type IV secretion ©2012 Accepted: 04/24/12 systems (T4SS) requires the assembly of the mating pore, which varies in com- http://dx.doi.org/10.4161/mge.20498 Integrating Conjugative Elements plexity, and usually involves secretion and *Correspondence to: Vincent Burrus; assembly of an extracellular pilus.5,6 One Email: [email protected] Conjugative DNA transfer allows rapid of the key components of the T4SS is adaptation of bacteria through leaps of a VirB4-like sub-unit, which exhibits Commentary to: Ghinet MG, Bordeleau E, Beaudin J, Brzezinski R, Roy S, Burrus V. acquisition and exchange of massive ATPase activity and likely energizes the Uncovering the prevalence and diversity of amounts of genetic material even between assembly and/or activity of the secretion integrating conjugative elements in actinobac- distantly related microorganisms. While channel. Biochemical processing of the teria. PLoS One 2011; 6:e27846; PMID:22114709; conjugative plasmids maintain in the DNA molecule to transfer is initiated at http://dx.doi.org/10.1371/journal.pone.0027846 host genome by autonomous replication, the origin of transfer (oriT), which is

www.landesbioscience.com Mobile Genetic Elements 119 distribute. not Do

Figure 1. Conjugative transfer models of ICEs from the two superfamilies. (A): (1) In the donor cell, ICE excision from the chromosome results from site- specific recombination between the attL and attR sites. Following excision, the relaxase (Mob), which is part of a multiprotein complex called relaxosome, recognizes the origin of transfer (oriT). (2) The Mob protein generates a nick in one strand and becomes covalently bound to the 5’ end of the nicked strand. (3) While the single-stranded nucleoprotein complex is displaced by ongoing rolling-circle (RC) replication, it interacts with the type IV coupling protein (T4CP) which generates the energy for its translocation through a dedicated type IV secretion system (T4SS). (4) Once transferred in the recipient cell, the Mob protein ligates the single-stranded DNA molecule and the complementary strand is synthesized. (5) Integration in the recipient cell’s chromosome is mediated by recombination between the attP site on the circular ICE and the chromosomal attB site. (B): (1) Like ICEs, AICEs excise from the chromosome by site-specific recombination. (2) The excised circular AICE then replicates by RC replication and reintegrate into the chromosome Bioscience. and/or transfer to a recipient cell by conjugation. (3) The transfer protein Tra recognizes the AICE cis-acting locus (clt) and mediates the transfer of the double-stranded AICE by forming a pore (Tra hexamer) in the lipid bilayer and the use of its ATPase activity. (4) The circular AICE integrates into the chromosome by site-specific recombination as described above. Alternatively, integration into the chromosome of the recipient cell could be preceded by an additional step in which RC replication would occur. Landes bound by a DNA relaxase (Mob protein) of single-stranded DNA across the donor demonstrated for pSVH1 from Streptomyces and other auxiliary proteins (Fig. 1A). and recipient cell membranes. venezuelae (Fig. 1B).4 Like for pSVH1, Altogether they assemble as a nucleo- TraB-homologs encoded by AICEs would protein complex, the relaxosome, which TraB-Mediated Translocation recognize and bind to a specific double- is recognized as a T4SS substrate.3,5,7 The of Double-Stranded DNA (AICEs) stranded DNA region on the circularized phosphodiesterase activity of the relaxase AICE, the cis-acting locus of transfer (clt) ©2012 mediates a strand-specific cleavage within FtsK-homolog based conjugative DNA- which is conceptually equivalent to the oriT, allowing the unwinding of the DNA translocation systems are structurally oriT of ICEs and necessary for efficient molecule and 5' to 3' transfer of a single- simpler, relying on a single protein, transfer. TraB of pSVH1 has been shown stranded DNA to the recipient cell. TraB, aka TraSA or Tra, which resembles to oligomerise, forming a hexameric pore Another key component found associated the septal DNA translocator FtsK.4 FtsK structure that is large enough to trans- with most conjugative T4SS is the coupl- mediates proper segregation of the freshly locate double-stranded DNA. Since trans- ing protein (T4CP), a VirD4-like subunit, duplicated circular chromosomal DNA location of double-stranded DNA is not which likely acts as a docking site for into daughter-cell compartments during a conservative mechanism, transfer to the T4SS substrates. T4CPs are phylogeneti- constriction of the septal membranes in recipient of a mobile genetic element cally and structurally related to FtsK and prokaryotic cell division.8-10 AICEs likely (MGE) using this strategy would ultim- SpoIIIE ATPases and power translocation transfer following the mechanism recently ately lead to its loss from the donor cell.

120 Mobile Genetic Elements Volume 2 Issue 2 To circumvent this limitation, a dedicated conjugative mobile elements in the geno- proteins encoded by all 144 predicted rolling-circle replication module (Rep) mes of non-Actinomycetales actinobacteria AICEs, together with proteins of other mediates replication of AICEs after suggests that this mechanism of transfer is mobile genetic elements, reveals for the excision from the chromosome and prior specifically adapted to the hyphal nature first time the considerable diversity of to transfer to the recipient.11 of the Actinomycetales. On the opposite, AICEs in Actinomycetales. Despite the conjugative elements using this strategy large sample of genomes investigated, there Prevalence of AICEs may be unable to efficiently spread and is no clear evidence of AICE exchange therefore persist in populations of actino- between bacteria of the same species, of One of the earliest genome-wide identifi- bacteria species growing as cocci or short the same genus or between genera as no cation of AICEs was performed by te Poele chains. identical or nearly identical AICEs were et al. by using sequence homology-based Single genomes hosting multiple differ- detected. This observation suggests that methods on actinomycete genomes.12 ent AICEs are frequent and the occurrence these elements are not as promiscuous as Recently, the prevalence of AICEs in 275 of AICEs correlates rather well with some ICEs or conjugative plasmids found chromosomes and 176 plasmids of acti- genome size. The genome of members of in the Firmicutes and Gram-negative nobacteria was estimated using methods the Frankinaea, Micromonosporineae and bacteria, which often carry multiple anti- based on hidden Markov model (HMM) Streptomycineae sub-orders seem to be biotic resistance genes. AICEs have never protein profiles to search for various more prone to harbor multiple unrelated been described as vectors of such genes, proteins families involved in maintenance AICEs, particularly in species isolated preventing them from benefiting from [serine and tyrosine recombinases (Int), from plants, soil and water. 13 This observ- the current anthropic selection pressure distribute. replication initiator proteins (Rep)] and ation suggests that these specific niches exerted by antibiotics in the environment. transfer [FtsK-like conjugative DNA- likely favor cell-to-cell contacts but also Alternatively, despite the 275 tested translocation proteins (Tra)] of AICEs.13 provide frequent opportunities of contact genomes, the sample size might just be not These extensive in silico analyses revealed between unrelated or distantly related too small and their respective ecological

144 putative AICEs using an FtsK-like Tra bacterial partners. Conversely, genomes niches to diverse to allow detection of Do protein. With one exception found in a of bacteria isolated from dairy products, such events. Bifidobacterium strain, all of these AICEs animals, human or insects rarely harbor Integration and excision of the pre- are exclusively detected in genomes of multiple AICEs, likely reflecting the dicted AICEs seem to rely primarily (87%) members of the Actinomycetales order specialization of these more insulated, less on integrases of the tyrosine recombinase (Fig. 2). The apparent absence of AICEs diverse microbial floras. family. As often reported for other MGEs, in the other actinobacteria subclasses predicted AICEs coding for a tyrosine (Acidimicrobidae, Coriobacteridae and AICEs Canonical Proteins recombinase often integrate into the 3' end Rubrobacteridae) is intriguing and tends of a tRNA gene (73%). Tyrosine integrase

to justify their classification as AICEs. Analysis of the relationships between AICEs for which the integration site is not Bioscience. The quasi-absence of FtsK-like Tra-based the putative Int, Rep and FtsK-like Tra a tRNA gene mostly belong to the pSLS clade, one of eight novel tyrosine integrase subfamilies. AICEs coding for a serine recombinase are rather uncommon (13%). Notably, a cluster of 8 closely related

serine integrases seem to catalyze AICE Landes integration into a distinct and unique tRNA Leu gene in several species of Mycobacterium. To date, only a few examples of MGEs, mostly bacteriophages from Mycobacteria, have been shown to use a serine integrase for integration into ©2012 or near tRNA genes.14,15 Rolling circle replication (RCR) was found to be necessary for successful conjugative transfer of pSAM2, the first described AICE (RepSA replication pro- 11,16 Figure 2. Taxonomic distribution of AICEs. Percent of the 140 AICEs detected in complete (130) and tein). Replication is also known to draft (145) actinobacterial genomes is indicated for each clade. Numbers in parentheses represent occur for pMEA300 (RepAM replication the number of genomes analyzed for each clade. The clades for which no AICEs could be found protein).17 Consequently, a replication are not shown [Acidimicrobidae (1) Coriobacteridae (15), Rubrobacteridae (2), Glycomycineae (1), initiator gene is considered as an essential Kineosporiineae (1), unclassified actinobacteria (1)]. The four AICEs detected in the analysis of 176 component of a canonical AICE. Indeed actinobacterial plasmids are not included. RepSA proteins are the most prevalent

www.landesbioscience.com Mobile Genetic Elements 121 RCR proteins found in AICEs (69.4%). islands and plasmids. Acquisition of chro- of a 71-kb circular molecule (data not Phylogenetic analysis of these proteins mosomal genes, not identified as being shown).13 Interestingly, two actinobacterial reveals two RepSA subfamilies, RepSASAM2 part of any self-transmissible MGE, was T4SS ICEs, Intca3128 and Nbcg01645, and RepSAMR2. Interestingly, sequence shown to depend upon the presence in the encode tyrosine integrases that cluster comparison of the three conserved amino donor cell of the Tra proteins encoded by with those of AICEs pMR2 and pSLS, acid motifs suggests that while all RepSA Streptomyces lividans conjugative plasmid respectively (Fig. 3). This suggests that proteins likely catalyze the nick formation pIJ101.20 Recent work from Vogelmann recombination between ancestral AICEs at the double strand origin (dso) using et al. suggests that such events could result and T4SS ICEs occurred, leading to the same mechanism, RepSASAM2 and from the action of trans-encoded Tra exchange of functional modules. RepSAMR2 likely exhibit significant dis- proteins binding clt-like chromosomal similarities for dso recognition.13,18,19 sequences (clcs).4 Based on this observa- Auxiliary Functions Encoded RepAM proteins are encoded by a mino- tion, mobilization of genomic islands or by T4SS-Based ICEs rity (11.8%) of AICEs. Furthermore, plasmids containing clcs by an AICE or from Actinobacteria genes encoding a Prim-pol domain-protein conjugative plasmid coding for a com- are often located immediately upstream of patible Tra protein is plausible. Con- Besides genes required for their own a repAM gene. Prim-pol domain proteins ceptually related mobilizable genomic mobility, T4SS-based ICEs from actino- are likely involved in replication by acting islands (MGIs) were recently characterized bacteria also carry numerous “cargo” genes as primases. As a consequence, the pre- in c-proteobacteria.21 These MGIs rely on encoding a variety of putative functions sence of Prim-pol genes can also be used the recognition of their oriT by DNA identified using HMMsearch23 against all distribute. to detect new classes of AICEs relying on processing enzymes (relaxase and auxiliary Pfam-A families from Pfam 26.0 data- replication proteins different from RepSA proteins forming the relaxosome) of ICEs base.24 The “cargo” genes of seven of the and RepAM. Out of 40 AICEs bearing of the SXT/R391 family for their own putative T4SS ICEs, can be classified into not a Prim-Pol gene, only 12 also bear an transfer, in addition to the ICE encoded three groups based on their predicted

adjacent repAM gene, while 16 AICEs T4CP and T4SS conjugative machinery. functions: (1) genes coding for proteins Do carry a SCO4618-like gene instead, involved in cell wall metabolism, (2) genes which could also be involved in replica- T4SS-Based ICEs in coding for adaptive functions and (3) tion. The adjacent genes in the 12 other Actinobacteria genes coding for proteins involved in the AICEs do not code for known RCR translocation of a variety of molecules replication proteins and do not form a While ICEs relying on a T4SS-type DNA across the bacterial cell wall (Table S1). homogenous group based on sequence translocation machinery for their conjuga- The first group includes genes coding comparison. Nevertheless, solely based on tive transfer are widely distributed in for proteins involved in the cell wall the location of these genes, several of them Gram-negative bacteria and, to a lesser degradation such as putative CHAPS

could carry out replication-associated extent in Firmicutes, they appear to be domain protein, bacteriophage peptidogly- Bioscience. functions. scarce in actinobacteria.13,22 This discrep- can hydrolase and other members of the By definition, all AICEs encode a ancy could be genuine. Conversely, it amidase family. Several genes coding for conserved FtsK-domain family transfer could result from the inherent and inevit- proteins involved in cell wall biosynthesis protein (Tra). Tra proteins are thought able bias in the predictions introduced by such as putative LysM domain protein, to be the main and sole protein required using protein models based on the proteins GtrA family protein and glycosyl trans-

for double-stranded DNA intermycelial of the widely studied prototypical con- ferase family 2 proteins are also predicted. Landes transfer of AICEs (TraSA) and Strepto- jugative plasmids found in proteobacteria. Genes associated with adaptive functions myces conjugative plasmids (TraB). A Nonetheless, actinobacteria are predicted include heavy metals resistance (putative phylogenetic analysis of AICE Tra proteins to contain at least 17 putative T4SS- chromate resistance protein and chromate reveals that they group into six subfamilies based ICEs.13 These elements were pre- ion transporter CHR family), type II and with only one containing also a TraB dicted in silico by seeking co-occurrences type III DNA restriction-modification protein. While TraB protein of plasmid of genes coding for an integrase (Int), a systems, and antibiotic synthesis (amino- ©2012 pJVI clusters within the TraSLP1 subfamily, TrwC-like relaxase (MOB), a VirD4-like transferase from the DegT/DnrJ/EryC1/ which contains almost exclusively AICEs, coupling protein (T4CP) and a VirB4- StrS family).25 T4SS-based ICEs from it only shares 76% identity with its closest like T4SS component. attL and attR actinobacteria also carry genes involved in AICE relative encoded by pSLS. There- attachment sites were predicted for seven cell persistence and/or plasmid stabiliza- fore, exclusively based on the comparison out of the 17 ICEs. Most of these T4SS tion systems such as toxin-antitoxin and of their Tra proteins, emergence of new ICEs are site specifically integrated into plasmid DNA partitioning loci. AICEs through recombination events with the 3' end of tRNA-encoding genes, with To the third group appertain genes Streptomyces conjugative plasmids seems the exception of Nbcg01645 and Fcci3350. coding for secretion and transport functions very unlikely. The excision of Fcci3350 T4SS-based such as type II secretion system and proteins Tra proteins could potentially promote ICE from Frankia sp Cci3 was confirmed of major facilitator superfamily and ATP- the dissemination of unrelated genomic by nested PCR resulting in the formation binding cassette (ABC) transporters that

122 Mobile Genetic Elements Volume 2 Issue 2 domain and a central or C-terminal DNA helicase domain. Given that the helicase domain appears to be dispensable in some instances, only the TrwC Pfam HMM protein profile responsible for the MOBF relaxases activity was used. This illustrates well some of the limitations of HMM protein profile-based studies. Protein profiles models are chosen based on assumptions of the conservation of known and previously predicted proteins. HMM protein profile-based analyses are therefore prone to the introduction of a bias, although it represents a major improvement compared with single- protein based alignment (BLAST). Hence, despite recent extensive in silico analyses designed to identify ICEs relying on either T4SS13,22 or FtsK-like13 con- distribute. jugative machineries, entire new families of more exotic types of conjugative elements could have been easily over- not looked. As an example, genomes of non-

actinomycete actinobacteria could host Do FtsK-like ICEs relying on more distant and/or unrelated proteins for their repli- cation. In fact, an AICE-related element has been predicted in a strain of Figure 3. Phylogenetic analysis of the actinobacterial T4SS-type ICE tyrosine integrases. Bifidobacterium longum by using replica- The phylogenetic relatedness of the tyrosine integrases encoded by 17 T4SS-type ICEs (black) with those of the 9 AICEs tyrosine-integrase subfamilies (green) is represented. For simplification, tion protein profiles not usually associated 13 only the 9 proteins encoded by the AICEs of each eponymous subfamily were used for this analysis. with AICEs. While predicted to encode The relatedness of the tyrosine integrases of Intca3128 and Nbcg01645 with those of pMR2 and a tyrosine int gene and an FtsK/SpoIIIE pSLS, respectively, is emphasized (gray shading).

tra gene, this AICE-related element would Bioscience. replicate by means of a rep2-type replica- tion protein. transport a wide variety of substrates such was retrieved by Guglielmini and coworkers. Furthermore, most if not all ICEs as ions, sugars, lipids, sterols, peptides, Differences in the protein profiles used in characterized to date seem to be exclusively proteins and drugs across the biological the two studies most likely explain this relying on a tyrosine or, more scarcely,

membrane. observation. On one hand, to have an serine recombinase, to promote their Landes insight on the prevalence of T4SS-like integration into and excision from a Other Conjugative MGEs and ICEs in actinobacteria only the MOBF replicon. Yet, several examples of trans- Related Elements to be Found? relaxase family was considered given that posons, genomic islands and even viruses to date, the vast majority of MOB use DDE transposases instead to mediate Recently, T4SS-based ICEs and related relaxases associated with conjugative plas- similar events.27 One can wonder why elements were also more extensively mids in actinobacteria belong to the such enzymes are not more frequently ©2012 sought for in all prokaryotes by MOBF family and very few to the found associated with maintenance and 22 13 Guglielmini et al. The predictions of MOBQ family. On the other hand, all mobility of ICEs. It is possible that DDE this exhaustive study based on protein six MOB relaxase families26 were con- recombinases are not as well suited as HMM profiles, built using proteins sidered to identify ICEs in all prokaryo- tyrosine recombinases to maintain the 22 encoded by known conjugative plasmids, tes. However, the 17 MOBF encoding integrity of large MGEs. It is also possible led to the identification of 335 T4SS- ICEs found in actinobacteria were not that the combination of a tyrosine recom- based ICEs in prokaryotes. Interestingly, identified by this latter study presumably binase and of a recombination directiona- the elements identified in actinobacteria because of the use of a more stringent lity factor (Int/Xis pair) is better suited to in that study were not identified by our protein profile for this family. Relaxases finely tune the integration into and the analyses. Conversely, none of the putative belonging to the MOBF family typically excision from the chromosome of large T4SS-based ICEs identified in our study contain an N-terminal TrwC relaxase DNA molecules.

www.landesbioscience.com Mobile Genetic Elements 123 It is tempting to speculate that many Predictions of ICEs and related Acknowledgments more elements related to ICEs of both elements in actinobacteria have enabled We are grateful to Nicolas Carraro for superfamilies are yet unidentified. As an to gain better insights on the distri- helpful comments on the manuscript. V.B. example, further examination of Frankia bution, diversity and evolution of these holds a Canada Research Chair in Bacterial genomes reveals putative elements lacking MGE. Interestingly, ICEs encoding Molecular Genetics. E.B. is the recipient one of the AICE canonical components. two mechanistically different DNA of an Alexander Graham Bell Canada These elements could be AICEs remnants translocation machineries are present in Graduate Scholarship from NSERC. subject to genetic decay or more interest- actinobacteria. However, despite the ingly, be genuine MGEs, which could apparent simplicity of their “conjugative Supplemental Material depend on interactions with proteins apparatus,” AICEs seem to have the Supplemental material may be found here: encoded by other self-transmissible mobile biggest share of the gene trade in www.landesbioscience.com/journals/mge/ elements. actinomycetes. article/20498

References 12. te Poele EM, Bolhuis H, Dijkhuizen L. Actinomycete 20. Pettis GS, Cohen SN. Transfer of the plJ101 plasmid integrative and conjugative elements. Antonie Van in Streptomyces lividans requires a cis-acting function 1. Burrus V, Pavlovic G, Decaris B, Guédon G. Leeuwenhoek 2008; 94:127-43; PMID:18523858; dispensable for chromosomal gene transfer. Mol Conjugative transposons: the tip of the iceberg. Mol http://dx.doi.org/10.1007/s10482-008-9255-x Microbiol 1994; 13:955-64; PMID:7854128; http:// Microbiol 2002; 46:601-10; PMID:12410819; http:// 13. Ghinet MG, Bordeleau E, Beaudin J, Brzezinski R, dx.doi.org/10.1111/j.1365-2958.1994.tb00487.x dx.doi.org/10.1046/j.1365-2958.2002.03191.x Roy S, Burrus V. Uncovering the prevalence and 21. Daccord A, Ceccarelli D, Burrus V. Integrating 2. Wozniak RA, Waldor MK. Integrative and conjugative diversity of integrating conjugative elements in actino- conjugative elements of the SXT/R391 family trigger elements: mosaic mobile genetic elements enabling bacteria. PLoS One 2011; 6:e27846; PMID: the excision and drive the mobilization of a new class of distribute. dynamic lateral gene flow. Nat Rev Microbiol 2010; 22114709; http://dx.doi.org/10.1371/journal.pone. Vibrio genomic islands. Mol Microbiol 2010; 78:576- 8:552-63; PMID:20601965; http://dx.doi.org/10. 0027846 88; PMID:20807202; http://dx.doi.org/10.1111/j. 1038/nrmicro2382 14. Askora A, Kawasaki T, Fujie M, Yamada T. Resolvase- 1365-2958.2010.07364.x

3. de la Cruz F, Frost LS, Meyer RJ, Zechner EL. like serine recombinase mediates integration/excision 22. Guglielmini J, Quintais L, Garcillán-Barcia MP, de la not Conjugative DNA metabolism in Gram-negative in the bacteriophage φRSM. J Biosci Bioeng 2011; Cruz F, Rocha EP. The repertoire of ICE in bacteria. FEMS Microbiol Rev 2010; 34:18-40; PMID: 111:109-16; PMID:21035394; http://dx.doi.org/10. prokaryotes underscores the unity, diversity, and 19919603; http://dx.doi.org/10.1111/j.1574-6976. 1016/j.jbiosc.2010.10.001 ubiquity of conjugation. PLoS Genet 2011; 7: Do 2009.00195.x 15. Pope WH, Jacobs-Sera D, Russell DA, Peebles CL, Al- e1002222; PMID:21876676; http://dx.doi.org/10. 4. Vogelmann J, Ammelburg M, Finger C, Guezguez J, Atrache Z, Alcoser TA, et al. Expanding the diversity of 1371/journal.pgen.1002222 Linke D, Flötenmeyer M, et al. Conjugal plasmid mycobacteriophages: insights into genome architecture 23. Eddy SR. A new generation of homology search tools transfer in Streptomyces resembles bacterial chromosome and evolution. PLoS One 2011; 6:e16329; PMID: based on probabilistic inference. Genome Inform 2009; segregation by FtsK/SpoIIIE. EMBO J 2011; 30:2246- 21298013; http://dx.doi.org/10.1371/journal.pone. 23:205-11; PMID:20180275; http://dx.doi.org/10. 54; PMID:21505418; http://dx.doi.org/10.1038/ 0016329 1142/9781848165632_0019 emboj.2011.121 16. Hagège J, Boccard F, Smokvina T, Pernodet JL, 24. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington 5. Alvarez-Martinez CE, Christie PJ. Biological diversity Friedmann A, Guérineau M. Identification of a gene JE, et al. The Pfam protein families database. Nucleic of prokaryotic type IV secretion systems. Microbiol encoding the replication initiator protein of the Acids Res 2010; 38(Database issue):D211-22; PMID: Mol Biol Rev 2009; 73:775-808; PMID:19946141; Streptomyces integrating element, pSAM2. Plasmid 19920124; http://dx.doi.org/10.1093/nar/gkp985 http://dx.doi.org/10.1128/MMBR.00023-09 1994; 31:166-83; PMID:8029324; http://dx.doi.org/ 25. Ahlert J, Distler J, Mansouri K, Piepersberg W. 6. Fronzes R, Christie PJ, Waksman G. The structural 10.1006/plas.1994.1018 Identification of stsC,thegeneencodingthe

biology of type IV secretion systems. Nat Rev 17. te Poele EM, Kloosterman H, Hessels GI, Bolhuis H, L-glutamine:scyllo-inosose aminotransferase from strep- Bioscience. Microbiol 2009; 7:703-14; PMID:19756009; http:// Dijkhuizen L. RepAM of the Amycolatopsis methanolica tomycin-producing Streptomycetes. Arch Microbiol dx.doi.org/10.1038/nrmicro2218 integrative element pMEA300 belongs to a novel class 1997; 168:102-13; PMID:9238101; http://dx.doi. 7. Smillie C, Garcillán-Barcia MP, Francia MV, Rocha of replication initiator proteins. Microbiology 2006; org/10.1007/s002030050475 EP, de la Cruz F. Mobility of plasmids. Microbiol Mol 152:2943-50; PMID:17005975; http://dx.doi.org/10. 26. Garcillán-Barcia MP, Francia MV, de la Cruz F. The Biol Rev 2010; 74:434-52; PMID:20805406; http:// 1099/mic.0.28746-0 diversity of conjugative relaxases and its application dx.doi.org/10.1128/MMBR.00020-10 18. Orozco BM, Gladfelter HJ, Settlage SB, Eagle PA, in plasmid classification. FEMS Microbiol Rev 2009; 8. Biller SJ, Burkholder WF. The Bacillus subtilis SftA Gentry RN, Hanley-Bowdoin L. Multiple cis elements 33:657-87; PMID:19396961; http://dx.doi.org/10. (YtpS) and SpoIIIE DNA translocases play distinct

contribute to geminivirus origin function. Virology 1111/j.1574-6976.2009.00168.x Landes roles in growing cells to ensure faithful chromosome 1998; 242:346-56; PMID:9514968; http://dx.doi.org/ 27. Nesmelova IV, Hackett PB. DDE transposases: Struc- partitioning. Mol Microbiol 2009; 74:790-809; 10.1006/viro.1997.9013 tural similarity and diversity. Adv Drug Deliv Rev PMID:19788545; http://dx.doi.org/10.1111/j.1365- 19. Orozco BM, Hanley-Bowdoin L. Conserved sequence 2010; 62:1187-95; PMID:20615441; http://dx.doi. 2958.2009.06893.x and structural motifs contribute to the DNA binding org/10.1016/j.addr.2010.06.006 9. Kaimer C, González-Pastor JE, Graumann PL. SpoIIIE and cleavage activities of a geminivirus replication and a novel type of DNA translocase, SftA, couple protein. J Biol Chem 1998; 273:24448-56; PMID: chromosome segregation with cell division in Bacillus 9733736; http://dx.doi.org/10.1074/jbc.273.38.24448

subtilis. Mol Microbiol 2009; 74:810-25; PMID: ©2012 19818024; http://dx.doi.org/10.1111/j.1365-2958. 2009.06894.x 10. Begg KJ, Dewar SJ, Donachie WD. A new Escherichia coli cell division gene, ftsK. J Bacteriol 1995; 177:6211- 22; PMID:7592387 11. Hagège J, Pernodet JL, Friedmann A, Guérineau M. Mode and origin of replication of pSAM2, a conjugative integrating element of Streptomyces ambo- faciens. Mol Microbiol 1993; 10:799-812; PMID: 7934842; http://dx.doi.org/10.1111/j.1365-2958. 1993.tb00950.x

124 Mobile Genetic Elements Volume 2 Issue 2 BIBLIOGRAPHIE

Aas, F.E., Lovold, C., et Koomey, M. (2002a). An inhibitor of DNA binding and uptake events dictates the proficiency of genetic transformation in Neisseria gonorrhoeae: mechanism of action and links to Type IV pilus expression. Mol Microbiol 46, 1441-1450.

Aas, F.E., Wolfgang, M., Frye, S., Dunham, S., Lovold, C., et Koomey, M. (2002b). Competence for natural transformation in Neisseria gonorrhoeae: components of DNA binding and uptake linked to type IV pilus expression. Mol Microbiol 46, 749-760.

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., et Lipman, D.J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25, 3389-3402.

Amikam, D., et Galperin, M.Y. (2006). PilZ domain is part of the bacterial c-di-GMP binding protein. Bioinformatics 22, 3-6.

Aubry, A., Hussack, G., Chen, W., KuoLee, R., Twine, S.M., Fulton, K.M., Foote, S., Carrillo, C.D., Tanha, J., et Logan, S.M. (2012). Modulation of toxin production by the flagellar regulon in Clostridium difficile. Infect Immun 80, 3521-3532.

Ayers, M., Howell, P.L., et Burrows, L.L. (2010). Architecture of the type II secretion and type IV pilus machineries. Future Microbiol 5, 1203-1218.

Baban, S.T., Kuehne, S.A., Barketi-Klai, A., Cartman, S.T., Kelly, M.L., Hardie, K.R., Kansau, I., Collignon, A., et Minton, N.P. (2013). The role of flagella in Clostridium difficile pathogenesis: comparison between a non-epidemic and an epidemic strain. PLoS One 8, e73026.

Bakken, J.S. (2009). Fecal bacteriotherapy for recurrent Clostridium difficile infection. Anaerobe 15, 285-289.

Bakker, D., Smits, W.K., Kuijper, E.J., et Corver, J. (2012). TcdC does not significantly repress toxin expression in Clostridium difficile 630Erm. PLoS One 7, e43247.

Barketi-Klai, A., Hoys, S., Lambert-Bordes, S., Collignon, A., et Kansau, I. (2011). Role of fibronectin-binding protein A in Clostridium difficile intestinal colonization. J Med Microbiol 60, 1155-1161.

210 Barth, H., Aktories, K., Popoff, M.R., et Stiles, B.G. (2004). Binary bacterial toxins: biochemistry, biology, and applications of common Clostridium and Bacillus proteins. Microbiol Mol Biol Rev 68, 373-402.

Bartlett, J.G., Chang, T.W., Gurwith, M., Gorbach, S.L., et Onderdonk, A.B. (1978). Antibiotic-associated pseudomembranous colitis due to toxin-producing clostridia. N Engl J Med 298, 531-534.

Benach, J., Swaminathan, S.S., Tamayo, R., Handelman, S.K., Folta-Stogniew, E., Ramos, J.E., Forouhar, F., Neely, H., Seetharaman, J., Camilli, A., et al. (2007). The structural basis of cyclic diguanylate signal transduction by PilZ domains. EMBO J 26, 5153-5166.

Benson, D.A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., et Sayers, E.W. (2013). GenBank. Nucleic Acids Res 41, D36-42.

Biswas, G.D., Sox, T., Blackman, E., et Sparling, P.F. (1977). Factors affecting genetic transformation of Neisseria gonorrhoeae. J Bacteriol 129, 983-992.

Boehm, A., Kaiser, M., Li, H., Spangler, C., Kasper, C.A., Ackermann, M., Kaever, V., Sourjik, V., Roth, V., et Jenal, U. (2010). Second messenger-mediated adjustment of bacterial swimming velocity. Cell 141, 107-116.

Bordeleau, E., Brouillette, E., Robichaud, N., et Burrus, V. (2010). Beyond antibiotic resistance: integrating conjugative elements of the SXT/R391 family that encode novel diguanylate cyclases participate to c-di-GMP signalling in Vibrio cholerae. Environ Microbiol 12, 510-523.

Bradley, D.E. (1980). A function of Pseudomonas aeruginosa PAO polar pili: twitching motility. Can J Microbiol 26, 146-154.

Buchan, D.W., Ward, S.M., Lobley, A.E., Nugent, T.C., Bryson, K., et Jones, D.T. (2010). Protein annotation and modelling servers at University College London. Nucleic Acids Res 38, W563-568.

Burdette, D.L., Monroe, K.M., Sotelo-Troha, K., Iwig, J.S., Eckert, B., Hyodo, M., Hayakawa, Y., et Vance, R.E. (2011). STING is a direct innate immune sensor of cyclic di-GMP. Nature 478, 515-518.

Burge, S.W., Daub, J., Eberhardt, R., Tate, J., Barquist, L., Nawrocki, E.P., Eddy, S.R., Gardner, P.P., et Bateman, A. (2013). Rfam 11.0: 10 years of RNA families. Nucleic Acids Res 41, D226-232.

211 Burrus, V., Pavlovic, G., Decaris, B., et Guedon, G. (2002). Conjugative transposons: the tip of the iceberg. Mol Microbiol 46, 601-610.

Cafardi, V., Biagini, M., Martinelli, M., Leuzzi, R., Rubino, J.T., Cantini, F., Norais, N., Scarselli, M., Serruto, D., et Unnikrishnan, M. (2013). Identification of a novel zinc metalloprotease through a global analysis of Clostridium difficile extracellular proteins. PLoS One 8, e81306.

Calabi, E., Ward, S., Wren, B., Paxton, T., Panico, M., Morris, H., Dell, A., Dougan, G., et Fairweather, N. (2001). Molecular characterization of the surface layer proteins from Clostridium difficile. Mol Microbiol 40, 1187-1199.

Cammarota, G., Ianiro, G., et Gasbarrini, A. (2014). Fecal Microbiota Transplantation for the Treatment of Clostridium difficile Infection: A Systematic Review. J Clin Gastroenterol (sous- presse, DOI: 10.1097/MCG.0000000000000046).

Carpousis, A.J. (2007). The RNA degradosome of Escherichia coli: an mRNA-degrading machine assembled on RNase E. Annu Rev Microbiol 61, 71-87.

Cartman, S.T., Kelly, M.L., Heeg, D., Heap, J.T., et Minton, N.P. (2012). Precise manipulation of the Clostridium difficile chromosome reveals a lack of association between the tcdC genotype and toxin production. Appl Environ Microbiol 78, 4683-4690.

Chen, A.G., Sudarsan, N., et Breaker, R.R. (2011). Mechanism for gene control by a natural allosteric group I ribozyme. RNA 17, 1967-1972.

Chen, I., et Dubnau, D. (2004). DNA uptake during bacterial transformation. Nat Rev Microbiol 2, 241-249.

Chen, Z.H., et Schaap, P. (2012). The prokaryote messenger c-di-GMP triggers stalk cell differentiation in Dictyostelium. Nature 488, 680-683.

Chin, K.H., Lee, Y.C., Tu, Z.L., Chen, C.H., Tseng, Y.H., Yang, J.M., Ryan, R.P., McCarthy, Y., Dow, J.M., Wang, A.H., et al. (2010). The cAMP receptor-like protein CLP is a novel c- di-GMP receptor linking cell-cell signaling to virulence gene expression in Xanthomonas campestris. J Mol Biol 396, 646-662.

Christen, B., Christen, M., Paul, R., Schmid, F., Folcher, M., Jenoe, P., Meuwly, M., et Jenal, U. (2006). Allosteric control of cyclic di-GMP signaling. J Biol Chem 281, 32015-32024.

212 Christen, M., Christen, B., Folcher, M., Schauerte, A., et Jenal, U. (2005). Identification and characterization of a cyclic di-GMP-specific phosphodiesterase and its allosteric control by GTP. J Biol Chem 280, 30829-30837.

Coppins, R.L., Hall, K.B., et Groisman, E.A. (2007). The intricate world of riboswitches. Curr Opin Microbiol 10, 176-181.

Cournac, A., et Plumbridge, J. (2013). DNA looping in prokaryotes: experimental and theoretical approaches. J Bacteriol 195, 1109-1119.

Craig, L., et Li, J. (2008). Type IV pili: paradoxes in form and function. Curr Opin Struct Biol 18, 267-277.

Craig, L., Pique, M.E., et Tainer, J.A. (2004). Type IV pilus structure and bacterial pathogenicity. Nat Rev Microbiol 2, 363-378.

Curtis, P.D., et Brun, Y.V. (2010). Getting in the loop: regulation of development in Caulobacter crescentus. Microbiol Mol Biol Rev 74, 13-41.

Dallal, R.M., Harbrecht, B.G., Boujoukas, A.J., Sirio, C.A., Farkas, L.M., Lee, K.K., et Simmons, R.L. (2002). Fulminant Clostridium difficile: an underappreciated and increasing cause of death and complications. Ann Surg 235, 363-372.

Dawson, L.F., Valiente, E., Faulds-Pain, A., Donahue, E.H., et Wren, B.W. (2012). Characterisation of Clostridium difficile biofilm formation, a role for Spo0A. PLoS One 7, e50527.

Dineen, S.S., McBride, S.M., et Sonenshein, A.L. (2010). Integration of metabolism and virulence by Clostridium difficile CodY. J Bacteriol 192, 5350-5362.

Dineen, S.S., Villapakkam, A.C., Nordman, J.T., et Sonenshein, A.L. (2007). Repression of Clostridium difficile toxin gene expression by CodY. Mol Microbiol 66, 206-219.

Dingle, T.C., Mulvey, G.L., et Armstrong, G.D. (2011). Mutagenic analysis of the Clostridium difficile flagellar proteins, FliC and FliD, and their contribution to virulence in hamsters. Infect Immun 79, 4061-4067.

Douzi, B., Filloux, A., et Voulhoux, R. (2012). On the path to uncover the bacterial type II secretion system. Philosophical transactions of the Royal Society of London Series B, Biological sciences 367, 1059-1072.

213 Dubensky, T.W., Jr., Kanne, D.B., et Leong, M.L. (2013). Rationale, progress and development of vaccines utilizing STING-activating cyclic dinucleotide adjuvants. Ther Adv Vaccines 1, 131-143.

Duerig, A., Abel, S., Folcher, M., Nicollier, M., Schwede, T., Amiot, N., Giese, B., et Jenal, U. (2009). Second messenger-mediated spatiotemporal control of protein degradation regulates bacterial cell cycle progression. Genes Dev 23, 93-104.

Duguid, J.P., et Anderson, E.S. (1967). Terminology of bacterial fimbriae, or pili, and their types. Nature 215, 89-90.

Dupuy, B., Govind, R., , A., et Matamouros, S. (2008). Clostridium difficile toxin synthesis is negatively regulated by TcdC. J Med Microbiol 57, 685-689.

El Meouche, I., Peltier, J., Monot, M., Soutourina, O., Pestel-Caron, M., Dupuy, B., et Pons, J.L. (2013). Characterization of the SigD regulon of C. difficile and its positive control of toxin production through the regulation of tcdR. PLoS One 8, e83748.

Emerson, J.E., Stabler, R.A., Wren, B.W., et Fairweather, N.F. (2008). Microarray analysis of the transcriptional responses of Clostridium difficile to environmental and antibiotic stress. J Med Microbiol 57, 757-764.

Ethapa, T., Leuzzi, R., Ng, Y.K., Baban, S.T., Adamo, R., Kuehne, S.A., Scarselli, M., Minton, N.P., Serruto, D., et Unnikrishnan, M. (2013). Multiple factors modulate biofilm formation by the anaerobic pathogen Clostridium difficile. J Bacteriol 195, 545-555.

Fang, X., et Gomelsky, M. (2010). A post-translational, c-di-GMP-dependent mechanism regulating flagellar motility. Mol Microbiol 76, 1295-1305.

Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunasekaran, P., Ceric, G., Forslund, K., et al. (2010). The Pfam protein families database. Nucleic Acids Res 38, D211-222.

Fouhy, Y., Lucey, J.F., Ryan, R.P., et Dow, J.M. (2006). Cell-cell signaling, cyclic di-GMP turnover and regulation of virulence in Xanthomonas campestris. Res Microbiol 157, 899-904.

Franklin, M.J., Nivens, D.E., Weadge, J.T., et Howell, P.L. (2011). Biosynthesis of the Pseudomonas aeruginosa Extracellular Polysaccharides, Alginate, Pel, and Psl. Front Microbiol 2, 167.

214 Fronzes, R., Remaut, H., et Waksman, G. (2008). Architectures and biogenesis of non- flagellar protein appendages in Gram-negative bacteria. Embo J 27, 2271-2280.

Galperin, M.Y., Natale, D.A., Aravind, L., et Koonin, E.V. (1999). A specialized version of the HD hydrolase domain implicated in signal transduction. J Mol Microbiol Biotechnol 1, 303-305.

Gao, R., et Stock, A.M. (2009). Biological insights from structures of two-component proteins. Annu Rev Microbiol 63, 133-154.

Garst, A.D., et Batey, R.T. (2009). A switch in time: detailing the life of a riboswitch. Biochim Biophys Acta 1789, 584-591.

Giel, J.L., Sorg, J.A., Sonenshein, A.L., et Zhu, J. (2010). Metabolism of bile salts in mice influences spore germination in Clostridium difficile. PLoS One 5, e8740.

Giltner, C.L., Habash, M., et Burrows, L.L. (2010). Pseudomonas aeruginosa minor pilins are incorporated into type IV pili. J Mol Biol 398, 444-461.

Girard, A.E., et Jacius, B.H. (1974). Ultrastructure of Actinomyces viscosus and Actinomyces naeslundii. Arch Oral Biol 19, 71-79.

Gough, J., Karplus, K., Hughey, R., et Chothia, C. (2001). Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol 313, 903-919.

Govind, R., et Dupuy, B. (2012). Secretion of Clostridium difficile toxins A and B requires the holin-like protein TcdE. PLoS pathogens 8, e1002727.

Gupta, A., et Khanna, S. (2014). Community-acquired infection: an increasing public health threat. Infec Drug Resist 7, 63-72.

Hamilton, H.L., et Dillard, J.P. (2006). Natural transformation of Neisseria gonorrhoeae: from DNA donation to homologous recombination. Mol Microbiol 59, 376-385.

Heasman, S.J., et Ridley, A.J. (2008). Mammalian Rho GTPases: new insights into their functions from in vivo studies. Nat Rev Mol Cell Biol 9, 690-701.

Hecht, G.B., et Newton, A. (1995). Identification of a novel response regulator required for the swarmer-to-stalked-cell transition in Caulobacter crescentus. J Bacteriol 177, 6223-6229.

215 Helaine, S., Dyer, D.H., Nassif, X., Pelicic, V., et Forest, K.T. (2007). 3D structure/function analysis of PilX reveals how minor pilins can modulate the virulence properties of type IV pili. Proc Natl Acad Sci U S A 104, 15888-15893.

Hendrickx, A.P., Budzik, J.M., Oh, S.Y., et Schneewind, O. (2011). Architects at the bacterial surface - sortases and the assembly of pili with isopeptide bonds. Nat Rev Microbiol 9, 166- 176.

Hengge, R. (2009). Principles of c-di-GMP signalling in bacteria. Nat Rev Microbiol 7, 263- 273.

Hengge, R. (2010). Cyclic-di-GMP reaches out into the bacterial RNA world. Sci Signal 3, pe44.

Hennequin, C., Janoir, C., Barc, M.C., Collignon, A., et Karjalainen, T. (2003). Identification and characterization of a fibronectin-binding protein from Clostridium difficile. Microbiology 149, 2779-2787.

Henrichsen, J. (1972). Bacterial surface translocation: a survey and a classification. Bacteriol Rev 36, 478-503.

Henry, J.T., et Crosson, S. (2011). Ligand-binding PAS domains in a genomic, cellular, and structural context. Annu Rev Microbiol 65, 261-286.

Hensbergen, P.J., Klychnikov, O.I., Bakker, D., van Winden, V.J., Ras, N., Kemp, A.C., Cordfunke, R.A., Dragan, I., Deelder, A.M., Kuijper, E.J., et al. (2014). A novel secreted metalloprotease (CD2830) from Clostridium difficile cleaves specific proline sequences in LPXTG cell surface proteins. Mol Cell Proteomics 13, 1231-1244.

Hickman, J.W., et Harwood, C.S. (2008). Identification of FleQ from Pseudomonas aeruginosa as a c-di-GMP-responsive transcription factor. Mol Microbiol 69, 376-389.

Hickman, J.W., Tifrea, D.F., et Harwood, C.S. (2005). A chemosensory system that regulates biofilm formation through modulation of cyclic diguanylate levels. Proc Natl Acad Sci U S A 102, 14422-14427.

Hisert, K.B., MacCoss, M., Shiloh, M.U., Darwin, K.H., Singh, S., Jones, R.A., Ehrt, S., Zhang, Z., Gaffney, B.L., Gandotra, S., et al. (2005). A glutamate-alanine-leucine (EAL) domain protein of Salmonella controls bacterial survival in mice, antioxidant defence and killing of macrophages: role of cyclic diGMP. Mol Microbiol 56, 1234-1245.

216 Holland, L.M., O'Donnell, S.T., Ryjenkov, D.A., Gomelsky, L., Slater, S.R., Fey, P.D., Gomelsky, M., et O'Gara, J.P. (2008). A staphylococcal GGDEF domain protein regulates biofilm formation independently of cyclic dimeric GMP. J Bacteriol 190, 5178-5189.

Huang, Y., Huang, J., et Chen, Y. (2010). Alpha-helical cationic antimicrobial peptides: relationships of structure and function. Protein & cell 1, 143-152.

Hulo, N., Bairoch, A., Bulliard, V., Cerutti, L., De Castro, E., Langendijk-Genevaux, P.S., Pagni, M., et Sigrist, C.J. (2006). The PROSITE database. Nucleic Acids Res 34, D227-230.

Janoir, C., Pechine, S., Grosdidier, C., et Collignon, A. (2007). Cwp84, a surface-associated protein of Clostridium difficile, is a cysteine protease with degrading activity on extracellular matrix proteins. J Bacteriol 189, 7174-7180.

Johnson, T.L., Abendroth, J., Hol, W.G., et Sandkvist, M. (2006). Type II secretion: from structure to function. FEMS Microbiol Lett 255, 175-186.

Just, I., Selzer, J., Wilm, M., von Eichel-Streiber, C., Mann, M., et Aktories, K. (1995a). Glucosylation of Rho proteins by Clostridium difficile toxin B. Nature 375, 500-503.

Just, I., Wilm, M., Selzer, J., Rex, G., von Eichel-Streiber, C., Mann, M., et Aktories, K. (1995b). The enterotoxin from Clostridium difficile (ToxA) monoglucosylates the Rho proteins. J Biol Chem 270, 13932-13936.

Karaolis, D.K., Means, T.K., Yang, D., Takahashi, M., Yoshimura, T., Muraille, E., Philpott, D., Schroeder, J.T., Hyodo, M., Hayakawa, Y., et al. (2007). Bacterial c-di-GMP is an immunostimulatory molecule. J Immunol 178, 2171-2181.

Karjalainen, T., Waligora-Dupriet, A.J., Cerquetti, M., Spigaglia, P., Maggioni, A., Mauri, P., et , P. (2001). Molecular and genomic analysis of genes encoding surface- anchored proteins from Clostridium difficile. Infect Immun 69, 3442-3446.

Kataeva, I.A., Seidel, R.D., 3rd, Shah, A., West, L.T., Li, X.L., et Ljungdahl, L.G. (2002). The fibronectin type 3-like repeat from the Clostridium thermocellum cellobiohydrolase CbhA promotes hydrolysis of cellulose by modifying its surface. Appl Environ Microbiol 68, 4292- 4300.

Kelly, C.P., et Kyne, L. (2011). The host immune response to Clostridium difficile. J Med Microbiol 60, 1070-1079.

217 Kim, J.H., et Muder, R.R. (2011). Clostridium difficile enteritis: a review and pooled analysis of the cases. Anaerobe 17, 52-55.

Kim, J.N., et Breaker, R.R. (2008). Purine sensing by riboswitches. Biol Cell 100, 1-11.

Kirby, J.M., Ahern, H., Roberts, A.K., Kumar, V., Freeman, Z., Acharya, K.R., et Shone, C.C. (2009). Cwp84, a surface-associated cysteine protease, plays a role in the maturation of the surface layer of Clostridium difficile. J Biol Chem 284, 34666-34673.

Korotkov, K.V., Sandkvist, M., et Hol, W.G. (2012). The type II secretion system: biogenesis, molecular architecture and mechanism. Nat Rev Microbiol 10, 336-351.

Krasteva, P.V., Fong, J.C., Shikuma, N.J., Beyhan, S., Navarro, M.V., Yildiz, F.H., et Sondermann, H. (2010). Vibrio cholerae VpsT regulates matrix production and motility by directly sensing cyclic di-GMP. Science 327, 866-868.

Kuehne, S.A., Cartman, S.T., Heap, J.T., Kelly, M.L., Cockayne, A., et Minton, N.P. (2010). The role of toxin A and toxin B in Clostridium difficile infection. Nature 467, 711-713.

Kuijper, E.J., Coignard, B., et Tull, P. (2006). Emergence of Clostridium difficile-associated disease in North America and Europe. Clin Microbiol Infect 12 Suppl 6, 2-18.

Kulshina, N., Baird, N.J., et Ferre-D'Amare, A.R. (2009). Recognition of the bacterial second messenger cyclic diguanylate by its cognate riboswitch. Nat Struct Mol Biol 16, 1212-1217.

Lauer, P., Rinaudo, C.D., Soriani, M., Margarit, I., Maione, D., Rosini, R., Taddei, A.R., Mora, M., Rappuoli, R., Grandi, G., et al. (2005). Genome analysis reveals pili in Group B Streptococcus. Science 309, 105.

Leduc, J.L., et Roberts, G.P. (2009). Cyclic di-GMP allosterically inhibits the CRP-like protein (Clp) of Xanthomonas axonopodis pv. citri. J Bacteriol 191, 7121-7122.

Lee, E.R., Baker, J.L., Weinberg, Z., Sudarsan, N., et Breaker, R.R. (2010). An allosteric self- splicing ribozyme triggered by a bacterial second messenger. Science 329, 845-848.

Lee, V.T., Matewish, J.M., Kessler, J.L., Hyodo, M., Hayakawa, Y., et Lory, S. (2007). A cyclic-di-GMP receptor required for bacterial exopolysaccharide production. Mol Microbiol 65, 1474-1484.

218 Leontis, N.B., et Westhof, E. (2001). Geometric nomenclature and classification of RNA base pairs. RNA 7, 499-512.

Letunic, I., Copley, R.R., Pils, B., Pinkert, S., Schultz, J., et Bork, P. (2006). SMART 5: domains in the context of genomes and networks. Nucleic Acids Res 34, D257-260.

Liu, X., et Matsumura, P. (1995). An alternative sigma factor controls transcription of flagellar class-III operons in Escherichia coli: gene sequence, overproduction, purification and characterization. Gene 164, 81-84.

Lory, S., et Strom, M.S. (1997). Structure-function relationship of type-IV prepilin peptidase of Pseudomonas aeruginosa--a review. Gene 192, 117-121.

Lyras, D., O'Connor, J.R., Howarth, P.M., Sambol, S.P., Carter, G.P., Phumoonna, T., Poon, R., Adams, V., Vedantam, G., Johnson, S., et al. (2009). Toxin B is essential for virulence of Clostridium difficile. Nature 458, 1176-1179.

Lytle, B.L., Volkman, B.F., Westler, W.M., Heckman, M.P., et Wu, J.H. (2001). Solution structure of a type I dockerin domain, a novel prokaryotic, extracellular calcium-binding domain. J Mol Biol 307, 745-753.

MacCannell, D.R., Louie, T.J., Gregson, D.B., Laverdiere, M., Labbe, A.C., Laing, F., et Henwick, S. (2006). Molecular analysis of Clostridium difficile PCR ribotype 027 isolates from Eastern and Western Canada. J Clin Microbiol 44, 2147-2152.

Mackin, K.E., Carter, G.P., Howarth, P., Rood, J.I., et Lyras, D. (2013). Spo0A differentially regulates toxin production in evolutionarily diverse strains of Clostridium difficile. PLoS One 8, e79666.

Mandal, M., Boese, B., Barrick, J.E., Winkler, W.C., et Breaker, R.R. (2003). Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell 113, 577-586.

Marchler-Bauer, A., Lu, S., Anderson, J.B., Chitsaz, F., Derbyshire, M.K., DeWeese-Scott, C., Fong, J.H., Geer, L.Y., Geer, R.C., Gonzales, N.R., et al. (2011). CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res 39, D225-229.

Marraffini, L.A., Dedent, A.C., et Schneewind, O. (2006). Sortases and the art of anchoring proteins to the envelopes of gram-positive bacteria. Microbiol Mol Biol Rev 70, 192-221.

219 Martin, M.J., Clare, S., Goulding, D., Faulds-Pain, A., Barquist, L., Browne, H.P., Pettit, L., Dougan, G., Lawley, T.D., et Wren, B.W. (2013). The agr locus regulates virulence and colonization genes in Clostridium difficile 027. J Bacteriol 195, 3672-3681.

Martinez-Wilson, H.F., Tamayo, R., Tischler, A.D., Lazinski, D.W., et Camilli, A. (2008). The Vibrio cholerae hybrid sensor kinase VieS contributes to motility and biofilm regulation by altering the cyclic diguanylate level. J Bacteriol 190, 6439-6447.

Massie, J.P., Reynolds, E.L., Koestler, B.J., Cong, J.P., Agostoni, M., et Waters, C.M. (2012). Quantification of high-specificity cyclic diguanylate signaling. Proc Natl Acad Sci U S A 109, 12746-12751.

Matamouros, S., England, P., et Dupuy, B. (2007). Clostridium difficile toxin expression is inhibited by the novel regulator TcdC. Mol Microbiol 64, 1274-1288.

Mattick, J.S. (2002). Type IV pili and twitching motility. Annu Rev Microbiol 56, 289-314.

McKee, R.W., Mangalea, M.R., Purcell, E.B., Borchardt, E.K., et Tamayo, R. (2013). The Second Messenger Cyclic Di-GMP Regulates Clostridium difficile Toxin Production by Controlling Expression of sigD. J Bacteriol 195, 5174-5185.

Melville, S., et Craig, L. (2013). Type IV pili in Gram-positive bacteria. Microbiol Mol Biol Rev 77, 323-341.

Merighi, M., Lee, V.T., Hyodo, M., Hayakawa, Y., et Lory, S. (2007). The second messenger bis-(3'-5')-cyclic-GMP and its PilZ domain-containing receptor Alg44 are required for alginate biosynthesis in Pseudomonas aeruginosa. Mol Microbiol 65, 876-895.

Merrigan, M., Venugopal, A., Mallozzi, M., Roxas, B., Viswanathan, V.K., Johnson, S., Gerding, D.N., et Vedantam, G. (2010). Human hypervirulent Clostridium difficile strains exhibit increased sporulation as well as robust toxin production. J Bacteriol 192, 4904-4911.

Miller, B.A., Chen, L.F., Sexton, D.J., et Anderson, D.J. (2011). Comparison of the burdens of hospital-onset, healthcare facility-associated Clostridium difficile Infection and of healthcare- associated infection due to methicillin-resistant Staphylococcus aureus in community hospitals. Infect Control Hosp Epidemiol 32, 387-390.

Mills, E., Pultz, I.S., Kulasekara, H.D., et Miller, S.I. (2011). The bacterial second messenger c-di-GMP: mechanisms of signalling. Cell Microbiol 13, 1122-1129.

220 Mirel, D.B., et Chamberlin, M.J. (1989). The Bacillus subtilis flagellin gene (hag) is transcribed by the sigma 28 form of RNA polymerase. J Bacteriol 171, 3095-3101.

Montero Llopis, P., Jackson, A.F., Sliusarenko, O., Surovtsev, I., Heinritz, J., Emonet, T., et Jacobs-Wagner, C. (2010). Spatial organization of the flow of genetic information in bacteria. Nature 466, 77-81.

Mora, M., Bensi, G., Capo, S., Falugi, F., Zingaretti, C., Manetti, A.G., Maggi, T., Taddei, A.R., Grandi, G., et Telford, J.L. (2005). Group A Streptococcus produce pilus-like structures containing protective antigens and Lancefield T antigens. Proc Natl Acad Sci U S A 102, 15641-15646.

Morgan, J.L., McNamara, J.T., et Zimmer, J. (2014). Mechanism of activation of bacterial cellulose synthase by cyclic di-GMP. Nat Struct Mol Biol 21, 489-496.

Morton, C.J., et Campbell, I.D. (1994). SH3 domains. Molecular 'Velcro'. Current biology : CB 4, 615-617.

Murray, R., Boyd, D., Levett, P.N., Mulvey, M.R., et Alfa, M.J. (2009). Truncation in the tcdC region of the Clostridium difficile PathLoc of clinical isolates does not predict increased biological activity of Toxin B or Toxin A. BMC Infect Dis 9, 103.

Navarro, M.V., Newell, P.D., Krasteva, P.V., Chatterjee, D., Madden, D.R., O'Toole, G.A., et Sondermann, H. (2011). Structural basis for c-di-GMP-mediated inside-out signaling controlling periplasmic proteolysis. PLoS biology 9, e1000588.

Newell, P.D., Boyd, C.D., Sondermann, H., et O'Toole, G.A. (2011). A c-di-GMP effector system controls cell adhesion by inside-out signaling and surface protein cleavage. PLoS biology 9, e1000587.

Newell, P.D., Monds, R.D., et O'Toole, G.A. (2009). LapD is a bis-(3',5')-cyclic dimeric GMP-binding protein that regulates surface attachment by Pseudomonas fluorescens Pf0-1. Proc Natl Acad Sci U S A 106, 3461-3466.

O'Donoghue, C., et Kyne, L. (2011). Update on Clostridium difficile infection. Curr Opin Gastroenterol 27, 38-47.

Ohnishi, K., Kutsukake, K., Suzuki, H., et Iino, T. (1990). Gene fliA encodes an alternative sigma factor specific for flagellar operons in Salmonella typhimurium. Mol Gen Genet 21, 139-147.

221 Parvatiyar, K., Zhang, Z., Teles, R.M., Ouyang, S., Jiang, Y., Iyer, S.S., Zaver, S.A., Schenk, M., Zeng, S., Zhong, W., et al. (2012). The helicase DDX41 recognizes the bacterial secondary messengers cyclic di-GMP and cyclic di-AMP to activate a type I interferon immune response. Nat Immunol 13, 1155-1161.

Paul, K., Nieto, V., Carlquist, W.C., Blair, D.F., et Harshey, R.M. (2010). The c-di-GMP binding protein YcgR controls flagellar motor direction and speed to affect chemotaxis by a "backstop brake" mechanism. Mol Cell 38, 128-139.

Paul, R., Abel, S., Wassmann, P., Beck, A., Heerklotz, H., et Jenal, U. (2007). Activation of the diguanylate cyclase PleD by phosphorylation-mediated dimerization. J Biol Chem 282, 29170-29177.

Pelicic, V. (2008). Type IV pili: e pluribus unum? Mol Microbiol 68, 827-837.

Pepin, J. (2006). Improving the treatment of Clostridium difficile-associated disease: where should we start? Clin Infect Dis 43, 553-555.

Pepin, J., Valiquette, L., Gagnon, S., Routhier, S., et Brazeau, I. (2007). Outcomes of Clostridium difficile-associated disease treated with metronidazole or vancomycin before and after the emergence of NAP1/027. Am J Gastroenterol 102, 2781-2788.

Perry, R.D., Bobrov, A.G., Kirillina, O., Jones, H.A., Pedersen, L., Abney, J., et Fetherston, J.D. (2004). Temperature regulation of the hemin storage (Hms+) phenotype of Yersinia pestis is posttranscriptional. J Bacteriol 186, 1638-1647.

Pesavento, C., Becker, G., Sommerfeldt, N., Possling, A., Tschowri, N., Mehlis, A., et Hengge, R. (2008). Inverse regulatory coordination of motility and curli-mediated adhesion in Escherichia coli. Genes Dev 22, 2434-2446.

Pettit, L.J., Browne, H.P., Yu, L., Smits, W.K., Fagan, R.P., Barquist, L., Martin, M.J., Goulding, D., Duncan, S.H., Flint, H.J., et al. (2014). Functional genomics reveals that Clostridium difficile Spo0A coordinates sporulation, virulence and metabolism. BMC genomics 15, 160.

Poutanen, S.M., et Simor, A.E. (2004). Clostridium difficile-associated diarrhea in adults. CMAJ 171, 51-58.

Poxton, I.R. (2010). Fidaxomicin: a new macrocyclic, RNA polymerase-inhibiting antibiotic for the treatment of Clostridium difficile infections. Future Microbiol 5, 539-548.

222 Proft, T., et Baker, E.N. (2009). Pili in Gram-negative and Gram-positive bacteria - structure, assembly and their role in disease. Cell Mol Life Sci 66, 613-635.

Pugsley, A.P. (1993). The complete general secretory pathway in gram-negative bacteria. Microbiol Rev 57, 50-108.

Purcell, E.B., McKee, R.W., McBride, S.M., Waters, C.M., et Tamayo, R. (2012). Cyclic diguanylate inversely regulates motility and aggregation in Clostridium difficile. J Bacteriol 194, 3307-3316.

Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., et Lopez, R. (2005). InterProScan: protein domains identifier. Nucleic Acids Res 33, W116-120.

Rakotoarivonina, H., Jubelin, G., Hebraud, M., Gaillard-Martinie, B., Forano, E., et Mosoni, P. (2002). Adhesion to cellulose of the Gram-positive bacterium Ruminococcus albus involves type IV pili. Microbiology 148, 1871-1880.

Ratnayake-Lecamwasam, M., Serror, P., Wong, K.W., et Sonenshein, A.L. (2001). Bacillus subtilis CodY represses early-stationary-phase genes by sensing GTP levels. Genes Dev 15, 1093-1103.

Reynolds, C.B., Emerson, J.E., de la Riva, L., Fagan, R.P., et Fairweather, N.F. (2011). The Clostridium difficile cell wall protein CwpV is antigenically variable between strains, but exhibits conserved aggregation-promoting function. PLoS Pathog 7, e1002024.

Rocha, E.P. (2008). The organization of the bacterial genome. Annual review of genetics 42, 211-233.

Romling, U., Galperin, M.Y., et Gomelsky, M. (2013). Cyclic di-GMP: the first 25 years of a universal bacterial second messenger. Microbiol Mol Biol Rev 77, 1-52.

Romling, U., Gomelsky, M., et Galperin, M.Y. (2005). C-di-GMP: the dawning of a novel bacterial signalling system. Mol Microbiol 57, 629-639.

Ross, P., Weinhouse, H., Aloni, Y., Michaeli, D., Weinberger-Ohana, P., Mayer, R., Braun, S., de Vroom, E., van der Marel, G.A., van Boom, J.H., et al. (1987). Regulation of cellulose synthesis in Acetobacter xylinum by cyclic diguanylic acid. Nature 325, 279-281.

Rupnik, M., Wilcox, M.H., et Gerding, D.N. (2009). Clostridium difficile infection: new developments in epidemiology and pathogenesis. Nat Rev Microbiol 7, 526-536.

223 Ryan, R.P. (2013). Cyclic di-GMP signalling and the regulation of bacterial virulence. Microbiology 159, 1286-1297.

Ryan, R.P., Fouhy, Y., Lucey, J.F., Crossman, L.C., Spiro, S., He, Y.W., Zhang, L.H., Heeb, S., Camara, M., Williams, P., et al. (2006). Cell-cell signaling in Xanthomonas campestris involves an HD-GYP domain protein that functions in cyclic di-GMP turnover. Proc Natl Acad Sci U S A 103, 6712-6717.

Ryjenkov, D.A., Simm, R., Romling, U., et Gomelsky, M. (2006). The PilZ domain is a receptor for the second messenger c-di-GMP: the PilZ domain protein YcgR controls motility in enterobacteria. J Biol Chem 281, 30310-30314.

Ryjenkov, D.A., Tarutina, M., Moskvin, O.V., et Gomelsky, M. (2005). Cyclic diguanylate is a ubiquitous signaling molecule in bacteria: insights into biochemistry of the GGDEF protein domain. J Bacteriol 187, 1792-1798.

Sauer, J.D., Sotelo-Troha, K., von Moltke, J., Monroe, K.M., Rae, C.S., Brubaker, S.W., Hyodo, M., Hayakawa, Y., Woodward, J.J., Portnoy, D.A., et al. (2011). The N-ethyl-N- nitrosourea-induced Goldenticket mouse mutant reveals an essential function of Sting in the in vivo interferon response to Listeria monocytogenes and cyclic dinucleotides. Infect Immun 79, 688-694.

Saujet, L., Monot, M., Dupuy, B., Soutourina, O., et Martin-Verstraete, I. (2011). The key sigma factor of transition phase, SigH, controls sporulation, metabolism, and virulence factor expression in Clostridium difficile. J Bacteriol 193, 3186-3196.

Schmidt, A.J., Ryjenkov, D.A., et Gomelsky, M. (2005). The ubiquitous protein domain EAL is a cyclic diguanylate-specific phosphodiesterase: enzymatically active and inactive EAL domains. J Bacteriol 187, 4774-4781.

Schwan, C., Kruppke, A.S., Nolke, T., Schumacher, L., Koch-Nolte, F., Kudryashev, M., Stahlberg, H., et Aktories, K. (2014). Clostridium difficile toxin CDT hijacks microtubule organization and reroutes vesicle traffic to increase pathogen adherence. Proc Natl Acad Sci U S A 111, 2313-2318.

Schwan, C., Stecher, B., Tzivelekidis, T., van Ham, M., Rohde, M., Hardt, W.D., Wehland, J., et Aktories, K. (2009). Clostridium difficile toxin CDT induces formation of microtubule- based protrusions and increases adherence of bacteria. PLoS Pathog 5, e1000626.

224 Sebaihia, M., Wren, B.W., Mullany, P., Fairweather, N.F., Minton, N., Stabler, R., Thomson, N.R., Roberts, A.P., Cerdeno-Tarraga, A.M., Wang, H., et al. (2006). The multidrug-resistant human pathogen Clostridium difficile has a highly mobile, mosaic genome. Nat Genet 38, 779- 786.

Seshasayee, A.S., Fraser, G.M., et Luscombe, N.M. (2010). Comparative genomics of cyclic- di-GMP signalling in bacteria: post-translational regulation and catalytic activity. Nucleic Acids Res 38, 5970-5981.

Shanahan, C.A., Gaffney, B.L., Jones, R.A., et Strobel, S.A. (2011). Differential Analogue Binding by Two Classes of c-di-GMP Riboswitches. J Am Chem Soc 133, 15578-15592.

Shang, F., Xue, T., Sun, H., Xing, L., Zhang, S., Yang, Z., Zhang, L., et Sun, B. (2009). The Staphylococcus aureus GGDEF domain-containing protein, GdpS, influences protein A gene expression in a cyclic diguanylic acid-independent manner. Infect Immun 77, 2849-2856.

Shivers, R.P., et Sonenshein, A.L. (2004). Activation of the Bacillus subtilis global regulator CodY by direct interaction with branched-chain amino acids. Mol Microbiol 53, 599-611.

Simm, R., Morr, M., Remminghorst, U., Andersson, M., et Romling, U. (2009). Quantitative determination of cyclic diguanosine monophosphate concentrations in nucleotide extracts of bacteria by matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry. Anal Biochem 386, 53-58.

Sirard, S., Valiquette, L., et Fortier, L.C. (2011). Lack of association between clinical outcome of Clostridium difficile infections, strain type, and virulence-associated phenotypes. J Clin Microbiol 49, 4040-4046.

Smith, K.D., Lipchock, S.V., Ames, T.D., Wang, J., Breaker, R.R., et Strobel, S.A. (2009). Structural basis of ligand binding by a c-di-GMP riboswitch. Nat Struct Mol Biol 16, 1218- 1223.

Smith, K.D., Shanahan, C.A., Moore, E.L., Simon, A.C., et Strobel, S.A. (2011). Structural basis of differential ligand recognition by two classes of bis-(3'-5')-cyclic dimeric guanosine monophosphate-binding riboswitches. Proc Natl Acad Sci U S A 108, 7757-7762.

Snyder, E.E., Kampanya, N., Lu, J., Nordberg, E.K., Karur, H.R., Shukla, M., Soneja, J., Tian, Y., Xue, T., Yoo, H., et al. (2007). PATRIC: the VBI PathoSystems Resource Integration Center. Nucleic Acids Res 35, D401-406.

225 Sonenshein, A.L. (2005). CodY, a global regulator of stationary phase and virulence in Gram- positive bacteria. Curr Opin Microbiol 8, 203-207.

Soutourina, O.A., Monot, M., Boudry, P., Saujet, L., Pichon, C., Sismeiro, O., Semenova, E., Severinov, K., Le Bouguenec, C., Coppee, J.Y., et al. (2013). Genome-wide identification of regulatory RNAs in the human pathogen Clostridium difficile. PLoS Genet 9, e1003493.

Stabler, R.A., He, M., Dawson, L., Martin, M., Valiente, E., Corton, C., Lawley, T.D., Sebaihia, M., Quail, M.A., Rose, G., et al. (2009). Comparative genome and phenotypic analysis of Clostridium difficile 027 strains provides insight into the evolution of a hypervirulent bacterium. Genome Biol 10, R102.

Stenz, L., Francois, P., Whiteson, K., Wolz, C., Linder, P., et Schrenzel, J. (2011). The CodY pleiotropic repressor controls virulence in gram-positive pathogens. FEMS Immunol Med Microbiol 62, 123-139.

Stimson, E., Virji, M., Makepeace, K., Dell, A., Morris, H.R., Payne, G., Saunders, J.R., Jennings, M.P., Barker, S., Panico, M., et al. (1995). Meningococcal pilin: a glycoprotein substituted with digalactosyl 2,4-diacetamido-2,4,6-trideoxyhexose. Mol Microbiol 17, 1201- 1214.

Sudarsan, N., Lee, E.R., Weinberg, Z., Moy, R.H., Kim, J.N., Link, K.H., et Breaker, R.R. (2008). Riboswitches in eubacteria sense the second messenger cyclic di-GMP. Science 321, 411-413.

Syed, K.A., Beyhan, S., Correa, N., Queen, J., Liu, J., Peng, F., Satchell, K.J., Yildiz, F., et Klose, K.E. (2009). The Vibrio cholerae flagellar regulatory hierarchy controls expression of virulence factors. J Bacteriol 191, 6555-6570.

Tal, R., Wong, H.C., Calhoon, R., Gelfand, D., Fear, A.L., Volman, G., Mayer, R., Ross, P., Amikam, D., Weinhouse, H., et al. (1998). Three cdg operons control cellular turnover of cyclic di-GMP in Acetobacter xylinum: genetic organization and occurrence of conserved domains in isoenzymes. J Bacteriol 180, 4416-4425.

Tamayo, R., Tischler, A.D., et Camilli, A. (2005). The EAL domain protein VieA is a cyclic diguanylate phosphodiesterase. J Biol Chem 280, 33324-33330.

Tan, K.S., Wee, B.Y., et Song, K.P. (2001). Evidence for holin function of tcdE gene in the pathogenicity of Clostridium difficile. J Med Microbiol 50, 613-619.

226 Tannock, G.W., Munro, K., Taylor, C., Lawley, B., Young, W., Byrne, B., Emery, J., et Louie, T. (2010). A new macrocyclic antibiotic, fidaxomicin (OPT-80), causes less alteration to the bowel microbiota of Clostridium difficile-infected patients than does vancomycin. Microbiology 156, 3354-3359.

Tao, F., He, Y.W., Wu, D.H., Swarup, S., et Zhang, L.H. (2010). The cyclic nucleotide monophosphate domain of Xanthomonas campestris global regulator Clp defines a new class of cyclic di-GMP effectors. J Bacteriol 192, 1020-1029.

Tasteyre, A., Barc, M.C., Collignon, A., Boureau, H., et Karjalainen, T. (2001). Role of FliC and FliD flagellar proteins of Clostridium difficile in adherence and gut colonization. Infect Immun 69, 7937-7940.

Taur, Y., et Pamer, E.G. (2014). Harnessing microbiota to kill a pathogen: Fixing the microbiota to treat Clostridium difficile infections. Nat Med 20, 246-247.

Tischler, A.D., Lee, S.H., et Camilli, A. (2002). The Vibrio cholerae vieSAB locus encodes a pathway contributing to cholera toxin production. J Bacteriol 184, 4104-4113.

Traynor, K. (2011). Fidaxomicin approved for C. difficile infections. Am J Health Syst Pharm 68, 1276.

Tuckerman, J.R., Gonzalez, G., et Gilles-Gonzalez, M.A. (2011). Cyclic di-GMP activation of polynucleotide phosphorylase signal-dependent RNA processing. J Mol Biol 407, 633-639.

Tuckerman, J.R., Gonzalez, G., Sousa, E.H., Wan, X., Saito, J.A., Alam, M., et Gilles- Gonzalez, M.A. (2009). An oxygen-sensing diguanylate cyclase and phosphodiesterase couple for c-di-GMP control. Biochemistry 48, 9764-9774.

Tulli, L., Marchi, S., Petracca, R., Shaw, H.A., Fairweather, N.F., Scarselli, M., Soriani, M., et Leuzzi, R. (2013). CbpA: a novel surface exposed adhesin of Clostridium difficile targeting human collagen. Cell Microbiol 15, 1674-1687.

Valiente, E., Dawson, L.F., Cairns, M.D., Stabler, R.A., et Wren, B.W. (2012). Emergence of new PCR ribotypes from the hypervirulent Clostridium difficile 027 lineage. J Med Microbiol 61, 49-56.

Varga, J.J., Nguyen, V., O'Brien, D.K., Rodgers, K., Walker, R.A., et Melville, S.B. (2006). Type IV pili-dependent gliding motility in the Gram-positive pathogen Clostridium perfringens and other Clostridia. Mol Microbiol 62, 680-694.

227 Varga, J.J., Therit, B., et Melville, S.B. (2008). Type IV pili and the CcpA protein are needed for maximal biofilm formation by the Gram-positive anaerobic pathogen Clostridium perfringens. Infect Immun 76, 4944-4951.

Voisin, S., Kus, J.V., Houliston, S., St-Michael, F., Watson, D., Cvitkovitch, D.G., Kelly, J., Brisson, J.R., et Burrows, L.L. (2007). Glycosylation of Pseudomonas aeruginosa strain Pa5196 type IV pilins with mycobacterium-like alpha-1,5-linked d-Araf oligosaccharides. J Bacteriol 189, 151-159.

Waligora, A.J., Hennequin, C., Mullany, P., Bourlioux, P., Collignon, A., et Karjalainen, T. (2001). Characterization of a cell surface protein of Clostridium difficile with adhesive properties. Infect Immun 69, 2144-2153.

Walk, S.T., Micic, D., Jain, R., Lo, E.S., Trivedi, I., Liu, E.W., Almassalha, L.M., Ewing, S.A., Ring, C., Galecki, A.T., et al. (2012). Clostridium difficile ribotype does not predict severe infection. Clin Infect Dis 55, 1661-1668.

Warny, M., Pepin, J., Fang, A., Killgore, G., Thompson, A., Brazier, J., Frost, E., et McDonald, L.C. (2005). Toxin production by an emerging strain of Clostridium difficile associated with outbreaks of severe disease in North America and Europe. Lancet 366, 1079- 1084.

Wassmann, P., Chan, C., Paul, R., Beck, A., Heerklotz, H., Jenal, U., et Schirmer, T. (2007). Structure of BeF3- -modified response regulator PleD: implications for diguanylate cyclase activation, catalysis, and feedback inhibition. Structure 15, 915-927.

Weber, H., Pesavento, C., Possling, A., Tischendorf, G., et Hengge, R. (2006). Cyclic-di- GMP-mediated signalling within the sigma network of Escherichia coli. Mol Microbiol 62, 1014-1034.

Weinberg, Z., Barrick, J.E., Yao, Z., Roth, A., Kim, J.N., Gore, J., Wang, J.X., Lee, E.R., Block, K.F., Sudarsan, N., et al. (2007). Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucleic Acids Res 35, 4809- 4819.

Whitney, J.C., Colvin, K.M., Marmont, L.S., Robinson, H., Parsek, M.R., et Howell, P.L. (2012). Structure of the cytoplasmic region of PelD, a degenerate diguanylate cyclase receptor that regulates exopolysaccharide production in Pseudomonas aeruginosa. J Biol Chem 287, 23582-23593.

228 Wilson, K.H., Kennedy, M.J., et Fekety, F.R. (1982). Use of sodium taurocholate to enhance spore recovery on a medium selective for Clostridium difficile. J Clin Microbiol 15, 443-446.

Wilson, K.H., Sheagren, J.N., et Freter, R. (1985). Population dynamics of ingested Clostridium difficile in the gastrointestinal tract of the Syrian hamster. J Infect Dis 151, 355- 361.

Winther-Larsen, H.C., Wolfgang, M., Dunham, S., van Putten, J.P., Dorward, D., Lovold, C., Aas, F.E., et Koomey, M. (2005). A conserved set of pilin-like molecules controls type IV pilus dynamics and organelle-associated functions in Neisseria gonorrhoeae. Mol Microbiol 56, 903-917.

Won, H.S., Lee, Y.S., Lee, S.H., et Lee, B.J. (2009). Structural overview on the allosteric activation of cyclic AMP receptor protein. Biochim Biophys Acta 1794, 1299-1308.

Yanagawa, R., et Honda, E. (1976). Presence of pili in species of human and animal parasites and pathogens of the genus Corynebacterium. Infect Immun 13, 1293-1295.

Yanagawa, R., Otsuki, K., et Tokui, T. (1968). Electron microscopy of fine structure of Corynebacterium renale with special reference to pili. Jpn J Vet Res 16, 31-37.

Yeats, C., Maibaum, M., Marsden, R., Dibley, M., Lee, D., Addou, S., et Orengo, C.A. (2006). Gene3D: modelling protein structure, function and evolution. Nucleic Acids Res 34, D281- 284.

Yi, X., Yamazaki, A., Biddle, E., Zeng, Q., et Yang, C.H. (2010). Genetic analysis of two phosphodiesterases reveals cyclic diguanylate regulation of virulence factors in Dickeya dadantii. Mol Microbiol 77, 787-800.

Yutin, N., et Galperin, M.Y. (2013). A genomic update on clostridial phylogeny: Gram- negative spore formers and other misplaced clostridia. Environ Microbiol 15, 2631-2641.

Zaccolo, M., Di Benedetto, G., Lissandron, V., Mancuso, L., Terrin, A., et Zamparo, I. (2006). Restricted diffusion of a freely diffusible second messenger: mechanisms underlying compartmentalized cAMP signalling. Biochem Soc Trans 34, 495-497.

Zhou, M., Boekhorst, J., Francke, C., et Siezen, R.J. (2008). LocateP: genome-scale subcellular-location predictor for bacterial proteins. BMC Bioinformatics 9, 173.

229