CONTROL OF RNA STRUCTURE BY CSPA PROTEINS IN RHIZOBIA

By

JASON PEELE PRICE

A dissertation submitted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

WASHINGTON STATE UNIVERSITY School of Molecular Biosciences

JULY 2017

© Copyright by JASON PEELE PRICE, 2017 All Rights Reserved

To the Faculty of Washington State University: The members of the Committee appointed to examine the dissertation of JASON PEELE PRICE find it satisfactory and recommend that it be accepted.

______Michael L. Kahn, Ph.D., Chair

______Jon M. Oatley, Ph.D.

______Cynthia A. Haseltine, Ph.D.

______Kelly A. Brayton, Ph.D.

______Svetlana N. Yurgel, Ph.D.

ACKNOWLEDGMENT

First, I would like to thank my PhD advisor, Dr. Michael Kahn, for his guidance and mentorship, both in science and in life, over the course of my graduate studies.

I am grateful to my committee members, Dr. Jon Oatley, Dr. Kelly Brayton, Dr. Cynthia Haseltine, and Dr. Svetlana Yurgel for their interest and support in my work.

I sincerely thank fellow lab members, both past and present, for sharing this adventure with me and for all the good times.

Finally, I would like to thank my friends and family for their unconditional love and support.

CONTROL OF RNA STRUCTURE BY CSPA PROTEINS IN RHIZOBIA

Abstract

by Jason Peele Price, Ph.D. Washington State University July 2017

Chair: Michael L. Kahn

Rhizobia are soil bacteria that can associate with some legumes and participate in symbiotic nitrogen fixation. Bacterial CspA family members are small, single-stranded nucleic acid binding proteins conserved throughout all domains of life. Here, the role of CspA family proteins in the symbiotic development of Sinorhizobium meliloti with Medicago sativa (alfalfa) is investigated. Expression and genetic deletion strain analysis revealed that CspA family proteins are differentially expressed and contribute to symbiotic effectiveness.

RNAseq analysis of native co-immunoprecipitated RNAs identified a novel interaction between several CspA family proteins and the αR14 family of small non-coding RNA (sRNAs). Whole transcriptome analysis defined transcriptional defects associated with loss of CspA function.

The development of a new in vitro RNA binding assay using broccoli, a Green Fluorescent Protein (GFP) RNA mimic, is described as well as its use in defining binding specificity of CspA family proteins with synthetic and native αR14 family sRNA structures. This work concludes that CspA family proteins interact with and influence the stability of specific RNA structures and these interactions control RNA regulated processes important for symbiotic development.

TABLE OF CONTENTS

Page

ACKNOWLEDGMENT ...... iii

ABSTRACT ...... iv

LIST OF TABLES ...... vii

LIST OF FIGURES ...... viii

Dedication ...... xi

CHAPTER 1. General Introduction...... 1

Nitrogen and Agriculture ...... 1

Symbiotic Nitrogen Fixation ...... 2

RNA Structure Control of Gene Expression ...... 3

Bacterial CspA Family Proteins Control of RNA Structure ...... 4

CspA Family Proteins in Symbiotic Nitrogen Fixation ...... 7

REFERENCES ...... 9

CHAPTER 2. Sinorhizobium meliloti CspA family members mediate stability of structured sRNAs contributing to stress adaptation and effective symbiosis with alfalfa ...... 30

PREFACE ...... 30

ABSTRACT ...... 31

INTRODUCTION ...... 33

METHODS ...... 37

RESULTS ...... 48

DISCUSSION ...... 60

REFERENCES ...... 69

CHAPTER 3. A new kinetic fluorescent RNA binding assay for RNA chaperone activity reveals cooperative binding of S. meliloti CspA2 and CspA4 with αR14 sRNA family targets ...... 110

PREFACE ...... 110

ABSTRACT ...... 111

INTRODUCTION ...... 112

METHODS ...... 114

RESULTS ...... 119

DISCUSSION ...... 125

REFERENCES ...... 131

CHAPTER 4. Conclusions and Perspectives ...... 158

PREFACE ...... 158

S. meliloti CspAs family proteins and bacterial stress adaptation ...... 159

S. meliloti CspAs family proteins and rhizobia – symbiosis ...... 161

CspA family proteins and αR14 family sRNAs ...... 164

The broccoli RNA binding assay ...... 168

Concluding remarks ...... 169

LIST OF TABLES

CHAPTER 2

TABLE S1. Genes corresponding to RNA significantly enriched by co-IP with CspA2-GFP ...... 106

TABLE S2. Genes corresponding to RNA significantly enriched by co-IP with CspA4-GFP ...... 107

TABLE S3. Genes corresponding to RNA significantly enriched by co-IP with CspA5-GFP ...... 108

TABLE S4. Strains and ...... 109

CHAPTER 3

TABLE S1. Broccoli-tag sequences ...... 156

TABLE S2. List of calculated binding constants...... 157

LIST OF FIGURES

CHAPTER 1

Figure 1. The World depends on Haber-Bosch nitrogen...... 14

Figure 2. Symbiotic nitrogen fixation...... 16

Figure 3. Bacterial differentiation within an indeterminate nodule...... 18

Figure 4. Bacterial RNA structure control of gene expression...... 20

Figure 5. CspA family protein structure...... 22

Figure 6. CspA role in cold shock...... 24

Figure 7. CspA control of RNA structures...... 26

Figure 8. CspA family expression in symbiosis...... 28

CHAPTER 2

Figure 1. Multiple sequence alignment of the S. meliloti CspA protein family members with E. coli CspA...... 75

Figure 2. Symbiotic nodule zone specific and free-living stress responsive expression of S. meliloti CspAs...... 78

Figure 3. Free-living and symbiotic phenotypes of S. meliloti cspA deletion strains...... 80

Figure 4. CspA-GFP fusion native immunoprecipitation followed by RNA sequencing...... 82

Figure 5. rpoE2 expression analysis from whole transcriptome sequencing of free-living Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains...... 84

Figure 6. αR14 family sRNA and rpoE8 expression analysis from whole transcriptome sequencing of Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains from nodules...... 86

Figure 7. Cartoon representing CspA interactions in transcription antitermination, sRNA stabilization, and translation initiation...... 88

Figure S1. CspA family amino acid sequence alignments...... 90

Figure S2. S. meliloti CspA family transcript and eGFP fusion expression...... 92

Figure S3. Endogenous GFP-tagging strategy...... 94

Figure S4. (A) Generation of ΔcspA2 and ΔcspA4 strains...... 96

Figure S5. Antibiotic sensitivity phenotypes of ΔcspA2 and ΔcspA4 strains...... 98

Figure S6. S. meliloti CspA – RNA target interactions...... 100

Figure S7. Illumina sequencing data from CspA IPs and ΔCspA strain whole transcriptome sequencing...... 102

Figure S8. αR14 sRNA family members zone specific expression...... 104

CHAPTER 3

Figure 1. Broccoli-tags and the broccoli kinetic binding assay...... 135

Figure 2. Comparison of loop sequence specificity of CspA2 and CspA4 for polyG9-loop vs polyUCC3-loop broccoli-tags...... 138

Figure 3. Smr14C2 Arm1 and Arm2 broccoli-tags ...... 140

Figure 4. Binding kinetics of CspA2 and CspA4 with Arm1 and Arm2 broccoli tags ...... 142

Figure 5. CspA2 and CspA4 rescue kinetically trapped alternative structures and cDNA inhibited structures...... 144

Figure 6. Cartoon of CspA mechanism in the broccoli kinetic binding assay...... 146

Figure S1. Broccoli tag generation scheme...... 148

Figure S2. Purified proteins and RNAs...... 150

Figure S3. 2-phase association modeling of broccoli kinetic binding assay curves...... 152

Figure S4. Smr14C2 Arm2 and Arm2 cDNA Inhibitor...... 154

CHAPTER 4

Figure 1. Diagram of the RNA response to nitrogen and phosphate stress conditions...... 172

Figure 2. Differential gene expression plots of the NSR and NSR(p)...... 174

Figure 3. Conflicting gene expression through central carbon metabolism...... 176

Dedication

This dissertation is dedicated to my mother and father.

CHAPTER 1. General Introduction

Nitrogen and Agriculture

Nitrogen is the nutrient that most often limits crop yield in agriculture. Although the atmosphere has abundant chemically inert di-nitrogen (N2), plants are unable to utilize this form of nitrogen and require nitrogen to be chemically combined into a form that can be assimilated more readily (Fig. 1A). The industrial reduction of atmospheric nitrogen to ammonia by the Haber-Bosch process, which is the first step in producing most nitrogen fertilizers, is the major source of fixed nitrogen for current agricultural practices. However, chemical fixation of nitrogen is an inefficient and expensive means of supplying crops with the nitrogen they need (Erisman et al., 2008; Gu et al., 2013; Stein and Klotz, 2016). Invented in the early 20th century by German chemists Fritz Haber and Carl Bosch, the Haber-Bosch process reacts nitrogen (N2) with hydrogen (H2) derived from methane in natural gas at high temperature and extreme pressure to produce ammonia (NH3) on an industrial scale (Fig. 1B). Despite the substantial monetary and environmental costs of using nitrogen fertilizers, the world’s population has become dependent on chemical nitrogen fixation (Fig. 1C) (Erisman et al., 2008).
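As a compact summary of the reaction described above, the overall stoichiometry can be written as a balanced equation. This is only a sketch of the net conversion of the reactants and product named in the text; the catalyst, temperature, and pressure are not specified here and are deliberately omitted.

```latex
\[
\mathrm{N_2} + 3\,\mathrm{H_2} \longrightarrow 2\,\mathrm{NH_3}
\]
```

Each molecule of di-nitrogen therefore consumes three molecules of hydrogen, which is why the process is tied to a hydrogen source such as methane.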

The use of chemically fixed nitrogen fertilizers produced from the Haber-Bosch process was introduced into agricultural practice on a large scale beginning in the second half of the 20th century. The exponential increase of nitrogen input into agriculture over the past century is directly responsible for the exponential growth of the human population over the same time period, and it is estimated that approximately half of the world’s population is supported by the introduction of Haber-Bosch nitrogen into modern agriculture (Erisman et al., 2008; Smil, 2002).

It is thus essential that alternatives to the continued substantial use of chemically fixed nitrogen fertilizers be explored to meet the needs of the growing population. Biological nitrogen fixation, carried out by bacteria such as those involved in symbiosis between rhizobia and legumes, represents such an alternative (Fig. 1D) (Hirsch and Mauchline, 2015).

Symbiotic Nitrogen Fixation

Rhizobial bacteria fix atmospheric N2 into ammonia in symbiosis with some legume crop species, such as alfalfa (Fig. 2A). The symbiotic association of rhizobia with leguminous plants allows the plant to receive the necessary amount of fixed nitrogen for optimal growth and can enrich the surrounding soil with fixed nitrogen for succeeding crops. The bacteria initiate infection at the root hairs of the plant, and infection results in differentiation of both bacterial and plant cells and formation of a specialized root organ called a nodule (Fig. 2B). Within the nodule, differentiated rhizobial bacteria (bacteroids) fix nitrogen within subcellular organelles called symbiosomes and export that nitrogen to the plant. The plant supplies the bacteria with nutrients and protects the bacteria from the outside environment (Fig. 2C). Carbon and reducing potential in the form of dicarboxylic acids like malate are supplied to the bacteria from the cytoplasm of infected plant cells. The energy from catabolism of these compounds drives the enzymatic activity that converts N2 into ammonia, which the plant can ultimately incorporate into amino acids and transport out of the nodule and into other plant tissues.

Establishment of effective symbiosis requires a massive shift in rhizobial gene expression between the free-living and symbiotic states (Fig. 3A, 3B, and 3C). Indeterminate nodules, like those found in the symbiosis between rhizobia and alfalfa, mature by elongation of a plant root meristem, resulting in the formation of metabolically distinct zones (Łotocka et al., 2012; Roux et al., 2014; Vasse et al., 1990). Zone I (ZI) consists of the plant meristem; Zone II (ZII) is the site of bacterial infection; the interzone (IZ) is the region between ZII and Zone III (ZIII), where the bacteria begin to transition into nitrogen-fixing bacteroids and where a microaerobic environment suitable for nitrogen fixation is established; and ZIII is the site of active nitrogen fixation, where fully differentiated bacteroids convert atmospheric N2 to ammonia (Fig. 3A). Zone IV appears later in development and is termed the senescent zone, where once-active symbiosomes are degraded and recycled. How the extreme transformation in gene expression is regulated as rhizobia transition between free-living and symbiotic states within the plant cell is of significant interest. A better understanding of the factors governing rhizobial development will contribute to efforts aimed at improving the application of symbiotic nitrogen fixation in agriculture and alleviating the world’s dependence upon chemically fixed Haber-Bosch nitrogen.

RNA Structure Control of Gene Expression

RNA structure regulates gene expression in prokaryotes at the level of transcription termination, translation initiation, and mRNA stability (Lalaouna et al., 2013; Meyer, 2017; Zhang and Landick, 2016). RNA structures formed within nascent transcripts define RNA polymerase processivity. The formation of secondary structures by intrinsic terminator sequences destabilizes RNA polymerase interactions with the DNA template and increases the frequency of transcription termination. In contrast, the formation of competing, alternative secondary structures that block the folding of terminator structures promotes transcript elongation, or antitermination (Fig. 4A) (Santangelo and Artsimovitch, 2011).

Moreover, the dynamic process of folding into terminator or antiterminator alternatives can be influenced by many factors, such as cellular metabolite concentrations (Price et al., 2014), protein factors (Ait-Bara et al., 2017), and environmental conditions such as temperature (Krajewski and Narberhaus, 2014). In a very similar manner, the secondary structure of mRNA at or near the sites where translation begins can define the success of ribosome binding and the translational efficiency of that mRNA (Meyer, 2017). Translation initiation in prokaryotes occurs through recognition of the Shine-Dalgarno ribosome binding site (RBS) by the 30S ribosomal subunit. Masking the RBS in a base-paired stem region of an RNA hairpin structure inhibits translation (Fig. 4B). As with the structural switches that influence terminator / antiterminator RNA conformations, the transition between open and closed RBS structures can be influenced by metabolite concentrations (Meyer, 2017), metal ion concentrations (Wedekind et al., 2017), and environmental factors such as temperature (Kortmann and Narberhaus, 2012). In some cases, structures are able to respond to multiple influences, such as temperature and metabolite concentration (Giuliodori et al., 2010; Reining et al., 2013). Understanding the roles of factors that mediate the formation and operation of these key RNA structures is thus an area of considerable interest.
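The idea of an RBS being masked in a stem or exposed in an open conformation can be illustrated with a small sketch. The code below assumes a secondary structure is already available in dot-bracket notation and simply measures how much of a Shine-Dalgarno-like motif is unpaired. The toy sequence, the two structures, the default motif `AGGAGG`, and the function names are all hypothetical and chosen for illustration; nothing here is taken from the analyses in this dissertation, and no structure prediction is performed.

```python
from typing import Optional


def paired_fraction(structure: str, start: int, end: int) -> float:
    """Fraction of positions in [start, end) that are base-paired.

    In dot-bracket notation '(' and ')' mark paired bases and '.' marks
    unpaired bases.
    """
    window = structure[start:end]
    return sum(ch in "()" for ch in window) / len(window)


def rbs_accessibility(seq: str, structure: str,
                      motif: str = "AGGAGG") -> Optional[float]:
    """Return the unpaired fraction of the first motif occurrence.

    1.0 means the motif is fully single-stranded (open RBS); 0.0 means it
    is fully sequestered in a stem (closed RBS). Returns None when the
    motif is absent. The default motif is an illustrative assumption.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    idx = seq.find(motif)
    if idx < 0:
        return None
    return 1.0 - paired_fraction(structure, idx, idx + len(motif))


# Hypothetical 22-nt leader: an anti-Shine-Dalgarno stretch (CCUCCU) can
# pair with the Shine-Dalgarno motif (AGGAGG) across a 4-nt loop.
TOY_SEQ = "CCUCCUGAAAAGGAGGAAUAUG"
CLOSED = "((((((....))))))......"   # hairpin formed, motif inside the stem
OPEN = "." * len(TOY_SEQ)           # hairpin melted, motif single-stranded

closed_access = rbs_accessibility(TOY_SEQ, CLOSED)
open_access = rbs_accessibility(TOY_SEQ, OPEN)
```

In this toy case the closed hairpin leaves none of the motif accessible and the melted conformation leaves all of it accessible, which mirrors the open and closed RBS states described in the text.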

Bacterial CspA Family Proteins Control of RNA Structure

The bacterial Cold Shock Protein A (CspA) family of proteins are RNA binding proteins that contain at least one Cold Shock Domain (CSD), an ancient, single-stranded nucleic acid binding structure found throughout all domains of life (Horn et al., 2007; Lee et al., 1994). CspA family proteins are small, typically around 70 amino acids long. The nucleic acid binding motif determined by X-ray crystallography consists of five antiparallel β-strands organized into a β-barrel. Highly conserved RNA binding motifs, RNP-1 and RNP-2, are located on the 2nd and 3rd β-strands and contain several surface-exposed aromatic amino acids flanked by positively charged residues (Sachs et al., 2012) (Fig. 5A). The outward-facing aromatic amino acid residues participate in base stacking interactions with single-stranded nucleic acid and are responsible for the characteristic RNA chaperone activity of the family (Fig. 5B) (Phadtare et al., 2004; Rennella et al., 2017; Sachs et al., 2012).

CspA family proteins in E. coli were initially identified as non-specific RNA chaperones important for cellular adaptation to cold stress (Jiang et al., 1997; Xia et al., 2001; Yamanaka, 1999). In the E. coli cold shock response, the cell enters an acclimation phase immediately following a temperature downshift, where global translation is inhibited but specific translation of a small number of cold-induced proteins (CIPs) is activated (Fig. 6A). Soon after initiation of the acclimation phase, this induction of CIPs decreases dramatically and the cell enters a cold-adapted growth phase where global translation resumes, but at a lower rate than before the shift in temperature (Horn et al., 2007). In E. coli, the 5ʹ untranslated region (UTR) of the cspA mRNA plays an important role in regulating expression of CspA (Jiang et al., 1996; Kortmann and Narberhaus, 2012; Xia et al., 2002; Yamanaka et al., 1999). The 5ʹ UTR of the cspA mRNA adopts different secondary structures depending on the temperature. The conformation at low temperature facilitates initiation of translation, while the structure at 37°C both inhibits translation and destabilizes the transcript (Giuliodori et al., 2010). CspA can bind the cspA 5ʹ UTR and regulate expression by interacting with a sequence motif called the "cold box" (Kortmann and Narberhaus, 2012). An additional level of post-transcriptional regulation has been described for CspA in E. coli, where the cspA transcript is targeted by the small RNA binding protein Hfq. Hfq preferentially binds near the 3ʹ UTR of the cspA mRNA and also mediates its expression (Hankins et al., 2010). Interestingly, the E. coli CspA family member CspC can complement an hfq deletion in E. coli (Cohen-Or et al., 2010).

While the CspA family of proteins was initially characterized for its role in cold stress adaptation (Jiang et al., 1997; Xia et al., 2001; Yamanaka, 1999), more recently it has become clear that these proteins play important roles at physiological temperatures and in response to other stresses (Phadtare and Severinov, 2010). Of nine identified CspA family members in E. coli (CspA-CspI), only a few play a role in responding to cold stress (Czapski and Trun, 2014). CspC and CspE are constitutively expressed and have been shown to regulate the general stress response through interaction with the general stress response sigma factor, RpoS (Cohen-Or et al., 2010; Phadtare and Inouye, 2001). CspD is induced during stationary phase and in response to nutrient starvation (Jones and Inouye, 1994). Expression of several CspA family member genes in Clostridium botulinum responds to salt, pH, and ethanol stress (Derman et al., 2015). In Listeria, Salmonella, and Brucella, a loss of host cell invasiveness through deletion of CspA family members supports an emerging role for the CspA family of proteins in mediating bacterial adaptation to stress encountered within eukaryotic hosts (Loepfe et al., 2010; Michaux et al., 2017; Schmid et al., 2009; Wang et al., 2014, 2016).

CspA family proteins participate in antitermination of transcription in E. coli. Melting of intrinsic terminator sequences by CspA, CspC, and CspE has been demonstrated in vivo in response to cold stress, and also in vitro (Bae et al., 2000; Phadtare and Severinov, 2009; Phadtare et al., 2002a, 2002b). CspA family control over RNA secondary structures regulating transcription (Fig. 7A) and translation (Fig. 7B) demonstrates the ability of these proteins to function as systemic regulators of differential gene expression important for bacterial adaptation to stressful environments (Jiménez-Zurdo et al., 2013; Phadtare, 2004; Phadtare and Severinov, 2010). During the establishment of effective symbiotic nitrogen fixation, rhizobia encounter many stresses within the plant cells as they differentiate from free-living to symbiotic states. Given that CspA proteins act as global regulators of stress adaptation, investigating their potential role in mediating symbiotic development is of interest. This is particularly so because of the relatively high level of cspA mRNAs in symbiotic transcriptomes (Roux et al., 2014; Sallet et al., 2013) and of CspA proteins in nodule proteomes (Marx et al., 2016; Yurgel et al., unpublished).

CspA Family Proteins in Symbiotic Nitrogen Fixation

A whole proteome study of Sinorhizobium medicae in free-living cells and in root nodules revealed up-regulation of CspA family member proteins CspA2 and CspA4 in symbiosis versus free-living culture (Yurgel, unpublished). In the study, 632 proteins were identified as uniquely expressed in the free-living state, 275 proteins were identified as uniquely expressed in the symbiotic state, and 965 proteins were found to have a detectable presence in both the free-living and symbiotic states (Fig. 8A). CspA family member homologs, identified by amino acid sequence similarity to the Sinorhizobium meliloti CspA family of proteins, presented an intriguing pattern of differential expression between the two states (Fig. 8B). Identification of CspA2a and CspA4 as up-regulated in symbiosis supported a role for the CspA family of proteins in symbiotic development. This led to the hypothesis that rhizobial CspA family proteins play a role in symbiotic development through control of RNA structures regulating gene expression important for adaptation to the stressful environment encountered within the plant cell, the free-living to symbiotic transition, and establishment of effective symbiosis.
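The proteome counts quoted above can be tallied with a few lines of arithmetic, and the same sketch shows one common way a log2 fold change between states might be computed. The three counts come from the text; the helper `log2_fold_change`, its pseudocount, and the abundance values passed to it are hypothetical illustrations and are not the method or data of the unpublished study.

```python
import math

# Counts stated in the text (Fig. 8A).
FREE_LIVING_ONLY = 632
SYMBIOTIC_ONLY = 275
SHARED = 965

# Totals implied by those three counts.
total_identified = FREE_LIVING_ONLY + SYMBIOTIC_ONLY + SHARED
free_living_total = FREE_LIVING_ONLY + SHARED
symbiotic_total = SYMBIOTIC_ONLY + SHARED
shared_fraction = SHARED / total_identified


def log2_fold_change(symbiotic: float, free_living: float,
                     pseudocount: float = 1.0) -> float:
    """log2(symbiotic / free-living) with a pseudocount added to both.

    The pseudocount is an assumption made here so that proteins absent
    from one state do not produce a division by zero.
    """
    return math.log2((symbiotic + pseudocount) / (free_living + pseudocount))


# Hypothetical abundances: (7 + 1) / (1 + 1) = 4, so the result is 2.0.
example_lfc = log2_fold_change(symbiotic=7.0, free_living=1.0)
```

A positive value indicates higher abundance in the symbiotic state and a negative value indicates higher abundance in the free-living state.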

In Chapter 2, Sinorhizobium meliloti CspA family member expression in response to stress and in symbiosis is defined, free-living and symbiotic phenotypes of constructed S. meliloti cspA2 and cspA4 deletion strains are assessed, RNAs that bind to S. meliloti CspA2, CspA4, and CspA5 are identified, and free-living and symbiotic transcriptional defects associated with an S. meliloti ∆cspA2 ∆cspA4 double deletion strain are characterized. In Chapter 3, the development of a new fluorescent RNA binding assay and its use in further characterization of S. meliloti CspA2 and CspA4 interactions with RNA targets is presented. In Chapter 4, impacts of this research on symbiotic nitrogen fixation and overarching conclusions of this work are discussed.

REFERENCES

Ait-Bara, S., Clerte, C., Declerck, N., and Margeat, E. (2017). Competitive folding of RNA structures at a termination / antitermination site. RNA N. Y. N.

Bae, W., Xia, B., Inouye, M., and Severinov, K. (2000). Escherichia coli CspA-family RNA chaperones are transcription antiterminators. Proc. Natl. Acad. Sci. U. S. A. 97, 7784–7789.

Cohen-Or, I., Shenhar, Y., Biran, D., and Ron, E.Z. (2010). CspC regulates rpoS transcript levels and complements hfq deletions. Res. Microbiol. 161, 694–700.

Czapski, T.R., and Trun, N. (2014). Expression of csp genes in E. coli K-12 in defined rich and defined minimal media during normal growth, and after cold-shock. Gene 547, 91–97.

Derman, Y., Söderholm, H., Lindström, M., and Korkeala, H. (2015). Role of csp genes in NaCl, pH, and ethanol stress response and motility in Clostridium botulinum ATCC 3502. Food Microbiol. 46, 463–470.

Erisman, J.W., Sutton, M.A., Galloway, J., Klimont, Z., and Winiwarter, W. (2008). How a century of ammonia synthesis changed the world. Nat. Geosci. 1, 636–639.

Giuliodori, A.M., Di Pietro, F., Marzi, S., Masquida, B., Wagner, R., Romby, P., Gualerzi, C.O., and Pon, C.L. (2010). The cspA mRNA is a thermosensor that modulates translation of the cold-shock protein CspA. Mol. Cell 37, 21–33.

Gu, B., Chang, J., Min, Y., Ge, Y., Zhu, Q., Galloway, J.N., and Peng, C. (2013). The role of industrial nitrogen in the global nitrogen biogeochemical cycle. Sci. Rep. 3, 2579.

Hankins, J.S., Denroche, H., and Mackie, G.A. (2010). Interactions of the RNA-binding protein Hfq with cspA mRNA, encoding the major cold shock protein. J. Bacteriol. 192, 2482–2490.

Hirsch, P.R., and Mauchline, T.H. (2015). The Importance of the Microbial N Cycle in Soil for Crop Plant Nutrition. Adv. Appl. Microbiol. 93, 45–71.

Horn, G., Hofweber, R., Kremer, W., and Kalbitzer, H.R. (2007). Structure and function of bacterial cold shock proteins. Cell. Mol. Life Sci. CMLS 64, 1457–1470.

Jiang, W., Fang, L., and Inouye, M. (1996). The role of the 5ʹ-end untranslated region of the mRNA for CspA, the major cold-shock protein of Escherichia coli, in cold-shock adaptation. J. Bacteriol. 178, 4919–4925.

Jiang, W., Hou, Y., and Inouye, M. (1997). CspA, the major cold-shock protein of Escherichia coli, is an RNA chaperone. J. Biol. Chem. 272, 196–202.

Jiménez-Zurdo, J.I., Valverde, C., and Becker, A. (2013). Insights into the noncoding RNome of nitrogen-fixing endosymbiotic α-proteobacteria. Mol. Plant-Microbe Interact. MPMI 26, 160–167.

Jones, P.G., and Inouye, M. (1994). The cold-shock response--a hot topic. Mol. Microbiol. 11, 811–818.

Kortmann, J., and Narberhaus, F. (2012). Bacterial RNA thermometers: molecular zippers and switches. Nat. Rev. Microbiol. 10, 255–265.

Krajewski, S.S., and Narberhaus, F. (2014). Temperature-driven differential gene expression by RNA thermosensors. Biochim. Biophys. Acta 1839, 978–988.

Lalaouna, D., Simoneau-Roy, M., Lafontaine, D., and Massé, E. (2013). Regulatory RNAs and target mRNA decay in prokaryotes. Biochim. Biophys. Acta 1829, 742–747.

Lee, S.J., Xie, A., Jiang, W., Etchegaray, J.P., Jones, P.G., and Inouye, M. (1994). Family of the major cold-shock protein, CspA (CS7.4), of Escherichia coli, whose members show a high sequence similarity with the eukaryotic Y-box binding proteins. Mol. Microbiol. 11, 833–839.

Loepfe, C., Raimann, E., Stephan, R., and Tasara, T. (2010). Reduced host cell invasiveness and oxidative stress tolerance in double and triple csp gene family deletion mutants of Listeria monocytogenes. Foodborne Pathog. Dis. 7, 775–783.

Łotocka, B., Kopcińska, J., and Skalniak, M. (2012). Review article: The meristem in indeterminate root nodules of Faboideae. Symbiosis 58, 63–72.

Marx, H., Minogue, C.E., Jayaraman, D., Richards, A.L., Kwiecien, N.W., Siahpirani, A.F., Rajasekar, S., Maeda, J., Garcia, K., Del Valle-Echevarria, A.R., et al. (2016). A proteomic atlas of the legume Medicago truncatula and its nitrogen-fixing endosymbiont Sinorhizobium meliloti. Nat. Biotechnol. 34, 1198–1205.

Meyer, M.M. (2017). The role of mRNA structure in bacterial translational regulation. Wiley Interdiscip. Rev. RNA 8.

Michaux, C., Holmqvist, E., Vasicek, E., Sharan, M., Barquist, L., Westermann, A.J., Gunn, J.S., and Vogel, J. (2017). RNA target profiles direct the discovery of virulence functions for the cold-shock proteins CspC and CspE. Proc. Natl. Acad. Sci. U. S. A.

Phadtare, S. (2004). Recent developments in bacterial cold-shock response. Curr. Issues Mol. Biol. 6, 125–136.

Phadtare, S., and Inouye, M. (2001). Role of CspC and CspE in regulation of expression of RpoS and UspA, the stress response proteins in Escherichia coli. J. Bacteriol. 183, 1205–1214.

Phadtare, S., and Severinov, K. (2009). Comparative analysis of changes in gene expression due to RNA melting activities of translation initiation factor IF1 and a cold shock protein of the CspA family. Genes Cells Devoted Mol. Cell. Mech. 14, 1227–1239.

Phadtare, S., and Severinov, K. (2010). RNA remodeling and gene regulation by cold shock proteins. RNA Biol. 7, 788–795.

Phadtare, S., Inouye, M., and Severinov, K. (2002a). The nucleic acid melting activity of Escherichia coli CspE is critical for transcription antitermination and cold acclimation of cells. J. Biol. Chem. 277, 7239–7245.

Phadtare, S., Tyagi, S., Inouye, M., and Severinov, K. (2002b). Three amino acids in Escherichia coli CspE surface-exposed aromatic patch are critical for nucleic acid melting activity leading to transcription antitermination and cold acclimation of cells. J. Biol. Chem. 277, 46706–46711.

Phadtare, S., Inouye, M., and Severinov, K. (2004). The mechanism of nucleic acid melting by a CspA family protein. J. Mol. Biol. 337, 147–155.

Price, I.R., Grigg, J.C., and Ke, A. (2014). Common themes and differences in SAM recognition among SAM riboswitches. Biochim. Biophys. Acta 1839, 931–938.

Reining, A., Nozinovic, S., Schlepckow, K., Buhr, F., Fürtig, B., and Schwalbe, H. (2013). Three-state mechanism couples ligand and temperature sensing in riboswitches. Nature 499, 355–359.

Rennella, E., Sára, T., Juen, M., Wunderlich, C., Imbert, L., Solyom, Z., Favier, A., Ayala, I., Weinhäupl, K., Schanda, P., et al. (2017). RNA binding and chaperone activity of the E. coli cold-shock protein CspA. Nucleic Acids Res.

Roux, B., Rodde, N., Jardinaud, M.-F., Timmers, T., Sauviac, L., Cottret, L., Carrère, S., Sallet, E., Courcelle, E., Moreau, S., et al. (2014). An integrated analysis of plant and bacterial gene expression in symbiotic root nodules using laser-capture microdissection coupled to RNA sequencing. Plant J. Cell Mol. Biol. 77, 817–837.

Sachs, R., Max, K.E.A., Heinemann, U., and Balbach, J. (2012). RNA single strands bind to a conserved surface of the major cold shock protein in crystals and solution. RNA N. Y. N 18, 65–76.

Sallet, E., Roux, B., Sauviac, L., Jardinaud, M.-F., Carrère, S., Faraut, T., de Carvalho-Niebel, F., Gouzy, J., Gamas, P., Capela, D., et al. (2013). Next-generation annotation of prokaryotic genomes with EuGene-P: application to Sinorhizobium meliloti 2011. DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes 20, 339–354.

Santangelo, T.J., and Artsimovitch, I. (2011). Termination and antitermination: RNA polymerase runs a stop sign. Nat. Rev. Microbiol. 9, 319–329.

Schmid, B., Klumpp, J., Raimann, E., Loessner, M.J., Stephan, R., and Tasara, T. (2009). Role of cold shock proteins in growth of Listeria monocytogenes under cold and osmotic stress conditions. Appl. Environ. Microbiol. 75, 1621–1627.

Smil, V. (2002). Nitrogen and food production: proteins for human diets. Ambio 31, 126–131.

Stein, L.Y., and Klotz, M.G. (2016). The nitrogen cycle. Curr. Biol. CB 26, R94-98.

Vasse, J., de Billy, F., Camut, S., and Truchet, G. (1990). Correlation between ultrastructural differentiation of bacteroids and nitrogen fixation in alfalfa nodules. J. Bacteriol. 172, 4295–4306.

Wang, Z., Wang, S., and Wu, Q. (2014). Cold shock protein A plays an important role in the stress adaptation and virulence of Brucella melitensis. FEMS Microbiol. Lett. 354, 27–36.

Wang, Z., Liu, W., Wu, T., Bie, P., and Wu, Q. (2016). RNA-seq reveals the critical role of CspA in regulating Brucella melitensis metabolism and virulence. Sci. China Life Sci.

Wedekind, J.E., Dutta, D., Belashov, I.A., and Jenkins, J.L. (2017). Metalloriboswitches: RNA-based inorganic ion sensors that regulate genes. J. Biol. Chem. 292, 9441–9450.

Xia, B., Ke, H., and Inouye, M. (2001). Acquirement of cold sensitivity by quadruple deletion of the cspA family and its suppression by PNPase S1 domain in Escherichia coli. Mol. Microbiol. 40, 179–188.

Xia, B., Ke, H., Jiang, W., and Inouye, M. (2002). The Cold Box stem-loop proximal to the 5ʹ-end of the Escherichia coli cspA gene stabilizes its mRNA at low temperature. J. Biol. Chem. 277, 6005–6011.

Yamanaka, K. (1999). Cold shock response in Escherichia coli. J. Mol. Microbiol. Biotechnol. 1, 193–202.

Yamanaka, K., Mitta, M., and Inouye, M. (1999). Mutation analysis of the 5ʹ untranslated region of the cold shock cspA mRNA of Escherichia coli. J. Bacteriol. 181, 6284–6291.

Zhang, J., and Landick, R. (2016). A Two-Way Street: Regulatory Interplay between RNA Polymerase and Nascent RNA Structure. Trends Biochem. Sci. 41, 293–310.

Figure 1. The world depends on Haber-Bosch nitrogen. (A) Fixed nitrogen is often the limiting nutrient in crop fields and needs to be added through nitrogen fertilizers to obtain maximal yield. (B) Diagram of the Haber-Bosch process. (C) Predicted world population without the input of Haber-Bosch nitrogen into agriculture (graph adapted from Erisman et al., 2008). (D) Rhizobia-legume symbiosis biologically fixes nitrogen and can provide crops with the necessary amount of nitrogen for optimal yields.


Figure 2. Symbiotic nitrogen fixation. (A) An alfalfa field (image from Vermont Valley Community Farms). (B) Symbiotic root nodules (image from Ninjatacoshell via Wikimedia Commons, https://commons.wikimedia.org/wiki/User:Ninjatacoshell). (C) Cartoon diagram of the exchange underlying symbiotic nitrogen fixation: the plant supplies energy to the bacteroid, and the bacteroid supplies ammonia to the plant.


Figure 3. Bacterial differentiation within an indeterminate nodule. (A) Bright field image of an alfalfa nodule 20 days post inoculation with S. meliloti. Pink coloration from expression marks the indicated zone boundaries. (B) Scanning Electron Micrograph (SEM) of free-living S. meliloti (Image from William Margolin & Sharon R. Long). (C) Confocal image of a symbiotic plant cell found within ZIII. Symbiosomes are visualized through detection of a S. meliloti CspA2-GFP fusion and false-colored blue.


Figure 4. Bacterial RNA structure control of gene expression. (A) Cartoon of RNA structure control of transcription termination / antitermination. (B) Cartoon of RNA structure control of translation initiation.


Figure 5. CspA family protein structure. (A) Crystal structure of Bacillus subtilis CspB (Bs-CspB), with the conserved aromatic amino acids involved in base-stacking interactions with single-stranded RNA highlighted in magenta (PDB 3PF5). (B) Bs-CspB in complex with rU6 RNA, highlighted in green (PDB 3PF5).


Figure 6. CspA role in cold shock. (A) Expression of Cold Induced Proteins (CIPs) and non-CIPs in cold shock.


Figure 7. CspA control of RNA structures. (A) Cartoon of the role of CspA proteins in controlling RNA structures involved in antitermination. (B) Cartoon of the role of CspA proteins in controlling RNA structures involved in translation initiation.


Figure 8. CspA family expression in symbiosis. (A) Venn diagram comparing the number of proteins uniquely identified in, and shared between, the free-living and symbiotic proteomes. (B) Log2 fold change (symbiotic/free-living) in expression of the identified CspA family members (Yurgel et al., unpublished).


CHAPTER 2. Sinorhizobium meliloti CspA family members mediate stability of structured sRNAs contributing to stress adaptation and effective symbiosis with alfalfa

PREFACE

This chapter represents a manuscript prepared for submission to Cell Host Microbe. Preliminary proteomic studies, not part of this manuscript but instrumental in the initiation of the study, were performed by Dr. Svetlana Yurgel, who will be listed as a co-author. Protein overexpression plasmids used in this study were cloned by Jennifer Rice. An undergraduate student, Xingkai Liu, executed aspects of the RT-qPCR experiments presented here and carried out the CspA-GFP expression experiments under my mentorship, and will also be listed as a co-author. I was responsible for all other aspects of the study. I wrote the manuscript with Dr. Kahn.


ABSTRACT

Rhizobia are soil bacteria that can associate with some legumes and participate in symbiotic nitrogen fixation. Bacterial CspA family members are small, single-stranded nucleic acid binding proteins conserved throughout all domains of life. Differentiation of rhizobial bacteria from a free-living to a symbiotic state within legume root nodules follows a massive reprogramming of bacterial gene expression. Here, the role of Sinorhizobium meliloti CspA family members in symbiotic development with Medicago sativa (alfalfa) was investigated. Endogenous S. meliloti (Sm1021) CspA2-GFP, CspA4-GFP, and CspA5-GFP tagged strains were constructed and used to report expression levels in free-living culture in response to stress and within alfalfa root nodules. Sm1021 cspA2 and cspA4 genetic deletion strains were constructed, and we characterized free-living stress-responsive phenotypes and a nodule maturation defect found with a cspA2 cspA4 double deletion strain. We identified RNAs that interact with CspA2, CspA4, and CspA5 by native immunoprecipitation (IP) followed by RNA sequencing. The αR14 family of small non-coding RNAs was among the most enriched RNAs in the CspA2 and CspA4 IPs, as were several other sRNAs and the 16S rRNA. Whole transcriptome sequencing of the cspA2 cspA4 double deletion strain from free-living culture and alfalfa nodules identified misregulation of the stress-responsive sigma factor, rpoE2, in addition to global destabilization of the αR14 family sRNAs in symbiosis. We propose that these proteins affect rhizobial physiology through their global control of the cellular RNA secondary structure strength environment and through specific modulation of small non-coding RNA (sRNA) structures involved in cis-regulation of stress-responsive sigma factor expression. This work describes an RNA structure mediated mechanism important for bacterial stress adaptation and symbiotic development within a plant host.


INTRODUCTION

Legumes are important crops that replenish depleted fixed nitrogen sources in soil and provide significant nutritional value to humans and livestock (Erisman et al., 2008). The impact of legumes in agricultural and natural ecosystems is derived from an ancient symbiotic relationship in which rhizobial bacteria associate with the plants to form nitrogen-fixing root nodules that can provide the fixed nitrogen needed for optimal growth (Oldroyd et al., 2011).

During nodule development, the differentiation of rhizobia from a free-living state into symbiotic, nitrogen-fixing bacteroids is associated with significant changes in bacterial gene expression that are coordinated by a complex exchange of signals between the bacteria and the plant (Jones et al., 2007; Sallet et al., 2013). Flavonoids released by nitrogen-stressed legume roots are recognized by rhizobia in the soil, which respond by producing lipochitooligosaccharide Nod factors that affect root growth and cell division. In the presence of Nod factors, rhizobia infect plant root hairs and form infection threads, which ultimately lead to endocytosis and colonization of the plant cytoplasm (Jones et al., 2007). This is followed by a complex developmental program leading to formation of root nodules, specialized nitrogen-fixing organs where the plant supplies the bacteria with energy in the form of dicarboxylic acids, and the bacteria supply the plant with fixed nitrogen in the form of ammonia (Oldroyd et al., 2011; Udvardi and Poole, 2013). In an indeterminate symbiotic nodule of the type found on alfalfa, nodule development involves growth of a persistent meristem, which results in the formation of distinct zones that have characteristic transcriptomes (Roux et al., 2014), proteomes, and metabolomes (Ogden et al., 2017). Zone I (ZI) in these nodules is the meristem. In Zone II (ZII) the bacteria have initiated their infection and begun to invade some of the developing plant cells. The interzone (IZ) is the region between ZII and Zone III (ZIII) where the bacteria are beginning to transition into nitrogen-fixing bacteroids and where a microaerobic environment suitable for nitrogen fixation is established. ZIII is where active nitrogen fixation occurs. Zone IV is the senescent zone, a region where the tissue is breaking down.

The Cold Shock Domain (CSD) superfamily of proteins is a highly conserved set of nucleic acid binding proteins that are found throughout the Tree of Life (Lee et al., 1994). A CSD is a small, ancient, single-stranded nucleic acid binding structure composed of ~70 amino acids that form five antiparallel β-strands organized into a β-barrel (Sachs et al., 2012). Two highly conserved RNA binding motifs, RNP-1 and RNP-2, are located on the 2nd and 3rd β-strands and contain several surface-exposed aromatic residues interspersed with basic amino acids (Sachs et al., 2012). Bacterial CSD proteins generally contain a single CSD and form a distinct subgroup of the superfamily (Lee et al., 1994), but larger proteins with more than one CSD are occasionally found. CSD proteins have been generally characterized as non-specific RNA chaperones that relax inhibitory RNA secondary structures formed at low temperature (Phadtare et al., 1999) but, of the nine identified CspA family members in E. coli (CspA-CspI), only some are induced by cold stress (Czapski and Trun, 2014). In E. coli, CspC and CspE are constitutively expressed and have been shown to regulate the general stress response by influencing stability of the general stress response sigma factor, RpoS (Cohen-Or et al., 2010; Phadtare and Inouye, 2001). CspD is induced during stationary phase and in response to nutrient starvation (Jones and Inouye, 1994). It is thought that the various CspA proteins arose from genetic duplication and then evolved to respond to various specific environmental stresses (Yamanaka et al., 1998), but whether this response acts by changing total CspA abundance or through differential action on specific sequences is not well established. CspA proteins have structures similar to 30S ribosomal protein S1 and translation initiation factors IF1, IF2 and IF3 (Phadtare and Severinov, 2009; Phadtare et al., 2007). In E. coli, CspA proteins share similar in vitro function with S1, and the cold-sensitive phenotypes of cspA mutants can be rescued by the S1-like domain encoded by pnp, the gene for polynucleotide phosphorylase (Xia et al., 2001). Cold shock expression of Csp proteins in E. coli is insensitive to kanamycin, an aminoglycoside antibiotic that binds to the 30S ribosomal subunit (Etchegaray and Inouye, 1999). A CspA homolog in Brucella melitensis is required for stress adaptation and virulence (Wang et al., 2014, 2016). Like rhizobia, Brucella are alphaproteobacteria, and many cellular mechanisms are shared between Brucella colonization / pathogenesis and rhizobial symbiosis (Jones et al., 2007).

Hfq is a well-studied bacterial RNA chaperone that mediates global stress response by facilitating interactions between regulatory sRNAs and their mRNA targets (Updegrove et al., 2016). In S. meliloti, hfq deletions result in decreased stability of interacting sRNAs (Torres-Quesada et al., 2010, 2014). The effect of Hfq on symbiosis is more ambiguous, with reports of normal symbiosis by hfq mutants (Sobrero and Valverde, 2011) or loss of symbiotic effectiveness (Barra-Bily et al., 2010; Gao et al., 2010). Interactions between Hfq protein and cspA messages have been described (Hankins et al., 2010). CspC has been shown to complement hfq deletions in E. coli (Cohen-Or et al., 2010), suggesting that CspA proteins may also function to regulate the global stress response of the cell through RNA chaperone function (Nogueira and Springer, 2000).

S. meliloti has 8 CspA family members annotated in its genome, cspA1-cspA8. CspA1 was identified as being cold inducible by isolation of a luxAB reporter transposon insertion mutant, and it was subsequently found that cspA1 is part of a polycistronic operon with rplU (O’Connell and Thomashow, 2000; O’Connell et al., 2000). From proteomic studies in Sinorhizobium medicae we learned that several CspA family members were highly induced in nitrogen-fixing root nodules compared to their expression in free-living cultures grown at nodule temperatures, suggesting these RNA binding proteins may be playing a role in symbiosis. Here we identify S. meliloti CspAs that are differentially expressed in symbiosis and investigate their stress-responsive expression. Furthermore, we address stress phenotypes and the symbiotic effectiveness of S. meliloti strains containing deletions of the cspA2 and cspA4 genes. In addition, we identify RNA targets of several CspA family members and define transcriptional defects associated with a cspA2 cspA4 double deletion strain. We propose that CspA function is required for stabilization of specific sRNAs and for mediation of switching between alternative RNA structures important for controlling gene expression required for effective symbiosis.


METHODS

Bacterial Strains, Plasmids, and Media

The bacterial strains and plasmids used are listed in Table S2. S. meliloti strains were grown on Minimal Mannitol medium with ammonium (MM-NH4+) or YMB medium at 30°C, or at other temperatures where indicated (Somerville and Kahn, 1983). E. coli strains were grown on LB medium (Green et al., 2012) at 37°C. Antibiotics were used at the following concentrations unless otherwise indicated: kanamycin (40 µg/mL), neomycin (200 µg/mL), ampicillin (50 µg/mL).

CspA-GFP fluorescent expression assay

S. meliloti strains were grown in MMNH4 liquid broth overnight into stationary phase. Cultures were then pelleted, resuspended in MMNH4 liquid broth, and diluted 1:10 into MMNH4 under the various stress conditions. Cultures were placed in shaking incubators at 15°C and 40°C to create low and high temperature stresses, respectively. NaCl (400 mM) was added to MMNH4 liquid broth to establish a salt stress condition, and HCl or NaOH was added to MMNH4 liquid broth to bring the final pH to 5 or 8, respectively. After growth for 24 hr under the various stress conditions, samples were quickly centrifuged, resuspended in fresh MMNH4, and immediately placed into white-bottom 96-well microplates to determine fluorescence in a Fluostar SLT microplate reader (excitation: 485 nm / emission: 538 nm) and into a clear 96-well microplate to determine absorbance at 600 nm with a Molecular Devices SPECTRA MAX 250 microplate reader. Fluorescence readings were corrected by subtracting background and normalized for cell density by dividing by the absorbance values. Experimental samples were compared to control samples grown identically under standard conditions (30°C in MMNH4 pH 6.7 for 24 hr). Log2 fold change is reported between experimental stress samples and controls. Data presented are the average of at least 3 biological replicates for each condition tested.
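The normalization described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the readings, and the choice to average replicates before taking the ratio are hypothetical and are not taken from the original analysis.

```python
import math
from statistics import mean


def normalized_fluorescence(raw_fluor, background, od600):
    """Subtract background fluorescence, then divide by OD600 (cell density)."""
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (raw_fluor - background) / od600


def log2_fold_change(stress_values, control_values):
    """Log2 ratio of the mean normalized stress signal to the mean control signal.

    Assumption: replicates are averaged first; the text does not say whether
    averaging was done before or after taking the ratio.
    """
    return math.log2(mean(stress_values) / mean(control_values))


# Hypothetical (fluorescence, OD600) readings for three biological replicates.
background = 100.0
control_reads = [(1100, 0.50), (1300, 0.60), (900, 0.40)]
stress_reads = [(2100, 0.50), (2500, 0.60), (1700, 0.40)]

control = [normalized_fluorescence(f, background, od) for f, od in control_reads]
stress = [normalized_fluorescence(f, background, od) for f, od in stress_reads]

print(f"log2 fold change (stress/control): {log2_fold_change(stress, control):.2f}")
```

A positive value indicates induction of the GFP fusion under the stress condition and a negative value indicates repression.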

Nodule fluorescence microscopy

Medicago sativa seedlings were infected with Sm1021 wild type, Sm1021 cspA2-gfp, and Sm1021 cspA4-gfp strains and grown in sand in Magenta plant boxes at 20°C as reported previously (Yurgel et al., 2007). At 17 and 24 days post inoculation (dpi), nodules were harvested and immediately cut in half along the longitudinal axis, and the sections were placed on a drop of water on a cover slip. GFP signal was imaged with an inverted fluorescence microscope (Leica DMI3000) using the 5X objective and Leica GFP filter cube. No fluorescence signal was detected from Sm1021 wild type nodules under the same excitation and camera exposure settings used to locate the GFP fusions.

Plant tests

Plant tests were performed as reported previously (Yurgel et al., 2007). Medicago sativa – Ladak (Bruce Seed Farm, lot no: A6-008) were planted 4 per box in Magenta plant boxes, infected with Sm1021 wild type, Sm1021 ΔcspA2, Sm1021 ΔcspA4, and Sm1021 ΔcspA2 ΔcspA4 strains, and grown at 20°C. At 4 weeks post inoculation, plant aerial tissue was harvested and placed into 40 dram vials with the caps fitted loosely. Vials were placed at 42°C for 1 week to dehydrate. Dry weight of the plant aerial tissue was determined by weighing the vials containing the dried tissue and subtracting the weight of the empty vial. Data presented are the average of 3 individual experiments, with each experiment consisting of the average of 3 plant boxes (4 plants per box).

Generation of cspA2 and cspA4 deletion strains

Deletion strains were generated using an integration/excision double recombination procedure (Yurgel and Kahn, 2005). Briefly, regions of approximately 500 bp upstream and downstream of the annotated cspA gene regions were PCR amplified and ligated together. Upstream regions ended 50 bp from the predicted transcription start site (TSS), and downstream regions began immediately following the predicted 3ʹ UTR. TSS and 3ʹ UTR were predicted from published SM2011 transcriptome data (Sallet et al., 2013). Specifically, the cspA2 upstream shoulder region was amplified using the primers 5ʹ-ATCGTCTAGACGACAGTTCCTCCATGTCGTCG-3ʹ and 5ʹ-GATCGGATCCGCAATGCTGCAATTAGCTTGTG-3ʹ. The cspA2 downstream shoulder region was amplified using the primers 5ʹ-ATCGGGATCCGAGGATCTCGCATGAAGGACGTC-3ʹ and 5ʹ-GATCAAGCTTGGTCGAAGGAAAGCGAGTCACCG-3ʹ. The two PCR products were cut with BamHI and ligated together, then the entire fusion fragment was cut with EcoRI and cloned into pK19 (Schäfer et al., 1994) at the EcoRI site. The cspA4 upstream shoulder region was amplified using the primers 5ʹ-ACAGGGATCCAGCCAGACCAAGTCCTTGGG-3ʹ and 5ʹ-CATGGAATTCACCTCGCACCTGCGGCGATGC-3ʹ. The cspA4 downstream shoulder region was amplified using the primers 5ʹ-CATGGAATTCTGGAGAGCGCCCATCCGGCCTGC-3ʹ and 5ʹ-CATGGAATTCTGGAGAGCGCCCATCCGGCCTGC-3ʹ. The two PCR products were cut with EcoRI and ligated together, then the entire fusion fragment was cut with BamHI and cloned into pK19 at the BamHI site. The upstream and downstream fusion fragments cloned into pK19 were transformed into E. coli S17-1 cells (Simon et al., 1983) for subsequent mating into Sm1021. Sm1021 strains containing integrated constructs were selected on MMNH4 media plates containing 200 µg/mL neomycin sulphate to select for integration into the genome by homologous recombination. After purifying the recombinant, a second crossover event to excise the cspA gene was selected on YMB containing 5% sucrose, which is toxic to strains containing the sacB gene on pK19. Deletion strains were identified by PCR using outside genomic primers and confirmed by DNA sequencing. Primers used to confirm the cspA2 deletion were 5ʹ-CGGGCGAGATTATAGAGCAC-3ʹ and 5ʹ-TTCGTCTTCACCATCGAACA-3ʹ. Primers used to confirm the cspA4 deletion were 5ʹ-GGGCATAGGGATCGATCACG-3ʹ and 5ʹ-CGATTATGGCCTCGTACACC-3ʹ.
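When working with primer lists of this kind, it can help to confirm which restriction sites each 5ʹ extension carries before planning the digests. The following minimal Python sketch scans primers for four common recognition sequences; the primer labels are hypothetical names introduced here for illustration, and only the sequences themselves come from the text. Because all four sites are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences (all palindromic) for four common cloning enzymes.
SITES = {
    "BamHI": "GGATCC",
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "HindIII": "AAGCTT",
}


def sites_in_primer(primer):
    """Return {enzyme: 0-based position} for every recognition site in the primer."""
    seq = primer.upper().replace(" ", "")
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


# Labels are hypothetical; sequences are the shoulder-region primers listed above.
primers = {
    "cspA2_up_1": "ATCGTCTAGACGACAGTTCCTCCATGTCGTCG",
    "cspA2_up_2": "GATCGGATCCGCAATGCTGCAATTAGCTTGTG",
    "cspA2_down_1": "ATCGGGATCCGAGGATCTCGCATGAAGGACGTC",
    "cspA2_down_2": "GATCAAGCTTGGTCGAAGGAAAGCGAGTCACCG",
    "cspA4_up_1": "ACAGGGATCCAGCCAGACCAAGTCCTTGGG",
    "cspA4_up_2": "CATGGAATTCACCTCGCACCTGCGGCGATGC",
}

for label, seq in primers.items():
    print(label, sites_in_primer(seq))
```

The output can be compared against the enzymes named for each ligation and cloning step.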

Generation of GFP tagged constructs and strains

Green Fluorescent Protein (GFP) tagged strains were generated by cloning cspA coding sequences missing the stop codon into pET GFP LIC (u-msfGFP), which carries an enhanced GFP (Pédelacq et al., 2006). The pET GFP LIC cloning vector (u-msfGFP) was a gift from Scott Gradia (Addgene # 29772). Each cspA CDS was amplified from Sm1021 genomic DNA with primers including a 5ʹ LIC extension. Primers used for cspA2 were 5ʹ-TTTAAGAAGGAGATATAGATCGTGGCTTCTGGAACGGTAAAGTGG-3ʹ and 5ʹ-GTTGGAGGATGAGAGGATCCCGAGAGCGCGAAGGTTGTCCGCCGAG-3ʹ. Primers used for cspA4 were 5ʹ-TTTAAGAAGGAGATATAGATCATGGCCACCAAAGGCATCG-3ʹ and 5ʹ-GTTGGAGGATGAGAGGATCCCGAGAGCCTGGAGGTTGACGG-3ʹ. Primers used for cspA5 were 5ʹ-TTTAAGAAGGAGATATAGATCATGGCTGACAGGACATCTTCG-3ʹ and 5ʹ-GTTGGAGGATGAGAGGATCCCGTGCGACCGCGTGGTCGGCGTC-3ʹ. Fragments containing cspA-GFP fusions were cut from pET GFP LIC using the 5ʹ XbaI and 3ʹ EcoRI sites and cloned into pK19 (Schäfer et al., 1994) cut at the same sites. Fusion constructs in pK19 were transformed into S17-1 cells (Simon et al., 1983) for subsequent mating into Sm1021. Sm1021 strains containing integrated constructs were selected on MMNH4 media plates containing 200 µg/mL neomycin sulphate, and the strains were used as co-integrates.

Dilution replica plating experiments

Wild type and deletion strains were resuspended from plate cultures into YMB media and standardized by adjusting the culture densities to similar absorbance values at 600 nm. Five-fold serial dilutions were performed in a 96-well plate, and aliquots were spot plated onto YMB agar media, YMB agar media containing 400 mM NaCl or 50 µg/mL kanamycin, and YMB agar media at pH 5 and pH 8 as indicated. Plates were incubated at 30°C unless noted. For cold temperature stress, plates were placed at 14°C, and for heat stress, plates were placed at 42°C. Plates were imaged over a light box to generate a uniform background and quantified by densitometry using the ImageJ program (Schneider et al., 2012).
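As a small worked example of the dilution series, the cumulative dilution factor at each spot of a 5-fold series can be computed as below. The number of dilution steps and the starting absorbance are assumptions chosen for illustration; the text does not state them.

```python
def serial_dilution_factors(fold=5, steps=6):
    """Cumulative dilution factor at each step (step 0 is the undiluted culture)."""
    return [fold ** i for i in range(steps)]


def expected_densities(start_od600, fold=5, steps=6):
    """Expected OD600 at each step, assuming ideal mixing and pipetting."""
    return [start_od600 / factor for factor in serial_dilution_factors(fold, steps)]


# Hypothetical starting density of OD600 = 1.0 and six spots per strain.
for factor, od in zip(serial_dilution_factors(), expected_densities(1.0)):
    print(f"1:{factor:<5d} expected OD600 = {od:.5f}")
```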

Immunoprecipitation of GFP-tagged proteins

Sm1021 WT, cspA2-gfp, cspA4-gfp, and cspA5-gfp strain cultures were grown in YMB media and harvested by centrifugation at 4°C when the cultures reached mid-log phase. Bacterial cell pellets were resuspended in ice-cold native lysis buffer (10 mM Tris/Cl pH 7.5, 150 mM NaCl, 0.5 mM EDTA, 0.1% Triton X-100) and sonicated 2 × 15 seconds on ice. Lysates were centrifuged at 20,000 × g for 45 min at 4°C in a desktop microfuge. Supernatants were isolated and incubated with blocked agarose control beads (Chromotek Cat # bab-20) equilibrated in lysis buffer for 30 min at 4°C while rocking to pre-clear the lysates of non-specific interactions. A sample of the pre-cleared lysate was added to Trizol reagent for subsequent RNA isolation and represents the input fraction of the immunoprecipitation (IP). Pre-cleared lysates were then incubated with equilibrated GFP-Trap®_A agarose beads (Chromotek Cat # gta-20) for 1 hr at 4°C while rocking. Beads were then washed 3X with lysis buffer, and complexes were eluted with 200 mM glycine pH 2.5 followed by rapid neutralization in 1 M Tris base. A fraction of the elution was used for subsequent SDS-PAGE analysis and western blotting, and a fraction was added to Trizol for subsequent RNA isolation. SDS-PAGE analysis of the input, flow-through, and bound fractions was performed as described in Byers et al. (2005). For anti-GFP western blots, proteins were transferred to a PVDF membrane and blocked overnight at 4°C in Blocking Buffer (1X PBS (10 mM Na2HPO4, 1.8 mM KH2PO4, 137 mM NaCl, 2.7 mM KCl), 0.1% Tween 20, 5% milk). Blocked membranes were incubated for 2 hr with a rabbit polyclonal anti-GFP primary antibody (Abcam ab290) at 1:10,000 dilution in Blocking Buffer, followed by 3 washes for 5 min in Wash Buffer (1X PBS with 0.1% Tween 20). Incubation with HRP-conjugated goat anti-rabbit secondary antibody (Abcam ab97080) followed for 45 min at 1:50,000 dilution in Blocking Buffer. Membranes were washed again 3 times for 5 min with Wash Buffer, and blots were visualized with ECL Western Blotting Substrate (Pierce) according to the manufacturer’s protocol.

RNA Isolation


RNA was isolated from nodules at 28 dpi. Alfalfa plants were grown as described above and nodules were harvested. For each replicate, nodules from one plant box were pooled, collected into a 1.5 mL microfuge tube, and flash frozen in liquid N2. Nodule tissue was ground for 1 min at -80°C in a QIAGEN TissueLyser at the maximum setting. Ground samples were resuspended in QIAGEN buffer RLT and passed through a QIAGEN QIAshredder column, followed by QIAGEN RNeasy column purification using the manufacturer’s modified RNA clean-up protocol to capture RNA species smaller than 200 nucleotides. The eluate from the RNeasy column was then Trizol extracted following the manufacturer’s RNA clean-up protocol and used as the input for RNA sequencing library sample preparation. RNA isolation from free-living samples was performed as described for the input RNA from the IP.

RNA sequencing

Illumina TruSeq stranded RNA sequencing libraries were prepared according to the manufacturer’s protocol for total RNA (Illumina Cat # RS-122-2101). RNA for whole transcriptome sequencing was first treated with Ribo-Zero according to manufacturer’s protocol (Illumina Cat # MRZMB126). RNA from immunoprecipitation experiments was not depleted of ribosomal RNA before library preparation. Concentration of final libraries was determined using the Qubit dsDNA BR Assay kit (Invitrogen), and final library fragment size was validated with a Fragment Analyzer (Advanced Analytical). Single read sequencing to 100 bp length was performed at the WSU Spokane Genomics Core facility on an Illumina HiSeq 2500 machine.


RNA sequencing data analysis

TopHat and Cuffdiff from the Tuxedo suite of analysis programs were used to map sequenced cDNA and to define differential expression patterns of whole transcriptomes, following the protocol described in Trapnell et al. (2012). IP sequencing data were mapped with TopHat, and counts per gene feature were output with the samtools bedcov program. Statistical analysis of count data was performed with edgeR using the RobiNA program (Lohse et al., 2012; Robinson et al., 2010). Genomic transcript mapping coverage was visualized using the Integrative Genomics Viewer (IGV) (Robinson et al., 2011).
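To give a feel for what IP enrichment over input means at the level of count data, the following Python sketch scales per-feature counts to counts per million (CPM) and reports a log2 IP/input ratio. It is a deliberately simplified illustration and not the edgeR analysis used here: edgeR fits a negative binomial model with its own normalization and dispersion estimates. The feature names, counts, and pseudocount are hypothetical.

```python
import math


def counts_per_million(counts):
    """Scale raw per-feature counts to counts per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no counts")
    return {feature: 1e6 * c / total for feature, c in counts.items()}


def log2_enrichment(ip_counts, input_counts, pseudocount=1.0):
    """Log2 (IP CPM / input CPM) for features present in both libraries.

    The pseudocount (an assumption) avoids division by zero for features
    that are absent from one library.
    """
    ip_cpm = counts_per_million(ip_counts)
    in_cpm = counts_per_million(input_counts)
    return {
        feature: math.log2((ip_cpm[feature] + pseudocount) / (in_cpm[feature] + pseudocount))
        for feature in ip_cpm
        if feature in in_cpm
    }


# Hypothetical counts per feature for one IP library and its matched input.
ip = {"sRNA_x": 300, "gene_y": 100, "gene_z": 600}
inp = {"sRNA_x": 100, "gene_y": 300, "gene_z": 600}

for feature, value in sorted(log2_enrichment(ip, inp).items()):
    print(f"{feature}: {value:+.2f}")
```

Positive values indicate features over-represented in the IP relative to the input; a real analysis would also use replicates and a statistical test.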

Ni affinity purification of His-tagged proteins

N-terminal 6X His-tagged proteins were heterologously expressed in E. coli BL21 DE3 cells from the pET-30 Ek/LIC expression vector (Novagen 69077). Expression in a 1 L culture was induced by adding 100 µM IPTG at an OD600 of 0.4 - 0.6. Cells were transferred to 18°C and allowed to express overnight. Cells were harvested by centrifugation and lysed by sonication in 1X PBS Lysis Buffer (1X PBS (10 mM Na2HPO4, 1.8 mM KH2PO4, 137 mM NaCl, 2.7 mM KCl), 0.1% Triton, 20 mM imidazole, 1 mM DTT, and 0.1% PMSF from a saturated solution in 2-propanol). Sonicated lysates were centrifuged at 20,000 × g in a tabletop microfuge at 4°C. Cleared lysate was incubated for 1 hr with Ni-NTA resin (Qiagen) equilibrated with Lysis Buffer while rocking at 4°C and transferred to an Econo-Pac Chromatography column (Bio-Rad). The resin was washed using gravity flow with High Salt Wash Buffer (Lysis Buffer containing 1 M NaCl), followed by Low Salt Wash Buffer (Lysis Buffer with 150 mM NaCl). After the high and low salt washes, GnHCl Wash Buffer (Lysis Buffer containing 6 M guanidinium hydrochloride (GnHCl)) was added to the column for the purification of CspA proteins. Where indicated, the column was capped and the resin was allowed to incubate with GnHCl buffer for 1 hr at 4°C with gentle shaking in order to release and remove bound E. coli RNAs. After incubation, the column was uncapped and the resin was washed with Equilibration Buffer (Elution Buffer with 20 mM imidazole). Proteins were eluted with Elution Buffer (25 mM HEPES, 25% glycerol, 70 mM KCl, 1 mM DTT, 0.1% PMSF, and 300 mM imidazole). Elution fractions were analyzed by SDS-PAGE, and the peak fraction was desalted into HGKEDP Buffer (25 mM HEPES, 25% glycerol, 70 mM KCl, 1 mM DTT, 0.1% PMSF) with Bio-Rad P6DG according to the manufacturer’s protocol and stored at -80°C. Protein concentrations were determined with the Qubit protein assay kit (Thermo Fisher Scientific).

EMSA assay

CspA4 protein for the assay was purified as described above. Smr14C2 RNA was generated by in vitro transcription with T7 RNA polymerase using the MEGAscript T7 transcription kit (Ambion). The template for the T7 transcription reaction was generated by PCR with Sm1021 wild type genomic DNA using the primers 5ʹ-CATGGGATCCTAATACGACTCACTATAGGGATCGATCGGGCAGCGCACTC-3ʹ and 5ʹ-CATGGGATCCATTTAGGTGACACTATAGAAATAAACAACGCCCGCTGGGG-3ʹ. RNA products from the T7 transcription reactions were cleaned by Trizol (Thermo Fisher Scientific) extraction following the manufacturer’s RNA clean-up protocol. EMSA binding reactions were carried out in HGKE buffer (25 mM HEPES pH 7.6, 15% glycerol, 70 mM KCl, 0.1 mM EDTA). RNA was initially heated at 75°C for 5 min and then immediately placed on ice for 5 min before addition to the reaction. The indicated concentrations of RNA and protein were incubated together at 30°C for 20 min before resolving complexes on a 0.5X Tris/glycine 4% native polyacrylamide gel (Byers et al., 2005). RNA and RNA – CspA shifted complexes were stained with ethidium bromide and visualized by fluorescence on a UV light box (312 nm).

Reverse Transcription – Quantitative Polymerase Chain Reaction (RT-qPCR)

Total RNA was isolated from free-living Sm1021 strains as described above. RNA concentrations were measured using the Qubit RNA HS Assay Kit (Invitrogen) with a Qubit 2.0 Fluorometer (Invitrogen) and normalized. Standardized amounts of RNA were used as input for a Superscript III reverse transcription assay with random hexamer primers according to the manufacturer’s protocol (Invitrogen). Reverse transcription cDNA product was diluted 1:10 in nuclease-free water, and 2 µl per reaction was used as template in subsequent qPCR reactions. qPCR reactions with Fast SYBR Green Master Mix (Thermo Fisher Scientific) were carried out according to the manufacturer’s recommended protocol. Primers used for 23S rRNA detection were 5ʹ-CGAAATTCCTTGTCGGGTAA-3ʹ and 5ʹ-TAGTAAAGGTGCACGGGGTC-3ʹ. Primers used for 16S rRNA detection were 5ʹ-CAGCTCGTGTCGTGAGATGT-3ʹ and 5ʹ-GTCACCACCATTGTAGCACG-3ʹ. Primers used for Smr14C2 sRNA detection were 5ʹ-ATCGATCGGGCAGCGCACTC-3ʹ and 5ʹ-ATAAACAACGCCCGCTGGGG-3ʹ. An Applied Biosystems 7500 Fast Real-Time PCR System was used to carry out Comparative CT method (ΔΔCT) experiments. Applied Biosystems 7500 Fast Real-Time PCR Software was used to calculate Relative Quantitation (RQ) measurements using the 23S amplicon as the endogenous control and the control IP as the reference sample. RT-qPCR experiments were performed for Input as well as Bound IP samples. RQ values presented represent bound sample values normalized to input values.
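The Comparative CT calculation can be illustrated with a short Python sketch. It assumes the standard 2^-ΔΔCT form with amplification efficiency close to 100%, and it interprets "normalized to input" as the ratio of the bound RQ to the input RQ; both are assumptions, and the CT values below are hypothetical. The instrument software may apply additional corrections that are not reproduced here.

```python
def relative_quantity(ct_target, ct_control, ct_target_ref, ct_control_ref):
    """RQ = 2 ** -(ddCT), where ddCT = dCT(sample) - dCT(reference sample).

    dCT is the target CT minus the endogenous-control CT (here, 23S rRNA).
    Assumes roughly 100% amplification efficiency for both amplicons.
    """
    delta_sample = ct_target - ct_control
    delta_reference = ct_target_ref - ct_control_ref
    return 2.0 ** -(delta_sample - delta_reference)


# Hypothetical CT values: target vs. 23S in a CspA-GFP IP and in the control IP.
rq_bound = relative_quantity(ct_target=20.0, ct_control=15.0,
                             ct_target_ref=24.0, ct_control_ref=15.0)
# Hypothetical CT values for the matching input fractions.
rq_input = relative_quantity(ct_target=22.0, ct_control=12.0,
                             ct_target_ref=23.0, ct_control_ref=12.0)

# Bound RQ normalized to input RQ (assumed here to be a simple ratio).
enrichment = rq_bound / rq_input
print(f"RQ bound = {rq_bound}, RQ input = {rq_input}, bound/input = {enrichment}")
```

Dividing by the input RQ corrects for differences in target abundance between strains that exist before the immunoprecipitation.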


RESULTS

Sinorhizobium meliloti has 8 CspA family members with high sequence similarity to E. coli CspA

The Cold Shock Domain (CSD) characteristic of CspA family members is highly conserved and found across all domains of life. Sinorhizobium meliloti contains 8 proteins with a high level of sequence similarity to the major cold shock protein in E. coli, CspA. In S. meliloti these are designated CspA1-CspA8, in contrast to the nine E. coli proteins, which are designated CspA-CspI.

The S. meliloti CspA1-CspA8 and the E. coli CspA amino acid sequences were aligned using the multiple sequence alignment program Clustal Omega (Sievers et al., 2011) (Fig. 1). All S. meliloti cspA genes contain a single CSD element except cspA5, which contains two CSDs. The RNP-1 and RNP-2 motifs are nearly perfectly conserved between the two species. Phylogenetic analysis was performed using the ClustalW2 program (Larkin et al., 2007; McWilliam et al., 2013) to generate a cladogram representing sequence similarity between S. meliloti CspA1-CspA8 and E. coli CspA-CspI (Fig. S1B). With the exception of CspF and CspA5, which were most closely related to each other within the E. coli grouping, the E. coli CspAs and S. meliloti CspAs were more similar to Csp proteins from the same species than to family members in the other species. This suggests that the family represented in each species evolved from a single ancestral CspA homolog. S. meliloti is a member of the alphaproteobacteria while E. coli belongs to the gammaproteobacteria (Williams et al., 2007). Within the S. meliloti CspA family, certain family members are nearly identical to each other, most likely representing a fairly recent duplication event on an evolutionary time scale (Fig. S1).


CspA family members were expressed within specific nodule developmental zones during symbiotic maturation and were differentially regulated in response to abiotic stress in the free-living state

Massive shifts in rhizobial gene expression occur in developing root nodules during the transition of rhizobia from free-living bacteria to mature symbiotic bacteroids. Alfalfa nodules are indeterminate, i.e. the nodule continues to grow in a linear way by elongating from the original site of infection on a growing root. As a result, a longitudinal slice from the nodule meristematic tip to its base exposes the range of developmental changes that are occurring. As nodules develop, specific zones can be identified that represent various stages in nodule maturation. Mining the symbimics database of mRNA abundance in these zones (https://iant.toulouse.inra.fr/symbimics/), we found that expression of cspA2 and cspA5 was most abundant in ZIII and expression of cspA4 was most abundant in the IZ (Roux et al., 2014) (Fig. S2A). We chose to investigate these family members further because their specific transcript localization suggested they were playing important roles in specific stages of nodule development. Endogenous, C-terminal, GFP translational fusion strains were constructed for CspA2, CspA4, and CspA5 by first cloning the CspA ORF upstream of a peptide linker and GFP CDS, then recombining this into the S. meliloti genome using the cspA homology. These fusions leave the native promoter, 5ʹ UTR, and CDS intact and add a 10 amino acid peptide linker region and GFP CDS immediately preceding the endogenous stop codon (Fig. S3). Fluorescence of the transformed strains reports on native expression of the respective CspA proteins. Imaging of the GFP signal using fluorescence microscopy localized CspA2-GFP and CspA4-GFP to the interzone at 17 dpi (Fig. 2A). At 24 dpi, we found ubiquitous expression of CspA2 and CspA4 (Fig. 2A). We were not able to accurately visualize CspA5-GFP protein above background nodule fluorescence. The developmental time-dependent localization of CspA2-GFP and CspA4-GFP to the IZ suggests that their expression is responding to specific zonal and developmental cues, and that their function is important for specific stages in nodule maturation.

To investigate which growth conditions might induce expression of the various CspA-GFP proteins, a fluorescence expression assay was developed and used to monitor CspA expression in free-living cell culture under several environmental stresses. To validate the assay, normalized expression values determined from the fluorescence assay were compared to transcript abundance values extracted from the data sets presented by Sallet et al. (2013). The relative values for transcript abundance and protein expression were very similar (Fig. S2B and S2C), indicating that the fluorescence of CspA-GFP fusion proteins was a good index of native CspA expression.

With baseline expression values determined for the cspA2-gfp, cspA4-gfp, and cspA5-gfp fusion strains, we next determined the change in expression after a 24 h exposure to heat, cold, salt, and higher or lower pH. We found that both CspA2 and CspA4 were induced in response to growth at 15°C and were repressed in response to growth at 40°C. CspA2 and CspA4 were both upregulated by low pH and high NaCl concentrations. In all cases examined so far, the changes in expression of CspA5 were opposite those of CspA2 and CspA4, i.e. CspA5 was repressed by cold, acidity, and salt and increased by heat. The upregulation of CspA2 and CspA4 under conditions that stabilize RNA secondary structures (15°C and 400 mM NaCl), and their downregulation under conditions that destabilize global RNA secondary structure, suggest that CspA2 and CspA4 may be responding to compensate for changes in global RNA secondary structure stability.
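The comparison of baseline and stressed expression described above can be sketched as a small calculation. This is a minimal illustration and not the analysis used in this work: normalizing fluorescence to culture density (OD600), subtracting the signal of an untagged strain, and reporting a log2 ratio are all assumptions, as are the function names and the example readings.

```python
import math

def normalized_fluorescence(gfp, od600, bg_gfp, bg_od600):
    """Density-normalized GFP signal minus the untagged-strain background.

    Assumption: background is measured on a strain without GFP and is
    normalized to its own OD600 before subtraction.
    """
    return gfp / od600 - bg_gfp / bg_od600

def log2_fold_change(stressed, baseline):
    """log2 ratio of stressed to baseline expression; positive means induced."""
    return math.log2(stressed / baseline)

# Hypothetical plate-reader values (arbitrary units), not data from this study.
baseline = normalized_fluorescence(gfp=1200.0, od600=0.5, bg_gfp=200.0, bg_od600=0.5)
cold = normalized_fluorescence(gfp=4200.0, od600=0.5, bg_gfp=200.0, bg_od600=0.5)
print(log2_fold_change(cold, baseline))
```

Under this convention an induced fusion (as reported for CspA2-GFP and CspA4-GFP at 15°C) gives a positive value and a repressed fusion gives a negative value.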

∆cspA2 and ∆cspA4 deletion strains altered free-living abiotic stress phenotypes and a ∆cspA2 ∆cspA4 double deletion strain delayed nodule maturation and led to less effective symbiosis with alfalfa

Individual Sm1021 ∆cspA2 and Sm1021 ∆cspA4 deletion strains were constructed, as was a Sm1021 ∆cspA2 ∆cspA4 double deletion strain. Regions upstream and downstream of the presumed cspA transcripts were amplified and ligated together. The resulting upstream-downstream fusion constructs were recombined into the S. meliloti Sm1021 wild type strain at one of the two regions and then excised by recombination at the other region. This process, equivalent to a double recombination, generated the desired deletion and left a single restriction site to mark its location (Fig. S4A). The Sm1021 ∆cspA2 ∆cspA4 double deletion strain was generated by recombining the ∆cspA2 deletion construct into the previously generated ∆cspA4 deletion strain. Deleted regions were designed to remove the entire gene, including 50 bp upstream of the transcriptional start site and the 3ʹ UTR. This was done because work in E. coli had shown that simply altering a cspA ORF can lead to Low-temperature Antibiotic effects of truncated CspA Expression (LACE) growth inhibition when the intact cspA promoter and 5ʹ UTR elements are fused to nonsense downstream elements containing premature translational termination signals (Hwang et al., 2012; Jiang et al., 1996). Sm2011 transcriptome data revealed that cspA2 and cspA4 coding transcripts are monocistronic, so it is unlikely that deletion of these regions would disturb expression of downstream or upstream genes. Generation of deletion strains was first tested by PCR amplification of the deleted regions using DNA primers flanking the shoulder regions used for recombination (Fig. S4B), and successful construction of the indicated deletions was confirmed by DNA sequencing (data not shown).
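The deletion boundaries described above (from 50 bp upstream of the transcriptional start site through the end of the 3ʹ UTR) amount to simple coordinate arithmetic. The sketch below is illustrative only: the function name, the 1-based inclusive coordinate convention, and the example coordinates are assumptions and are not taken from the Sm1021 or Sm2011 annotation.

```python
def deletion_interval(tss, utr3_end, strand, upstream_pad=50):
    """Return (start, end) genome coordinates of the region to delete.

    tss          -- transcriptional start site (1-based)
    utr3_end     -- last base of the 3' UTR (1-based)
    strand       -- "+" or "-"
    upstream_pad -- bases removed upstream of the TSS (50 bp in the text)
    """
    if strand == "+":
        if utr3_end < tss:
            raise ValueError("on the + strand the 3' UTR must end after the TSS")
        return (tss - upstream_pad, utr3_end)
    if strand == "-":
        if utr3_end > tss:
            raise ValueError("on the - strand the 3' UTR must end before the TSS")
        return (utr3_end, tss + upstream_pad)
    raise ValueError("strand must be '+' or '-'")

# Hypothetical coordinates for illustration only.
print(deletion_interval(tss=1000, utr3_end=1400, strand="+"))
print(deletion_interval(tss=1400, utr3_end=1000, strand="-"))
```

Removing the promoter-proximal region and 5ʹ UTR along with the ORF is the design choice that avoids leaving a transcribed but truncated cspA message of the kind associated with the LACE effect.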

Growth of the Sm1021 ∆cspA2 and Sm1021 ∆cspA4 single deletion strains and the Sm1021 ∆cspA2 ∆cspA4 double deletion strain was compared under stress to that of wild type Sm1021 by placing serial dilutions of the strains onto modified YMB media or by incubating plates at different temperatures (Fig. 3A). Differences in growth were assessed by comparing colony density under each condition. There was no difference between the strains grown under non-stress conditions, e.g. growth on YMB plates at 30°C. There were no significant changes in growth on YMB plates at high temperature (42°C) or at higher pH (pH 8). However, at 14°C growth of the Sm1021 ∆cspA2 and Sm1021 ∆cspA4 single deletion strains was significantly decreased, and growth of the Sm1021 ∆cspA2 ∆cspA4 double deletion strain was decreased even more dramatically, compared to wild type Sm1021. The Sm1021 ∆cspA2 ∆cspA4 double deletion strain also had decreased growth on YMB plates containing 400 mM NaCl (Fig. 7E). The deletion strains were also tested for growth on YMB containing a sub-lethal concentration of kanamycin (50 µg/mL). Intriguingly, the Sm1021 ∆cspA2 single deletion strain was less sensitive than wild type Sm1021, while the Sm1021 ∆cspA4 single deletion strain was more sensitive. The Sm1021 ∆cspA2 ∆cspA4 double deletion strain was more sensitive than the Sm1021 ∆cspA4 single deletion strain. We further examined sensitivity to several other antibiotics through additional replica plating experiments (Fig. S4A). Sm1021 ∆cspA2 was more resistant to neomycin and gentamicin, but the deletion had no significant effect on chloramphenicol resistance. To verify that the kanamycin/neomycin/gentamicin resistance phenotype of the Sm1021 ΔcspA2 strain was due to loss of the cspA2 gene, and not to a residual copy of the pK19 kanamycin resistance gene (nptII) left behind by an unusual integration event during deletion generation, we attempted to generate a PCR product with primers located within the nptII coding sequence. This gave no signal (Fig S4B), suggesting that nptII had not been left behind during strain generation. The growth phenotypes under cold stress and high salt stress, together with the expression data revealing an increase in expression under these conditions, strongly suggest that CspA2 and CspA4 are important in the cells’ adaptation to these stress conditions.

Deletion strains were assessed for symbiotic effectiveness by measuring the dry weight of plant aerial tissue 4 weeks after Medicago sativa (alfalfa) seedlings were inoculated with the various strains. Plants were grown in sand in double closed Magenta GA-7 boxes. In this controlled system, increased plant weight depends on an effective symbiosis. Plants in boxes that are not inoculated with Sm1021 do not have a source of fixed nitrogen and stop growing after 2 weeks. There was no detectable difference between plants inoculated with the single Sm1021 ∆cspA2 and Sm1021 ∆cspA4 deletion strains and plants inoculated with wild type Sm1021; however, there was a significant decrease in plant dry weight in boxes inoculated with the Sm1021 ∆cspA2 ∆cspA4 double deletion strain (Fig. 3D). To better explain the loss of symbiotic effectiveness in the Sm1021 ∆cspA2 ∆cspA4 strain, we looked at nodule development every 2 days over a 20-day time course. Images of nodules were taken under a dissecting scope at each time point, and white (immature) and pink (more mature) nodules were counted. We observed that at 10 dpi, plants inoculated with Sm1021 ∆cspA2 ∆cspA4 had similar numbers of nodules as those inoculated with the wild type, but significantly more of the nodules on plants inoculated with the mutant were immature, as evidenced by smaller size and characteristic lack of pink coloration (Fig. 3B and 3C). This delayed nodulation phenotype may explain the loss of symbiotic effectiveness observed as decreased dry weight at 28 dpi.

CspA2 and CspA4 interacted with 16S rRNA, αR14 family sRNAs and other structured sRNAs and global mRNAs

The use of specific CspA proteins in different zones of developing root nodules suggests that there might be differences in their ability to interact with different RNA species and that these differences might be important. To examine the question of binding specificity, we profiled RNA molecules bound to different CspA-GFP proteins by native immunoprecipitation (IP) of CspA-GFP fusions from free-living S. meliloti cultures followed by Illumina RNA sequencing. CspA2-GFP, CspA4-GFP, and CspA5-GFP fusions were successfully immunoprecipitated from log phase free-living cells with an anti-GFP single chain alpaca nanobody (Fig 4A). Success of the IP was confirmed by an anti-GFP western blot (Fig 4B). RNA was isolated from input and bound fractions of the IP and analysed by fragment analysis (Fig. 4C). Fragment analysis revealed that the quality of input RNA for all samples was high: 16S and 23S rRNA molecules were visualized as sharp, non-degraded peaks in all samples. The bound fraction from the control IP, which was carried out from Sm1021 WT cells without any GFP target, appeared similar to the input RNA fraction and was used to represent the background in the IP. Bound fractions from the CspA2-GFP and CspA4-GFP IPs were significantly enriched for 16S rRNA over 23S rRNA and also contained a peak in the sRNA range (80-200 nt) that was not present in the bound fraction of the control IP. Unlike CspA2-GFP and CspA4-GFP, immunoprecipitation of CspA5-GFP did not significantly enrich either 16S rRNA or sRNA.

Illumina RNA sequencing data (100 bp, single end reads) was obtained from the input and bound fractions of the native IP. Raw data was mapped to the Sm2011 genome instead of Sm1021 because, while the two genomes are nearly identical (Sallet et al., 2013), the Sm2011 genome has a better annotation of non-coding regions. Raw count data was obtained per gene and normalized by total mapped reads in the sample. Bound IP fraction gene counts were normalized to gene counts in the input sample to correct for differences in transcript abundance before IP. These normalized gene counts were then compared between the CspA IPs and the control IP to calculate fold enrichment. The statistical significance of observed differences was assessed with edgeR (Robinson et al., 2010). We found 19 genes to be significantly enriched with CspA2, 13 with CspA4, and 57 with CspA5 (Fig 4D). CspA2 and CspA4 shared 7 genes in common, while CspA5 shared only 2 with CspA2 and 1 with CspA4. One gene (SMc07201) was common to all IPs. By far the most striking enrichment in the CspA2 and CspA4 IPs was for the αR14 sRNA family (Reinkensmeier and Giegerich, 2015). αR14 family sRNAs can be folded into a unique cloverleaf-like secondary structure consisting of 3-4 stem loops, each with highly conserved 5ʹ-UCCUCCUCCC-3ʹ repeats in all of the loop regions (Fig. S6A). The enrichment for these RNAs in the context of the entire data set is visualized in Fig. 4E. CspA5 did not enrich the αR14 sRNAs, but these RNAs represent 6 of the genes shared between CspA2 and CspA4 (Fig. 4D). The enrichments of 16S rRNA and of the αR14 family sRNA member Smr14C2 were verified by repeating the IPs and directly comparing the abundance of these RNAs using qRT-PCR (Fig. 4F). We chose to further characterize the CspA4 – Smr14C2 interaction by reconstituting the complex in vitro and analyzing it by electrophoretic mobility shift assay (EMSA) (Fig. 4G). Notably, the His-tagged CspA4 eluted from the Ni-affinity column with large amounts of background RNA from its heterologous expression and purification in E. coli, despite extensive washes with high salt (1 M NaCl). We resolved this issue by including a 6 M GnHCl denaturing wash step in the purification to remove RNA, followed by on-column renaturation. His-tagged RibE is not known to bind RNA and was used in parallel in the RNA binding experiments as a negative control. RibE was not subjected to GnHCl treatment since it did not purify with any detectable level of RNA. To verify that the observed CspA4 – Smr14C2 shifted complex was not an artifact of the His-tag, we digested the protein with enterokinase, which cuts immediately upstream of the CspA4 CDS to generate a tagless, native protein. The native CspA4 protein still complexed with Smr14C2, suggesting that the tag was not influencing binding (Fig S6B and S6C).
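The count normalization described above (scaling by total mapped reads, dividing bound by input, then comparing each CspA IP against the control IP) can be sketched as follows. This is a minimal illustration under stated assumptions and not the pipeline used here: the function names, the counts-per-million scaling, and the optional pseudocount are hypothetical choices, and the edgeR significance testing is not reproduced.

```python
def cpm(counts):
    """Scale raw per-gene counts to counts per million mapped reads."""
    total = sum(counts.values())
    return {gene: n * 1e6 / total for gene, n in counts.items()}

def bound_over_input(bound, inp, pseudocount=1.0):
    """Ratio of bound to input abundance per gene, after library-size scaling.

    The pseudocount (an assumption, not from the text) avoids division by
    zero for genes with no input reads.
    """
    b, i = cpm(bound), cpm(inp)
    return {g: (b[g] + pseudocount) / (i[g] + pseudocount) for g in b if g in i}

def fold_enrichment(ip_bound, ip_input, ctrl_bound, ctrl_input, pseudocount=1.0):
    """Enrichment of each gene in a CspA IP relative to the control IP."""
    ip = bound_over_input(ip_bound, ip_input, pseudocount)
    ctrl = bound_over_input(ctrl_bound, ctrl_input, pseudocount)
    return {g: ip[g] / ctrl[g] for g in ip if g in ctrl}

# Hypothetical toy counts for two genes, for illustration only.
toy = fold_enrichment(
    ip_bound={"geneA": 80, "geneB": 20},
    ip_input={"geneA": 50, "geneB": 50},
    ctrl_bound={"geneA": 50, "geneB": 50},
    ctrl_input={"geneA": 50, "geneB": 50},
    pseudocount=0.0,
)
print(toy)
```

Dividing by the input first corrects for transcript abundance before IP, and dividing by the control IP then removes RNA that is recovered without any GFP target.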

We looked at the contribution that various types of RNA made to the total of the bound IP fractions (Fig. S7A). The CspA2 and CspA4 IPs were enriched for global mRNA and 16S rRNA, but not for 23S rRNA. edgeR statistical analysis was designed for identifying differential gene expression from whole transcriptome data and thus assumes, in its normalization, that most genes do not differ between samples. Using edgeR helped us to identify those genes in our IPs that were most significantly enriched, but the program probably underestimated the significance of the global mRNA enrichment by CspA2 and CspA4 because of its normalizing assumptions. An arbitrary cut-off value of 5-fold is commonly used to represent significant enrichment in IP data. Global mRNA was approximately 6-fold enriched in the CspA2 and CspA4 IPs. Taken together, the strong interaction with 16S rRNA, global mRNA, and αR14 family sRNAs that mimic the 16S rRNA 3ʹ end anti-Shine-Dalgarno sequence, as well as the lack of enrichment of 23S rRNA, suggest that CspA2 and CspA4 are involved in mediating RNA structure interactions involved in global translation initiation.
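The conserved loop repeat reported for the αR14 family (5ʹ-UCCUCCUCCC-3ʹ) is easy to locate in a candidate sequence. The short sketch below shows one way to do this; the function name and the example sequence are hypothetical and are not an actual αR14 sRNA sequence.

```python
import re

# Loop repeat reported for the alphaR14 family in the text above.
ANTI_SD_REPEAT = "UCCUCCUCCC"

def find_motif(seq, motif=ANTI_SD_REPEAT):
    """Return 0-based start positions of every occurrence of motif in seq.

    DNA input is accepted: the sequence is upper-cased and T is read as U.
    A lookahead pattern is used so that overlapping occurrences are reported.
    """
    rna = seq.upper().replace("T", "U")
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", rna)]

# Hypothetical sequence containing two copies of the repeat.
example = "GGUCCUCCUCCCAAAtcctcctcccGG"
print(find_motif(example))
```

A scan like this only finds the primary sequence repeat; whether a given copy sits in a loop depends on the predicted secondary structure, which is not modeled here.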

The RpoE2 regulon was upregulated in the Sm1021 ∆cspA2 ∆cspA4 transcriptome, as was SMc06778, which is located upstream of rpoE2 and encodes an sRNA that interacts with CspA2

Whole transcriptome sequencing was performed on Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains from free-living culture and symbiotic nodules. RNA was isolated from log-phase free-living cultures grown in YMB and from alfalfa nodules harvested 28 days post inoculation (Fig 5A). Illumina strand-specific RNA sequencing was performed with the isolated RNA, and the identified RNA sequences were mapped to the Sm2011 genome with the TopHat program. As with the IP sequencing data, mapping to the Sm2011 genome reveals more detailed information about the expression of the sRNAs and non-coding regions. Mapped RNA sequences were analyzed using Cuffdiff to define statistically significant differential expression (Wang et al., 2016). The gene that was most highly upregulated in the free-living transcriptome of Sm1021 ∆cspA2 ∆cspA4 was an sRNA, SMc06778, which was also identified by IP as one of the RNAs most enriched by binding to CspA2 (Fig. 5B). SMc06778 sRNA is predicted to have a secondary structure similar to those of the αR14 family sRNAs, with a cloverleaf-like hairpin configuration that includes pyrimidine-rich loop regions (Fig. 5E). SMc06778 is located transcriptionally upstream of the rpoE2 gene, which codes for the ECF family sigma factor RpoE2 (Fig 5D). rpoE2 was also among the genes that were most significantly induced by the Sm1021 ∆cspA2 ∆cspA4 deletions (Fig. 5D). We identified 215 statistically significant upregulated coding genes in the free-living Sm1021 ΔcspA2 ΔcspA4 strain transcriptome, and among these, 41 had been identified previously as being part of the RpoE2 regulon (Schlüter et al., 2013) (Fig. 5C). This suggests that CspA2 and CspA4 are involved in coordinating the general stress response of free-living S. meliloti through interaction with an sRNA structure upstream of the general stress responsive rpoE2 sigma factor gene. Interestingly, SMc06778 appears to be destabilized in the symbiotic transcriptome (Fig. 5D), suggesting that CspA2 and CspA4 do not specifically stabilize or destabilize this sRNA, but instead mediate its structure in response to stress and other factors.
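Whether 41 RpoE2 regulon members among 215 upregulated genes is more than expected by chance can be assessed with a one-sided hypergeometric test. The sketch below is an illustration only and was not part of the analysis reported here. The counts 41 and 215 come from the text; the regulon size and the total number of annotated coding genes are not given in this section and are left as parameters to be supplied from Schlüter et al. (2013) and the genome annotation.

```python
from math import comb

def hypergeom_tail(k, n_drawn, n_success, n_total):
    """P(X >= k) for a hypergeometric draw.

    k         -- observed overlap (e.g. 41 regulon genes among the upregulated set)
    n_drawn   -- size of the upregulated set (e.g. 215)
    n_success -- size of the regulon (must be supplied by the user)
    n_total   -- number of genes tested (must be supplied by the user)
    """
    upper = min(n_drawn, n_success)
    numerator = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(k, upper + 1)
    )
    return numerator / comb(n_total, n_drawn)

# Small worked example with toy numbers: 10 genes, 4 in the "regulon",
# 3 "upregulated", at least 2 of which are in the regulon.
print(hypergeom_tail(k=2, n_drawn=3, n_success=4, n_total=10))
```

For the data above the call would take the form `hypergeom_tail(41, 215, regulon_size, genome_genes)`, where `regulon_size` and `genome_genes` are placeholders for values not stated in this section.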

αR14 family sRNAs were destabilized in the Sm1021 ΔcspA2 ΔcspA4 symbiotic transcriptome, including one family member located within the 5ʹ UTR of rpoE8

From whole transcriptome sequencing of alfalfa nodules inoculated with S. meliloti Sm1021 WT or Sm1021 ΔcspA2 ΔcspA4, we identified 27 upregulated genes and 138 downregulated genes (Fig. 6B). Most interestingly, we observed a decrease in abundance of 6 out of 7 αR14 family sRNAs (Fig. 6A and 6C, Fig. S7B) in the symbiotic transcriptome of Sm1021 ΔcspA2 ΔcspA4. Of particular interest was the destabilization of the αR14 family sRNA Smr14B, which is located in the 5ʹ UTR of the rpoE8 gene, coding for the stress responsive sigma factor RpoE8 (Fig 6C). The rpoE8 coding sequence appears to be specifically expressed in symbiosis, as few reads map to the coding region of the gene in the free-living transcriptomes. Smr14B was expressed well in both free-living and symbiotic transcriptomes of Sm1021. It appears from in silico prediction and from the global transcriptional start site mapping of Schlüter et al. (2013) that the Smr14B promoter is the only transcriptional start site available to drive expression of the RpoE8 CDS. This suggests that transcriptional termination/antitermination at the intrinsic terminator element within Smr14B controls expression of RpoE8 in symbiosis. The strong interaction observed in the co-IP of Smr14B with CspA2 and CspA4, the well characterized role of CspA family proteins in E. coli as regulators of termination and antitermination, and the destabilization of Smr14B in the Sm1021 ΔcspA2 ΔcspA4 symbiotic transcriptome combine to strongly support a role for CspA2 and CspA4 in controlling transcription of rpoE8 in symbiosis by mediating the stability of the intrinsic transcription terminator element within Smr14B upstream of rpoE8.


DISCUSSION

The CspA proteins are a highly conserved and ubiquitous set of nucleic acid binding proteins that interact with structured RNA molecules and mediate various processes that depend on RNA structure, including transcription termination, the initiation of translation and mRNA stability. Initial characterization of the CspA protein in E. coli identified it as highly induced during cold stress and important for adaptation to low temperature. Since this initial discovery it has become clear that CspA, as well as other members of the CspA family, are also highly expressed under physiological conditions and have important functions in addition to mediating the cold shock response (Czapski and Trun, 2014; Derman et al., 2015). More recently a role has emerged for CspAs as global stress regulators important for bacterial adaptation and survival within eukaryotic hosts. Deletion of a single cspA gene in Brucella results in loss of virulence and deletions of cspA genes in Listeria result in loss of host cell invasiveness (Loepfe et al., 2010; Wang et al., 2014).

The rhizobia-legume symbiosis within indeterminate root nodules is a unique system to investigate roles of CspA family proteins because the developing nodule contains a moving gradient of cellular differentiation. Indeterminate root nodules, like those formed in the interaction of S. meliloti and alfalfa, are a highly structured environment, with well-defined domains within the plant organ that have different physiological conditions and contain different stages of bacterial differentiation/adaptation within organelles located in the infected plant cell cytoplasm (Vasse et al., 1990). S. meliloti has a small family of CspA proteins, CspA1-CspA8, some of which are highly expressed during development.


We report here that S. meliloti CspAs were differentially expressed during nodule maturation, with CspA2 and CspA4 being most prominent in the narrow interzone (IZ) region at 17 dpi and then located throughout the nodule zones at 24 dpi (Fig 2A). The interzone is the nodule region where the bacteria are in the process of transitioning from an environment where there is enough oxygen to support aerobic metabolism to a microaerobic environment where the oxygen concentration is low enough to stabilize the nitrogenase enzyme but enough ATP and reductant can be generated to support nitrogen fixation. The patterned response of CspA protein expression within nodules suggests that RNA binding protein activity is altered to adapt to specific conditions as the tissue develops. This adaptation affects establishment of a fully effective symbiosis since removing CspA2 and CspA4 by mutation delays nodule development and results in less biomass accumulation. Biomass accumulation is an index of a more successful symbiotic interaction under conditions where bacterial nitrogen fixation is the only source of nitrogen that can be assimilated.

Expression of cspA2 and cspA4 in the IZ leads naturally to the question of what conditions might be inducing these genes. Measuring the level of fluorescence from CspA2-GFP and CspA4-GFP fusion proteins demonstrated that lower temperature, low pH and high salt concentrations led to greater amounts of these proteins, while higher temperature lowered the amount of protein (Fig 2B). CspA5-GFP responded inversely to these stresses, declining when CspA2/CspA4 increased and increasing when they declined. The correlated expression seen between CspA2 and CspA4 under these stress conditions does not mean that the proteins have identical function. This can be seen most dramatically in experiments testing the susceptibility of cspA2 and cspA4 mutants to antibiotics (Fig 3A, Fig S5A), where deletion of cspA2 led to a striking increase in resistance to the aminoglycoside antibiotics kanamycin, neomycin and gentamicin, but deletion of cspA4 increased sensitivity. Deletion of both genes produced a strain that was more sensitive to these antibiotics than was either single mutant or the parent strain. The differences in antibiotic sensitivity phenotypes caused by deletion were not a general result of exposing the mutants to inhibitors of protein synthesis, since chloramphenicol, which targets the 50S ribosomal subunit, does not have these effects on the mutants (Fig S5A). As will be discussed below, the differences may be related to different affinities for RNA structures revealed by CspA-GFP immunoprecipitation and the likelihood that CspA proteins interact with the structure of the 16S rRNA, a core component of the 30S ribosomal subunit with which the aminoglycoside antibiotics interact. Data supporting this idea are presented in Fig S6C and Fig S6D, which show that when 6XHis-tagged S. meliloti CspA1, CspA2 and CspA4 proteins were overexpressed in E. coli, they bound significant levels of 16S rRNA that was not fully processed. It is also possible that the mutations affect accumulation of the antibiotics in the cell as a result of interaction with mRNAs that specify transporters or other determinants of cell permeability.

Specific target selectivity of CspA proteins has been proposed, with several groups reporting sequence specific interactions for E. coli CspA family members. E. coli CspB, CspC, and CspE were shown by a SELEX approach to have such specificity (Phadtare and Inouye, 1999), and Bacillus subtilis CspA homologs were shown to interact preferentially with pyrimidine rich sequences (Max et al., 2006; Morgan et al., 2007; Sachs et al., 2012). The growth inhibiting LACE effect observed with cspA mutants in E. coli, which is caused by expression of cspA mRNA in the absence of CspA, can be reversed by exogenously expressed CspA, suggesting that CspA interacts specifically with its own mRNA in a way that other Csp-family proteins cannot. However, there is also evidence of functional overlap, as generating a cold sensitive mutant in E. coli requires removing 4 cspA family genes. This is comparable to our finding that S. meliloti ∆cspA2 and ∆cspA4 single deletion mutations did not obviously affect symbiosis but the S. meliloti ∆cspA2 ∆cspA4 double deletion mutant was less effective (Fig 3D). We also observed that the double deletion mutant had a more severe cold sensitive phenotype than the single deletion strains (Fig 3A). Our findings, taken together with those of others, suggest that individual CspA family members possess both specific and overlapping functions.

Direct evidence for both specific and overlapping activities was obtained by comparing RNA that was immunoprecipitated in association with CspA2-GFP, CspA4-GFP or CspA5-GFP (Fig 4D). While the criteria for including points in the Venn diagram of Fig 4D were fairly stringent, the analysis indicates that different CspA proteins significantly enrich different populations of RNA molecules. This is particularly true comparing the CspA2 and CspA4 IPs with the CspA5 IP. This implies that modulating expression of different CspA proteins can have differential effects on specific RNAs.

The overlap in binding specificity by CspA2 and CspA4 consists of six members of the αR14 family of sRNA molecules. The αR14 family is characterized by a unique cloverleaf-like structure and anti-Shine-Dalgarno loop repeats (del Val et al., 2012). Smr14C2 was the first αR14 sRNA described in S. meliloti, and was identified through a genome wide screen as an abundant sRNA upregulated in symbiosis (del Val et al., 2007). Robledo et al. (2017) have recently examined the genetics and physiology of Smr14C2, which they renamed NfeR1 because of the nodule formation efficiency defect phenotype of a deletion mutant they constructed. Deletion of nfeR1 leads to salt sensitivity of the resulting mutant and affects various symbiotic properties, including competitiveness, infectivity, nodule development and symbiotic efficiency. Interestingly, NfeR1 expression is up-regulated in response to high NaCl levels, as we report here for CspA2 and CspA4. Robledo et al. report that NfeR1 was strongly expressed throughout the nodules, but we note that Roux et al. showed strong zone-specific enrichment of this sRNA in the interzone region in their laser capture dissection study of alfalfa nodules (Fig. S8). Robledo et al. argue that complementarity between NfeR1 and various mRNAs is important to its function, and their bioinformatic analysis with the CopraRNA web-tool (Wright et al., 2013, 2014) showed that sequences involved in initiating translation of mRNAs encoding transporters were especially likely to be involved. NfeR1 function may be affected by the deletions in Sm1021 ΔcspA2 ΔcspA4 and could contribute to the similar free-living phenotypes and partial symbiotic defect we report here; however, in the Sm1021 ΔcspA2 ΔcspA4 strain symbiotic transcriptome we find all members of the αR14 family to be less abundant except for NfeR1 (Fig. S7B). This suggests that the similarity in phenotypes between the two strains may reflect a loss of NfeR1 function in the ΔcspA2 ΔcspA4 strain without loss of stability of the sRNA. If NfeR1 interactions with its mRNA targets require opening the NfeR1 secondary structure, then the CspA2 and CspA4 proteins are likely to be involved in mediating NfeR1 activity, and expression of NfeR1 would be necessary but not sufficient for its proposed effect on the expression of these genes. Another possibility is that αR14 family members function redundantly, so that the significant destabilization of the other αR14 family members that we detect in the ΔcspA2 ΔcspA4 strain is responsible for the similar phenotypes observed between this strain and the nfeR1 single deletion.


Combining the data from immunoprecipitation and the changes in gene expression as the result of mutation leads to the conclusion that CspA2 and CspA4 are directly involved in regulating specific genes. In E. coli, the CspA family homologs CspC and CspE stabilize the rpoS transcript through interaction with the 5ʹ UTR and are involved in the complex coordination of the general stress response (Cohen-Or et al., 2010; Phadtare and Inouye, 2001). S. meliloti does not have a close homolog to RpoS, and it has been suggested that RpoE2 takes its place in activating genes responsible for the general stress response (Sauviac et al., 2007). Multiple levels of regulation of RpoE2 activity have been described in S. meliloti. Two paralogous anti-sigma factors, RsiA1 and RsiA2, negatively regulate RpoE2 post transcriptionally, while two paralogous anti-anti-sigma factors, RsiB1 and RsiB2, relieve this negative control (Bastiat et al., 2010). We found that SMc06778, a non-coding sRNA of unknown function with structural similarity to αR14 family sRNAs, was among the RNAs most enriched in the CspA2 IP. This RNA is located immediately upstream of the rsiA1-rpoE2 operon, which led us to postulate that control of antitermination at this site may contribute to control of RpoE2 levels. A role in antitermination has been well described for CspC and CspE in E. coli (Bae et al., 2000; Phadtare and Severinov, 2005; Phadtare et al., 2002, 2003). A similar role for CspA2 in regulating rpoE2 transcription through regulation of termination/antitermination of an upstream sRNA was reinforced by the observation that the region downstream of SMc06778 and extending to the RpoE2 coding region was the most up-regulated RNA in the transcriptome of the Sm1021 ΔcspA2 ΔcspA4 mutant. SMc06778 RNA was significantly less abundant in the symbiotic transcriptome of Sm1021 ΔcspA2 ΔcspA4, suggesting that SMc06778 expression was lower in the mutant or that the sRNA was destabilized. These very different effects of mutating CspA2 suggest that it plays a role in mediating the dynamics of SMc06778 in response to the environment through direct interaction with this sRNA.

In symbiosis, it appears that a similar mechanism was operating in regulating another stress sigma factor, RpoE8. In the 5ʹ UTR of the rpoE8 transcript we observed that an αR14 sRNA family member, Smr14B, was among the most enriched RNAs in both the CspA2 and CspA4 IPs. In silico prediction and global transcriptional start site mapping identified the promoter of Smr14B as the closest upstream region capable of transcribing rpoE8, and, if this is the case, antitermination must occur within the Smr14B region in order to change rpoE8 gene expression. The strong interaction of CspA2 and CspA4 with Smr14B and the lower level of Smr14B in the Sm1021 ΔcspA2 ΔcspA4 mutant suggest that Smr14B was destabilized in the Sm1021 ΔcspA2 ΔcspA4 strain and that CspA2 and CspA4 were involved in this. We propose that CspAs have a role as master regulators of stress responses important for free-living and symbiotic stress adaptation through mediation of sRNA structure dynamics controlling expression of alternative sigma factor genes (Fig. 7A). This mechanism adds an additional level of control to an already complex regulatory circuit governing general stress adaptation in rhizobia.

Beyond the role we suggest above for CspA2 and CspA4 in mediating cis effects of αR14 sRNAs, it remains unclear what role CspA2 and CspA4 might play in mediating trans αR14 sRNA - mRNA target interactions (Fig. 7A). The RNA chaperone Hfq is involved in mediating most sRNA-mRNA target interactions currently characterized in bacteria (Updegrove et al., 2016). However, the αR14 sRNA NfeR1 is thought to be an Hfq-independent sRNA (Robledo et al., 2017). CspA2 and CspA4 may play an “Hfq-like” role in chaperoning αR14 sRNAs with their mRNA targets. Similar to the destabilization of αR14 sRNAs observed in the Sm1021 ΔcspA2 ΔcspA4 strain, the Hfq-dependent sRNAs AbcR1 and AbcR2 are destabilized in an hfq deletion strain background (Torres-Quesada et al., 2013). An alternative hypothesis is that αR14 sRNAs function as mimics of the 3ʹ end of the 16S rRNA, locally titrating CspA concentration and competing with the 16S rRNA in binding to mRNA molecules. This sort of mechanism is well characterized for sRNAs in E. coli that possess multiple loop repeats, such as CsrC and CsrB (Majdalani et al., 2005).

RNA structure makes an important contribution to RNA function, and changes in RNA structure can lead to changes in function. CspA proteins are thought to globally affect RNA secondary structure strength by very weakly associating with RNA and disrupting base stacking interactions. In models for E. coli CspA activity during cold shock, one idea is that stronger base pairing at lower temperature converts accessible sequences in single stranded regions into less accessible double stranded configurations and that CspA reverses this by increasing the fraction of these molecules that are single-stranded. From the perspective of the RNA molecules, increasing CspA concentration decreases the stability of the dsRNA without raising the temperature. In a model system we have developed.

More generally, a cell can express CspA proteins to compensate for changes in RNA secondary structure strength caused by changes in the environment that stabilize dsRNA. In addition to temperature, these might include shifts in ions, osmolytes, or the presence of other RNA binding proteins. This represents a general or global function of CspAs. The data showing specificity of RNA binding by different CspA proteins add to this picture by suggesting that the cell can raise the effective “temperature” seen by an individual RNA molecule by raising the level of a CspA that binds to the RNA. Thus, we hypothesize that rhizobia use the general CspA function of RNA binding to alter RNA accessibility but also use more sequence-specific functions by inducing individual CspAs to regulate specific RNA structures, in response to symbiotic developmental cues and to control expression of relevant genes important for the effective establishment of symbiotic nitrogen fixation.
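The effective “temperature” argument can be illustrated with a minimal two-state sketch. This is not the model system referred to in this work; it assumes a single hairpin with hypothetical folding enthalpy and entropy, and a CspA that binds only the single stranded state at one site with a hypothetical dissociation constant. Under those assumptions, binding adds statistical weight to the single stranded state, and the function `effective_temperature` (a name introduced here for illustration) reports the temperature at which the same RNA without CspA would be equally single stranded.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_single_stranded(temp_c, cspa_uM=0.0, kd_uM=10.0,
                             dH=-40.0, dS=-0.125):
    """Fraction of RNA molecules in the single stranded state.

    Two-state model: K_fold = [ds]/[ss] = exp(-(dH - T*dS) / (R*T)).
    CspA is ASSUMED to bind only the ss state (one site, Kd = kd_uM),
    which multiplies the weight of the ss state by (1 + [CspA]/Kd).
    dH (kcal/mol), dS (kcal/mol/K) and kd_uM are illustrative values only.
    """
    t_k = temp_c + 273.15
    dG = dH - t_k * dS
    k_fold = math.exp(-dG / (R_KCAL * t_k))
    ss_weight = 1.0 + cspa_uM / kd_uM
    return ss_weight / (ss_weight + k_fold)


def effective_temperature(temp_c, cspa_uM, kd_uM=10.0, lo=-20.0, hi=120.0):
    """Temperature (deg C) at which CspA-free RNA is as single stranded
    as the RNA is at temp_c in the presence of CspA (found by bisection)."""
    target = fraction_single_stranded(temp_c, cspa_uM=cspa_uM, kd_uM=kd_uM)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        # The ss fraction rises monotonically with temperature when dH < 0.
        if fraction_single_stranded(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

In this sketch, lowering the temperature reduces the single stranded fraction, and adding CspA raises it again, so `effective_temperature(15.0, cspa_uM=100.0)` returns a value above 15 °C. A CspA that binds one RNA more tightly (smaller `kd_uM`) raises the effective temperature of that RNA more than that of others, which is the specificity argument made above.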

We conclude that, in addition to the global role of the S. meliloti CspA family of proteins in maintaining physiological RNA structure homeostasis in response to environmental stress, they can contribute to the cellular environment defining RNA structure dynamics of specific messages and alter transcription termination/antitermination, translation initiation, and RNA stability/turnover of specific genes necessary for stress adaptation. Furthermore, we conclude that S. meliloti has evolved to use this mechanism to establish effective symbiosis with plant hosts through coordinated stress adaptation defining its transition between free-living and symbiotic states.

REFERENCES

Bae, W., Xia, B., Inouye, M., and Severinov, K. (2000). Escherichia coli CspA-family RNA chaperones are transcription antiterminators. Proc. Natl. Acad. Sci. U. S. A. 97, 7784–7789.

Barra-Bily, L., Pandey, S.P., Trautwetter, A., Blanco, C., and Walker, G.C. (2010). The Sinorhizobium meliloti RNA chaperone Hfq mediates symbiosis of S. meliloti and alfalfa. J. Bacteriol. 192, 1710–1718.

Bastiat, B., Sauviac, L., and Bruand, C. (2010). Dual control of Sinorhizobium meliloti RpoE2 sigma factor activity by two PhyR-type two-component response regulators. J. Bacteriol. 192, 2255–2265.

Byers, S.A., Price, J.P., Cooper, J.J., Li, Q., and Price, D.H. (2005). HEXIM2, a HEXIM1-related protein, regulates positive transcription elongation factor b through association with 7SK. J. Biol. Chem. 280, 16360–16367.

Cohen-Or, I., Shenhar, Y., Biran, D., and Ron, E.Z. (2010). CspC regulates rpoS transcript levels and complements hfq deletions. Res. Microbiol. 161, 694–700.

Czapski, T.R., and Trun, N. (2014). Expression of csp genes in E. coli K-12 in defined rich and defined minimal media during normal growth, and after cold-shock. Gene 547, 91–97.

Derman, Y., Söderholm, H., Lindström, M., and Korkeala, H. (2015). Role of csp genes in NaCl, pH, and ethanol stress response and motility in Clostridium botulinum ATCC 3502. Food Microbiol. 46, 463–470.

Erisman, J.W., Sutton, M.A., Galloway, J., Klimont, Z., and Winiwarter, W. (2008). How a century of ammonia synthesis changed the world. Nat. Geosci. 1, 636–639.

Etchegaray, J.P., and Inouye, M. (1999). CspA, CspB, and CspG, major cold shock proteins of Escherichia coli, are induced at low temperature under conditions that completely block protein synthesis. J. Bacteriol. 181, 1827–1830.

Gao, M., Barnett, M.J., Long, S.R., and Teplitski, M. (2010). Role of the Sinorhizobium meliloti global regulator Hfq in gene regulation and symbiosis. Mol. Plant-Microbe Interact. MPMI 23, 355–365.

Green, M.R., and Sambrook, J. (2012). Molecular cloning: a laboratory manual (Cold Spring Harbor, N.Y: Cold Spring Harbor Laboratory Press).

Hankins, J.S., Denroche, H., and Mackie, G.A. (2010). Interactions of the RNA-binding protein Hfq with cspA mRNA, encoding the major cold shock protein. J. Bacteriol. 192, 2482–2490.

Hwang, J., Lee, K., Phadtare, S., and Inouye, M. (2012). Identification of two DNA helicases UvrD and DinG as suppressors for lethality caused by mutant cspA mRNAs. J. Mol. Microbiol. Biotechnol. 22, 135–146.

Jiang, W., Fang, L., and Inouye, M. (1996). Complete growth inhibition of Escherichia coli by ribosome trapping with truncated cspA mRNA at low temperature. Genes Cells Devoted Mol. Cell. Mech. 1, 965–976.

Jones, P.G., and Inouye, M. (1994). The cold-shock response--a hot topic. Mol. Microbiol. 11, 811–818.

Jones, K.M., Kobayashi, H., Davies, B.W., Taga, M.E., and Walker, G.C. (2007). How rhizobial symbionts invade plants: the Sinorhizobium-Medicago model. Nat. Rev. Microbiol. 5, 619–633.

Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H., Valentin, F., Wallace, I.M., Wilm, A., Lopez, R., et al. (2007). Clustal W and Clustal X version 2.0. Bioinforma. Oxf. Engl. 23, 2947–2948.

Lee, S.J., Xie, A., Jiang, W., Etchegaray, J.P., Jones, P.G., and Inouye, M. (1994). Family of the major cold-shock protein, CspA (CS7.4), of Escherichia coli, whose members show a high sequence similarity with the eukaryotic Y-box binding proteins. Mol. Microbiol. 11, 833–839.

Loepfe, C., Raimann, E., Stephan, R., and Tasara, T. (2010). Reduced host cell invasiveness and oxidative stress tolerance in double and triple csp gene family deletion mutants of Listeria monocytogenes. Foodborne Pathog. Dis. 7, 775–783.

Lohse, M., Bolger, A.M., Nagel, A., Fernie, A.R., Lunn, J.E., Stitt, M., and Usadel, B. (2012). RobiNA: a user-friendly, integrated software solution for RNA-Seq-based transcriptomics. Nucleic Acids Res. 40, W622-627.

Majdalani, N., Vanderpool, C.K., and Gottesman, S. (2005). Bacterial small RNA regulators. Crit. Rev. Biochem. Mol. Biol. 40, 93–113.

Max, K.E.A., Zeeb, M., Bienert, R., Balbach, J., and Heinemann, U. (2006). T-rich DNA single strands bind to a preformed site on the bacterial cold shock protein Bs-CspB. J. Mol. Biol. 360, 702–714.

McWilliam, H., Li, W., Uludag, M., Squizzato, S., Park, Y.M., Buso, N., Cowley, A.P., and Lopez, R. (2013). Analysis Tool Web Services from the EMBL-EBI. Nucleic Acids Res. 41, W597-600.

Morgan, H.P., Estibeiro, P., Wear, M.A., Max, K.E.A., Heinemann, U., Cubeddu, L., Gallagher, M.P., Sadler, P.J., and Walkinshaw, M.D. (2007). Sequence specificity of single-stranded DNA-binding proteins: a novel DNA microarray approach. Nucleic Acids Res. 35, e75.

Nogueira, T., and Springer, M. (2000). Post-transcriptional control by global regulators of gene expression in bacteria. Curr. Opin. Microbiol. 3, 154–158.

O’Connell, K.P., and Thomashow, M.F. (2000). Transcriptional organization and regulation of a polycistronic cold shock operon in Sinorhizobium meliloti RM1021 encoding homologs of the Escherichia coli major cold shock gene cspA and ribosomal protein gene rpsU. Appl. Environ. Microbiol. 66, 392–400.

O’Connell, K.P., Gustafson, A.M., Lehmann, M.D., and Thomashow, M.F. (2000). Identification of cold shock gene loci in Sinorhizobium meliloti by using a luxAB reporter transposon. Appl. Environ. Microbiol. 66, 401–405.

Oldroyd, G.E.D., Murray, J.D., Poole, P.S., and Downie, J.A. (2011). The rules of engagement in the legume-rhizobial symbiosis. Annu. Rev. Genet. 45, 119–144.

Pédelacq, J.-D., Cabantous, S., Tran, T., Terwilliger, T.C., and Waldo, G.S. (2006). Engineering and characterization of a superfolder green fluorescent protein. Nat. Biotechnol. 24, 79–88.

Phadtare, S., and Inouye, M. (1999). Sequence-selective interactions with RNA by CspB, CspC and CspE, members of the CspA family of Escherichia coli. Mol. Microbiol. 33, 1004–1014.

Phadtare, S., and Inouye, M. (2001). Role of CspC and CspE in regulation of expression of RpoS and UspA, the stress response proteins in Escherichia coli. J. Bacteriol. 183, 1205–1214.

Phadtare, S., and Severinov, K. (2005). Nucleic acid melting by Escherichia coli CspE. Nucleic Acids Res. 33, 5583–5590.

Phadtare, S., and Severinov, K. (2009). Comparative analysis of changes in gene expression due to RNA melting activities of translation initiation factor IF1 and a cold shock protein of the CspA family. Genes Cells Devoted Mol. Cell. Mech. 14, 1227–1239.

Phadtare, S., Alsina, J., and Inouye, M. (1999). Cold-shock response and cold-shock proteins. Curr. Opin. Microbiol. 2, 175–180.

Phadtare, S., Inouye, M., and Severinov, K. (2002). The nucleic acid melting activity of Escherichia coli CspE is critical for transcription antitermination and cold acclimation of cells. J. Biol. Chem. 277, 7239–7245.

Phadtare, S., Severinov, K., and Inouye, M. (2003). Assay of transcription antitermination by proteins of the CspA family. Methods Enzymol. 371, 460–471.

Phadtare, S., Kazakov, T., Bubunenko, M., Court, D.L., Pestova, T., and Severinov, K. (2007). Transcription antitermination by translation initiation factor IF1. J. Bacteriol. 189, 4087–4093.

Reinkensmeier, J., and Giegerich, R. (2015). Thermodynamic matchers for the construction of the cuckoo RNA family. RNA Biol. 12, 197–207.

Robinson, J.T., Thorvaldsdóttir, H., Winckler, W., Guttman, M., Lander, E.S., Getz, G., and Mesirov, J.P. (2011). Integrative genomics viewer. Nat. Biotechnol. 29, 24–26.

Robinson, M.D., McCarthy, D.J., and Smyth, G.K. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinforma. Oxf. Engl. 26, 139–140.

Robledo, M., Peregrina, A., Millán, V., García-Tomsig, N.I., Torres-Quesada, O., Mateos, P.F., Becker, A., and Jiménez-Zurdo, J.I. (2017). A conserved α-proteobacterial small RNA contributes to osmoadaptation and symbiotic efficiency of rhizobia on legume roots. Environ. Microbiol.

Roux, B., Rodde, N., Jardinaud, M.-F., Timmers, T., Sauviac, L., Cottret, L., Carrère, S., Sallet, E., Courcelle, E., Moreau, S., et al. (2014). An integrated analysis of plant and bacterial gene expression in symbiotic root nodules using laser-capture microdissection coupled to RNA sequencing. Plant J. Cell Mol. Biol. 77, 817–837.

Sachs, R., Max, K.E.A., Heinemann, U., and Balbach, J. (2012). RNA single strands bind to a conserved surface of the major cold shock protein in crystals and solution. RNA N. Y. N 18, 65–76.

Sallet, E., Roux, B., Sauviac, L., Jardinaud, M.-F., Carrère, S., Faraut, T., de Carvalho-Niebel, F., Gouzy, J., Gamas, P., Capela, D., et al. (2013). Next-generation annotation of prokaryotic genomes with EuGene-P: application to Sinorhizobium meliloti 2011. DNA Res. Int. J. Rapid Publ. Rep. Genes Genomes 20, 339–354.

Sauviac, L., Philippe, H., Phok, K., and Bruand, C. (2007). An extracytoplasmic function sigma factor acts as a general stress response regulator in Sinorhizobium meliloti. J. Bacteriol. 189, 4204–4216.

Schäfer, A., Tauch, A., Jäger, W., Kalinowski, J., Thierbach, G., and Pühler, A. (1994). Small mobilizable multi-purpose cloning vectors derived from the Escherichia coli plasmids pK18 and pK19: selection of defined deletions in the chromosome of Corynebacterium glutamicum. Gene 145, 69–73.

Schlüter, J.-P., Reinkensmeier, J., Barnett, M.J., Lang, C., Krol, E., Giegerich, R., Long, S.R., and Becker, A. (2013). Global mapping of transcription start sites and promoter motifs in the symbiotic α-proteobacterium Sinorhizobium meliloti 1021. BMC Genomics 14, 156.

Schneider, C.A., Rasband, W.S., and Eliceiri, K.W. (2012). NIH Image to ImageJ: 25 years of image analysis. Nat. Methods 9, 671–675.

Sievers, F., Wilm, A., Dineen, D., Gibson, T.J., Karplus, K., Li, W., Lopez, R., McWilliam, H., Remmert, M., Söding, J., et al. (2011). Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol. Syst. Biol. 7, 539.

Simon, R., Priefer, U., and Pühler, A. (1983). A Broad Host Range Mobilization System for In Vivo Genetic Engineering: Transposon Mutagenesis in Gram Negative Bacteria. Bio/Technology 1, 784–791.

Sobrero, P., and Valverde, C. (2011). Evidences of autoregulation of hfq expression in Sinorhizobium meliloti strain 2011. Arch. Microbiol. 193, 629–639.

Somerville, J.E., and Kahn, M.L. (1983). Cloning of the glutamine synthetase I gene from Rhizobium meliloti. J. Bacteriol. 156, 168–176.

Torres-Quesada, O., Oruezabal, R.I., Peregrina, A., Jofré, E., Lloret, J., Rivilla, R., Toro, N., and Jiménez-Zurdo, J.I. (2010). The Sinorhizobium meliloti RNA chaperone Hfq influences central carbon metabolism and the symbiotic interaction with alfalfa. BMC Microbiol. 10, 71.

Torres-Quesada, O., Millán, V., Nisa-Martínez, R., Bardou, F., Crespi, M., Toro, N., and Jiménez-Zurdo, J.I. (2013). Independent activity of the homologous small regulatory RNAs AbcR1 and AbcR2 in the legume symbiont Sinorhizobium meliloti. PloS One 8, e68147.

Torres-Quesada, O., Reinkensmeier, J., Schlüter, J.-P., Robledo, M., Peregrina, A., Giegerich, R., Toro, N., Becker, A., and Jiménez-Zurdo, J.I. (2014). Genome-wide profiling of Hfq-binding RNAs uncovers extensive post-transcriptional rewiring of major stress response and symbiotic regulons in Sinorhizobium meliloti. RNA Biol. 11, 563–579.

Trapnell, C., Roberts, A., Goff, L., Pertea, G., Kim, D., Kelley, D.R., Pimentel, H., Salzberg, S.L., Rinn, J.L., and Pachter, L. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 7, 562–578.

Udvardi, M., and Poole, P.S. (2013). Transport and metabolism in legume-rhizobia symbioses. Annu. Rev. Plant Biol. 64, 781–805.

Updegrove, T.B., Zhang, A., and Storz, G. (2016). Hfq: the flexible RNA matchmaker. Curr. Opin. Microbiol. 30, 133–138.

del Val, C., Rivas, E., Torres-Quesada, O., Toro, N., and Jiménez-Zurdo, J.I. (2007). Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics. Mol. Microbiol. 66, 1080–1091.

del Val, C., Romero-Zaliz, R., Torres-Quesada, O., Peregrina, A., Toro, N., and Jiménez-Zurdo, J.I. (2012). A survey of sRNA families in α-proteobacteria. RNA Biol. 9, 119–129.

Vasse, J., de Billy, F., Camut, S., and Truchet, G. (1990). Correlation between ultrastructural differentiation of bacteroids and nitrogen fixation in alfalfa nodules. J. Bacteriol. 172, 4295–4306.

Wang, Z., Wang, S., and Wu, Q. (2014). Cold shock protein A plays an important role in the stress adaptation and virulence of Brucella melitensis. FEMS Microbiol. Lett. 354, 27–36.

Wang, Z., Liu, W., Wu, T., Bie, P., and Wu, Q. (2016). RNA-seq reveals the critical role of CspA in regulating Brucella melitensis metabolism and virulence. Sci. China Life Sci.

Williams, K.P., Sobral, B.W., and Dickerman, A.W. (2007). A Robust Species Tree for the Alphaproteobacteria. J. Bacteriol. 189, 4578–4586.

Wright, P.R., Richter, A.S., Papenfort, K., Mann, M., Vogel, J., Hess, W.R., Backofen, R., and Georg, J. (2013). Comparative genomics boosts target prediction for bacterial small RNAs. Proc. Natl. Acad. Sci. U. S. A. 110, E3487-3496.

Wright, P.R., Georg, J., Mann, M., Sorescu, D.A., Richter, A.S., Lott, S., Kleinkauf, R., Hess, W.R., and Backofen, R. (2014). CopraRNA and IntaRNA: predicting small RNA targets, networks and interaction domains. Nucleic Acids Res. 42, W119-123.

Xia, B., Ke, H., and Inouye, M. (2001). Acquirement of cold sensitivity by quadruple deletion of the cspA family and its suppression by PNPase S1 domain in Escherichia coli. Mol. Microbiol. 40, 179–188.

Yamanaka, K., Fang, L., and Inouye, M. (1998). The CspA family in Escherichia coli: multiple gene duplication for stress adaptation. Mol. Microbiol. 27, 247–255.

Yurgel, S.N., and Kahn, M.L. (2005). Sinorhizobium meliloti dctA mutants with partial ability to transport dicarboxylic acids. J. Bacteriol. 187, 1161–1172.

Yurgel, S.N., Berrocal, J., Wilson, C., and Kahn, M.L. (2007). Pleiotropic effects of mutations that alter the Sinorhizobium meliloti cytochrome c respiratory system. Microbiol. Read. Engl. 153, 399–410.

Figure 1. Multiple sequence alignment of the S. meliloti CspA protein family members with E. coli CspA. RNP-1 and RNP-2 domains are almost perfectly conserved and are highlighted in grey. CspA5 contains two cold shock domains and these were aligned separately as N and C terminal fragments.

Figure 2. Symbiotic nodule zone specific and free-living stress responsive expression of S. meliloti CspAs. (A) Bright field and fluorescent microscopy of representative longitudinal sections of alfalfa nodules expressing endogenously tagged S. meliloti CspA2-GFP and CspA4-GFP fusions at 17 and 24 days post inoculation. Scale bar represents 200 µm. (B) Log2 fold ratio of fluorescence of S. meliloti CspA2-GFP, CspA4-GFP, and CspA5-GFP endogenously tagged fusions in response to the indicated stresses for 24 hr to fluorescence of unstressed cells. Data are represented as the mean of a minimum of 3 biological replicates ± SEM.

Figure 3. Free-living and symbiotic phenotypes of S. meliloti cspA deletion strains. (A) Replica plating experiment of a 5-fold dilution series onto YMB media agar plates with the indicated strains grown under the indicated stresses. Images of a representative experiment converted to grayscale are shown to the left. Densitometry of the lane enclosed in a rectangle is represented on the right as the mean of biological triplicate experiments ± SEM. (B) Images of alfalfa nodules inoculated with Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains 10 and 20 dpi. Scale bars represent 2 mm. (C) Number of immature (white) and mature (pink) alfalfa nodules observed per plant at the indicated day post inoculation with Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains. Data are represented as the mean ± SEM. (D) Alfalfa aerial tissue dry weight was measured 28 dpi with the indicated strains. Data are represented as the mean ± SEM. Dry weight from plants inoculated with the Sm1021 ΔcspA2 ΔcspA4 strain was determined to be significantly different (p-value = 0.0063).

Figure 4. CspA-GFP fusion native immunoprecipitation followed by RNA sequencing. (A) Coomassie stained SDS-PAGE of input (I) and bound (B) fractions from Control, CspA2-GFP, CspA4-GFP, and CspA5-GFP native immunoprecipitations. (B) Anti-GFP western blot with input, flow-through (FT) and bound fractions from the IP. (C) RNA fragment analysis of input and bound samples from the IP showing enrichment of 16S rRNA and a population of sRNAs in the CspA2-GFP and CspA4-GFP IPs. (D) Number of significantly enriched RNA species from CspA2, CspA4 and CspA5 IPs identified by Illumina RNA sequencing and determined by an FDR < 0.05 with edgeR analysis. (E) Enrichment comparison plots for the CspA2, CspA4 and CspA5 IP vs the control IP for RNAs identified by Illumina RNA sequencing. Normalized RNA abundance values per gene are visualized on a log x log plot. Light gray circles represent genes with non-significant enrichment, dark gray circles represent genes with significant enrichment, larger magenta circles represent significantly enriched αR14 family sRNAs in the CspA2 and CspA5 IP, and smaller magenta circles in the CspA5 IP represent non-significantly enriched αR14 family sRNAs. (F) RT-qPCR verification of 16S rRNA and smr14C2 sRNA enrichment in control, CspA2, CspA4 and CspA5 IPs. 23S rRNA was used as the endogenous control for calculation of fold enrichment. Data represent the mean of biological duplicates, each performed with technical triplicates. (G) EMSA with Ni-NTA purified his-tagged CspA4 and T7 in vitro transcribed smr14C2 RNA. His-tagged RibE, a protein that does not bind RNA, serves as a negative control. CspA4 was purified with and without a GnHCl wash step and protein from the various preparations was fractionated by PAGE and stained with EtBr as indicated.
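The fold enrichment calculation in panel (F) can be illustrated with a short sketch. The caption states only that 23S rRNA served as the endogenous control; the sketch below assumes the widely used 2^-ΔΔCt calculation with 100% amplification efficiency, which may differ from the calculation actually applied. The function name and the Ct values in the usage note are hypothetical.

```python
def fold_enrichment(ct_target_ip, ct_ref_ip, ct_target_control, ct_ref_control):
    """Fold enrichment of a target RNA in an IP relative to the control IP.

    ASSUMES the 2^-ddCt method with perfect (2-fold per cycle) amplification.
    ct_ref_* are Ct values of the endogenous control (23S rRNA in the caption).
    """
    # Normalize the target to the endogenous control within each sample.
    d_ct_ip = ct_target_ip - ct_ref_ip
    d_ct_control = ct_target_control - ct_ref_control
    # Compare the normalized IP value with the normalized control IP value.
    dd_ct = d_ct_ip - d_ct_control
    return 2.0 ** (-dd_ct)
```

For example, with made-up Ct values in which the target is detected 5 cycles earlier in the CspA IP than in the control IP at equal 23S rRNA signal, `fold_enrichment(20.0, 18.0, 25.0, 18.0)` gives a 32-fold enrichment.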

Figure 5. rpoE2 expression analysis from whole transcriptome sequencing of free-living Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains. (A) 1X TBE 6% urea 6% polyacrylamide denaturing RNA gel electrophoresis with representative samples of isolated RNA from 28 dpi S. meliloti / alfalfa nodules and free-living S. meliloti log phase cultures used for generation of Illumina RNA sequencing libraries. (B) Expression comparison plot between Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strain transcript abundances identified by Illumina RNA sequencing in free-living log phase culture. Average FPKM expression value between both strains was log2 transformed and plotted on the X-axis while log2 transformed fold change (Sm1021 ΔcspA2 ΔcspA4 FPKM / WT FPKM) was plotted on the Y-axis. Genes whose expression differences were not statistically significant were plotted as 75% transparent light gray circles. Significantly differentially expressed genes are plotted as dark gray circles, and significantly differentially expressed genes identified by Schlüter et al. as part of the RpoE2 regulon are indicated with green circles. (C) Venn diagram representing the overlap between the total number of significantly upregulated genes in the Sm1021 ΔcspA2 ΔcspA4 free-living transcriptome and the total number of genes identified as part of the RpoE2 regulon by Schlüter et al. (D) Normalized Integrative Genomics Viewer (IGV) coverage tracks from WT and ΔA2ΔA4 strain transcriptome sequencing from free-living culture and symbiotic nodules over the rpoE2 locus. The small non-coding RNA (sRNA) SMc06778 is highlighted in green in the annotation below. (E) Lowest energy m-fold secondary structure prediction for SMc06778.
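The plot coordinates described for panel (B) can be sketched as follows. This is an illustrative reading of the caption, not the analysis script used for the figure; the function name and the optional pseudocount are assumptions introduced here, and the example values are made up.

```python
import math


def expression_plot_coordinates(wt_fpkm, mutant_fpkm, pseudocount=0.0):
    """Return (x, y) for one gene on the expression comparison plot.

    x: log2 of the average FPKM of the two strains.
    y: log2 fold change, mutant FPKM / WT FPKM.
    pseudocount is an ASSUMED option (not stated in the caption) that is
    added to each FPKM so genes with zero reads can be placed on the plot;
    with pseudocount=0, a zero FPKM raises an error and must be filtered.
    """
    wt = wt_fpkm + pseudocount
    mutant = mutant_fpkm + pseudocount
    x = math.log2((wt + mutant) / 2.0)
    y = math.log2(mutant / wt)
    return x, y
```

A gene with hypothetical FPKM values of 8 in the wild type and 32 in the mutant has a log2 fold change of 2 and is plotted at x = log2(20), so upregulated genes such as members of the RpoE2 regulon appear above the horizontal axis.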

Figure 6. αR14 family sRNA and rpoE8 expression analysis from whole transcriptome sequencing of Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains from nodules. (A) Expression comparison plot, as in Fig 5B, except representing Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strains in symbiosis. αR14 family sRNAs are indicated with pink circles. (B) Normalized Integrative Genomics Viewer (IGV) coverage tracks from Sm1021 WT and Sm1021 ΔcspA2 ΔcspA4 strain transcriptomes from free-living cultures and symbiotic nodules in the region of the rpoE8 gene. The αR14 family sRNA, Smr14B, is part of the rpoE8 5ʹ UTR and is designated as “αR14 element” and highlighted in magenta in the annotation at the bottom of the figure. (C) Lowest energy m-fold secondary structure prediction for Smr14B.

Figure 7. Cartoon representing CspA interactions in transcription antitermination, sRNA stabilization, and translation initiation. (A) CspA opens base paired RNA regions to single-strand RNA specific RNases; CspA can disrupt dsRNA regions involved in termination and potentially disrupt hairpins involved in antitermination. Stem sequences required for terminator structure formation are drawn in light green. (B) CspA role in mediating translation initiation and potentially αR14 translational repression. αR14 anti-Shine-Dalgarno loop repeats and mRNA Shine-Dalgarno sequences are drawn in magenta.

Figure S1. CspA family amino acid sequence alignments. (A) Clustal Omega alignment between E. coli CspA and the S. meliloti CspA family of proteins. (B) Clustal Omega alignment of S. meliloti CspA1, CspA2, and CspA8. (C) Clustal Omega alignment of S. meliloti CspA3 and CspA4. (D) Clustal Omega alignment of S. meliloti CspA6 and CspA7. (E) Clustal Omega alignment of S. meliloti CspA5 N and C terminal regions. (F) Cladogram based upon protein sequence similarity between the E. coli (white) and S. meliloti (black) CspA families.

Figure S2. S. meliloti CspA family transcript and eGFP fusion expression. (A) Sm2011 transcript expression data from Roux et al. (B) Free-living transcript expression data from Sallet et al. (C) Free-living expression levels measured for the indicated CspA-GFP fusions.

Figure S3. Endogenous GFP-tagging strategy.

Figure S4. Generation of ΔcspA2 and ΔcspA4 strains. (A) Deletion map of shoulder regions cloned to construct deletion constructs (dotted lines) and genomic coordinates of deleted regions. Primers used for genotyping are highlighted as green arrows. (B) Genotyping by PCR to confirm construction of deletions in the cspA2 and cspA4 single and double deletion strains.

Figure S5. Antibiotic sensitivity phenotypes of ΔcspA2 and ΔcspA4 strains. (A) Replica dilution plate experiments with the indicated media and strains. (B) Genotyping by PCR to test for the presence of the nptII gene in the Sm1021 ΔcspA strains. The Sm1021 cspA2-gfp strain was a positive PCR control and the Sm1021 WT strain was a negative control for the presence of nptII.

Figure S6. S. meliloti CspA – RNA target interactions. (A) SDS PAGE of Ni-purified S. meliloti 6XHis-tagged proteins from heterologous over-expression in E. coli. (B) 1X TBE 6% urea 6% polyacrylamide denaturing RNA gel electrophoresis of RNA isolated from “purified” proteins in (A) stained with EtBr. RibE was not shown because the amounts of RNA associated with RibE were undetectable. (C) 1% Agarose 1X TAE gel with the same RNA as in (B), also including a total RNA sample from S. meliloti. In addition to the 16S rRNA of normal length, rRNA species that have not been trimmed at the 5ʹ end appear as bands with lower mobility. (D) Illumina RNA sequencing reads from the isolated RNA shown in (B) and (C) that mapped near the 5ʹ end of the E. coli BL21 DE3 16S rRNA gene. (E) Lowest energy secondary structure predictions from the Mfold program for 6 members of the S. meliloti αR14 sRNA family. Conserved loop region sequences for Smc06496 are compared with the S. meliloti 16S rRNA extreme 3ʹ end and the canonical Shine-Dalgarno ribosome binding site. (F) SDS PAGE of enterokinase cleavage of 6XHis-tagged CspA4. Lane 1 shows 6XHis-tagged CspA4 protein isolated by Ni affinity purification, Lane 2 shows the products following treatment with enterokinase, Lane 3 is the purified, cleaved CspA4 protein. (G) EMSA with Smr14C RNA and native CspA4 protein produced from enterokinase cleavage with concentrations as indicated.

Figure S7. Illumina sequencing data from CspA IPs and ΔCspA strain whole transcriptome sequencing. (A-C) Circle charts representing the % of RNA by type present in the bound fractions of the (A) control, (B) CspA2, and (C) CspA4 IPs. The “other” category represents transcript reads mapping to repeat element regions. (D) Table with values represented in (A-C). (E) Normalized Integrative Genomics Viewer (IGV) coverage tracks from free-living cultures and nodules of αR14 family sRNA gene regions from transcriptome sequencing of RNA of Sm1021 wild type and Sm1021 ΔcspA2 ΔcspA4 mutant.

Figure S8. αR14 sRNA family members zone specific expression. Zone specific expression data mined from Roux et al. (2014). Data represents the % of total transcript found in the indicated zones identified by RNA sequencing following laser capture microdissection of Medicago truncatula nodules inoculated with Sinorhizobium meliloti 2011.

TABLE S1. Genes corresponding to RNA significantly enriched by co-IP with CspA2-GFP

TABLE S2. Genes corresponding to RNA significantly enriched by co-IP with CspA4-GFP

TABLE S3. Genes corresponding to RNA significantly enriched by co-IP with CspA5-GFP

TABLE S4. Strains and Plasmids

CHAPTER 3. A new kinetic fluorescent RNA binding assay for RNA chaperone activity reveals cooperative binding of S. meliloti CspA2 and CspA4 with αR14 sRNA family targets

PREFACE

This chapter represents a manuscript prepared for submission to Nucleic Acids Research. An undergraduate student, Xingkai Liu, assisted under my close mentorship with some of the technical aspects of the experiments, including RNA construct synthesis, protein purification, and running fluorescent binding assays, and will be listed as a co-author. I was responsible for the experimental design, data analysis, and data interpretation. I wrote the manuscript with Dr. Kahn.

ABSTRACT

Bacterial CspA family proteins are small, ancient, RNA chaperones involved in mediating stress adaptation, in part through their global regulation of RNA secondary structure. Sinorhizobium meliloti CspA family proteins are important in mediating stress adaptation and are involved in establishing symbiotic nitrogen fixation within legume hosts. S. meliloti CspA2 and CspA4 interact strongly with the αR14 family of small non-coding RNAs (sRNAs) and this interaction is important for effective symbiotic development. We describe the development of a new RNA structure assay that follows the folding of the broccoli aptamer, a synthetically evolved RNA mimic of Green Fluorescent Protein (GFP), using the appearance of fluorescence. We used this assay to investigate the in vitro interaction of S. meliloti CspA2 and CspA4 proteins with various hairpin sequences, including αR14 family RNA substructures. CspA2 and CspA4 did not show a sequence preference for purine vs pyrimidine rich loops in our assay but did recognize specific features of some RNAs. CspA2 and CspA4 bound cooperatively to native αR14 family sRNA target structures, but not to other structures. This work presents a new in vitro RNA binding assay with far-reaching applications and for the first time defines cooperative binding of CspA family chaperones with certain RNA sequences but not others.

INTRODUCTION

RNA structure plays essential roles in regulating gene expression in prokaryotes through its effects on transcription termination, translation initiation, and mRNA stability (Lalaouna et al., 2013; Meyer, 2017; Zhang and Landick, 2016). Bacterial RNA chaperones are important in regulating RNA structure and have been shown to be important in altering gene expression, especially in response to stress (Jiménez-Zurdo et al., 2013; Phadtare, 2004; Phadtare and Severinov, 2010). The bacterial CspA family of proteins are small RNA binding proteins that are typically composed of a single Cold Shock Domain (CSD), an ancient, single stranded nucleic acid binding structure found throughout all domains of life (Horn et al., 2007). Bacteria usually have several (4-10) of these proteins. The E. coli family of CspA proteins was initially identified as a set of non-specific RNA chaperones important for bacterial adaptation to low temperature environments, especially during cold-shock (Jiang et al., 1997; Xia et al., 2001; Yamanaka, 1999). It is now evident that the CspA family of proteins has important functions at physiological temperatures and in responding to stresses in addition to low temperatures (Phadtare and Severinov, 2010). CspAs are thought to disrupt RNA secondary structure by weakly and non-specifically interacting with single stranded RNA through base stacking interactions facilitated by aromatic and positively charged amino acid residues extending from a conserved anti-parallel β-sheet within the CSD (Phadtare et al., 2004; Rennella et al., 2017; Sachs et al., 2012). A role for the CspA family of proteins in antitermination of transcription in which CspAs specifically melt intrinsic terminator sequences has been well characterized in E. coli (Bae et al., 2000; Phadtare and Severinov, 2009; Phadtare et al., 2002a, 2002b).

More recently, a role for the CspA family of proteins in mediating bacterial adaptation to stress encountered within eukaryotic hosts has emerged. Loss of host cell invasiveness following deletion of CspA family members has been reported in Listeria, Salmonella, and Brucella (Loepfe et al., 2010; Michaux et al., 2017; Schmid et al., 2009; Wang et al., 2014, 2016).

We have shown that S. meliloti CspA2 and CspA4 are important for establishment of effective symbiosis with alfalfa. Using RNA-seq analysis of RNA molecules associated with immunoprecipitated CspA2 and CspA4 proteins, we identified a specific interaction between CspA2 and CspA4 and the αR14 family of sRNAs, and suggested that the decrease in abundance of αR14 sRNAs in a S. meliloti ∆cspA2∆cspA4 mutant was related to the decrease in symbiotic effectiveness of the mutant. Robledo et al. have recently shown that deleting a major αR14 RNA gene also produces this phenotype. The αR14 sRNAs are a unique family of sRNA molecules that are recognized by their cloverleaf-like structure and nearly perfectly conserved anti-Shine-Dalgarno loop sequences (Robledo et al., 2017; del Val et al., 2007, 2012). αR14 RNAs are only found in alphaproteobacteria. Our immunoprecipitation experiments also showed very strong enrichment of several additional sRNAs and of the 16S rRNA, in contrast to a significant but weaker enrichment of a large subset of global cellular mRNAs. An important question within the RNA chaperone field is how CspA family proteins might distinguish between RNA molecules and what sequence or structural features define interaction strength.

Broccoli is a synthetically evolved RNA aptamer designed to bind 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI), a small molecule mimic of the green fluorescent protein (GFP) fluorophore (Filonov et al., 2014). Broccoli forms a G-quadruplex structure that can bind and activate fluorescence of DFHBI in a Mg2+-dependent manner. Fluorescence activation is due to the stabilization of the fluorescent conformation of the fluorophore within the G-quadruplex structure of broccoli (Paige et al., 2011; You and Jaffrey, 2015). DFHBI-1T (1T) is a modified version of DFHBI that also binds broccoli RNA but has enhanced fluorescence properties (Song et al., 2014). Broccoli and other related fluorescent aptamers, such as spinach and spinach2, have been used for in-cell imaging of RNA (Filonov and Jaffrey, 2016; Ouellet, 2016; Walker et al., 2015), imaging intracellular metabolites (Strack et al., 2014), visualizing RNA processing (Filonov et al., 2015), and detecting RNA – RNA interactions in vivo (Alam et al., 2017).

In this work, we designed broccoli constructs in which the 3ʹ and 5ʹ sequences can fold to give the fluorescence enhancing structure only if this structure is stabilized by an intervening hairpin, such as domains of αR14 sRNA. We investigated the interaction of these constructs with S. meliloti CspA2 and CspA4 proteins in vitro by following formation of a broccoli DFHBI-1T fluorescent complex over time. The rate of complex formation and, in some cases, the amount of complex that forms, was affected by the presence of CspA proteins. We used this assay to investigate details of CspA2 and CspA4 interactions with target RNA sequences.

METHODS

Broccoli tag RNA synthesis and purification

Broccoli tag RNAs were synthesized using the procedure diagrammed in Fig. S1. A PCR reaction containing 2 forward DNA primers and a reverse DNA primer was used to generate a ds-cDNA template containing a T7 RNA polymerase promoter sequence upstream of the desired broccoli-tagged sequence. PCR reactions were carried out with GoTaq (Promega) using the following cycling parameters: 95°C for 2 min followed by 35 cycles of 95°C for 30 sec, 65°C for 30 sec, 72°C for 1 min, followed by final extension at 72°C for 5 min. Primer sequences can be found in Table S1. Because the nucleotides needed for formation of the broccoli complex are at the 3ʹ and 5ʹ ends of the molecule, the reverse primer and one forward primer can be used in all constructs while the “second” forward primer contains customizable information including the hairpin sequence to be tested. The PCR products were separated using agarose gel electrophoresis and a DNA PCR fragment of the correct size was extracted and purified with the Zymoclean gel DNA recovery kit (Zymo Research). The cleaned PCR product was used as the template for bacteriophage T7 RNA polymerase in vitro transcription using the MEGAscript T7 transcription kit (Ambion). The reaction was carried out according to the manufacturer’s protocol. The RNA product of the reaction was extracted with TRIzol reagent (Thermo Fisher Scientific) following the manufacturer’s RNA clean-up protocol. Cleaned RNA was resuspended in RNase-free water and stored at -80°C for use in the broccoli fluorescent binding assay.

Ni purification of 6XHis-tagged proteins

N-terminal 6XHis-tagged proteins were heterologously expressed in E. coli BL21 DE3 cells from the pET-30 Ek/LIC expression vector (Novagen 69077). Expression of a 1 L culture was induced with 100 µM IPTG at an absorbance of 0.4 - 0.6 at 600 nm. Cells were transferred to 18°C and allowed to express overnight. Cells were harvested by centrifugation and lysed by sonication in 1X PBS Lysis Buffer (1X PBS (10 mM Na2HPO4, 1.8 mM KH2PO4, 137 mM NaCl, 2.7 mM KCl), 0.1% Triton, 20 mM imidazole, 1 mM DTT, and 0.1% PMSF added from a saturated solution of PMSF in 2-propanol). Sonicated lysates were centrifuged at 20,000 × g in a tabletop microfuge at 4°C for 45 min. Spun lysates were incubated for 1 hr with Lysis Buffer equilibrated Ni-NTA resin (Qiagen) while rocking at 4°C and transferred to an Econo-Pac Chromatography column (Bio-Rad). The resin was washed using gravity flow with High Salt Wash Buffer (Lysis Buffer containing 1 M NaCl), followed by Low Salt Wash Buffer (Lysis Buffer with 150 mM NaCl). After the high and low salt washes, GnHCl Wash Buffer (Lysis Buffer containing 6 M GnHCl) was added to the column, the column was capped, and the resin was allowed to incubate with GnHCl buffer for 1 hr at 4°C with gentle shaking. After incubation, the column was uncapped and the resin was washed with Equilibration Buffer (Elution Buffer with 20 mM imidazole). Samples were eluted with Elution Buffer (25 mM HEPES, 25% glycerol, 70 mM KCl, 1 mM DTT, 0.1% PMSF, and 300 mM imidazole). Eluted fractions were analyzed by SDS PAGE and the peak fraction was desalted with Bio-Rad P6DG according to the manufacturer’s protocol into HGKEDP Buffer (25 mM HEPES, 25% glycerol, 70 mM KCl, 1 mM DTT, 0.1% PMSF) and stored at -80°C. Protein concentrations were determined with the Qubit protein assay kit (Thermo Fisher Scientific).

Broccoli fluorescent binding assay

Broccoli fluorescent binding assay reactions were carried out in 2 stages. 25 µl binding reactions were first carried out in RNA Binding Buffer (20 mM HEPES pH 7.6, 20% glycerol, 80 mM KCl, 0.08 µM EDTA). RNA was taken from -80°C storage, allowed to thaw on ice, heated for 5 min at 75°C and then immediately placed on ice for 5 min before inclusion in the binding reaction. 5 µl of RNA in water was added to 20 µl of protein in HGKE (25 mM HEPES pH 7.6, 25% glycerol, 100 mM KCl, 0.1 µM EDTA), mixed well, and incubated for 15 min at 30°C. 1T Buffer (60 mM HEPES, 120 mM KCl, 0.6 mM MgCl2, 20 µM DFHBI-1T) was prepared to initiate the second stage of the assay. DFHBI-1T ((Z)-4-(3,5-difluoro-4-hydroxybenzylidene)-2-methyl-1-(2,2,2-trifluoroethyl)-1H-imidazol-5(4H)-one) from Lucerna Technologies was stored as a 20 mM stock solution in DMSO at -80°C. At time = zero, 25 µl of the binding assay was mixed with 25 µl 1T Buffer in a 96-well white bottom microplate, placed in a Fluostar SLT microplate reader (excitation: 485 nm / emission: 538 nm) and formation of the fluorescent complex was monitored over time.

For assay reactions involving the ssDNA inhibitor (5ʹ-GGGAGGAGGAGUGUCGCCAAUC-3ʹ), 2.5 µl RNA was incubated with 2.5 µl of the ssDNA inhibitor for 15 min at 30°C prior to carrying out the reactions as described above.
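The buffer compositions stated above can be checked with simple dilution arithmetic. The sketch below is illustrative only: the helper name `mix` and the dictionary keys are hypothetical, and the contributions of the RNA, the protein, and the DMSO carried over with DFHBI-1T are ignored. It recovers the RNA Binding Buffer composition from the 5 µl + 20 µl first stage and then estimates the composition in the well after the 1:1 addition of 1T Buffer.

```python
# Buffer compositions as stated in the Methods; units are encoded in the key names.
HGKE = {"HEPES_mM": 25.0, "glycerol_pct": 25.0, "KCl_mM": 100.0, "EDTA_uM": 0.1}
ONE_T_BUFFER = {"HEPES_mM": 60.0, "KCl_mM": 120.0, "MgCl2_mM": 0.6, "DFHBI_1T_uM": 20.0}
WATER = {}  # RNA is added in water, which contributes no buffer components


def mix(parts):
    """Combine (volume_ul, composition) pairs and return the final composition."""
    total = sum(volume for volume, _ in parts)
    names = {name for _, comp in parts for name in comp}
    return {
        name: sum(volume * comp.get(name, 0.0) for volume, comp in parts) / total
        for name in names
    }


# Stage 1: 5 µl RNA in water + 20 µl protein in HGKE.
binding = mix([(5.0, WATER), (20.0, HGKE)])
# Stage 2: 25 µl binding reaction + 25 µl 1T Buffer.
final = mix([(25.0, binding), (25.0, ONE_T_BUFFER)])

for name in sorted(final):
    print(f"{name}: {final[name]:g}")
```

Under these assumptions the first stage reproduces the stated RNA Binding Buffer (20 mM HEPES, 20% glycerol, 80 mM KCl, 0.08 µM EDTA), and the measured reaction contains half the stated 1T Buffer concentrations of MgCl2 and DFHBI-1T.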

Non-linear regression modelling of broccoli fluorescent binding assay curves

The time course of fluorescent signal values from the fluorometer was corrected for background fluorescence by subtracting the corresponding value from a sample that contained water instead of RNA, and the resulting data were analyzed with GraphPad Prism 7 software (GraphPad Software, La Jolla, CA, USA). Except as noted below, curves best fitting the data were generated using the 2-phase association model contained in the Prism 7 software to obtain values for Plateau, %Fast reaction, KFast and KSlow rates, and R² goodness of fit for each experimental sample. In experiments where very high ratios of protein/RNA were examined, 2-phase association modelling yielded ambiguous results because the second reaction phase was very small, and a 1-phase association model was used instead. Secondary plots were generated from the association modelling by combining data from replicate experiments. The %Fast reaction parameter from protein concentration titrations was normalized to the RNA alone sample by setting this value to 100%. The normalized %Fast reaction data were then plotted vs [CspA] for replicate experiments and fit with the [Inhibitor] vs Normalized Response equation in GraphPad Prism 7 software or the [Inhibitor] vs Normalized Response – variable slope equation, based upon R² values. The IC50 parameter was obtained and represents an estimate of the apparent Kd for the protein – broccoli-tag interaction.
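For readers without access to Prism, the model functions behind this analysis can be written out explicitly. The sketch below is not the Prism implementation; it uses the commonly cited parametrization of 1-phase and 2-phase exponential association, which we assume corresponds to the models named above. The function names are hypothetical, and the parameter values in the usage lines are made up for illustration rather than taken from our data.

```python
import math


def one_phase_association(t, y0, plateau, k):
    """Single exponential rise from y0 toward plateau with rate constant k."""
    return y0 + (plateau - y0) * (1.0 - math.exp(-k * t))


def two_phase_association(t, y0, plateau, percent_fast, k_fast, k_slow):
    """Sum of a fast and a slow exponential rise sharing one plateau."""
    span = plateau - y0
    span_fast = span * percent_fast / 100.0            # part of the signal rising at k_fast
    span_slow = span * (100.0 - percent_fast) / 100.0  # remainder rising at k_slow
    return (y0
            + span_fast * (1.0 - math.exp(-k_fast * t))
            + span_slow * (1.0 - math.exp(-k_slow * t)))


def subtract_background(sample, blank):
    """Subtract the water-instead-of-RNA reading at each time point."""
    return [s - b for s, b in zip(sample, blank)]


def normalize_percent_fast(percent_fast_values, rna_alone):
    """Rescale %Fast values so that the RNA-alone sample equals 100%."""
    return [100.0 * value / rna_alone for value in percent_fast_values]


# Hypothetical parameters, for illustration only (time in minutes).
times = [0, 5, 10, 30, 60, 120]
curve = [two_phase_association(t, 0.0, 1000.0, 60.0, 0.5, 0.02) for t in times]
print([round(y, 1) for y in curve])
```

In an actual analysis these functions would be passed to a non-linear least-squares routine; the fitted %Fast values from a protein titration are then rescaled with `normalize_percent_fast` before the secondary plot is made.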

RESULTS

Broccoli-Tag substrates combine minimal broccoli units with interchangeable test hairpin loop regions.

The broccoli RNA binding aptamer and related spinach and spinach2 aptamers have proven to be powerful tools for investigating various aspects of RNA biology. In the experiments described below, we investigated using the broccoli core unit to report on RNA chaperone activity in vitro. The broccoli structure described previously uses complementary base-pairing sequences at both ends to stabilize a domain that binds DFHBI-1T, leading to a substantial increase in fluorescence. To alter the broccoli aptamer core unit in order to use it as a component of an in vitro binding assay, we removed the tRNA scaffold from the base of the broccoli unit and replaced it with a hexamer GC clamp. We also replaced the base-paired c-diGTP aptamer at the other end with another hexamer GC clamp or, in the case of an initial test hairpin, a polyU5 sequence (Fig. 1A). The Mfold webserver was used to predict the lowest energy secondary structure presented (Zuker, 2003). The designed polyU5-loop broccoli-tag was tested for DFHBI-1T binding competence and fluorescent signal production as well as for accurate detection in a fluorometer. The polyU5-loop broccoli-tag fluoresces strongly and the signal can be accurately measured in a fluorometer (Fig. 1B and 1C). In preliminary experiments, we showed that using two oligonucleotides that contain all of the paired sequences in Fig. 1A but were not linked into a single molecule did not result in a reaction that increased fluorescence, indicating that the sequence labelled “test hairpin” was needed to connect the two domains in order to promote fluorescence by proper folding of the broccoli tag (data not shown).

The broccoli fluorescent binding assay can be used to examine CspA RNA chaperone activity in vitro

The broccoli – DFHBI-1T interaction forms a stable complex that was not disrupted by a DNA oligonucleotide with a sequence complementary to the 3ʹ half of the RNA, from base position 31 to position 66. When the polyU5-loop broccoli-tag RNA and ssDNA oligonucleotide inhibitor were incubated together before adding DFHBI-1T, fluorescence was much lower, but if the ssDNA inhibitor was added after the RNA – DFHBI-1T complex was formed, there was only a small decrease in fluorescence (Fig. 1D). CspA proteins have been reported to bind relatively weakly to RNA. We hypothesized that if we formed RNA – CspA complexes and then monitored appearance of fluorescence after the addition of DFHBI-1T, the CspA – RNA complex would be unable to fold normally into the DFHBI-1T binding configuration but that as the CspA proteins dissociated from the complex, the unbound RNA would be able to fold, bind to DFHBI-1T and fluoresce. In this model, the rate of fluorescence increase would report on the dissociation rate of the CspA – RNA complex(es), koff (Fig. 1E). Initial experiments demonstrated that adding 6XHis-tagged CspA2 to the polyU5-loop broccoli-tag RNA prior to adding DFHBI-1T at time = 0 resulted in a concentration dependent decrease in the rate of fluorescence increase but had no effect on the eventual intensity of fluorescence (Fig. 1F). As a control, we used a protein without known RNA binding activity, 6XHis-tagged RibBA, and observed that this protein did not affect the rate of fluorescence increase when present at a concentration by weight similar to that at which CspA2 was effective.

CspA2 and CspA4 interact similarly with synthetic broccoli tags regardless of loop sequence

With the success of the initial test polyU5-loop broccoli-tag, we tested whether changing the loop sequence would affect CspA binding by constructing broccoli-tags with a purine rich polyG9-loop or a pyrimidine rich polyUCC3-loop (Fig. 2A). The UCC3-loop was designed to mimic the conserved loop regions found in αR14 family sRNAs, which we had demonstrated by co-IP to interact strongly with S. meliloti CspA2 and CspA4. We followed the time-course of fluorescence development by these broccoli-tags as a function of CspA2 and CspA4 concentration after adding DFHBI-1T. The resulting curves (Fig. 2B) were best modelled with a 2-phase exponential association equation. In this model, we hypothesize that the observed KFast represents CspA unbound broccoli-tag RNAs while the observed KSlow represents the koff or dissociation of CspA bound broccoli-tag RNAs (Fig. 2C), making those RNAs eligible to form fluorescent complexes with DFHBI-1T. In this model, the %Fast reaction represents the percentage of broccoli-tag RNA not associated with CspA protein (Fig. S3). Plotting the %Fast versus [CspA], and fitting to a standard dose-response equation, we obtained IC50 values that estimate an apparent CspA Kd (Fig. 2D). CspA2 and CspA4 had similar binding affinity for the synthetic polyG9 and polyUCC3 broccoli-tags with apparent Kd values around 50 µM for all experiments. Plotting the derived KSlow vs [CspA] we find that KSlow decreases as [CspA] increases (Fig. 2E), a result consistent with the model for a 2-phase association described above.
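The secondary analysis described above, in which normalized %Fast is plotted against [CspA] and an IC50 is read from a dose-response fit, can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce Fig. 2D: the dose-response form 100 / (1 + ([CspA]/IC50)^h) with h = 1 is assumed, the titration values are synthetic (generated from the model with IC50 = 50 µM), and a coarse grid search stands in for the non-linear regression performed in Prism. The function names are hypothetical.

```python
def normalized_response(conc, ic50, hill=1.0):
    """Percent of RNA in the fast (CspA-free) phase at a given [CspA] in µM."""
    if conc == 0:
        return 100.0
    return 100.0 / (1.0 + (conc / ic50) ** hill)


def fit_ic50(concs, responses, candidates):
    """Return the candidate IC50 with the smallest sum of squared residuals."""
    def sse(ic50):
        return sum((normalized_response(c, ic50) - r) ** 2
                   for c, r in zip(concs, responses))
    return min(candidates, key=sse)


# Synthetic titration (µM) generated from the model itself with IC50 = 50 µM.
concs = [0.0, 5.0, 10.0, 25.0, 50.0, 100.0, 200.0]
responses = [normalized_response(c, 50.0) for c in concs]

candidates = [float(x) for x in range(1, 201)]  # 1 to 200 µM in 1 µM steps
best_ic50 = fit_ic50(concs, responses, candidates)
print(best_ic50)  # 50.0 for this noise-free synthetic data
```

With real data the residuals do not vanish, and interpreting the IC50 as an apparent Kd rests on the assumption that %Fast tracks the fraction of RNA not bound by CspA.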

CspAs bind cooperatively to native αR14 family sRNA Smr14C2 Arm1 and Arm2 broccoli-tags

Previously, we identified a strong interaction between S. meliloti CspA2 and CspA4 proteins and αR14 family sRNAs by showing a 100X enrichment of some of the αR14 RNAs after immunoprecipitation of the proteins from cell lysates. To test interaction of these proteins with elements of an αR14 RNA, we designed broccoli tags in which the Arm1 and Arm2 sequences of the αR14 family sRNA, Smr14C2 (Fig. 3A), were joined to the broccoli-tag base to generate Arm1 and Arm2 constructs (Fig. 3B and 3C). Using these αR14 broccoli-tagged RNA molecules, we measured the kinetics of fluorescence increase after adding DFHBI-1T in the presence of various levels of CspA2 and CspA4 protein (Fig. 4A). Secondary %Fast plots vs [CspA] revealed that, while the rates of fluorescence increase were similar to those reported in Fig. 2 for the synthetic G9-loop and UCC3-loop constructs, lower levels of the CspA proteins were able to slow the rate of fluorescence increase for interactions with the Arm1 and Arm2 sequences (Fig. 4B). The data show a Kd of 13 µM and 27 µM for the affinity of CspA2 with the Arm1 and Arm2 broccoli-tags, respectively. The affinity of CspA4 for the Arm1 and Arm2 broccoli-tags was even higher, with Kd values of 6 µM and 8 µM, respectively. In addition to the higher affinity of the CspAs for the native αR14 family structure loops, CspA2 and CspA4 appeared to be binding cooperatively to these RNAs, as indicated by a better fit of the %Fast vs [CspA] secondary plots using a model that incorporated cooperative behavior. In this model, the level of cooperative binding is described by the Hill coefficient. Values greater than 1 represent positive cooperativity and values less than 1 represent negative cooperativity (Stefan and Le Novère, 2013). CspA2 binding to the Arm1 and Arm2 broccoli-tags had Hill coefficients of 3.7 and 2.0, respectively. CspA4 binding to the Arm1 and Arm2 broccoli-tags had Hill coefficients of 2.3 and 1.7, respectively. Binding to the synthetic polyG9 and polyUCC3 broccoli-tags (Fig. 2) had Hill coefficients near 1. This suggests that the native αR14 hairpin structures possess some sequence or structural features that the synthetic polyG9 and polyUCC3 hairpins do not.
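One way to appreciate what these Hill coefficients imply is to ask how large a change in [CspA] is needed to move an RNA population from 90% free to 10% free. Assuming a Hill-type relationship, free fraction = 1 / (1 + ([CspA]/Kd)^n), the 90% and 10% points fall where ([CspA]/Kd)^n equals 1/9 and 9, so the required fold change in concentration is 81^(1/n). The sketch below applies this to the apparent Kd values and Hill coefficients reported above; the function names are hypothetical, and the Hill form is an assumed description of the fitted curves, not a mechanistic binding model.

```python
def free_fraction(conc, kd, hill):
    """Fraction of RNA not bound by CspA under a Hill-type model."""
    return 1.0 / (1.0 + (conc / kd) ** hill)


def fold_range_90_to_10(hill):
    """Fold change in [CspA] needed to go from 90% free to 10% free."""
    return 81.0 ** (1.0 / hill)


# Apparent Kd (µM) and Hill coefficient as reported in the text.
reported = {
    "CspA2/Arm1": (13.0, 3.7),
    "CspA2/Arm2": (27.0, 2.0),
    "CspA4/Arm1": (6.0, 2.3),
    "CspA4/Arm2": (8.0, 1.7),
}

for name, (kd, hill) in reported.items():
    low = kd * (1.0 / 9.0) ** (1.0 / hill)   # [CspA] giving 90% free RNA
    high = kd * 9.0 ** (1.0 / hill)          # [CspA] giving 10% free RNA
    print(f"{name}: 90% free at {low:.1f} µM, 10% free at {high:.1f} µM, "
          f"{fold_range_90_to_10(hill):.1f}-fold range")
```

For a non-cooperative interaction (n = 1) the same transition requires an 81-fold change in concentration, whereas under this model a Hill coefficient of 3.7 compresses it to a little more than 3-fold, which is the sense in which cooperative binding makes the response to CspA concentration more switch-like.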

To further investigate determinants of CspA binding, we tested the binding of CspA4 to an αR14 Arm1 variant that we generated by deleting the A bases at position 32 and position 61, which eliminates the bulges predicted in the native hairpin structure (Fig. 3). We observed a marked decrease in affinity for the mutant αR14 Arm1 broccoli RNA and a loss of cooperative binding, with a Kd of 100 µM and a Hill coefficient of 0.8. These values are more like those for binding to the synthetic polyG9 and polyUCC3 broccoli-tags. These data also show that the longer stems of the αR14 broccoli-tags, compared with the stems of the synthetic polyG9 and polyUCC3 broccoli-tags, were not the reason for the different binding properties.

Intriguingly, KSlow rate vs [CspA] plots revealed that KSlow values were highest at intermediate concentrations of CspA (Fig. 4C). This led us to hypothesize that these broccoli-tagged RNAs have an intrinsic KSlow, which dominates at low concentrations of CspA. This is evident as broccoli-tags without any CspA present exhibited 2-phase association kinetics. At intermediate concentrations of CspA, the KSlow was dominated by the CspA koff, which in the case of the αR14 Arm1 and αR14 Arm2 broccoli-tags was faster than the intrinsic KSlow. At higher concentrations of CspA, the KSlow dominated by the CspA koff was slower because the CspA koff decreased at higher concentrations. The result was an overall maximal KSlow at intermediate concentrations of CspA.

CspA proteins can rescue kinetically trapped RNA structures

Further analysis of CspA2 and CspA4 experiments with the Arm1 broccoli-tag revealed that at late timepoints and in the presence of intermediate concentrations of CspA2 or CspA4, fluorescence values were higher than we observed in the absence of the proteins (Fig. 5A). We hypothesized that RNA folded in solution into kinetically-trapped alternative structures that could not transition to a configuration that binds DFHBI-1T, but that CspA proteins can bind to these alternatives and allow them to refold into structures that bind DFHBI-1T and fluoresce.

We investigated the possible existence of alternative structures by running a series of denaturing and native electrophoretic separations with the αR14 Arm1 broccoli-tag RNA and staining with EtBr, which fluoresces when bound to any structured RNA, or staining with DFHBI-1T, which requires a specific configuration (Fig. 5B). The αR14 Arm1 construct ran as a single band in a denaturing 6% Urea 6% polyacrylamide TBE gel stained with EtBr, showing that the purified sample used in our experiments was one RNA species, as expected. However, the Arm1 construct separated into several bands when run on a 6% polyacrylamide native gel and stained with EtBr, suggesting the presence of multiple, alternatively-folded structures. When an identical native gel was stained with DFHBI-1T, only one of the bands fluoresced, confirming the presence of stable alternative structures that were not activating DFHBI-1T fluorescence. This supports the idea that the increase in endpoint fluorescence that we observed with the Arm1 Broccoli-Tag at intermediate concentrations of CspA was due to transition of the kinetically trapped alternative structures to one that could fluoresce.

We further investigated the ability of CspAs to convert kinetically trapped structures to more stable ones by testing the effect of intermediate CspA2 concentrations on Arm2 Broccoli-Tags that were intentionally trapped by adding a complementary ssDNA (Fig. 5C). We designed a ssDNA oligonucleotide complementary to the Smr14C2 Arm2 broccoli-tag construct (Fig. S4) and tested its effect in our binding assay. We hypothesized that the ssDNA would bind to and trap the RNA in a duplex RNA – ssDNA complex and this would suppress the development of fluorescence. We further hypothesized that intermediate concentrations of CspA could form a CspA – Arm2 intermediate, disrupting the trapped complex and allowing it to fold into the more stable Arm2 Broccoli-Tag – DFHBI-1T fluorescent complex (Fig. 5C). As expected, adding increasing concentrations of the ssDNA oligo to the Arm2 Broccoli-Tag lowered the level of endpoint fluorescence observed after DFHBI-1T was added (Fig. 5D). When we formed a ssDNA – Arm2 Broccoli-Tag RNA duplex structure and then added an intermediate concentration of CspA4 followed by DFHBI-1T, we observed that the rate of fluorescence formation decreased, but the endpoint fluorescence increased (Fig. 5E). This result is consistent with CspA4 facilitating the dissociation of the RNA – DNA duplex interfering with the binding of DFHBI-1T but also allowing the RNA to refold into a configuration that binds to DFHBI-1T.

DISCUSSION

Previous in vitro investigations into RNA chaperone activity of CspA family proteins have been carried out with E. coli CspA family proteins using fluorescent molecular beacon constructs (Phadtare et al., 2002a, 2002b), KMnO4 probing for oxidizable bases (Phadtare and Severinov, 2005; Phadtare et al., 2004), and NMR (Rennella et al., 2017). Together these studies revealed the importance of specific aromatic residues in the protein for RNA “melting” activity but not for binding, and revealed that CspAs bind RNA hairpins first at the single stranded loop regions and “melt” down the stem. We report here a novel in vitro kinetic RNA binding assay to further investigate CspA family protein chaperone function. Taking advantage of a variation of the fluorescent broccoli RNA – DFHBI-1T complex, we designed an assay to estimate the koff rate of CspA – RNA hairpin interactions as a function of broccoli – DFHBI-1T fluorescent complex formation over time. The assay substrates contain a minimal broccoli unit that was fused to hairpin sequences with affinity to CspA proteins in which development of fluorescence by the broccoli unit depended on stabilization by the test hairpin. We observed that in the presence of CspA protein the broccoli-tags become fluorescent more slowly, which we attributed to CspA binding to the hairpin sequences and inhibiting folding. The kinetics of fluorescence increase over time can be approximated most successfully using a 2-phase model from which we can calculate constants for fast and slow components of the rate of fluorescence increase. KFast represents CspA-free broccoli-tag RNA that can bind DFHBI-1T while KSlow represents CspA bound broccoli-tags that become available to bind DFHBI-1T after dissociation of CspA-RNA complexes at a rate described by koff (Fig. 2C). By describing CspA as an inhibitor of the formation of a complex between the broccoli-tag and DFHBI-1T and by using the %Fast reaction to approximate the amount of broccoli-tag not bound to CspA, experimental Kd values can be extrapolated from the binding assay curves by plotting %Fast vs [CspA]. Using this method, we determined binding constants and koff rates of CspA2 and CspA4 for various synthetic and native target RNA hairpin structures (Fig. 2D, Fig. 4B, and Table S2).

Previous methods mentioned above for investigating mechanisms of CspA chaperone function in vitro used synthetically created RNA targets and did not directly address loop or stem sequence specificity of CspA chaperone activity. CspA family members are generally reported as lacking strong sequence specific binding (Jiang et al., 1997), though many transcript specific interactions have been described. E. coli CspE specifically interacts with and regulates the CspA transcript (Bae et al., 1999). CspA, CspC, and CspE interact with and specifically regulate genes important for the cold shock response by affecting antitermination (Bae et al., 2000). CspC and CspE regulate the rpoS stress responsive sigma factor gene in E. coli (Cohen-Or et al., 2010; Phadtare and Inouye, 2001). Here, we directly compared the affinity of S. meliloti CspA2 and CspA4 for a polyG9 purine rich loop and a polyUCC3 pyrimidine rich loop and found no significant differences in affinity for either of the loops with either of the proteins. This suggests that CspA – RNA target specificity for this stem was loop sequence independent. The lack of loop sequence specificity suggests that the targeting of CspA family proteins to specific RNA structures could be due to higher order structural recognition. Rennella et al. found that CspA interacted more strongly with a pyrimidine rich loop sequence (UUUC) than with a purine rich (GAAA) loop; however, the stem sequences and hydrogen bonding strengths were different between the two hairpins, which may be an alternative explanation for the differences in affinity they observed.

In co-immunoprecipitation experiments reported elsewhere, we showed that CspA2, CspA4 and CspA5 enriched distinct subsets of S. meliloti cellular RNA, but that both CspA2 and CspA4 substantially enriched the αR14 group of sRNAs. To explore the interaction further, we generated constructs in which the Arm1 and Arm2 sequences from Smr14C2, an sRNA in the αR14 family, were fused to sequences related to the broccoli aptamer and investigated in vitro the interaction of these RNA molecules with CspA2 and CspA4. CspA2 and CspA4 had dramatically higher affinity for these constructs than for the related synthetic polyG9-loop and UCC3-loop broccoli-tags we also constructed. In addition to this higher affinity, CspA2 and CspA4 bound cooperatively to the Arm1 and Arm2 broccoli constructs with Hill coefficients as high as 3.7. A variant of the Arm1 construct missing two bases predicted to lead to bulges in the original Arm1 stem structure had dramatically lower affinity and no longer bound CspA4 cooperatively. We suggest that the cooperative binding of CspA2 and CspA4 to two of the αR14 stem loop structures may explain the substantial enrichment we observed by co-IP of these RNA molecules and, more speculatively, the enrichment of other RNA molecules by specific CspA proteins. Cooperative binding of some structures but not others would contribute to sequence specific selection of RNA molecules and would enhance the concentration dependent effect that various CspA proteins would have on targeted RNA molecules. These properties could help define global vs specific functions for CspA family members in the cell.

The observation that there was no significant difference in affinity between the synthetic UCC3-loop construct and the polyG9-loop construct suggested that the 5ʹ-UCCUCCUCCC-3ʹ conserved sequence motif found in the αR14 structures was not responsible for the increase in affinity or cooperativity observed. In turn, this suggests that some aspect of RNA structure defines this phenomenon. The observation that the mutated version of Arm1, which was as long as the native Arm1 sequence and still contained the 5ʹ-UCCUCCUCCC-3ʹ loop sequence, did not interact cooperatively or with high affinity is consistent with this idea. The ΔG values of the terminal hairpins differ among the various constructs we tested (Table S2), but there does not appear to be an obvious correlation between hairpin strength and CspA binding affinity. As far as we are aware, this study represents the first report of cooperative kinetics in the binding of CspA proteins to RNA, but the nature of the cooperativity might be specific to the αR14 RNAs or CspA2/CspA4 and might not have been detected in an analysis of Csp protein binding to bulk RNA or to non-target sequences. This study also represents the first detailed in vitro investigation of CspA interactions with a specific target, chosen because it was highly enriched during an in vivo association assay. Rennella et al. used a cold box sequence in their study of a native target of the E. coli CspA protein, but altered the loop sequence to confer stability and, as with our αR14 Arm1m variant, may have disrupted some features of the interaction. The cooperative binding we observed with native αR14 structures leads us to propose that this abundant sRNA family may function to locally titrate CspA concentration in the cell and serve as a molecular CspA sponge. The cooperative interaction of CspA2 and CspA4 with the αR14 sRNAs indicates that their influence on the interaction of these RNAs with cellular functions is especially sensitive to changes in CspA concentration.

Data presented in Fig. 5 strongly support the idea that CspA proteins interact with RNA:RNA and RNA:DNA duplexes in a dynamic way and can allow these molecules to explore folding configurations that may not be available in the absence of protein. One argument for this is that at intermediate concentrations of CspA4, the endpoint fluorescence was higher than was obtained in the absence of the protein. Our explanation for this is that the αR14 Arm1 broccoli-tag construct forms stable alternative structures and that these kinetically trapped alternative structures could not bind DFHBI-1T, thus lowering the level of fluorescence that could be obtained. Adding CspA4 allows these molecules to escape the trap and to fold again, sometimes in a way that can bind DFHBI-1T and increase fluorescence. We directly tested the ability of CspA proteins to rescue trapped alternative structures by trapping the Arm2 broccoli RNA in an RNA:DNA structure using a DNA oligonucleotide inhibitor complementary to part of the Arm2 structure. As we hypothesized, adding an intermediate concentration of CspA led to an increase in fluorescence.

The idea that CspA4 can interact in a dynamic way with nucleic acid structures was further validated by observing the dependence of different components of the development of fluorescence as a function of CspA concentration in Figure 4C. At low concentrations of CspA, the KSlow rate was dominated by KSlow1, the spontaneous switching between alternative structures. As CspA concentration increases, the melting and refolding mediated by CspA, and shown in Fig 6 as KSlow2, becomes more substantial and an overall increase in KSlow was observed at intermediate concentrations of CspA. At higher concentrations of CspA, the RNA becomes more saturated with CspA protein, trapping the RNA in another type of non-fluorescent complex which lowered KSlow2 and thus the overall KSlow observed (Fig. 6).

The results presented here demonstrate the use of Broccoli-Tags as useful in vitro tools for investigating RNA chaperone activity. Advantages of Broccoli-Tags over similar in vitro methods such as molecular beacon experiments are ease of construct generation, the flexibility of construct design and the low cost of reagents. Though used here to study S. meliloti CspA family protein interactions with target RNA structures, Broccoli-Tags could be easily adapted to high throughput screens for novel RNA – protein interactions in vitro or for novel conditions that influence these interactions in vivo.


REFERENCES

Alam, K.K., Tawiah, K.D., Lichte, M.F., Porciani, D., and Burke, D.H. (2017). A Fluorescent Split Aptamer for Visualizing RNA-RNA Assembly In Vivo. ACS Synth. Biol.

Bae, W., Phadtare, S., Severinov, K., and Inouye, M. (1999). Characterization of Escherichia coli cspE, whose product negatively regulates transcription of cspA, the gene for the major cold shock protein. Mol. Microbiol. 31, 1429–1441.

Bae, W., Xia, B., Inouye, M., and Severinov, K. (2000). Escherichia coli CspA-family RNA chaperones are transcription antiterminators. Proc. Natl. Acad. Sci. U. S. A. 97, 7784–7789.

Cohen-Or, I., Shenhar, Y., Biran, D., and Ron, E.Z. (2010). CspC regulates rpoS transcript levels and complements hfq deletions. Res. Microbiol. 161, 694–700.

Filonov, G.S., and Jaffrey, S.R. (2016). RNA Imaging with Dimeric Broccoli in Live Bacterial and Mammalian Cells. Curr. Protoc. Chem. Biol. 8, 1–28.

Filonov, G.S., Moon, J.D., Svensen, N., and Jaffrey, S.R. (2014). Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299–16308.

Filonov, G.S., Kam, C.W., Song, W., and Jaffrey, S.R. (2015). In-gel imaging of RNA processing using broccoli reveals optimal aptamer expression strategies. Chem. Biol. 22, 649–660.

Horn, G., Hofweber, R., Kremer, W., and Kalbitzer, H.R. (2007). Structure and function of bacterial cold shock proteins. Cell. Mol. Life Sci. CMLS 64, 1457–1470.

Jiang, W., Hou, Y., and Inouye, M. (1997). CspA, the major cold-shock protein of Escherichia coli, is an RNA chaperone. J. Biol. Chem. 272, 196–202.

Jiménez-Zurdo, J.I., Valverde, C., and Becker, A. (2013). Insights into the noncoding RNome of nitrogen-fixing endosymbiotic α-proteobacteria. Mol. Plant-Microbe Interact. MPMI 26, 160–167.

Lalaouna, D., Simoneau-Roy, M., Lafontaine, D., and Massé, E. (2013). Regulatory RNAs and target mRNA decay in prokaryotes. Biochim. Biophys. Acta 1829, 742–747.

Loepfe, C., Raimann, E., Stephan, R., and Tasara, T. (2010). Reduced host cell invasiveness and oxidative stress tolerance in double and triple csp gene family deletion mutants of Listeria monocytogenes. Foodborne Pathog. Dis. 7, 775–783.

Meyer, M.M. (2017). The role of mRNA structure in bacterial translational regulation. Wiley Interdiscip. Rev. RNA 8.


Michaux, C., Holmqvist, E., Vasicek, E., Sharan, M., Barquist, L., Westermann, A.J., Gunn, J.S., and Vogel, J. (2017). RNA target profiles direct the discovery of virulence functions for the cold-shock proteins CspC and CspE. Proc. Natl. Acad. Sci. U. S. A.

Ouellet, J. (2016). RNA Fluorescence with Light-Up Aptamers. Front. Chem. 4, 29.

Paige, J.S., Wu, K.Y., and Jaffrey, S.R. (2011). RNA mimics of green fluorescent protein. Science 333, 642–646.

Phadtare, S. (2004). Recent developments in bacterial cold-shock response. Curr. Issues Mol. Biol. 6, 125–136.

Phadtare, S., and Inouye, M. (2001). Role of CspC and CspE in regulation of expression of RpoS and UspA, the stress response proteins in Escherichia coli. J. Bacteriol. 183, 1205–1214.

Phadtare, S., and Severinov, K. (2005). Nucleic acid melting by Escherichia coli CspE. Nucleic Acids Res. 33, 5583–5590.

Phadtare, S., and Severinov, K. (2009). Comparative analysis of changes in gene expression due to RNA melting activities of translation initiation factor IF1 and a cold shock protein of the CspA family. Genes Cells Devoted Mol. Cell. Mech. 14, 1227–1239.

Phadtare, S., and Severinov, K. (2010). RNA remodeling and gene regulation by cold shock proteins. RNA Biol. 7, 788–795.

Phadtare, S., Inouye, M., and Severinov, K. (2002a). The nucleic acid melting activity of Escherichia coli CspE is critical for transcription antitermination and cold acclimation of cells. J. Biol. Chem. 277, 7239–7245.

Phadtare, S., Tyagi, S., Inouye, M., and Severinov, K. (2002b). Three amino acids in Escherichia coli CspE surface-exposed aromatic patch are critical for nucleic acid melting activity leading to transcription antitermination and cold acclimation of cells. J. Biol. Chem. 277, 46706–46711.

Phadtare, S., Inouye, M., and Severinov, K. (2004). The mechanism of nucleic acid melting by a CspA family protein. J. Mol. Biol. 337, 147–155.

Rennella, E., Sára, T., Juen, M., Wunderlich, C., Imbert, L., Solyom, Z., Favier, A., Ayala, I., Weinhäupl, K., Schanda, P., et al. (2017). RNA binding and chaperone activity of the E. coli cold-shock protein CspA. Nucleic Acids Res.

Robledo, M., Peregrina, A., Millán, V., García-Tomsig, N.I., Torres-Quesada, O., Mateos, P.F., Becker, A., and Jiménez-Zurdo, J.I. (2017). A conserved α-proteobacterial small RNA contributes to osmoadaptation and symbiotic efficiency of rhizobia on legume roots. Environ. Microbiol.


Sachs, R., Max, K.E.A., Heinemann, U., and Balbach, J. (2012). RNA single strands bind to a conserved surface of the major cold shock protein in crystals and solution. RNA N. Y. N 18, 65–76.

Schmid, B., Klumpp, J., Raimann, E., Loessner, M.J., Stephan, R., and Tasara, T. (2009). Role of cold shock proteins in growth of Listeria monocytogenes under cold and osmotic stress conditions. Appl. Environ. Microbiol. 75, 1621–1627.

Song, W., Strack, R.L., Svensen, N., and Jaffrey, S.R. (2014). Plug-and-play fluorophores extend the spectral properties of Spinach. J. Am. Chem. Soc. 136, 1198–1201.

Stefan, M.I., and Le Novère, N. (2013). Cooperative binding. PLoS Comput. Biol. 9, e1003106.

Strack, R.L., Song, W., and Jaffrey, S.R. (2014). Using Spinach-based sensors for fluorescence imaging of intracellular metabolites and proteins in living bacteria. Nat. Protoc. 9, 146–155.

del Val, C., Rivas, E., Torres-Quesada, O., Toro, N., and Jiménez-Zurdo, J.I. (2007). Identification of differentially expressed small non-coding RNAs in the legume endosymbiont Sinorhizobium meliloti by comparative genomics. Mol. Microbiol. 66, 1080–1091.

del Val, C., Romero-Zaliz, R., Torres-Quesada, O., Peregrina, A., Toro, N., and Jiménez-Zurdo, J.I. (2012). A survey of sRNA families in α-proteobacteria. RNA Biol. 9, 119–129.

Walker, C.L., Lukyanov, K.A., Yampolsky, I.V., Mishin, A.S., Bommarius, A.S., Duraj-Thatte, A.M., Azizi, B., Tolbert, L.M., and Solntsev, K.M. (2015). Fluorescence imaging using synthetic GFP chromophores. Curr. Opin. Chem. Biol. 27, 64–74.

Wang, Z., Wang, S., and Wu, Q. (2014). Cold shock protein A plays an important role in the stress adaptation and virulence of Brucella melitensis. FEMS Microbiol. Lett. 354, 27–36.

Wang, Z., Liu, W., Wu, T., Bie, P., and Wu, Q. (2016). RNA-seq reveals the critical role of CspA in regulating Brucella melitensis metabolism and virulence. Sci. China Life Sci.

Xia, B., Ke, H., and Inouye, M. (2001). Acquirement of cold sensitivity by quadruple deletion of the cspA family and its suppression by PNPase S1 domain in Escherichia coli. Mol. Microbiol. 40, 179–188.

Yamanaka, K. (1999). Cold shock response in Escherichia coli. J. Mol. Microbiol. Biotechnol. 1, 193–202.

You, M., and Jaffrey, S.R. (2015). Structure and Mechanism of RNA Mimics of Green Fluorescent Protein. Annu. Rev. Biophys. 44, 187–206.

Zhang, J., and Landick, R. (2016). A Two-Way Street: Regulatory Interplay between RNA Polymerase and Nascent RNA Structure. Trends Biochem. Sci. 41, 293–310.


Zuker, M. (2003). Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415.


Figure 1. Broccoli-tags and the broccoli kinetic binding assay. (A) Mfold predicted lowest energy secondary structure of the U5-loop broccoli-tag. (B) U5-loop broccoli-tag in complex with DFHBI-1T (1T) showing that fluorescence depends on RNA concentration. (C) Quantitation of fluorescence of the U5-loop broccoli-tag – DFHBI-1T complex at different RNA concentrations. (D) Fluorescence of 1 µM U5-loop broccoli-tag differs when 10 µM of an oligo-DNA inhibitor is added before or after DFHBI-1T, as indicated. (E) Cartoon of the broccoli kinetic binding assay. CspA binds to the RNA, inhibiting the rate of folding into a conformation that can bind DFHBI-1T. The interaction between RNA and CspA is reversible. (F) Effect of increasing [CspA2] on the rate of formation of the U5-loop broccoli-tag fluorescent complex with DFHBI-1T. The mean values of an experiment performed in duplicate are plotted. Curves: 1 µM RNA alone, and 1 µM RNA with 400, 120, or 40 µg/mL CspA2. (G) Control reaction showing the effect of increasing [RibBA], a protein not known to bind RNA. Data represent the mean of an experiment performed in duplicate. Curves: 1 µM RNA alone, and 1 µM RNA with 400, 120, or 40 µg/mL RibBA.


Figure 2. Comparison of loop sequence specificity of CspA2 and CspA4 for polyG9-loop vs polyUCC3-loop broccoli-tags. (A) Mfold predicted lowest energy secondary structure of polyG9-loop and polyUCC3-loop broccoli-tags. (B) Fluorescence development over time of the indicated broccoli-tag – DFHBI-1T complex in the presence of CspA2 or CspA4 at the indicated concentration. The 2-phase association curve fits of a representative experiment are shown. CspA2 curves: 1 µM RNA alone, and 1 µM RNA with 180, 60, 18, 6, or 1.8 µM CspA2. CspA4 curves: 1 µM RNA alone, and 1 µM RNA with 190, 68, 18, 6, or 1.8 µM CspA4. (C) Cartoon of the broccoli kinetic binding assay highlighting the proposed mechanism for the resulting KSlow and KFast rates observed. Depicted is the pre-reaction binding step followed by the addition of DFHBI-1T at time = 0. (D) Secondary plots reporting the %Fast reaction vs [CspA] from the 2-phase association modeling in (B). Data were fit with a standard dose response curve. Data represent the average ± SEM from duplicate experiments. In some cases error bars are smaller than the marker size and are not shown. The experimental dissociation constant (Kd) and goodness of fit (R²) values are reported. (E) Secondary plots reporting the determined KSlow rates vs [CspA]. Data points represent the average ± SEM from duplicate experiments. In some cases error bars are smaller than the marker size and are not shown.


Figure 3. Smr14C2 Arm1 and Arm2 broccoli-tags. (A) Mfold predicted lowest energy secondary structure of Smr14C2, an αR14 family sRNA, with Arm1 highlighted in blue and Arm2 highlighted in orange. (B) Arm1 broccoli-tag. (C) Arm2 broccoli-tag.


Figure 4. Binding kinetics of CspA2 and CspA4 with Arm1 and Arm2 broccoli-tags. (A) Fluorescent detection of DFHBI-1T complexes with Arm1 or Arm2 broccoli-tags in the presence of various concentrations of CspA2 or CspA4. Arm1m is the Arm1 broccoli-tag with the two unpaired A residues shown in Fig. 3B removed. Apart from the curves for CspA2 at 180 µM and CspA4 at 190 µM and 60 µM in the Arm1 and Arm2 experiments, the lines in the graphs show 2-phase association curve fits of a representative experiment. The curves for these exceptions were generated using a one-phase association model because they essentially lack a second phase and could not be fit by the software with the 2-phase model. CspA2 curves: 1 µM RNA alone, and 1 µM RNA with 180, 60, 18, 6, 1.8, 0.6, or 0.18 µM CspA2. CspA4 curves: 1 µM RNA alone, and 1 µM RNA with 190, 68, 18, 6, 1.8, 0.6, or 0.18 µM CspA4. (B) Secondary plots reporting the %Fast reaction vs [CspA] data calculated from the 2-phase association modeling in Fig. 4A. Curves were generated using a model that incorporated a Hill coefficient term to account for cooperative binding. Data represent the average ± SEM from duplicate experiments. The 0.18 µM point from the CspA2-Arm1 plot and the 6 µM point from the CspA4-Arm2 plot each represent a single experiment. The estimated dissociation constant (Kd), Hill coefficient, and goodness of fit (R²) values are reported. (C) Secondary plots reporting the determined KSlow rates vs [CspA]. Data represent the average ± SEM from duplicate experiments.
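The Hill-type dose response named in panel (B) can be illustrated with a minimal sketch. It assumes the standard Hill equation for fractional saturation; the function name, the parameter values, and any mapping from saturation onto the %Fast axis are illustrative assumptions and not the fitting procedure used to produce the figure.

```python
def hill_fraction(protein_conc, kd, hill_n):
    """Fractional saturation from the standard Hill equation.

    protein_conc and kd share the same units (for example, µM).
    hill_n is the Hill coefficient; hill_n = 1 means non-cooperative binding.
    """
    if protein_conc <= 0:
        return 0.0
    return protein_conc ** hill_n / (kd ** hill_n + protein_conc ** hill_n)


# Hypothetical values, chosen only to show how cooperativity sharpens the curve.
kd_um = 10.0
for conc_um in (5.0, 10.0, 20.0):
    non_coop = hill_fraction(conc_um, kd_um, 1.0)
    coop = hill_fraction(conc_um, kd_um, 4.0)
    print(f"[CspA] = {conc_um:5.1f} µM  n=1: {non_coop:.3f}  n=4: {coop:.3f}")
```

With a larger Hill coefficient the response stays low below Kd and rises steeply above it, which is the behavior the Hill term in the fit is meant to capture.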


Figure 5. CspA2 and CspA4 rescue kinetically trapped alternative structures and cDNA inhibited structures. (A) Arm1 broccoli-tag kinetic assay with CspA2 titration. Same experiment as in Figure 4A, except showing extended time points. The blue line highlights the 18 µM CspA titration point. (B) Denaturing and native gel analysis of the Arm1 broccoli-tag RNA. Lane 1 is a 1X TBE 6% polyacrylamide 6% urea denaturing gel stained with EtBr showing a single species of RNA. Lane 2 is a 0.5X tris/glycine native gel stained with EtBr, showing multiple alternative structure conformations. Lane 3 is a 0.5X tris/glycine native gel stained with DFHBI-1T showing that, of the multiple structures in lane 2, only one of the conformations fluoresces with DFHBI-1T. (C) Cartoon of a broccoli kinetic binding assay including an oligo-ssDNA inhibitor. Depicted are the first pre-reaction binding step, in which the cDNA is allowed to bind the broccoli-tag, and the second pre-reaction step, in which CspA is allowed to interact with the cDNA-broccoli-tag complexes, followed by the addition of DFHBI-1T at t = 0. (D) Arm2 broccoli-tag kinetic assay with cDNA inhibitor titration. Shown are 2-phase association curves fitting a representative experiment. (E) Arm2 broccoli-tag kinetic assay with 0.3 µM cDNA inhibitor and 18 µM CspA4 where indicated. Shown are 2-phase association curves fitting a representative experiment.


Figure 6. Cartoon of CspA mechanism in the broccoli kinetic binding assay.


Figure S1. Broccoli-tag generation scheme. (A) Three primers with overlapping regions were included in a single PCR reaction to yield a T7 transcription template. The T7 promoter is highlighted in grey. (B) In vitro T7 transcription was carried out with the cleaned PCR product from (A). (C) The RNA product from the T7 in vitro transcription reaction was purified by Trizol extraction followed by isopropanol precipitation.


Figure S2. Purified proteins and RNAs. (A) 6XHis-tagged proteins purified with Ni-NTA resin and resolved by SDS-PAGE followed by Coomassie stain. (B) Purified in vitro T7 transcription RNA products resolved by 1% agarose gel electrophoresis and stained with EtBr.


Figure S3. 2-phase association modeling of broccoli kinetic binding assay curves. (A) 2-phase exponential association equation used for modeling. (B) Cartoon highlighting the key features of 2-phase association modeling.
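A minimal sketch of a two-phase exponential association model of the kind named in panel (A) is given here for orientation. The parameterization (baseline, plateau, percent fast, KFast, KSlow) follows a convention common in curve-fitting software and is an assumption, since the equation shown in the figure is not reproduced in the text; the function name and all numerical values are hypothetical.

```python
import math


def two_phase_association(t, y0, plateau, percent_fast, k_fast, k_slow):
    """Signal at time t for a two-phase exponential association.

    y0           baseline signal at t = 0
    plateau      signal approached at long times
    percent_fast share of the total span assigned to the fast phase (0 to 100)
    k_fast       rate constant of the fast phase (1 / time unit)
    k_slow       rate constant of the slow phase (1 / time unit)
    """
    span = plateau - y0
    span_fast = span * percent_fast / 100.0
    span_slow = span - span_fast
    return (y0
            + span_fast * (1.0 - math.exp(-k_fast * t))
            + span_slow * (1.0 - math.exp(-k_slow * t)))


# Hypothetical parameters: 40% of the signal develops quickly, 60% slowly.
curve = [two_phase_association(t, y0=0.0, plateau=100.0, percent_fast=40.0,
                               k_fast=1.0, k_slow=0.05)
         for t in range(0, 61, 10)]
print([round(value, 1) for value in curve])
```

In this form %Fast and KSlow are separate fitted parameters, which is why they can be reported in separate secondary plots against protein concentration.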


Figure S4. Smr14C2 Arm2 and Arm2 cDNA Inhibitor. (A) Smr14C2 Arm2 Mfold lowest energy predicted structure (B) Smr14C2 Arm2 Mfold lowest energy predicted structure with the complementary cDNA inhibitor target sequence highlighted in red.


TABLE S1. Broccoli-tag sequences


TABLE S2. List of calculated binding constants.


CHAPTER 4. Conclusions and Perspectives

PREFACE

Fixed nitrogen availability is often the factor limiting plant growth. Because of this, substantial amounts of nitrogen fertilizer are used to increase crop yields in agriculture. This nitrogen input and associated techniques, such as the development of crops able to use more nitrogen, were keys to the Green Revolution and major contributors to the significant increase in human population over the last 70 years. Almost all of the nitrogen used in agriculture is derived from chemical fixation via the Haber-Bosch process, including much of the nitrogen in manure used in organic agriculture (Schröder, 2014). Chemically fixed nitrogen fertilizers are currently the major source of fixed nitrogen in agricultural practice (Erisman et al., 2008) and are the largest non-farm expense in some sectors of American agriculture, a cost that is highly dependent on natural gas prices since the Haber-Bosch process uses about 5% of global methane production. Application of nitrogen fertilizers also contributes to soil acidification, groundwater nitrate contamination and waterway contamination through runoff. Greater use of other sources of fixed nitrogen, such as symbiotic nitrogen fixation, is an appealing alternative to the constant use of chemical fertilizers.

To improve the application of symbiotic nitrogen fixation in agriculture it is necessary to improve understanding of the bacterial factors governing rhizobial differentiation from free-living bacteria to a symbiotic nitrogen-fixing state. To this end, this body of work aimed to characterize the role of the CspA family of RNA binding proteins in the symbiotic maturation of Sinorhizobium meliloti with Medicago sativa (alfalfa). In Chapter 2, the stress-responsive and symbiosis-specific expression of S. meliloti CspA family proteins was presented, as well as the stress-responsive and symbiotic phenotypes associated with cspA deletion strains. Furthermore, the identification of RNAs that interact with CspA family proteins and the transcriptional defects associated with a double cspA deletion were presented. In Chapter 3, development of a novel fluorescent RNA binding assay was described and its use in the characterization of CspA interactions with target RNAs was presented. In this chapter, overarching conclusions from the work presented in Chapter 2 and Chapter 3 are discussed. For the work presented below from Hagberg et al. (unpublished), my fellow graduate student Kelly Hagberg was responsible for the experimental design of the nutrient limiting conditions. She also carried out the RNA isolation and Illumina RNA sequencing library preparation with my assistance. I was responsible for some of the data analysis, identification of statistically significant differential gene expression, and metabolic pathway mapping, and I created the figures presented here.

S. meliloti CspA family proteins and bacterial stress adaptation

In the course of these experiments we constructed fusion proteins joining CspA2, CspA4 and CspA5 with green fluorescent protein (GFP) and used these fusions in S. meliloti to monitor CspA protein levels and to immunoprecipitate the proteins along with associated RNAs.

Monitoring the fluorescence of CspA-GFP proteins after application of various stresses revealed that S. meliloti CspA2 and CspA4 respond significantly to cold, heat, and high salt stress. Both CspA2 and CspA4 are up-regulated in response to cold and high salt stress and down-regulated in response to heat stress. The conditions in which they are up-regulated are those in which global RNA secondary structure strength is increased, and the condition in which they are down-regulated is one in which global RNA secondary structure strength is decreased. This leads to the hypothesis that CspAs function, in part, to counteract changes in physiological RNA structure strength imposed by the environment. Interestingly, expression of the other S. meliloti CspA family member we worked with, CspA5-GFP, responded in the opposite direction to CspA2-GFP and CspA4-GFP under several of the conditions tested.

Given these results, CspA5 may antagonize CspA2 and CspA4 expression or vice versa. This sort of inter-family regulatory contrast has been described in E. coli for CspE and CspA (Bae et al., 1999). The role of CspA2 and CspA4 in S. meliloti stress adaptation was further explored with ΔcspA2 and ΔcspA4 deletion strains as well as a double deletion strain. In replica spot plate experiments, the ΔcspA2 and ΔcspA4 strains showed growth defects in response to the same stresses that alter expression of the GFP-labeled proteins, indicating that the wild-type response is adaptive. In most cases, the single deletion strains behaved similarly while the double deletion strain had a similar but more severe defect. A notable exception was the strains' response to a sub-lethal dose of kanamycin, neomycin or gentamicin, where the ΔcspA2 strain had decreased sensitivity while the ΔcspA4 strain had increased sensitivity. Kanamycin is an aminoglycoside antibiotic that binds directly to the 16S rRNA and disrupts the accuracy of translation, resulting in miscoded proteins (François et al., 2005). Given the direct interaction of CspA2 and CspA4 with the 16S rRNA characterized in immunoprecipitation experiments in both S. meliloti and E. coli, the different sensitivities of the ΔcspA2 and ΔcspA4 strains suggest that these proteins interact in distinct ways with the 16S rRNA and may directly affect the affinity of aminoglycoside antibiotics for their target region. The role of RNA structures and the chaperones that control them is an emerging field in understanding antibiotic resistance (Dersch et al., 2017). This finding suggests that CspA family members may be potential targets for treating antibiotic resistance.

S. meliloti CspA family proteins and rhizobia – legume symbiosis

Initial detection of Sinorhizobium medicae CspA2a and CspA4 as abundant proteins up-regulated in symbiotic vs free-living proteomes (Yurgel et al., unpublished) suggested that the rhizobial CspA family of proteins might be key factors in symbiosis. The specific increase in expression of CspA2-GFP and CspA4-GFP within the IZ in early nodule development supports this hypothesis. The observed delay in nodule maturation and decrease in overall symbiotic effectiveness with the ΔcspA2 ΔcspA4 double deletion strain further support a role for CspA2 and CspA4 in symbiosis. Given their developmental time-specific localization and the larger fraction of immature nodules on alfalfa inoculated with the ΔcspA2 ΔcspA4 double deletion strain, CspA2 and CspA4 appear to be involved in the symbiotic differentiation of the bacteria from a free-living to a symbiotic state.

Identification of RNAs that interact with CspA2, CspA4 and CspA5 provided more detail on how these proteins may influence bacterial differentiation. Distinct populations of RNA interact with these proteins, but there were some shared targets. CspA2 and CspA4 co-precipitated significantly with global mRNAs and 16S rRNA, but most significant was the enrichment of several highly structured sRNAs that included a unique class of sRNAs, the αR14 family. We observed lower levels of αR14 sRNA family members in the symbiotic transcriptome of the ΔcspA2 ΔcspA4 double deletion strain, and this decrease may contribute to the observed decrease in symbiotic effectiveness. Deletion of the αR14 family member Smc06496 has been reported to produce a symbiotic phenotype similar to the one observed with the ΔcspA2 ΔcspA4 double deletion strain (Robledo et al., 2017).

The observation of what appeared to be both general and specific interactions of CspA2 and CspA4 with groups of RNA molecules led to the hypothesis that these CspAs have two kinds of function. They have general housekeeping functions in the cell, interacting weakly and relatively non-specifically with global mRNA to facilitate the RNA dynamics important for translation initiation, transcription termination and mRNA turnover. They also have more specific functions, interacting more strongly with specific messages to control their expression. Combining the immunoprecipitation data with transcriptome sequencing data from the ΔcspA2 ΔcspA4 double deletion strain identified several such specific interactions. Among the targets highly enriched in the CspA2 co-IP was the Smc06778 sRNA. This sRNA is located immediately upstream of an operon encoding the RpoE2 stress responsive sigma factor. rpoE2 and related genes in the operon were among the most upregulated transcripts identified in the ΔcspA2 ΔcspA4 double deletion strain transcriptome. This suggests that S. meliloti may use CspA expression to modulate expression of a sigma factor that activates transcription of a number of stress related genes. There is a precedent for this in E. coli, where CspC and CspE affect expression of the general stress responsive sigma factor, RpoS, and mediate stability of its mRNA (Cohen-Or et al., 2010; Phadtare and Inouye, 2001).

This work provides evidence that the general function of CspA family proteins is important for modulating physiological RNA structure in the cell so that housekeeping functions such as translation initiation, transcription termination, and RNA degradation can occur. As has been suggested elsewhere, this work reinforces the idea that CspA family proteins evolved to respond to environmental conditions affecting RNA structure homeostasis and that their transcriptional and translational expression is regulated accordingly (Yamanaka et al., 1998). This sort of sensing is exemplified by the E. coli cspA gene, whose 5ʹ UTR adopts different structures at different temperatures to control expression of the CspA protein (Giuliodori et al., 2010; Kortmann and Narberhaus, 2012). The general hypothesis is that expression of stress responsive sigma factors has evolved to sense the CspA environment in the cell and, in turn, to regulate their own expression through specific interaction between cis-acting sRNA elements and CspA proteins. The interaction between CspA2 and Smc06778 is an example of such an interaction. Another interaction we identified involved an αR14 sRNA family member located upstream of the sigma factor gene rpoE8, which was specifically expressed in symbiosis.

Taken together, these results suggest that CspA family members help maintain a physiologically appropriate RNA structure that matches the cellular environment, regulating the RNA dynamics involved in housekeeping functions such as translation, transcription, and RNA turnover under conditions that would otherwise disrupt these processes. Overlapping functions in RNA structure manipulation probably explain why these proteins belong to small families in which the functions of the family members are partially redundant (Xia et al., 2001). In addition to a general interaction with all (or many) RNAs, specific CspA family proteins can interact with specific sRNA structures that regulate expression of genes such as stress responsive sigma factors. The demonstration in vitro that CspA2 and CspA4 binding to specific RNA sequences was cooperative suggests that the proteins can also recognize and differentiate between RNA molecules. Through these distinct types of interaction, the cell can use CspA proteins to sense its environment and regulate gene expression accordingly. This hypothesis provides a mechanism for the function of CspA family members in the control of rhizobial differentiation from a free-living to a symbiotic state within the plant host.

CspA family proteins and αR14 family sRNAs

This work characterized a significant and novel interaction between several CspA family proteins and the αR14 family of sRNAs. As described above, this interaction appears to be important for symbiotic development and overall symbiotic effectiveness. CspA2 and CspA4 interactions with αR14 family sRNA structures were investigated in vitro with the novel kinetic RNA binding/folding assay developed in this work. Interestingly, CspA2 and CspA4 showed no preference for the 5ʹ-UCCUCCUCCC-3ʹ loop sequence, which is nearly perfectly conserved in αR14 family sRNAs throughout the α-proteobacteria, when it was tested in the context of a synthetic broccoli-tag and compared to a polyG9-loop sequence. This suggests that loop sequence was not a factor contributing to the strong co-IP enrichment observed with these proteins. It does not, however, address the possibility that CspA proteins might influence the interaction of the 5ʹ-UCCUCCUCCC-3ʹ sequence, which is a perfect match to the 16S rRNA Shine-Dalgarno sequence, with mRNA translation or with some other aspect of the initiation of translation. Notably, when the native αR14 Arm1 and Arm2 broccoli-tagged constructs were tested in the assay, they had a significantly higher affinity for CspA and protein binding was highly cooperative, with some Hill coefficients exceeding those for oxygen binding to hemoglobin. There is no comparable evidence in the literature for cooperativity or for binding constants that are as sensitive to sequence. This led to the hypothesis that a cellular RNA's potential for cooperative binding may define global vs specific interactions of CspAs, as discussed above, and that the zone-specific expression of CspA2 and CspA4 may have a functional role in controlling expression of specific genes in specific places.

Apart from the finding that αR14 sRNAs were very strongly enriched by both CspA2 and CspA4, this work did not explore the interaction of sequence and structure in determining the binding constants between RNA molecules and these proteins. Strong sequence-specific binding has not been reported elsewhere for CspA family proteins (Jiang et al., 1997; Rennella et al., 2017). We did not observe a strong base sequence motif among the molecules enriched by IP but, since double stranded hairpin configurations may be involved, structure may be more important than sequence. Within the limits of the sequences and structures we report here, some feature of the two different αR14 RNA arms leads to tighter binding than the arbitrary stem sequences we tested in the experiments of Fig. 1 and Fig. 2 in Chapter 3.

However, both tighter binding and cooperativity disappeared after removing two unpaired bases from the Arm1 sequence, indicating that the binding is not a simple function of hairpin ΔG or stem length.

These observations suggest that some higher order structural feature may define binding strength and cooperativity. A recent report investigating ribosome translation efficiency in the context of different inhibitory hairpin structures in the translation initiation region revealed that translation efficiency was most strongly affected by the rate of hairpin folding and was not simply related to the ΔG of the hairpins (Espah Borujeni and Salis, 2016). In their work, two inhibitory hairpin structures of the same structure strength that differed in their predicted rates of formation had dramatically different inhibitory effects on translation. The model Espah Borujeni and Salis presented was that, as a ribosome moves through hairpin mRNA sequences, they are converted to single stranded RNA and that the ability of a subsequent ribosome to bind and initiate translation depends on how long the RNA remains unstructured. Potentially, slower re-folding kinetics of CspA target RNA structures may be in part responsible for the cooperative binding. In hairpins with slower re-folding rates, single stranded target regions may persist for longer in solution and facilitate the re-binding of CspA molecules after release, in comparison with hairpins with faster folding kinetics. CspA binding to the RNA could influence this process through interaction of the protein with the unfolded structure, by disrupting the folded structure as was shown in the rescue of Arm1 RNA alternative structures, or by dissociation of known nucleic acid duplexes as in the experiments of Figures 4 and 5 in Chapter 3. We envision that cooperative binding could play a role by increasing the sensitivity of CspA interactions as a function of protein concentration, so that induction of a specific cspA gene would have a non-linear effect unlike those described by Espah Borujeni and Salis (2016). The ability of CspA2 and CspA4 to cooperatively bind αR14 family sRNA structures also introduces the interesting possibility that these RNAs serve as molecular sponges for CspA, buffering the concentration of free CspA protein as another way to locally control the RNA structure strength environment.
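The sponge idea can be made concrete with a toy mass-balance calculation. This is only a sketch under stated assumptions: binding is described by a single Hill term, each sRNA molecule carries a fixed number of equivalent sites, and every number below is hypothetical rather than measured. The function names are introduced here for illustration.

```python
def hill_fraction(free_protein_conc, kd, hill_n):
    """Fraction of RNA sites occupied at a given free protein concentration."""
    if free_protein_conc <= 0:
        return 0.0
    return free_protein_conc ** hill_n / (kd ** hill_n + free_protein_conc ** hill_n)


def free_protein(total_protein, total_rna, sites_per_rna, kd, hill_n, tol=1e-9):
    """Solve total = free + bound for the free protein concentration by bisection.

    bound = sites_per_rna * total_rna * hill_fraction(free, kd, hill_n)
    The left-hand side increases monotonically with free, so bisection on
    the interval [0, total_protein] converges to the single solution.
    """
    low, high = 0.0, total_protein
    while high - low > tol:
        mid = (low + high) / 2.0
        bound = sites_per_rna * total_rna * hill_fraction(mid, kd, hill_n)
        if mid + bound > total_protein:
            high = mid
        else:
            low = mid
    return (low + high) / 2.0


# Hypothetical scenario: 10 µM total CspA, Kd = 5 µM, Hill coefficient 3,
# four sites per sRNA, with increasing amounts of sponge RNA.
for rna_um in (0.0, 0.5, 1.0, 2.0):
    free_um = free_protein(10.0, rna_um, 4, 5.0, 3.0)
    print(f"sRNA = {rna_um:3.1f} µM  free CspA = {free_um:.2f} µM")
```

In this toy model, raising the sRNA level lowers the free protein concentration without any change in total protein, which is the buffering behavior proposed above.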

An example of an additional role for αR14 family sRNAs in nutrient stress was found through differential gene expression analysis of whole transcriptomes from S. meliloti free-living cultures in response to nitrogen, phosphate, and combined nitrogen-phosphate limitation (Hagberg et al., unpublished). In these experiments nitrogen limitation was introduced by providing the amino acid glutamate as the sole nitrogen source. Phosphate limitation was established by providing a low level of phosphate in the medium. A schematic of the experimental design is shown in Fig. 1. αR14 sRNAs were among the most significantly upregulated genes when the transcriptome of cells stressed for both nitrogen and phosphate was compared to the transcriptome of nitrogen stressed cells (Fig. 2). Using the KEGG Mapper tool to place differentially expressed genes identified in the phosphate and nitrogen limited transcriptomes onto well-characterized metabolic networks revealed a shift in gene expression within central carbon metabolism (Fig. 3). Phosphate stress led to an increase in transcription of the glyoxylate bypass genes and this change persisted when nitrogen stress was added. Induction of genes involved in the decarboxylation reactions of the TCA cycle, which was seen in nitrogen stressed cells, was reversed by phosphate stress. αR14 sRNA was prominent in the doubly stressed cells. This suggests that αR14 sRNAs may play a role in the combined nitrogen-phosphate limitation response by helping to resolve the conflict in gene expression signals.

Given the proposed role of αR14 sRNAs as molecular sponges titrating free CspA concentration within the cell, this finding suggests that integrating gene expression signals may be implemented through global control of the RNA structure strength environment. This raises the possibility of higher order regulation of gene expression by sRNA mediated modulation of the CspA controlled RNA structure strength environment. Although it is a standard regime for testing induction of the nitrogen stress response, nitrogen limitation imposed by using glutamate as the sole nitrogen source against the background of a low phosphate environment is probably not a condition that rhizobia encounter very often outside of the laboratory. Adaptive tuning of RNA secondary structure strength may allow the bacteria to find an appropriate response to new combinations of stress not previously encountered in their evolution.
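
The sponge idea can be made concrete with a simple equilibrium titration. The sketch below is a deliberately simplified illustration, not a model fitted to any data in this work: it assumes independent 1:1 binding of CspA to sRNA sites with a single dissociation constant, which ignores the cooperativity discussed above, and the function name and all concentrations and the kd value are hypothetical. For P + R ⇌ PR with Kd = [P][R]/[PR], mass balance gives a quadratic in free protein whose positive root is used.

```python
import math


def free_protein(total_protein, total_sites, kd):
    """Free protein concentration for 1:1 binding to a pool of sponge sites.

    Solves P^2 + (kd + total_sites - total_protein) * P - kd * total_protein = 0
    for the non-negative root. All inputs share the same concentration units.
    The 1:1, non-cooperative scheme is an assumption made for illustration.
    """
    if total_protein < 0 or total_sites < 0 or kd <= 0:
        raise ValueError("concentrations must be non-negative and kd positive")
    b = kd + total_sites - total_protein
    return (-b + math.sqrt(b * b + 4.0 * kd * total_protein)) / 2.0


if __name__ == "__main__":
    # Hypothetical numbers: fixed total protein, increasing sponge RNA.
    for sites in (0.0, 5.0, 10.0, 20.0):
        print(sites, free_protein(10.0, sites, 1.0))
```

In this simplified picture, raising the amount of sponge RNA at constant total protein lowers the free protein concentration, which is the buffering behavior proposed for αR14 family sRNAs.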


The broccoli RNA binding assay

The work in Chapter 3 describes the development of a new in vitro RNA binding assay that takes advantage of the increase in fluorescence of the small molecule DFHBI-1T when it binds to a properly folded broccoli RNA aptamer (Filonov et al., 2014). Previously, the broccoli aptamer and the related spinach and mango fluorescent RNA aptamers (Dolgosheina et al., 2014; Song et al., 2014) have been used for imaging RNA localization in vivo (Filonov and Jaffrey, 2016; Strack et al., 2014). As presented in Chapter 3, applying the broccoli aptamer to study protein-RNA binding interactions represents a novel use of the technology and expands the set of tools that can be used to investigate RNA biochemistry in vitro. An important aspect of the data presented in Chapter 3 is that the broccoli tagged RNA assay is relatively simple to set up and is flexible regarding protein concentration, choice of RNA molecule, and other reaction conditions. It requires relatively simple equipment, and the readout of the assay can be translated into biochemical parameters relatively easily. The assay gave highly repeatable values, which permitted us to generate information that quantifies the binding properties of RNA – Csp protein complexes in a way that has not been done before and to examine phenomena, such as the refolding of alternative structures, in a new way. We used the assay in this study to investigate the interaction of S. meliloti CspA family proteins with target RNA structures. However, the broccoli tags as described in Chapter 3 could be easily adapted to high throughput screens for novel RNA – protein interactions in vitro or for novel conditions that influence these interactions in vivo. Potential applications of the assay include novel drug discovery for the correction of misfolded RNAs in human disease, design of novel mechanisms and factors for controlling gene expression in engineered organisms, and synthetic optimization of RNA binding proteins for their respective targets.

Concluding remarks

Taken together, this work has defined a role for S. meliloti CspA family member proteins as key regulators of stress adaptation and as important players in symbiotic development. The phenomena proposed here contribute to the understanding of rhizobial control of gene expression in symbiosis and to general bacterial stress adaptation. The development of a kinetic RNA binding assay based on incorporation of broccoli tags contributes a new tool for the study of protein – RNA interactions in vitro with broad application potential. Through defining novel RNA structure control mechanisms of rhizobial symbiotic differentiation, this work contributes to efforts aimed at improving the use of symbiotic nitrogen fixation in agriculture and alleviating the world’s dependence on chemically fixed nitrogen fertilizers.


REFERENCES

Bae, W., Phadtare, S., Severinov, K., and Inouye, M. (1999). Characterization of Escherichia coli cspE, whose product negatively regulates transcription of cspA, the gene for the major cold shock protein. Mol. Microbiol. 31, 1429–1441.

Cohen-Or, I., Shenhar, Y., Biran, D., and Ron, E.Z. (2010). CspC regulates rpoS transcript levels and complements hfq deletions. Res. Microbiol. 161, 694–700.

Dersch, P., Khan, M.A., Mühlen, S., and Görke, B. (2017). Roles of Regulatory RNAs for Antibiotic Resistance in Bacteria and Their Potential Value as Novel Drug Targets. Front. Microbiol. 8.

Dolgosheina, E.V., Jeng, S.C.Y., Panchapakesan, S.S.S., Cojocaru, R., Chen, P.S.K., Wilson, P.D., Hawkins, N., Wiggins, P.A., and Unrau, P.J. (2014). RNA mango aptamer-fluorophore: a bright, high-affinity complex for RNA labeling and tracking. ACS Chem. Biol. 9, 2412–2420.

Erisman, J.W., Sutton, M.A., Galloway, J., Klimont, Z., and Winiwarter, W. (2008). How a century of ammonia synthesis changed the world. Nat. Geosci. 1, 636–639.

Espah Borujeni, A., and Salis, H.M. (2016). Translation Initiation is Controlled by RNA Folding Kinetics via a Ribosome Drafting Mechanism. J. Am. Chem. Soc. 138, 7016–7023.

Filonov, G.S., and Jaffrey, S.R. (2016). RNA Imaging with Dimeric Broccoli in Live Bacterial and Mammalian Cells. Curr. Protoc. Chem. Biol. 8, 1–28.

Filonov, G.S., Moon, J.D., Svensen, N., and Jaffrey, S.R. (2014). Broccoli: rapid selection of an RNA mimic of green fluorescent protein by fluorescence-based selection and directed evolution. J. Am. Chem. Soc. 136, 16299–16308.

François, B., Russell, R.J.M., Murray, J.B., Aboul-ela, F., Masquida, B., Vicens, Q., and Westhof, E. (2005). Crystal structures of complexes between aminoglycosides and decoding A site oligonucleotides: role of the number of rings and positive charges in the specific binding leading to miscoding. Nucleic Acids Res. 33, 5677–5690.

Giuliodori, A.M., Di Pietro, F., Marzi, S., Masquida, B., Wagner, R., Romby, P., Gualerzi, C.O., and Pon, C.L. (2010). The cspA mRNA is a thermosensor that modulates translation of the cold-shock protein CspA. Mol. Cell 37, 21–33.

Jiang, W., Hou, Y., and Inouye, M. (1997). CspA, the major cold-shock protein of Escherichia coli, is an RNA chaperone. J. Biol. Chem. 272, 196–202.

Kortmann, J., and Narberhaus, F. (2012). Bacterial RNA thermometers: molecular zippers and switches. Nat. Rev. Microbiol. 10, 255–265.

Phadtare, S., and Inouye, M. (2001). Role of CspC and CspE in regulation of expression of RpoS and UspA, the stress response proteins in Escherichia coli. J. Bacteriol. 183, 1205–1214.


Rennella, E., Sára, T., Juen, M., Wunderlich, C., Imbert, L., Solyom, Z., Favier, A., Ayala, I., Weinhäupl, K., Schanda, P., et al. (2017). RNA binding and chaperone activity of the E. coli cold-shock protein CspA. Nucleic Acids Res.

Robledo, M., Peregrina, A., Millán, V., García-Tomsig, N.I., Torres-Quesada, O., Mateos, P.F., Becker, A., and Jiménez-Zurdo, J.I. (2017). A conserved α-proteobacterial small RNA contributes to osmoadaptation and symbiotic efficiency of rhizobia on legume roots. Environ. Microbiol.

Schröder, J.J. (2014). The Position of Mineral Nitrogen Fertilizer in Efficient Use of Nitrogen and Land: A Review. Nat. Resour. 05, 936–948.

Song, W., Strack, R.L., Svensen, N., and Jaffrey, S.R. (2014). Plug-and-play fluorophores extend the spectral properties of Spinach. J. Am. Chem. Soc. 136, 1198–1201.

Strack, R.L., Song, W., and Jaffrey, S.R. (2014). Using Spinach-based sensors for fluorescence imaging of intracellular metabolites and proteins in living bacteria. Nat. Protoc. 9, 146–155.

Xia, B., Ke, H., and Inouye, M. (2001). Acquirement of cold sensitivity by quadruple deletion of the cspA family and its suppression by PNPase S1 domain in Escherichia coli. Mol. Microbiol. 40, 179–188.

Yamanaka, K., Fang, L., and Inouye, M. (1998). The CspA family in Escherichia coli: multiple gene duplication for stress adaptation. Mol. Microbiol. 27, 247–255.


Figure 1. Diagram of the RNA response to nitrogen and phosphate stress conditions. The RNA-seq data used to generate these patterns were obtained in experiments by Kelly Hagberg (Hagberg et al., 2017, unpublished), and the stress responses are represented by comparing RNA abundance between the indicated conditions. N = nitrogen; P = phosphate; NSR = Nitrogen Stress Response induced by a low concentration of glutamate as a nitrogen source; PSR = Phosphate Stress Response induced by a low concentration of phosphate; NSR(p) = changes to the Nitrogen Stress Response caused by imposing a low phosphate concentration; PSR(n) = changes to the Phosphate Stress Response caused by imposing a low nitrogen concentration.


Figure 2. Differential gene expression plots of the NSR and NSR(p). Presented are differential gene expression plots like those presented in Chapter 2 (Fig. 5C and Fig. 6B). (A) Differential gene expression plot of the NSR. Genes significantly (q-value < 0.01) up- or down-regulated in the NSR are highlighted in magenta. αR14 family sRNAs are highlighted in green. (B) Differential gene expression plot of the NSR(p). Genes significantly up- or down-regulated in the NSR from (A) are highlighted in magenta. αR14 family sRNAs are highlighted in green. The bulk of genes differentially regulated in the NSR were no longer found to be differentially regulated under NSR(p). In contrast, αR14 RNAs, which were not found to be differentially regulated in the NSR, were among the most significantly changed in the NSR(p).


Figure 3. Conflicting gene expression through central carbon metabolism. Significantly differentially expressed genes identified under (A) NSR, (B) PSR, and (C) NSR(p) were mapped to central carbon metabolism using the KEGG Mapper tool. Genes that were upregulated are shown in green, genes that were down-regulated are shown in magenta, and genes not changed under the conditions are represented by black arrows.
