City University of New York (CUNY) CUNY Academic Works

All Dissertations, Theses, and Capstone Projects Dissertations, Theses, and Capstone Projects

6-2014

Computational analyses of the components of Sinorhizobium meliloti ExoR-ExoS/ChvI pathway: the ExoR and ExoS proteins

Eliza M. Wiech Graduate Center, City University of New York

How does access to this work benefit ou?y Let us know!

More information about this work at: https://academicworks.cuny.edu/gc_etds/304 Discover additional works at: https://academicworks.cuny.edu

This work is made publicly available by the City University of New York (CUNY). Contact: [email protected]

COMPUTATIONAL ANALYSES OF THE COMPONENTS OF

Sinorhizobium meliloti ExoR-ExoS/ChvI PATHWAY:

THE ExoR AND ExoS PROTEINS

by

ELIZA M. WIECH

A dissertation submitted to the Graduate Faculty in Biology in partial fulfillment of the requirements for the degree of Doctor of Philosophy, The City University of New York

2014

© 2014

ELIZA M. WIECH

All Rights Reserved

ii

This manuscript has been read and accepted for the Graduate Faculty in Biology in satisfaction of the dissertation requirement for the degree of Doctor of Philosophy.

Dr. SHANEEN M. SINGH

4/4/2014 ______Date Chair of Examining Committee

Dr. LAUREL A. ECKHARDT

4/22/2014 ______Date Executive Officer

Dr. JUERGEN POLLE

Dr. THEODORE MUTH

Dr. HAI-PING CHENG

Dr. PETER KAHN

Supervisory Committee

THE CITY UNIVERSITY OF NEW YORK

iii Abstract

COMPUTATIONAL ANALYSES OF THE COMPONENTS OF Sinorhizobium meliloti ExoR-

ExoS/ChvI PATHWAY: THE ExoR AND ExoS PROTEINS

by

Eliza M. Wiech

Adviser: Professor Shaneen M. Singh

The Sinorhizobium meliloti periplasmic ExoR protein and the ExoS/ChvI two-component system form a regulatory mechanism that directly controls the transformation of free-living to host-invading cells. In the absence of crystal structures, understanding the molecular mechanism of interaction between ExoR and the ExoS sensor, which is thought to drive the key regulatory step in the invasion process, remains a major challenge. In this study, we present theoretical structural models of the active form of ExoR protein, ExoRm, as well as of the sensing domain of ExoS, ExoSp, generated using computational methods. Our model suggests that ExoRm possesses a super- helical fold comprising twelve α-helices forming six Sel1-like repeats, including two that were unidentified in previous studies. The structural model of ExoSp suggests that ExoSp is a single Per-ARNT-Sim (PAS) domain.

Docking analysis was used to suggest models for ExoSp-ExoSp and ExoSp-ExoRm protein interactions and interfaces. Our studies reveal three novel insights: (a) a possible extended conformation of the ExoR third Sel1-like repeat that might be important for ExoR regulatory function (b) a buried proteolytic site that implies a unique mechanism of proteolysis, central to controlling ExoR function and (c) an elongated structure of helix H4 that is unique to ExoSp and might be crucial for the association with ExoRm. This

iv study provides new and interesting insights into the structure of the S. meliloti ExoRm and ExoSp proteins, lays the groundwork for elaborating the molecular mechanism of ExoRm cleavage, ExoRm-ExoS interactions, and studies of

ExoR homologs in other bacterial host interactions.

v

Table of Contents

Chapter I: 1-4 Introduction

Chapter II: 5-11 Material and methods

Chapter III: 12-32 Molecular modeling and computational analyses of the Sinorhizobium meliloti Rm1021 ExoR protein

Chapter IV: 33-45 Analyses of ExoR mutant proteins

Chapter V: 46-64 Molecular modeling and computational analyses of the Sinorhizobium meliloti Rm1021 periplasmic domain of the ExoS protein

Bibliography 65-82

vi Lists of Tables

Table III.1. 13

Sel1-like repeats detected within the ExoRm amino acid sequence.

Table III.2. 21 RMSD values for the superposition of ExoR repeats and their sequence identity (%).

Table III.3. 25 In silico prediction of protein-protein interfaces and hot spots in ExoR.

Table IV.1. 38 Comparison of changes in solvent-accessible surface area (SASA) of residues identified as candidate interface residues in ExoR wild type and ExoR mutants.

vii Lists of Figures

Figure III.1. 14 Secondary structure prediction of the ExoR protein.

Figure III.2. 15-16 Alignment of repeats found in the ExoR protein of S. meliloti Rm1021.

Figure III.3. 19 Evaluation plots of the putative three-dimensional model of ExoRm

Figure III.4. 20 Ribbon representation of the putative structure of the

S. meliloti Rm1021 ExoR mature protein (ExoRm, residues 31-268).

Figure III.5. 22 Secondary structure prediction of the ExoR3 repeat.

Figure III.6. 23-24 Putative functional sites mapped to the surface of

ExoRm protein.

Figure III.7. 27 Electrostatic potentials mapped to the molecular surface of the ExoRm protein.

Figure IV.1. 33-35 Evaluation plots of the three-dimensional models of the ExoR mutants.

Figure IV.2. 40 Electrostatic potentials mapped to the molecular surface of the ExoR wild-type protein and the ExoR mutant proteins.

viii Figure IV.3. 40 Orientation of Dipole moment vectors in the ExoR wild type and the ExoR mutant proteins.

Figure V.1. 46 Schematic representation of the domains present in the ExoS protein.

Figure V.2. 47 Evaluation plots of the three-dimensional model of the periplasmic domain of ExoS.

Figure V.3. 49 Ribbon representation of the putative structure of the S. meliloti Rm1021 periplasmic domain of the ExoS protein.

Figure V.4. 51-52 Structure-based multiple of the ExoS periplasmic domain and other PAS sensors.

Figure V.5. 53 Orientation of the dipole moment vector mapped to the putative structural model of the sensing domain of ExoS.

Fig. V.6. 54 Electrostatic potentials mapped to the molecular surface of the sensing domain of ExoS.

Figure V.7. 55 Location of the putative protein-protein interaction sites within the sensing domain of ExoS.

Figure V.8. 56

Ribbon diagram of the ExoSp dimer showing packing between helices H1, H3, and H6 of opposite monomers.

Figure V.9. 59

ExoSp-ExoR complex organization.

ix Chapter I: Introduction

Most bacteria rely on two-component signal transduction systems for detecting and adapting to changes in their environment [1, 2, 3, 4].

Parasitic and symbiotic bacteria rely on these two-component systems to sense the presence of host organisms and turn on appropriate signaling cascades [5,

6, 7, 8]. Two-component systems consist of a soluble cytoplasmic or membrane- bound sensor and a cytoplasmic response regulator [9,10,11].

A typical membrane-bound histidine kinase consists of an N-terminal extracellular sensing domain flanked by two transmembrane helices and a cytoplasmic region. The cytoplasmic region is composed of a HAMP domain

(important in signal transduction), a dimerization domain, and a C-terminal catalytic (ATP/ADP-binding phosphotransfer) domain [9, 10, 12, 13, 14].

Environmental stimuli are detected either directly or indirectly by the N- terminal sensing domain. Upon transfer of the signal to the cytoplasmic domain, transphosphorylation of conserved histidine residues on the dimerization domains takes place. Then, the phosphoryl group is transferred to an aspartate residue in the regulatory domain of the response regulator.

This in turn, controls the expression of target genes through its effector domain, usually a DNA-binding domain or an enzyme [11, 14].

A group of Gram-negative bacteria rely on the ExoR-ExoS/ChvI (RSI) signal transduction pathway to transit from a free-living to host-invading form [15]. ExoS and ChvI form a typical two-component system that consists of a membrane-integral histidine kinase, ExoS, and an associated cytoplasmic response regulator, ChvI [8, 16]. The ExoS/ChvI system is regulated via ExoR, a periplasmic regulatory protein [17, 18, 19, 20].

In Sinorhizobium meliloti, a nitrogen-fixing bacterium, it has been established that the ExoS/ChvI two-component regulatory system controls the transformation of free-living cells to host-invading cells which differentiate into nitrogen-fixing bacteroids inside the host cells [15, 17, 1 18, 20]. The ExoS/ChvI system remains inactive in free-living cells and allows them to produce flagella for motility. The ExoS/ChvI system becomes active in host invading cells and turns on the production of succinoglycan, which is required for invading the host through the production of the infection threads inside root hairs [15, 21]. In addition, the ExoS/ChvI system also controls the expression of various other genes related to adaptations to alternative living forms e.g. biofilms [17, 22].

The activities of the ExoS/ChvI system are thought to be regulated by

ExoR through a direct interaction of ExoR and ExoS, the sensor of the

ExoS/ChvI two-component system. Together, the three proteins, ExoR, ExoS, and

ChvI form the RSI regulatory pathway [17, 18]. As proposed by Lu et al.

[20], the current model for the RSI pathway suggests that ExoS/ChvI system is turned off when the periplasmic domain of ExoS is in a protein complex with the mature periplasmic form of ExoR, ExoRm. In the ExoRm-ExoS complex, ExoS acts as a phosphatase, and keeps ChvI, the response regulator, dephosphorylated and inactive. When the ExoRm-ExoS interaction is disrupted through the proteolytic cleavage of ExoRm, ExoS becomes an active kinase, and phosphorylates ChvI directly [8, 20]. Phosphorylated ChvI upregulates the expression of succinoglycan biosynthesis genes while repressing the flagellum biosynthesis genes allowing the cells to switch from free-living to host invading form [15, 20].

Since ExoR modulates the signaling state of ExoS [17, 18, 19], the level of the ExoR protein needs to be regulated carefully [20]. Experimental data suggests that the putative physical association between ExoRm and ExoS proteins not only suppresses ExoS activity, but also stabilizes ExoRm [18].

Athough the trigger(s) for the disruption of stable ExoRm-ExoS interaction is still unknown, the disruption of the complex releases ExoRm and exposes it to proteolysis [18, 20]. Recent findings suggest that the site of proteolysis in

ExoR is located between A80 and L81 of ExoR with the possibility of further 2 digestion that involves residues 84, 85, 86, and 87. As a consequence of the proteolysis, ExoR exists in three forms: the full-length precursor cytoplasmic ExoRp, the mature and active periplasmic ExoRm, and the inactive periplasmic ExoRC20 [20]. The expression of the exoR gene is also under the control of the ExoS/ChvI system. Therefore, the loss of ExoRm protein in the periplasm leads to upregulation of exoR expression to replenish the ExoRm protein in the periplasm [19]. The combination of biosynthesis and proteolysis allows S. meliloti cells to maintain ExoRm levels at a fine and highly dynamic equilibrium to regulate the RSI pathway activity. This mechanism makes it possible for cells to respond to the presence of host or environmental signals which would reduce ExoRm levels and then quickly return to baseline by restoring the normal level of ExoRm in the absence of hosts

[19, 20].

Phenotypes of the ExoR loss-of-function mutant ExoRL81A [20] and the

ExoR reduced-function mutants ExoRG76C and ExoRS156Y [18] further strengthen this model of ExoR mediated regulation of the RSI pathway. The change of a highly conserved leucine (L81) to alanine at the proposed proteolytic site in

ExoRL81A results in a dramatic reduction of both the levels and the activity of ExoRm and activation of ExoS, supporting the hypothesis that proteolysis is the key regulator of ExoR activity [20]. Similar loss of ExoR based suppression of ExoS occurs when the ExoRm-ExoS interaction is disrupted in the

ExoRG76C and S156Y mutants leading to increased ExoS/ChvI activity and a phenotype of strong overproduction of succinoglycan [18].

ExoR belongs to a family of solenoid proteins with Sel1-like repeats

[23] highly conserved in Rhizobia [18, 20, 24]. Even though ExoR has been established as a key regulator of the RSI pathway [17,18,19,20], the tertiary structure including structural details and associated functional aspects of this important protein remain unknown. Delineating the structure-function correlations of the ExoR protein is critical to understanding of the cleavage 3 mechanism of ExoRm, ExoRm-ExoS interactions and other aspects of its regulatory role in the RSI pathway [18, 20].

In this study, we present a robust theoretical model of the active form of ExoR protein, ExoRm, and a structural model of the periplasmic domain of

ExoS, ExoSp, generated by means of the well-established approach of template- based modeling combined with ab initio modeling. Comparative modeling methods have been successfully applied in revealing key structural details, describing structure-function relationships, and modeling interactions with putative binding partners. These studies include those that belong to the family of helical repeat proteins such as modeling the repeats of proteasome- binding protein PA200 [25] and the repeats of TAL (transcriptional activator- like) effectors [26] confirmed by subsequent crystal structures of TAL effector-DNA complexes [27, 28]. We propose that the ExoR protein has an alpha-alpha super-helical fold that is known to be conducive to protein- protein interactions and has inherent flexibility built into it [29, 30, 31,

32, 33, 34]. We also propose that the periplasmic domain of ExoS takes on alpha-beta PAS fold. In the absence of structural representations of the ExoR and ExoS proteins in the structural database, our study provides a first glance into the proposed structural folds of ExoR and ExoSp and the implications of the identified folds for ExoR and ExoS functions.

4 Chapter II: Material and methods

ExoR protein sequence analyses

The amino acid sequence of the ExoR protein from S. meliloti Rm1021 was retrieved from the NCBI Protein database [35] (GenBank: AAA26260.1). The mature ExoR protein (238 amino acid residues long), ExoRm, i.e. without its

30-residue signal sequence [17], was used for modeling the protein as well as sequence and structure analyses.

SMART [36] was used to identify and validate boundaries of the four previously predicted Sel1-like repeats of ExoR [23]. In addition, the ExoRm sequence was analyzed with Pfam 25.0 [37], CD-Search [38] using Conserved

Domain Database v3.04 [39, 40], and Prosite 20.76 [41]. To probe for any additional Sel1-like repeats previously unidentified within ExoRm, a profile- based repeat detection method, TPRpred was used [42], along with de novo repeat identification methods: a hidden Markov model (HMM) profile based

HHrepID [43] and REPRO [44]. The boundaries of the identified repeats were manually assigned based on the data from the repeat/domain detection methods and the consensus of the output of a number of secondary structure prediction programs (detailed below) to ensure that each of the ExoR repeats encompasses the two α-helices characteristic of Sel1-like repeats [23, 31, 32].

The secondary structure prediction on the ExoR primary sequence was carried out using PROF v1.0 [45], PSIPRED v.3.0 [46], SAM-T08 [47], SSpro 4.0

[48], the PredictProtein server (PROFsec) [49], PSSpred v2 [50], YASPIN [51], a version of Jpred3 [52, 53, 54] that incorporates Jnet v2.2 [55], and YASSPP v1.0 [56] which was accessed through the MONSTER server [57].

The structure based sequence alignment of ExoR Sel1-like repeats was carried out using the TM-align algorithm [58]. The alignment of loop regions was manually refined by anchoring key conserved structural residues found in the Sel1-like repeat consensus sequence [23, 32]. To validate the presence of the last repeat and to determine its unique consensus sequence, sequences of 5 putative ExoR orthologs were retrieved using NCBI-BLASTP against the non- redundant protein sequence database [59, 60]. Redundant sequences and sequences with large inserts in this region were removed from the dataset.

The sequences were aligned using the multiple sequence alignment program, T-

COFFEE v. 9.01 [61].

ExoS protein sequence analyses

The amino acid sequence of the S. meliloti Rm1021 ExoS protein (595 residues) was retrieved from the S. meliloti 1021 database

[https://iant.toulouse.inra.fr/bacteria/annotation/cgi/rhime.cgi] (Acc

CAC41430.1). The TMHMM server v2.0 [62,63] was used to identify transmembrane regions and the boundaries of the periplasmic sensing domain of ExoS

(residues 68-278). The sequence of the excised periplasmic sensing domain of

ExoS (211 residues) was used for modeling. The secondary structure prediction of the primary sequence of the ExoS periplasmic domain was carried out using

PSIPRED v.3.3 [46], the PredictProtein server (PROFsec) [49], and PSSpred v2

[50]. The structure-based sequence alignment of the ExoS sensing domain and

PAS domains of histidine kinases was carried out using Promals3D [64].

Single- and double-PAS sensor structures and sequences were retrieved from the Protein Data Bank (PDB) [65].

Modeling methodology

The three-dimensional models of ExoRm, ExoR mutant proteins and ExoSp were generated using comparative modeling methodology, which is widely used and the current state of the art in prediction [66, 67, 68,

69].

a. Modeling methodology of ExoRm

The fold of the ExoR protein was identified using the sequence 6 similarity based algorithm, BLAST [59, 60] limited to searches against the structural database (PDB) [65], as well as the sequence-structure threading methods, SPARKS–X [70], 3D-PSSM v.2.6.0 [71], HHpred [72, 73], LOOPP [74,

75], pGenTHREADER [76], and FUGUE [77].

To generate high-quality representations of the three-dimensional structure of ExoRm, a large number of theoretical models were generated based on various alternative alignments for each structural template and also multi-template modeling regimes. MODELLER v9.8 and v9.9 [78] was used to build models with alternative alignments generated by other programs: (1) for pairwise alignments: T-COFFEE version_9.02 [61], MAFFT v6.864 [79], and FUGUE

[77] (2) for multi-sequence alignments, HHpred [72, 73]. For automatic alignment and model building of the ExoRm structure, automated homology modeling servers were also explored: 3D-JIGSAW [80], SWISS MODEL [81], (PS)2v2

[82], PHYRE [83], LOOPP [74, 75], I-TASSER [84, 85], and Robetta [86]. Some of the template based modeling algorithms, I-TASSER [84, 85] and Robetta

[86], include ab initio techniques as needed to model the proteins’ structures to atomic detail. The prediction of side-chain conformations was implemented through the program SCWRL4 [87, 88, 89] via the AS2TS system

[90].

b. Modeling methodology of the ExoR mutant proteins

We introduced point mutations in the ExoRm wild type sequence by manual replacement of the target residues. The best ExoRm structural model was generated with the I-Tasser server [84, 85]; therefore, to stay consistent in methodology and build comparable three-dimensional models of the ExoR mutants, their structures were generated with the I-Tasser server [84, 85] as well.

7 c. Modeling methodology of ExoSp

The fold of the ExoSp protein was identified using the sequence- structure threading methods, FFAS [91], FUGUE [77], HHpred [72, 73], SPARKS–X

[70], and pGenTHREADER [76]. Several structural representations of the ExoS sensing domain were generated based on various alternative alignments.

MODELLER v9.9 [78] was used to build models with alternative alignments generated by FUGUE [77] and HHpred [72, 73]. In addition, LOMETS, the Local

Meta-Threading Server [92], generated full-length models of the ExoS periplasmic domain using MODELLER [78] based on alignments from a number of different threading programs (FUGUE, HHsearch, MUSTER, PROSPECT2, PPA-I, SAM,

SP3, SPARKS, FFAS, and PRC). Automated homology modeling servers such as

(PS)2v2 [82] and I-TASSER [84, 85] were also employed. The prediction of side- chain conformations was implemented through the program SCWRL4 [87, 88, 89] via the AS2TS system [90].

The generated models of ExoRm, the ExoR mutants, and ExoSp were evaluated using ProSA-web [93, 94] and Verify3D [95, 96]. ProSA-web determines the reliability of a structural model based on the model’s energy profiles and Verify3D evaluates the model by comparing the environment of residues found in the modeled structure with their observed propensities for being in those environments. To select final models of ExoRm and ExoSp, the best-evaluating models were checked for anomalies in stereochemical properties with WHAT_CHECK [97].

Analyses of three-dimensional models of ExoRm, ExoR mutants, and ExoSp

The visualization and analyses of the shape, structural alignments, solvent-accessible molecular surfaces, putative protein-protein interaction sites and hot-spots residues, and the surface electrostatic profile of the generated models were performed by using the surface property analyses tools

8 in PyMOL [98]. All cartoons and diagrams were constructed using PyMOL [98] and Prosite: MyDomains-Image Creator [99].

a. Analysis of the structural conservation of ExoRm repeats

To examine the conformation of loops and helices in the generated models, the individual repeats were structurally aligned using the TM-align algorithm [58]. The structural conservation of the repeats was further probed by computing helical packing angles of the ExoRm models using the PyMOL script to calculate angles between helices (method: helix orientation-hbond) [100].

The conserved structural residues were identified and mapped to the secondary structure elements of the ExoRm three-dimensional models to check if the positioning of these residues followed the expected pattern observed in other

Sel1-like proteins. The STRIDE server [101, 102] was used to ascertain the location of secondary structure elements in the modeled proteins.

b. Solvent accessibility analysis of the cleavage site in the ExoR wild type and mutants

To predict the solvent-accessible surface area of the ExoR cleavage site based on primary structure the following programs were used: ACCpro at the SCRATCH Protein Predictor server [48], SABLE [103, 104, 105], NETASA

[106], and RVP-NET [107]. To determine the solvent accessibility of these residues in the generated theoretical models, the GETAREA [108] server was used.

c. Electrostatics

The distribution of surface electrostatic potential for the ExoRm model was calculated using the Poisson-Boltzmann solver, DelPhi v.4 release 1.1

[109, 110, 111, 112]. The net charge and the dipole moment of the modeled

ExoRm protein were computed using the automated system Protein Dipole Moments 9 Server [113].

d. Identification of protein-protein interaction sites in ExoRm and ExoSp

To identify putative protein-protein interaction sites on the surface of the modeled ExoRm protein and ExoS periplasmic domain, a number of automatic computational prediction methods were used: PIER [114], SPPIDER version 2 [115], and cons-PPISP [116, 117]. The PIER method depends solely on structural data that include physicochemical properties of atoms at the protein surface [114]. Only residues identified with a prediction value above

30 were considered for further analysis. The prediction method used in

SPPIDER [115] incorporates solvent accessibility of surface residues, whereas the cons-PPISP method [116, 117] employs sequence conservation in addition to the accessible surface area. The detection of interaction residues with the

SPPIDER server [115] was done with a tradeoff between sensitivity and specificity set to 1.0 for best precision. The accuracy of the cons-PPISP prediction [116, 117] is given as neural network scores, and residues that were reported and examined belong to cluster 1 (confidence of 35). In addition, interaction hot spot residues (residues that are essential for recognition or binding) were detected in the ExoR sequence using the alignment-dependent ISIS method [118, 119]. To determine changes in the solvent accessibility of candidate interface residues in the generated theoretical models of the ExoR mutant proteins, the GETAREA server [108] was used.

Protein docking

Molecular docking allows determining the orientation of two molecules relative to each other. There are several protein docking algorithms designed to model protein complexes from the protein subunits in their unbound state.

Approaches include (i) using complementarity in shapes; and (ii) simulation 10 of binding based on interaction energies and a search for optimal subunit association that involves a transformation of one of the subunits [120].

Most of the available programs or servers are rigid-body docking algorithms and direct the transformations relying either on local shape matching, e.g.

PatchDock [120] or searching the entire three-dimensional space of one of the subunits using Fast Fourier Transform (FFT) to simulate binding [121, 120], e.g. ClusPro [122, 123], GRAMM-X [124], ZDOCK [125], and HexServer [121]. The

ClusPro server [122, 123] was evaluated at the 2013 Critical Assessment of

Predicted Interactions meeting as the best docking algorithm in the automated docking server category, outperforming all other automated docking algorithms. Therefore, the ClusPro server [122, 123] was used to generate putative homodimers of ExoS, heterodimers of ExoSp-ExoRm, and complexes of

ExoSR:ExoSR’ and ExoSS’:ExoR. For each group of complexes, more than 100 putative docking orientations were produced. Based on known previously identified protein interfaces of histidine kinases as well as of solenoid proteins, we selected the most probable representations of the ExoSp-ExoSp’ and ExoSp-ExoR complexes. To specifically identify residues involved in the selected protein-protein interfaces, the protein complexes were analyzed with the SPIDDER server [115].

11 Chapter III: Molecular modeling and computational analyses of the

Sinorhizobium meliloti Rm1021 ExoR protein

III.1. The S. meliloti Rm1021 ExoR protein is composed of six Sel1-like repeats

Analysis of the ExoR sequence, using traditional domain architecture programs including the SMART database [36], suggests that the ExoR protein contains four alpha-alpha repeats categorized as Sel1-like repeats [23].

Further scrutiny by the latest repeat detection methods reveal two additional repeats, repeat 5 and 6, at the C terminus of the ExoR protein (Table III.1).

Repeat 5 (ExoR5) was predicted with similar confidence as the previously identified repeats (TPRpred, P-value 2.8E-04; HHrepID, P-value 9.0E-09;

REPRO, alignment score: 31), whereas repeat 6 (ExoR6) was identified with lower confidence than the preceding repeats (TPRpred, P-value 2.2E-02;

HHrepID, P-value 7.8E-02; REPRO, alignment score 24) (Table III.1). Secondary structure prediction analysis of ExoR (Fig. III.1) also suggests that the most of the ExoR protein folds into helices including the region that has the newly identified repeats, further supporting the presence of two additional

Sel1-like repeats. The length of each of the six repeats ranges from 32 to 40 residues and consists of two α-helices; this repeat structure is in agreement with the expected configuration of Sel1-like repeats [23, 31, 32].

12 Table III.1. Sel1-like repeats detected within the ExoRm amino acid sequence. The repeat detection methods are sensitive but not specific in determining repeats’ boundaries. Therefore, the information from the repeat detection methods together with the data on the secondary structure elements determined from the ExoR sequence was used to determine the final boundaries of the repeats (shown in bold). The significance of the predictions is reported as a – E-values, b – P-values, and c – alignment scores. Repeats detected with lower significance shown in gray. Repeats identified as TPR repeat denoted with star (*). The numbering corresponds to the residues of the full length ExoR,

ExoRp.

Method ExoR1 ExoR2 ExoR3 ExoR4 ExoR5 ExoR6

SMART a 43-74 75-110 121-159 161-196 X X

1.27E+02 1.84E-04 1.12E+01 1.56E-07 b TPRpred 43-76* 76-111 125-160 162-197 197-230* 235-268*

3.4E-07 7.7E-11 2.9E-04 2.2E-15 2.8E-04 2.2E-02 b HHrepID 40-57 | 58-88 91-134 140-174 177-213 214-241 247-265

3E-03|2E-16 4.2E-11 4.9E-15 1.6E-15 9.0E-09 7.8E-02 c REPRO 49 – 84 54 – 125 131 – 208 159-185 203-238 237-267

31 38 38 24 31 24

Consensus 43-74 75-110 121-160 161-196 197-228 229-265

13

31 | FDPGAGVTKESGPFALFKFGFSAYKSGRKDEAVEAYRYAAEKGHTGSRWALANMYAYGDG PROF: ------?HHHHHHHHH??----HHHHHHHHHHHH?----?HHHHHHHHHH?--- PSIPRED: ------HHHHHHHHHHHH?---HHHHHHHHHHHHH---HHHHHHHHHHH?---- PROFsec: --?HHH???---?HHHHHH?????----HHHHHHHHHHHH?---?HHHHHHHHHH?---- PSSpred: --??HHHHHH--HHHHHHHHHHHHH---HHHHHHHHHHHHHH--HHHHHHHHHHHHH---

VAENDLEAFKIYSEIAQQGVEPGSEDTGYFVNALISLAGYYRRGIPDTPVRSDLSQARQL PROF: ----HHHHHHHHHHHH?------?HHHHHHHHHHHHHHH?------HHHHHHH PSIPRED: ----HHHHHHHHHHHHH------??HHHHHHHHHHHHHH------HHHHHHH PROFsec: ----HHHHHHHHHHHHH?---????????????HHHHHHHHHHH------??HHHHHHH PSSpred: ----HHHHHHHHHHHHHH--HHHHHHHHHHHHHHHHHHHHHHH------HHHHHHH

YFQAASTFGVAEAQFQLARMLLSGEGGSVNVQQAKKWLNRARKNGHAGAMGVFGNVIFQE PROF: HHHHHHH---?HHHHHHHHHHH?------HHHHHHHHHHHH?---?HHHHHHHHHH?-- PSIPRED: HHHHH?----HHHHHHHHHH??------HHHHHHHHHHHHH---HHHHHHHHHHH??- PROFsec: HHHHH??---??HHHHHHH???------HHHHHHHHHHHH?---?HHHHHHHHHH?-- PSSpred: HHHHHHH?--HHHHHHHHHHHHH------HHHHHHHHHHHHHH--HHHHHHHHHHHHH-

268 | GQTARGLAYMTAALGQ-SPKDRPWLQAMQEQAFSLATEDDRRVAITMSQNMHLQNDDD PROF: ---?HHHHHHHHH?------?HHHHHHHHH??---?HHHHHHHHHH??------PSIPRED: --HHHHHHHHHHHH?---HHHHHHHHHHHHH?-----?HHHHHHHHHHHHHHH----- PROFsec: --HHHHHHHHHHHH?---?HHHHHHHHHHH?------?HHHHHHHHHHHHH?----- PSSpred: --HHHHHHHHHHHHHH--HHHHHHHHHHHHH------??HHHHHHHHHHHHHH----

Figure III.1. Secondary structure prediction of the ExoR protein. (H, α-helix; “-”, random coil; “?”, low confidence α-helix prediction)

The alignment of all six ExoR repeats reveals that the ExoR6 repeat is a highly divergent repeat and its sequence does not fit the consensus sequence for the other repeats well (Figure III.2a). The consensus sequence (based on conservation of 80% or more) derived from an alignment of the first five ExoR repeats reveals the presence of conserved alanine and glycine residues at positions 3, 8, 14, 24, 32, 39, and 43; this consensus matches the Sel1-like repeat consensus sequence for the most part [23, 32] (Figure III.2a). A sequence alignment of the C-terminal region from 28 putative ExoR orthologs that corresponds to the ExoR6 repeat shows the following unique consensus: sxDpsWhpshbcpAFshAscscRphAhshxxphhx-x (s-small, b-big, c-charged, p-polar, h- hydrophobic, x-any amino acid residue; upper case letters refer to the standard amino acid one-letter code (Figure III.2b). In addition, the S.

14 meliloti ExoR6 contains an Asp3 tag that is also found in S. medicae WSM419 but otherwise absent from other species where the following pattern of three amino acid residues is found: xpx (x-any residue, p-polar) (figure 1b). Even though ExoR6 shows a unique sequence pattern, it still conserves three important structural residues (Ala14, Ala24, and Ala32) (Figure III.2b) [32].

15 Figure III.2. (a) Alignment of repeats found in the ExoR protein of S. meliloti Rm1021. The amino acid residues located within α-helices are shaded in gray. The experimentally determined proteolysis site (A80) is indicated in green. Proline residues found in the ExoR3 loop 1 and in the helix A of ExoR6 are shown in blue. (b) Amino acid sequence alignment of the ExoR C-terminal regions from 28 ExoR orthologs. The alignment is based on ExoR6 from S.meliloti Rm1021 and corresponds to residues 229-268 of the full length ExoR

(ExoRp) that includes ExoR6 (black) and the ExoR Asp3 tag (gray). The following species are shown: Sinorhizobium meliloti Rm1021 (S.m.), Sinorhizobium medicae WSM419 (S.m.w.), Sinorhizobium fredii NGR234 (S.f.), Rhizobium leguminosarum bv. trifolii WSM1325 (R.l.1325), Rhizobium leguminosarum bv. trifolii WSM2304 (R.l.2304), Rhizobium etli CIAT 652 (R.e.c.), Rhizobium etli Kim 5 (R.e.k.), Agrobacterium tumefaciens str. C58 (A.t.), Agrobacterium vitis S4 (A.v.), Agrobacterium radiobacter K84 (A.r.), Bartonella grahamii as4aup (B.g.), Bartonella henselae str. Houston-1 (B.h.), Bartonella tribocorum CIP 105476 (B.t.), Bartonella schoenbuchensis R1 (B.s.), Ochrobactrum anthropi ATCC 49188 (O.a.), Ochrobactrum intermedium LMG 3301 (O.i.), Brucella abortus str. 2308 A (B.a.), Hoeflea phototrophica DFL- 43 (H.p.), Chelativorans sp. BNC1 (C.sp.), Starkeya novella DSM 506 (S.n.), Beijerinckia indica subsp. indica ATCC 9039 (B.i.), Rhodopseudomonas palustris CGA009 (R.p.), Bradyrhizobiaceae bacterium SG-6C (B.b.), Bradyrhizobium japonicum USDA 110 (B.j.), Bradyrhizobium sp. BTAi1 (B.sp.), Mesorhizobium ciceri biovar biserrulae WSM1271 (M.c.), Mesorhizobium loti MAFF303099 (M.l.), Mesorhizobium opportunistum WSM2075 (M.o.). The consensus sequences above the alignments reflect conservation of at least 80% of the first five repeats (ExoR1-ExoR5) (a) and of the 28 ExoR orthologs (b) and numbering indicated above the consensus sequences follows the convention of Mittl and Schneider-Brachert [23]. The numbering at the beginning and end of each repeat corresponds to the residue number of the full length ExoR, ExoRp. The conserved structural residues common to the SLR consensus sequence [23, 32] are indicated as red letters within the sequences of the S. meliloti ExoR protein and ExoR orthologs. Residues are abbreviated as follows: A, alanine; L, leucine; F, phenylalanine; G, glycine; V, valine; H, histidine; D, aspartic acid; W, tryptophan; R, arginine; s, small; b, big; h, hydrophobic; p, polar; c, charged; x, any residue.

16 III.2. Comparative modeling of the S. meliloti ExoRm protein

In BLASTP searches [59, 60] against non-redundant protein sequences, the S. meliloti ExoR sequence shows high sequence identity (45-98%) to other exopolysaccharide biosynthesis regulatory proteins (ExoR); however, BLASTP searches [59, 60] against Protein Data Bank proteins [65] showed that none of the identified ExoR proteins have a known structure that can be used to model the ExoR protein. Therefore, fold recognition algorithms were used to detect possible structural templates with similar fold to the ExoR protein. The closest structural match to the S. meliloti ExoR protein was identified as the Sel1-like repeat H. pylori cysteine rich protein C (HcpC, PDB ID: 1OUV)

[32] by several fold recognition algorithms: SPARKS–X (z-score 19.14) [70],

3D-PSSM v.2.6.0 (E-value 1.93E-05) [71], HHpred (E-value 2.3E-23) [72, 73], and LOOPP (E-value 5e-21) [74, 75]. Another Sel1-like repeat protein corresponding to locus C5321 of E. coli strain Cft073 (PDB ID: 4BWR) [126] was identified as the closest structural match to the S. meliloti ExoR protein by two other fold recognition algorithms pGenTHREADER [76] and FUGUE

[77] (p-value: 5E-08 and z-score: 31.98 respectively). Other distantly related structural templates identified (data not shown) also comprise solved structures of Sel1-like and teratricopeptide repeat proteins confirming a super-helical fold for the ExoR protein.

Even though there is low sequence identity (below 30%) between ExoR and other solenoid proteins with experimentally solved structures, we were able to generate models that evaluated favorably. Sequence similarity between the template and the target is an accepted indicator of the accuracy and possible applications of the comparative models [67, 68, 69], however this correlation is not absolute [127]. There are examples of reliable models with root-mean- square deviation (RMSD) values below 3.0Å built from alignments with sequence identity less than 12.5% [105]. Furthermore, models constructed based on alignments with sequence identity less than 40% do not always show 17 significant decrease in correct modeling of the surface accessibility of residues important in analyses of point mutations and ligand docking [127] and may be suitable to provide information unique to the target [128].

Out of approximately 200 ExoR models built, the best structural ExoRm model was selected from a group of top ranked models, based on the evaluation criteria detailed in the methods section and correlated with a fine visual inspection of biophysical properties of the models. The model generated by

I-Tasser, which builds the models based on threading alignments and ab initio modeling of loops and tails [84, 85] evaluated as the best three-dimensional model of the ExoRm protein. Energy profiles calculated using ProSA-web [93,

94] for the selected structural model of the ExoRm protein showed low energies

(below zero) and a z-score of -6.99 comparable to solved NMR and X-ray crystal structures (Figure III.3a). The verification of the three-dimensional model of the ExoRm protein via Verify3D [95, 96] also shows a high 1D-3D profile score (above 0.2) for almost the entire length of the model (Figure

III.3b). In addition, the model passed the checks of bond lengths (Z-score

0.704), bond angles (Z-score 0.901), chirality/dihedrals (Ramachandran Z- score 0.678), and chi-1/chi-2 angles (chi-1/chi-2 correlation Z-score 5.125) implemented in WHAT_CHECK [97] with values within the expected range for well-refined structures.

18

Figure III.3. Evaluation plots of the putative three-dimensional model of

ExoRm (a) ProSA-web evaluation plots [93, 94]. The z-score of the ExoR model of -6.99 (highlighted as black dot) and the energy plot obtained with ProSA- web are shown. (b) Verify 3D evaluation plot [95, 96]. The 3D-1D profile is shown. The 3D-1D averaged scores plotted on the vertical axis (window size 21) are above 0 for the entire length of the model except for some residues at the ExoR C-terminal end.

III.3. The tertiary model of S. meliloti ExoRm protein reveals a super-helical fold

The ExoRm three-dimensional model suggests a super-helical fold for the

ExoR protein. The modeled ExoRm protein consists of twelve α-helices forming six Sel1-like repeats preceded by a 3 residues long 310 helix at its N- terminus. Each repeat is formed by two antiparallel helices and the N- and C- 19 terminal helices of the repeats are referred to as the A- and B-helices respectively. Helix A is located at the inner (concave) and helix B at the outer (convex) surfaces of the super-helix (Figure III.4a).

Figure III.4. (a) Ribbon representation of the putative structure of the S. meliloti Rm1021 ExoR mature protein (ExoRm, residues 31-268). The repeats are numbered from 1 to 6 and helix A and helix B of each repeat are colored in blue and red respectively. The N-terminal 310 helix is colored in green. The location of ExoR cleavage site, helix A of ExoR2 is marked with an arrow. (b) Backbone trace of the first three repeats of ExoR (ExoR1-ExoR3). Side-chains of conserved structural residues responsible for tight packing of helices are shown in red (3,8,32, 39 and 40) and those responsible for sharp turns in loop regions are indicated in blue (14, 24, and 43). The numbers represent the positions of the amino acid residues in repeats. (c) Schematic representation of the three forms of the ExoR protein (ExoRp, ExoRm, and

ExoRc20). The cartoon shows the location of signal peptide (SP), six putative Sel1-like repeats (ExoR1-ExoR6), and the cleavage site of ExoR, A80 (flag- tagged).

20 Individual repeats are structurally very similar and superpose with

RMSD values below 1 Å, the only exception being ExoR3. The higher RMSD values

(0.9-1.3 Å) for the structural alignment of ExoR3 with other ExoR repeats can be ascribed to its elongated loop region that contains two proline residues absent from the other repeats (Table III.2 and Figure III.2a). Despite the fact that ExoR6 possesses a proline residue in the middle of helix A and a reduced number of the conserved structural residues, it is modeled structurally as a typical repeat (RMSD values range from 0.6 to 0.7 Å except for the superposition with ExoR3) (Table III.2 and Figure III.2a). The ExoR3 repeat in the ExoRm model not only shows the extended loop conformation predicted from the ExoR sequence but its helix A is also unusual in its length of 20 amino acid residues, where the range of helix length for all other helices is 12 to 15 residues. This extended helix is predicted by 4 of the 9 secondary structure prediction programs (Fig. III.5).

Table III.2. RMSD values for the superposition of ExoR repeats and their sequence identity (%). The RMSD values (in bold) were calculated using TM- align server [58]. The numbering corresponds to the residues of the full length ExoR, ExoRp.

ExoR1 ExoR2 ExoR3 ExoR4 ExoR5 ExoR6 43-74 75-110 114-160 161-196 197-228 229-265

ExoR1 - 16 22 19 16 7

ExoR2 0.7 - 28 25 9 17

ExoR3 0.9 1.1 - 22 6 9

ExoR4 0.5 0.6 1.1 - 9 9

ExoR5 0.5 0.7 1.0 0.5 - 7

ExoR6 0.7 0.7 1.3 0.6 0.7 -

21

Figure III.5. Secondary structure prediction of the ExoR3 repeat. The boundary of the repeat determined from the repeat detection methods is shaded in gray (region 121-160 of full length ExoR, ExoRp). The prediction of the secondary structure elements with the STRIDE server [101, 102] from ExoR atomic coordinates is shown in (A). (H, α-helix; “-”, random coil; “?”, low confidence α-helix prediction)

The low RMSD values for the superposition of the majority of the ExoR repeats indicate that the helical structure of the ExoR repeats follows the expected pattern for the positioning of known conserved structural residues characteristic of the Sel1-like proteins: residues 14, 24, and 43 dictate sharp turns and are present in loop regions while residues at positions 3, 8,

32, 39, and 40 are located within the α-helices and are responsible for the tight packing of the repeats [23, 32] (Fig. III.2a and III.4b). The packing of the ExoR repeats also shows high conservation: the average inter-repeat helix-packing angle found in the modeled ExoR protein is 42(±2.6)° and the average intra-repeat helix-packing angle is 18(±2.9)°.

III.4. Identification of the ExoRm protein-protein interaction sites

Since the canonical function of Sel1-like repeat proteins is typically mediated via protein-protein interactions [reviewed in 23] and the physical association between ExoS and ExoRm is known to stabilize ExoRm [18], the ExoR sequence and its modeled three-dimensional structure were examined for the presence of protein-protein interaction sites. Three putative non-overlapping

22 binding sites were identified: A, B, and C. Site A is located at the inner face of the N-terminal end of the protein, site B is found at the center of the inner face of the protein and extends to the C-terminus, and site C is positioned at the outer face of the C- terminal end of the ExoR protein (Fig.

III.6a-c).

23 Figure III.6. (a) Location of the putative protein-protein interaction sites within ExoR repeats. The sites A, B, and C are colored red, blue, and green respectively. (b, c) Putative functional sites mapped to the surface of ExoRm protein. Site A (residues identified by more than one prediction program in red; residues predicted using just one program in light red) and site C (green) in (b); and site B (residues identified by more than one prediction program in blue; residues predicted using just one program in light blue) in (c). Residues identified as hot spots are shown in black. (d) Superposition of the crystal contact II (orange) of HcpC protein [32] and site A (red) of ExoR. The superposition is based on residues 28-116 from HcpC and residues 31-143 from ExoR. The side-chains of Asn66 (HcpC), Asn83 (ExoR), and Asn122 (ExoR) are shown. (e) Superposition of the crystal contact I (orange) of HcpC protein [32] and site B (blue) of ExoR. The superposition is based on residues 225-292 from HcpC and residues 162-268 from ExoR.

Site A encompasses the ExoR proteolytic site and is composed of 36 residues found in the N-terminal 310 helix, helices A of ExoR1, ExoR2, and

ExoR3, intra-repeat loop of ExoR2 and ExoR3, the inter-repeat loop between

ExoR2 and ExoR3, and helices B of ExoR1 and ExoR2. Within site A, six residues were identified as protein-protein interaction hot spots: Lys55 in helix A of ExoR1, Glu115 in helix A of ExoR3, Arg132 and Arg133 in helix A of

ExoR3, and Asp137 and Thr138 in intra-repeat loop of ExoR3 (Table III.3 and

Figure III.6). Site B is formed by 21 residues located on the surface of helices A of ExoR4, ExoR5, and ExoR6, helix B of ExoR4, and intra-repeat loops of ExoR4 and ExoR6. Four residues of this binding patch were identified as hot spots: Arg169 (helix A of ExoR4), Glu175 (intra-repeat loop of ExoR4),

Gln209 and Glu210 (helix A of ExoR5) (Table III.3 and Figure III.6). The putative interaction site C is formed by 8 residues located on the surface of the helix B of ExoR5, helices A and B of ExoR6, and intra-repeat loop of

ExoR6. None of the residues that form site C were identified as hot spots

(Table III.3 and Figure III.6).

24 Table III.3. In silico prediction of protein-protein interfaces and hot spots in ExoR. The identified interface residues were grouped into three clusters: site A, site B, and site C. Residues identified as hot spots are shown in red. The numbering corresponds to the residues of the full length ExoR, ExoRp (Res = residue and its position; Acc = accuracy of the prediction).

PIER SPPIDER cons-PPISP ISIS PIER SPPIDER ISIS Res Acc Res Res Acc Res Acc Res Acc Res Res Acc Site A Site B P33 R169 R169 21 V37 L172 P43 0.9 S173 F44 0.8 E175 30 F47 55 F47 0.8 G176 K48 40 L188 40 K48 0.6 G201 35 F49 0.7 V202 36 F51 47 F51 F51 0.7 G204 43 S52 35 S52 S52 0.6 N205 38 N205 Y54 38 Y54 V206 32 V206 K55 K55 23 F208 39 F208 Y66 54 Q209 36 Q209 Q209 24 W79 53 W79 0.7 E210 21 A80 56 A80 0.7 D231 35 N83 41 N83 0.7 P233 32 M84 47 W234 33 Y87 32 Y87 0.9 A237 33 V91 0.8 M238 45 E93 0.9 A242 Y102 49 F243 32 P112 0.9 S114 S114 0.9 E115 E115 0.7 E115 21 D116 0.7 T117 0.6 G118 34 G118 0.6 Y119 44 Y119 0.9 N122 41 N122 0.6 A123 43 I125 32 S126 35 Y130 0.6 R132 31 R133 40 D137 D137 26 Site C T138 26 A214 T221 32 L224 35 G225 30 R232 34 L235 34 A246 M257 34

25 III.5. Analyses of the electrostatic features of ExoRm

The surface electrostatic profile of the modeled ExoRm protein was examined to further analyze the biophysical features of ExoR that may be driving interactions with its binding partners since electrostatic interactions are important for protein recognition and binding [129]. The net charge of -3 at pH 7 and a dipole moment of 584 Debye oriented from the inside (concave face) of the super-helix towards the outside (convex face) was calculated from the ExoR atomic coordinates using the Protein Dipole

Moments Server [113].

The putative protein-protein interaction site A, located in the N- terminal inner (concave) face of the super-helix, displays a mild acidic surface electrostatic profile due to the side chains of Asp89, Glu115, and

Asp137 (Fig.III.7a). There is also a negatively charged patch at the inner side of the super-helix at its C-terminus that includes several acidic residues found in the helix A (Asp231 and Glu240) and the helix B (Asp250) of

ExoR6, the intra-repeat loop of ExoR6 (Glu248 and Asp249), and Asp3 tag

(Asp266, Asp267, and Asp268) (Fig.III.7b). The protein-protein interaction site B includes a positively charged patch in the center of the protein interspersed with surface hydrophobic residues (Fig.III.7b). Another negatively charged patch is formed on the outer surface of the super-helix due to residues located on the surface of the helices B of ExoR1 (Asp60,

Glu61, Glu64, Glu71) and ExoR2 (Asp95, Glu97, Glu104); and in the intra- repeat loops of ExoR2 (Asp89 and Glu93), ExoR3 (Asp137 and Asp143), and ExoR4

(Glu175) (Fig.III.7b). The outer C-terminal end of the protein shows a hydrophobic cluster and corresponds to the binding patch identified as the site C (Fig.III.7a). Adjacent to site C, there is a positively charged patch created by the side chains of Lys185, Lys186, Arg190, Arg192, Lys193 (ExoR4, helix B), Arg148 (ExoR3, helix B), and Arg215 (ExoR5, helix B) (Fig.III.7c).

The electrostatic profile of the modeled ExoRm protein shows that the inner 26 concave surface of the protein is more negatively charged relative to the outer convex surface of the super-helix.

Figure III.7. Electrostatic potentials mapped to the molecular surface of the

ExoRm protein. The ExoR protein is oriented to visualize the interfaces of the functional sites A and C in (a), site B in (b), and the basic surface patch at the outer surface of the super-helix in (c). In (a) and (b) the protein is oriented as in figure III.5. In all shown structures the N-terminal ends are placed on the left and the C-terminal ends on the right. The surface potentials are color-graded from −4 kT/e (red) to +4 kT/e (blue).

III.6. Discussion

The ExoR protein is a key player in regulating the ExoS/ChvI two- component system responsible for the successful establishment of the symbiotic relationship between S. meliloti and its legume host plant, alfalfa

[15, 17, 18, 19, 20]. Homologs of ExoR, ExoS, and ChvI are essential for host invasion by animal and plant pathogens [130, 131]. Yet, a major gap in knowledge exists in understanding the basis of ExoR functionality since no solved ExoR structure exists. This study attempts to bridge this gap and provides a robust theoretical structural model of the S. meliloti Rm1021 ExoRm protein. Here we present a first look into the structural fold of this protein and its implications in ExoS/ChvI two-component signaling.

The ExoRm structural model reveals an alpha-alpha super-helical fold

The proposed structure of ExoRm reveals the same overall super-helical fold as the crystal structure of the HcpC protein with PDB ID of 1OUV, an

27 accepted structural template for the modeling of other Sel1-like repeat proteins [32]. Both of the proteins are composed of α/α repeats: six repeats consisting of 32 to 40 amino acid residues in ExoR, and seven repeats consisting of 36 amino acid residues in HcpC [32]. In contrast to HcpC repeats, the Sel1-like repeats in ExoR are not interlinked with disulphide- bridges characteristic of the HcpC family of proteins [23, 32].

High structural similarity (RMSD of 0.79 Å) of ExoR to HcpC and the presence of conserved structural residues characteristic of Sel1-like repeats

[23, 32] in ExoR imply that the helix packing angles in HcpC and ExoR are conserved. However, the intra-repeat and inter-repeat helix packing angles are 43(±3.5)° and 19(±3.5)° in HcpC [32], compared to 18(±2.9)° and 42(±2.6)° in ExoR. This discrepancy stems from alternate definitions of Sel1-like repeats [23]. The structure of the ExoR repeats is in agreement with the

Sel1-like repeats definition derived from SMART database [23, 36] and shown in figure III.2a with loop 1 as the flexible intra-repeat loop and loop 2 as the fixed length inter-repeat loop. In the structure of HcpC on the other hand, loop 1 (flexible) is treated as an inter-repeat loop and loop 2 (fixed length) as an intra-repeat loop [23, 32].

The structure of multi-repeat proteins is known to be a suitable scaffold for protein-protein interactions [29, 30, 31, 32]. The comparison of the ExoR and HcpC (PDB ID: 1OUV) [32] reveals that the one of the two potential peptide-binding sites identified as crystal contact II at the concave face of N- terminal end of the HcpC protein [32] matches the putative protein-protein interaction site A recognized at the concave face of the N- terminal end of the ExoR protein. The putative site B identified in ExoR partially overlaps the identified crystal contact I at the C-terminal end of

HcpC [32]. Most of the crystal contact I interface residues in HcpC are found in the last helix (7B) and the extended helix after the last repeat [32]. In

ExoR, most of the site B interface residues were identified in helices 4A, 28 5A, and 6A. There are a few interface residues identified in the last helix of ExoR (6B) but predicted with low significance and not included in our analysis (Arg251, cons-PPISP score 0.007; Ile255, cons-PPISP score 0.079;

Ser258, cons-PPISP score 0.074). Site C does not have a counterpart in the

HcpC protein. It is formed of just eight interface residues and possibly forms a novel interface site characteristic of the ExoR protein family. Since

HcpC and ExoR belong to the same structural family and share similar structural features, we anticipate that the mode of protein-protein interactions in ExoR will be similar to that of HcpC [32].

The analyses of the electrostatic profile of the modeled ExoRm protein shows an asymmetry in the charge distribution on the surface of the ExoR protein with the inside of the protein being more negatively charged relative to the outer surface of the super-helix. Since all the predicted sites of protein interaction have distinct electrostatic profiles and surface hydrophobicity, electrostatic and/or hydrophobic interactions may play a key role in driving these interactions [129, 132, 133].

The C-terminus of ExoR houses two previously unidentified Sel1-like repeats

Previous studies have reported that there are four Sel1-like repeats within the S. meliloti Rm1021 ExoR protein [23]. Our analyses of the ExoR sequence suggest that there are two additional Sel1-like repeats, ExoR5 and

ExoR6, present in the C-terminal part of the protein. The sequence signature of the Sel1-like repeats of the S. meliloti Rm1021 ExoR protein follows that of other Sel1-like repeat proteins [23, 32] except for the unique sequence pattern of its last repeat (ExoR6) which can be classified as a non- traditional Sel1-like repeat. The secondary structure prediction of the ExoR sequence also corresponds to the six α/α repeats and supports the helical repeats that take on modular architecture in the three-dimensional model of the ExoRm protein. This data suggest that the newly identified repeats are an 29 integral and functional part of the complete ExoR structural fold and protein.

Sel1-like repeats with low sequence conservation have been found to play important roles in other solenoid proteins. For example, the C-terminal end of HcpC (PDB ID: 1OUV) which includes the last Sel1-like repeat that deviates from the SLR consensus plays a key role in protein-protein interactions [32]. Similar to this divergent but functionally important terminal Sel1-like repeat in the HcpC protein, we suggest that the last repeat of the ExoR protein is important for its function in vivo. The importance of ExoR6 is supported by the phenotype of the S. meliloti exoR95 mutant that exhibits a strong overproduction of succinoglycan [35, 134]. In this mutant, the helix B of ExoR6 is replaced by a sequence of 9 amino acids

(ADSYTQVAS), rendering the ExoR95 protein nonfunctional [20, 134].

Helix A of the third repeat can assume an extended conformation

Although the modeled ExoRm protein follows the structural fold of the

HcpC protein (PDB ID of 1OUV) [32] faithfully in most regions, it is unique in showing the presence of a 20 residue long extended helix A of the ExoR3 repeat. In the modeled ExoRm, this third repeat is seven residues (SEDTGYF) longer than that predicted by the repeat detection algorithms. The ambiguous prediction of the length of helix A of the ExoR3 repeat in our analysis may be a reflection of pliability of the functional protein in that region with respect to secondary structure. Although speculative, this raises the possibility that a helix to random coil transformation or melting

[reviewed in 135] occurs which may be important for ExoR function and/or regulation. A similar unusual extended conformation of a repeat has been observed in the crystal structure of a third repeat of the PEX5 protein from

Trypanosoma brucei [136]. The PEX5 protein is a helical multi-repeat protein that contains tetratricopeptide repeats (TPRs) [136, 137] that are similar in 30 structure to Sel1-like repeats: two antiparallel α-helices per repeat [29, reviewed in 30] differing only in helix packing [23]. The first two repeats of the PEX5 protein assume a standard TPR fold but the third repeat is 29 residues long [136]. On the other hand, the crystallized human PEX5 does not show this unusual elongated conformation of its repeats [137]. Based on the similarity in sequence and function of T. brucei and human PEX5, it has been suggested that the third TPR in T. brucei PEX5 can adopt the extended form as well as the standard conformation characteristic for TPR motifs [136]. We propose a similar scenario for the ExoR3 repeat: although it is possible that the helix A of the ExoR3 repeat is a regular length α-helix (as suggested by some of our alternate models of ExoRm), it is equally probable that it is an extended helix that can transition to a regular length helix and vice-versa. Among the seven additional residues (SEDTGYF) found at the N-terminal end of the third repeat Ser114, Glu115, Asp116, Thr117, Gly118, and Tyr119 are predicted to be a part of protein-protein interaction site A. Assuming that this site is an actual interaction site, the strategic placement of this helix and possibility of helix length pliability for helix A suggest that this repeat may be important for ExoR function and/or its regulation.

Conclusion

In conclusion, we present the first attempt to generate a three- dimensional model of S. meliloti Rm1021 ExoR protein in the absence of its crystal structure. Our proposed structural model of ExoRm suggests the presence of six Sel1-like repeats that form a structural fold conducive to protein-protein interactions. Ongoing studies targeted towards the modeling and characterization of the ExoS protein with docking will allow us to identify the binding sites involved in ExoRm-ExoS interactions. Homologs of S. meliloti ExoR protein along with ExoS and ChvI have been found in many host 31 interacting bacteria. A systematic analysis of these homologs is currently underway to determine the origin and distribution of these three proteins.

While most of the homologs have been identified based on sequence similarity alone, the Agrobacterium tumefaciens ExoR, ChvG (ExoS), and ChvI, and

Brucella abortus BvrS (ExoS) and BvrR (ChvI) have been identified independently based on their crucial roles in mediating the invasion of many plants by A. tumefaciens and animals by B. abortus [5, 6, 7, 24, 131, 138,

139, 140]. The findings of our structural analyses make it possible to study the molecular mechanism of ExoR cleavage and ExoRm-ExoS interactions using rational hypothesis driven approaches, which will facilitate the studies of pathogenicities of animal and plant pathogens in general.

32 Chapter IV. Analyses of ExoR mutant proteins

IV.1. Comparative modeling of the ExoR mutants

In order to understand the structure-function relationships of the ExoR protein, we undertook the modeling and structural analyses for two known and experimentally characterized ExoR reduced-function mutants, ExoRG76C and

ExoRS156Y [18], and one loss-of-function mutant, ExoRL81A [20]. The generated models were evaluated using ProSA-web [93, 94] and Verify3D [95, 96] and the best three-dimensional representation of each mutant protein was selected.

For all three mutants, energy profiles calculated using ProSA-web [93, 94] showed low energies (below zero) and z-scores of -7.26, -6.01, -6.75 for

ExoRG76C, ExoRS156Y, and ExoRL81A, respectively comparable to the z-score calculated for the ExoR wild type, -6.99. The verification of the selected models of the mutant proteins via Verify3D [95, 96] also produced 3D-1D profile scores similar to wild-type ExoR, above 0.00 for almost the entire length of the models (Fig. IV.1).

33

34 Figure IV.1. Evaluation plots of the three-dimensional models of the ExoR mutants: I. ExoRG76C, II. ExoRS156Y, III. ExoRL81A (a) ProSA-web [93, 94] generated models’ energy profile and their z-scores (-7.26 for ExoRG76C, - 6.01 for ExoRS156Y, and -6.75 for ExoRL81A). (b) Verify3D [95,96] 3D-1D models’ profiles. The 3D-1D averaged scores plotted on the vertical axis (window size 21) are above 0 for the entire length of the generated models except for some residues at their C-termini.

IV.2. Structural comparison between the ExoR wild-type protein and ExoR mutants

To detect conformational differences between the modeled wild-type ExoR and the mutant proteins, structural alignments were calculated. The overall structures of the mutant proteins were found to be comparable to the structure of the ExoR wild type. However, closer examination of the modeled single-point mutants revealed changes in the helix packing angles. The superposition of the wild type and the mutant proteins shows RMSD values of approximately 1 Å suggesting that these structures are similar and adopt the same super-helical fold. Even though there is high structural similarity between the modeled ExoR wild type and mutant proteins, the helix packing of the mutant proteins show minor changes compared to the wild-type ExoR:

18°(±2.9), 16°(±4.9), 17°(±6.3), and 16°(±5.6) for the average inter-repeat; and 42°(±2.6), 42°(±1.2), 40°(±2.9), and 45°(±3.0) for the average intra-repeat helix-packing angles of the wild type, ExoRG76C, ExoRL81A, and ExoRS156Y respectively. The effect of these changes in the helix-helix interactions on

ExoR function and regulation are described and discussed in the following sections.

IV.3. Changes in the surface accessibility of residues involved in protein- protein interactions.

Since the stabilizing interactions between ExoRm and ExoS are disrupted

35 in the ExoRG76C and ExoRS156Y mutant proteins [18] and the ExoRL81A mutant protein undergoes higher rates of proteolysis than the wild-type protein

[20], residues involved in protein-protein interactions were examined to identify possible changes that could affect ExoRm-ExoS stabilizing interactions in these mutant proteins [18].

The analysis of the protein-protein interaction sites in the mutant proteins revealed changes in the surface accessibility of some of the residues identified as interaction hot spot residues. The mutations in the

ExoR protein altered the burial category of the following interaction hot spot residues: Arg133 (site A) from exposed to partially buried in ExoRL81A;

Asp137 (site A) from exposed to partially buried in the ExoRS156Y mutant;

Thr138 (site A) from partially buried to exposed in the ExoRS156Y mutant; and

Glu175 (site B) from exposed to partially buried in ExoRG76C and in ExoRL81A

(Table IV.1).

The examination of the putative protein-protein interaction site A revealed alterations in the burial categories of Val37 from partially buried to exposed in ExoRL81A and ExoRS156Y; Phe47 from partially buried to buried in ExoRS156Y; Ser52 from partially buried to exposed in ExoRL81A; Trp79 from buried to partially buried in ExoRL81A; Asn83 from partially buried to buried in ExoRG76C and ExoRS156Y; Glu93 from partially buried to exposed in all analyzed mutants; Asp116 in ExoRL81A; Thr117 from partially buried to buried in all analyzed mutants; Gly118 from partially buried to buried in ExoRL81A and ExoRS156Y; Tyr119 from partially buried to exposed in ExoRG76C and

ExoRL81A; Asn122 from exposed to partially buried in ExoRG76C and ExoRL81A; and Tyr130 from buried to partially buried in ExoRL81A (Table IV.1). The investigation of the putative protein-protein interaction site B showed changes in the burial state of Glu175 from exposed to partially buried in

ExoRG76C and ExoRL81A; Gly176 from partially buried to buried in ExoRG76C and to exposed in ExoRL81A; Asn205 from partially buried to exposed in ExoRL81A; 36 Trp234 from exposed to partially buried in all three mutants; Ala 237 from exposed to partially buried in ExoRG76C and ExoRL81A; Ala242 from partially buried to exposed in ExoRL81A; Phe243 from exposed to partially buried in

ExoRS156Y (Table IV.1). The inspection of the putative protein-protein interaction site C showed changes in the burial categories of Thr221 from exposed to partially buried in ExoRG76C; Leu224 from buried to partially buried in ExoRG76C; Met257 from partially buried to buried in ExoRS156Y

(Table IV.1).

In summary, our analysis of the modeled mutant protein, showed significant alterations in the surface accessibility of key residues projected to be involved in protein-protein interactions.

37

Table IV.1. Comparison of changes in solvent-accessible surface area (SASA) of residues identified as candidate interface residues grouped into three putative interfaces site A (A), site B (B), and site C (C) in ExoR wild type and ExoR mutants. Residues identified as interface hot spots residues are shown in red. The program GetArea [108] was used to calculate SASA. Residues are defined as exposed if they have SASA value larger than 50 and are considered to be buried if their SASA values are less than 20. (1=WT; 2=ExoRG76C; 3=ExoRL81A; 4=ExoRS156Y)

SASA (%) SASA (%) SASA (%) A B C

1 2 3 4 1 2 3 4 1 2 3 4

P33 96 91 88 88 R169 45 32 38 47 A214 100 100 100 96 V37 47 47 51 56 L172 44 44 30 44 T221 53 38 51 62 P43 41 39 47 47 S173 27 30 34 39 L224 12 22 7 8 F44 71 78 85 59 E175 50 32 40 52 G225 63 68 81 75 F47 22 26 31 16 G176 31 15 50 32 R232 13 12 10 13 K48 78 76 62 75 L188 0 0 0 1 L235 5 1 1 3 F49 29 31 23 27 G201 7 11 14 1 M257 29 26 42 10 F51 64 69 60 62 V202 38 36 29 34 S52 43 28 54 41 G204 0 0 1 0 Y54 47 48 35 37 N205 34 38 51 48 K55 84 88 77 82 V206 9 9 10 5 Y66 5 6 5 4 F208 34 27 23 38 W79 16 16 23 18 Q209 76 65 69 63 A80 16 18 12 10 E210 37 26 46 32 N83 31 17 23 19 D231 0 1 2 1 M84 6 15 17 12 P233 9 7 7 4 Y87 58 57 55 57 W234 59 38 46 36 V91 19 16 10 12 A237 64 49 42 73 E93 48 51 71 67 M238 5 0 10 7 Y102 0 0 2 1 A242 35 31 54 45 S114 100 100 88 100 F243 50 60 58 48 E115 88 85 88 91 A246 92 88 100 100 D116 13 18 48 13 T117 31 5 14 13 G118 39 26 11 19 Y119 45 51 59 48 N122 57 47 44 64 A123 0 0 1 0 I125 18 9 10 7 S126 25 28 41 31 Y130 15 16 34 18 R132 32 42 22 22 R133 63 53 45 56 D137 87 78 83 49 T138 47 38 41 76

38 IV.4. Changes in the electrostatic profiles of the mutant proteins.

To further our understanding of the ExoR interactions with its binding partners, the electrostatic profiles of the modeled ExoR mutant proteins were examined.

The analysis of the mutant proteins with the Protein Dipole Moments

Server [113] resulted in the same net charge of -3 at pH 7 as in the wild-type protein. However, the visual inspection of the electrostatic profile of the mutant proteins revealed variations at functional site A and the C-terminal part of the concave face of the ExoR mutants. These sites became more negatively charged compared to the electrostatic profile of these regions in the wild type protein (Fig. IV.2). These observations are consistent with the calculated changes in the dipole moment and dipole moment vector orientations for the mutant proteins. The dipole moments of 584 Debye, 606 Debye, 701

Debye, and 779 Debye were calculated from the atomic coordinates of ExoR wild type, ExoRG76C, ExoRL81A, and ExoRS156Y respectively using Protein Dipole

Moments Server [113]. In the ExoR wild type as well as in the mutant proteins, the dipole is oriented from the concave face to the convex face of the modeled proteins. In case of ExoRG76C and ExoRS156Y, the dipole passes between helices A and B of ExoR3 similar to the wild-type but points more toward the N-terminal part of ExoR3. On the other hand, the dipole of

ExoRL81A is located between helices 3B and 4A (Fig. IV.3).

In summary, our analyses of the electrostatic profiles of the modeled mutants suggest alterations in their electrostatic profiles and dipole moment that might interfere with specific interactions with other proteins [113].

39

Figure IV.2. Electrostatic potentials mapped to the molecular surface of the

ExoR wild-type protein (a) and the ExoR mutant proteins ExoRG76C (b), ExoRS156Y (c), and ExoRL81A (d). In all shown structures the N-terminal ends are placed on the left where the functional site A is found. The arrow points to the cleavage site. The surface potentials are color-graded from −2 kT/e (red) to +2 kT/e (blue).

Figure IV.3. The ExoR mutant proteins structurally aligned to show the difference in the dipole moment vectors orientations. Dipole moment of the wild type in red, ExoRG76C in blue, ExoRS156Y in magenta, and ExoRL81A in green. The dipoles were calculated using Protein Dipole Moments Server [113].

40 IV.5. Analysis of the accessibility of the cleavage site in the mutant proteins.

The ExoRL81A mutant displays a dramatic reduction of the active form of

ExoR, ExoRm [20]. The reduced levels of ExoRm are most likely the result of higher rates of proteolysis of ExoRL81A [20] possibly due to the changes in the accessibility of the cleavage site. In addition, disruption of the stabilizing interactions between ExoS and ExoR in ExoRG76C and ExoRS156Y might cause the mutant proteins to be more prone to attack by periplasmic proteases [18]. Therefore, the accessibility of the cleavage site was examined in detail to probe the mutant proteins for higher susceptibility to proteolysis.

Examination of the surface accessibility of Ala80 and Leu81 in the modeled ExoR wild type supports the results obtained from the prediction of the surface accessibility of these residues based on the ExoR sequence: both residues are buried with surface accessible area of 16% for Ala80 and 0% for

Leu81 in the wild-type ExoR protein. However, contrary to our hypothesis, the investigation of the proteolysis site in the mutant proteins showed no increase in the surface accessibility of this site. Residues at position 80 and 81 remained buried in the analyzed mutants: the solvent-accessible surface of Ala80 for ExoRG76C 18%, ExoRS156Y 10%, and ExoRL81A 12%; the accessible surface area of residue 81 ranges from 0 to 2% in these three mutants.

As no significant changes in the surface accessible area of the key residues at the proteolysis site (Ala80 and Leu81) was found, this site was further examined to identify structural features that possibly could lead to an increased proteolysis rate of these mutant proteins [18, 20]. Since the three-dimensional architecture of the ExoRm protein forms a super-helix, a fold in which, function is known to be intimately associated with the flexibility of this fold [34], normal-mode analyses with ElNemo [141] was 41 used to probe any likely conformational fluctuations in the mutant structures compared to the wild type, in the context of the cleavage site accessibility.

The open forms of the lowest modes (mode 7) of the wild type and the mutant proteins were analyzed. Ala80 remained buried in the wild type

(18%), ExoRL81A (15%), and ExoRS156Y (12%), but changed from buried to partially buried in the ExoRG76C mutant protein (23%). Even though the vibrational variations have not brought about any significant changes in the accessibility of the cleavage site, these conformational fluctuations take place in the region of cleavage. These fluctuations bring instability to this region raising the possibility of this site to become exposed upon binding of a protease or a ligand in order for the proteolysis to take place.

IV.6. Discussion

Structural models of the ExoR mutants reveal alterations in the packing of the helices

Our modeling studies suggest that function of the protein is impacted by the introduced missense mutations through changes in the side chain packing of residues and resultant alterations of the helix-packing angles without grossly impacting the structure. The ExoRG76C [18] is a point mutation located in helix A of ExoR2. There are two main aspects to this substitution: (1) replacement of a small hydrophobic residue with a more polar amino acid with a much bigger side chain and (2) a removal of one of the conserved structural residues responsible for the tight packing of the helices. In the ExoRL81A mutant protein, the mutation replaces a conserved hydrophobic residue at the experimentally identified cleavage site [20]. The

ExoRS156Y [18] is a point mutation located in helix B of ExoR3, which replaces a small polar residue with an aromatic polar residue with a bulky side chain.

42 Changes in the solvent accessibility of asparagine residues might effect interactions with binding partners in mutant proteins.

Examination of potential protein-protein interaction sites in the modeled ExoR mutant proteins reveal changes in burial state of some of the binding residues. Solvent accessibility states of Glu93, Thr 117, and Trp234 residues showed changes in all three mutants. Change in the burial category of Asn83 is specific only to ExoRG76C and ExoRS156Y. Alterations in the burial category of the following residues are only found in the ExoRL81A mutant protein: Ser52, Trp79, Asp116, Tyr130, Arg133, Asn205, and Ala242.

Asparagine residues have been recognized to play an important role in peptide recognition in HcpC (Asn66) [32], in the Hsp70/Hsp90 organizing protein (Hop) [142], and in PEX5 [137]. Two asparagine residues are present in site A of ExoR: Asn83 (helix 2A) and Asn122 (helix 3A), of which Asn83 does show slight changes in solvent accessibility in ExoRG76C and ExoRS156Y, whereas analysis of solvent accessibility of Asn122 reveals variations in

ExoRG76C and ExoRL81A. Based on the phenotypes of these two reduced-function mutants and the loss-of-function mutant, it is suggestive that Asn83 and

Asn122 might indeed play a role in mediating protein-protein interactions in

ExoR.

Mutations in ExoR affect the dipole moment of the protein

There is an asymmetric charge distribution on the surface of the ExoR wild-type protein that becomes even more pronounced in the modeled mutant proteins where the functional site A and concave face of the C terminus become more negatively charged compared to the wild-type protein. The mutations result in a bigger dipole moment and change in the orientation of the dipole in the modeled mutant proteins. Previous studies have shown that mutations that involve polar amino acids and/or changes in the size of the side chain, especially if the site of mutation is located at interface site, 43 can affect electrostatic properties of the protein such as the net charge or the dipole moment of the protein and influence steric interactions with other proteins [129]. The dipole may play an important role in directing the protein toward the correct binding site and partner [113], and delocalized electrostatic interactions allow for the interacting protein partners to stay in close proximity long enough to properly orient themselves toward the interface site [132]. Moreover, it is known that minor structural changes caused by point mutations coupled with changes in electrostatic profile interfere with protein complex formation [129, 132].

The ExoR proteolysis site stays buried in the ExoR mutant proteins

Mapping the experimentally determined cleavage site [20] on the modeled

ExoRm reveals that this site is buried and therefore, should not be accessible to periplasmic proteases in this native state. The residues at the experimentally determined cleavage site [20] also remained buried in all three ExoR mutant proteins. Moreover, the normal-mode analysis with ElNemo

[141] also does not show conformational changes that could explain higher rates of proteolysis observed in ExoRL81A [20] and possible higher susceptibility of ExoRG76C and ExoRS156Y to periplasmic proteases [18].

Therefore, we propose a model in which conformational changes in the structure or of individual repeats would be required for proteolysis to occur. DegP, a periplasmic serine endoprotease in Escherichia coli recognizes three residues of the substrate protein and cleaves after a hydrophobic residue (in most cases Val, Ala, Ile, and Thr) that is almost completely buried or solvent inaccessible in most DegP substrates [143, 144, 145]. It has been suggested that a protein has to undergo a conformational change to make the cleavage site more surface accessible for the cleavage by DegP protease to take place [143]. While the protease involved in ExoR proteolysis has not been identified, S. meliloti Rm1021 does have a homolog of E.coli 44 DegP [146]. Even though it remains to be determined if the ExoR protein is digested by the DegP protease homolog, the presence of a solvent inaccessible hydrophobic residue in the vicinity of the experimentally determined ExoR cleavage site and the proteolysis of ExoR beyond the proposed proteolytic site fall in line with the proposed model of DegP cleavage [20, 143, 144,

145]. Experimental studies designed to ascertain if indeed a DegP is the protease responsible for ExoR cleavage would be an important step towards understanding the mechanism of ExoRm cleavage.

Conclusion

Our structural analysis of the experimentally characterized ExoR mutants (ExoRG76C, ExoRS156Y and ExoRL81A) provided some insight into their loss of stabilizing interactions with the ExoS protein [18] as well as higher susceptibility to proteolysis [18, 20]. We propose that alterations in the solvent accessibility of binding residues, in particular of Asn83 and Asn122, as well as changes in the electrostatic profiles of the mutant proteins destabilize complex formation between the ExoR and the ExoS proteins.

Nevertheless, the structural examination of the modeled mutant proteins to explain higher susceptibility of these mutants to proteolysis [18, 20] has not produced a clear answer since the cleavage site remained buried.

Therefore, we suggest that ExoR has to undergo a conformational change to make the cleavage site more surface accessible for the cleavage by DegP protease to take place [143]. It is possible that the small changes in the accessibility of the cleavage site observed in these mutant proteins might become significant upon dimer formation or interactions with other protein(s) and/or ligand(s).

45

Chapter V: Molecular modeling and computational analyses of the Sinorhizobium meliloti Rm1021 periplasmic domain of the ExoS protein

V.1. Comparative modeling of the periplasmic portion of ExoS

ExoS is a histidine kinase sensor protein that contains an N-terminal periplasmic sensor domain, two transmembrane domains (TM1 and TM2), and a C- terminal cytoplasmic domain that houses the HAMP, HisKA and HATPase C domains

[8, 62, 63, 36] (Fig. V.1). We have generated homology models of the ExoS periplasmic-sensing domain, residues 68-278 (ExoSp). Sequence-based fold prediction of the ExoS periplasmic sensor domain by Delta-Blast [59,60] and

Pfam [37] indicate a stimulus-sensing domain but no specific domain was identified by SMART [36]. The closest structural match to the S. meliloti

ExoS periplasmic domain was identified as the extracytoplasmic domain of the

Bacillus subtilis PhoR sensor histidine kinase (PDB ID: 3CWF) [147] by the following fold recognition algorithms: FUGUE (z-score: 12.9) [77], FFAS03

(score -24.4) [91], and HHpred (E-value 1.9E-12) [72, 73]. Other histidine kinase sensor domains, DctB of Vibrio cholerae (PDB ID: 3BY9) [148] and of

Sinorhizobium meliloti (PDB ID: 3E4P) [149], were identified as the closest structural match to the S. meliloti ExoS periplasmic domain by two other fold recognition algorithms SPARKS–X (z-score 4.96) [70] and pGenTHREADER (p- value: 7E-03) [76], respectively. All the structural templates identified by fold recognition algorithms are Per-ARNT-Sim (PAS) fold proteins.

Figure V.1. Schematic representation of the ExoS protein showing the location of two transmembrane domains (TM1 and TM2), periplasmic PAS domain (PASp), HAMP, HisKA and HATPase C domains. 46 The I-Tasser server [84, 85] generated the best three-dimensional structural representation of ExoSp based on various evaluation criteria.

Analysis of the corresponding final model using ProSA-web [93, 94] generated low energy profiles (below zero) and a z-score of -7.55 that is comparable to solved NMR and X-ray crystal structures (Fig. V.2a). The sequence-structure fit in this model was validated via Verify3D [95, 96] and showed high 3D-1D profile scores (above 0) for the majority of the length of the model (Fig.

V.2b). In addition, the model passed the checks of bond lengths (Z-score

0.873), bond angles (Z-score 1.004), chirality/dihedrals (Ramachandran Z- score -1.428), and chi-1/chi-2 angles (chi-1/chi-2 correlation Z-score 5.685)

(all values are within expected ranges for well-refined structures) implemented in WHAT_CHECK [97].

Figure V.2. Evaluation plots of the three-dimensional model of the periplasmic domain of ExoS. (a) ProSA-web [93, 94] evaluation plot showing a z-score of -7.55 and the model’s energy profile. (b) Verify3D [95, 96] 3D-1D profile. The 3D-1D averaged scores plotted on the vertical axis (window size 21) are above 0 for the entire length of the generated model except for the H1 to H2 loop and helix H2 (residues 35 to 48). 47 V.2. Overall structure of the periplasmic domain of ExoS

The periplasmic domain of ExoS is an α/β-structure with a central β- sheet having five antiparallel strands (S1 to S5) flanked on both sides by α- helices. On one side of the beta-sheet there is an N-terminal helix H1

(residues 68-99) almost parallel to helix H3 (residues 132-146) at its C terminus and to the C-terminal helix H6 (residues 267-277) at its N terminus.

Helices H1 and H3 are linked through helix H2 (residues 106-113) that is nearly perpendicular to these two helices. The N-terminal helix H1 and the C- terminal helix H6 are perpendicular to the bacterial inner membrane surface, however helix H6 is bent. It is likely that these two helices are the continuation of the transmembrane regions (TM1 and TM2) in the intact ExoS protein since they are positioned right next to the transmembrane regions,

TM1 and TM2. The other helices, H4 (residues 188-202) and H5 (residues 218-

228) as well as the long random coil regions between strand S2 and helix H4

(residues 165-187) and helices H4 and H5 (residues 203-217), lie on the other side of the β-sheet (Fig.V.3).

48

Figure V.3. Ribbon representation of the putative structure of the S. meliloti Rm1021 periplasmic domain of the ExoS protein (residues 68-278). The α-helices are numbered from H1 to H6 and the strands of the central β-sheet are numbered from S1 to S5. The periplasmic domain is oriented perpendicular to the bacterial inner membrane.

V.3. Structural similarities between ExoSp and sensor domains of other histidine kinases

A structure-based sequence alignment was performed on the ExoS sensing domain and on a selection of single and double-PAS domains (Fig. V.4).

Proteins with single-PAS periplasmic domains include PhoR from Bacillus subtilis (PDB ID 3CWF) [147], CitA from Klebsiella pneumoniae (PDB ID 2V9A)

[150], DcuS from Escherichia coli (PDB ID 3BY8) [151], and chemoreceptor TlpB from Helicobacter pylori (PDB ID 3UB6) [152]. Proteins with double-PAS periplasmic domains include histidine kinase 4 from Arabidopsis thaliana (PDB

ID 3T4T) [153], DctB from Vibrio cholerae (PDB ID 3BY9) [148], DctB from

Sinorhizobium meliloti (PDB ID 3E4P) [149], KinD from Bacillus subtilis (PDB

49 ID 4DAH=4JGP) [154], LuxQ from Vibrio cholerae (PDB ID 3C30) [155], LuxQ from

Vibrio harveyi (PDB ID 2HJ9) [156], chemotaxis protein from Vibrio cholerae

(PDB ID 3C8C) [157], mmHK1S-Z3 from Methanosarcina mazei (PDB ID 3LIB) [158], soHK1S-Z6 from Shewanella oneidensis (PDB ID 3LIC) [158], and vpHK1S-Z8 from

Vibrio parahaemolyticus (PDB ID 3LIE) [158]. Despite low sequence similarity among the PAS-like domains (below 20%), they show high conservation of their secondary structure elements: the single-PAS sensors contain six α-helices and a single with five strands and the double-PAS sensors contain seven α-helices and two five-stranded beta sheets, one in the membrane-distal domain and one in the membrane-proximal domain. However, it has to be noted that helices H2 and H4 are not present in all of the analyzed sensing domains

(Fig. V.4). Despite its longer length of 211 residues, the sensing domain of

ExoS conforms to the structure of single-PAS sensors based on the number and type of the structural elements present in the domain. The unusual length of this single PAS-like domain (almost twice the size of other single-sized PAS domains) is due to the elongated region between helices H1 and H3 as well as between S2 and H5 (Fig. V.3). These elongated structures unique to ExoSp produce relatively high root-mean-square deviation (RMSD) values (below 3Å) when ExoSp is superposed with single-PAS sensors.

50 ExoSp 1 ------QFREGLIDARVESLLTQGEIIAGAISASASVDTNSITIDPEKLLELQAGES 51 3CWF_A 1 ------TSDQRKAEEHIEKEAKYLAS------LLDAGNL-- 27 2V9A_A 1 ------EERLHYQVGQRALIQAMQISA------MPELVEAVQKR-- 32 3BY8_A 1 ------DMTRDGLANKALAVARTLAD------SPEIRQGLQK--- 30 3UB6_A 1 ------AQLMEHLETGQYKKREKTLAYMTKILEQ------GIHEYYKSFDND-- 40 3T4T_A 1 --MDDANKIRREEVLVSMCDQRARMLQDQFSVSVNHVHALAI------LVSTFHYHKNPS--- 52 3BY9_A 1 ----RFQYQALLNEHQSQLDRFSSHIVATLDKYAHIPHLISK------DKELVDALLSA-- 49 3E4P_B 1 ------LAGQSRIDASLKASLLRAVVERQRALPLVLAD------DAAIRGALLSP-- 43 4DAH_A 1 ---KDTIAAEHKQEASVLLNLHRNKINYLIGETMARMTSLSI------AID-----RPV-- 45 3C30_A 1 ------TSVQTSSLIQSLFDFRLAALRIHQDSTAK------NASLINALVSR-- 40 2HJ9_C 1 ------KQQTSALIHNIFDSHFAAIQIHHDSNSK------SEVIRDFYTDR-- 39 3C8C_B 1 ------LRSMVSDSVDEIVDGVSKTTAEVINGRKSIAQYATS------LIENNP-- 42 3LIB_A 1 ------KLAYQQSVEMASNYANQFDADMKANLAIARTIST------TMES---YETA-- 42 3LIC_A 1 -ENYLSIEKRLYENLAQESSHSASRLQFLLEHAQANTQGLSD------FIGLLADKDDIN-- 53 3LIE_A 1 QALLANNVENTAKEALHQLAYTGREYNNIQDQIETISDLLGH------SQSLYDYLREP-- 53 H1 H2

ExoSp 52 ITPLPSDEDLEFPIIQERVAPVLRRLI--SPT---RTRARLFDAD------91 3CWF_A 28 ------NNQANEKIIKDAG--GAL---DVSASVIDTD------53 2V9A_A 33 ------DLARIKALIDPMR--SFS--DATYITVGDAS------59 3BY8_A 31 ------KPQESGIQAIAEAVR--KRN--DLLFIVVTDMQ------59 3UB6_A 41 ------TARKMALDYFKRIND-DKG---MIYMVVVDKN------68 3T4T_A 53 ------AIDQETFAEYTARTAFERPLLSGVAYAEKVVNFEREMFERQHNWVIKTMDRGEPS 107 3BY9_A 50 ------QNSAQIDITNRYLEQVN--EVI--QAADTYLIDRF------80 3E4P_B 44 ------DRPSLDRINRKLEALA--TSA--EAAVIYLIDRS------73 4DAH_A 46 ------DIKKMQSILEKTF--DSEP-RFSGLYFLNAK------73 3C30_A 41 ------DSSRLDEFFSSVD-ELELSNAPDLRFISSHD------70 2HJ9_C 40 ------DTDVLNFFFLSID-QSDPSHTPEFRFLTDHK------69 3C8C_B 43 ------EPDNVRTIISQPLI----KNTFLLVGFGLEK------69 3LIB_A 43 ------DRDEALLILENLL--RDNP-HLLGTYVAFEPDA------FDGKDAEYTNSP- 84 3LIC_A 54 ------NPEKLKTVLTNRI--QRNP-DFFGSAIAFKP------NTFPN----- 86 3LIE_A 54 ------SKANLTILENMWSSVA--RNQK-LYKQIRFLDTS------84 H3 S1

ExoSp 92 ------ADLLLD-SRHLYSGGQVLRFDLPPVDPESPSLADEFGTWFNRLLQPGDLPLYKEPP--GGNGSI 152 3CWF_A 54 ------GKVLYG-SNGR------SAD 66 2V9A_A 60 ------GQRLN------64 3BY8_A 60 ------SLRYSH--PE------AQR------I--GQPFK- 76 3UB6_A 69 ------GVVLFD-PVN------PKT------V--GQSGLD 87 3T4T_A 108 PVRDEYAPVIFSQDSVS------YLE--SLDMMS 133 3BY9_A 81 ------GNTIAS-SNWN------LDRSFI--GRNFAW 102 3E4P_B 74 ------GVAVAA-SNWQ------EPTSFV--GNDYAF 95 4DAH_A 74 ------GDVTAS-TTELKT------KVNLAD 91 3C30_A 71 ------NILWDD-GNAS------FYGI----AQQEL- 89 2HJ9_C 70 ------GIIWDD-GNAH------FYGV----NDLIL- 88 3C8C_B 70 ------DGSNIN-NDPS------WNPGP--TWDPRV 90 3LIB_A 85 -AHDGTGRFVPY-WNKM-NGTA------SVAP----LLHYDS 113 3LIC_A 87 -----KKLFSPY-VYRS-GSGF------NYLDIGADGYDYTD 115 3LIE_A 85 ------GTEKVR-IKYDFKTSI------AGPSLI---LRDKSA 111 S2 H4

ExoSp 153 ------YPEVMNALT---GVRGAVVR--VTEK---GELIVSVAVPVQR--FR------AVLG 192 3CWF_A 67 ------SQKVQALVS---GHEGILSTD------NKLYYGLSLRS--EG------EKTG 101 2V9A_A 65 ------AKSYVSVRKGSL------GSSLRGKSPIQD-ATG------KVIG 95 3BY8_A 77 ------GDDILKALN---GEENVAIN--RGF----LAQALRVFTPIYD-ENH------KQIG 116 3UB6_A 88 AQSVDGVYYVRGYLEAAKK---GGGYTYYKMPKYDGGVPEKKFAYSHYDEVS------QM 138 3T4T_A 134 GE-----EDRENILRARET--GKAVLTSPFRLLET---HHLGVVLTFPVYK-SSLPENPTVEERIAATAG 192 3BY9_A 103 ------RPYFYLSIA---GQKSQYFA-LGSTS---GQRGYYYAYPVIY--AA------EILG 143 3E4P_B 96 ------RDYFRLAVR---DGMAEHFA-MGTVS---KRPGLYISRRVDG--PG------GPLG 136 4DAH_A 92 ------RSFFIKAKET--KKTVISDSYSSRIT---GQPIFTICVPVLD-SKR------NVTD 135 3C30_A 90 ------NKLIRRVAI----SGNWHLVQTPSEG--KSVHILMRRSSLIE-ATG------QVVG 132 2HJ9_C 89 ------DSLANRVSF----SNNWYYINVMTSI--GSRHMLVRRVPILDPSTG------EVLG 132 3C8C_B 91 ------RPWYKDAKNA--GKLVITAPYADSAS---GEILVSVATPVKDSATG------QFLG 135 3LIB_A 114 ------SDYYQLPKAT--EKDVLTEPYFYE-----GVFMVSYVSPIMK--EG------EFAG 154 3LIC_A 116 G------NWDWWSKAINQ--VGGYWSKAYFDEGA--GNVLMITYAVPFGV--QP------DYFG 161 3LIE_A 112 ------REYFKYAQSLDNEQISAWGIELERD----LSPSLRILMPISV--ND------VRQG 155 H5 S3 S4 S5

51

ExoSp 193 VLLLSTQA---GDIDKIVHAER------211 3CWF_A 102 YVLLSAS------108 2V9A_A 96 IVSVGYTI------E------104 3BY8_A 117 VVAIGLEL---SRVTQQIND------133 3UB6_A 139 VIAATSYY---TDINTENKAIKEGVNKV------163 3T4T_A 193 YLGGAFDVE--SLVENLLGQ-LAGNQAIVVHVYDITNASDPLVMYGN------236 3BY9_A 144 VIVVKMDL---SAIEQGWQ-----NKSSYFVATDD----HQVVFMSS-QPAWLF-HSVADLSPAQLNDIR 199 3E4P_B 137 VIVAKLEF---DGVEADWQA-----SGKPAYVTDR----RGIVLITS-LPSWRF-MTTKPIAEDRLAPIR 192 4DAH_A 136 YLVAAIQI---DYLKNLINLL---SPDVYIEVVNQ----DGKMIFAS--GQAS------176 3C30_A 133 YLYVGIVLNDNFALLENIRSGSN---SENLVLAV-----DTTPLVST--LKG------174 2HJ9_C 133 FSFNAVVLDNNFALMEKLKSESN---VDNVVLVA-----NSVPLANS--LIG------174 3C8C_B 136 SIFYDVSL---AELAELVNE-VKLFDAGYVFIVSE----DGTTIAHP-KKEFNG-KPMS------184 3LIB_A 155 IGGVDVSL---EYVDEVVSK-VRTFDTGYAFMVSN----SGVILSHPTHKDWIGKKDLY------205 3LIC_A 162 VTTVDLAL---DRLPEQLGIA-----PSRLVVLDD----QGRLIFHS-DKEKVLAGWLD------207 3LIE_A 156 YLVLNVDI---EYLSSLLNY-SPV-RDFHIELVKH----KGFYIASP-DESRLYGD------201 S5 H6 S6 S7 H7A

ExoSp ------3CWF_A ------2V9A_A ------3BY8_A ------3UB6_A ------3T4T_A 237 ------QDE------EAD----RSLSHESKLDFGDPFRKHKMI 263 3BY9_A 200 QSQQYLDSPIPSLGWQ------GDLQAEQSEWRKPEKHWLQD----DYIVSSRPLPE---LAL-TIR 252 3E4P_B 193 ESLQFGDAPLLPLPFRKIE------ARPDGSSTLDALLPGDSTA----AFLRVETMVPS---TNW-RLE 247 4DAH_A 177 ------HAE----DQKPVSGYLDD---ISW-NMK 196 3C30_A 175 ------NEPYSLDYVVHSAKDDSFIVGQTFLEVES-VPTYLCV 210 2HJ9_C 175 ------DEPYNVADVLQR-----LLVIETPIVVNA-VTTELCL 205 3C8C_B 185 ------EFLGE------SKINVDTHQVII-----NGK----PYAVSFSDVEG---EDW-YVG 221 3LIB_A 206 ------DFGGEELEKASRDIKNGIGGHLETADP---TTGK----TVILFYEPVET---GDF-AFV 253 3LIC_A 208 ------KQNIKNIAFATLLNDGQAGQASFVD----DKGT----VYLASVAEVAK---LKW-RVV 253 3LIE_A 202 ----IIPERSQFNFSNMYPDIWPRVVSEQAGYSYSG------EHLIAFSSIKFVS-NEPLHLI 253 H7B S8 S9 S10

ExoSp ------3CWF_A ------2V9A_A ------3BY8_A ------3UB6_A ------3T4T_A 264 CRYHQ------268 3BY9_A 253 VLSPKI------258 3E4P_B 248 QLSPL------252 4DAH_A 197 VYPNPV----TIE----- 205 3C30_A 211 YSIQTN----QNV----- 219 2HJ9_C 206 LTVQ------209 3C8C_B 222 VVIDEEIAYAALDELRRS 239 3LIB_A 254 LVVPKEEMLAGVADLRE- 270 3LIC_A 254 VMVPKHELFASL------265 3LIE_A 254 IDLSNEQLSKRAT----- 266 S10 H8

Figure V.4. Structure-based multiple sequence alignment of the ExoS periplasmic domain and of single- (3CWF, 2V9A, 3BY8, 3UB6) and double-PAS (3T4T, 3BY9, 3E4P, 4DAH, 3C30, 2HJ9, 3C8C, 3LIB, 3LIC, 3LIE) domains. The RXYF motif is marked in bold. Secondary structure elements are specified at the bottom alignment based on the PSIPRED [46] calculation implemented in the PROMALS3D web server [64]. Helix residues are shown in red and strand residues are shown in blue. Numbers at the start and the end of sequences specify residues that correspond to the sensing domains of the PAS sensors used in the alignment.

52 V.4. Analysis of the electrostatic features of the periplasmic domain of ExoS The surface electrostatic profile of the periplasmic domain of ExoS was examined to determine if electrostatic interactions could be the driving force for ExoSp to form homodimers and/or complexes with other proteins. ExoSp has a net charge of -11 at pH 7 and a dipole moment of 774 Debye [113]. The dipole moment is oriented from the top of the sensing domain towards the membrane and crosses the beta sheet from the ligand-binding side towards the

ExoSp-ExoSp’ dimerization site (Fig. V.5).

Figure V.5. The putative structural model of the sensing domain of ExoS showing the dipole moment vector calculated using Protein Dipole Moments Server [113].

The electrostatic profile of the model of the ExoSp is overall negative with regions of mild acidic surface charge interspersed with surface hydrophobic and positively charged residues (H1: R70, R77; H3: R142, R143; H3 to S1 loop: R149; S1: R151, R153; S2 to H4 loop: R176; helix 4: R197; H4 to

53 H5 loop: K210; S3: Arg236; S4 to S5 loop: Arg253, Arg254; H6: K272; Arg278)

(Fig. V.6).

Fig. V.6. Electrostatic potentials mapped to the molecular surface of the sensing domain of ExoS. The ExoSp protein is oriented to visualize the ExoS-

ExoS’ interface in (a) and the putative ExoSp-ExoR interface in (b). The surface potentials are color-graded from −4 kT/e (red) to +4 kT/e (blue).

V.6. Identification of the protein-protein interaction sites in the periplasmic domain of ExoS

Since ExoSp is known to form a complex with ExoRm [18] and the functional units of many of the two-component sensor kinases are dimeric

[156, 159, 160], the periplasmic domain of ExoS sequence and its modeled three-dimensional structure were examined for the presence of protein-protein interaction sites. The candidate interface residues were found on the surface of helices and loops on both sides of the sheet implying two non-overlapping interaction sites (Fig. V.7). Helices 1 and 4 show highest number of candidate interface residues, 19 and 14 respectively. None of these residues within helix 1 were identified as hot spot residues. Within helix 4, three residues were identified as protein-protein interaction hot spots Glu192,

54 Asn198, and Arg199. Several protein-protein interaction hot spots were identified within the loop between H1 and H2 (Thr100, Asn101, and Ser102), helix 2 (Glu108), the loop between H2 and H3 (Glu117, Asp125), helix 3

(Arg143), the loop between H3 and S1 (Arg149), strand 1 (Arg151), the loop between S2 and H4 (Ser170, Leu175, Phe177, Pro184, Glu185), the loop between

H4 and H5 (Pro203, Tyr209, Lys210, Glu211), the loop between S3 and S4

(Glu239, Lys240, Gly241), and the loop between S4 and S5 (Arg253, Phe254,

Arg255).

Figure V.7. Location of the putative protein-protein interaction sites within the sensing domain of ExoS. (a) Putative ExoSp-ExoSp’dimer interface (blue). (b) Putative ExoSp-ExoR’ complex interface (red). The figures are oriented in the same orientation as in figure V.6. Residues identified as interface hot spot residues are shown in black.

V.7. Docking analysis of the periplasmic domain of ExoS

We generated ExoSp-ExoSp’ homodimers using ClusPro [122,123] and scrutinized the differences in the molecular architecture of the predicted homodimers. Previous studies [148, 150, 158] have demonstrated that the most likely functional homodimer interfaces of the periplasmic domains of histidine kinases are interfaces involving N-terminal helices. Therefore,

55 this class of predicted homodimers was chosen for further analysis. One of the ExoSp-ExoSp’ homodimers was selected as the best representation based on the fact that the homodimer is the most symmetric showing similar change in the surface area of the interface for both chains (1363Å2 and 1317Å2). The N- terminal helices from both monomers are perpendicular to the putative membrane, and most of the identified interface residues were recognized as candidate interface residues or protein-protein interaction hot spots on the

ExoS monomer. In this interface, each of the two helices H1 on both sides of the interface is joined by helix H3 at their C terminus and by helix H6 at their N terminus. In addition, residues found in the loop regions of H1 to

H2, H3 to S1, and S2 to H4 are part of the homodimer interface (Fig. V.8).

Figure V.8. Ribbon diagram of the ExoSp dimer showing packing between helices H1, H3, and H6 of opposite monomers. Individual monomers are colored blue and green with interface residues shown in ball-and-stick representation. The two monomers are oriented perpendicular to the bacterial inner membrane.

56 V.8. Overall structure of the ExoSp-ExoR complex

To visualize the interaction between ExoR and ExoSp, we modeled the structure of this heterodimer. Analysis of the generated ExoSp-ExoR complexes revealed several distinct candidate ExoSpR interfaces. In the first type, referred to as interface A, (e.g. between ExoSp H4 and ExoR site A), the interface is mainly built by the helix H4, and the loops flanking helix 4: H3 to H4 and H4 to H5 of ExoSp and the putative protein-protein interaction site

A of ExoR. Additional interactions in the H2 to H3 loop of ExoSp stabilize the

C-terminal region of ExoR (Fig.V.9a). This complex interface buries 1437 Å2 and 1401 Å2 of accessible surface of ExoSp and ExoR, respectively.

Within the next putative ExoSp-ExoR interface, referred to as interface

B, (e.g. between the ExoSp H5 to S3 loop and ExoR site A), there is one main contact region located between helix H1, strands S3 and S4, and loops H5 to

S3 and S4 to S5 of ExoSp and the putative protein-protein interaction site A of ExoR. Additional interactions between residues located within the H2 to H3 loop of ExoSp and Asp265 of ExoR stabilize the C-terminus of ExoR (Fig.V.9b).

2 2 This ExoSp-ExoR interface buries 1415 Å and 1374 Å of accessible surface of

ExoSp and ExoR, respectively.

The dominant feature of the third heterodimer interface, referred to as interface C, (e.g. between ExoSp H4 and C-terminus of ExoR) is the association of helix H4, its flanking loop regions, and the H2 to H3 loop of ExoSp with the C-terminus of ExoR and its putative protein-protein interaction site B.

In addition, ExoR2 and ExoR3 intra-repeat loops are stabilized through interactions with the S4 to S5 loop as well as with helix H5 and the S1 to S2 loop of ExoSp, respectively. The N-terminal end of ExoR is not stabilized by

ExoSp interactions (Fig.V.9c). This heterodimer formation decreases the total

2 2 accessible surface of Exosp by 1970 Å and of ExoR by 1891 Å .

The fourth heterodimer interface (referred to as interface D), the electrostatic-favored heterodimer interface (e.g. between ExoSp H4 and the 57 positive patch on the ExoR convex surface) involves the association of helix

H4 and loops H2 to H3, H3 to H4, and H4 to H5 of ExoSp and mainly involves residues within helices B of ExoR4, ExoR5, and ExoR6 of ExoR that include the positive patch on the convex surface of ExoR and its putative binding site C

(Fig.V.9d). This complex formation decreases the accessible surface by 1180 Å2 in ExoSp and by 1091 Å2 in ExoR.

Each of the four different ExoR-ExoSp complexes was dimerized, and the interface between the ExoR-ExoSp: ExoSp’-ExoR’ complex was found to involve N- terminal helices H1 and H3, and C-terminal helix H6 of ExoS. The tetramer interface interactions are the same as observed for the ExoSp-ExoSp’ homodimer interactions.

To investigate other alternate possibilities of interaction modes between ExoR and ExoSp, we investigated the ExoSp-ExoSp’:ExoR complex

(interface E). The characteristic of this trimer interface is the association of the ExoR protein with both ExoSp monomers. The putative protein-protein interaction site A of ExoR interacts with helix H4 of ExoSp. In addition, the putative site C and the part of the positively charged patch on the convex surface of ExoR is stabilized through the interactions with the H5 to S3 and

S4 to S5 loops as well as with strand S3 and helix H1 of ExoSp’ (Fig.V.9e).

2 This trimer formation decreases the accessible surface of ExoSp by 2106 Å ,

2 2 ExoSp’ by 2247 Å , and of ExoR by 1737 Å .

58

Figure V.9. ExoSp-ExoR complex organization (a) interface A (b) interface B

(c) interface C (d) interface D (e) interface E. ExoSp is colored blue and ExoR green. The complexes are oriented perpendicular to the bacterial inner membrane. Helix H4 of ExoSp and N terminus of ExoR are labeled in each model.

V.9. Discussion

The histidine kinase, ExoS, along with its response regulator, ChvI, forms a two-component system important for sensing and adapting to environmental changes [8,16]. Here we present the three-dimensional model of the ExoS periplasmic sensing domain and the implications of the ExoSp fold for the formation of ExoS homodimers and ExoSp-ExoR complexes.

The S. meliloti ExoS periplasmic domain reveals a PAS fold

The three-dimensional model of the S. meliloti ExoS periplasmic domain suggests a PAS-like fold with a central β-sheet enclosed by α-helices. The

59 structure-based sequence alignment of the ExoSp with PAS-domain proteins further supports a single-sized PAS structure for the sensing domain of ExoSp.

This is in agreement with previous studies that most of the periplasmic domains of histidine kinases take on a single or double PAS-like fold [158,

161]. The structure-based sequence alignment as well as the superposition of

ExoSp with PAS domain proteins reveals structural elements that are common to single-sized PAS-like domains although the size of the ExoS sensing domain is almost twice the size of single-sized PAS domains. The unusual length for single-sized PAS domain is due to structural features that are unique to

ExoSp. One of the distinctive features is the elongated loop region that encloses helix H2. Additional differences between ExoSp and other PAS sensors include an elongated loop between S2 and H4 and helix H4. Helix H4 is also present in LuxQ from V. harvey (PDB ID 1ZHH) [162], DcuS from E. coli (PDB ID

3BY8) [Cheung and Hendrickson 2008], CitA from Klebsiella pneumoniae (PDB ID

2V9A) [150], and in TlpB from H. pylori (PDB ID 3UB6) [152] but is much shorter, a few residues compared to thirteen residues in ExoSp. The structural differences seen in ExoSp fall in line with previous observations that the presence of the β-sheet in PAS-like domains is strongly conserved [163], whereas the length and the number of α-helices surrounding the central β-sheet vary considerably [164].

Docking analysis of the periplasmic domain of ExoS

Previous studies have demonstrated that the functional units of many of the two-component sensor kinases consist of a dimer and that the dimerization depends on N-terminal helices [149, 150, 151, 156, 158, 159, 160, 164].

Through our docking analysis of ExoSp we were able to generate ExoSp homodimers that show a dimerization mode similar to that observed in crystal structures of the sensing domains of other histidine kinases. The ExoSp dimer interface involves N-terminal helix H1, helix H3 and C-terminal helix H6. In 60 the proposed model of the ExoSp dimer, the C-terminal helix H6 is part of the interface but not found in other dimer interfaces of periplasmic PAS domains.

However, Zhang and Hendrickson [158] speculate that the C-terminal helices might be a part of the dimer interface in intact receptors since they are oriented toward the N-terminal helices.

The dimer interface between the ExoSp monomers is overall negatively charged. Our observation is in agreement with the nature of the dimer interfaces found in extracellular domains of other histidine kinases such as mmHK1S-Z2 from Methanosarcina mazei (PDB ID 3LIA), mmHK1S-Z3 from

Methanosarcina mazei (PDB ID 3LIB), and soHK1S-Z6 from Shewanella oneidensis

(PDB ID 3LIC) [158] as well as in CitA from Klebsiella pneumoniae (PDB ID

2V9A) [150]. It has been suggested that the hydrophilic nature of the dimer interfaces of the periplasmic domains of histidine kinases allows for a dynamic character to the interface and permits for the ligand-induced structural changes in the sensing domain to transmit the signal across the membrane [158, 150].

Overall structure of the ExoSp-ExoR complex

Similar to LuxQ [162], ExoS does not bind its signaling molecule (yet to be identified) directly, but forms a complex with the periplasmic binding protein, ExoR [18]. To investigate the possible interactions between ExoSp and

ExoR, we generated ExoSp-ExoR heterodimers that produced several distinct candidate ExoSp-ExoR interfaces.

The first two types of the ExoSp-ExoR dimers (interfaces A and B) involve putative protein-protein interaction site A and the C-terminus of

ExoR and either helix H4 or the β-sheet and helix H1 of ExoSp. The third heterodimer interface (interface C) mainly shows association of ExoSp helix H4 with the C-terminus of ExoR and its putative protein-protein interaction site

B. The electrostatically favored heterodimer interface (interface D) involves 61 the association of helix H4 of ExoSp and the positive patch on the convex surface of ExoR and its putative binding site C. All generated dimers were computationally shown to form ExoSp-ExoR:ExoSp’-ExoR’ tetramers. Another type of interactions between ExoR and ExoSp might involve the formation of the

ExoSp-ExoSp’:ExoR trimeric complex where the ExoR protein is complexed with both ExoSp proteins.

Ligand-binding sites in the periplasmic PAS domains involve the side of the central β-sheet as well as the loops between S2 and H4, H4 to H5, S3 to S4 and helices H4 and H5; for example, ligand-binding sites of malate in DcuS from E.coli (3BY8) [151], succinate in DctB from V.cholerae (3BY9)[151] and from S.meliloti (3E4O) [149], citrate [150] and citrate and MoO3 [164] in CitA from Klebsiella pneumoniae (2V9A and 1P0Z), bistris in mmHK1S-Z2 (3LIA), ethylene glycol in soHK1S-Z6 (3LIC), phosphate in vpHK1S-Z8 (3LIE) [158], and urea in TlpB from H.pylori (3UB6) [152]. The above listed PAS-domain proteins bind their ligands directly. On the other hand, LuxQ from V.harveyi employs another protein (LuxP) for ligand binding. The interface between LuxQ and

LuxP (PDB ID 1ZHH) in addition to the central β-sheet includes loops between

S1 and S2, S4 and S5, as well as the F-G loop (H5 to S3 loop) [162].

Even though the ligand-binding site is found in similar locations in

PAS-domain proteins, the binding residues show high variability due to the fact that the binding site has to accommodate diverse ligands [158]. The highest variability is observed in the length and the sequence of the region between S2 and H5 [158], which in some cases also includes helix H4. On the other hand, the RXYF motif at the beginning of helix H5 has been identified as the most conserved and therefore, important motif for ligand recognition

(Fig. V.4) [158]. However, this motif is absent from LuxQ sensors due to the different mode of ligand-binding (LuxQ employs periplasmic binding protein

LuxP to bind its signaling molecule) [158] and from sensors with single PAS- like domains included in our analysis as well as from ExoSp (Fig. V.4). In the 62 case of LuxQ, it was shown that the F-G loop (H5 to S3 loop) is critical for

LuxQp-LuxP interactions [162]. Association of ExoSp with ExoR, an elongated solenoid protein, might require a different approach to recognize and bind this protein partner compared to binding of smaller ligands or the LuxP protein that has a more compact fold of two parallel α/β/α domains [165].

Therefore, the unique elongated random coil regions between S2 and H4 as well as between H4 and H5 of ExoSp might provide flexibility, while helix H4 offers a more rigid scaffold needed to accommodate binding of this elongated protein.

Although it is not possible to determine which of the generated ExoSp-

ExoR interfaces is physiologically relevant based on theoretical data alone, we hypothesize that the first type of the analyzed heterodimer (interface A)

(Fig. V.9a) is the most promising model for the ExoSp-ExoR interactions. In this interface helix H4 and its surrounding random coil regions of ExoSp interact with the putative protein-protein interaction site A of ExoR, whereas the H2 to H3 loop of ExoSp stabilizes the C-terminal region of ExoR.

The interactions seen in ExoR in this dimer interface are supported by the interactions observed in another Sel1-like repeat protein, HcpC [32], and interactions that involve mainly helix H4 of ExoSp are in agreement with the predicted protein-protein interaction sites within ExoSp (Fig. V.7). It is possible that the direct interaction of helix H4, its flanking loop regions, and the loop between H2 to H3 (the elongated nature of these secondary elements is unique to ExoSp) with ExoR might cause conformational changes necessary for signal transduction. It can be speculated that the ExoSp-ExoR interactions might change the ExoSp-ExoSp’ dimerization interface similar to the changes observed in another sensing domain of the histidine kinase, DctB.

In DctB the movement of the loop between S2 and H5 is caused by ligand binding and leads to rearrangement in the dimer interface that leads to the transfer of a signal across the membrane [149]. 63

Conclusion

In conclusion, we present for the first time a three-dimensional representation of S. meliloti Rm1021 ExoS periplasmic domain generated through computational methods. Our model suggests the Per-ARNT-Sim (PAS)-like fold for ExoSp. Even though ExoSp shares a similar overall structure as other

PAS domains, it does not bind small ligands and therefore probably uses a different mechanism to recognize and bind its periplasmic binding protein,

ExoR. Elongated structural elements that are unique to ExoSp might provide flexibility to this fold to accommodate binding of ExoR that is an elongated multi-repeat protein. Further studies including experimental analysis that target key proposed functional sites of ExoSp and interface of the ExoSp-ExoR complexes can help validate the ExoSp structure and the essential residues employed in ExoSp-ExoR interactions.

64

Bibliography

1. Nixon BT, Ronson CW, Ausubel FM. 1986 Two-component regulatory systems responsive to environmental stimuli share strongly conserved domains with the nitrogen assimilation regulatory genes ntrB and ntrC. Proc. Natl Acad. Sci. USA 83, 7850-7854.

2. Winans SC, Ebert PR, Stachel SE, Gordon MP, Nester EW. 1986 A gene essential for Agrobacterium virulence is homologous to a family of positive regulatory loci. Proc. Natl Acad. Sci. USA 83, 8278-8282.

3. Bourret RB, Borkovich KA, Simon MI. 1991 Signal transduction pathways involving protein phosphorylation in prokaryotes. Annu.Rev.Biochem. 60, 401- 441. (doi: 10.1146/annurev.bi.60.070191.002153)

4. Stock JB, Ninfa AJ, Stock AM. 1989 Protein phosphorylation and regulation of adaptive responses in bacteria. Microbiol. Rev. 53, 450-490.

5. Guzmán-Verri C, Manterola L, Sola-Landa A, Parra A, Cloeckaert A, Garin J, Gorvel J-P, Moriyon I, Moreno E, Lopez-Goni I. 2002 The two-component system BvrR/BvrS essential for Brucella abortus virulence regulates the expression of outer membrane proteins with counterparts in members of the Rhizobiaceae. P. Natl. Acad. Sci. 99, 12375-12380. (doi:10.1073/pnas.192439399)

6. Charles TC, Nester EW. 1993 A chromosomally encoded two-component sensory transduction system is required for virulence of Agrobacterium tumefaciens. J. Bacteriol. 175, 6614-6625.

7. Sola-Landa A, Pizarro-Cerdá J, Grilló M, Moreno E, Moriyón I, Blasco J, Gorvel J-P, Lopez-Goni I. 1998 A two-component regulatory system playing a critical role in plant pathogens and endosymbionts is present in Brucella abortus and controls cell invasion and virulence. Mol Microbiol. 29, 125-138. (doi:10.1046/j.1365-2958.1998.00913.x)

8. Cheng HP, Walker GC. 1998a Succinoglycan production by Rhizobium meliloti is regulated through the ExoS-ChvI two-component regulatory system. J. Bacteriol. 180, 20–26.

65

9. Wolanin PM, Thomason PA, Stock JB. 2002 Histidine protein kinases: key signal transducers outside the animal kingdom. Genome Biol. 3, reviews3013.1–reviews3013.8. (doi: 10.1186/gb-2002-3-10-reviews3013)

10. Stock JB, Surette MG, Levit M, Park P. 1995 Two-component signal transduction systems: Structure-function relationships and mechanisms of catalysis. In: Hoch JA, Silhavy TJ, Eds. Two-component signal transduction. ASM Press, Washington, D.C., pp 25-51.

11. Stock AM, Robinson VL, Goudreau PN. 2000 Two-component signal transduction. Annu. Rev. Biochem. 69, 183–215.

12. Aravind L, Ponting CP. 1999 The cytoplasmic helical linker domain of histidine kinase and methyl-accepting proteins is common to many prokaryotic signalling proteins. FEMS Microbiol Lett. 176, 111-116.

13. Williams SB, Stewart V. 1999 Functional similarities among two-component sensors and methyl-accepting chemotaxis proteins suggest a role for linker region amphipathic helices in transmembrane signal transduction. Mol. Microbiol. 33, 1093-1102.

14. Szurmant H, White RA, Hoch JA. 2007 Sensor complexes regulating two- component signal transduction. Curr. Opin. Struc. Biol. 17, 706–715. (doi: 10.1016/j.sbi.2007.08.019)

15. Yao SY, Luo L, Har KJ, Becker A, Ruberg S, Yu GQ, Zhu JB, Cheng HP. 2004 Sinorhizobium meliloti ExoR and ExoS proteins regulate both succinoglycan and flagellum production. J. Bacteriol. 186, 6042–6049. (doi:10.1128/JB.186.18.6042–6049.2004)

16. Osteras M, Stanley J, Finan TM. 1995 Identification of Rhizobium-specific intergenic mosaic elements within an essential two-component regulatory system of Rhizobium species. J. Bacteriol. 177, 5485-5494.

17. Wells DH, Chen EJ, Fisher RF, Long SR. 2007 ExoR is genetically coupled to the ExoS-ChvI two-component system and located in the periplasm of

66 Sinorhizobium meliloti. Mol. Microbiol. 64, 647– 664. (doi:10.1111/j.1365- 2958.2007.05680.x)

18. Chen EJ, Sabio EA, Long SR. 2008 The periplasmic regulator ExoR inhibits ExoS/ChvI two-component signaling in Sinorhizobium meliloti. Mol. Microbiol. 69, 1290–1303. (doi:10.1111/j.1365-2958.2008.06362.x)

19. Lu HY, Cheng HP. 2010 Autoregulation of Sinorhizobium meliloti exoR gene expression. Microbiology 156, 2092-2101. (doi 10.1099/mic.0.038547-0)

20. Lu HY, Luo L, Yang MH, Cheng HP. 2012 Sinorhizobium meliloti ExoR is the target of periplasmic proteolysis. J. Bacteriol. 194, 4029–4040. (doi:10.1128/JB.00313-12)

21. Cheng HP, Walker GC. 1998b Succinoglycan is required for initiation and elongation of infection threads during nodulation of alfalfa by Rhizobium meliloti. J. Bacteriol. 180, 5183–5191.

22. Belanger L, Dimmick KA, Fleming JS, Charles TC. 2009 Null mutations in Sinorhizobium meliloti exoS and chvI demonstrate the importance of this two- component regulatory system for symbiosis. Mol. Microbiol. 74,1223-1237. (doi:10.1111/j.1365-2958.2009.06931.x)

23. Mittl PRE, Schneider-Brachert W. 2007 Sel1-like repeat proteins in signal transduction. Cell. Signal. 19, 20–31. (doi:10.1016/j.cellsig.2006.05.034)

24. Tomlinson AD, Ramey-Hartung B, Day TW, Merritt PM, Fuqua C. 2010 Agrobacterium tumefaciens ExoR represses succinoglycan biosynthesis and is required for biofilm formation and motility. Microbiology 156, 2670–2681. (doi:10.1099/mic.0.039032-0)

25. Kajava AV, Gorbea C, Ortega J, Rechsteiner M, Steven AC. 2004 New HEAT- like repeat motifs in proteins regulating proteasome structure and function. J. Struc. Biol. 146, 425-430. (doi:10.1016/j.jsb.2004.01.013)

26. Bradley P. 2012 Structural modeling of TAL effector–DNA interactions. Protein Sci. 21, 471–474. (doi:10.1002/pro.2034)

67 27. Mak AN, Bradley P, Cernadas RA, Bogdanove AJ, Stoddard BL. 2012 The crystal structure of TAL effector PthXo1 bound to its DNA target. Science 335, 716-719. (doi:10.1126/science.1216211)

28. Deng D, Yan C, Pan X, Mahfouz M, Wang J, Zhu JK, Shi Y, Yan N. 2012 Structural basis for sequence-specific recognition of DNA by TAL effectors. Science 335, 720-723. (doi:10.1126/science.1215670)

29. Das AK, Cohen PTW, Barford D. 1998 The structure of the tetratricopeptide repeats of protein phosphatase 5: implications for TPR-mediated protein– protein interactions. EMBO J. 17, 1192–1199.

30. Groves MR, Barford D. 1999 Topological characteristics of helical repeat proteins. Curr. Opin. Struc. Biol. 9, 383-389.

31. Lüthy L, Grütter MG, Mittl PRE. 2002 The crystal structure of Helicobacter pylori Cysteine-rich protein B reveals a novel fold for a penicillin-binding protein. J. Biol. Chem. 277, 10187–10193. (doi:10.1074/jbc.M108993200)

32. Lüthy L, Grütter MG, Mittl PRE. 2004 The crystal structure of Helicobacter Cysteine-rich protein C at 2.0 A resolution: Similar peptide- binding sites in TPR and SEL1-like repeat proteins. J. Mol. Biol. 340, 829– 841. (doi:10.1016/j.jmb.2004.04.055)

33. Lahiri A, Basu S. 2007 Dynamics of leucine-rich repeat proteins. Biophys. Rev. Lett. 2, 207–219. (doi:10.1142/S1793048007000507)

34. Forwood JK, Lange A, Zachariae U, Marfori M, Preast C, Grubmuller H, Stewart M, Corbett AH, Kobe B. 2010 Quantitative structural analysis of importin-beta flexibility: Paradigm for solenoid protein structures. Structure 18, 1171–1183. (doi:10.1016/j.str.2010.06.015)

35. Reed JW, Glazebrook J, Walker GC. 1991 The exoR gene of Rhizobium meliloti affects RNA levels of other exo genes but lacks homology to known transcriptional regulators. J. Bacteriol. 173, 3789-3794.

36. Schultz J, Milpetz F, Bork P, Ponting C. 1998 SMART, a simple modular

68 architecture research tool: Identification of signaling domains. PNAS 95, 5857-5864.

37. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K, Holm L, Sonnhammer ELL, Eddy SR, Bateman A. 2010 The Pfam protein families database. Nucleic Acids Res. 38, D211–D222. (doi:10.1093/nar/gkm960)

38. Marchler-Bauer A, Bryant SH. 2004 CD-search: annotations on the fly. Nucleic Acids Res. 32, W327-W331. (doi:10.1093/nar/gkh454)

39. Marchler-Bauer A, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese-Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, He S, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Liebert CA, Liu C, Lu F, Lu S, Marchler GH, Mullokandov M, Song JS, Tasneem A, Thanki N, Yamashita RA, Zhang D, Zhang N, Bryant SH. 2009 CDD: Specific functional annotation with the conserved domain database. Nucleic Acids Res. 37, D205-D210. (doi: 10.1093/nar/gkn845)

40. Marchler-Bauer A, Lu S, Anderson JB, Chitsaz F, Derbyshire MK, DeWeese- Scott C, Fong JH, Geer LY, Geer RC, Gonzales NR, Gwadz M, Hurwitz DI, Jackson JD, Ke Z, Lanczycki CJ, Lu F, Marchler GH, Mullokandov M, Omelchenko MV, Robertson CL, Song JS, Thanki N, Yamashita RA, Zhang D, Zhang N, Zheng C, Bryant SH. 2011 CDD: A conserved domain database for the functional annotation of proteins. Nucleic Acids Res. 39, D225-D229. (doi: 10.1093/nar/gkq1189)

41. De Castro E, Sigrist CJA, Gattiker A, Bulliard V, Langendijk-Genevaux PS, Gasteiger E, Bairoch A, Hulo N. 2006 ScanProsite: detection of PROSITE signature matches and ProRule-associated functional and structural residues in proteins. Nucleic Acids Res. 34, W362-5. (doi:10.1093/nar/gkl124)

42. Karpenahalli MR, Lupas AN, Söding J. 2007 TPRpred: a tool for prediction of TPR-, PPR- and SEL1-like repeats from protein sequences. BMC Bioinformatics 8, 2. (doi:10.1186/1471-2105-8-2)

43. Biegert A, Söding J. 2008 De novo identification of highly diverged protein repeats by probabilistic consistency. Bioinformatics 24, 807-814. (doi:10.1093/bioinformatics/btn039)

69

44. George RA, Heringa J. 2000 The REPRO server: finding protein internal sequence repeats through the Web. Trends Biochem. Sci. 25, 515-517.

45. Ouali M, King RD. 2000 Cascaded multiple classifiers for secondary structure prediction. Prot. Sci. 9, 1162-1176.

46. Jones DT. 1999 Protein secondary structure prediction based on position- specific scoring matrices. J. Mol. Biol. 292, 195-202. (doi:10.1006/jmbi.1999.3091)

47. Karplus K. 2009 SAM-T08, HMM-based protein structure prediction. Nucleic Acids Res. 37, W492-W497. (doi:10.1093/nar/gkp403)

48. Cheng J, Randall AZ, Sweredoski MJ, Baldi P. 2005 SCRATCH: A protein structure and structural feature prediction server. Nucleic Acids Res. 33, W72-W76. (doi: 10.1093/nar/gki396)

49. Rost B, Yachdav G, Liu J. 2004 The PredictProtein server. Nucleic Acids Res. 32, W321–W326. (doi: 10.1093/nar/gkh377)

50. Zhang Y., http://zhanglab.ccmb.med.umich.edu/PSSpred

51. Lin K, Simossis VA, Taylor WR, Heringa J. 2005 A simple and fast secondary structure prediction algorithm using hidden neural networks. Bioinformatics. 21, 152-9. (doi: 10.1093/bioinformatics/bth487)

52. Cole C, Barber JD, Barton GJ. 2008 The Jpred 3 secondary structure prediction server. Nucleic Acids Res. 36, W197-W201. (doi: 10.1093/nar/gkn238)

53. Cuff JA, Barton GJ. 1999 Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins 34, 508-519. (doi:10.1002/(SICI)1097-0134(19990301)34:4<508::AID-PROT10>3.0.CO;2-4)

54. Cuff JA, Clamp ME, Siddiqui AS, Finlay M, Barton GJ. 1998 Jpred: A consensus secondary structure prediction server. Bioinformatics 14, 892-893.

70 (doi:10.1093/bioinformatics/14.10.892)

55. Cuff JA, Barton GJ. 2000 Application of multiple sequence alignment profiles to improve protein secondary structure prediction. Poteins 40, 502- 511. (doi:10.1002/1097-0134(20000815)40:3<502::AID-PROT170>3.0.CO;2-Q)

56. Karypis G. 2006 YASSPP: Better Kernels and Coding Schemes Lead to Improvements in SVM-basedSecondary Structure Prediction. Proteins 64, 575-86. (doi: 10.1002/prot.21036)

57. Rangwala H, Kauffman C, Karypis G. 2009 svmPRAT: SVM-based protein residue annotation toolkit. BMC Bioinformatics 10, 439. (doi:10.1186/1471- 2105-10-439)

58. Zhang Y, Skolnick J. 2005 TM-align: A protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33, 2302-2309. (doi:10.1093/nar/gki524)

59. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. 1990 Basic local alignment search tool. J. Mol. Biol. 215, 403-410. (doi:10.1016/S0022- 2836(05)80360-2)

60. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. 1997 Gapped BLAST and PSI-BLAST: A new generation of protein database search programs. Nucleic Acids Res. 25, 3389-3402. (doi:10.1093/nar/25.17.3389)

61. Notredame C, Higgins DG, Heringa J. 2000 T-coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205-217. (doi:10.1006/jmbi.2000.4042)

62. Krogh A, Larsson B, von Heijne G, Sonnhammer ELL. 2001 Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567-580.

63. Sonnhammer ELL, von Heijne G, Krogh A. 1998 A hidden Markov model for predicting transmembrane helices in protein sequences. In J. Glasgow et al., 71 eds. Proc. Sixth Int. Conf. on Intelligent Systems for Molecular Biology, pp 175-182. AAAI Press.

64. Pei J, Tang M, Grishin NV. 2008 PROMALS3D web server for accurate multiple protein sequence and structure alignments. Nucleic Acids Research 36, W30–W34. doi:10.1093/nar/gkn322

65. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. 2000 The protein data bank. Nucleic Acids Res. 28, 235-242. (doi:10.1093/nar/28.1.235)

66. Edwards YJK, Cottage A. 2003 Bioinformatics methods to predict protein structure and function. A practical approach. Mol. Biotechnol. 23, 139-166. (doi:10.1385/MB:23:2:139)

67. Petrey D, Honig B. 2005 Protein structure prediction: Inroads to biology. Mol. Cell 20, 811–819. (doi:10.1016/j.molcel.2005.12.005)

68. Sánchez R, Pieper U, Melo F, Eswar N, Martí-Renom MA, Madhusudhan MS, Mirkovic N, Sali A. 2000 Protein structure modeling for structural genomics. Nat. Struct. Biol. Suppl, 986-990. (doi:10.1038/80776)

69. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. 2000 Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29, 291–325. (doi: 10.1146/annurev.biophys.29.1.291)

70. Yang Y, Faraggi E, Zhao H, Zhou Y. 2011 Improving protein fold recognition and template-based modeling by employing probabilistic-based matching between predicted one-dimensional structural properties of query and corresponding native properties of templates. Bioinformatics 27, 2076-2082. (doi: 10.1093/bioinformatics/btr350)

71. Kelley LA, MacCallum RM, Sternberg MJE. 2000 Enhanced Genome Annotation using Structural Profiles in the Program 3D-PSSM. J. Mol. Biol. 299, 499-520.

72. Söding J, Biegert A, Lupas AN. 2005 The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. 33, W244-W248. (doi: 10.1093/nar/gki40)

72

73. Söding J. 2005 Protein homology detection by HMM–HMM comparison. Bioinformatics 21, 951-960. (doi:10.1093/bioinformatics/bti125)

74. Vallat BK, Pillardy J, Elber R. 2008 A template-finding algorithm and a comprehensive benchmark for homology modeling of proteins. Proteins 72, 910- 928. (doi: 10.1002/prot.21976)

75. Vallat BK, Pillardy J, Májek P, Meller J, Blom T, Cao B, Elber R. 2009 Building and assessing atomic models of proteins from structural templates: Learning and benchmarks. Proteins 76, 930-945. (doi:10.1002/prot.22401)

76. Lobley A, Sadowski MI, Jones DT. 2009 pGenTHREADER and pDomTHREADER: New methods for improved protein fold recognition and superfamily discrimination. Bioinformatics 25, 1761-1767. (doi:10.1093/bioinformatics/btp302)

77. Shi J, Blundell TL, Mizuguchi K. 2001 FUGUE: sequence-structure homology recognition using environment-specific substitution tables and structure- dependent gap penalties. J. Mol. Biol. 310, 243-57.

78. Šali A, Blundell TL. 1993 Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234, 779-815. (doi:10.1006/jmbi.1993.1626)

79. Katoh K, Misawa K, Kuma K, Miyata T. 2002 MAFFT: A novel method for rapid multiple sequence alignment based on fast fourier transform. Nucleic Acids Res. 30, 3059-3066. (doi: 10.1093/nar/gkf436)

80. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ. 2001 Enhancement of protein modeling by human intervention in applying the automatic programs 3D- JIGSAW and 3D-PSSM. Proteins, Suppl 5, 39-46. (doi:10.1002/prot.1168)

81. Schwede T, Kopp J, Guex N, Peitsch MC. 2003 SWISS-MODEL: An automated protein homology-modeling server. Nucleic Acids Res. 31, 3381-3385. (doi: 10.1093/nar/gkg520)

82. Chen C, Hwang J, Yang J. 2009 (PS)2-v2: Template-based protein structure prediction server. BMC Bioinformatics 10, 366. (doi:10.1186/1471-2105-10-

73 366)

83. Kelley LA, Sternberg MJE. 2009 Protein structure prediction on the web: a case study using the Phyre server. Nat. Protoc. 4, 363-371. (doi:10.1038/nprot.2009.2)

84. Zhang Y. 2008 I-TASSER server for protein 3D structure prediction. BMC Bioinformatics 9, 40. (doi:10.1186/1471-2105-9-40)

85. Roy A, Kucukural A, Zhang Y. 2010 I-TASSER: a unified platform for automated protein structure and function prediction. Nat. Protocols 5, 725- 738. (doi: 10.1038/nprot.2010.5)

86. Kim DE, Chivian D, Baker D. 2004 Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res. 32, W526-W531. (doi: 10.1093/nar/gkh468)

87. Bower MJ, Cohen FE, Dunbrack RL Jr. 1997 Prediction of protein side-chain rotamers from a backbone-dependent rotamer library: a new homology modeling tool. J. Mol. Biol. 267, 1268-1282.

88. Canutescu AA, Shelenkov AA, Dunbrack RL Jr. 2003 A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci. 12, 2001-2014.

89. Krivov GG, Shapovalov MV, Dunbrack RL Jr. 2009 Improved prediction of protein side-chain conformations with SCWRL4. Proteins 77, 778-795. (doi:10.1002/prot.22488)

90. Zemla A, Zhou CE, Slezak T, Kuczmarski T, Rama D, Torres C, Sawicka D, Barsky D. 2005 AS2TS system for protein structure modeling and analysis. Nucleic Acids Res. 33, W111-W115. (doi:10.1093/nar/gki457)

91. Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. 2005 FFAS03: a server for profile-profile sequence alignments. Nucl. Acids Res. 33, W284-W288

92. Wu S, Zhang Y. 2007 LOMETS: A local meta-threading-server for protein structure prediction. Nucleic Acids Res. 35, 3375-3382.

74 93. Sippl MJ. 1993 Recognition of errors in three-dimensional structures of proteins. Proteins 17, 355-362. (doi:10.1002/prot.340170404)

94. Wiederstein M, Sippl MJ. 2007 ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res. 35, W407-W410. (doi:10.1093/nar/gkm290)

95. Bowie JU, Lüthy R, Eisenberg D. 1991 A method to identify protein sequences that fold into a known three-dimensional structure. Science 253, 164-170.

96. Lüthy R, Bowie JU, Eisenberg D. 1992 Assessment of protein models with three-dimensional profiles. Nature 356, 83-85.

97. Hooft RWW, Vriend G, Sander C, Abola EE. 1996 Errors in protein structures. Nature 381, 272-272. (doi:10.1038/381272a0)

98. The PyMOL Molecular Graphics System, Version 1.3 Schrödinger, LLC.

99. Hulo N, Bairoch A, Bulliard V, Cerutti L, Cuche BA, de Castro E, Lachaize C, Langendijk-Genevaux PS, Sigrist CJA. 2008 The 20 years of PROSITE. Nucleic Acids Res. 36, D245-249. (doi:10.1093/nar/gkm977)

100. Holder, Thomas. AngleBetweenHelices, PyMOLWiki, 2010. Retrieved March 9, 2012 from http://pymolwiki.org/index.php/AngleBetweenHelices

101. Frishman D, Argos P. 1995 Knowledge-Based Protein Secondary Structure Assignment. Proteins 23, 566-579.

102. Heinig M, Frishman D. 2004 STRIDE: a Web server for secondary structure assignment from known atomic coordinates of proteins. Nucleic Acids Res. 32, W500-W502. (doi: 10.1093/nar/gkh429)

103. Adamczak R, Porollo A, Meller J. 2004 Accurate prediction of solvent accessibility using neural networks based regression. Proteins 56, 753-67. (doi:10.1002/prot.20176)

75 104. Adamczak R, Porollo A, Meller J. 2005 Combining prediction of secondary structure and solvent accessibility in proteins. Proteins 59, 467-75. (doi:10.1002/prot.20441)

105. Wagner M, Adamczak R, Porollo A, Meller J. 2005 Linear Regression Models for Solvent Accessibility Prediction in Proteins. J. Comput. Biol. 12, 355-69. (doi:10.1089/cmb.2005.12.355)

106. Ahmad S, Gromiha MM. 2002 NETASA: Neural network based prediction of solvent accessibility. Bioinformatics 18, 819-824. (doi:10.1093/bioinformatics/18.6.819)

107. Ahmad S, Gromiha MM, Sarai A. 2003 RVP-net: online prediction of real valued accessible surface area of proteins from single sequences. Bioinformatics 19, 1849-1851. (doi: 10.1093/bioinformatics/btg249)

108. Fraczkiewicz R, Braun W. 1998 Exact and efficient analytical calculation of the accessible surface areas and their gradients for macromolecules. J. Comp. Chem. 19, 319-333. (doi:10.1002/(SICI)1096- 987X(199802)19:3<319::AID-JCC6>3.0.CO;2-W)

109. Rocchia W, Alexov E, Honig B. 2001 Extending the applicability of the nonlinear Poisson-Boltzmann equation: Multiple dielectric constants and multivalent ions. J. Phys. Chem. B 105, 6507-6514. (doi:10.1021/jp010454y)

110. Nicholls A,Honig B. 1991 A Rapid Finite Difference Algorithm, Utilizing Successive Over-Relaxation to Solve the Poisson-Boltzmann Equation. J. Comp. Chem. 12, 435-445. (doi: 10.1002/jcc.540120405)

111. Gilson MK, Honig B. 1988 Calculation of the total electrostatic energy of a macromolecular system: Solvation energies, binding energies and conformational analysis. Proteins 4, 7-18. (doi:10.1002/prot.340040104)

112. Gilson MK, Sharp K, Honig B. 1988 Calculating the electrostatic potential of molecules in solution: Method and error assessment. J. Comp. Chem. 9, 327-335. (doi:10.1002/jcc.540090407)

76 113. Felder CE, Prilusky J, Silman I, Sussman JL. 2007 A server and database for dipole moments of proteins. Nucleic Acids Res. 35, W512-W521. (doi:10.1093/nar/gkm307)

114. Kufareva I, Budagyan L, Raush E, Totrov M, Abagyan R. 2007 PIER: Protein interface recognition for structural proteomics. Proteins 67, 400- 417. (doi:10.1002/prot.21233)

115. Porollo A, Meller J. 2007 Prediction-based fingerprints of protein- protein interactions. Proteins 66, 630-645. (doi: 10.1002/prot.21248)

116. Chen H, Zhou H. 2005 Prediction of interface residues in protein- protein complexes by a consensus neural network method: Test against NMR data. Proteins 61, 21-35. (doi: 10.1002/prot.20514)

117. Zhou H, Shan Y. 2001 Prediction of protein interaction sites from sequence profile and residue neighbor list. Proteins 44, 336-343. (doi: 10.1002/prot.1099)

118. Ofran Y, Rost B. 2007a ISIS: Interaction sites identified from sequence. Bioinformatics 23, e13-e16. (doi:10.1093/bioinformatics/btl303)

119. Ofran Y, Rost B. 2007b Protein–protein interaction hotspots carved into sequences. PLoS Comput.Biol. 3, 1169-1176. (doi:10.1371/journal.pcbi.0030119)

120. Duhovny D, Nussinov R, Wolfson HJ. Efficient Unbound Docking of Rigid Molecules. In Gusfield et al., Ed. Proceedings of the 2'nd Workshop on Algorithms in Bioinformatics(WABI) Rome, Italy, Lecture Notes in Computer Science 2452, pp. 185-200, Springer Verlag, 2002.

121. Macindoe G, Mavridis L, Venkatraman V, Devignes MD, Ritchie DW. 2010 HexServer: an FFT-based protein docking server powered by graphics processors. Nucleic Acids Res. 38, W445-W449.

122. Comeau SR, Gatchell DW, Vajda S, Camacho CJ. 2004 ClusPro: an automated docking and discrimination method for the prediction of protein complexes. Bioinformatics 20, 45-50.

77 123. Comeau SR, Gatchell DW, Vajda S, Camacho CJ. 2004 ClusPro: a fully automated algorithm for protein-protein docking. Nucleic Acids Res. 32, W96- W99. (doi: 10.1093/nar/gkh354)

124. Tovchigrechko A, Vakser IA. 2006 GRAMM-X public web server for protein– protein docking. Nucleic Acids Res. 34, W310–W314. (doi:10.1093/nar/gkl206)

125. Pierce BG, Wiehe K, Hwang H, Kim BH, Vreven T, Weng Z. 2014 ZDOCK Server: Interactive Docking Prediction of Protein-Protein Complexes and Symmetric Multimers. Bioinformatics, In Press.

126. Urosev D, Ferrer-Navarro M, Pastorello I, Cartocci E, Costenaro L, Zhulenkovs D, Marechal JD, Leonchiks A, Reverter D, Serino L, Soriani M, Daura X. Crystal structure of C5321: A protective antigen present in uropathogenic Escherichia coli strains displaying an Slr fold. To be published.

127. Eramian D, Eswar N, Shen MY, Sali A. 2008 How well can the accuracy of comparative protein structure models be predicted? Protein Sci. 17, 1881– 1893. (doi:10.1110/ps.036061.108)

128. Chakravarty S, Sanchez R. 2004 Systematic analysis of added-value in simple comparative models of protein structure. Structure 12, 1461–1470. (doi:10.1016/j.str.2004.05.018)

129. Zhang Z, Witham S, Alexov E. 2011 On the role of electrostatics on protein-protein interactions. Phys. Biol. 8, 035001. (doi:10.1088/1478- 3975/8/3/035001)

130. Guzman-Verri C, Manterola L, Sola-Landa A, Parra A, Cloeckaert A, Garin J, Gorvel JP, Moriyon I, Moreno E, Lopez-Goni I. 2002 The two-component system BvrR/BvrS essential for Brucella abortus virulence regulates the expression of outer membrane proteins with counterparts in members of the Rhizobiaceae. Proc. Natl. Acad. Sci. USA 99, 12375-12380. (doi:10.1073/pnas.192439399)

131. Wu CF, Lin JS, Shaw GC, Lai EM. 2012 Acid-induced type VI secretion

78 system is regulated by ExoR-ChvG/ChvI signaling cascade in Agrobacterium tumefaciens. PLoS Pathog. 8, e1002938. (doi:10.1371/journal.ppat.1002938)

132. Sheinerman FB, Norel R, Honig B. 2000 Electrostatic aspects of protein– protein interactions. Curr. Opin. Struc. Biol. 10, 153–159.

133. Tsai C-J, Lin SL, Wolfson HJ, Nussinov R. 1997 Studies of protein- protein interfaces: A statistical analysis of the hydrophobic effect. Protein Sci. 6, 53-64.

134. Doherty D, Leigh JA, Glazebrook J, Walker GC. 1988 Rhizobium meliloti mutants that overproduce the R. meliloti acidic calcofluor-binding exopolysaccharide. J. Bacteriol. 170, 4249-4256.

135. Main ERG, Lowe AR, Mochrie SGJ, Jackson SE, Regan L. 2005 A recurring theme in protein engineering: the design, stability and folding of repeat proteins. Curr. Opin. Struc. Biol. 15, 464–471. (doi:10.1016/j.sbi.2005.07.003)

136. Kumar A, Roach C, Hirsh IS, Turley S, deWalque S, Michels PA, Hol WG. 2001 An unexpected extended conformation for the third TPR motif of the peroxin PEX5 from Trypanosoma brucei. J.Mol. Biol. 307, 271-282.

137. Gatto GJ, Geisbrecht BV, Gould SJ, Berg JM. 2000 Peroxisomal targeting signal-1 recognition by the TPR domains of human PEX5. Nat. Struct. Biol. 7, 1091-1095. (doi:10.1038/81930)

138. Li L, Jia Y, Hou Q, Charles TC, Nester EW, Pan SQ. 2002 A global pH sensor: Agroacterium sensor protein ChvG regulates acid-inducible genes on its two chromosomes and Ti plasmid. Proc Natl Acad Sci USA 99, 12369-12374. (doi:10.1073/pnas.192439499)

139. Vanderlinde EM, Yost CK. 2012 Mutation of the sensor kinase chvG in Rhizobium leguminosarum negatively impacts cellular metabolism, outer

79 membrane stability, and symbiosis. J Bacteriol. 194, 768-777. (doi:10.1128/JB.06357-11)

140. Viadas C, Rodríguez MC, Sangari FJ, Gorvel J-P, García-Lobo JM Lopez- Goni I. 2010. Transcriptome analysis of the Brucella abortus BvrR/BvrS two- component regulatory system. PLoS One 5, e10216. (doi:10.1371/journal.pone.0010216)

141. Suhre K, Sanejouand YH. 2004 ElNémo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res. 32, W610-W614. (doi: 10.1093/nar/gkh368)

142. Scheufler C, Brinker A, Bourenkov G, Pegoraro S, Moroder L, Bartunik H, Hartl FU, Moarefi I. 2000 Structure of TPR domain–peptide complexes: Critical elements in the assembly of the Hsp70–Hsp90 multichaperone machine. Cell 101, 199–210. (doi:10.1016/S0092-8674(00)80830-2)

143. Kolmar H, Waller PRH, Sauer RT. 1996 The DegP and DegQ periplasmic endoproteases of Escherichia coli: Specificity for cleavage sites and substrate conformation. J. Bacteriol. 178, 5925–5929.

144. Jones CH, Dexter P, Evans AK, Liu C, Hultgren SJ, Hruby DE. 2002 Escherichia coli DegP protease cleaves between paired hydrophobic residues in a natural substrate: the PapA pilin. J. Bacteriol. 184, 5762–5771. (doi: 10.1128/JB.184.20.5762–5771.2002)

145. Krojer T, Pangerl K, Kurt J, Sawa J, Stingl C, Mechtler K, Huber R, Ehrmann M, Clausen T. 2008 Interplay of PDZ and protease domain of DegP ensures efficient elimination of misfolded proteins. PNAS 105, 7702–7707. (doi:10.1073/pnas.0803392105)

146. Galibert F, Finan TM, Long SR, Puehler A, Abola P, Ampe F, Barloy- Hubler F, Barnett MJ, Becker A, Boistard P, et al. 2001 The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293, 668-672. (doi:10.1126/science.1060966)

147. Chang C, Tesar C, Gu M, Babnigg G, Joachimiak A, Pokkuluri PR, Szurmant H, Schiffer M. 2010 Extracytoplasmic PAS-Like Domains Are Common in Signal 80 Transduction Proteins. J. Bacteriol. 192, 1156–1159.

148. Cheung J, Hendrickson WA. 2008 Crystal Structures of C4-Dicarboxylate Ligand Complexes with Sensor Domains of Histidine Kinases DcuS and DctB. J. Biol. Chem. 283, 30256–30265.

149. Zhou Y-F, Nan B, Nan J, Ma Q, Panjikar S, Liang Y-H, Wang Y, Su X-D. 2008 C4-Dicarboxylates Sensing Mechanism Revealed by the Crystal Structures of DctB Sensor Domain. J. Mol. Biol. 383, 49–61.

150. Sevvana M, Vijayan V, Zweckstetter M, Reinelt S, Madden DR, Herbst-Irmer R, Sheldrick GM, Bott M, Griesinger C, Becker S. 2008 A Ligand-Induced Switch in the Periplasmic Domain of Sensor Histidine Kinase CitA. J. Mol. Biol. 377, 512–523.

151. Cheung J, Hendrickson WA. 2008 Crystal Structures of C4-Dicarboxylate Ligand Complexes with Sensor Domains of Histidine Kinases DcuS and DctB. J. Biol. Chem. 283, 30256–30265.

152. Sweeney EG, Henderson JN, Goers J, Wreden C, Hicks KG, Foster JK, Parthasarathy R, Remington SJ, Guillemin K. 2012 Structure and Proposed Mechanism for the pH Sensing Helicobacter pylori chemoreceptor TlpB. Structure. 20, 1177–1188. (doi:10.1016/j.str.2012.04.021)

153. Hothorn M, Dabi T, chory J. 2011 Structural basis for cytokinin recognition by Arabidopsis thaliana histidine kinase 4. Nat. Chem. Biol. 7, 766–768. (doi:10.1038/nchembio.667)

154. Wu R, Gu M, Wilton R, Babnigg G, Kim Y, Pokkuluri PR, Szurmant H, Joachimiak A, Schiffer M. 2013 Insight into the sporulation phosphorelay: crystal structure of the sensor domain of Bacillus subtilis histidine kinase, KinD. Protein Sci. 22, 564-76. (doi: 10.1002/pro.2237)

155. Slama B, Hendrickson W. Crystal structure of the Vibrio Cholerae LuxQ

81 periplasmic domain (SeMet), to be published

156. Neiditch MB, Federle MJ, Pompeani AJ, Kelly RC, Swem DL, Jeffrey PD, Bassler BL, Hughson FM. Ligand-Induced Asymmetry in Histidine Sensor Kinase Complex Regulates Quorum Sensing. Cell 126, 1095–1108.

157. Patskovsky et al., Crystal structure of Mcp_N and cache N-terminal domains of methyl-accepting chemotaxis protein from Vibrio cholerae, to be published

158. Zhang Z, Hendrickson WA. 2010 Structural Characterization of the Predominant Family of Histidine Kinase Sensor Domains. J. Mol. Biol. 400, 335–353. (doi:10.1016/j.jmb.2010.04.049)

159. Khorchid A, Ikura M. 2006 Bacterial histidine kinase as signal sensor and transducer. Int. J. Biochem. Cell Biol. 38, 307–312.

160. Stock AM, Robinson VL, Goudreau PN. 2000 Two-component signal transduction. Annu. Rev. Biochem. 69, 183–215. 161. Szurmant H, White RA, Hoch JA. 2007 Sensor complexes regulating two- component signal transduction. Curr. Opin. Struc. Biol. 17, 706–715.

162. Neiditch MB, Federle MJ, Miller ST, Bassler BL, Hughson FM. 2005 Regulation of LuxPQ Receptor Activity by the Quorum-Sensing Signal Autoinducer-2. Mol. Cell 18, 507–518. (doi: 10.1016/j.molcel.2005.04.020)

163. Taylor BL, Zhulin IB. 1999 PAS Domains: Internal Sensors of Oxygen, Redox Potential, and Light. Microbiol. Mol. Biol. Rev. 63, 479–506.

164. Reinelt S, Hofmann E, Gerharz T, Bott M, Madden DR. 2003 The Structure of the Periplasmic Ligand-binding Domain of the Sensor Kinase CitA Reveals the First Extracellular PAS Domain. J. Biol. Chem. 278, 39189–39196.

165. Fox NK, Brenner SE, Chandonia JM. 2013. SCOPe: Structural Classification of Proteins—extended, integrating SCOP and ASTRAL data and classification of 82 new structures. Nucleic Acids Res. (doi: 10.1093/nar/gkt1240)

83