Sequence-, Structure-, and Dynamics-Based Comparisons of Structurally Homologous CheY-Like Proteins
Yi He^a, Gia G. Maisuradze^a, Yanping Yin^a, Khatuna Kachlishvili^a, S. Rackovsky^a,b, and Harold A. Scheraga^a,1

^aBaker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853; and ^bDepartment of Pharmacological Sciences, The Icahn School of Medicine at Mount Sinai, New York, NY 10029

Contributed by Harold A. Scheraga, December 29, 2016 (sent for review October 18, 2016; reviewed by Robert L. Jernigan and Jeffrey Skolnick)

We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison.

amino acid physical properties | protein fluctuations | all-atom simulations

Significance

We study a set of proteins that exhibit low sequence identity, but high structural homology and functional similarity. It is demonstrated that a physics-based sequence comparison tool, the property factor method, is able to detect differences between the sequences of these proteins that correlate with differences in their structures and dynamics. It is shown that these sequence differences are not detected in this challenging system by conventional alignment methods. This result suggests that a significant amount of the information encoded in protein sequences is not captured by evolutionarily motivated comparison methods.

Author contributions: Y.H., S.R., and H.A.S. designed research; Y.H., G.G.M., Y.Y., and S.R. performed research; Y.H., G.G.M., Y.Y., K.K., S.R., and H.A.S. analyzed data; and Y.H., G.G.M., Y.Y., K.K., S.R., and H.A.S. wrote the paper.

Reviewers: R.L.J., Iowa State University; and J.S., Georgia Institute of Technology.

Conflict of interest statement: In 2014, a WeFold paper [Khoury GA, et al. (2014) WeFold: A coopetition for protein structure prediction. Proteins 82(9):1850–1868] described a hybrid approach generated from several protein structure prediction methodologies of 13 laboratories, including the H.A.S. and J.S. groups, and did not involve any active research collaboration.

^1To whom correspondence should be addressed. Email: [email protected].

This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1621344114/-/DCSupplemental.

PNAS Early Edition, Biophysics and Computational Biology. www.pnas.org/cgi/doi/10.1073/pnas.1621344114

The investigation of the similarities and differences in the dynamics of sequentially and structurally homologous proteins has a long history (1–14). One of the important computational approaches to this problem involves the identification of conserved residues (1–4) and the investigation of the influence of these conserved residues on the folding mechanism of the proteins. An advantage of this approach is that the influence of conserved residues can be verified by mutation experiments.

More subtle questions are raised by the existence of proteins that are structurally homologous and have similar biological functions but dissimilar amino acid sequences. These molecules are of particular interest because differences in behavior can arise from sequence differences, even though the proteins have almost identical tertiary structures. A central problem then becomes the detection of those sequence characteristics that correlate with observed differences in molecular properties. It is this problem that we address in the present work.

We consider the proteins NT-NtrC, Spo0F, and CheY, which have α/β structures and are known to be response regulators involved in signal transduction (15). All three proteins have about 120 residues and very similar native structures, with pairwise root-mean-square deviations (rmsds) below 3.0 Å (14). However, they exhibit less than 35% pairwise sequence identity. Their secondary structures, determined by the define secondary structure of proteins (DSSP) algorithm (16), are slightly different, as shown in Fig. S1. Spo0F and CheY have five well-defined α-helices and β-strands, and they exhibit a pairwise rmsd value of about 1.85 Å (14). NT-NtrC not only lacks one α-helix (corresponding to α4 in Spo0F and CheY) and two β-strands (β4 and β5 in Spo0F and CheY), but also has a slightly larger rmsd (∼2.50 Å) from both Spo0F and CheY (14) and significantly shorter secondary structural fragments.

Recent investigations, using a Gō model modified to include sequence information (17), suggest that these proteins may have hierarchical folding processes and that formation of certain subdomains is critical to reaching the native state (12, 14). In this model, CheY and NT-NtrC share an N-terminal to C-terminal folding pathway, whereas the folding of Spo0F starts at the center and elongates first to the N terminus and then to the C terminus (12, 14). These folding differences must arise from significant sequence differences (18).

Our approach is twofold. We investigate the interactions and fluctuations of these molecules, in an all-atom representation, in their native states. This can provide information as to how the proteins perform their biological functions (19–22) and also information that can be used to understand their folding processes (23–25). Observed dynamic differences must be encoded in amino acid sequences.

We also compare the sequences of the three proteins, using both the property factor method (PFM) (18, 26) and conventional sequence alignment methods (27, 28). In previous systems that we have studied, we have shown that the PFM approach is able to detect differences between sequences that conventional alignment-based methods miss. We wish to study the applicability of the PFM algorithm to the present challenging set of proteins. We demonstrate the following general points: (i) The PFM approach is able to distinguish between these sequences in a manner not available using conventional alignment. (ii) Differences between the molecules detected by the PFM analysis, based solely on their sequences, are reflected in differences in both structure and predicted dynamic behavior.

The Gaussian network model (GNM) (29, 30) and distance matrix analysis (31) were used to establish structural characteristics. Similarities and differences in the dynamics of the three molecules were predicted using 1-μs all-atom molecular dynamics (MD) simulations at 303.15 K and 400 K, and 1.5-μs MD simulations at 450 K, in explicit solvent, generated using the Chemistry at Harvard Macromolecular Mechanics (CHARMM) force field (32–35).

Results

Sequence Relationships Between NT-NtrC, Spo0F, and CheY. Previous comparisons of the sequences of these three proteins have centered on the large residues Leu, Ile, and Val, and have highlighted nonpolar clusters on each side of the sheet formed by strands β1, β2, β3, β4, and β5 (14, 36). It was suggested that these are key contributors to the stability of the proteins. Side-chain size and hydrophobicity are important physical properties, which must necessarily influence the structures and folding mechanisms of α/β proteins. It is known, however, that all amino acid physical properties contribute to the dynamics and folding mechanisms of proteins (25).

Fig. 1. (A and B) The PFM similarity (A) and global alignment scores (B) based on the BLOSUM62 scoring matrix as a function of sequence position, for each pair of proteins, calculated using the optimal, 63-residue fragment length.

A systematic analysis of the sequences of these proteins was carried out. Both the PFM and conventional sequence alignment methods were used to investigate sequence similarities. PFM similarity (shown in Fig. 1A) was calculated as a function of chain position, using a 63-residue maximal-similarity window length for each pair. [The strategy for determination of the maximal-similarity window length (Fig. S2) and the final average value corresponding to each window size (Fig. S3) are described in Supporting Information, Dependence of PFM Similarity on Fragment Length for NT-NtrC, Spo0F, and CheY.] Global sequence alignment using the Needleman–Wunsch (NW) algorithm (Fig. 1B) and the blocks substitution matrix 62 (BLOSUM62) (37) scoring matrix was also performed. To provide a normalized per-residue score, the total score for each 63-residue fragment was divided by 63.0. [Changing from the BLOSUM62 scoring matrix to the BLOSUM50 (37) scoring matrix produced only small differences in the alignment-based results.] It can be seen from Fig. 1 that the overall degrees of similarity of the three sequence