The Structure of Protein Dynamic Space

S. Rackovsky (a,b,1) and Harold A. Scheraga (a,1)

(a) Department of Chemistry and Chemical Biology, Baker Laboratory, Cornell University, Ithaca, NY 14853; and (b) Department of Biochemistry and Biophysics, University of Rochester School of Medicine and Dentistry, Rochester, NY 14642

Contributed by Harold A. Scheraga, June 26, 2020 (sent for review May 5, 2020; reviewed by Robert L. Jernigan and Jeffrey Skolnick)

We use a bioinformatic description of amino acid dynamic properties, based on residue-specific average B factors, to construct a dynamics-based, large-scale description of a space of protein sequences. We examine the relationship between that space and an independently constructed, structure-based space comprising the same sequences. It is demonstrated that structure and dynamics are only moderately correlated. It is further shown that helical proteins fall into two classes with very different structure–dynamics relationships. We suggest that dynamics in the two helical classes are dominated by distinctly different modes: pseudo-one-dimensional, localized helical modes in one case, and pseudo-three-dimensional (3D) global modes in the other. Sheet/barrel and mixed-α/β proteins exhibit more conventional structure–dynamics relationships. It is found that the strongest correlation between structure and dynamic properties arises when the latter are represented by the sequence average of the dynamic index, which corresponds physically to the overall mobility of the protein. None of these results are accessible to bioinformatic methods hitherto available.

B factor | protein dynamics | structure–dynamics relationships | Fourier transform

Protein structure and evolution have been intensively studied for many years, and vast bodies of sequence and structure data have been accumulated and analyzed using tools of bioinformatics. This approach is usually referred to as "knowledge-based," to distinguish it from an equally impressive body of computational studies based on simulation, using physically motivated empirical energy functions, of actual physical processes. One central area of protein science, however, has thus far resisted knowledge-based study. Protein dynamic characteristics have only been available computationally from two frameworks. These are molecular-dynamic simulations and elastic network models, processor-intensive approaches which limit studies to single proteins, or to comparisons of small groups of molecules of interest. This situation arises from the fact that no informatic parameter has been available which adequately represents the dynamics of individual amino acids.

In recent work (1), we have developed a measure of the dynamic properties of amino acids in protein sequences which is suitable for bioinformatic use. This property is the residue-specific average value of the B factor (2), determined from a large database of protein structures. We denote the average B factor for amino acid X as <B(X)>. The quantity <B(X)> plays the same general role with respect to dynamics that a hydrophobicity index plays with respect to solvent exposure. It is not the case that every hydrophilic amino acid is in actual contact with solvent, nor does every amino acid with a high value of <B(X)> exhibit high mobility. Rather, <B(X)> is a measure of the tendency of the amino acid X to be in motion. The information carried by <B(X)> becomes important in the context of a complete sequence, as is also true of hydrophobicity indices.

It was shown (1) that the values of <B(X)> differ between amino acids in a statistically significant manner. Using statistical, signal processing, and information theoretic methods, we demonstrated several properties of <B(X)>:

1) Values of <B(X)> are partly, but not exclusively, determined by the values of the intrinsic physical properties of the amino acids, as represented by a complete and orthogonal set of property factors (3, 4);
2) <B(X)> also encodes information about the influence of protein fold and other residue-external factors on dynamics;
3) Using Fourier techniques, the global, whole-sequence dynamic properties of sequences can be represented;
4) A substantial fraction of the information encoded in the global representation of dynamic properties originates from the part of <B(X)> which does not arise from single-amino acid physical properties, and is therefore not accessible from any representation based on static amino acid properties;
5) Groups of proteins which fold to different architectures differ from one another in their behavior in a detectable and statistically significant manner, when represented by global dynamic parameters.

The availability of a well-characterized bioinformatic quantity derived from the dynamic properties of amino acids makes possible the study of the dynamic properties of proteins on a large scale, rather than anecdotally. We wish to understand the relationship between the space of protein structures and a parallel, distinct space determined by the dynamic properties of the same proteins. We demonstrate the following results:

1) The relationship between the two spaces is characterized, in part, by an anomalous dependence of dynamic distance on structure difference.
2) This anomaly arises from unexpected behavior of all-helical proteins, which exhibit two distinct types of behavior in dynamic space.
3) We suggest that these behaviors correspond to physically different dynamic regimes within the universe of all-helical proteins.
4) Structure–dynamics correlations in proteins are encoded in the overall mobility of the structure, rather than in more localized descriptions of chain dynamic properties.

Significance

Protein dynamic properties have been computationally accessible only by means of molecular-dynamic simulations, or network models, of specific molecules. We have developed a bioinformatic approach to dynamics, which makes it possible to delineate the dynamic characteristics of large numbers of sequences simultaneously. In this work we report an analysis of the large-scale dynamic structure of protein space. It is demonstrated that proteins of different structural classes have different dynamic behaviors, and that all-helical proteins occur with two distinct types of dynamic behavior. One subset of helical proteins is characterized by localized, helix-based dynamics, while the complementary subset exhibits dynamics of a more three-dimensional nature. This information has not been available through the application of more traditional methods.

Author contributions: S.R. and H.A.S. designed research, performed research, analyzed data, and wrote the paper.
Reviewers: R.L.J., Iowa State University; and J.S., Georgia Institute of Technology.
The authors declare no competing interest.
Published under the PNAS license.
1 To whom correspondence may be addressed. Email: [email protected] or has5@cornell.edu.
First published August 5, 2020.
19938–19942 | PNAS | August 18, 2020 | vol. 117 | no. 33
www.pnas.org/cgi/doi/10.1073/pnas.2008873117

Results and Discussion

In the present work, we examine a basic, but hitherto inaccessible question about proteins: whether structure and dynamics are related in a simple way. We proceed as follows.

1) We construct a dynamics-based distance function between proteins, based on the global properties of <B(X)>, and apply it to a large protein database. This generates a protein space determined by sequence dynamic properties.
2) We construct a structure-based distance function between pro-

We turn next to the construction of the distance function in D. The proteins in our database are labeled by values of the four indices C, A, T, and H, which together classify entries in the CATH database (8). We focus first on the identifier C, which specifies structural class. It will be remembered that C = 1 denotes helical architecture, C = 2 sheet/barrel architecture, and C = 3 mixed-α/β architecture. The sequences of the 5,719 proteins in our database are written in terms of <B(X)>, giving a numerical string for each sequence (which we denote as the dynamic sequence), and we ask in what way sequences belonging to the three C classes differ from one another. We answer this question by Fourier analyzing the dynamic sequences, and carrying out an ANOVA of the distributions of the resulting Fourier coefficients. (Details of the procedure are given in Methods.) We require a high degree of statistical significance (9), and find that there are 11 values of the wave number k at which the distributions of sine or cosine Fourier coefficients of sequences belonging to the three classes differ from one another with P < 0.0001. We measure distance in D using a weighted Euclidean distance function based on these 11 Fourier coefficients, as shown in Eq. 4 below. The weighting allows us to measure independently the contribution of each of the significant wave numbers to any structure–dynamics correlation we observe.

Given these two functions, the correlation between distances in the two spaces can be determined for any protein in the database, and for any set of values of the dynamic weighting factors. We denote this correlation coefficient by R(S_m, D_m; {w_i}), where distances are measured from protein m, and {w_i | i = 1, 2, ..., 11} is the set of weighting functions used in the dynamic distance function. We have carried out this calculation for every protein in the database, and for all 2,047 possible binary values of the 11 weights. The results are summarized in Fig. 1, in which we show a side-by-side boxplot of the average value, maximum, and minimum of R(S_m, D_m; {w_i}). It will be seen that one specific choice of {w_i} gives an exceptionally large average and range for R.

Fig. 1. Side-by-side boxplot of the average (<R_ALL>), maximum [MAX(R_ALL)], and minimum [MIN(R_ALL)] of the correlation between structure and dynamic distances, over all possible choices of the weighting set {w_i} and all proteins in the dataset. The values of these quantities for the choice {w_i} = (1,0,...,0) (see text) are indicated by arrows.
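The calculation described above (dynamic sequence, Fourier coefficients, weighted distance, and correlation with structure distance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the <B(X)> values in B_AVG are placeholders, the Fourier normalization and the form of the weighted Euclidean distance are assumed (Eq. 4 is not reproduced in this excerpt), and the structure distances are taken as a precomputed matrix. All function names are hypothetical.

```python
import itertools
import math

# Placeholder <B(X)> values for illustration only; NOT the published averages.
B_AVG = {"A": 0.95, "E": 1.20, "G": 1.10, "K": 1.30, "L": 0.80, "V": 0.75}


def dynamic_sequence(seq):
    """Rewrite an amino acid sequence as a numerical string of <B(X)> values."""
    return [B_AVG[res] for res in seq]


def fourier_coefficients(values, k):
    """Return (cosine, sine) coefficients at wave number k (assumed 1/n normalization)."""
    n = len(values)
    cos_k = sum(v * math.cos(2 * math.pi * k * j / n) for j, v in enumerate(values)) / n
    sin_k = sum(v * math.sin(2 * math.pi * k * j / n) for j, v in enumerate(values)) / n
    return cos_k, sin_k


def weighted_distance(u, v, w):
    """Weighted Euclidean distance between two coefficient vectors (assumed form)."""
    return math.sqrt(sum(wi * (ui - vi) ** 2 for ui, vi, wi in zip(u, v, w)))


def binary_weight_sets(n):
    """All 0/1 weight vectors of length n, excluding the all-zero vector."""
    return [w for w in itertools.product((0, 1), repeat=n) if any(w)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_from(m, struct_dist, coeffs, w):
    """R for reference protein m: structure distances vs. weighted dynamic distances."""
    others = [j for j in range(len(coeffs)) if j != m]
    s = [struct_dist[m][j] for j in others]
    d = [weighted_distance(coeffs[m], coeffs[j], w) for j in others]
    return pearson(s, d)
```

With 11 significant coefficients, binary_weight_sets(11) enumerates the 2^11 - 1 = 2,047 nonzero binary weightings mentioned in the text. Under the normalization assumed here, the k = 0 cosine coefficient is simply the mean of the dynamic sequence.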
