Overview Cooperative dynamics of proteins unraveled by network models Eran Eyal,1,2 Anindita Dutta1 and Ivet Bahar1∗

Recent years have seen a significant increase in the number of computational studies that adopted network models for investigating biomolecular systems dy- namics and interactions. In particular, elastic network models have proven useful in elucidating the dynamics and allosteric signaling mechanisms of proteins and their complexes. Here we present an overview of two most widely used elastic network models, the Gaussian Network Model (GNM) and Anisotropic Network Model (ANM). We illustrate their use in (i) explaining the anisotropic response of proteins observed in external pulling experiments, (ii) identifying residues that possess high allosteric potentials, and demonstrating in this context the propen- sity of catalytic sites and metal-binding sites for enabling efficient signal trans- duction, and (iii) assisting in structure refinement, molecular replacement and comparative modeling of ligand-bound forms via efficient sampling of energeti- cally favored conformers. C 2011 John Wiley & Sons, Ltd. WIREs Comput Mol Sci 2011 1 426–439 DOI: 10.1002/wcms.44

INTRODUCTION that proteins have ‘intrinsic’ abilities, uniquely en- coded by their three-dimensional (3D) fold, to sample ost proteins are molecular machines. They 1,2 M achieve their function because they possess functional fluctuations near their native state. the ability to undergo the structural changes required From a statistical thermodynamics perspective, for their function. These structural changes may vary the accessible conformations result from the equilib- over a broad range of scales, from atomic fluctua- rium fluctuations near the global free energy mini- tions and side chain rotations to collective domain or mum. Some directions of fluctuations are more proba- subunit movements. Yet, even the large-scale move- ble than others. These are the ‘soft modes’ of motions; ments do not, in principle, alter the native ‘fold’; the they evolve along a direction, on the multidimensional packing of secondary structural elements and/or the energy landscape, which by definition involves a rel- distribution of tertiary and quaternary contacts be- atively small energy ascent for a given deformation. tween residues remain unchanged for the most part. These same modes are also expected to be most fre- The collective motions allow for cooperative rear- quently recruited to achieve function, following the rangements of domains with respect to each other, hypothesis that structures may have evolved to ac- or loop motions, which maintain the overall architec- cess most easily their functional movements. This hy- ture/fold, or allow the protein to restore its original pothesis motivated a large body of studies aiming at conformation, similar to an elastic material. Here, characterizing the soft modes and exploring their rel- cooperative motions refer to conformational changes evance to function. A popular approach has been to that collectively engage large portions of the struc- extract the soft modes from the normal mode analysis ture and usually lie at the lowest frequency end of (NMA) of ENMs. In parallel, simplified models such as the Gaussian network model (GNM),3,4 rooted in the spectrum of modes accessible to the protein un- 5 der physiological conditions. Not surprisingly, elastic the statistical thermodynamics of polymer networks, network models (ENMs) have been broadly exploited have been broadly used in the last decade. Theory and in unraveling , based on the premise Methods section provides an overview of the theory and assumptions underlying these approaches. ∗Correspondence to: [email protected] ENM–NMAs and GNM analyses are now pro- 1Department of Computational & Systems Biology, School of viding us with a better understanding of the coopera- Medicine, University of Pittsburgh, Pittsburgh, PA, USA tive machinery and design principles of biomolecular 2Cancer Research Institute, Sheba Medical Center, Ramat Gan, systems, as will be illustrated for a few cases in the Israel section Learning from Network Models: Cooperative DOI: 10.1002/wcms.44 Responses and Communication. First, the soft modes

426 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models are highly cooperative as may be quantified by their clustering techniques to identify the sites that play degree of collectivity,6 i.e., they cooperatively entail a a critical role in the flow of information across the large part of the molecule, if not the entire molecule. ‘network’ of amino acids in the folded state. The col- They are also very robust, as indicated by the fol- lective fluctuations predicted by the GNM also define lowing two features: (i) detailed atomic coordinates signal propagation mechanisms, as derived using a or energetics have negligible effect on these modes, Markovian model in a recent study.14 Certain ‘nodes’ as originally demonstrated by Tirion,7 and confirmed are distinguished by their effective signal transmis- in many studies since then. This is presumably due sion, or allosteric communication, properties. Exam- to the fact that they are collectively defined by all ination of the identity of such key sites reveals a the 3N-6 internal degrees of freedom for a struc- striking preponderance of active sites (e.g., catalytic ture of N interaction sites (atoms or pseudoatoms); residues and metal-binding sites), suggesting that effi- and (ii) the intrinsic, structure-encoded fluctuations cient transmission of signals is an inherent, structure- in conformations undergone by (in the ab- encoded property of active sites. Conceivably, active sence of ligand binding) are in close accord with sites of proteins evolved to be positioned at sites that the changes observed in ligand-bound forms, sug- lend themselves to efficient communication. Examples gesting that enzymes are predisposed to make the will be presented in the section Reconciling Equilib- conformational rearrangements required for ligand rium Dynamics and Signal Propagation Stochastics. binding.8,9 A similar behavior is exhibited by antibod- A practical utility of ENM–NMA of proteins is ies when binding their antigens10,11 or by allosteric the predictions of collective changes of structures that proteins in general.1 It should be noted that predis- are expected to be most readily accessible or to gener- position of the means the accessibility of the ate alternative conformations that satisfy certain re- structural transition path, or a given mode of motion, quirements, e.g., optimal binding of a substrate.8,10,15 rather than the coexistence of the alternative state. Conformational changes along the soft modes have As such, predisposition does not exclude the adapta- indeed been the focus of essential dynamics analysis of tion of a given protein upon binding its ligand, but (MD) simulations trajectories16 implies that the directions of structural change are se- and principal component analysis of ensembles of lected from among those intrinsically accessible prior structures.8 The soft modes are advantageously ex- to deformation. ploited for deforming high-resolution structures to fit Second, the accessible modes provide a complete cryo-electron microscopy (EM) images,17–19 for struc- orthonormal basis set for describing the equilibrium tural refinement, and for generating alternative con- motions in the 3N-6 dimensional space of internal formations that would suitably account for structural coordinates. Each mode may be viewed as a mech- changes stabilized in various complexes/assemblies anism of relaxation, or dissipation of energy, if the (e.g., upon binding substrate proteins or ligands). structure is subjected to a perturbation from the en- NMA in Structural Refinement section will summa- vironment. Low-frequency (or soft) modes evidently rize recent developments in that direction and illus- constitute the most favorable paths of energy dissi- trate the utility of ENM–NMA in refining homology pation, as they incur the lowest ascent on the energy models. surface for a given deformation. Or, the accommoda- tion of a given external stress by a soft mode would be expected to entail a lower internal resistance; there- fore, soft modes are naturally activated in response to THEORY AND METHODS perturbations. The effective forces measured for ex- erting uniaxial tensions along different directions can Normal Mode Analysis indeed be explained by the selective recruitment of Classical NMA originated from spectroscopic analy- the modes that comply with the extension.12,13 This ses of molecules, in which the absorption in the vi- concept will be elaborated in the section Anisotropic brational spectra is given by normal modes. Its first Response to Mechanical Stress: Selection of Suitable application to proteins dates back to early 1980s.20–22 Modes. Renewed interest in the last decade stems from the Third, biomolecular structures need to be opti- fact that the low-frequency modes can be robustly mally ‘wired’ to efficiently transmit signals, to couple and efficiently derived using simplified models such or exchange their chemical and mechanical energies, as ENMs, and despite such simplifications, the global or induce allosteric responses. ENM representation modes accessible to biomolecular systems are ro- of proteins permits us to utilize graph theory, along bustly determined and they usually relate to func- with information theoretic approaches and spectral tional motions.23,24

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 427 Overview wires.wiley.com/wcms

The potential energy V(q) of a molecular system approximated by harmonic potentials. This assump- of n degrees of freedom (e.g., atomic coordinates) may tion is strictly valid in the immediate neighborhood be approximated to the second order of the series of the global energy minimum. Proteins are subject, 0 0 0 expansion around the equilibrium state q = [q1 q2 however, to anharmonic potentials and nonlinear dy- 0 0 T q3 ...qn ] as: namics, which are manifested by the ruggedness of 0 the energy landscape and multiple minima/barriers. 1 ∂2V V(q) = q − q0 q − q0 Even side-chain bond rotations exhibit two or three 2 i i ∂q q j j i, j i j isomeric states often approximated by cosine func- tions. Furthermore, solvent damping effect is not ac- 1 = qT K q, (1) counted for by conventional NMA. The results from 2 NMA should therefore be interpreted in the context where q is the 3n-dimensional vector of the in- of these approximations. As discussed below, coarse stantaneous fluctuations, the superscript T desig- graining presents, in a sense, the advantage of elimi- nates its transpose, and K is the positive semidefi- nating certain degrees of freedom and resulting in a nite matrix of second derivatives of the potential with ‘smoother’ energy landscape where local minima are respect to atomic coordinates. The ijth element of K overlooked (or overcome) during collective motions, 2 0 is Kij = [∂ V/∂qi ∂qj] . If we treat structural points as providing access to substates separated by low-energy harmonic oscillators in the absence of other effects, barriers, but this comes at the cost of loosing atomic the protein obeys the equation of motion: level accuracy. d2q M + K q = 0, (2) dt2 GNM and Anisotropic Network Model where M is a diagonal matrix composed of n superele- The use of ENMs to represent native proteins has ments mi I3 along the diagonal, where mi is the mass the following two computational advantages: (i) there of the ith atom and I3 is the identity matrix of order is no need for energy minimization and the Protein 3. The general solution to Eq. (2) is a 3n-dimensional Data Bank (PDB)25 structure is assumed to represent vector of the form q(t) = aeiωt, which, upon substi- a global energy minimum, and (ii) a coarse-grained tution into Eq. (2), leads to model is adopted for structure and energetics; typ- ically, each node represents a residue and the pairs (−ω2 M + K) a = 0. (3) of nodes within a certain cutoff distance (Rc)are − 1 Premultiplication of Eq. (3) by M 2 yields connected by springs of uniform force constant γ , 2 1 − 1 − 1 − 1 1 ω M 2 a = M 2 Ka = M 2 K[M 2 M 2 ]a, which, upon which significantly reduce the size and complexity of 3,4 the change of variables, u = M1/2a, λ = ω2, and H = H. Two ENMs, the GNM and the anisotropic net- − 1 − 1 26–28 M 2 KM 2 , leads to the eigenvalue equation work model (ANM), have found wide use in the last decade. λ u = Hu. (4) NMA is the solution of this equation to ob- Gaussian Network Model tain the 3n-6 nonzero eigenvectors u(k) of the Hessian Following the original statistical thermodynamics the- 5 H, along with the corresponding eigenvalues, λk.The ory of random polymer networks, the network nodes eigenvectors are 3n-dimensional vectors, the elements are assumed to undergo Gaussian fluctuations Ri of which are organized in 3D vectors that represent (1 ≤ i ≤ N) about their mean positions. Likewise, the (k) (k) (k) the displacements u1 , u2 , ...un of the n atoms distance vectors between nodes undergo Gaussianly 0 away from their equilibrium positions as the struc- distributed fluctuations Rij = Rij − Rij = Rj − Ri ture reconfigures along a given mode (e.g., mode k); (Figure 1). The coordinates of α-carbons are used to and the eigenvalue λk is the corresponding squared define the spatial position of the nodes. The GNM po-  = frequency. λk scales with the curvature of the energy tential is given in terms of the fluctuations Ri Ri – 0 3 landscape along mode k. Thus, the lower frequency Ri in the position vectors of the nodes as modes have a smaller curvature/stiffness and they un- ⎡ ⎤ dergo larger excursions from the energy minimum for γ V = ⎣ (− )[R − R ]2⎦ a given energy increase, hence their ‘soft modes’ at- GNM 2 ij j i i j, j>i tribute. γ One clear limitation of this method is its appli- = [X TX + Y TY + ZTZ], cability to near-native conditions only, because it im- 2 plicitly assumes linear dynamics, all interactions being (5)

428 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models

inverted. Instead, its pseudoinverse is calculated using the N − 1 nonzero eigenvalues σ k and eigenvectors v(k) of . In compact notation, N−1 (N) = 3kBT −1 = 3kBT σ −1v(k)v(k)T , C γ γ k (7) k=1 where C(N) is the N × N covariance matrix; its ijth (N) element is Cij = <Ri Rj>, and the ith diagonal (N) element Cii is simply the mean-square fluctuations 2 <(Ri) > of residue i. The contribution of any sub- set of modes to the cross-correlations or mean-square fluctuations may be evaluated by performing the sum- mation in Eq. (8) over this particular subset. Note that the soft modes (smallest eigenvalues) make the largest contribution to the covariance. The eigenvectors are (k) (k)T normalized such that the plot of [v v ]ii as a func- tion of residue i represents the probability distribution of residue fluctuations in mode k, also called the kth mode profile. FIGURE 1| Schematic representation of equilibrium fluctuations. A portion of the protein backbone is displayed by the dotted curve. Anisotropic Network Model α Filled dots refer to interaction sites (e.g., C -atoms) that are adopted as The ANM potential, VANM, is similar in form to 0 0 the network nodes. Ri and Rj designate the equilibrium positions of VGNM, except for the replacement of the scalar prod- 2 0 0 residues i and j; Ri and Rj are their instantaneous position vectors. The uct [Rj − Ri] ≡ [(Rij − Rij ) • (Rij – Rij )] in Eq. (6) 0 2 original and instantaneous separations are indicated by the solid line by [|Rij| − |Rij |] where |Rij| designates the magnitude and dashed line, respectively. The fluctuations in the position vectors of Rij. Thus, VGNM penalizes the change in the orien- are given by R and R . R 0 and R designate the equilibrium and j i ij ij tation of Rij even if the magnitude of the distance instantaneous distance vectors between residues i and j. The change in vectors is maintained, whereas VANM is exclusively the interresidue distance vector is related to the fluctuations in residue  = − 0 =  −  based on distance changes. The second derivative of positions as Rij Rij Rij Rj Ri, and may be expressed as 26 (k) VANM leads to expressions of the form a weighted sum over the contributions Rij of individual modes. 2 ∂ VANM/∂ Xi ∂Yj |R0 0 0 0 0 c 2 where is the Kirchhoff (or connectivity) matrix, the =−γij Xj − Xi Yj − Yi Rij off-diagonal elements of which are defined as ij =−1 =−γ 0 0 0 2 if |Rij| ≤ Rc and zero otherwise, and the diagonal ele- ij Xij Yij Rij (8) ments are evaluated from the summation  =−  ii j ij in terms of the components X 0, Y 0, and Z 0 of R 0. over all off-diagonal elements in the ith row (or col- i i i i This permits us to express the elements of ANM Hes- umn); X, Y, and Z are the N-dimensional vec- sian H in closed form, and easily evaluate the ANM tors of the respective X-, Y-, and Z-components of eigenvector u(k) and eigenvalue λ ,(1≤ k ≤ 3N-6), the fluctuation vectors R , R , ..., and R cor- k 1 2 N which provide information on the shape and frequen- responding to the N residues. cies of normal modes subject to V . u(k) is a 3N- The cross-correlations between the fluctuations ANM dimensional vector, composed of 3D vectors u (k), of residues i and j are found from the statistical me- 1 u (k), ... u (k), that describe the displacements of all chanical average3 2 N N residues in mode k. The change in interresidue dis-  ·  =  ·  − / { }/ tance induced by mode k is given in terms of the Ri R j ( Ri R j ) exp( V kBT)d R elements of u(k) as 3k T  (k) = (k) − 0 = /λ 1/2 (k) − (k) . × − / { }= B −1 , Rij Rij Rij (kBT k) ui ui (9) exp( V kBT)d R γ [ ]ij (6) Likewise, a 3N × 3N covariance matrix, C(3N), −1 −1 where [ ]ij is the ijth element of , kB is the composed of blocks of size 3 × 3 associated with Boltzmann constant, and T is the absolute tempera- the three components of each fluctuation vector Rj ture. Note the determinant of is 0, i.e., cannot be replaces C(N). C(3N) scales with the inverse of the

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 429 Overview wires.wiley.com/wcms

Hessian as The strength of interactions (or weight of edges) in this network model (or the nonzero elements of ) 3N−6 (3N) −1 −1 (k) (k)T could be assumed to be uniform (as in the GNM) C = kBT H = kBT λ u u . (10) k as a first approximation. A slightly refined model, k=1 however, is to take account of residue specificity at The GNM/ANM results depend on the over- a coarse-grained√ scale. To this aim, we define affini- all ‘fold’ rather than specific interactions or detailed ties aij = Nij/ (NiNj) as the weights of the edges, atomic coordinates. As such, they provide informa- where Nij is the total number of heavy atom contacts tion on the global dynamics favored by the partic- between residues i and j with the denominator cor- ular fold/architecture, rather than local changes in recting for bias due to size (Figure 2). The degree dj structure and interactions. The main utility of GNM/ of each node is given by i aij. The Kirchhoff ma- ANM is indeed the ability to efficiently explore global trix corresponding to this model is then = D – A, dynamics (collective motions in the low-frequency where D is the diagonal matrix of the degrees and A regime), which have been demonstrated in multi- is the matrix of affinities. The conditional probability ple studies to be insensitive to structural details or of transmitting a signal from residue j to residue i in −1 detailed force fields. Results on the high-frequency one time step is simply mij = dj aij (where mij is the modes have limited utility: they permit us to iden- ijth element of the Markov transition matrix M).14 tify potentially conserved sites and folding nuclei, but A metric of efficiency of signal transmission in provide no information on the mechanisms of local network models is the hit time, which is the aver- motions.29–32 It should also be noted that protein age number of steps it takes to transmit information conformers used as input in, or generated as outputs from broadcaster i to receiver j. This is described by from, ENM–NMA are not energy minimized with a enumerating all possible ways to get from residue i molecular mechanics force field and hence may not be to residue j, weighted by the transition/conditional directly utilized in docking studies that require atomic probability of signal flow. The recursive equation to level precision. Efforts are underway for designing evaluate the average hit time between nodes i and j is hybrid methodologies that combine the information given by: on global dynamics from ENMs and local motions n from MD simulations33 to efficiently explore mul- H( j, i) = [1 + H( j, k)]m tiscale processes. Also, sequence information is not ki k=1 included in ENMs and hence single mutations that n n result in loss of function are not captured by ENM = m + H( j, k)m analysis of the mutant, unless there are structural ki ki = = , = data for the mutant, which differ from that of the k 1 k 1 k j wild-type protein. Clearly, NMAs with ENMs also n = + , . suffer from the same limitations (e.g., inadequate de- 1 H( j k)mki (11) scription of nonlinear effects, solvent effect, etc.) as k=1, k = j atomic NMA. Yet, the coarse-graining of the struc- = −1  = − ture and interactions allows for efficient sampling of Substituting M DA and D A, H(j,i) −1 14 domain rearrangements and cooperative movements can be expressed in terms of the elements of as near native state conditions. These motions could be N hardly accessible (due to local energy barriers) should −1 −1 −1 H( j, i) = {[ ]ki − [ ] ji − [ ]kj a detailed, full atomic description of the structure and k=1 dynamics be implemented. −1 + [ ] jj}dk, (12)

Markovian Propagation of or, using Eq. (6), Allosteric Signals N −1 Network models provide a useful tool for investigat- H( j, i) = (3kBT/γ ) [Rk.Ri −Rj .Ri  ing the communication and signaling properties of k=1 complex systems. A widely used approach is model −Rk.Rj +Rj .Rj ]dk. (13) signal transduction as a Markovian stochastic pro- cess. We recently explored the possibility of probing the stochastics of signal propagation in proteins us- H(j,i) depends on the direction of signal trans- ing a discrete time, discrete state Markov model.14 mission, i.e., H(j,i) = H(i, j).

430 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models

FIGURE 2| Schematic description of the evaluation of affinity matrix elements. Two residues within interaction range are displayed, composed of Ni = 7andNj = 11 atoms. The affinity is evaluated based on atom–atom contacts that are closer than a cutoff separation, e.g., 4.0 Å. In the present case, there is only one pair of atom within this interaction rage, such that Nij = 1, and the affinity between this pair becomes √ 1 aij = Nij/ (NiNj) = 1/(77) 2 (see Markovian Propagation of Allosteric Signals section).

Commute time, τ (i, j) ≡ H(j,i) + H(i, j), is tubules. This, together with recent advances in single- another metric, which has no directionality.14 τ (i, j) molecule manipulation techniques, led to increased scales with the fluctuations in the distance vector Rij interest in the mechanical response of proteins to ex- as ternal pulling forces applied at specific positions.34 A database of results from such experiments has been τ , ∝{−1 + −1 − −1 } (i j) [ ]ii [ ]jj 2[ ]ij recently constructed.34,35

=Rij.Rij, (14) Theoretical studies for elucidating the relation- ship between mechanical stability and biological func- where the proportionality constant is tion have been traditionally performed using steered γ −1 (3kBT/ ) kdk. Eq. (14) is readily obtained MD.36–40 More recently, network models and ana- using Eq. (13) twice, for H(j,i) and H(i, j). Eqs (13) lytical treatments have been utilized. One example is and (14) permit us to bridge between statistical the ANM analysis of the anisotropic response to de- physical quantities such as mean-square fluctuations formations in different pulling directions,12 presented or cross-correlations with graph-theoretic concepts below in some detail. Although this approach assumes such as hitting or commute times. We will illustrate that the unfolding path correlates with the ‘weak/soft the utility of these concepts for analyzing signal trans- directions’ near the native state, other studies went on duction behavior of proteins by way of application to longer and more detailed simulations of the process to two cases in the section Reconciling Equilibrium and demonstrated explicitly the relation between nor- Dynamics and Signal Propagation Stochastics. mal modes and the unfolding process by interactively considering the specific interactions being broken. An example is the work of Dietz and Reif41 wherein the LEARNING FROM NETWORK fractional force that each ‘bond’ experiences so as to MODELS: COOPERATIVE RESPONSES withstand a uniaxial tension was evaluated, which AND COMMUNICATION also yielded good agreement with experimental data on green fluorescent protein (GFP). In another study, Anisotropic Response to Mechanical Stress: snapshots from stochastic simulations of stretching Selection of Suitable Modes have been analyzed.42 ENMs also assist in identifying Biomolecules are often subjected to mechanical folding nuclei32,43 and exploring the order of events pressures—e.g., within muscle fibers and micro- in thermal unfolding. A recent application of GNM

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 431 Overview wires.wiley.com/wcms

FIGURE 3| (a) Comparison of mechanical stress (uniaxial tension) experiments with anisotropic network model (ANM) predictions for green (k) fluorescent protein (GFP). The plots on the left display the normalized contributions dij of ANM modes k = 1, 3N-6 (abscissa) to the deformation of the molecule in response to uniaxial tension exerted at residues (i, j) = (K3, N212) (top) and (D117,Y182) (bottom). When pulling apart the residues Lys3–Asn212 (upper diagram), the deformation is largely accommodated one soft mode (note the peak at mode 1). In the case of Asp117–Tyr182 (lower diagram), the response is distributed over a broader range of modes, including high-frequency modes that entail high energy cost. The force required to unfold the molecule by stretching in the Lys3–Asn112 direction was reported46 to be five times weaker than that needed in the Asp117–Tyr182 direction, consistent with the mode distributions. (b) Complete mechanical resistance map generated for GFP by ANM. The entries in the map represent the effective force constants <κij> (Eq. 15) in response to uniaxial tensions exerted at each of pair of residues. The secondary structure of the protein is shown along the upper abscissa and on the right (blue arrows, β-strand; zigzag line, α-helix). The profile at the lower part of the map displays the mean resistance per residue, averaged over all values in a given column. See our study12 for more details. wherein native contacts are iteratively broken to sim- and Asp117–Tyr182 (bottom). In the former case, ulate unfolding yielded results in accord with those the low-frequency modes make a larger contribution 43 (k) observed in MD and Monte Carlo simulations. (e.g., d3−212 > 0.6 for k = 1), consistent with the The ANM permits us to calculate an effective relatively low (∼115 pN) pulling force observed in 46 force constant <κij> for the stiffness/resistance of the experiments. In the latter case, on the contrary, (k) molecule against deformation under uniaxial tension d117−182 values are more uniformly distributed over applied to residues i and j.12 The idea is simple: if these a broader range of modes, including higher frequency residues already undergo long distance fluctuations modes, which are energetically expensive. Consis- by virtue of intrinsically accessible slow modes, the tent with these predictions, the GFP is destabilized 46 molecule is more ‘yielding’. <κij> is the ‘average force at 548 pN upon pulling these two residues. Simi- constant’ evaluated using12 lar calculations performed for all residue pairs led to the resistance map (<κij> values) displayed in Figure (κ) (κ) <κij >= dij λk dij , (15) 3. The lower profile is the average over all entries in k k each column, thus providing a measure of mechani- cal resistance for each individual amino acid. Similar (κ) where dij represents the contribution of mode k to calculations performed for various pulling directions the extension of the interresidue distance Rij, given by and various proteins yielded results in good agree- (k) 0 (k) (k) the projection of Rij onto Rij , i.e., dij = Rij ment with experiments despite the simplicity of the 0 (k) cos(Rij , Rij ). theory and the differences in various experimental We recently analyzed12 the mechanical behavior conditions.12 observed by single-molecule atomic force microscopy In principle, the forces predicted by the ANM re- for a series of proteins.44–46 Figure 3 illustrates the fer to behavior in the immediate neighborhood of the results obtained for GFP. The bars represent the global energy minimum or to the resistance to initiat- (k) computed contributions dij of different modes to ing a motion in a particular direction. Experiments, the extensions along the pairs Lys3–Asn212 (top) on the contrary, probe the unfolding behavior. The

432 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models

consistency between theory and experiments points to to having small values (abscissa), their vari- a correlation between the original stiffness/resistance ances in H(j,i)(for1≤ i ≤ N) are small, suggesting to exiting from the folded state and the effective cost that these residues are not only fast but also precise of unfolding. The approach of Dietz and Reif,41 on in their communication with other residues. the contrary, assumes that a single bond breakage Panels (a )–(c ), on the contrary, present the re- is equivalent to complete unfolding. Both theoreti- sults for an enzyme, human carbonic anhydrase II, cal approaches are based on native contact topology with respect to its catalytic residues H64, Q92, H94, and cannot be used for unfolding processes wherein H96, and H119. Again, the active residues appear the transition state is far from the native state. They to be highly predisposed to efficient communication yield relative forces for different residue pairs, rather with the rest of the protein, in addition to be precisely than absolute values; and they lack sequence infor- tuned (i.e., have minimal variance in hitting times). mation. Thus due to calibrations that depend on the These results suggest that the positions of the protein and on experimental conditions alike, these active residues in the 3D structure may have been models cannot compare values obtained for different evolutionarily optimized to ensure efficient communi- proteins or mutants. Despite such limitations, coarse- cation. In order to examine to which extent the sites grained (CG) models42,47 and in particular ENM- with high allosteric potential (or enhanced commu- based approaches12,13,41,42 appear as useful tools for nication properties) are evolutionarily conserved, we estimating the anisotropic response of proteins to ex- compared in Figure 5 the color-coded diagrams based ternal stress, and may assist in designing or interpret- on communication properties (panels a and a ) and ing single-molecule experiments at a very low com- conservation properties (panels b and b ). Conserva- puting time and memory cost. tion scores were obtained using the ConSurf server.49 The comparisons of panels (a) and (b), and panels (a ) and (b ) clearly demonstrates the correlation be- Reconciling Equilibrium Dynamics and tween conservation patterns and efficient communica- Signal Propagation Stochastics tion propensities, i.e., those residues distinguished by Equations (13) and (14) establish the connection their efficient signaling properties are also evolution- between information theoretic concepts (hit and com- arily conserved presumably due to their functional mute times) and physics-based equilibrium dynam- role. ics (e.g., mean-square fluctuations in interresidue dis- There are currently several studies that unravel tances). Previous work demonstrated that hit times the structural origins of molecular signaling pro- are relatively small when targeting catalytic residues, cesses using NMA. A recent study by Beratan and consistent with the efficient communication require- coworkers50 demonstrated, e.g., that local structural ments of active sites.14 A more recent study also changes triggered by ligand binding can translate shows that metal-binding residues follow the same into signaling responses at remote allosteric sites and trend.48 different motions mediate different signaling path- Figure 4 illustrates these concepts for two ex- ways. The connection between signaling pathways ample cases. Panels (a)–(c) display the results for a and collective motions predicted by ENM–NMA has xylose isomerase that binds a cobalt ion. Panel (a) been also demonstrated in an extensive study of shows the hit time map of the metal-binding protein. GroEL–GroES machinery.51 Such analyses could be It can be seen that signal propagation properties es- particularly useful in structure-based design of in- sentially depend on the target site (abscissa). For a hibitors/regulators of allosteric proteins. clearer understanding of the signaling properties of individual residues, we define two single-body prop- erties: mean hitting times and mean commute times, NMA in Structural Refinement 20–22 = i H(j,i)/N and <τ(j)> = i τ(j,i)/N for Soon after the pioneering articles in the 1980s each residue. These properties provide a measure of that demonstrated the utility of NMA for investi- the allosteric potential of residues. Panel (b) displays gating the equilibrium dynamics of proteins, it was the mean hitting times for xylose isomerase. Metal- suggested that the new methodology could assist in binding residues are E267, E231, D338, and D295 in refining X-ray structures.52–55 The refinement of B- this case. The hitting time profile shows that all these factors, previously based on a single fitted parame- residues (indicated by red dots) are located at min- ter per atom, was replaced by a summation of the ima, i.e., they communicate efficiently with all other anisotropic fluctuations over a selected set of low- residues or they are very sensitive to signals conveyed frequency normal modes. The main conceptual ad- by other residues. Panel (c) shows that in addition vantage of a NMA-based refinement over traditional

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 433 Overview wires.wiley.com/wcms

FIGURE 4| Efficient signal propagation properties of active sites illustrated for a metal-binding protein (a–c) and an enzyme (a –c ). Hit time maps for cobalt-binding xylose isomerase [Protein Data Bank (PDB) code: 1a0c] and for human carbonic anhydrase II (PDB code: 1a42) are displayed in the respective panels (a) and (a ). Average hit time profiles as a function of residue number are shown in panels (b) and (b )with metal-binding and catalytic residues labeled by the red dots in the two respective cases. Panels (c) and (c ) display the average hit time versus standard deviation for each residue for the two cases. The residues that participate in metal binding (b) and catalysis (b ) (shown in red markers) exhibit small average hit time and small-to-moderate standard deviation, indicative of their fast and precise communication properties. refinements is that it takes account of intrinsic mo- ters may outperform standard refinements with many tions and correlations. The essence of the refinement more parameters52,54,56 while avoiding overfitting. procedure is to minimize the difference (R-factor) be- NMA refinement also allows for large movements tween observed structure factors and those calculated beyond the capability of other refinement methods57 based on model parameters. In a typical NMA-based and is of particular utility for large structures and refinement, the model is evaluated using the posi- complexes that are inherently flexible.58–60 In an ap- 0 + tion vectors, R k ckuk, perturbed along normal plication to the potassium channel KcsA, e.g., Chen et 56 modes (k), where ck is the contribution of mode k to al. were able to resolve residues that were originally be fitted. As low-frequency modes induce the largest missing and obtain anisotropic displacement param- displacements, a subset of slow modes is usually suffi- eters related to functional motions. cient, and thus the number of fitting parameters is usu- An additional advantage of NMA refinement ally significantly smaller than the number of atoms. is that it inherently decomposes the atomic fluctua- NMA-based fitting with a small number of parame- tions into their internal and external components. The

434 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models

that cannot be resolved currently using MR—despite the existence of apparently suitable templates—will be resolved upon NMA-based perturbations of their templates. This will save effort, time, and resources, setting the stage for application of NMA-based ap- proaches. With the advances in cryo-EM techniques, there is a great interest in refining the low-resolution (some- times in the order of 15–25 Å) data. A useful strategy to this aim is to dock known higher resolution struc- tures (or substructures) onto EM density maps, but of- ten the dynamic nature of the molecules is overlooked. There is a need for tools that consider the flexibility of the molecule during refinement, and NMA has been naturally employed for this purpose.17,19,63,64 The objective function to be minimized is the difference between computed and measured electron densities, using NMA soft modes as search directions. Applica- tions of this methodology confirmed that large con- formational changes need to be accounted for in order to have a good fit to EM density maps. An impor- tant contribution was the development of Quantized FIGURE 5| Correlation between signaling efficiency and 65,66 Elastic Deformational Model, wherein the pro- evolutionary conservation. Panels (a; 1a0c) and (a ; 1a42) show the tein is treated as an elastic object with mass distri- ribbon diagrams color coded by the distance of residues from the bution taken from the density maps. An elegant ap- origin in Figure 4 (c) and (c ). The regions colored blue refer to the 67 fastest and most precise communication sites. Panels (b) and (b )are plication to fatty acid synthase demonstrated that color coded by conservation score of each residue as obtained from the structures along electron density maps conform to Consurf server, the most conserved sites being colored purple. The the first two softest modes. This study highlighted the metal-binding and catalytic sites are shown in space-filling strength of combining NMA with multiple model re- representation. finement for highly flexible proteins, as well as the ability to improve the resolution of the final EM reconstructions. internal component refers to the collective vibrational Chacon et al.68 showed that alternative confor- motions of the molecule (corresponding to the non- mations for several important well-studied systems zero eigenvalues), which are usually of interest for (bacterial RNA polymerase, the 30S and 50S subunits assessing functional mechanisms. The external com- of the , and chaperonin) could be detected ponent, on the contrary, reflects the static disorder within single low-resolution EM structures, consis- and rigid body motion in the crystal as well as uni- tent with major conformational changes predicted by form thermal noise and inaccuracies in the experi- normal modes. These experimental results validate ments. By separating the two components, it has been what coarse-grained NMA models taught us long demonstrated55,56,58–61 that rigid body motions make ago—large-scale motions are insensitive to detailed a significant contribution to the B-factors observed in atomic interactions. A number of excellent tools, such X-ray crystallographic structures. as NOMAD,69 NORMA,18 and NMFF,70 are now In recent years, it has been shown62 that NMA available to make it easy and straightforward to ap- can assist in molecular replacement (MR), the most ply NMA for interpreting EM density maps. popular and rapid method for solving structures by X-ray crystallography. The method is based on us- ing a structural template. The authors showed for NMA in Modeling Structures, three different examples (metallodextrin-binding pro- Ligand-bound Forms, and Complexes tein, glutamine-binding protein, and HIV-1-protease) NMA approaches can assist not only in the refine- that perturbing the template structures along one or ment of experimental structures but also for theo- two low-frequency modes yields new conformers that retical modeling. We are currently in a position to allow for successful MR, whereas the original struc- improve the quality of homology models by refining tures fail to do so. This suggests that many structures the template structures using ENM–NMA. The idea

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 435 Overview wires.wiley.com/wcms

FIGURE 6| Structural refinement of the bacteria ferric ion-binding protein by deforming a template protein along its normal modes. (a) Superposition of the target and template proteins [Protein Data Bank codes 1xvy (red) and 1qvs (blue), respectively]. The proteins share 37% sequence identity. The root-mean-square deviation (RMSD) between the structures is 3.21 Å. (b) Reduction of the RMSD by reconfigurations of the template along the first 30 normal modes accessible to the template structure, calculated using the anisotropic network model. The red dotted line shows the RMSD between the two structures prior to refinement. is to restrict the search space to a subspace spanned by that movements along the four softest modes starting a few lower frequency normal modes.71,72 When the from a given structure (‘template’) permit to attain scoring function is put aside, by taking it to be ideal, structures significantly closer to the ‘target’ structure. it has been recently shown that the normal modes When combining the first four modes, the root-mean- perform better than short MD refinement or coarse- square deviation (RMSD) may be reduced from 3.2 to grained Monte Carlo sampling for moderate difficulty 2.1 Å. The use of higher order modes has a marginal template refinement.73 Baker and coworkers74 have effect, if any. shown, e.g., that a search along dominant princi- Recently, a combined method that utilizes pal components of ensembles of structures in a given principal components and normal modes was family assists in improving the accuracy in compar- developed.76 A recent nice application of a simi- ative modeling. The idea is to determine the domi- lar methodology is the prediction of open and close nant directions of structural variance upon optimal conformations of rat heme-free oxygenase,77 starting superimposition of multiple structures (for a given from a unique heme-bound X-ray structure, and using protein or a family of proteins that share similar templates corresponding to a human and incomplete fold) and evaluate the covariances in the departures rat heme-free structures. of residue/atom coordinates from their mean values, which are organized in a covariance matrix (Eqs. 7 and 10). Eigenvalue decomposition of the latter fol- CONCLUSION lowing Eqs (7) and (10) yields the principal axes (as eigenvectors) of structural deviations. Given that the Here, we present an overview of the foundations, principal components from experimental structures basic approximations, and utility of network mod- correlate well with normal modes,8,71,75 it should els used for delineating protein dynamics. GNM and be possible to use normal modes instead of princi- ANM are physics-based models, whereas the Markov pal components for this purpose, and thus to expand model is information theoretic. The two approaches the utility of this approach. For more about PCA and provide information on different properties: motions the relation between NMA and PCA, the reader is in the former case and pathways of communication referred to previous work.8,24,71,75 in the latter. Yet, the average properties predicted by The advantage of normal modes is that they can the former such as the mean-square fluctuations in be calculated based on individual template structures interresidue distances and cross-correlations between and are not dependent on the availability of numer- residue fluctuations also define information theoretic ous family representatives that are needed for PCA. quantities such as hitting and/or commute times com- Figure 6 demonstrates the potential of this approach puted for different residue pairs (Eqs. 13 and 14). for two ferric ion-binding proteins from bacteria that In a sense, graph/network representation allows us share only 37% sequence identity. It can be seen to bridge between dynamic and allosteric signaling

436 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models properties. In either case, the native contact topology ENM analyses is that they can be performed based defines the most cooperative modes of motions, which on individual structures and do not depend on the in turn define the effective resistance to deformation, availability of numerous family representatives (that the efficient communication mechanisms, and the po- are needed for principal components analyses, for ex- tential structural changes in different forms or envi- ample). This endows these approaches with a signif- ronments, as illustrated in three different applications icant advantage in filling the gaps in the sequence– (Figures 2–6). structure space in the absence of sufficiently broad We would like to emphasize in particular the structural data for a given family. We anticipate these practical utility of NMA–ENM for gaining a deeper features of ENM–NMA to be increasingly exploited, understanding of the conformational space/ensemble especially in the characterization of supramolecular accessible to each protein. The advantage of NMA– structures and assemblies.

ACKNOWLEDGEMENTS Support from NIH grant #1R01GM086238-01 is gratefully acknowledged by I.B.

REFERENCES

1. Bahar I, Chennubhotla C, Tobi D. Intrinsic dynam- 11. Tokuriki N, Tawfik DS. Protein dynamism and evolv- ics of enzymes in the unbound state and relation ability. Science 2009, 324:203–207. to . Curr Opin Struct Biol 2007, 12. Eyal E, Bahar I. Toward a molecular understanding of 17:633–640. the anisotropic response of proteins to external forces: 2. Bahar I, Lezon TR, Bakan A, Shrivastava IH. Nor- insights from elastic network models. Biophys J 2008, mal mode analysis of biomolecular structures: intrin- 94:3424–3435. sic motions of membrane proteins. Chem Rev 2010, 13. Sacquin-Mora S, Lavery R. Modeling the mechani- 110:1463–1497. cal response of proteins to anisotropic deformation. 3. Bahar I, Atilgan AR, Erman B. Direct evaluation Chemphyschem 2009, 10:115–118. of thermal fluctuations in proteins using a single- 14. Chennubhotla C, Bahar I. Signal propagation in pro- parameter harmonic potential. Fold Des 1997, 2:173– teins and relation to equilibrium fluctuations. PLoS 181. Comput Biol 2007, 3:1716–1726. 4. Haliloglu T, Bahar I, Erman B. Gaussian dynamics of 15. Cavasotto CN, Kovacs JA, Abagyan RA. Represent- folded proteins. Phys Rev Lett 1997, 79:3090–3093. ing receptor flexibility in ligand docking through rele- 5. Flory P. Statistical thermodynamics of random net- vant normal modes. J Am Chem Soc 2005, 127:9632– works. Proc R Soc Lond A 1976, 351:351–380. 9640. 6. Bruschweiler R. Collective protein dynamics and nu- 16. Amadei A, Linssen AB, Berendsen HJ. Essential dy- clear spin relaxation. J Chem Phys 1995, 102:3396– namics of proteins. Proteins 1993, 17:412–425. 3403. 17. Hinsen K, Reuter N, Navaza J, Stokes DL, 7. Tirion MM. Large amplitude elastic motions in pro- Lacapere JJ. Normal mode-based fitting of atomic teins from a single-parameter, atomic analysis. Phys structure into electron density maps: application to Rev Lett 1996, 77:1905–1908. sarcoplasmic reticulum Ca-ATPase. Biophys J 2005, 8. Bakan A, Bahar I. The intrinsic dynamics of enzymes 88:818–827. plays a dominant role in determining the structural 18. Suhre K, Navaza J, Sanejouand YH. NORMA: a tool changes induced upon inhibitor binding. Proc Natl for flexible fitting of high-resolution protein structures Acad Sci U S A 2009, 106:14349–14354. into low-resolution electron-microscopy-derived den- 9. Tobi D, Bahar I. Structural changes involved in protein sity maps. Acta Crystallogr D Biol Crystallogr 2006, binding correlate with intrinsic motions of proteins in 62:1098–1100. the unbound state. Proc Natl Acad Sci U S A 2005, 19. Tama F, Miyashita O, Brooks CL III. Flexible multi- 102:18908–18913. scale fitting of atomic structures into low-resolution 10. James LC, Roversi P, Tawfik DS. Antibody multispeci- electron density maps with elastic network nor- ficity mediated by conformational diversity. Science mal mode analysis. J Mol Biol 2004, 337:985– 2003, 299:1362–1367. 999.

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 437 Overview wires.wiley.com/wcms

20. Brooks B, Karplus M. Harmonic dynamics of proteins: 36. Grater F, Shen J, Jiang H, Gautel M, Grubmuller H. normal modes and fluctuations in bovine pancreatic Mechanically induced titin kinase activation studied by trypsin inhibitor. Proc Natl Acad Sci U S A 1983, force-probe molecular dynamics simulations. Biophys 80:6571–6575. J 2005, 88:790–804. 21. Go N, Noguti T, Nishikawa T. Dynamics of a small 37. Lee EH, Gao M, Pinotsis N, Wilmanns M, Schulten globular protein in terms of low-frequency vibrational K. Mechanical strength of the titin Z1Z2–telethonin modes. Proc Natl Acad Sci U S A 1983, 80:3696–3700. complex. Structure 2006, 14:497–509. 22. Levitt M, Sander C, Stern PS. Protein normal-mode 38. Lu H, Schulten K. The key event in force-induced un- dynamics: trypsin inhibitor, crambin, ribonuclease and folding of titin’s immunoglobulin domains. Biophys J lysozyme. J Mol Biol 1985, 181:423–447. 2006, 79:51–65. 23. Bahar I, Rader AJ. Coarse-grained normal mode anal- 39. Lu H, Krammer A, Isralewitz B, Vogel V, Schulten ysis in structural biology. Curr Opin Struct Biol 2005, K. Computer modeling of force-induced titin domain 15:586–592. unfolding. Adv Exp Med Biol 2000, 481:143–160. 24.CuiQ.BaharIE.Normal Mode Analysis: Theory and 40. Pabon G, Amzel LM. Mechanism of titin unfolding Applications to Biological and Chemical Systems.Boca by force: insight from quasi-equilibrium molecular dy- Raton: Chapman & Hall/CRC; 2006. namics calculations. Biophys J 2006, 91:467–472. 25. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne 41. Dietz H, Rief M. Elastic bond network model for PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, protein unfolding mechanics. Phys Rev Lett 2008, et al. The Protein Data Bank. Acta Crystallogr D Biol 100:098101. Crystallogr 2002, 58:899–907. 42. Sulkowska JI, Kloczkowski A, Sen TZ, Cieplak M, 26. Atilgan AR, Durell SR, Jernigan RL, Demirel MC, Jernigan RL. Predicting the order in which contacts Keskin O, Bahar I. Anisotropy of fluctuation dynamics are broken during single molecule protein stretching of proteins with an elastic network model. Biophys J experiments. Proteins 2008, 71:45–60. 2001, 80:505–515. 43. Su JG, Li CH, Hao R, Chen WZ, Wang CX. Protein 27. Doruker P, Atilgan AR, Bahar I. Dynamics of proteins unfolding behavior studied by elastic network model. predicted by molecular dynamics simulations and an- Biophys J 2008, 94:4586–4596. alytical approaches: application to alpha-amylase in- 44. Brockwell DJ, Paci E, Zinober RC, Beddard GS, hibitor. Proteins 2000, 40:512–524. Olmsted PD, Smith DA, Perham RN, Radford SE. 28. Eyal E, Yang LW, Bahar I. Anisotropic network model: Pulling geometry defines the mechanical resistance of a systematic evaluation and a new web interface. Bioin- beta-sheet protein. Nat Struct Biol 2003, 10:731–737. formatics 2006, 22:2619–2627. 45. Carrion-Vazquez M, Li H, Lu H, Marszalek PE, Ober- 29. Bahar I, Atilgan AR, Demirel MC, Erman B. Vibra- hauser AF, Fernandez JM. The mechanical stability of tional dynamics of folded proteins: significance of slow ubiquitin is linkage dependent. Nat Struct Biol 2003, and fast motions in relation to function and stability. 10:738–743. Phys Rev Lett 1998, 80:2733–2736. 46. Dietz H, Berkemeier F, Bertz M, Rief M. Anisotropic 30. Demirel MC, Atilgan AR, Jernigan RL, Erman B, Bahar deformation response of single protein molecules. Proc I. Identification of kinetically hot residues in proteins. Natl Acad Sci U S A 2006, 103:12724–12728. Protein Sci 1998, 7:2522–2532. 47. West DK, Brockwell DJ, Olmsted PD, Radford SE, Paci 31. Haliloglu T, Keskin O, Ma B, Nussinov R. How similar E. Mechanical resistance of proteins explained using are protein folding and protein binding nuclei? Exam- simple molecular models. Biophys J 2006, 90:287–297. ination of vibrational motions of energy hot spots and 48. Dutta A, Bahar I. Metal-binding residues are distin- conserved residues. Biophys J 2005, 88:1552–1559. guished by their lower mobilities and efficient sig- 32. Rader AJ, Bahar I. Folding core predictions from net- nal propagation properties. Structure 2010, 18:1140– work models of proteins. Polymer 2004, 45:659–668. 1148. 33. Isin B, Schulten K, Tajkhorshid E, Bahar I. Mech- 49. Landau M, Mayrose I, Rosenberg Y, Glaser F, Martz E, anism of signal propagation upon retinal isomeriza- Pupko T, Ben-Tal N. ConSurf 2005: the projection of tion: insights from molecular dynamics simulations evolutionary conservation scores of residues on protein of rhodopsin restrained by normal modes. Biophys J structures. Nucleic Acids Res 2005, 33:W299–W302. 2008, 95:789–803. 50. Balabin IA, Yang W, Beratan DN. Coarse-grained 34. Oberhauser AF, Carrion-Vazquez M. Mechanical bio- modeling of allosteric regulation in protein receptors. chemistry of proteins one molecule at a time. J Biol Proc Natl Acad Sci U S A 2009, 106:14253–14258. Chem 2008, 283:6617–6621. 51. Chennubhotla C, Yang Z, Bahar I. Coupling between 35. Sulkowska JI, Cieplak M. Stretching to understand global dynamics and signal transduction pathways: a proteins – a survey of the protein data bank. Biophys mechanism of allostery for chaperonin GroEL. Mol J 2008, 94:6–13. Biosyst 2008, 4:287–292.

438 c 2011 John Wiley & Sons, Ltd. Volume 1, May/June 2011 WIREs Computational Molecular Science Cooperative dynamics of proteins unraveled by network models

52. Diamond R. On the use of normal modes in thermal 65. Ming D, Kong Y, Lambert MA, Huang Z, Ma J. How parameter refinement: theory and application to the to describe protein motion without amino acid se- bovine pancreatic trypsin inhibitor. Acta Crystallogr A quence and atomic coordinates. Proc Natl Acad Sci 1990, 46:425–435. USA2002, 99:8620–8625. 53. Kidera A, Go N. Refinement of protein dynamic struc- 66. Tama F, Wriggers W, Brooks CL III. Exploring global ture: normal mode refinement. Proc Natl Acad Sci distortions of biological and assem- USA1990, 87:3718–3722. blies from low-resolution structural information and 54. Kidera A, Inaka K, Matsushima M, Go N. Nor- elastic network theory. J Mol Biol 2002, 321:297–305. mal mode refinement: crystallographic refinement of 67. Brink J, Ludtke SJ, Kong Y, Wakil SJ, Ma J, Chiu W. protein dynamic structure. II. Application to human Experimental verification of conformational variation lysozyme. J Mol Biol 1992, 225:477–486. of human fatty acid synthase as predicted by normal 55. Kidera A, Inaka K, Matsushima M, Go N. Normal mode analysis. Structure 2004, 12:185–191. mode refinement: crystallographic refinement of pro- 68. Chacon P, Tama F, Wriggers W. Mega-dalton tein dynamic structure applied to human lysozyme. biomolecular motion captured from electron mi- Biopolymers 1992, 32:315–319. croscopy reconstructions. J Mol Biol 2003, 326:485– 56. Chen X, Poon BK, Dousis A, Wang Q, Ma J. Normal- 492. mode refinement of anisotropic thermal parameters for 69. Lindahl E, Azuara C, Koehl P, Delarue M. NOMAD- potassium channel KcsA at 3.2 Å crystallographic res- Ref: visualization, deformation and refinement of olution. Structure 2007, 15:955–962. macromolecular structures based on all-atom normal 57. Delarue M, Dumas P. On the use of low-frequency nor- mode analysis. Nucleic Acids Res 2006, 34:W52–W56. mal modes to enforce collective movements in refining 70. Tama F, Miyashita O, Brooks CL III. Normal mode macromolecular structural models. Proc Natl Acad Sci based flexible fitting of high-resolution structure into USA2004, 101:6957–6962. low-resolution experimental data from cryo-EM. J 58. Poon BK, Chen X, Lu M, Vyas NK, Quiocho FA, Wang Struct Biol 2004, 147:315–326. Q, Ma J. Normal mode refinement of anisotropic ther- 71. Leo-Macias A, Lopez-Romero P, Lupyan D, Zerbino mal parameters for a supramolecular complex at 3.42- D, Ortiz AR. Core deformations in protein families: Å crystallographic resolution. Proc Natl Acad Sci U S a physical perspective. Biophys Chem 2005, 115:125– A 2007, 104:7869–7874. 128. 59. Ni F, Poon BK, Wang Q, Ma J. Application of normal- 72. Leo-Macias A, Lopez-Romero P, Lupyan D, Zerbino mode refinement to X-ray crystal structures at the D, Ortiz AR. An analysis of core deformations in pro- lower resolution limit. Acta Crystallogr D Biol Crys- tein superfamilies. Biophys J 2005, 88:1291–1299. tallogr 2009, 65:633–643. 73. Stumpff-Kane AW, Maksimiak K, Lee MS, Feig M. 60. Chen X, Lu M, Poon BK, Wang Q, Ma J. Structural Sampling of near-native protein conformations dur- improvement of unliganded simian immunodeficiency ing protein structure refinement using a coarse-grained virus gp120 core by normal-mode-based X-ray crystal- model, normal modes, and molecular dynamics simu- lographic refinement. Acta Crystallogr D Biol Crystal- lations. Proteins 2008, 70:1345–1356. logr 2009, 65:339–347. 74. Qian B, Ortiz AR, Baker D. Improvement of compara- 61. Song G, Jernigan RL. vGNM: a better model for un- tive model accuracy by free-energy optimization along derstanding the dynamics of proteins in crystals. JMol principal components of natural structural variation. Biol 2007, 369:880–893. Proc Natl Acad Sci U S A 2004, 101:15346–15351. 62. Suhre K, Sanejouand YH. On the potential of 75. Yang LW, Eyal E, Bahar I, Kitao A. Principal com- normal-mode analysis for solving difficult molecular- ponent analysis of native ensembles of biomolecular replacement problems. Acta Crystallogr D Biol Crys- structures (PCA NEST): insights into functional dy- tallogr 2004, 60:796–799. namics. Bioinformatics 2009, 25:606–614. 63. Falke S, Tama F, Brooks CL, III, Gogol EP, Fisher 76. Han R, Leo-Macias A, Zerbino D, Bastolla U, MT. The 13 angstroms structure of a chaperonin Contreras-Moreira B, Ortiz AR. An efficient confor- GroEL–protein substrate complex by cryo-electron mi- mational sampling method for homology modeling. croscopy. J Mol Biol 2005, 348:219–230. Proteins 2008, 71:175–188. 64. Mitra K, Schaffitzel C, Shaikh T, Tama F, Jenni S, 77. Marechal JD, Perahia D. Use of normal modes for Brooks CL 3rd, Ban N, Frank J. Structure of the E. structural modeling of proteins: the case study of rat coli protein-conducting channel bound to a translating heme oxygenase 1. Eur Biophys J 2008, 37:1157– ribosome. Nature 2005, 438:318–324. 1165.

Volume 1, May/June 2011 c 2011 John Wiley & Sons, Ltd. 439