Dynamics of Proteins Predicted by Molecular Dynamics Simulations and Analytical Approaches: Application to α-Amylase Inhibitor
PROTEINS: Structure, Function, and Genetics 40:512–524 (2000)

Pemra Doruker, Ali Rana Atilgan, and Ivet Bahar*
Polymer Research Center, School of Engineering, Bogazici University, and TUBITAK Advanced Polymeric Materials Research Center, Bebek, Istanbul, Turkey

*Correspondence to: I. Bahar, Polymer Research Center, School of Engineering, Bogazici University, Bebek 80815, Istanbul, Turkey. E-mail: [email protected]
Received 20 January 2000; Accepted 13 April 2000

ABSTRACT

The dynamics of α-amylase inhibitors has been investigated using molecular dynamics (MD) simulations and two analytical approaches, the Gaussian network model (GNM) and the anisotropic network model (ANM). MD simulations use a full atomic approach with empirical force fields, while the analytical approaches are based on a coarse-grained single-site-per-residue model with a single-parameter harmonic potential between sufficiently close (r ≤ 7 Å) residue pairs. The major difference between the GNM and the ANM is that no directional preferences can be obtained in the GNM, all residue fluctuations being theoretically isotropic, while the ANM does incorporate directional preferences. The dominant modes of motion are identified by (i) the singular value decomposition (SVD) of the MD trajectory matrices, and (ii) the similarity transformation of the Kirchhoff matrices of inter-residue contacts in the GNM or ANM. The mean-square fluctuations of individual residues and the cross-correlations between domain movements retain the same characteristics in all approaches, although the dispersion of modes and detailed amplitudes of motion obtained in the ANM conform more closely with MD results. The major weakness of the analytical approaches appears, on the other hand, to be their inadequacy to account for the anharmonic motions or multimodal transitions driven by the slowest collective mode observed in MD. Such motions usually suffer, however, from MD sampling inefficiencies, and multiple independent runs should be tested before making conclusions about their validity and detailed mechanisms. Overall this study invites attention to (i) the robustness of the average properties (mean-square fluctuations, cross-correlations) controlled by the low frequency motions, which are invariably reproduced in all approaches, and (ii) the utility and efficiency of the ANM, the computational time cost of which is of the order of "minutes" (real time), as opposed to "days" for MD simulations. Proteins 2000;40:512–524. © 2000 Wiley-Liss, Inc.

Key words: correlated fluctuations; molecular dynamics simulations; β-proteins; Gaussian network model; anisotropic effects; principal component analysis

INTRODUCTION

The biological function of proteins is generally controlled by cooperative motions or correlated fluctuations involving large portions of the structure.1–5 There exists a multitude of computational methods in the literature for identifying the dominant correlated motions in macromolecules.1,3,6–17 The basic approach common in these studies is to decompose the dynamics into a collection of modes of motion, remove the "uninteresting" (fast) ones, and focus on a few low frequency/large amplitude modes which are expected to be relevant to function.

A classical approach for determining the modes of motion is the normal mode analysis (NMA). NMA has found widespread use in exploring proteins' dynamics starting from the original work of Go and collaborators, two decades ago. It is based on the diagonalization of the second-derivative potential energy matrix,6,18,19 assuming that harmonic potentials operate near equilibrium coordinates. The method was later extended into a quasi-harmonic oscillator approximation which utilizes, as input, the fluctuations (auto- and cross-correlations) observed in molecular dynamics (MD) simulations, thus including the effects of anharmonicity.7,20,21 The process of extracting the dominant collective modes, or the essential dynamics,14,22 from fluctuations seen in MD trajectories, also called principal component analysis (PCA)23 or the molecular optimal dynamics coordinates analysis,2 is now an established computational means of studying proteins' dynamics.5 The major shortcoming of this approach is the sampling inefficiency of MD simulations. The sampling problem becomes increasingly important as the size of the investigated molecular system increases, as shown by projecting the MD trajectory onto the first few PCs.24,25

The Gaussian network model (GNM) approach was recently proposed as a simple but extremely efficient analytical tool for modeling the dynamics of folded proteins.26 The structure is viewed as an elastic network, the nodes of which are the α-carbons, and the springs/chains are the bonded or nonbonded interactions between sufficiently close (r ≤ 7.0 Å) residue pairs. The cutoff distance rc = 7.0 Å includes all neighbors within a first coordination shell.27,28 Following Tirion's original proposition,29 a single parameter (γ) harmonic potential is assigned to all pairs of residues, the absolute value of which is to be adjusted from comparison with experiments. Despite its simplicity, the GNM reproduces very closely the X-ray crystallographic Debye-Waller factors,26 the H/D exchange free energy costs,30 and the NMR relaxation order parameters31 of individual residues in a diversity of proteins. Its major utility lies in its application to large systems. Quaternary structures that would necessitate enormous amounts of computation time if examined by MD can be analyzed within seconds (CPU time) with the GNM. The mechanisms of concerted motions in complexes such as HIV-1 reverse transcriptase with bound DNA,32 tRNA with its cognate synthetase,33 or enzymes comprising multiple subunits, such as Trp synthase having over 1,000 residues,34 were recently elucidated, consistent with experimental data. A major criticism of the GNM is its inadequacy to describe anharmonic fluctuations. This is a deficiency common to GNM and NMA.

In the present paper, we investigate the dynamics of a β-protein, the α-amylase inhibitor tendamistat, using both MD simulations and the GNM analytical solution. MD trajectories will be analyzed by the singular value decomposition (SVD) technique,35 a tool that proved useful for studying protein dynamics.17 We will extract here the dominant modes along the singular vectors, the counterparts of the principal axes in PCA, and compare these to GNM modes. Finally, a new analytical method will be introduced, referred to as the anisotropic network model (ANM), which will permit inclusion of the effect of anisotropy in the network model of proteins, as this effect has proved to produce significant enhancements in the positional fluctuations, especially for larger amplitudes associated with lower frequencies.36

The type of information that will be provided by the present study is twofold. First, information on the dynamics of a protein representative of a distinct fold according to SCOP37 will be found. The so-called α-amylase inhibitor fold is a sandwich of two β-sheets, each composed of three strands. Both X-ray and NMR studies of the structure of tendamistat are available, but no theoretical study of the dynamics of this family has been performed to date. An assessment of the regions involved in concerted motions and of the residues modulating the cooperative fluctuations will be made here using MD simulations and the exact solutions from the GNM and ANM theories. The analysis will also serve to suggest the sites prone to disruptive mutations.

... cases. The question is then how similar are the dominant, low frequency modes revealed by these two totally different approaches. The subspace spanned by a small number of collective coordinates has been reported to be invariant in many cases, and almost independent of the treatment of the degrees of freedom, the solvent effect, the simulation length and initial conditions.5 On the other hand, some proteins are observed to visit two different substates along the first few principal axes.2,5,38 GNM does not include such bistable equilibria. It is of interest to establish the differences between GNM modes and those unraveled by PCA of MD trajectories by analyzing the same protein at the same level of resolution (single-site-per-residue). Another issue of interest is to compare the density of modes (distribution of frequencies) in the two methods. The present analysis will shed light on the differences between the two approaches, and indicate which dynamic features are overlooked and which are adequately described by the simple GNM approach.

MATERIALS, MODELS, AND METHODS

Structures

Tendamistat (Hoe 467A) is an α-amylase inhibitor of 74 residues. Residues 11–73 are folded into a β-structure of six strands I–VI (Fig. 1a). The topology of the protein resembles that of immunoglobulins' constant domains. However, tendamistat forms a six-stranded β-barrel as opposed to the seven strands of the immunoglobulins' constant domains. Tendamistat is indeed representative of a distinct fold, called the α-amylase inhibitor fold, being the only member of the family and superfamily of α-amylase inhibitors whose structure has been elucidated to date. The amino acids W18, R19, and Y20 in the loop connecting strands I and II have been suggested to be involved in binding to target mammalian α-amylases.39 The X-ray crystallographic structure determined at 2.0 Å resolution is used in simulations.39 This structure is closely superimposable on the solution structure determined by NMR,40,41 as shown in Figure 1b. The root-mean-square (rms) deviation between the α-carbon coordinates of the X-ray and NMR structures is 2.29 Å, and reduces to 1.02 Å if the highly flexible N-terminus (residues 1–10) is neglected.

Simulations

The MD simulations are performed using the GROMOS package with the set of energy parameters 37D4 in vacuum.42 Bond lengths are constrained to their equilibrium values using the SHAKE algorithm.43 This allows to