Introduction to Molecular Dynamics with GROMACS Molecular Modeling Course 2007
Simulation of Lysozyme in Water
Erik Lindahl ([email protected])

Summary

Molecular simulation is a very powerful toolbox in modern molecular modeling, and enables us to follow and understand structure and dynamics in extreme detail, literally on scales where the motion of individual atoms can be tracked. This task will focus on the two most commonly used methods, namely energy minimization and molecular dynamics, which respectively optimize structure and simulate the natural motion of biological macromolecules.

We are first going to set up your Gromacs environments, have a look at the structure, prepare the input files necessary for simulation, solvate the structure in water, minimize & equilibrate it, and finally perform a short production simulation. After this we'll test some simple analysis programs that are part of Gromacs. You should write a brief report about your work and describe your findings. In particular, try to think about why you are using particular algorithms/choices, and whether there are alternatives.

Molecular motions occur over a wide range of scales in both time and space, and the choice of approach to study them depends on the question asked. Molecular simulation is far from the only available method, and when the aim is, for example, to predict the structure of a protein, it is often more efficient to use bioinformatics than to spend thousands or millions of CPU hours. Ideally, the time-dependent Schrödinger equation should be able to predict all properties of any molecule with arbitrary precision ab initio. However, as soon as more than a handful of particles are involved it is necessary to introduce approximations. For most biomolecular systems we therefore choose to work with empirical parameterizations of models instead, for instance classical Coulomb interactions between pointlike atomic charges rather than a quantum description of the electrons.
These models are not only orders of magnitude faster, but since they have been parameterized from experiments they also perform better when it comes to reproducing observations on the microsecond scale (Fig. 1), rather than extrapolating quantum models 10 orders of magnitude. The first molecular dynamics simulation was performed as late as 1957, although it was not until the 1970s that it was possible to simulate water and biomolecules.

Background & Theory

Macroscopic properties measured in an experiment are not direct observations, but averages over billions of molecules representing a statistical mechanics ensemble. This has deep theoretical implications that are covered in great detail in the literature, but even from a practical point of view there are important consequences:

(i) It is not sufficient to work with individual structures; systems have to be expanded to generate a representative ensemble of structures at the given experimental conditions, e.g. temperature and pressure.
(ii) Thermodynamic equilibrium properties related to free energy, such as binding constants, solubilities, and relative stability, cannot be calculated directly from individual simulations, but require more elaborate techniques.
(iii) For equilibrium properties (in contrast to kinetic ones) the aim is to examine structure ensembles, and not necessarily to reproduce individual atomic trajectories!

The two most common ways to generate statistically faithful equilibrium ensembles are Monte Carlo and Molecular Dynamics simulations. Monte Carlo simulations rely on designing intelligent moves to generate new conformations, but since such moves are fairly difficult to invent, most simulations instead use classical dynamics with Newton's equations of motion, which also has the advantage of accurately reproducing the kinetics of non-equilibrium properties such as diffusion or folding times.
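To make the idea of an equilibrium ensemble generated by Monte Carlo moves a little more concrete, here is a minimal sketch in Python. It is not part of Gromacs and not the method used in this tutorial. It assumes a single particle in a one-dimensional harmonic potential in reduced units, and the function name `metropolis_harmonic` and all parameter values are hypothetical choices made for illustration. The "move" is a small random displacement, accepted or rejected with the standard Metropolis criterion, and the quantity of interest is an ensemble average rather than any individual configuration.

```python
import math
import random

def metropolis_harmonic(n_steps, kT=1.0, k_spring=1.0, max_move=1.0, seed=0):
    """Sample x from exp(-V(x)/kT) with V(x) = 0.5*k_spring*x^2 and return <x^2>.

    Toy model in reduced units (an assumption for illustration only).
    """
    rng = random.Random(seed)
    x = 0.0
    energy = 0.0
    sum_x2 = 0.0
    for _ in range(n_steps):
        # Trial move: a small random displacement of the particle
        x_new = x + rng.uniform(-max_move, max_move)
        e_new = 0.5 * k_spring * x_new * x_new
        # Metropolis criterion: always accept downhill moves,
        # accept uphill moves with probability exp(-dE/kT)
        if e_new <= energy or rng.random() < math.exp(-(e_new - energy) / kT):
            x, energy = x_new, e_new
        # The current state (moved or not) contributes to the ensemble average
        sum_x2 += x * x
    return sum_x2 / n_steps

mean_x2 = metropolis_harmonic(200_000)
print(f"<x^2> = {mean_x2:.3f} (equipartition predicts kT/k = 1)")
```

The average converges towards kT/k even though the sequence of accepted moves has no physical meaning as a trajectory, which is the sense in which Monte Carlo reproduces ensembles but not kinetics.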
When a starting configuration is very far from equilibrium, large forces can cause the simulation to crash or distort the system, and for this reason it is usually necessary to start with an Energy Minimization of the system prior to the molecular dynamics simulation. In addition, energy minimizations are commonly used to refine low-resolution experimental structures.

All classical simulation methods rely on more or less empirical approximations called Force fields to calculate interactions and evaluate the potential energy of the system as a function of point-like atomic coordinates. A force field consists of both the set of equations used to calculate the potential energy and forces from particle coordinates, as well as a collection of parameters used in the equations. For most purposes these approximations work great, but they cannot reproduce quantum effects like bond formation or breaking.

Fig. 1: Time scales of chemical/biological processes (from bond length vibrations at about 10^-15 s up to "biology" at about 10^3 s) and current simulation capabilities.

All common force fields subdivide potential functions into two classes. Bonded interactions cover covalent bond-stretching, angle-bending, torsion potentials when rotating around bonds, and out-of-plane "improper torsion" potentials, all of which are normally fixed throughout a simulation (see Fig. 2). The remaining nonbonded interactions consist of Lennard-Jones repulsion and dispersion as well as electrostatics. These are computed from neighbor lists updated every 5-10 steps.

Given the potential and force (negative gradient of potential) for all atoms, the coordinates are updated for the next step.
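As an illustration of how a force field turns coordinates into energies and forces, the following Python sketch evaluates the nonbonded interaction for a single pair of atoms: a Lennard-Jones term plus a Coulomb term between point charges. This is only a sketch under stated assumptions, not Gromacs code. The function names are hypothetical, the charges and Lennard-Jones parameters are made-up illustrative values rather than taken from any real force field, and the units (nm, kJ/mol, elementary charges) are assumed to follow the usual MD convention. The numerical check at the end confirms that the force is the negative gradient of the potential.

```python
# Electric conversion factor 1/(4*pi*eps0) in kJ mol^-1 nm e^-2 (MD unit convention)
COULOMB_CONST = 138.935458

def pair_energy(r, q_i, q_j, sigma, epsilon):
    """Nonbonded energy (kJ/mol) of two point-like atoms at distance r (nm)."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 * sr6 - sr6)  # repulsion + dispersion
    coulomb = COULOMB_CONST * q_i * q_j / r            # point-charge electrostatics
    return lennard_jones + coulomb

def pair_force(r, q_i, q_j, sigma, epsilon):
    """Radial force F = -dV/dr (kJ mol^-1 nm^-1); positive means repulsive."""
    sr6 = (sigma / r) ** 6
    lennard_jones = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r
    coulomb = COULOMB_CONST * q_i * q_j / (r * r)
    return lennard_jones + coulomb

# Hypothetical parameters, chosen only for illustration
r, q_i, q_j, sigma, epsilon = 0.35, -0.8, 0.4, 0.3, 0.65

# Compare the analytic force with a central finite difference of the energy
h = 1e-6
numeric_force = -(pair_energy(r + h, q_i, q_j, sigma, epsilon)
                  - pair_energy(r - h, q_i, q_j, sigma, epsilon)) / (2.0 * h)
analytic_force = pair_force(r, q_i, q_j, sigma, epsilon)
print(f"analytic: {analytic_force:.4f}  numeric: {numeric_force:.4f}")
```

In a real simulation this evaluation is repeated for every pair in the neighbor list at every step, which is why the nonbonded part dominates the cost.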
For energy minimization, the steepest descent algorithm simply moves each atom a short distance in the direction of decreasing energy, while molecular dynamics is performed by integrating Newton's equations of motion:

$$F_i = -\frac{\partial V(r_1, \ldots, r_N)}{\partial r_i}$$

$$m_i \frac{\partial^2 r_i}{\partial t^2} = F_i$$

The updated coordinates are then used to evaluate the potential energy again, as shown in the flowchart of Fig. 3.

Typical biomolecular simulations apply periodic boundary conditions to avoid surface artifacts, so that a water molecule that exits to the right reappears on the left; if the box is sufficiently large the molecules will not interact significantly with their periodic copies. This is intimately related to the nonbonded interactions, which ideally should be summed over all neighbors in the resulting infinite periodic system. Simple cut-offs can work for Lennard-Jones interactions that decay very rapidly, but for Coulomb interactions a sudden cut-off can lead to large errors. One alternative is to "switch off" the interaction before the cut-off as shown in Fig. 4, but a better option is to use Particle-Mesh-Ewald summation (PME) to calculate the infinite electrostatic interactions by splitting the summation into short- and long-range parts. For PME, the cut-off only determines the balance between the two parts, and the long-range part is treated by assigning charges to a grid that is solved in reciprocal space through Fourier transforms.

Cut-offs and rounding errors can lead to drifts in energy, which will cause the system to heat up during the simulation. To control this, the system is normally coupled to a thermostat that scales velocities during the integration to maintain the target temperature (here room temperature). Similarly, the total pressure in the system can be adjusted through scaling the simulation box size, either isotropically or separately in x/y/z dimensions.
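The two update rules and the periodic wrapping described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not what Gromacs does internally: it uses one particle in a one-dimensional harmonic potential in reduced units, all function names are hypothetical, and velocity Verlet is chosen here simply as one common way to integrate Newton's equations (the text above does not specify the integrator).

```python
def force(x, k_spring=1.0):
    """F = -dV/dx for the toy potential V(x) = 0.5*k_spring*x^2."""
    return -k_spring * x

def steepest_descent(x, step=0.1, n_steps=200):
    """Energy minimization: repeatedly move a short distance along the force."""
    for _ in range(n_steps):
        x += step * force(x)
    return x

def velocity_verlet(x, v, mass=1.0, dt=0.01, n_steps=1000):
    """Molecular dynamics: integrate m * d2x/dt2 = F with velocity Verlet."""
    f = force(x)
    for _ in range(n_steps):
        v += 0.5 * dt * f / mass   # half-step velocity update with old force
        x += dt * v                # full-step position update
        f = force(x)               # re-evaluate the force at the new position
        v += 0.5 * dt * f / mass   # second half-step velocity update
    return x, v

def wrap(x, box):
    """Periodic boundary: a particle leaving on the right re-enters on the left."""
    return x % box

def minimum_image(dx, box):
    """Shortest separation between two particles in a periodic box."""
    return dx - box * round(dx / box)

x_min = steepest_descent(1.0)
x_md, v_md = velocity_verlet(1.0, 0.0)
total_energy = 0.5 * v_md ** 2 + 0.5 * x_md ** 2
print(f"minimized x = {x_min:.2e}, MD total energy = {total_energy:.6f}")
```

Minimization drives the particle to the bottom of the well, while the dynamics keeps the total energy (kinetic plus potential) close to its starting value of 0.5. The small residual deviation depends on the time step, which is the kind of drift a thermostat is meant to control in longer simulations.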
The single most demanding part of simulations is the computation of nonbonded interactions, since millions of pairs have to be evaluated for each time step. Extending the time step is thus an important way to improve simulation performance, but unfortunately errors are introduced in bond vibrations already at 1 fs. However, in most simulations the bond vibrations are not of interest per se, and can be removed entirely by introducing bond constraint algorithms (e.g. SHAKE or LINCS). Constraints make it possible to extend time steps to 2 fs, and in addition the fixed-length bonds are actually better approximations of the quantum mechanical ground state than harmonic springs!

Fig. 2: Typical molecular mechanics interactions used in Gromacs.

The purpose of this tutorial is not to master all parts of Gromacs' simulation and analysis tools in detail, but rather to give an overview and "feeling" for the typical steps used in practical simulations. Since the time available for this exercise is rather limited we will focus on a sample simulation system and perform some simplified analyses; in practice you would typically use one or several weeks for the production simulation.

In principle, the most basic system we could simulate would be water or even a gas like Argon, but to show some of the capabilities of the analysis tools we will work with a protein: Lysozyme. This is a fascinating enzyme that has the ability to kill bacteria (kind of the body's own antibiotic), and is present e.g. in tears, saliva, and egg white. It was discovered by Alexander Fleming in 1922, and was one of the first protein X-ray structures to be determined (David Phillips, 1965). The two sidechains Glu35 and Asp52 are crucial for the enzyme functionality. It is fairly small by protein standards (although intermediate to large by simulation standards...)
with 164 residues and a molecular weight of 14.4 kDa. Including hydrogens it consists of 2890 atoms (see illustration on front page), although the hydrogens are not present in the PDB structure since they scatter X-rays only weakly (few electrons).

In the tutorial, we are first going to set up your Gromacs environments (might already be done for you), have a look at the structure, prepare the input files necessary for simulation, solvate the structure in water, minimize & equilibrate it, and finally perform a short production simulation. After this we'll test some simple analysis programs that are part of Gromacs.

Setting up Gromacs & other programs

Before starting the actual work, we need a couple of programs that might already be installed on your system.