Study of a Highly Accurate and Fast Protein-Ligand Docking Algorithm Based on Molecular Dynamics
M. Taufer (1,2,3), M. Crowley (2), D. Price (2), A.A. Chien (1), C.L. Brooks III (2,3)

(1) Dept. of Computer Science and Engineering, University of California at San Diego, La Jolla, California 92093, U.S. [email protected]
(2) Dept. of Molecular Biology (TPC6), The Scripps Research Institute, La Jolla, California 92037, U.S. taufer,crowley,priced,[email protected]
(3) Center for Theoretical Biological Physics, La Jolla, California 92093, U.S.

Abstract

Few methods use molecular dynamics simulations based on atomically detailed force fields to study the protein-ligand docking process because they are considered too time demanding despite their accuracy. In this paper we present a docking algorithm based on molecular dynamics simulations which has a highly flexible computational granularity. We compare its accuracy and time requirements with those of well-known, commonly used docking methods like AutoDock, DOCK, FlexX, ICM, and GOLD. We show that our algorithm is accurate, fast and, because of its flexibility, applicable even to loosely coupled distributed systems like desktop grids for docking.

Keywords: Force field based methods, docking accuracy, desktop grid computing.

1 Introduction

A vast number of the essential roles that proteins play require small molecules to bind to specific spots in the protein structure. For instance, the small molecules can act as switches to turn on or off a protein function, or are the substrates for the particular chemical reaction that a protein enzyme catalyzes. Obtaining the atomic level details of the protein-ligand interactions is a valuable tool in the development of novel pharmaceuticals. Conventional experimental techniques for obtaining detailed structural information about protein-ligand complexes are time and resource intensive. As a result, much research effort has been focused on computational methods for the prediction of this difficult-to-obtain structural information. In general, this process is called docking.

Current docking algorithms typically use a fast, simplified scoring function to direct the conformational search and select the best structures. However, recent work has demonstrated that there are significant inaccuracies associated with these algorithms [1]. Furthermore, there are indications that inaccuracies can be reduced by using algorithms that use more sophisticated physics. For example, CDOCKER, a docking algorithm based on molecular dynamics (MD) and a conventional molecular mechanics force field, indeed provides better accuracy than other methods. However, it is still among the more compute-intensive methods. Our aim is to adapt the CDOCKER method to improve its performance without sacrificing the accuracy. In particular, we would like to take advantage of advances in computer technologies and the establishment of new distributed architectures, such as desktop grids.

Desktop grids provide a viable and inexpensive solution to the hitherto uncompetitive computational cost and time for the force field methods. New algorithms with finer computational granularity need to be developed, especially algorithms more suitable for the highly volatile nodes of desktop grids. In this paper we present an algorithm for the docking process based on MD simulations as in [2], but characterized by a higher flexibility that makes it adaptable to any computing platform, even to very challenging desktop grids.

Docking many ligands to the same protein followed by scoring them for their relative strength of interaction has been proposed as a procedure to identify candidates for drug development. Screening large databases of compounds in this manner can potentially provide an alternative to conventional high-throughput screening, but it is not cost-effective unless the docking algorithm is fast and accurate.

Most of the well-known, commonly used docking methods that are not based on MD were compared and analyzed in detail in [3]. We validate the accuracy of our algorithm by applying the tests defined in [3] to our method, and compare to the published results for the following other methods: AutoDock [4], DOCK [5], FlexX [6], ICM [7], and GOLD [8]. We show that our algorithm is indeed more accurate than all other methods except ICM. We reach a docking success rate of over 70%, confirming the accuracy reported in [2]. The time required for running a complete docking attempt is longer but comparable with the time of the other methods. The fine computational granularity of our algorithm is trivially parallel and each simulation attempt is decomposable into independent sub-jobs. This flexibility makes our accurate docking simulations fast when there are many independent compute nodes, and thus, applicable to a wide range of platforms from traditional supercomputers to loosely coupled distributed systems like desktop grids.

In Section 2 we present our docking protocol as well as some well-known and commonly used docking algorithms that we will compare with our method. In Section 3 we define the metrics of accuracy and time that we use to validate our method while in Section 4 we compare our method with the other docking algorithms based on those metrics. Finally, in Section 5, we discuss the applicability of our docking protocol to desktop grids, and future work coming out of this research.

2 The MD-based Docking Algorithm

2.1 The CHARMM Scientific Computation Code

We use CHARMM to perform MD simulations and investigate the protein-ligand docking process. CHARMM is a program for simulating biologically relevant macromolecules (proteins, DNA, RNA) and complexes thereof [9]. It allows the investigation of the structure and dynamics of large molecules in solvent or crystals. CHARMM can be used to calculate free energy differences upon mutations or ligand binding [10]. One of the most common applications of CHARMM is MD, in which the Newtonian equations of motion are discretized and solved numerically by an integration procedure such as the Verlet algorithm. The force on the atoms is the negative gradient of the CHARMM potential energy function [11].

2.2 Modeling Protein-Ligand Interactions

Advances in energy calculation techniques make it viable to use a grid-based representation of the protein-ligand potential interactions to calculate our scoring function. A grid potential allows us to represent a rigid protein binding site as a potential field magnitude at each grid point. Protein interactions with the ligand in the binding site are interpolated from the interaction strength of the grid points near each atom of the ligand, rather than from computing the interactions of all ligand atoms with all protein atoms individually, resulting in orders of magnitude fewer floating-point computations than in the traditional molecular mechanics method.

In a preliminary phase of the docking simulations, we calculate three-dimensional grid maps for each of the 20 potential atom types composing the ligands under investigation. Each grid map consists of a three-dimensional lattice of regularly spaced points surrounding and centered on the active site of a protein. Each point within the grid map stores the potential energy of a 'probe' atom due to its interaction with the macromolecule. For example, in a carbon grid map, the value associated with a grid point represents the potential energy of a carbon atom at that location due to its interactions with all atoms of the protein receptor. We have chosen a grid spacing of 1 Å based on previous work that showed no significant differences in docking accuracy for grid spacings between 0.25 Å and 1 Å [2].

To facilitate the penetration of small ligands into the protein sites and allow larger configurational changes, van der Waals (vdW) and electrostatic potentials with soft core repulsions [12] were utilized instead of the traditional potentials. A soft core repulsion reduces the potential barrier at vanishing interatomic distances to a finite limit. In this case, ligands can pass between conformational minima with a relatively small potential barrier that would normally be infinite and impassable with an unmodified potential.

2.3 MD Docking Protocol

As in most of the existing methods, we model the protein-ligand complex as composed of a rigid protein structure and a flexible ligand. A flexible ligand has three translational degrees of freedom, three rotational degrees of freedom and one dihedral rotation for each rotatable bond. The docking search is computed over a 6+n dimensional space where n is the number of rotatable bonds in the ligand.

Figure 1 shows the MD-based algorithm used for our docking simulations. One loop constitutes a docking trial. Given a protein and a ligand to dock into its binding site (a so-called protein-ligand complex), a docking attempt consists of a sequence of m independent trials. For each trial, a random configuration for the ligand is generated by running 1000 steps of MD at the constant temperature of 1000 K in vacuum, starting from a reasonable structure with random initial velocities on each ligand atom. We have analyzed the distribution of torsional angles generated by this method, and found that they indeed vary randomly over the physically reasonable range for each rotatable bond. We are confident that our initial configurations randomly sample the available configurational space of the ligand.

[Figure 1. Our MD-based protein-ligand docking algorithm. Flowchart boxes: Model Protein Ligand Interactions; Alter Ligand Configuration and Orientations; Dock Ligand into Active Protein Site; MD Simulation; Energy Minimization; Calculate Score; Evaluate Candidate Solutions. Label: Docking Trial.]

terms, some of which reflect the short-range vdW in-

Starting from the new ligand configuration, a set of 10 different orientations is randomly generated and docked into the receptor, that is, moved into the center of the grid.
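Section 2.1 states that MD discretizes the Newtonian equations of motion and solves them with an integrator such as the Verlet algorithm, with the force given by the negative gradient of the potential energy. The following is a minimal sketch of that idea, not CHARMM code. It assumes the velocity Verlet variant, a single particle in one dimension, and a toy harmonic potential U(x) = 0.5*k*x**2 in place of the CHARMM energy function. All function names are hypothetical.

```python
def force(x, k=1.0):
    """Force as the negative gradient of the toy potential U(x) = 0.5*k*x**2."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


def velocity_verlet(x, v, dt, n_steps, mass=1.0, k=1.0):
    """Integrate Newton's equations with the velocity Verlet scheme.

    Returns a list of (position, velocity) pairs, including the initial state.
    """
    f = force(x, k)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x, k)                    # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory
```

With a small time step the total energy of this toy oscillator stays close to its initial value. Near-conservation of energy over long runs is one of the usual reasons for choosing Verlet-type integrators in MD.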
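The grid-based scoring of Section 2.2 can be sketched as follows: a probe potential is tabulated once on a regular lattice, and the energy of each ligand atom is then interpolated from the surrounding grid points. The text does not say which interpolation scheme is used, so this sketch assumes trilinear interpolation. The helper names and the dictionary of per-atom-type grids are illustrative assumptions, not part of the published method.

```python
def make_grid(origin, spacing, shape, potential):
    """Tabulate potential(x, y, z) on a regular lattice as nested lists [i][j][k]."""
    ox, oy, oz = origin
    nx, ny, nz = shape
    return [[[potential(ox + i * spacing, oy + j * spacing, oz + k * spacing)
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]


def interpolate(grid, origin, spacing, point):
    """Trilinear interpolation of the tabulated value at an off-lattice point."""
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    cell, frac = [], []
    for p, o, n in zip(point, origin, dims):
        t = (p - o) / spacing              # position in units of grid cells
        if t < 0 or t > n - 1:
            raise ValueError("point lies outside the grid")
        i = min(int(t), n - 2)             # lower corner of the enclosing cell
        cell.append(i)
        frac.append(t - i)
    (i, j, k), (fx, fy, fz) = cell, frac
    value = 0.0
    # Weighted sum over the eight corners of the enclosing cell.
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                value += wx * wy * wz * grid[i + di][j + dj][k + dk]
    return value


def ligand_energy(grids, origin, spacing, atoms):
    """Sum interpolated energies; atoms is a list of (atom_type, (x, y, z))."""
    return sum(interpolate(grids[atom_type], origin, spacing, xyz)
               for atom_type, xyz in atoms)
```

The cost per ligand atom is eight grid lookups regardless of the number of protein atoms. This is the source of the savings over summing all ligand-protein atom pairs.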
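To illustrate the effect of a soft core repulsion as described in Section 2.2, the sketch below compares a standard Lennard-Jones potential with a softened variant. The softened form is a generic illustration chosen for simplicity, shifting r**6 by delta*sigma**6, and is not the functional form of [12]. The parameters epsilon, sigma and delta are hypothetical.

```python
def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones potential; diverges as r approaches 0."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def soft_core_lj(r, epsilon=1.0, sigma=1.0, delta=0.5):
    """Illustrative soft-core variant: r**6 is shifted by delta * sigma**6.

    At r = 0 the energy is finite: 4 * epsilon * (1 / delta**2 - 1 / delta).
    """
    s6 = sigma ** 6 / (r ** 6 + delta * sigma ** 6)
    return 4.0 * epsilon * (s6 * s6 - s6)
```

With these default parameters the softened potential is capped at 8*epsilon at zero separation, while the unmodified potential grows without bound. At larger separations the two curves nearly coincide, so the softening mainly affects close contacts.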
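The structure of a docking attempt in Section 2.3, with m independent trials each using 10 random orientations, lends itself to decomposition into independent sub-jobs. The sketch below is one hypothetical way to plan such sub-jobs and spread them over workers. It assumes one sub-job per trial, seeds drawn from Python's `random` module, and round-robin assignment. None of these choices are taken from the paper.

```python
import random


def search_dimensions(n_rotatable_bonds):
    """Three translations, three rotations, one dihedral per rotatable bond."""
    return 6 + n_rotatable_bonds


def plan_attempt(complex_id, m_trials, n_orientations=10, seed=0):
    """Describe one docking attempt as a list of independent per-trial sub-jobs."""
    rng = random.Random(seed)  # reproducible seeds for every trial
    jobs = []
    for trial in range(m_trials):
        jobs.append({
            "complex": complex_id,
            "trial": trial,
            # Seed for the random ligand configuration of this trial.
            "config_seed": rng.randrange(2 ** 31),
            # One seed per random orientation docked in this trial.
            "orientation_seeds": [rng.randrange(2 ** 31)
                                  for _ in range(n_orientations)],
        })
    return jobs


def assign_round_robin(jobs, n_workers):
    """Distribute sub-jobs over workers; no job depends on any other."""
    buckets = [[] for _ in range(n_workers)]
    for index, job in enumerate(jobs):
        buckets[index % n_workers].append(job)
    return buckets
```

Because each sub-job carries its own seeds, a job lost on a volatile node could simply be reissued to another worker without affecting the remaining trials.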