Computer Physics Communications 185 (2014) 2920–2929

GMXPBSA 2.0: A GROMACS tool to perform MM/PBSA and computational alanine scanning✩

C. Paissoni a, D. Spiliotopoulos a,1, G. Musco a, A. Spitaleri a,b,∗

a Biomolecular NMR Unit, S. Raffaele Scientific Institute, via Olgettina 58, Milan 20132, Italy
b Drug Discovery and Development, Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy

Article info / Abstract

Article history: Received 9 January 2014; Received in revised form 19 May 2014; Accepted 17 June 2014; Available online 2 July 2014

Keywords: simulation; Binding free energy; Virtual screening; GROMACS; Computational alanine scanning; MM/PBSA

GMXPBSA 2.0 is a user-friendly suite of Bash/Perl scripts for streamlining MM/PBSA calculations on structural ensembles derived from GROMACS trajectories, to automatically calculate binding free energies for protein–protein or ligand–protein complexes. GMXPBSA 2.0 is flexible and can easily be customized to specific needs. Additionally, it performs computational alanine scanning (CAS) to study the effects of ligand and/or receptor alanine mutations on the free energy of binding. Calculations require only protein–protein or protein–ligand MD simulations. GMXPBSA 2.0 performs different comparative analyses, including a posteriori generation of alanine mutants of the wild-type complex, calculation of the binding free energy values of the mutant complexes and comparison of the results with the wild-type system. Moreover, it compares the binding free energy of different complex trajectories, allowing the study of the effects of non-alanine mutations, post-translational modifications or unnatural amino acids on the binding free energy of the system under investigation. Finally, it can calculate and rank relative affinity to the same receptor utilizing MD simulations of proteins in complex with different ligands. In order to dissect the different MM/PBSA energy contributions, including molecular mechanics (MM), electrostatic contribution to solvation (PB) and nonpolar contribution to solvation (SA), the tool combines two freely available programs: the MD simulation package GROMACS and the Poisson–Boltzmann equation solver APBS. All the calculations can be performed in single or distributed automatic fashion on a cluster facility in order to speed up the calculation by dividing frames across the available processors.
The program is freely available under the GPL license.

Program summary

Program title: GMXPBSA 2.0
Catalogue identifier: AETQ_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AETQ_v1_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: GNU General Public License, version 3
No. of lines in distributed program, including test data, etc.: 185 937
No. of bytes in distributed program, including test data, etc.: 7 074 217
Distribution format: tar.gz
Programming language: Bash, Perl.
Computer: Any computer.
Operating system: Linux, Unix OS.
RAM: ∼2 GB

✩ This paper and its associated computer program are available via the Computer Physics Communication homepage on ScienceDirect (http://www.sciencedirect.com/science/journal/00104655).
∗ Corresponding author at: Drug Discovery and Development, Istituto Italiano di Tecnologia, Via Morego 30, Genoa 16163, Italy. Tel.: +39 3485188790. E-mail address: [email protected] (A. Spitaleri).
1 Present address: Computational Structural Biology, Biochemisches Institut, Universität Zürich, Winterthurerstrasse 190, CH-8057 Zürich, Switzerland.
http://dx.doi.org/10.1016/j.cpc.2014.06.019
0010-4655/© 2014 Elsevier B.V. All rights reserved.

Classification: 3.
External routines: APBS (http://www.Poissonboltzmann.org/apbs/) and GROMACS installations (http://www.gromacs.org). Optionally LaTeX.
Nature of problem: Calculates the Molecular Mechanics (MM) data (Lennard-Jones and Coulomb terms) and the solvation energy terms (polar and nonpolar terms respectively) from an ensemble of structures derived from GROMACS molecular dynamics simulation trajectory. These calculations are performed for each single component of the simulated complex, including protein and ligand. In order to cancel out artefacts an identical grid setup for each component, including complex, protein and ligand, is required. Performs statistical analysis on the extracted data and comparison with wild-type complex in case of either computational alanine scanning or calculations on a set of simulations. Evaluates possible outliers in the frames extracted from the simulations during the binding free energy calculations.
Solution method: The tool combines the freely available programs GROMACS and APBS to:
1. extract frames from a single or multiple complex molecular dynamics (MD) simulation, allowing comparison between multiple trajectories;
2. split the complex frames in the single components including complex, protein and ligand;
3. calculate the Lennard-Jones and Coulomb energy values (MM terms);
4. calculate the polar solvation energy values using the implicit solvation Poisson–Boltzmann model (PB);
5. calculate the nonpolar solvation energy value based on the solvent accessible surface area (SASA);
6. combine all the calculated terms into the final binding free energy value;
7. repeat the same procedure from point 1 to 6 for each simulation in case of computational alanine scanning (CAS) or simultaneous comparison of different MDs.
Restrictions: Input format files compatible with GROMACS engine 4.5 and later versions. Availability of the or of the topology files.
Running time: On a single core, Lennard-Jones, Coulomb and nonpolar solvation terms calculations require a few minutes. The time required for polar solvation terms calculations depends on the system size.
© 2014 Elsevier B.V. All rights reserved.
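The way the solution method combines per-frame terms into a binding free energy can be illustrated with a minimal Python sketch. It is not the GMXPBSA 2.0 implementation (the tool itself is written in Bash/Perl): the names `FrameTerms`, `nonpolar_energy`, `binding_energy`, `exclude_outliers` and `summarize` are hypothetical, the numbers are invented for illustration, and the use of the sample standard deviation is an assumption. The sketch assumes the nonpolar term is taken as γ·SASA + β with γ = 0.0227 kJ mol⁻¹ Å⁻² and β = 0, that the standard error is σ/√N, and that frames deviating from the average by more than two standard deviations are treated as outliers.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, stdev


@dataclass
class FrameTerms:
    """Energy terms (kJ/mol) of one component in one frame (hypothetical container)."""
    coulomb: float
    lennard_jones: float
    polar: float
    nonpolar: float

    def total(self) -> float:
        # MM terms plus polar and nonpolar solvation terms
        return self.coulomb + self.lennard_jones + self.polar + self.nonpolar


def nonpolar_energy(sasa, gamma=0.0227, beta=0.0):
    """Nonpolar solvation term as gamma * SASA + beta (SASA in square angstroms)."""
    return gamma * sasa + beta


def binding_energy(complex_, protein, ligand):
    """Per-frame binding energy: complex minus (protein plus ligand)."""
    return complex_.total() - (protein.total() + ligand.total())


def exclude_outliers(values, n_sigma=2.0):
    """Split values into kept values and indices of frames beyond n_sigma deviations."""
    avg, sd = mean(values), stdev(values)  # sample standard deviation: an assumption
    kept = [v for v in values if abs(v - avg) <= n_sigma * sd]
    dropped = [i for i, v in enumerate(values) if abs(v - avg) > n_sigma * sd]
    return kept, dropped


def summarize(values):
    """Return the average and the standard error, SE = sigma / sqrt(N)."""
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Invented numbers for a single frame, for illustration only
    cpx = FrameTerms(-100.0, -50.0, 30.0, -5.0)
    pro = FrameTerms(-60.0, -30.0, 20.0, -3.0)
    lig = FrameTerms(-10.0, -5.0, 4.0, -1.0)
    print(binding_energy(cpx, pro, lig))
```

In practice the per-frame values would be read from the GROMACS and APBS outputs; how those files are parsed is outside the scope of this sketch.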

1. Introduction

MM/PBSA is a versatile method to calculate the binding free energies of a protein–ligand complex [1]. It incorporates the effects of thermal averaging with a force field/continuum solvent model to post-process a series of representative snapshots from MD trajectories. MM/PBSA has been successfully applied to compute the binding free energy of numerous protein–ligand interactions [2–5]. The method expresses the free energy of binding as the difference between the free energy of the complex and the free energy of the receptor plus the ligand (end-state method). This difference is averaged over a number of trajectory snapshots [6]. Of note, the MM/PBSA approach allows for a rapid estimation of the variation in the free energy of binding, with the caveat that generally it does not reproduce the absolute binding free energy values. Nevertheless, it usually exhibits good correlations with experiments, thus representing a fair compromise between efficiency and efficacy for the calculation and comparison of binding free energy variations.

The theory underlying the MM/PBSA approach has been described previously [6]. Briefly, the binding free energy of a protein molecule to a ligand molecule in solution is defined as:

$$\Delta G_{\mathrm{binding}} = G_{\mathrm{complex}} - (G_{\mathrm{protein}} + G_{\mathrm{ligand}}). \tag{1}$$

An MD simulation is performed to generate a thermodynamically weighted ensemble of structures. The free energy term is calculated as an average over the considered structures:

$$\langle G \rangle = \langle E_{\mathrm{MM}} \rangle + \langle G_{\mathrm{solv}} \rangle - T \langle S_{\mathrm{MM}} \rangle. \tag{2}$$

The energetic term $E_{\mathrm{MM}}$ is defined as:

$$E_{\mathrm{MM}} = E_{\mathrm{int}} + E_{\mathrm{coul}} + E_{\mathrm{LJ}} \tag{3}$$

where $E_{\mathrm{int}}$ indicates bond, angle, and torsional angle energies, and $E_{\mathrm{coul}}$ and $E_{\mathrm{LJ}}$ denote the intramolecular electrostatic and Lennard-Jones energies, respectively.

The solvation term $G_{\mathrm{solv}}$ in Eq. (2) is split into polar, $G_{\mathrm{polar}}$, and nonpolar, $G_{\mathrm{nonpolar}}$, contributions:

$$G_{\mathrm{solv}} = G_{\mathrm{polar}} + G_{\mathrm{nonpolar}} \tag{4}$$

GMXPBSA 2.0 calculates $G_{\mathrm{polar}}$ and $G_{\mathrm{nonpolar}}$ with the Adaptive Poisson–Boltzmann Solver (APBS) program [7].

The polar contribution $G_{\mathrm{polar}}$ refers to the energy required to transfer the solute from a continuum medium with a low dielectric constant ($\varepsilon = 1$) to a continuum medium with the dielectric constant of water ($\varepsilon = 80$). $G_{\mathrm{polar}}$ is calculated using the non-linearized or linearized Poisson–Boltzmann equation. The nonpolar contribution $G_{\mathrm{nonpolar}}$ is considered proportional to the solvent accessible surface area (SASA):

$$G_{\mathrm{nonpolar}} = \gamma\, \mathrm{SASA} + \beta \tag{5}$$

where $\gamma = 0.0227\ \mathrm{kJ\,mol^{-1}\,\AA^{-2}}$ and $\beta = 0\ \mathrm{kJ\,mol^{-1}}$ [8]. The dielectric boundary is defined using a probe of radius 1.4 Å.

Binding free energy calculations based on the MM/PBSA approach can be performed either according to the three trajectories method (TTM) or to the single trajectory method (STM). The TTM requires three separate MD simulations on the three system components including the complex, the free ligand and the free receptor. This is a computationally demanding approach and prone to structural noise [3,5]. Conversely, the STM requires a single trajectory run for the complex, whereby both the protein and ligand structures are extracted directly from the complex structure [3],

Fig. 1. Workflow diagram for GMXPBSA 2.0. Diagram describing the general GMXPBSA 2.0 workflow scheme. GMXPBSA 2.0 combines the GROMACS and APBS programs in order to use the frames extracted from the molecular dynamics simulations and to calculate the binding free energy.

thus zeroing out the $E_{\mathrm{int}}$ term. In this case, the protein and the ligand are assumed to behave similarly in the bound and in the free forms. Recently, a useful Python program (MMPBSA.py) developed to perform MM/PBSA calculations on simulations from the AMBER MD suite has been presented [9]. In this context, similar tools tailored to perform post-processing end-state free energy calculations using APBS on MD trajectories would be extremely welcome in the GROMACS users’ community. However, despite the popularity of both GROMACS [10] and APBS [7], until now there was no freely available tool to automatically combine the two programs in order to use the GROMACS output directly as input for APBS binding free energy calculations. To facilitate the interface between the two programs, we previously wrote a series of Bash/Perl scripts to directly perform MM/PBSA calculations on structures generated by MD simulations [11].

Herein, we present an updated and revised version of the tool, GMXPBSA 2.0 (Fig. 1). One of the major upgrades is the automation of computational alanine scanning (CAS) calculations, which can be performed a posteriori directly on the wild-type trajectory. CAS can be performed by adopting two different approaches, depending on the objectives. In the first approach, a single mutation is performed in order to qualitatively evaluate the role/contribution of a single residue to the binding; in the second, a series of alanine mutations is simultaneously performed in order to investigate the contribution to binding of specific regions, such as binding pockets, protein–protein or protein–peptide interaction interfaces. In both cases, a selected amino acid or a set of amino acids is mutated into alanine, thus allowing a per-residue decomposition of the interactions. Under the assumption that the mutation will have negligible effects on the protein conformation, CAS can qualitatively highlight the importance of the electrostatic and steric nature of the original side chain. Furthermore, GMXPBSA 2.0 can simultaneously calculate the binding free energy for a set of protein–ligand trajectories and then compare the relative binding free energies. This feature is particularly useful when comparing the binding free energy values of a set of ligands versus the same receptor, or when analysing the effects of receptor non-alanine mutants. In the latter case, the user needs to perform a priori different simulations for each mutated protein in complex with the ligand to generate the corresponding trajectories. We have also introduced in GMXPBSA 2.0 the following improvements with respect to the previous version [11]:

1. control of the input and output options;
2. automatic setup and a posteriori CAS calculations;
3. CAS calculations on a single residue or on a set of residues simultaneously;
4. handling of multiple protein–ligand MD simulations to allow comparisons between different ligands;
5. handling of multiple protein–ligand MD simulations to allow comparisons (e.g. between wild-type complex and non-alanine mutants);
6. handling of APBS calculations on a multi core system (distributed calculations in cluster);
7. possibility to use custom van der Waals radii;
8. check and restart of the failed MM/PBSA calculations;
9. statistical analysis of the results.

2. Program usage

2.1. GMXPBSA 2.0 calculation workflow

GMXPBSA 2.0 is a user-friendly suite of Bash/Perl scripts that efficiently streamlines the setup procedure and the calculation of binding free energies for an ensemble of complex structures generated by the GROMACS MD engine. The program workflow (Figs. 1 and 2) consists of three different sequential steps:

1. gmxpbsa0.sh:
In this step, the tool exploits the gmxpbsa0.sh script to set up the system and to perform preliminary calculations including:

• check of the required input files and directories;
• extraction of the frames of the complex from the MD simulations, subsequently split in the protein and the ligand components by the GROMACS tools;
• calculation of the Coulomb energy contributions using either GROMACS tools or the ‘‘coulomb’’ program available in the APBS suite, and of the Lennard-Jones term using GROMACS.

If the computational alanine scanning (CAS) calculation is required, the script performs alanine mutations on the defined residues on every single extracted frame, removing the side chain atoms of the target residues up to the beta C atom (CB atom) and then recalculating the Coulomb and the Lennard-Jones energy contributions of the structure containing the alanine mutant. It generates the grid and the input to perform the APBS calculations for each frame of the simulation. The latter task is critical, since deletion of artefacts in the MM/PBSA calculation requires an exact matching of the grid setup between all the system components (complex, protein and ligand).

Fig. 2. Schematic diagram of the three GMXPBSA 2.0 calculation steps. Diagram showing the input files used by GMXPBSA 2.0 and the output files generated during each MM/PBSA step.

2. gmxpbsa1.sh:
In this step, the gmxpbsa1.sh script computes the polar and nonpolar solvation energy contributions using the APBS program. These calculations can be distributed on a cluster or on a multi core workstation.

3. gmxpbsa2.sh:
In this last step, the gmxpbsa2.sh script combines, for all the frames, the single terms $\langle E_{\mathrm{MM}} \rangle$ and $\langle G_{\mathrm{solv}} \rangle$ in order to calculate the final binding free energy value. It also checks and tries to fix errors and/or failures occurring in the preceding step 2 (APBS calculations). Statistical analysis is also performed, computing the average and the standard error (SE). The SE is calculated as $\mathrm{SE} = \sigma / \sqrt{N}$, where $\sigma$ is the standard deviation and $N$ is the number of structures (MD frames) used in the calculation. The average Coulomb and Lennard-Jones values and the polar and nonpolar solvation terms are calculated along each trajectory. If a value differs from the average by more than two standard deviations, it is considered an outlier and the corresponding frame is excluded from the final calculation. However, it is always possible to check for outlier frames, since their reference numbers are stored in the WARNING.dat file.

2.2. Installation and execution of the program

Once the source code of the program GMXPBSAtool.tar.gz has been downloaded the user should perform the following steps:

1. extract the source code in a user defined location, e.g. /home/myprogram/, by typing tar zxvf GMXPBSAtool.tar.gz; set the GMXPBSAHOME environment variable in bash: export GMXPBSAHOME=/home/myprogram/GMXPBSAtool; change /home/myprogram to whatever directory is appropriate for your machine; verify write permissions in the directory tree, and execute permissions for the gmxpbsa0.sh, gmxpbsa1.sh and gmxpbsa2.sh scripts. $GMXPBSAHOME should also be added to the PATH.
2. In order to perform MM/PBSA calculations, the user has to run the tool by typing $GMXPBSAHOME/

a MD simulation using GROMACS engine 4.5 or later versions. Before starting any GMXPBSA 2.0 calculations, the user should verify the convergence of the MD simulations, as lack of convergence might strongly compromise the reliability of the MM/PBSA results, as pointed out in [11]. Along with the simulation data, the user should edit the INPUT.dat file, defining all the options for the binding free energy calculations (see Section 2.4).

For each system under investigation MM/PBSA calculations require the following input files:

1. the trajectory file, with the mandatory name npt.xtc, describing the dynamics of the complex. We encourage the user to strip off the water from the trajectory to speed up calculations. The possible artefacts deriving from periodic boundary conditions (pbc) should be removed from the trajectory, using the trjconv GROMACS tool (-pbc whole or -pbc nojump or -pbc res is usually sufficient). The latter step is fundamental before carrying out the MM/PBSA calculations in order to remove possible broken molecules. The processed trajectory can be checked using a molecular visualizer before performing GMXPBSA calculations.
2. the portable binary run input file, with the mandatory name npt.tpr. This file contains the information on mass, charges and force field parameters used in the MD simulations.
3. the index file, with the mandatory name index.ndx. This file contains the groups used in the simulations. Three groups are compulsory in order to run GMXPBSA 2.0: the complex, containing the atom indices of the complex (union of the receptor and ligand atoms), the receptor, containing the atom indices of the receptor, and the ligand, containing the atom indices of the ligand. The three group index names can be chosen by the user.

The three files, npt.xtc, npt.tpr, and index.ndx, are placed in a directory, whose name will be referred to as root in the INPUT.dat file. Additional files should be present in the root directory in case the MD simulation has been carried out using either a custom GROMACS force field (e.g. including modified amino acid) or custom topologies (i.e. ligand). See Section 2.4.2 for further