
Understanding Chemistry and Biochemistry with Conceptual Models

EVB Tutorial

Fernanda Duarte (a), Miha Purg (b)

a) EaStCHEM School of Chemistry, University of Edinburgh, Edinburgh EH9 3FJ, UK; * [email protected]; http://fduartegroup.org
b) Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden; * [email protected]; https://kamerlinlab.com

1. OVERVIEW
1.1 REQUIREMENTS & SOFTWARE
1.2. USEFUL REFERENCES
2. EVB THEORY
2.1 CHEMICAL REACTIONS WITH EVB
2.2 OBTAINING FREE ENERGIES
3. CASE STUDIES
3.1 SN2 REACTION
3.2. DFPASE
4. ADDENDUM
4.1 SOFTWARE INSTALLATION

1. Overview

The aim of this tutorial is to provide a general overview of how to perform Free Energy Perturbation/Umbrella Sampling (FEP/US) Empirical Valence Bond (EVB) simulations of chemical reactions using the software Q [5,7]. Due to the complexity of the task and time constraints, this tutorial covers only the crucial aspects of the general procedure: topology generation, FEP file generation, FEP/US simulations and group contribution calculations. Other aspects, such as parameterization, structure processing, substrate docking, setting protonation states and equilibration, are not covered, and the reader is encouraged to master them before using the approach in a scientific project. Below, we provide some references that will help the reader gain a deeper understanding of the complete protocol. By the end of this tutorial, you should understand the fundamental concepts behind EVB calculations and the structure of the Q program, and be able to set up and analyze FEP/US simulations yourself.


1.1 Requirements & Software

A functional understanding of the Unix/Linux environment, and specifically the command line, is a prerequisite for successfully completing this tutorial. We therefore strongly recommend that those new to Unix/Linux acquaint themselves with the basics beforehand (example tutorial: http://www.ee.surrey.ac.uk/Teaching/Unix/unix1.html). Instructions for accessing the remote computer cluster will be provided separately.

All FEP/US EVB simulations will be performed using the software package Q. This software allows the calculation of properties such as solvation free energies, binding affinities and reaction free energies, the last of which is the topic of this tutorial. Q is developed and maintained by the Kamerlin and Åqvist groups at Uppsala University. It is open-source software (GPLv2), available free of charge from the online repository GitHub. The Q manual will be provided with the tutorial material. Please see the Addendum for references and installation instructions.

To complement the bare-bones nature of Q and make our lives a bit easier, we will also use the Python scripts provided in the software package Qtools. These scripts simplify the tasks required to carry out this tutorial: parameter conversion, input generation, simulation analysis, etc. Again, please see the Addendum for installation instructions.

The final, but crucial, software requirement is VMD (Visual Molecular Dynamics), which allows us to visualize and analyze our simulations. VMD can be obtained free of charge (registration required) here: http://www.ks.uiuc.edu/Research/vmd/. It will also be installed on the node.

The instructors will provide you with access to a computer cluster where you will submit your calculations, as well as with the required material for this tutorial. Commands throughout the tutorial are preceded by a dollar sign ($), and comments by a hash (#). Make sure you are able to connect to the cluster, and then copy the provided material to your remote account, e.g.:

$ ssh [email protected] # Password: c5bS2A7wNH

$ cp -r qmaterial YOURNAME # replace YOURNAME with your name

# load the required software
$ source /home/evb/bin/q_evb/load.sh

1.2. Useful References

[1] A. Warshel, R. M. Weiss, J. Am. Chem. Soc. 1980, 102, 6218.
[2] J. Åqvist, A. Warshel, Chem. Rev. 1993, 93, 2523.
[3] G. Hong, E. Rosta, A. Warshel, J. Phys. Chem. B 2006, 110, 19570.


[4] A. Warshel, Computer Modeling of Chemical Reactions in Enzymes and Solutions, Wiley, New York, 1991.
[5] J. Marelius, K. Kolmodin, I. Feierberg, J. Åqvist, J. Mol. Graph. Model. 1998, 16, 213.
[6] P. Bauer, A. Barrozo, B. Amrein, M. Esguerra, P. B. Wilson, D. T. Major, J. Åqvist, S. C. L. Kamerlin, SoftwareX 2018. https://github.com/qusers/Q6
[7] F. Duarte, B. A. Amrein, D. Blaha-Nelson, S. C. L. Kamerlin, BBA - General Subjects 2015, 1850, 954.
[8] F. Duarte, S. C. L. Kamerlin (Eds.), From Physical Chemistry to Chemical Biology: Theory and Applications of the Empirical Valence Bond Approach, John Wiley & Sons, 2017.
[9] M. Purg, S. C. L. Kamerlin, in review in Methods in Enzymology (Vol. 607: Phosphatases).
[10] M. Purg, M. Elias, S. C. L. Kamerlin, J. Am. Chem. Soc. 2017, 139, 17533.
[11] I. Muegge, H. Tao, A. Warshel, Protein Engineering 1997, 10, 1363.

2. EVB Theory

The EVB approach uses a fully classical description of the different valence bond (VB) configurations along a chemical reaction. The advantages of this approach are that it is fast, allowing for extensive conformational sampling, and that it gives a quantitative description of the effect of different environments on the activation free energy. In practice, this means comparing a chemical reaction in the gas phase with the same reaction in solution, a reaction in aqueous solution with that in an enzyme (i.e. the catalytic power), or even a reaction in the wild-type enzyme with that in a mutant variant. The gas-phase or solution reaction is usually called the “reference reaction”, as it is used to calibrate the EVB potential to reproduce experimental or ab initio data.

Figure 1. SN2 reaction between 1-bromopropane and chloride ion. Reactive region is colored in black and surroundings in grey.

To make the model transferable between different environments, the system is split into two regions: the reactive region (also known as the Q or EVB region), which contains all the chemically relevant groups involved in the reaction, and the surrounding region, which contains the rest of the system (Figure 1). Note that the reactive region is treated in exactly the same manner in all media. The Hamiltonian can thus be written as the sum of the interactions within the reactive region (Hr), within the surrounding region (Hs), and between the two regions (Hrs) [1,2]:


$$H_{tot} = H_{r} + H_{rs} + H_{s} \qquad (1)$$

2.1 Chemical Reactions with EVB

We will describe the chemical reaction studied here using a simple two-state EVB model, in which the system is described by considering only the two lowest-energy VB states. These states have a direct physical meaning: they describe the diabatic states of the reactants and products (Figure 2). The wavefunction is then:

$$\Psi = c_1 \psi_1 + c_2 \psi_2 \qquad (2)$$

where indices 1 and 2 denote the reactant and product VB states, respectively. By solving the secular equation, we obtain, as the lowest eigenvalue, the analytical EVB ground-state potential energy function [1] (Figure 2A):

$$E_g = \frac{1}{2}\left[ H_{11} + H_{22} - \sqrt{(H_{11} - H_{22})^2 + 4 H_{12}^2} \right] \qquad (3)$$

where the matrix elements H11, H22 and H12 are somewhat complex integrals of the form $H_{ij} = \int \psi_i^{*} \hat{H} \psi_j \, d\tau$. It is at this point that we introduce the empirical part of Empirical Valence Bond, by approximating H11 and H22 with analytical molecular mechanics (MM) potential functions, acknowledging that H11 and H22 have a clear physical meaning: they represent the energies of the two states. With the exception of reactive bonds, which are modeled with Morse potentials, the interactions in the individual states are described using typical force-field functions (harmonic potentials, Coulomb, Lennard-Jones, etc.).

Figure 2: A) Schematic representation of the potential energy functions Eg, H11, H22 and H12. B) Example of configurational sampling of Eg using the biased potential Em(λ).

The off-diagonal element H12 is the quantum coupling of states, and unlike the diagonal counterparts, does not have a classical analogy, and is thus typically approximated by an exponential or Gaussian function, or as in our case, by a constant value. It has been shown to be independent of the environment [3].
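As a minimal sketch of Eq. 3 with a constant H12, the following Python snippet evaluates the two-state EVB ground-state energy together with the weights of the two diabatic states. The function name and the example energies are illustrative assumptions and are not part of Q or Qtools.

```python
import math


def evb_ground_state(h11, h22, h12):
    """Lowest eigenvalue of the 2x2 EVB Hamiltonian (Eq. 3).

    Returns (E_g, c1^2, c2^2), where c1^2 and c2^2 are the weights of the
    reactant and product diabatic states in the ground state.
    """
    half_gap = 0.5 * (h11 - h22)
    root = math.sqrt(half_gap ** 2 + h12 ** 2)
    e_g = 0.5 * (h11 + h22) - root
    if root == 0.0:
        # Degenerate and uncoupled states: the weights are not unique.
        return e_g, 0.5, 0.5
    c1_sq = (h22 - e_g) / (2.0 * root)
    return e_g, c1_sq, 1.0 - c1_sq


# Hypothetical diabatic energies (kcal/mol) at three points along a reaction.
for label, h11, h22 in [("reactant-like", 0.0, 120.0),
                        ("crossing", 40.0, 40.0),
                        ("product-like", 130.0, -5.0)]:
    e_g, c1_sq, c2_sq = evb_ground_state(h11, h22, h12=20.0)
    print(f"{label:14s} E_g = {e_g:8.2f}  c1^2 = {c1_sq:.3f}  c2^2 = {c2_sq:.3f}")
```

At the crossing point the two states mix equally and the ground state lies exactly H12 below the diabatic energies; far from the crossing the ground state follows the lower diabatic state.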


We now return to the concept of the reference reaction. The reference reaction is a reaction in an environment (typically the gas phase or aqueous solution) for which the energetics are known, either from experiment or from accurate quantum chemical calculations, and it is used to calibrate the unknown parameters of the EVB model. In a two-state, constant-H12 EVB model, the calibration involves fitting two parameters: H12 and the so-called gas-shift (the energy difference between the diabatic states in the gas phase). Two known values on the energy profile are thus required to obtain a proper fit, a common choice in the field being the activation and reaction free energies (ΔG‡ and ΔG0). The EVB potential is then calibrated by varying H12 and the gas-shift until the calculated ΔG‡ and ΔG0 coincide with the reference values. The same parameters can then be used in a different environment.
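The calibration idea can be illustrated with a toy model. The sketch below assumes a one-dimensional system with two parabolic diabatic states and works with potential energies rather than free energies; the force constant, the grid and all function names are hypothetical. In a real calibration the profile would come from FEP/US mapping (for example via Qfep6), not from an analytical curve. The target values are the reference ΔG‡ and ΔG0 quoted in Section 3.1.5.

```python
import math

K_FORCE = 50.0                                # hypothetical force constant
GRID = [-2.0 + 0.01 * i for i in range(401)]  # toy reaction coordinate


def profile(h12, gas_shift):
    """Return (barrier, reaction energy) of the toy ground-state curve."""
    e_g = []
    for x in GRID:
        h11 = 0.5 * K_FORCE * (x + 1.0) ** 2
        h22 = 0.5 * K_FORCE * (x - 1.0) ** 2 + gas_shift
        e_g.append(0.5 * (h11 + h22)
                   - math.sqrt(0.25 * (h11 - h22) ** 2 + h12 ** 2))
    mid = len(GRID) // 2
    i_rs = min(range(0, mid), key=e_g.__getitem__)              # reactant well
    i_ps = min(range(mid + 1, len(GRID)), key=e_g.__getitem__)  # product well
    e_ts = max(e_g[i_rs:i_ps + 1])
    return e_ts - e_g[i_rs], e_g[i_ps] - e_g[i_rs]


def bisect(func, lo, hi, iterations=40):
    """Root of func on [lo, hi]; func(lo) and func(hi) must differ in sign."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fit_gas_shift(h12, target_dg0):
    """Gas-shift that reproduces the reaction energy for a given H12."""
    return bisect(lambda a: profile(h12, a)[1] - target_dg0, -100.0, 100.0)


def calibrate(target_barrier, target_dg0):
    """Vary H12 (outer loop) and the gas-shift (inner loop) to hit both targets."""
    h12 = bisect(
        lambda h: profile(h, fit_gas_shift(h, target_dg0))[0] - target_barrier,
        0.0, 40.0)
    return h12, fit_gas_shift(h12, target_dg0)


h12, gas_shift = calibrate(13.0, -5.4)
barrier, dg0 = profile(h12, gas_shift)
print(f"H12 = {h12:.3f}, gas-shift = {gas_shift:.3f}")
print(f"barrier = {barrier:.2f}, reaction energy = {dg0:.2f}")
```

The nested bisection relies on the reaction energy increasing with the gas-shift and the barrier decreasing with H12, which holds for this toy model; the fitted numbers have no meaning outside it.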

2.2 Obtaining Free Energies

While potential energy surfaces at zero Kelvin obtained using the above methodology can be interesting on their own, they are not particularly relatable to typical experimental conditions involving enzymatic reactions. Instead, we would like to account for the many loose degrees of freedom in our system, i.e. the entropic contribution, and do configurational sampling (e.g. using molecular dynamics simulations) to calculate free energies. The activation free energy of a reaction can for example then be directly related to experimentally determined rate constants via transition state theory (Eyring-Polanyi equation):

$$k = \kappa \frac{k_B T}{h} e^{-\Delta G^{\ddagger} / k_B T} \qquad (4)$$

A convenient approach for obtaining reaction free energy profiles in the context of EVB is the mapping approach, also known as the free energy perturbation/umbrella sampling (FEP/US) approach. Here, the configurational sampling is done via biased molecular dynamics (MD) simulations on the so-called mapping potential Em, which is defined as a linear combination of the two Hamiltonians:

$$E_m = \lambda H_{11} + (1 - \lambda) H_{22} \qquad (5)$$

where λ is the coupling parameter used to transform the system discretely and gradually from the reactant to the product state, usually in 51 frames (also known as windows). In each of these windows we then sample the configurational space defined by Em(λwindow) using classical molecular dynamics (Figure 2B). The unbiased free energy profile is then obtained using the FEP/US expression [1,9]:

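$$\Delta G(\lambda_m, x) = \Delta G_m(\lambda_m) - k_B T \ln \left\langle e^{-\left(E_g(x) - E_m(x)\right) / k_B T} \right\rangle_m \qquad (6)$$

The reaction coordinate x is defined as the energy difference between the two states, also known as the energy gap: $x \equiv H_{11} - H_{22}$.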
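To illustrate how the mapping potential of Eq. 5 drives the system along the energy gap, the sketch below uses two hypothetical parabolic diabatic states on a one-dimensional grid and locates the minimum of Em for several λ values. The model and all names are assumptions made for illustration only.

```python
K_FORCE = 50.0                                # hypothetical force constant
GRID = [-2.0 + 0.01 * i for i in range(401)]  # toy coordinate


def diabats(x):
    """Toy diabatic energies (H11, H22) with minima at x = -1 and x = +1."""
    return 0.5 * K_FORCE * (x + 1.0) ** 2, 0.5 * K_FORCE * (x - 1.0) ** 2


def mapping_potential(h11, h22, lam):
    """Eq. 5: E_m = lam * H11 + (1 - lam) * H22."""
    return lam * h11 + (1.0 - lam) * h22


def energy_gap(h11, h22):
    """Reaction coordinate x = H11 - H22."""
    return h11 - h22


def mapping_minimum(lam):
    """Grid point that minimises the mapping potential for a given lambda."""
    return min(GRID, key=lambda x: mapping_potential(*diabats(x), lam))


for lam in (1.0, 0.75, 0.5, 0.25, 0.0):
    x_min = mapping_minimum(lam)
    gap = energy_gap(*diabats(x_min))
    print(f"lambda = {lam:4.2f}  x_min = {x_min:5.2f}  energy gap = {gap:7.1f}")
```

As λ goes from 1 to 0, the minimum of Em moves from the reactant well to the product well and the energy gap changes sign, which is why sampling on Em covers the whole reaction coordinate, including the transition-state region.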


3. Case studies

In this tutorial we will study two systems:

1. The uncatalyzed SN2 reaction in gas and aqueous phase.
2. The hydrolysis of organophosphate compounds in squid diisopropyl-fluorophosphatase (DFPase).

This tutorial will guide you through several steps, from the setup of the system to be studied to the analysis of the calculations. It is divided into five parts:
• Setting up the systems
• Creating FEP files
• Setting up FEP/US simulations
• In silico mutagenesis
• Visualization and analysis

3.1 SN2 Reaction

3.1.1 Building a topology

The first task in this tutorial is fairly straightforward: building a Q topology file. The topology file contains the information about the system required for running classical MD simulations in Q, which includes coordinates, bonding patterns and force field parameters. It is built using the Qprep6 tool that comes with the default installation of Q6. Note that all the files necessary to build the topology are provided with the tutorial. Go to the folder 0-top. There you will find two subfolders and one input file. If you type ls (or ll -t), you will be able to see all the files in this folder (folders are in bold):
$ ls -l
0-ff prep.inp probr_cl.pdb

Take a look at the PDB file called probr_cl.pdb. It contains the xyz coordinates of the system and the residue and atom names. The atoms must be in the correct order for the program to match them to the corresponding force field parameters. These parameters can be found in the folder 0-ff (extensions .lib and .prm). We will use the Qprep6 program to set up our system. The input file (prep.inp) contains the instructions given to Qprep6 to create a topology file. Here is an example of the input file used for this task:


# read library files
readlib ./0-ff/qoplsaa.lib
readlib ./0-ff/prb.lib
readlib ./0-ff/cl-.lib
# read parameter file (only one)
readprm ./0-ff/qoplsaa_prb_cl-.prm
# read coordinate file
readpdb probr_cl.pdb
# boundary and solvation
boundary sphere 1:C1 20
#solvate 1:C1 20 1 HOH
# write topology and new coordinate file
maketop probr_cl.top
writetop probr_cl.top
writepdb probr_cl_start.pdb y
quit

It tells the program which starting structure to use (readpdb probr_cl.pdb) and where to find the parameters (readlib/readprm). It also contains an instruction for solvating the system. In this case, that line is commented out, so the generated topology (.top) will not contain any solvent. Finally, the input writes the topology file and a new PDB file. You can open the new PDB file in molecular visualization software to confirm that no solvent has been included (you can also uncomment the solvate line and rebuild). Now build the topology using Qprep6:
$ Qprep6 < prep.inp > prep.out
The output prep.out of a successful build should end with “PDB file successfully written”, with Qprep6 writing the following two files to disk: probr_cl.top and probr_cl_start.pdb. Copy probr_cl_start.pdb, which contains the processed coordinates, to your local computer and visualize it using VMD. If you’re not familiar with VMD, here are some useful shortcuts:
- = : reset view
- T, R, C : translate, rotate, center (mouse-click to select center atom)
- 1, 2, 3, 4 : label atom, bond, angle, dihedral (use Graphics->Labels to display and plot)

3.1.2. Molecular Dynamics

Using the newly created topology, we will perform a procedure called relaxation using Qdyn6. It basically consists of running MD simulations that allow our system to find a minimum in its potential energy at the temperature we wish to study. We will go to the folder 1-relax and perform a set of MD runs, going from 1 K up to 300 K. Copy the topology from where you created it (0-top) to the 1-relax directory. Within this directory, you will find three input files (relax_00X.inp). These are the inputs used for the full relaxation procedure, and they should be run sequentially. To run them, type:
$ batch run_relax_q.sh


Check the progress of the simulation:
$ grep -E "summary|terminated" relax_xxx.out | tail
At the end of the relaxation (three ‘OK’ should appear), you should have three .log, .dcd and .re files. The file relax_003.re will be used to launch your EVB calculations. To make sure that the whole relaxation completed, check the output file of your last input: at the end of the file you should see that Q terminated normally. You can visualize the trajectories in VMD (load the pdb file and then the dcd file).

While the simulation is running in the background, try to familiarize yourself with the structure of the Qdyn6 input. It is well commented, however, we encourage you to look up the keywords in the Q manual as well.

relax_003.inp

[md]
steps 100000
temperature 300
stepsize 1
bath_coupling 100

[cut-offs]
q_atom 99

[files]
topology probr_cl.top
restart relax_002.re
final relax_003.re
trajectory relax_003.dcd
fep probr_cl.fep

[lambdas]
1.00 0.00

[sequence_restraints]
1 6 0.1 0 0
11 12 0.1 0 0

[distance_restraints]
1 11 0.0 3.5 3.0 2
1 12 0.0 3.5 3.0 1


We will go through some of these sections:
[md] steps is the total number of MD steps to be performed, stepsize is the size of the time step in fs, temperature is the target temperature in K, and bath_coupling is related to the thermostat used to ensure constant temperature.
[files] Here we specify the topology and FEP file to be used, as well as the names of the trajectory (.dcd) and restart (.re) files to be written.
[lambdas] This section relates to the free energy perturbation calculation performed between the reactant and the product state. 1.00 0.00 means that the system is 100 % in state 1 (reactant) and 0 % in state 2 (product).
[sequence_restraints] Positional restraints are used to ensure that the heating proceeds smoothly. The first two numbers correspond to the atom numbers in the PDB file generated at the setup step. For further information, please refer to the Q manual.
[distance_restraints] A weak distance restraint is also used to ensure that the fragments are kept in the center of the sphere.

3.1.3 Generating FEP file

To calculate relative free energies of any process using the FEP methodology, we need to define the changes that occur during the transformation from one state to the other. In Q, this is defined in the so-called FEP file, which, to give a brief overview, consists of the following sections (see Q manual p.25-32 for details):
- [FEP] # used to define number of states, etc.
- [atoms] # define Q atoms (i.e. reactive, EVB, FEP) and map PDB indices to Q indices
- [atom_types] # VdW parameters
- [change_atoms] # changes in VdW parameters
- [change_charges] # changes in partial charges
- [bonds] # bonding parameter definitions
- [change_bonds] # changes in bonding
- [angles] …
- … # analogous to bonds

FEP files can in principle be generated manually; however, this proves to be a very laborious and somewhat error-prone task. To save some time and limit the number of typos, we will use the handy qtools utility q_makefep.py. In the next section we will describe in detail how to use this tool. Now let us take a look at each part of the FEP file.

The first sections indicate the number of states to be used and the atoms that form the Q-region (first column: Q-atom ID; second column: PDB number of the atom).

[FEP]
states 2

[atoms]
#Q index PDB index
1 1
2 2
3 3
4 4
5 5
6 6
7 11
8 12

The next section gives the name of each atom type, followed by its vdW parameters (columns two and three), soft-pair parameters (columns four and five), 1-4 interaction parameters (columns six and seven) and the mass of the atom (column eight). The soft-pair interaction is used between reacting fragments to reduce the strong Lennard-Jones repulsion as the atoms come close together to form bonds. Details can be found in the Q manual.

[atom_types]
prb_C1 944.5180 22.0296 91.0 2.5 667.8751 15.5773 12.011
prc_Cl11 1692.2485 43.0554 90.0 2.5 1196.6004 30.4447 35.453
cl-_Cl1 5099.0001 59.1621 90.0 2.5 3605.5376 41.8339 35.453

Next, the partial charges corresponding to states 1 (reactant) and 2 (product) of the EVB calculation are given. The first column contains the Q-atom ID, and the last two columns are the charges in EVB states 1 and 2.

[change_charges]
1 -0.3022 -0.1984 # PRB.C1 dq= 0.1038
2 0.1441 0.1206 # PRB.H2 dq=-0.0235
3 0.1441 0.1206 # PRB.H3 dq=-0.0235

Sometimes atoms change their hybridization or bonding state, as happens between sp2 and sp3 C atoms, or between bonded and unbonded halogens. Their vdW parameters then change as well, and this is where we account for it. The section works like the one for the change in charges.

[change_atoms]
1 prb_C1 prc_C1 # PRB.C1 !
7 prb_Br11 br-_Br1 # PRB.Br11 !
8 cl-_Cl1 prc_Cl11 # CL-.Cl1 !

Then we add the pairs of atoms between which the soft potential will act. Again, we are using the Q-atom IDs. Note that here we include only the atoms that are breaking or forming bonds.

[soft_pairs]
1 7 # prb_C1-prb_Br11
1 8 # prc_C1-prc_Cl11

[bond_types]
1 66.0 1.58 1.94 # prb_C1-prb_Br11
2 78.0 1.51 1.80 # prc_C1-prc_Cl11

[change_bonds]
1 11 1 0 # 1.C1-1.Br11 prb_C1-prb_Br11 None
1 12 0 2 # 1.C1-2.Cl1 None prc_C1-prc_Cl11

These two sections are related to bond modifications. Bond types with three columns give the spring constant (second column) and length (third column, in Å) of a covalent bond represented by a harmonic potential. Bond types with four columns, as used here, provide the parameters for a Morse potential, which we use to represent the cleavage and formation of bonds. In the following section, we assign the type of bond for every pair of atoms we are interested in. This time we use the PDB-atom IDs for the pair of atoms, followed by two more columns giving the bond type used in each of the two states. Note that a zero is used where a bond does not exist. By analyzing the FEP file, can you tell which bond is being broken and which one is being formed? For the angles and torsions, the procedure is analogous. The functional forms of the potentials follow the OPLS-AA force field convention. If you would like to know more, please refer to the Q manual. Finally, although it is an old command, the following section enables the printing of distances between two atoms in the output file. Here, columns three and four contain the Q-atom IDs of the atoms we would like information about.

[off_diagonals]
1 2 1 7 0 0 # prb_C1-prb_Br11
1 2 1 8 0 0 # prc_C1-prc_Cl11
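To get a feeling for the Morse parameters in [bond_types], the sketch below evaluates both reactive bonds at a few distances. It assumes that the three parameter columns are the dissociation energy (kcal/mol), the exponent α (1/Å) and the equilibrium bond length (Å), and that the potential has the common form D(1 - exp(-α(r - r0)))²; please check the Q manual for the exact definition used by Q.

```python
import math

# Rows of the [bond_types] section shown above. The column meaning
# (dissociation energy, exponent, equilibrium length) is an assumption.
BOND_TYPES = {
    1: (66.0, 1.58, 1.94),  # prb_C1-prb_Br11
    2: (78.0, 1.51, 1.80),  # prc_C1-prc_Cl11
}


def morse(r, d_e, alpha, r0):
    """Morse potential D_e * (1 - exp(-alpha * (r - r0)))**2."""
    return d_e * (1.0 - math.exp(-alpha * (r - r0))) ** 2


for r in (1.80, 1.94, 2.50, 4.00):
    e_cbr = morse(r, *BOND_TYPES[1])
    e_ccl = morse(r, *BOND_TYPES[2])
    print(f"r = {r:4.2f} A   C-Br: {e_cbr:6.2f}   C-Cl: {e_ccl:6.2f}")
```

Unlike a harmonic bond, the Morse energy levels off at the dissociation energy for large distances, which is what allows a bond to be broken or formed smoothly within a force-field description.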

3.1.4 EVB FEP/US simulations

With the system relaxed, you can now begin the EVB calculations. Go to the 2-fep folder. You will see that there are three directories:
1-RS_000 2-TS_000 3-PS_000
Each directory contains 51 input files corresponding to the EVB run. As you have learned from the lecture, EVB is a FEP procedure in which the energies used for the free energy calculation are a mix of the energies of two or more states. The directories differ in where the perturbation starts (from the reactants, the TS, or the products). To run such a calculation, we need the FEP file, the topology file and the restart files from the relaxation step:
$ cp ../../1-relax/relax_003.re cont_relax_003.re
$ cp ../../1-relax/relax_003.re cont_relax_003.re.rest
$ cp ../../1-relax/*top .

The files must be executed sequentially, starting with equil_000_1.000 and followed by the fep*inp files, from fep_000_1.000.inp, which corresponds to the reactant state (lambdas set to 1.0 and 0.0), all the way to fep_050_0.000.inp, which corresponds to the product state (lambdas 0.0 and 1.0). Use the submission script available in each of the folders:
$ bash run_feps_q.sh
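The naming scheme of the 51 windows can be reproduced with a few lines of Python. This is only a sketch that assumes evenly spaced λ values and the pattern fep_NNN_L.LLL.inp inferred from the file names quoted above; the provided inputs remain the authoritative source.

```python
def fep_windows(n_windows=51):
    """Return (input file name, lambda state 1, lambda state 2) per window."""
    windows = []
    for i in range(n_windows):
        lam1 = 1.0 - i / (n_windows - 1)
        windows.append((f"fep_{i:03d}_{lam1:.3f}.inp", lam1, 1.0 - lam1))
    return windows


windows = fep_windows()
for name, lam1, lam2 in windows[:3] + windows[-2:]:
    print(f"{name}   [lambdas] {lam1:.2f} {lam2:.2f}")
```

Each window starts from the restart file of the previous one, so the order of this list is also the order in which the inputs have to be run.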


3.1.5 Analyzing EVB Run and Calibrating the PES

FEP profiles are calculated using the tool Qfep6, by providing the unknown EVB parameters H12 and gas-shift. Alternatively, we can use the qtools wrapper q_mapper.py, which will create the inputs, call Qfep6, and analyze the outputs in one go.

H12 and the gas-shift used here have been pre-calibrated to reproduce the ∆G‡ and ∆G0 of a reference reaction:

H12: 76.543
Gas-shift: 2.345

In some cases, when experimental data for similar systems are available, one can also use them and obtain the corresponding activation barriers from the rate constants. It is then a matter of finding a set of empirical parameters that adjusts our EVB parabolas to reproduce the results of such studies. Our reference values are ∆G‡ = 13.0 kcal/mol and ∆G0 = -5.4 kcal/mol. Go into 2-fep and calculate the free-energy profiles using q_mapper.py:
$ q_mapper.py -h
Required:
hij Hij coupling constant
alpha state 2 shift (alpha)

Optional:
--nt NTHREADS Number of threads (default = 1)
--bins GAP_BINS Number of gap-bins (default=50).
--skip POINTS_SKIP Number of points to skip in each frame (default=100).
--min MINPTS_BIN Minimum points for gap-bin (default=10).
--temp TEMPERATURE Temperature (default=300.00).
--dirs MAPDIRS [MAPDIRS ...] Directories to map (default=all subdirs in cwd or current dir)
--out OUTFILE Logfile name (default=q_mapper.log).
--qfep_exec QFEP_EXEC qfep5 executable path (default=).

Note that the coupling, or off-diagonal, elements of the Hamiltonian can be a constant value or, in more general terms, of the form $H_{ij} = \exp(-\mu r_{ij})$, where rij is the distance between the atoms composing the two extremes of the reaction. For simplicity, here we will use a constant value.

$ q_mapper.py 76.543 2.345 --bins 50 --skip 10 --min 1 --temp 298.15
This will generate the files q_mapper.log and qfep.out.

q_mapper.log:

Analysis Stats:
# Mean Std.dev Median Std.error N
dG* 13.34 nan 13.34 nan 1
dG0 -5.58 nan -5.58 nan 1
dG_lambda -9.44 nan -9.44 nan 1

Parts of the qfep.out file produced are shown below:

# Part 0: Average energies for all states in all files
# file state pts lambda EQtot EQbond EQang EQtor EQimp EQel EQvdW Eel_qq EvdW_qq Eel_qp EvdW_qp
--> Name of file number 1:
fep_000_1.000.en 1 1949 1.00 -74.97 -65.92 0.25 0.05 0.00 -9.22 -0.40 -7.17 -0.19 -2.05 -0.21
fep_000_1.000.en 2 1949 0.00 92.97 -6.27 78.38 0.91 0.00 -7.89 27.84 -7.14 25.92 -0.75 1.93
Name of file number 2:
fep_001_0.980.en 1 1949 0.98 -74.90 -65.87 0.19 0.05 0.00 -9.14 -0.41 -7.13 -0.19 -2.01 -0.21

Part 0 shows a summary of the average energy contributions, from bonded to van der Waals terms, for both states at different stages of the FEP procedure (i.e. different lambdas).

# Part 1: Free energy perturbation summary:

# Calculation for full system
# lambda(1) dGf sum(dGf) dGr sum(dGr) <dG>
1.000000 0.000 0.000 -3.350 12.881 0.000
0.980000 3.401 3.401 -3.275 16.231 3.375
0.520000 0.763 60.469 0.074 70.531 59.059
0.500000 0.363 60.831 0.556 70.457 59.204
0.480000 -0.110 60.722 0.952 69.901 58.871

Part 1 shows how the free energy is built up, summing both forwards from lambda 1.00 to 0.00 and backwards (dGf and dGr, respectively). The last column contains the average of the forward and backward profiles. Here you can spot where the transition state (TS) is located. Often the TS is found at lambda 0.50, although this is not necessarily the case. Later we will analyze the geometry of our system at the TS, so make a note of the lambda at which the TS occurs, and consequently of which fep_XXX.log file to inspect to obtain the TS distances.
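The forward sum in Part 1 follows the standard FEP (Zwanzig) formula. The sketch below is not how Qfep6 is implemented; it only assumes the mapping potential of Eq. 5, for which the energy difference between neighbouring windows at a given configuration is Δλ times the energy gap. Function names and the sampled gaps are hypothetical.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal/(mol K)


def fep_step(gaps, lam_from, lam_to, temperature=300.0):
    """Zwanzig estimate of the free energy change between two windows.

    gaps: energy gaps (H11 - H22, kcal/mol) sampled at lam_from.
    With E_m = lam * H11 + (1 - lam) * H22, the energy difference between
    the two mapping potentials is (lam_to - lam_from) * gap.
    """
    rt = R_KCAL * temperature
    d_lam = lam_to - lam_from
    boltzmann = [math.exp(-d_lam * gap / rt) for gap in gaps]
    return -rt * math.log(sum(boltzmann) / len(boltzmann))


def fep_profile(gaps_per_window, lambdas, temperature=300.0):
    """Cumulative forward free energies, starting from 0.0 at lambdas[0]."""
    total, profile = 0.0, [0.0]
    for i in range(len(lambdas) - 1):
        total += fep_step(gaps_per_window[i], lambdas[i], lambdas[i + 1],
                          temperature)
        profile.append(total)
    return profile


# Hypothetical gaps sampled in the first two windows.
sampled = [[-172.0, -168.5, -170.3, -169.1], [-165.2, -163.9, -166.8, -164.0]]
print(fep_profile(sampled, [1.00, 0.98, 0.96]))
```

With energy gaps of around -170 kcal/mol in the first window and Δλ = -0.02, each step contributes a few kcal/mol, which is the order of magnitude seen in the dGf column above.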

# Part 2: Reaction free energy summary:

# Lambda(1) bin Energy gap dGa dGb dGg # pts c1**2 c2**2
1.000000 1 -181.57 0.00 178.60 -28.21 66 0.881 0.119
1.000000 2 -173.60 0.00 170.86 -29.11 998 0.874 0.126
1.000000 3 -165.62 0.00 164.28 -29.83 876 0.869 0.131
0.980000 2 -173.60 -0.06 170.63 -29.24 444 0.873 0.127

Part 2 provides, for every lambda, the bin index, the energy gap between states 1 and 2, the free energy terms and the number of points in the bin, and finally c1² and c2² (c1**2 and c2**2 in the output), which correspond to the squared coefficients of the mixing of the wavefunctions for states 1 and 2 (ψg = c1ψ1 + c2ψ2).


# Part 3: Bin-averaged summary:
# bin energy gap <dGg> <dGg norm> pts <c1**2> <c2**2> <r_xy>
1 -181.57 -28.21 8.08 66 0.881 0.119 1.943
2 -173.60 -29.15 7.14 1454 0.874 0.126 1.953
3 -165.62 -29.97 6.32 4723 0.867 0.133 1.968
4 -157.64 -30.53 5.76 6834 0.859 0.141 1.997
5 -149.66 -30.71 5.58 5437 0.850 0.150 2.027
6 -141.69 -30.53 5.76 4590 0.840 0.160 2.057
7 -133.71 -30.06 6.23 3468 0.829 0.171 2.087

Here, we can extract the free energy profile (dGg norm) plotted against the energy gap as the reaction coordinate. From this you can obtain the activation free energy, as well as the free energy difference between the reactant and product states. We also see the bins and their corresponding centers on the energy gap axis. There are plenty of new concepts here, so don’t worry if you could not grasp all of them; it is just a matter of experience. You can visualize your trajectory in VMD:
$ vmd ../../0-topol/probr_cl_start.pdb fep_0*dcd

You can also extract your activation barrier from Part 3 of the output. If you have some time left, try to include solvent in your simulation and see its effect on the activation barrier. Is that in line with your expectation?
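Extracting the barrier from the bin-averaged profile amounts to locating the reactant minimum, the transition-state maximum and the product minimum. The function below is a simple sketch of that idea and is not the algorithm used by q_mapper.py. In the example, the first seven values are the dGg norm entries of the Part 3 excerpt shown above, while the remaining points are hypothetical values chosen to complete a double-well profile.

```python
def analyse_profile(dg):
    """Return (activation, reaction) free energies from a double-well profile.

    dg: free energies ordered along the energy gap, reactant side first.
    """
    maxima = [i for i in range(1, len(dg) - 1)
              if dg[i] >= dg[i - 1] and dg[i] >= dg[i + 1]]
    if not maxima:
        raise ValueError("profile has no interior maximum")
    i_ts = max(maxima, key=dg.__getitem__)  # highest interior maximum
    g_rs = min(dg[:i_ts])                   # reactant minimum, left of the TS
    g_ps = min(dg[i_ts + 1:])               # product minimum, right of the TS
    return dg[i_ts] - g_rs, g_ps - g_rs


profile = [8.08, 7.14, 6.32, 5.76, 5.58, 5.76, 6.23,  # from Part 3 above
           12.00, 18.92, 10.00, 0.00, 4.00]           # hypothetical values
dg_act, dg_react = analyse_profile(profile)
print(f"dG* = {dg_act:.2f} kcal/mol, dG0 = {dg_react:.2f} kcal/mol")
```

For noisy profiles it is worth checking the located minima and maximum by eye, since sparsely populated bins can produce spurious extrema.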

3.2. DFPase

This part of the tutorial is based on the work published in [10], more precisely, on the simulations of diisopropylfluorophosphate (DFP) hydrolysis in diisopropylfluorophosphatase (DFPase) shown in Figure 3.

Figure 3: Overview of DFPase fold and active site (PDB: 3byc). DFP has been manually placed in the active site as described in [10].


This work presents an example of an extensive mechanistic study using the EVB methodology, as applied to enzymatic reactions, with the authors exploring several strategies for differentiating between proposed mechanisms in literature, including measuring the catalytic power, calculating the effects of point mutations, accurately reproducing experimental temperature dependence and qualitative assessment of pH rate dependence. These calculations provide invaluable insight into the nature of the catalyzed hydrolysis of toxic organophosphates, and paved the way for elucidating the apparent selectivity observed in similar organophosphatases. In this tutorial we will perform FEP/US EVB simulations of the favored reaction mechanism (as determined in the above study) to try to reproduce the experimentally obtained rate constants. In addition we will try to computationally reproduce the effects of point mutations.
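Comparing a calculated activation free energy with an experimental rate constant requires the Eyring-Polanyi equation (Eq. 4). The sketch below converts in both directions, assuming a transmission coefficient of 1 and molar free energies in kcal/mol (so the gas constant R replaces the Boltzmann constant in the exponent); the function names are illustrative.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J s
R_KCAL = 0.0019872041      # gas constant, kcal/(mol K)


def eyring_rate(dg_act, temperature=300.0, kappa=1.0):
    """Rate constant (1/s) from an activation free energy in kcal/mol."""
    prefactor = kappa * K_B * temperature / H_PLANCK
    return prefactor * math.exp(-dg_act / (R_KCAL * temperature))


def eyring_barrier(rate, temperature=300.0, kappa=1.0):
    """Activation free energy (kcal/mol) from a rate constant in 1/s."""
    prefactor = kappa * K_B * temperature / H_PLANCK
    return -R_KCAL * temperature * math.log(rate / prefactor)


for dg in (10.0, 13.0, 16.0):
    print(f"dG* = {dg:4.1f} kcal/mol  ->  k = {eyring_rate(dg):.3e} 1/s")
```

Because of the exponential dependence, an error of about 1.4 kcal/mol in the barrier changes the rate by roughly one order of magnitude at room temperature, which is useful to keep in mind when judging agreement with experiment.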

Figure 4: Reaction scheme of DFP hydrolysis as proposed in [10], and modeled in this tutorial. The general-base Asp229 abstracts the proton from the nucleophile water, with concurrent nucleophilic substitution of the fluoride leaving group. Note that the Ca2+ ion is not included in the reactive EVB region.

Instructions

The downloaded folder contains the following subdirectories:
WT # wildtype DFPase simulation setup
Mut1 # Mutant DFPase simulation setup
docs # Q manual and this tutorial

WT and Mut1 are further divided into:
0-topol # pre-made parameter, library and coordinate files
1-equil # finished equilibration simulations
2-fep # input files for FEP/US

3.2.1 Building a topology

As shown in the previous example, the first step when running an EVB calculation is to build a topology file, which contains the information about the system required for running classical MD simulations in Q. Note that all the files necessary to build the topology are provided with the tutorial. Move into the WT/0-topol directory and create a file qprep.inp with the following contents (you can skip the comments):


# read library files
readlib ./0-ff/ohh.lib
readlib ./0-ff/dfp.lib
readlib ./0-ff/ca6.lib
readlib ./0-ff/qoplsaa.lib
# read parameter file (only one)
readprm ./0-ff/qoplsaa_dfp_ca6.prm
# read coordinate file
readpdb dfpase_dfp.pdb
# boundary and solvation
boundary 1 314:C61 25 # sphere, center at 314:C61, radius 25
solvate 314:C61 25 1 HOH # center at 314:C61, radius 25, grid, tip3p
# write topology and new coordinate file
maketop pro.top
writetop pro.top
writepdb dfpase_dfp_start.pdb y
quit

Now build the topology using Qprep6:
$ Qprep6 < qprep.inp > qprep.out
The output qprep.out of a successful build should end with “PDB file successfully written”, with Qprep6 writing the following two files to disk: pro.top and dfpase_dfp_start.pdb. Copy dfpase_dfp_start.pdb, which contains the processed solvated coordinates, to your local computer and visualize it using VMD. Use the CPK graphical representation (Graphics->Representation) with the following selection to focus on the active site: “same residue as all within 5 of resid 314 315 316”.

Optional questions: Which residue is the catalytic base? Does the index match the one in Figure 4? Why does the calcium have a weird octahedral shape?

3.2.2 Molecular dynamics

Using the newly created topology for wild-type DFPase, we will run a test MD simulation with Qdyn6, using the provided input file in the folder WT/0-topol/2-test_md:
$ Qdyn6 test.inp > test.out &
Check the progress of the simulation:
$ grep -E "summary|terminated" test.out | tail
While the simulation is running in the background, try to familiarize yourself with the structure of the Qdyn6 input. It is well commented; however, we encourage you to look up the keywords in the Q manual as well. Compare this input to WT/equil/relax_015.inp. What are the key differences? Along with the output (test.out), the program will also write a simulation trajectory (test.dcd) and a restart file containing the final coordinates and velocities (test.re). The latter is often used as an input in chained simulations, via the keyword restart, as we will see later.


Once the simulation has completed, copy the trajectory (test.dcd) and starting coordinates (dfpase_dfp_start.pdb) to your local computer and visualize the trajectory in VMD.

Optional questions: Why didn’t we run the simulation at room temperature?

What is the purpose of the random_seed in Qdyn6 input? Which keyword is used to set a restraint on the distance between two atoms? What does the [sequence_restraints] keyword do?

3.2.3 Creating a FEP file

As in Section 3.1.3, we need to define the changes that occur during the transformation from one state to the other using a FEP file, which contains the following sections (see Q manual p. 25-32 for details):
- [FEP]             # used to define number of states, etc.
- [atoms]           # define Q atoms (i.e. reactive, EVB, FEP) and map PDB indices to Q indices
- [atom_types]      # VdW parameters
- [change_atoms]    # changes in VdW parameters
- [change_charges]  # changes in partial charges
- [bonds]           # bonding parameter definitions
- [change_bonds]    # changes in bonding
- [angles] …        # analogous to bonds

The differences in parameters between the EVB states can of course be determined manually; however, in all but the simplest cases this proves to be a very laborious and error-prone task. To save some time and limit the number of typos, we will instead make use of the handy qtools utility q_makefep.py, which determines all the parameter changes and generates a FEP file automatically. It requires the following command-line arguments:
- A Q-processed coordinate file
- Force-field type (at the moment only oplsaa or amber)
- Q parameter and library files for all species in all EVB states
- A qmap file, which maps each atom in the coordinate file to a “library-ID” in each particular EVB state. The library-ID is simply the residue name and atom name, separated by a period.

To make the FEP file, go into WT/0-topol/1-fep and type:

$ q_makefep.py -m dfpase_dfp_hoh_asp229.qmap \
    -s ../dfpase_dfp_start.pdb \
    -f oplsaa \
    -p ../0-ff/*prm \
    -l ../0-ff/*lib \
    -o dfpase_dfp_hoh_asp229.fep.tmplt
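As a small illustration of the library-ID convention described above, the following Python sketch splits an identifier into its residue name and atom name. The helper name split_library_id and the example identifiers are hypothetical and chosen only for illustration; this is not how q_makefep.py itself is implemented, and the actual layout of the qmap file should be taken from the provided example.

```python
def split_library_id(library_id):
    """Split a library-ID of the form RESNAME.ATOMNAME into its two parts."""
    resname, sep, atomname = library_id.partition(".")
    if not sep or not resname or not atomname or "." in atomname:
        raise ValueError("expected RESNAME.ATOMNAME, got %r" % library_id)
    return resname, atomname


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    for lib_id in ("DFP.P1", "HOH.O"):
        print(lib_id, "->", split_library_id(lib_id))
```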

Unfortunately, q_makefep.py is not all-knowing, so some parts of your FEP file will have missing values for Morse potentials, soft-pair interactions, and off-diagonal definitions. All except the off-diagonal definitions will be marked with a placeholder. Find the parameters needed to complete the FEP file by comparing the generated output with the finished version found in folder H287A/0-topol/1-fep.

Optional questions: Why do sections [change_bonds], [change_angles], [change_torsions], [change_impropers] use PDB indices, while the rest use internal indices?

Notes: The $315.P1$ notation in the generated FEP file template is specific to qtools, and is not understood by Q. It denotes a placeholder for a PDB atom index and is particularly useful when dealing with different enzyme variants, since it allows the use of common template files. The conversion to indices is automatic when generating simulation inputs; however, it can also be done manually using the command q_pdbindex.py.
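To make the placeholder idea concrete, here is a minimal Python sketch of how a $RESID.ATOM$ placeholder could be resolved against a coordinate file. It assumes standard fixed-column PDB records and residue numbers that are unique within the file. The function names are hypothetical and the code is not the q_pdbindex.py implementation.

```python
import re

# Matches placeholders such as $315.P1$ (residue number, then atom name).
PLACEHOLDER = re.compile(r"\$(\d+)\.([^$\s]+)\$")


def pdb_atom_indices(pdb_lines):
    """Map (residue number, atom name) to the PDB atom serial number."""
    indices = {}
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])         # columns 7-11
            atom_name = line[12:16].strip()  # columns 13-16
            resid = int(line[22:26])         # columns 23-26
            indices[(resid, atom_name)] = serial
    return indices


def fill_placeholders(template, indices):
    """Replace every $RESID.ATOM$ placeholder with its PDB atom index."""
    def lookup(match):
        key = (int(match.group(1)), match.group(2))
        return str(indices[key])  # raises KeyError if the atom is missing
    return PLACEHOLDER.sub(lookup, template)
```

Because the template only refers to residue numbers and atom names, the same template can be filled in for the wild type and for a mutant whose atom numbering has shifted.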

3.2.4 EVB FEP/US simulations

With the topology and FEP file created, you are now able to simulate the reaction. We have done the messy stuff (cleaning the structure, setting the protonation states, docking, etc.) in advance, and have already pre-equilibrated the system for 20 ns (for the full protocol see [2]), allowing you to focus instead on running FEP/US simulations of the reaction. Go into folder WT/2-fep and run the following command (replace X with your favorite number between 1 and 30):

$ q_genfeps.py genfeps.proc \
    ../1-equil/X/relax/relax_015.inp \
    relax \
    --pdb ../0-topol/dfpase_dfp_start.pdb \
    --repeats 1 \
    --frames 51 \
    --rs run_q.sh

Run the command with --help to see what the arguments do.

$ q_genfeps.py --help

Essentially, the q_genfeps.py command above generates a folder rep_000 containing 51 Qdyn6 inputs, one for each mapping window. Each input has a different λ-value, spanning from (λ1=1.0, λ2=0.0) to (λ1=0.0, λ2=1.0), i.e. from EVB state 1 to EVB state 2. Additionally, all files necessary to continue the simulation from the last step of the previous equilibration (../1-equil/relax_015) are automatically copied into the directory (FEP, topology, restart).
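The mapping windows can be pictured with a short Python sketch that lists the (λ1, λ2) pairs for a given number of frames. It assumes evenly spaced windows, which with 51 frames gives a spacing of 0.02. The function name is hypothetical and this is not the q_genfeps.py code.

```python
def lambda_windows(frames):
    """Return evenly spaced (lambda1, lambda2) pairs from (1, 0) to (0, 1)."""
    if frames < 2:
        raise ValueError("need at least two windows")
    windows = []
    for i in range(frames):
        lambda2 = round(float(i) / (frames - 1), 6)
        lambda1 = round(1.0 - lambda2, 6)  # the two weights always sum to 1
        windows.append((lambda1, lambda2))
    return windows


if __name__ == "__main__":
    for lambda1, lambda2 in lambda_windows(51):
        print("%.2f %.2f" % (lambda1, lambda2))
```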

Go into directory WT/2-fep/rep_000 and run:

$ ls
$ run_q.sh &

This will run the simulation in the background - it should take a couple of minutes. In the meantime, we suggest that you go quickly through the next section.

Optional questions:

Did you notice that the FEP/US simulation was started from state (λ1=0.5, λ2=0.5)? Does the direction matter in theory? Can you think of a possible practical benefit, as opposed to starting from pure EVB state 1 (λ1=1.0, λ2=0.0)? What do the {SCRIPT_VARS}, {GENERAL}, {STEPS_EQUIL} and {FEP} keywords mean in the genfeps.proc input file?

3.2.5 In-silico mutagenesis

We will now create a mutant enzyme in-silico, by mutating a histidine residue to an alanine. The H287A variant is known to have significantly lower activity than the wild-type enzyme, and is straightforward to create in-silico, without having to resort to external tools (e.g. Rosetta, Modeller, Chimera, PyMOL, SCWRL, etc…).

Go into the Mut1/0-topol folder and copy the wild-type coordinates:

$ cp ../../WT/0-topol/dfpase_dfp.pdb H287A.pdb

Open H287A.pdb in a text editor, remove all side-chain atoms of residue His287 except CB, and change the residue name to ALA. Copy and re-use the WT Qprep6 input to build the topology:

$ Qprep6 < qprep.inp > qprep.out

If the build fails, make sure that you are reading in the correct coordinate file (keyword readpdb). Repeat the procedure from Section 3.2.4 for the mutant enzyme (generate inputs, run simulation). Grab some coffee and cookies (if available) and take a break until the simulations are done.
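The manual edit described above can also be scripted. The Python sketch below truncates one residue to alanine, assuming a heavy-atom PDB file with standard fixed columns and a residue number that is unique in the file (chain identifiers are ignored). The function name is hypothetical and the script is not part of Q or qtools.

```python
# Atoms that an alanine shares with any other non-glycine residue.
BACKBONE_AND_CB = {"N", "CA", "C", "O", "CB"}


def mutate_to_alanine(pdb_lines, resid):
    """Truncate residue number `resid` to alanine; other lines pass through."""
    mutated = []
    for line in pdb_lines:
        is_atom = line.startswith(("ATOM", "HETATM"))
        if is_atom and int(line[22:26]) == resid:
            if line[12:16].strip() not in BACKBONE_AND_CB:
                continue  # drop side-chain atoms beyond CB
            line = line[:17] + "ALA" + line[20:]  # residue name, columns 18-20
        mutated.append(line)
    return mutated


if __name__ == "__main__":
    with open("H287A.pdb") as handle:
        lines = handle.read().splitlines()
    with open("H287A.pdb", "w") as handle:
        handle.write("\n".join(mutate_to_alanine(lines, 287)) + "\n")
```

Atom serial numbers are left as they are in this sketch.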

Optional questions: Which assumption did we make about the general structure of the enzyme when doing in-silico mutagenesis? (Compare structure with PDB: 2IAV)


Where are all the hydrogen atoms in H287A.pdb? Compare with H287A_dfp_start.pdb. How does Q determine the correct histidine tautomer (epsilon vs delta position)?

3.2.6 Analyzing Our EVB Run and Calibrating the Parameters

As described in Section 3.1.5, FEP profiles are calculated using the tool Qfep6, by providing the unknown EVB parameters H12 and gas-shift. Alternatively, we can use the qtools wrapper q_mapper.py, which will create the inputs, call Qfep6, and analyze the outputs in one go.

H12 and the gas-shift used here have been pre-calibrated to a reference reaction in solution.

H12: 152.4
Gas-shift: -43.0

Go into WT/2-fep/ and calculate the free-energy profiles using q_mapper.py (replace H12 and gas-shift with the values above):

$ q_mapper.py -h
$ q_mapper.py H12 gas-shift --bins 50 --skip 10 --min 1 --temp 298.15 --dirs rep_000

To extract the energy profiles and calculate LRA contributions, use the following command:

$ q_analysefeps.py rep_000 --lra_l 0.84 0.16

Note: Here we use --lra_l to specify the λ values for which we want to calculate LRA energies. These values roughly correspond to the mapping windows that contribute the most points to the minima on the EVB reaction free-energy profile.

Repeat the analysis for the mutant enzyme. The experimental activation free energy for the WT enzyme is 14.7 kcal mol⁻¹, while that of H287A is around 16.1 kcal mol⁻¹ (the reported value is 10% specific activity relative to WT). Do your somewhat malnourished simulations qualitatively match the experimental values? Compare the results with your colleagues and try to get a better estimate by combining the results and calculating the mean value and standard error of the mean.

Go to the main folder and visualize the profiles using q_plot.py:

$ q_plot.py WT/2-fep/qaf.PlotData.json Mut1/2-fep/qaf.PlotData.json
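The two numerical steps mentioned above can be sketched in a few lines of Python. The first function converts a relative rate into a change in activation free energy. It assumes transition-state theory with an unchanged pre-exponential factor, and that the ratio of specific activities reflects the ratio of rate constants. The second function combines independent replica estimates into a mean and standard error. Function names and the replica values in the usage lines are hypothetical.

```python
import math
import statistics

R_KCAL = 0.0019872  # gas constant in kcal mol^-1 K^-1


def ddg_from_relative_rate(relative_rate, temperature=298.15):
    """Barrier change (kcal/mol) implied by a rate ratio k_mut / k_wt."""
    return -R_KCAL * temperature * math.log(relative_rate)


def mean_and_sem(values):
    """Mean and standard error of the mean of independent estimates."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


if __name__ == "__main__":
    # 10% of the wild-type activity corresponds to roughly +1.4 kcal/mol.
    print("ddG = %.2f kcal/mol" % ddg_from_relative_rate(0.10))
    # Made-up barriers from several participants, for illustration only.
    print("mean = %.2f, SEM = %.2f" % mean_and_sem([14.1, 15.3, 14.8, 15.9]))
```

Adding the roughly 1.4 kcal mol⁻¹ obtained for a tenfold rate reduction to the WT value of 14.7 kcal mol⁻¹ gives about 16.1 kcal mol⁻¹, in line with the H287A estimate quoted above.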

Optional questions: Can you find the λ-values used above by looking at qfep.out? Hint: find the bins corresponding to the energy minima and find which λ-window contributes the most energy points.

3.2.7 Visualizing the trajectory

Copy the entire WT directory to your local computer and use the provided VMD script to visualize the reaction simulation (in directory WT/2-fep/):

$ vmd -e visualize_fep.vmd

Measure the reactive distances:
- Asp.OD1 – Nuc.H1
- Nuc.H1 – Nuc.O
- Nuc.O – DFP.P
- DFP.P – DFP.F

(press “2”, click on two atoms, open Graphics -> Labels -> Bonds, select the bond and plot)

Is the sampling smooth?

Optional questions: How would you describe the dynamics of the system, particularly the active site?

Are there any large-scale conformational changes, and can these be somehow problematic for our study?

3.2.8 LRA group contributions

LRA group contribution analysis is an energy decomposition method for estimating the effects of amino-acid residues on catalysis. Essentially, the idea is to estimate the free energy difference of deleting a residue in the reactant versus the transition state, using the linear response approximation method [8]:

$$\Delta G_{\mathrm{TS}}(\mathrm{sys} \to \mathrm{sys'}) - \Delta G_{\mathrm{RS}}(\mathrm{sys} \to \mathrm{sys'}) \approx \frac{1}{2}\left( \langle U^{rQ} \rangle_{\mathrm{RS}} - \langle U^{rQ} \rangle_{\mathrm{TS}} \right) \qquad (x)$$

where sys is the full system, sys' is the system with the residue deleted, rQ denotes the interaction between the residue and the reactive Q region, and RS and TS denote the reactant and transition state, respectively.

Calculate the group contributions using the following command:

$ q_calc.py gc dfpase_dfp_start.pdb 1 312 --lra_l 0.84 0.16

Visualize them using q_plot.py:

$ q_plot.py qgc.PlotData.json
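For orientation, the linear response approximation in its generic form estimates the free energy of going from a state A to a state B as ΔG(A→B) ≈ ½(⟨U_B − U_A⟩_A + ⟨U_B − U_A⟩_B), where each average is taken over a trajectory propagated on the indicated state. The Python sketch below evaluates only this generic expression. It is not the q_calc.py implementation, the function name is hypothetical, and the numbers in the usage lines are made up.

```python
def lra_free_energy(gap_in_a, gap_in_b):
    """LRA estimate of dG(A -> B) from sampled energy gaps U_B - U_A.

    gap_in_a: gaps sampled from a trajectory propagated on state A
    gap_in_b: gaps sampled from a trajectory propagated on state B
    """
    mean_a = sum(gap_in_a) / float(len(gap_in_a))
    mean_b = sum(gap_in_b) / float(len(gap_in_b))
    return 0.5 * (mean_a + mean_b)


if __name__ == "__main__":
    # Made-up energy gaps in kcal/mol, for illustration only.
    print(lra_free_energy([10.0, 12.0], [2.0, 4.0]))
```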

Spare time: Based on group contribution analysis, try to rationally design a mutant with higher activity towards DFP. Good luck!


4. Addendum

4.1 Software installation

Notes: 1) These instructions assume you are using Linux and the Bash shell (usually default). If you are using a different shell, please adapt the instructions accordingly.

Q (version 6)

Prerequisites: git, gcc-fortran compiler, openmpi libraries (or alternatively intel/intelmpi)

$ mkdir -p $HOME/bin/ && cd $HOME/bin
$ git clone https://github.com/qusers/Q6
$ cd Q6/src && make all mpi COMP=gcc

(add the following line to your .bashrc)
export PATH=$PATH:$HOME/bin/Q6/bin/

(logout and login)

Qtools (version 0.6.0)

Prerequisites: git, python2.7, python-matplotlib, Q

$ mkdir -p $HOME/bin && cd $HOME/bin
$ git clone https://github.com/mpurg/qtools

(add the following line to your .bashrc)
source $HOME/bin/qtools/qtools_init.sh

(logout and login)
$ qscripts_config.py
