A Toolkit to Assist ONIOM Calculations

For Peer Review
Journal: Journal of Computational Chemistry

Manuscript ID: Draft

Wiley - Manuscript type: News and Updates
Date Submitted by the Author:

Complete List of Authors:
Tao, Peng; Wayne State University, Department of Chemistry
Schlegel, H. Bernhard; Wayne State University, Department of Chemistry

Key Words: QM/MM, ONIOM, charge fitting, efficiency

John Wiley & Sons, Inc. Page 1 of 20 Journal of Computational Chemistry

A Toolkit to Assist ONIOM Calculations

Peng Tao and H. Bernhard Schlegel*

Department of Chemistry, Wayne State University, 5101 Cass Ave, Detroit, Michigan 48202

[email protected]

RECEIVED DATE

Correspondence to: H. Bernhard Schlegel, Department of Chemistry, Wayne State University, Detroit, Michigan 48202. Email: [email protected]

This article contains supplementary material available in the online version of this article.

Abstract: A general procedure for QM/MM studies on biochemical systems is outlined, and a collection of Perl scripts to facilitate ONIOM-type QM/MM calculations is described. This toolkit is designed to assist in the different stages of an ONIOM QM/MM study of biomolecules, including input file preparation and checking, job monitoring, production calculations, and results analysis. An iterative procedure for refitting the partial charges of QM region atoms is described; it yields a more accurate treatment of the electrostatic interaction between the QM and MM regions during QM/MM calculations. The toolkit fully supports this partial charge refitting procedure. By using this toolkit for file conversions, structure manipulation, input sanity checks, parameter lookup, charge refitting, tracking optimizations, and analyzing results, QM/MM studies of large biochemical systems can be made much more convenient and practical.

Keywords: QM/MM, ONIOM, charge fitting, efficiency

Introduction

Computational chemistry and biology are becoming routine methods to study many biological processes. In computational studies of biochemical reactions involving macromolecules, combined quantum mechanics and molecular mechanics (QM/MM) methods are among the most powerful tools available [1-5]. In QM/MM methods, the large system of interest is divided into two (or more) parts. The larger part, which is not directly involved in the chemical reaction, is treated at a low-cost level of theory, such as molecular mechanics (MM). The smaller part, where the chemical reaction occurs, is treated with a more accurate level of theory, such as quantum mechanics (QM), at a higher computational cost. Using this scheme, the entire system, e.g. an enzyme with substrate and solvent molecules, can be treated computationally. The reaction of interest in the active site of the enzyme can be studied at an appropriate level of theory with an affordable computational cost.

Although different cases may vary, the general procedure for a typical QM/MM study of a biochemical system can be described as follows. First, reliable experimental structures, usually crystal structures, are identified and preprocessed. The preprocessing includes adding hydrogen atoms, determining the most probable protonation states of titratable amino acids, adding counter ions and solvent molecules, geometry optimization with MM, and equilibration by molecular dynamics (MD) using MM. Second, representative geometries are selected from the MD simulations and are optimized again before the QM/MM study is initiated. Third, QM/MM geometry optimizations are set up from the selected MD geometries and conducted for the initial structures, usually reactant complexes in the reaction sequence under study.
Fourth, transition states (TS), intermediate structures and products of interest are generated from reactant structures and optimized. Constructing and optimizing representative structures along the reaction path is the most time-consuming part of the study and needs to be carried out very carefully. Fifth, the structures identified along the reaction path are subjected to production calculations at a higher level of theory than that used for the geometry optimizations. This general process is illustrated in Figure 1.

Based on the mechanism of coupling between the QM and MM regions, two types of QM/MM methods are generally available, additive and subtractive [6]. ONIOM is one popular and robust subtractive QM/MM method [7,8]. The total ONIOM energy for a two-layer system is [8]

$$E^{\mathrm{ONIOM}} = E^{\mathrm{real,MM}} + E^{\mathrm{model,QM}} - E^{\mathrm{model,MM}}$$

In a two-layer ONIOM QM/MM calculation, the real system contains all the atoms (both the QM and MM regions) and is calculated at the MM level ($E^{\mathrm{real,MM}}$). The model system is the QM region treated at the QM level ($E^{\mathrm{model,QM}}$). To obtain the total ONIOM energy, the model system also needs to be treated at the MM level ($E^{\mathrm{model,MM}}$), and this energy is subtracted from the MM energy of the real system.
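As a worked illustration of the subtractive expression above, the following minimal Python sketch combines the three component energies of a two-layer calculation. The class name, the function names, and the numerical values are hypothetical and chosen only for the example; they are not part of TAO or Gaussian. The second helper returns the difference of the two MM terms, which the paper later refers to as the MM contribution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OniomComponents:
    """Component energies (hartree) of a two-layer ONIOM calculation."""
    real_mm: float   # whole (real) system at the MM level
    model_qm: float  # QM region (model system) at the QM level
    model_mm: float  # QM region (model system) at the MM level


def oniom_energy(c: OniomComponents) -> float:
    """E(ONIOM) = E(real,MM) + E(model,QM) - E(model,MM)."""
    return c.real_mm + c.model_qm - c.model_mm


def mm_contribution(c: OniomComponents) -> float:
    """MM part of the total energy: E(real,MM) - E(model,MM)."""
    return c.real_mm - c.model_mm


# Made-up numbers, for illustration only.
example = OniomComponents(real_mm=-12.5, model_qm=-450.25, model_mm=-0.75)
total = oniom_energy(example)
```

Written this way, the total is simply the QM energy of the model system plus the MM contribution, which makes it easy to see which part of an energy difference between two structures comes from the QM region.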

Currently, the QM/MM methods implemented in the Gaussian package [9] are two- and three-layer ONIOM models for calculating the energy, gradient and Hessian for geometry optimization and vibrational frequency analysis. ONIOM calculations using Gaussian have been applied in numerous studies in computational biology, chemistry and material science [10-18]. In a subtractive QM/MM method such as ONIOM, the MM force field is needed for the entire system, including the QM region. The MM force field includes parameters for bonded and non-bonded interactions and partial charges. Due to its subtractive nature, most bonded and short-range non-bonded terms in the QM region cancel in the ONIOM QM/MM method.

Two schemes are generally applied to treat the electrostatic interaction between the QM and MM regions [19]. In the mechanical embedding (ME) scheme, this interaction is treated at the MM level. In the electrostatic embedding (EE) scheme, the partial charges of MM atoms are included in the Hamiltonian of the QM region. In the ME scheme, the electrostatic interaction between the QM and MM regions does not cancel in the QM/MM energy. The partial charges of QM atoms play an important role in this scheme and therefore need to be treated carefully in QM/MM calculations.

Since the biochemical systems studied by QM/MM can be rather large, some of the structure manipulation and processing of these systems can be cumbersome. During a QM/MM calculation, it is good practice to monitor running jobs in terms of both energies and geometries. If problems in running jobs can be spotted at an early stage, much time and resources can be saved. However, since the structures and files in QM/MM studies can be rather large, it is not always convenient to check running QM/MM jobs on the fly, especially geometries.
Although there are many graphical interfaces available to facilitate QM/MM calculations, the efficiency and convenience of QM/MM studies can be substantially increased when appropriate auxiliary tools are available for job preparation, monitoring, and analysis. In this paper, we describe a Toolkit to Assist ONIOM calculations (TAO). The toolkit is written in Perl using an object-oriented design and was developed during our previous QM/MM studies [18]. With these tools, QM/MM studies using the ONIOM method in Gaussian become much easier and more efficient, especially for large enzymatic systems. To illustrate the use of the toolkit and to guide a user through an ONIOM QM/MM study of an enzymatic system, we also provide a tutorial in the Supplementary Information.

In this paper, the general procedure of a QM/MM study is discussed. The tools available for the various stages of this study are listed in Table I.

ONIOM input preparation

Since many large-scale QM/MM studies are focused on biological systems, such as proteins, this paper is directed mainly toward calculations of these systems. However, this toolkit can also be applied to any other system suitable for QM/MM studies. The Protein Data Bank (PDB) is a common resource for initial structures of biomolecules. The PDB file format is one of the most commonly used formats for biomolecule structure storage. The selected initial structures are usually subject to pretreatment by MM methods before they are used in QM/MM studies. Pretreatment typically includes geometry optimization, immersing the target in a box of explicit solvent molecules, equilibration by MD, etc. The resulting system after these pretreatment steps can be rather large (>10,000 atoms). Usually, it is not necessary to keep all the solvent molecules in QM/MM calculations. For potential energy surface (PES) studies, keeping several solvation shells may be sufficient. Then, the user needs to select a QM region for the QM/MM study. This step requires a compromise between completeness of the reaction site and computational cost. Due to the large size of biomolecular systems, it is common practice to select a region within a certain distance of the QM atoms to be optimized during the QM/MM study, while keeping other parts of the system frozen. This practice serves two purposes: it reduces the computational cost and avoids noise in the total energy from unrelated conformational changes in parts of the system other than the reaction site.

When studying a potential energy surface (PES) for a reaction, it is necessary to identify the reactant, product, transition states (TS) and intermediates along the reaction path. Since general MM methods cannot be used to study bond forming and breaking, QM/MM methods are needed.
Unlike for small molecules (<100 atoms), searching for reactants, products and TSs in large biological systems (>5000 atoms) can be very cumbersome. Transition states can be particularly challenging. In our experience, using model QM/MM systems of reduced size can be very helpful when dealing with large systems. In the early stages of a study, a large system can be reduced to a smaller model system which contains all of the QM region, the moving part of the MM region, and enough of the frozen part of the MM region to surround the first two parts (Figure 2). Compared to the whole system with solvent molecules, it is much easier to manipulate the reduced QM/MM model system to generate initial structures for the reactant, product and TS, and optimizations are less costly to carry out. Of course there will be numerous dangling bonds in the frozen MM region, but these missing bonds have little effect on the geometry optimizations in this preliminary stage of the study. The relative energies obtained from the reduced model system can be a guide for the production calculations on the entire system. Once a complete reaction path is identified using the reduced model, these geometries can be easily embedded back into the whole system, because the surrounding frozen part in the reduced model has coordinates identical to the whole system. In our experience, once the initial geometries are optimized in the model system, the production geometry optimizations of the whole system are usually easy to complete.

Tools for ONIOM input preparation

Several tools are available in TAO to assist this stage of an ONIOM study. A program named pdbcore can be used to generate a reduced-size model for a large system stored in a PDB file. By defining a core region with key residues, pdbcore truncates the full-size system to a reduced-size system by including residues within a certain user-defined distance of the core region.

The program pdb2oniom can be used to generate an ONIOM input file from a PDB file. There are other tools available to do similar tasks [20-25]. However, several features make pdb2oniom particularly useful for large biomolecular systems. A list of core residues and a distance can be set by the user. The core residues are those directly involved in the chemical reactions and any other important residues of the system. In the input file generated by pdb2oniom, all residues which contain one or more atoms less than a given distance from any atom in the core residues are set to be freely optimized (molten region). All other residues further from the core residues are frozen during optimization (frozen region). Therefore, each residue as a unit will be set either free to move or frozen. Isolated free atoms are avoided in this setup (Scheme 1). This program uses amino acid and nucleic acid template files from AMBER [26] to assign the atom type and partial charge for each atom. A user can add similar template files to the library for any ligand or nonstandard residue for this purpose. The program can automatically read these template files and apply them to the corresponding residues.
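The residue-wise selection of the molten and frozen regions described above can be sketched as follows. This is an illustrative Python sketch, not the Perl implementation in TAO: the Atom record and the function names are hypothetical, and the 0 / -1 flag encoding is assumed here to follow the Gaussian input convention (0 for an atom that is optimized, -1 for a frozen atom).

```python
import math
from collections import namedtuple

# Hypothetical minimal atom record: residue id, atom name, Cartesian coordinates (Å).
Atom = namedtuple("Atom", "res_id name x y z")


def molten_residues(atoms, core_res_ids, cutoff):
    """Return ids of residues with at least one atom closer than `cutoff`
    to any atom of a core residue. Core residues are always included."""
    core = set(core_res_ids)
    core_atoms = [a for a in atoms if a.res_id in core]
    molten = set(core)
    for a in atoms:
        if a.res_id in molten:
            continue  # residue already selected as a whole
        if any(math.dist((a.x, a.y, a.z), (c.x, c.y, c.z)) < cutoff
               for c in core_atoms):
            molten.add(a.res_id)
    return molten


def optimization_flags(atoms, molten):
    """Per-atom flags: 0 = free to move, -1 = frozen (assumed encoding)."""
    return [0 if a.res_id in molten else -1 for a in atoms]


# Toy system: residue 1 is the core; residue 2 has one atom inside the
# 6 Å cutoff and one outside; residue 3 lies entirely outside.
atoms = [
    Atom(1, "C1", 0.0, 0.0, 0.0),
    Atom(2, "N", 5.0, 0.0, 0.0),
    Atom(2, "CA", 9.0, 0.0, 0.0),
    Atom(3, "O", 10.0, 0.0, 0.0),
]
molten = molten_residues(atoms, core_res_ids=[1], cutoff=6.0)
flags = optimization_flags(atoms, molten)
```

Because the decision is made per residue, the second atom of residue 2 is free to move even though it lies beyond the cutoff, which is how isolated free atoms are avoided.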
These template files are usually generated by a program or constructed by the users during the MM pretreatment of the system. It should be noted that this program does not generate a connectivity table for the biomolecular system. Such a table can be generated using other programs such as GaussView and must be copied to the end of the ONIOM input file. It is strongly recommended that the same connectivity be used for the QM region for all of the structures in the reaction sequence (reactant, TS, product, etc.) to ensure comparable ONIOM energies [8]. The pdb2oniom program will not set up the QM region, since it is not common practice to add whole residues into the QM region. The user must explicitly choose what parts of which residues to include in the QM region for an ONIOM study using GaussView or other tools.

A correct connectivity table is important for the MM calculation, since the connectivity is used to assign partial charges and parameters for bond stretches, bends and dihedral angles, etc. The connectivity table in the ONIOM input file should be checked with the program checkconnect. This program will check for any isolated atoms and for atoms with too many bond connections. Such atoms could easily disrupt an ONIOM calculation and are hard to diagnose. They can also contaminate the total energy significantly.

Another sanity check is to sum up the total MM charges of the whole system and of each region of the system using the program chargesum. With this tool, the charge distribution within each region can be easily checked. It is particularly important for the QM region that the sum of MM partial charges be close to the total QM charge.

When running a newly generated ONIOM job for a biomolecular system that includes a small ligand, the user will most likely encounter errors due to missing MM parameters. This is most often because the AMBER force field does not have parameters for the substrate or for non-natural amino acids or nucleosides in the system. The program parmlookup can be used to find these missing parameters, first in the amino acid force field file (parm99.dat) and then in the general AMBER force field file (GAFF.dat) [27]. Users can easily replace the force field files used by parmlookup with updated force field files from newer versions of AMBER.

ONIOM Job Monitoring

To maintain high efficiency in the computational study, it is good practice to check that running jobs are producing meaningful results (both energies and geometries should be checked). When any suspicious behavior is detected for a running job, further attention may be needed before the job is finished. Based on the diagnosis, the job may need to be stopped and resubmitted with suitable modification. Since ONIOM geometry optimization job files for biomolecular systems can be rather large, frequently checking the energy and geometry along the optimization path with a text editor or other program may be tedious, and a number of tools have been designed to make these tasks more convenient.
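To make the idea of an energy check concrete, the Python sketch below converts a list of step energies to relative values and flags two kinds of suspicious behavior: steps where the energy rises, and a sign-alternating pattern in the most recent steps. It is only an illustration of the kind of check meant here, under the assumption that the step energies (in hartree) have already been extracted from the output file; the function names and the window size are hypothetical and do not describe how the TAO programs are implemented.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def relative_energies(energies_hartree):
    """Energies in kcal/mol relative to the last step of the optimization."""
    ref = energies_hartree[-1]
    return [(e - ref) * HARTREE_TO_KCAL for e in energies_hartree]


def rising_steps(energies_hartree, tol=0.0):
    """Indices of steps whose energy is higher than the previous step."""
    e = energies_hartree
    return [i for i in range(1, len(e)) if e[i] - e[i - 1] > tol]


def looks_oscillating(energies_hartree, window=6):
    """True if the step-to-step change flips sign at every one of the
    last `window` steps, i.e. the energy goes up, down, up, down, ..."""
    tail = energies_hartree[-window:]
    if len(tail) < 3:
        return False  # need at least two consecutive changes to compare
    diffs = [b - a for a, b in zip(tail, tail[1:])]
    return all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))


# Made-up step energies (hartree), for illustration only.
healthy = [-100.0, -100.5, -100.75, -101.0]
stuck = [-1.0, -2.0, -1.0, -2.0, -1.0, -2.0]
healthy_rel = relative_energies(healthy)
```

A job whose recent energies satisfy looks_oscillating is a candidate for being stopped and restarted from an earlier geometry rather than being left to run to its step limit.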
Tools for ONIOM Job Monitoring

The oniomlog program was developed to facilitate job monitoring. It can extract the energy of each component in an ONIOM calculation along the optimization path and list either total energies in hartree or relative values in kcal/mol with respect to the last step of the optimization. With this information, users can easily follow the geometry optimization process. This check is particularly useful in detecting problems such as the energy alternating between two values rather than decreasing monotonically. To help the user check the geometry conveniently, this program can also extract a certain portion of the system either from the last step or from all steps in the optimization path. The extracted portion can be any layer or combination of layers in a two- or three-layer ONIOM system, or all atoms not frozen in the optimization. The output file for the structure is in XYZ file format and can be visualized by many programs [22,24,28,29]. By visually inspecting the selected portion of the system, it may be possible to spot unusual behavior of the optimization before the job terminates with an error. Sometimes, it is desirable to restart an optimization job not from the last step of the current optimization attempt, but from an earlier geometry along the optimization path. The oniomlog program can be used to generate a new ONIOM job file containing the geometry from a step specified by the user. With the various options provided by this program, users can monitor multiple ONIOM jobs and generate new jobs conveniently.

The setmvflg program helps a user reset the optimization flags of an ONIOM job, for example, allowing all residues within 7 Å instead of 6 Å of the core region to move during optimization. Using this program, the optimization flag of each atom can be easily reset during the study without going through the whole preparation process. Caution is needed when studying different structures from a reaction sequence. To obtain a meaningful energetic profile, the frozen region setup should be the same for all the structures in one reaction sequence.

ONIOM Production Jobs

Once key structures in a reaction sequence (e.g. reactant, product and TS) are identified, final production calculations need to be carried out. If a reduced ONIOM model system was generated, the structures found for the model system must be re-optimized for the whole system. New input files for optimizing the whole system can be generated from the optimized geometries of the reduced model.

Another issue that needs to be addressed is the MM partial charges for atoms in the QM region. For ONIOM calculations, the MM force field is required for the entire system, including the QM region. The MM force field includes parameters for bonded and non-bonded interactions and partial charges.
Due to its subtractive nature, most bonded and short-range non-bonded terms of the QM region cancel in the final ONIOM energy, but the electrostatic interactions between the QM and MM regions remain. In the ME scheme for ONIOM calculations, the electrostatic interactions between the QM and MM regions are calculated at the MM level based on the partial charges. Therefore, it is critical that the partial charges accurately reflect the charge distribution of the QM region. In the EE scheme for ONIOM calculations, the partial charges of MM atoms are included in the Hamiltonian of the QM region, and the MM partial charges of the QM atoms play a less important role in the total energy than in the ME scheme. However, ONIOM geometry optimizations using the EE scheme are generally more expensive than those with the ME scheme, especially for TS searches. Therefore, conducting accurate ME ONIOM geometry optimizations is still a common and practical approach at present. When studying different structures in a reaction sequence, the charge distribution of the whole system changes, and a convenient method is needed for dealing with these changes.

The partial charges of natural amino acids and nucleosides are often taken from the AMBER force field. The partial charges for ligands are usually obtained from a charge fitting process. One of the common procedures for charge fitting is the RESP method [30,31], which was also used in the AMBER force field development. It would seem straightforward to obtain partial charges for the reactant and product and employ them in QM/MM calculations. However, in many enzymatic systems, key amino acid residue side chains may form covalent bonds with the substrate or may become protonated or deprotonated along the reaction path. In these cases, the charge distributions of the substrate and of the residues at the active site are strongly coupled. Not only the partial charges of the substrate, but also those of the key residues in the active site of the protein need to be refitted in a reaction sequence to accurately reflect the structural changes.

Tools for ONIOM Production Jobs

A program transgeom is available to generate an ONIOM input file for the full system containing the geometry from a reduced model system. This program matches each atom from the model system to its counterpart in the full system to create a job file for the full system with the model system geometry. This program can also retain the atom types, partial charges, optimization flags, and layer information from the model system, in case any of this setup information has been modified during the preliminary study using the model system. With the pre-optimized geometry from the model system, ONIOM optimization jobs for the full system generated by transgeom should converge rapidly. This program can also extract the geometry from a full-size model and build a partial model.
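The atom matching step can be pictured with the short Python sketch below, which copies coordinates from a reduced model into the full system. The text does not state how transgeom identifies an atom's counterpart, so matching on the pair (residue id, atom name) is an assumption made for this example, as are the dictionary layout and the function name.

```python
def embed_model_geometry(full_atoms, model_atoms):
    """Return a copy of `full_atoms` in which every atom that also occurs in
    `model_atoms` takes its coordinates from the model.

    Atoms are dicts with keys 'res_id', 'name' and 'xyz'. Matching on
    (res_id, name) is an assumption of this sketch."""
    model_xyz = {(a["res_id"], a["name"]): a["xyz"] for a in model_atoms}
    if len(model_xyz) != len(model_atoms):
        raise ValueError("duplicate (res_id, name) keys in the model system")

    merged, matched = [], set()
    for atom in full_atoms:
        key = (atom["res_id"], atom["name"])
        new_atom = dict(atom)  # leave the input list untouched
        if key in model_xyz:
            new_atom["xyz"] = model_xyz[key]
            matched.add(key)
        merged.append(new_atom)

    missing = set(model_xyz) - matched
    if missing:
        raise ValueError(f"model atoms with no counterpart: {sorted(missing)}")
    return merged


# Toy data, for illustration only.
full = [
    {"res_id": 1, "name": "CA", "xyz": (0.0, 0.0, 0.0)},
    {"res_id": 1, "name": "CB", "xyz": (1.0, 0.0, 0.0)},
    {"res_id": 2, "name": "O", "xyz": (5.0, 0.0, 0.0)},
]
model = [{"res_id": 1, "name": "CB", "xyz": (1.2, 0.1, 0.0)}]
merged = embed_model_geometry(full, model)
```

Raising an error for unmatched model atoms is a deliberate choice in this sketch: a silent mismatch would leave part of the full system at its old geometry and could go unnoticed until the optimization misbehaves.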
In cases where a full frequency analysis is computationally prohibitive for the full-size model, but is affordable for a partial model, transgeom can be used to prepare the partial model input file with the geometry from the full model [32].

A program oniomresp is available to facilitate the fitting of a new set of partial charges for the QM region. This program is designed to apply the RESP charge fitting protocol so as to be compatible with the AMBER force field, but other types of partial charges from QM calculations can also be used. This program needs an ONIOM output file as input, because the coordinates of the capping atoms (hydrogen in most cases) are needed in the charge fitting calculation. This program works in three modes. Mode 1 generates a pure QM Gaussian input file for the QM region with capping atoms from the ONIOM output file. This input file can be used to run a QM single point calculation for charge fitting. In mode 2, an input file for the RESP program is generated based on the output file of the QM single point calculation. If there are any capping atoms, it is recommended that they be constrained to have partial charges of zero. In this manner, the sum of the partial charges in the QM region will be constant and consistent for all the QM/MM calculations. This can be easily done by using a flag for the number of capping atoms. The zero-charge constraint will be added automatically to all the capping atoms in the input file generated for the RESP calculation. In mode 3, a new ONIOM input file is generated with the partial charges of the QM region replaced by the RESP charges. Before using mode 3, an ONIOM input file should be generated from the ONIOM output file used for mode 1. The program oniomlog can be used for this purpose. This ONIOM input file contains the optimized geometry from the output file and the old partial charges. This ONIOM input file and the new partial charges fitted for the QM region by RESP are the inputs for mode 3, which generates an ONIOM input file with the new RESP partial charges. The newly generated ONIOM input file can be used for further QM/MM geometry optimization.

In some cases, the user may want to refit the partial charges for parts of the system other than the QM region. Mode 3 of oniomresp can fulfill this purpose by using a map file to match each charge to the corresponding atom in the ONIOM input file. Besides RESP charges, other types of atomic charges, such as APT, NPA, or Mulliken charges, could be used in ONIOM calculations. The program extractcharge is designed to extract partial charges from a QM calculation and save them to a file for use with mode 3 of oniomresp.

Optimization of an input file generated with mode 3 of oniomresp will generally lead to a different geometry. It is natural to repeat the charge fitting process after the new optimization. This iterative procedure can be repeated until convergence.
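The iterative procedure can be sketched as a simple loop. In the Python sketch below, the two callables stand in for the external steps (an ONIOM geometry optimization and a charge fit on the optimized geometry); they are hypothetical placeholders and not interfaces provided by TAO, Gaussian, or RESP. The stopping test compares the total ONIOM energies of consecutive rounds, with a default tolerance of 0.1 kcal/mol taken from the criterion the authors report using in their previous study.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree


def refit_until_converged(optimize, fit_charges, geometry, charges,
                          tol_kcal=0.1, max_rounds=10):
    """Alternate geometry optimization and charge fitting until the ONIOM
    energy changes by less than `tol_kcal` between consecutive rounds.

    optimize(geometry, charges) -> (new_geometry, energy_in_hartree)
    fit_charges(geometry)       -> new charges for the QM region
    Both callables are placeholders supplied by the caller."""
    previous = None
    for round_no in range(1, max_rounds + 1):
        geometry, energy = optimize(geometry, charges)
        if (previous is not None
                and abs(energy - previous) * HARTREE_TO_KCAL < tol_kcal):
            return geometry, charges, round_no
        charges = fit_charges(geometry)  # refit on the new geometry
        previous = energy
    raise RuntimeError(f"charges not converged after {max_rounds} rounds")


# Toy stand-ins so the loop can be exercised without any external program.
_energies = iter([-100.0, -100.01, -100.0101])


def fake_optimize(geometry, charges):
    return geometry + 1, next(_energies)


def fake_fit(geometry):
    return [0.1 * geometry]


final_geom, final_charges, rounds = refit_until_converged(
    fake_optimize, fake_fit, geometry=0, charges=[0.0])
```

The loop returns the charges that were actually used in the last optimization, so the reported energy, geometry, and charge set all belong to the same calculation.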
A well-defined convergence criterion is that the optimization of the ONIOM job with refitted charges finishes at the very first step, i.e. the optimization converges without any change in the geometry. In that case, the partial charges for the QM region represent the charge distribution of the optimized geometry. However, this optimal convergence may not be achieved easily in some cases. A less stringent criterion used in our previous study is that the difference in the total ONIOM energies between the last two rounds of optimization is less than 0.1 kcal/mol [18]. Reaching this convergence typically requires 4 to 6 rounds of optimization and charge fitting.

ONIOM Analysis Module

After finishing the ONIOM production calculations, the geometry and energy need to be analyzed to obtain insight into the systems under study. PDB is one popular file format for storing geometrical information of biomolecules. Many free and commercial graphical viewers are available to visualize and manipulate biomolecules in PDB file format. Therefore, it is convenient to convert the final geometry into PDB format for visualization and analysis.

The final ONIOM energy is calculated by subtracting the MM energy of the QM region from the MM energy of the whole system (the difference is denoted as the MM contribution) and adding the QM energy of the QM region. When studying a reaction sequence, comparisons of not only the ONIOM energy, but also the QM energy and the MM contribution can be informative.

Tools for ONIOM Analysis

The program oniom2pdb can translate the geometry from an ONIOM input or output file to a PDB file. It reads in a PDB file and an ONIOM file, and uses the PDB file as a template to generate a new PDB file containing the geometry from the ONIOM file. It is recommended that the same template file be used for the various structures in a reaction sequence to ensure that all the PDB files generated in a reaction sequence have the same format. In this way, the setup for visualization can be conveniently shared among the different structures. Users may find this feature very convenient when comparing geometries and producing publication images for several structures. When using an ONIOM optimization output file to generate a PDB file, any geometry along the optimization path can be extracted with oniom2pdb and stored in a PDB file for further analysis.

Many users may find that it is easier to manipulate the geometry of biomolecules in PDB format than in other formats. The program pdbcrd2oniom can generate a new ONIOM input file with coordinates from a given PDB file. pdbcrd2oniom takes the coordinates from a given PDB file and replaces the coordinates in an existing Gaussian ONIOM input file. By contrast, pdb2oniom creates a new ONIOM input file using a PDB file.
A user can easily create a PDB file with an optimized geometry from a Gaussian ONIOM file using oniom2pdb, make some changes to the structure in the PDB file using their favorite tools (to create a TS or product, etc.), then produce a new ONIOM input file with the modified coordinates using pdbcrd2oniom.

The program gaussiantable can read several ONIOM output files and compare the energies in these files. The ONIOM energy, QM energy and MM contribution of each file are extracted and listed in hartree. Relative energies of these components with respect to the first output file (usually the reactant as reference) are also calculated and listed in kcal/mol. This program is a convenient tool for summarizing calculations to obtain reaction barrier heights and enthalpy changes.

Conclusions

A set of tools has been developed to facilitate QM/MM calculations using the ONIOM method in Gaussian. Four modules with numerous programs were developed to assist in job preparation, monitoring, production runs, and analysis. Each program has a brief usage message and detailed documentation. This tool set has been used in several studies and has proved to be efficient and effective. It is hoped that these tools will make the use of ONIOM calculations easier for QM/MM investigations of biochemical systems. This tool set is available free of charge from the website http://www.chem.wayne.edu/schlegel/Software.html.

Acknowledgments

This work was supported at Wayne State University by the National Science Foundation (CHE091858). We thank Dr. Jason L. Sonnenberg for his help during the program development, and Mr. Brian T. Psciuk and Mr. Jia Zhou for their feedback on the tutorial.

References

1. Gao, J. Methods and applications of combined quantum mechanical and molecular mechanical potentials. Rev Comput Chem 1996, 7, 119-185.
2. Monard, G.; Merz, K. M., Jr. Combined Quantum Mechanical/Molecular Mechanical Methodologies Applied to Biomolecular Systems. Acc Chem Res 1999, 32(10), 904-911.
3. Friesner, R. A.; Guallar, V. Ab initio quantum chemical and mixed quantum mechanics/molecular mechanics (QM/MM) methods for studying enzymatic catalysis. Annu Rev Phys Chem 2005, 56, 389-427.
4. Lin, H.; Truhlar, D. G. QM/MM: what have we learned, where are we, and where do we go from here? Theor Chem Acc 2007, 117(2), 185-199.
5. Hu, H.; Yang, W. Free energies of chemical reactions in solution and in enzymes with ab initio quantum mechanics/molecular mechanics methods. Annu Rev Phys Chem 2008, 59, 573-601.
6. Senn, H. M.; Thiel, W. QM/MM methods for biomolecular systems. Angew Chem, Int Ed 2009, 48(7), 1198-1229.
7. Svensson, M.; Humbel, S.; Froese, R. D. J.; Matsubara, T.; Sieber, S.; Morokuma, K. ONIOM: A multilayered integrated MO+MM method for geometry optimizations and single point energy predictions. A test for Diels-Alder reactions and Pt(P(t-Bu)3)2 + H2 oxidative addition. J Phys Chem 1996, 100(50), 19357-19363.
8. Vreven, T.; Byun, K. S.; Komaromi, I.; Dapprich, S.; Montgomery, J. A., Jr.; Morokuma, K.; Frisch, M. J.
Combining Quantum Mechanics Methods with Molecular Mechanics Methods in ONIOM. 41 J Chem Theory Comput 2006, 2(3), 815-826. 42 9. Frisch, M. J.; Trucks, G. W.; Schlegel, H. B.; Scuseria, G. E.; Robb, M. A.; Cheeseman, J. R.; 43 Scalmani, G.; Barone, V.; Mennucci, B.; Petersson, G. A.; Nakatsuji, H.; Caricato, M.; Li, X.; 44 45 Hratchian, H. P.; Izmaylov, A. F.; Bloino, J.; Zheng, G.; Sonnenberg, J. L.; Hada, M.; Ehara, M.; 46 Toyota, K.; Fukuda, R.; Hasegawa, J.; Ishida, M.; Nakajima, T.; Honda, Y.; Kitao, O.; Nakai, H.; 47 Vreven, T.; Montgomery, J., J. A.; Peralta, J. E.; Ogliaro, F.; Bearpark, M.; Heyd, J. J.; Brothers, E.; 48 Kudin, K. N.; Staroverov, V. N.; Kobayashi, R.; Normand, J.; Raghavachari, K.; Rendell, A.; Burant, J. 49 C.; Iyengar, S. S.; Tomasi, J.; Cossi, M.; Rega, N.; Millam, J.; Klene, M.; Knox, J. E.; Cross, J. B.; 50 Bakken, V.; Adamo, C.; Jaramillo, J.; Gomperts, R. E.; Stratmann, O.; Yazyev, A. J.; Austin, R.; 51 52 Cammi, C.; Pomelli, J. W.; Ochterski, R.; Martin, R. L.; Morokuma, K.; Zakrzewski, V. G.; Voth, G. 53 A.; Salvador, P.; Dannenberg, J. J.; Dapprich, S.; Daniels, A. D.; Farkas, O.; Foresman, J. B.; Ortiz, J. 54 V.; Cioslowski, J.; Fox, D. J.; Gaussian, Inc.: Wallingford, CT, 2009. 55 10. Li, J.; Cross, J. B.; Vreven, T.; Meroueh, S. O.; Mobashery, S.; Schlegel, H. B. Lysine 56 carboxylation in proteins: OXA-10 beta -lactamase. Proteins: Struct, Funct, Bioinf 2005, 61(2), 246- 57 58 257. 59 60 12 John Wiley & Sons, Inc. Page 13 of 20 Journal of Computational Chemistry

11. Kwiecien, R. A.; Khavrutskii, I. V.; Musaev, D. G.; Morokuma, K.; Banerjee, R.; Paneth, P. 1 2 Computational insights into the mechanism of radical generation in B-12-dependent methylmalonyl- 3 CoA mutase. J Am Chem Soc 2006, 128(4), 1287-1292. 4 12. Hall, K. F.; Vreven, T.; Frisch, M. J.; Bearpark, M. J. Three-Layer ONIOM Studies of the Dark 5 State of Rhodopsin: The Protonation State of Glu181. J Mol Biol 2008, 383(1), 106-121. 6 13. Srivab, P.; Hannongbua, S. A study of the binding energies of efavirenz to wild-type and 7 K103N/Y181C HIV-1 reverse transcriptase based on the ONIOM method. ChemMedChem 2008, 3(5), 8 9 803-811. 10 14. Suresh, C. H.; Vargheese, A. M.; Vijayalakshmi, K. P.; Mohan, N.; Koga, N. Role of structural 11 water molecule in HIV protease-inhibitor complexes: A QM/MM study. J Comput Chem 2008, 29(11), 12 1840-1849. 13 15. Li, X.; Chung, L. W.; Paneth, P.; Morokuma, K. DFT and ONIOM(DFT:MM) Studies on Co-C 14 15 Bond Cleavage and Hydrogen Transfer in B12-Dependent Methylmalonyl-CoA Mutase. Stepwise or 16 Concerted Mechanism? J Am Chem Soc 2009, 131(14), 5115-5125. 17 16. Alzate-Morales, J. H.; Caballero, J.; Gonzalez-Nilo, F. D.; Contreras, R. A computational 18 ONIOM model for the descriptionFor ofPeer the H-bond interactionsReview between NU2058 analogues and CDK2 19 active site. Chem Phys Lett 2009, 479(1-3), 149-155. 20 17. Lundberg, M.; Kawatsu, T.; Vreven, T.; Frisch, M. J.; Morokuma, K. Transition States in a 21 22 Protein Environment - ONIOM QM:MM Modeling of Isopenicillin N Synthesis. J Chem Theory 23 Comput 2009, 5(1), 222-234. 24 18. Tao, P.; Fisher, J. F.; Shi, Q.; Vreven, T.; Mobashery, S.; Schlegel, H. B. Matrix 25 Metalloproteinase 2 Inhibition: Combined Quantum Mechanics and Molecular Mechanics Studies of the 26 Inhibition Mechanism of (4-Phenoxyphenylsulfonyl)methylthiirane and Its Oxirane Analogue. 27 28 Biochemistry 2009, 48(41), 9839-9847. 29 19. Bakowies, D.; Thiel, W. 
Hybrid models for combined quantum mechanical and molecular 30 mechanical approaches. J Phys Chem 1996, 100(25), 10580-10594. 31 20. GaussView. Gaussian, Inc.: Wallingford, CT. 32 21. Chem3D. CambridgeSoft Corporation. 33 22. : an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/ 34 35 23. Portmann, S.; Luthi, H. P. Molekel: an interactive tool. Chimia 2000, 54(12), 36 766-769. 37 24. Schaftenaar, G.; Noordik, J. H. : a pre- and post-processing program for molecular and 38 electronic structures. J Comput Aided Mol Des 2000, 14(2), 123-134. 39 25. Allouche, A. R.; is a free Graphical User Interface for computational chemistry 40 41 packages. It is available from http://gabedit.sourceforge.net/ . 42 26. Case, D. A.; Cheatham, T. E., III; Darden, T.; Gohlke, H.; Luo, R.; Merz, K. M., Jr.; Onufriev, 43 A.; Simmerling, C.; Wang, B.; Woods, R. J. The amber biomolecular simulation programs. J Comput 44 Chem 2005, 26(16), 1668-1688. 45 27. Wang, J.; Wolf, R. M.; Caldwell, J. W.; Kollman, P. A.; Case, D. A. Development and testing of 46 a general Amber force field. J Comput Chem 2004, 25(9), 1157-1174. 47 48 28. Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. J Mol Graph 1996, 49 14(1), 33-38, 27-38. 50 29. PyMol. DeLano Scientific: Palo Alto, CA, USA, 2002. 51 30. Bayly, C. I.; Cieplak, P.; Cornell, W.; Kollman, P. A. A well-behaved electrostatic potential 52 based method using charge restraints for deriving atomic charges: the RESP model. J Phys Chem 1993, 53 54 97(40), 10269-10280. 55 31. Cornell, W. D.; Cieplak, P.; Bayly, C. I.; Kollmann, P. A. Application of RESP charges to 56 calculate conformational energies, hydrogen bond energies, and free energies of solvation. J Am Chem 57 Soc 1993, 115(21), 9620-9631. 58 59 60 13 John Wiley & Sons, Inc. Journal of Computational Chemistry Page 14 of 20

32. Tao, P.; Fisher, J. F.; Shi, Q.; Mobashery, S.; Schlegel, H. B. Matrix Metalloproteinase 2 1 2 (MMP2) Inhibition: DFT and QM/MM Studies of the Deprotonation Initialized Ring Opening Reaction 3 of Sulfoxide Analogue of SB 3CT. J Phys Chem B, submitted. 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 For Peer Review 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 14 John Wiley & Sons, Inc. Page 15 of 20 Journal of Computational Chemistry

Table I. List of Programs in the Toolkit to Assist ONIOM Calculations

Module             Program        Function
input preparation  chargesum      sum up the charges in each part of the ONIOM calculation
                   checkconnect   connectivity table sanity check
                   parmlookup     look up missing parameters in the AMBER force field
                   pdb2oniom      generate an ONIOM input file from a PDB file
                   pdbcore        generate a reduced size model system
job monitoring     oniomlog       check the energy and geometry, and generate a new input file
                   setmvflg       reset the optimization flag for each atom in an ONIOM input file
production         extractcharge  extract partial charges from a Gaussian log file
                   oniomresp      facilitate the charge refitting procedure
                   transgeom      generate an ONIOM input file for the whole system containing the geometry from a reduced model system
analysis           gaussiantable  compare the energies from several ONIOM output files
                   oniom2pdb      generate a PDB file from an ONIOM file
                   pdbcrd2oniom   replace the coordinates of an ONIOM input file by those from a PDB file
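The energy comparison attributed to gaussiantable in Table I can be illustrated with a short sketch. The code below is written in Python for illustration only and is not the toolkit's Perl implementation; the function name relative_table, the component keys, and all energy values are hypothetical. It assumes the absolute energies (in hartree) have already been extracted from the output files, and it converts the differences from the first entry to kcal/mol using 1 hartree = 627.5095 kcal/mol.

```python
# Minimal sketch (not the toolkit's Perl code): energies relative to the
# first structure, converted from hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.5095  # 1 hartree expressed in kcal/mol

COMPONENTS = ("oniom", "qm", "mm")  # hypothetical keys for the three columns


def relative_table(entries):
    """Return one row per structure with each component relative to the first.

    `entries` is a list of (label, energies) pairs, where `energies` maps
    each name in COMPONENTS to an absolute energy in hartree.
    """
    _, reference = entries[0]
    rows = []
    for label, energies in entries:
        row = {"label": label}
        for name in COMPONENTS:
            delta = energies[name] - reference[name]
            row[name] = delta * HARTREE_TO_KCAL_PER_MOL
        rows.append(row)
    return rows


# Hypothetical energies in hartree, for illustration only.
example = [
    ("reactant", {"oniom": -1500.250, "qm": -1498.100, "mm": -2.150}),
    ("ts", {"oniom": -1500.220, "qm": -1498.065, "mm": -2.155}),
    ("product", {"oniom": -1500.270, "qm": -1498.125, "mm": -2.145}),
]

for row in relative_table(example):
    # One line per structure: label, then relative energies in kcal/mol.
    print(f"{row['label']:<10}" + "".join(f"{row[c]:>10.2f}" for c in COMPONENTS))
```

Taking the first entry as the reference mirrors the convention described for gaussiantable, where the reactant is usually listed first so that barrier heights and reaction energies can be read directly from the relative values.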

Figure Captions:

Figure 1. Flowchart for a typical QM/MM study. Key steps are in rounded rectangles. Operations to connect steps are in diamond shapes. Programs developed to facilitate QM/MM studies are listed in regular rectangles.

Figure 2. ONIOM calculations. (a) full size model, (b) reduced size model. The QM region is shown in green. Residues allowed to move during optimization are in blue. Frozen residues are in red. When key structures are identified in the reduced model, they can be used to generate the full size model. The frozen atoms in the reduced size model are used as anchors when generating the full size system model.

Scheme 1. Illustration of the optimization setup in an ONIOM calculation. Atoms shown in blue are allowed to move during geometry optimization. Atoms shown in red are frozen.
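The moving/frozen partition described in the captions for Figure 2 and Scheme 1 can be sketched as follows. This is an illustrative Python sketch, not the logic of setmvflg: the function name assign_freeze_flags, the atom-by-atom distance criterion, and the cutoff value are all assumptions made for the example. The only convention taken from Gaussian is the per-atom freeze code in the molecule specification, where 0 marks an atom that is optimized and -1 marks an atom that is held fixed.

```python
import math

FREE, FROZEN = 0, -1  # Gaussian freeze codes: 0 = optimized, -1 = held fixed


def assign_freeze_flags(coords, core_indices, cutoff):
    """Return one freeze code per atom.

    Assumed criterion (hypothetical): an atom is left free when it lies
    within `cutoff` (same length unit as `coords`) of any atom listed in
    `core_indices`; every other atom is frozen.
    """
    core = [coords[i] for i in core_indices]
    flags = []
    for xyz in coords:
        nearest = min(math.dist(xyz, c) for c in core)
        flags.append(FREE if nearest <= cutoff else FROZEN)
    return flags


# Hypothetical three-atom example: atom 0 is the core, cutoff of 6.0 units.
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
flags = assign_freeze_flags(coords, core_indices=[0], cutoff=6.0)
print(flags)
```

A residue-based selection, as suggested by the Figure 2 caption, would apply the same test once per residue and give all atoms of that residue the same flag.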


[Figure 1. Flowchart for a typical QM/MM study.]

[Figure 2. ONIOM calculations.]

[Scheme 1. Illustration of the optimization setup in an ONIOM calculation.]