On the Derivation of Accurate Force Field Parameters for Molecular
Total Page:16
File Type:pdf, Size:1020Kb
On the derivation of accurate force field parameters for molecular mechanics simulations A Dissertation Presented by James Allen Maier to The Graduate School in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy in Biochemistry and Structural Biology Stony Brook University May 2015 Copyright by James Allen Maier 2015 Stony Brook University The Graduate School James Allen Maier We, the dissertation committee for the above candidate for the Doctor of Philosophy degree, hereby recommend acceptance of this dissertation. Carlos Simmerling – Dissertation Advisor Professor, Department of Chemistry David Green – Chairperson of Defense Associate Professor, Department of Applied Mathematics and Statistics Ken Dill Professor, Departments of Physics and Chemistry Fernando Raineri Lecturer Stony Brook University, Department of Chemistry This dissertation is accepted by the Graduate School. Charles Taber Dean of the Graduate School ii Abstract of the Dissertation On the derivation of accurate force field parameters for molecular mechanics simulations by James Allen Maier Doctor of Philosophy in Biochemistry and Structural Biology Stony Brook University 2015 Proteins carry out many diverse but important biological tasks, the understanding of which can be greatly augmented by theoretical methods that can generate microscopic insights. A popular method for simulating proteins is called molecular mechanics. Molecular mechanics drives the dynamics of molecules according to their po- tential energy surface as defined by a force field. Because force fields are simple, molecular mechanics can be fast; but force fields must simultaneously be accurate enough for the conformational ensembles they generate to be useful. One force field that has been widely adopted for its utility is AM- BER force field 99 Stony Brook (ff99SB). The ff99SB protein back- bone parameters were fit to quantum mechanics energies of glycine and alanine tetrapeptides, including a set of minimum energy con- formations in the gas-phase. Although ff99SB rigorously reproduces many thermodynamic properties, it has shortcomings. Issues with backbone parameters may result from training against only ener- getic minima or from the energy calculations being in the gas phase. iii Problems with side chain parameters can stem from the protocol of ff99, where the amino acid side chain parameters were trained against energies of small molecules, while transferability from small molecules to amino acids may be problematic. Small updates to the backbone potential were applied by several groups, as well as the Simmerling group as part of ff14SB. Whereas ff99SB and ff14SB are fixed-charge, additive molecular mechanical models, there are also molecular mechanical models that include non-additive effects like charge polarization. Polar- izable force fields, with their many additional degrees of freedom, promise enhanced accuracy relative to fixed charge force fields. But with so many degrees of freedom and thus parameters, polarizable force fields can be more difficult to train. Although this complexity may be overcome, it is unclear whether the utility of fixed-charge, additive force fields has been exhausted, warranting the great en- deavors of developing a polarizable model. This dissertation seeks to answer how much more fixed charge force fields can be improved. Specifically, this work addresses two ques- tions. Firstly, can the side chain parameters of ff99SB be improved by fitting to quantum mechanics energies? We investigated differ- ent options in the calculation of energies for parameter training, finding that how the structures were minimized can significantly affect transferability of parameters trained against them. Specifi- cally, we found that loosely restraining the side chains, which were being refined, and tightly restraining the backbone, which was not, made the errors most similar between α and β backbone contexts. This transferability can be measured by improved agreement with the quantum mechanics training set as well as experimental scalar couplings. Secondly, can the backbone parameters of ff99SB be made more accurate, alternatively to empirical tweaks, by another, improved fitting to quantum mechanics energies? We found that better re- production of NMR solution scalar couplings was possible, if energy calculations included solvation effects, full grids of structures were included, and, perhaps surprisingly, if parameters were extrapo- lated to those appropriate for a zero-length peptide. These results show that quantum mechanics can be effectively used to improve the accuracy of molecular mechanics force fields. These improvements have implications for protein structure prediction, iv aiding the successful folding of 16 of 17 proteins in GB-Neck2 im- plicit solvent. Beyond, the insights from the QM-based backbone training could be extended to develop residue-specific parameters that bolster the sequence-dependent structural preferences of pro- teins in simulation models. v 영원\ 나X 신부 수M, ø¬고 X나님X < xDÐ게 Dedicated to my eternal bride Grace, and to my divine treasure Noah Contents List of Figures x List of Tables xix Acknowledgements xxiii 1 Introduction 1 1.1 Proteins . 2 1.2 Experimental measurements of proteins . 9 1.3 Statistical mechanics . 13 1.4 Classical mechanics describes the motions of bodies . 16 1.5 The failure of classical mechanics . 18 1.6 Quantum mechanics . 18 1.7 Theoretical chemistry . 20 1.8 Molecular mechanics . 23 1.9 Force fields . 24 1.10 Force field development principles . 30 1.11 Recent AMBER protein force field history . 34 1.12 Outline . 43 2 Examine and improve accuracy of amino acid side chain sam- pling using molecular mechanics force field ff99SB 45 2.1 Acknowledgments . 45 2.2 Introduction . 45 2.3 Fitting Strategy and Goals . 47 2.4 Methods . 53 2.4.1 Side chain dihedral training . 53 2.4.2 Test dynamics simulations . 73 2.4.3 Analysis . 77 2.5 Fitting Results . 78 vii 2.5.1 Side chain rotamer energies improved match to QM data and better transferability between backbone conformations 78 2.6 Testing Strategy . 82 2.7 Testing Results . 85 2.7.1 Agreement with side chain NMR scalar couplings is im- proved with ff14SB . 85 2.7.2 Helical stability is improved with ff14SB backbone changes and further improved with updated side chain parameters 93 2.7.3 Testing hairpin stability and structure . 95 2.7.4 High quality of dynamics in the native state is maintained101 2.8 Conclusion . 104 3 Assessing the factors that may be used to improve the ff99SB protein backbone parameter training 105 3.1 Introduction . 105 3.2 Methods . 115 3.2.1 Structure generation by high-T simulations . 115 3.2.2 Energy calculations . 116 3.2.3 Fitting . 117 3.2.4 Extrapolation of parameters to length-independence . 119 3.2.5 Test simulations . 120 3.2.6 Analysis . 121 3.3 Evaluating methods with alanine . 121 3.3.1 An “optimal point charge” water model improves Ala5 scalar couplings . 122 3.3.2 Training conformations selected from high-T simulation 124 3.3.3 Energies of grids . 126 3.3.4 Dihedral corrections from each training set . 127 3.3.5 Training against tetrapeptides overstabilizes helices . 138 3.3.6 Training against dipeptides stabilizes helices less . 142 3.3.7 Length-independent parameters . 144 3.4 How do backbone parameters differ for β-branched valine? . 153 3.5 Conclusion . 162 3.6 Future directions . 163 4 Implications 174 Bibliography 179 viii A Additional ff14SB analysis 198 A.1 Protein Ramachandran histograms . 199 A.2 Protein backbone RMSDs . 204 B ff12SB versus ff14SB 208 C Analytical CMAP fitting 211 ix List of Figures 1.1 A glycine amino acid in zwitterionic form with protonated amino and deprotonated carboxylic groups. 2 1.2 The resonance of the peptide bond that results in partial double bond character and peptide planarity. 4 1.3 Definitions of φ, , and χ dihedrals . 5 ◦ ◦ ◦ ◦ 1.4 Typical φ= coordinates of α (−60 ; −45 ), αL(60 ; 45 ), ◦ ◦ ◦ ◦ ◦ ◦ 310 (−49 ; −26 ), π (−55 ; −70 ), β (−135 ; 135 ), and ppII (−75◦; 150◦) secondary structures on a Ramachandran plot of 500 alanine conformations [Lovell et al., 2003]. The φ= coordinates of each secondary structure as specified are not exact, but approximately typical, average values. 6 1.5 Characteristic hydrogen bonding of (A) α and (B) β secondary structures. In B, the two left and two right strands are antiparallel, whereas the two center strands are parallel. Both renderings are of an X-ray structure of GB3 refined with dipolar couplings [Ulmer et al., 2003], rendered using VMD [Humphrey et al., 1996]. 7 1.6 Orig [Hu and Bax, 1997], DFT1, and DFT2 [Case et al., 2000] H−Hα Karplus curves. Names from Best et al. [2008]. 12 1.7 The Morse potential describes the anharmonicity characteristic of real bonds, as illustrated for this toy system with 1:5 Å equilibrium bond length, 100 kcal/mol/Å2 force constant at the minimum, and harmonic energy (black) or Morse well depth of 32 (red), 64 (green), 96 (purple), 128 (blue), or 160 (orange) kcal/mol. 26 1.8 The ff94 atom types of each amino acid, separated by group. In the first column are acids, polar residues, and bases. In the second column are aromatics, nonpolar residues, and amino acids with less (Pro) or more (Gly, Ala) flexibility. 37 x 1.9 The ff99SB Ala3 training set as circles, with the Hu and Bax [Hu and Bax, 1997] Karplus curve behind. Vertical lines indicate where the Karplus curve matches NMR (black) or ff99SB (gray). The maximum of the Karplus curve (φ = −120◦) is undersampled. 42 2.1 The amino acids drawn with their AMBER ff14SB [Maier et al., 2015] atom types. 52 2.2 AAE vs. BBD of aspartate and asparagine, calculating MM energies of QM structures and restraining: φ and (red crosses) or all backbone dihedrals (blue stars); or calculating MM energies of MM re-optimized structures and restraining: φ and (green ‘X’s) or all backbone dihedrals (purple squares).