University of Southampton Research Repository Eprints Soton
Total Page:16
File Type:pdf, Size:1020Kb
University of Southampton Research Repository ePrints Soton Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g. AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination http://eprints.soton.ac.uk University of Southampton Faculty of Natural and Environmental Sciences Using linear-scaling DFT for biomolecular simulations Christopher James Pittock Thesis for the degree of Doctor of Philosophy September 2013 university of southampton ABSTRACT faculty of natural and environmental sciences Computational Chemistry Doctor of Philosophy Using linear-scaling DFT for biomolecular simulations by Christopher James Pittock In the drug discovery process, there are multiple factors that make a successful can- didate other than whether it antagonises a chosen active site, or performs allosteric regulation. Each test candidate is profiled by its absorption into the bloodstream, distribution throughout the organism, its products of metabolism, method of excre- tion, and overall toxicity; summarised as ADMET. There are currently methods to calculate and predict such properties, but the majority of these involve rule-based, empirical approaches that run the risk of lacking accuracy as one's search of chemical space ventures into the more novel. The lack of experimental data on organometallic systems also means that some of these methods refuse to predict properties on them outright, losing the opportunity to exploit this relatively untapped area that holds promise for new antibacterial and antineoplastic pharmaceutical compounds. Using the more transferable and definitive quantum mechanical (QM) approach to drug dis- covery is desirable, but the computational cost of conventional Hartree-Fock (HF) and Density Functional Theory (DFT) calculations are too high. Using the linear-scaling DFT program, onetep, we aim to exploit the benefits of DFT in calculations with much larger fragments of, and in some cases entire biomolecules, in order to demon- strate calculations which could ultimately be used in developing more accurate methods of profiling drug candidates, with a computational cost that albeit still high, is now feasible with the provision of modern supercomputers. In this thesis, we first use linear-scaling DFT methods to address the lack of elec- tron polarisation and charge transfer effects in energy calculations using a molecular mechanics forcefield. Multiple DFT calculations are performed on molecular dynam- ics (MD) snapshots of small molecules in a waterbox, with the aim of computing a MM!QM correction term, which can be applied to a forcefield binding free energy approach (such as thermodynamic integration) which will process a far greater number of MD snapshots. As a result, one will obtain the precision from processing very large numbers of MD snapshots of biomolecular systems, but the accuracy of QM. To im- prove efficiency of the QM phase of the overall method, we use electrostatic embedding i to model the regions of the waterbox that are far from the solute, yet are still important to include. As this is a relatively new module in onetep, we present validation data prior to its use in the main work. Secondly, we validate different methods of calculating the pKa of a wide variety of molecules: from small, organic compounds, to the organometallic cisplatin, with the ultimate goal being of such calculations to eventually address questions such as, as- suming oral intake, where in the gastrointestinal tract will a drug molecule be absorbed into the bloodstream, and how much of the original dose will be absorbed. These calcu- lations are then scaled up significantly to examine the potential of using linear-scaling DFT to calculate the pKa of specific residues in proteins. This is performed with a 305-atom tryptophan cage, the 814-atom Ovomucoid Silver Pheasant Third Domain (OMSVP3) and a 2346-atom section of the T99A/M102Q T4-lysozyme mutant. We also highlight the challenges in calculating protein pKa. Finally, we study the hydrogen-abstraction reaction between cyclohexene and cyto- chrome P450cam, through onetep single-point energy calculations of a 10-snapshot adiabatic reaction profile generated by the Mulholland Group (University of Bristol). Following this, the LST and QST methods of determining the transition state (avail- able through onetep) are used, with the aims of determining the importance of the protein surrounding the active site in regards to the activation energy and structural geometry of the calculated transition state. The LST and QST methods are also valid- ated, through modelling of the SN2 reaction between fluoride and chloromethane. The aim of this part of our work is to eventually assist in developing a metabolism (and toxicity) model of the different isoforms of cytochrome P450. Overall, this thesis aims to highlight not only the capability of linear-scaling DFT in becoming an important part of biomolecular simulation, but also the challenges that one will face upon scaling up calculations that were previously simple to perform, based on the small size of the system being modelled. ii Contents Abstract . .i Declaration of authorship . xvii Acknowledgements . xviii Glossary . xx 1 Computational Theory 1 1.1 Molecular Mechanics . .1 1.1.1 Origin of AMBER forcefield parameters . 11 1.1.2 GAFF . 14 1.1.3 AMBER energy using the sander module . 16 1.2 Quantum Mechanics . 18 1.2.1 The wavefunction and the Schr¨odingerEquation . 18 1.2.2 Born-Oppenheimer Approximation . 20 1.2.3 Many-electron wavefunctions . 22 1.3 Density Functional Theory . 26 1.3.1 Hohenberg-Kohn Proof . 28 1.3.2 Kohn-Sham DFT . 30 1.3.3 Exchange-correlation functionals . 32 1.4 Linear-scaling DFT methods . 36 1.4.1 Density matrix reformulation of Kohn-Sham DFT . 36 1.4.2 Principle of linear-scaling DFT approaches . 36 1.4.3 ONETEP .............................. 39 1.5 Basis sets . 42 1.5.1 Gaussian Type Orbital (GTO) basis sets . 42 1.5.2 Basis Set Superposition Error . 45 1.5.3 Plane wave basis set . 48 1.5.4 Periodic cardinal sine (psinc) basis set (used in ONETEP)... 51 1.6 Solvent models . 54 1.6.1 Explicit - MM . 54 1.6.2 Implicit - MM . 56 1.6.3 Implicit - QM . 60 1.7 Geometry Optimisation . 64 1.7.1 Steepest Descent . 65 iii 1.7.2 Conjugate gradients . 66 1.7.3 Newton-Raphson and BFGS . 68 1.8 Transition state . 71 1.8.1 Adiabatic (Drag) method . 72 1.8.2 Nudged Elastic Band (NEB) . 73 1.8.3 LST/QST optimisation . 75 2 Use of electrostatic embedding for QM-corrected water-solute inter- actions 77 2.1 Introduction . 77 2.1.1 Free energy of binding methods . 77 2.1.2 QM/MM systems . 81 2.1.3 Electrostatic embedding module within ONETEP . 83 2.1.4 Aims and objectives . 86 2.2 Original calculation parameters . 86 2.2.1 Molecular Mechanics/Molecular Dynamics . 86 2.2.2 ONETEP . 88 2.3 Validation of electrostatic embedding module . 89 2.3.1 Charge spilling . 89 2.3.2 Calibration of electrostatic embedding . 91 2.4 Calculation of energies of solvation . 98 2.4.1 Calculation parameters . 99 2.4.2 Results . 101 2.5 Conclusion . 104 3 Calculating pKa using linear-scaling DFT and an implicit solvation model 107 3.1 Introduction . 107 3.1.1 The importance of pKa ...................... 107 3.1.2 Overview of current pKa calculation methods . 108 3.1.3 Aims and objectives . 111 3.2 Theoretical calculation of pKa ...................... 112 3.2.1 Empirical . 112 3.2.2 Ab initio . 115 3.3 Original calculation parameters . 119 iv 3.3.1 Single-point energy . 121 3.3.2 Geometry optimisation . 121 3.3.3 Molecular Dynamics (MD) . 122 3.3.4 Implicit solvation . 122 3.3.5 Phonons . 123 3.4 Test set of 36 small drug-like molecules . 123 3.4.1 Na¨ıve and proton-exchange method . 125 3.4.2 Cluster-continuum method . 127 3.4.3 Investigating probable sources of error . 129 3.5 Calibration through linear regression . 135 3.6 Increasing the scope of pKa calculation: Small and medium-sized molecules137 3.6.1 More challenging small molecules . 137 3.6.2 Deprotonation of L-histidine . 141 3.6.3 Hydrolysis products of cisplatin . 143 3.7 Increasing the scope of pKa calculation: Protein pKa .......... 147 3.7.1 Tryptophan cage . 150 3.7.2 Ovomucoid Silver Pheasant Third Domain (OMSVP3) . 152 3.7.3 Effects of truncation of lysozyme on pKa of Glu108 and Arg145 153 3.8 Conclusions . 155 3.8.1 Polyprotic pKa . 158 Tables . 164 4 Transition state modelling of the CYP101 enzyme, using large QM regions 171 4.1 Introduction . 171 4.1.1 The significance of CYP . 171 4.1.2 Transition state calculations of CYP . 175 4.1.3 Aims and objectives . 177 4.2 Original calculation parameters . 180 4.3 Validation work . 181 4.3.1 Calculation method for single-point energy of Compound I . 181 − 4.3.2 LST/QST transition state in SN2 reaction between CH3Cl and F 184 4.4 Calculating the energy profile of the CYP101-cyclohexene hydrogen- abstraction process . 188 v 4.5 Transition state search for the CYP101-cyclohexene hydrogen-abstraction process .