Systematic Optimization of a Fragment-Based Force Field Against Experimental Pure-Liquid Properties Considering Large Compound F
Total Page:16
File Type:pdf, Size:1020Kb
Research Collection Journal Article Systematic Optimization of a Fragment-Based Force Field against Experimental Pure-Liquid Properties Considering Large Compound Families: Application to Saturated Haloalkanes Author(s): Oliveira, Marina P.; Andrey, Maurice; Rieder, Salomé R.; Kern, Leyla; Hahn, David F.; Riniker, Sereina; Horta, Bruno A.C.; Hünenberger, Philippe H. Publication Date: 2020-12-08 Permanent Link: https://doi.org/10.3929/ethz-b-000457688 Originally published in: Journal of Chemical Theory and Computation 16(12), http://doi.org/10.1021/acs.jctc.0c00683 Rights / License: In Copyright - Non-Commercial Use Permitted This page was generated automatically upon download from the ETH Zurich Research Collection. For more information please consult the Terms of use. ETH Library This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes. pubs.acs.org/JCTC Article Systematic Optimization of a Fragment-Based Force Field against Experimental Pure-Liquid Properties Considering Large Compound Families: Application to Saturated Haloalkanes Marina P. Oliveira, Maurice Andrey, SaloméR. Rieder, Leyla Kern, David F. Hahn, Sereina Riniker, Bruno A. C. Horta, and Philippe H. Hünenberger* Cite This: J. Chem. Theory Comput. 2020, 16, 7525−7555 Read Online ACCESS Metrics & More Article Recommendations *sı Supporting Information ABSTRACT: Direct optimization against experimental condensed- phase properties concerning small organic molecules still represents the most reliable way to calibrate the empirical parameters of a force field. However, compared to a corresponding calibration against quantum-mechanical (QM) calculations concerning isolated mole- cules, this approach is typically very tedious and time-consuming. The present article describes an integrated scheme for the automated refinement of force-field parameters against experimental condensed- phase data, considering entire classes of organic molecules constructed using a fragment library via combinatorial isomer enumeration. The main steps of the scheme, referred to as CombiFF, are as follows: (i) definition of a molecule family; (ii) combinatorial enumeration of all isomers; (iii) query for experimental data; (iv) automatic construction of the molecular topologies by fragment assembly; and (v) iterative refinement of the force-field parameters considering the entire family. As a first application, CombiFF is used here to design a GROMOS-compatible united-atom force field for the saturated acyclic haloalkane family. This force field relies on an electronegativity-equalization scheme for the atomic partial charges and fi σ ρ involves no speci c terms for -holes and halogen bonding. A total of 749 experimental liquid densities liq and vaporization Δ enthalpies Hvap concerning 486 haloalkanes are considered for calibration and validation. The resulting root-mean-square · −3 ρ · −1 Δ deviations from experiment are 49.8 (27.6) kg m for liq and 2.7 (1.8) kJ mol for Hvap for the calibration (validation) set. The values are lower for the validation set which contains larger molecules (stronger influence of purely aliphatic interactions). The trends in the optimized parameters along the halogen series and across the compound family are in line with chemical intuition based on considerations related to size, polarizability, softness, electronegativity, induction, and hyperconjugation. This observation fi Downloaded via ETH ZURICH on December 21, 2020 at 11:55:02 (UTC). is particularly remarkable considering that the force- eld calibration did not involve any QM calculation. Once the time-consuming task of target-data selection/curation has been performed, the optimization of a force field only takes a few days. As a result, CombiFF enables an easy assessment of the consequences of functional-form decisions on the accuracy of a force field at an optimal level of parametrization. See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles. 1. INTRODUCTION phase system depends crucially on the quality of the underlying − fi 9−15 Classical atomistic simulation1 3 and, in particular, molecular potential-energy function or force eld. − dynamics4 8 (MD) has become an established tool comple- The past few years have witnessed an intense renewal of mentary to experiment for the investigation of condensed- interest for force-field refinement procedures. This is in large phase systems relevant in physics, chemistry, and biology. The part due to a number of computational and methodological − usefulness of this method results in particular from a favorable advances16 21 that have enabled an increased accuracy, ff trade-o between model resolution and computational cost. automation, and throughput of these procedures, including Although classical atomistic models represent an approxima- tion to quantum mechanics (QM), they provide a realistic description of atom-based systems at spatial and temporal Received: July 1, 2020 resolutions on the orders of 0.1 nm and 1 fs, respectively, while Published: November 24, 2020 their computational cost remains tractable at present for system sizes and time scales on the orders of 10 nm and 1 μs, respectively. The ability of classical MD simulations to represent accurately the properties of a given condensed- © 2020 American Chemical Society https://dx.doi.org/10.1021/acs.jctc.0c00683 7525 J. Chem. Theory Comput. 2020, 16, 7525−7555 Journal of Chemical Theory and Computation pubs.acs.org/JCTC Article more powerful processors and coprocessors (graphics process- chemical space; (iii) take into account induction effects on the − ing units (GPUs)22 27 and field-programmable gate arrays PCs92,101,132 as well as, possibly, on the LJ coeffi- − − (FPGAs)28 30), faster algorithms and improved simulation cients;112,113,120 122,125,126 and (iv) are easier to automate in − methodologies (e.g., for nonbonded interactions31 41 and free- terms of topology construction and parameter derivation. The − energy calculations42 49), more accessible experimental last point is of particular importance if large collections of − data50 58 (online literature and databases), and faster QM molecules are to be considered in the calibration and/or − calculation codes.59 66 In turn, these progresses have increased application of the force field. the practical usefulness of force-field-based simulations, in Automation can be applied at three main levels. particular in the context of material and drug design, where 1. Automated topology building, which aims at assigning approaches relying on atomistic MD simulations can nowadays the force-field parameters relevant for a given molecular keep pace with the fast time scale of industrial and fi 45−47,67−75 system based on the tables of a given force- eld pharmaceutical product development. parameter set (FBFF and HYFF) and/or the results of One may tentatively distinguish three main strategies in the fi QM calculations (HYFF and QDFF). Yet another route design of condensed-phase force elds. is to rely on the direct perception of chemical patterns − 1. In fragment-based force fields (FBFFs),76 80 pioneered via automated substructure search, as done in the − in the 1980s, the covalent and nonbonded (NB) OpenFF force field.133 135 The procedures employed interaction parameters, the latter including Lennard- are too diverse and force-field-specific to be listed Jones81 (LJ) coefficients and atomic partial charges exhaustively; see, e.g., Antechamber136 for AMBER/ (PCs), are specified simultaneously within molecular GAFF, ParamChem137,138 and FFParam139 for − fragments representativeoftherelevantchemical CHARMM/CGenFF, ATB140 142 for GROMOS, and functional groups, or the relevant monomers in the the SMARTS + SMIRKS perception languages135 for case of (bio-)polymers. Molecules are constructed by OpenFF (see also refs 143−149 for other examples). assembling these fragments or monomers, and invoking These approaches include in particular automatic − − − an assumption of transferability. The latter suggests that typing,135 137,150 153 torsional scanning,113,154 160 en- − the properties of a given fragment are essentially ergy/force/Hessian fitting or matching,108,161 164 PC − independent of its covalent surroundings within a derivation schemes82,83,86 105 (possibly with charge- specific molecule as well as of the medium surrounding group partitioning165,166), and LJ derivation sche- − − this molecule. The NB parametrization is then mes.106,112,113,115 126,167 172 fi performed primarily by tting against experimental 2. Automated force-field optimization, which aims at thermodynamic data for small organic compounds, calibrating the force-field parameters. The target of the typically considering pure-liquid, solvation, and/or refinement is to reproduce experimental condensed- crystal properties. phase observables for a large collection of molecules 2. In hybrid force fields (HYFFs),82,83 introduced in the representative of the chemical space to be covered. This 1990s, the covalent parameters and LJ coefficients are automatic optimization has a long history in the context − selected in an FBFF fashion, but the PCs are derived of target QM data127 131 (see also refs 109 and based on a QM calculation involving the target molecule 173−177 for more recent work). However, to date, in a given environment (implicit solvent in the QM the refinement against experimental data has mainly calculation), typically via electrostatic-potential fit- relied on