Exploring Molecular Conformational Space

Exploring Molecular Conformational Space im Fachbereich Physik der Freien Universität Berlin eingereichte Dissertation zur Erlangung des akademischen Grades DOCTOR RERUM NATURALIUM vorgelegt von Adriana Supady Berlin, 2016 Erstgutachter (Betreuer): Prof. Dr. Matthias Scheffler Zweitgutachter: Prof. Dr. Roland Netz Tag der Disputation: 1. September 2016 ABSTRACT Flexible organic molecules and biomolecules can adopt a variety of energetically favorable conformations that differ in chemical and physical properties. The identification of such low-energy conformers is a fundamental problem in molecular physics and computational chemistry. Here we describe our ef- forts to develop methods exploring molecular conformational spaces at the first-principles level. We present a genetic algorithm (GA) based search for sampling the conformational space of molecules. This GA is available in the Python library Fafoom and has been developed in this thesis. The GA search aims not only at finding the global minimum, but also at identifying all conformers within a certain energy window. The implementation of the GA search is designed to work with first-principles methods, facilitated by the incorporation of local optimization and blacklisting conformers to prevent repeated evaluations of very similar solutions. The performance of the GA search is evaluated for seven amino-acid dipeptides and eight drug-like molecules. The evaluation focuses on: (i) how well the GA search can reproduce the reference data; and (ii) how well the conformational space is covered. Our study shows that the GA search samples the conformational space of the evaluated molecules with high accuracy and efficiency. For the purpose of the investigation of the dynamics of the conformational ensemble, we propose a strategy to construct a reduced energy surface from low-energy minima and selected transition states. The strategy selects pairs of conformers for the optimization of the transition states. The resulting energy barriers are then arranged into a barrier tree, a convenient representation of a high-dimensional energy surface. The method is evaluated for: (i) the alanine tetrapeptide, at the force-field level, where it matches the findings of free-energy simulations; and (ii) a synthetic peptide, employing first principles, where the resulting barrier-tree representation helps interpreting the experiment. Accurate predictions of properties, e.g. catalytic activity, require identification of energetically favorable 3D structures. We investigate the relation between the adopted 3D structures and the catalytic activity in eight (thio)urea based compounds. The conformational preferences of the (thio)urea based compounds significantly differ between each other. The investigation of the interaction between an example thiourea catalyst and a model substrate re- veals that only in its active form can the catalyst activate the substrate. iii ZUSAMMENFASSUNG Flexible organische und biologische Moleküle können verschiedene 3D Kon- formationen annehmen, die unterschiedliche chemische und physikalische Eigenschaften aufweisen. Die Suche nach energetisch günstigen Konformeren ist ein fundamentales Problem der Molekülphysik und Computerchemie. In der vorliegenden Arbeit stellen wir Methoden vor, die der Untersuchung des molekularen Konformationsraumes dienen und die ab initio-Methoden ver- wenden. Wir präsentieren eine Suchtechnik, die unter Verwendung eines genetis- chen Algorithmus (GA) den Konformationsraum durchsucht. Diese Suchtech- nik wurde als Teil der Python-Bibliothek Fafoom implementiert, die im Rah- men dieser Doktorarbeit entwickelt wurde. Ziel der GA-basierten Suchtech- nik ist es das Auffinden des globalen Minimums und aller lokalen Minima in einem bestimmen Energiefenster. Die effiziente Verwendung recheninten- siver ab initio-Methoden wird durch die Durchführung lokaler Optimierun- gen und das Vermeiden der Auswertung von bekannten Lösungen unter- stützt. Die Suchtechnik wurde eingesetzt um den Konformationsraum von sieben Dipeptiden und acht Arzneistoff-ähnlichen Molekülen zu untersuchen. Im Anschluss wurden folgende Punkte überprüft: (i) wie gut kann die Such- technik die Referenzdaten reproduzieren; und (ii) wie gut ist der Konfor- mationsraum erforscht worden. Unsere Studie zeigt, dass die GA-basierte Suchtechnik den Konformationsraum der untersuchten Moleküle mit hoher Genauigkeit und Effizienz probt. Wir präsentieren eine Strategie, die eine vereinfachte Darstellung der En- ergiefläche bietet um eine Untersuchung der Vielfalt des Konformationsensem- bles zu ermöglichen. Die vereinfachte Darstellung besteht aus energetisch günstigen lokalen Minima und ausgewählten Übergangszuständen. Die re- sultierenden Energiebarrieren werden verwendet um die vieldimensionale Energiefläche in Form eines Energiebaumes anschaulich darzustellen. Fol- gende Moleküle wurden mit der Methode untersucht: (i) das Alanin-Tetra- peptid mit Hilfe von Molekülmechanik-Rechnungen und (ii) ein synthetis- ches Peptid unter Verwendung von ersten Prinzipien. Die für das Alanin- Tetrapeptid gewonnenen Resultate stimmen mit den Erkenntnissen aus Ver- gleichssimulationen überein. Das für das synthetische Peptid konstruierte Energie-Baumdiagramm unterstützt die Interpretation von experimentellen Daten. Die Bestimmung von energetisch günstigen Konformeren ist zur korrek- ten Vorhersage von Eigenschaften notwendig. Wir untersuchen den Zusam- menhang zwischen der 3D-Struktur und der katalytischen Aktivität von acht (Thio-)Harnstoffverbindungen. Die Unterschiede zwischen den strukturellen Präferenzen von den (Thio-)Harnstoffen sind signifikant. Die Untersuchung der Interaktion zwischen einem Thioharnstoff basierten Katalysator und einem Modellsubstrat hat ergeben, dass nur ein bestimmtes Konformer des Kataly- sators das Substrat aktivieren kann. iv CONTENTS 1 introduction1 2 investigating molecular structures5 2.1 Flexible organic molecules . 5 2.1.1 Describing molecular structures . 5 2.1.2 Geometrical similarity of structures . 6 2.1.3 Technical details . 8 2.2 Theoretical methods for treatment of molecular structures . 8 2.2.1 Force fields . 9 2.2.2 Towards quantum mechanics . 10 2.2.3 The Hartree-Fock approximation and beyond . 11 2.2.4 Density-Functional Theory . 12 2.2.5 Dispersion corrections . 14 2.2.6 Solvent models . 15 2.3 Energy landscapes . 16 2.3.1 Characterizing the Potential-Energy Surface . 16 2.3.2 Towards Free-Energy Surface . 20 2.4 Packages for molecular simulations . 21 2.4.1 FHI-aims . 21 2.4.2 ORCA . 21 2.4.3 Tinker . 22 3 flexible algorithm for optimization of molecules 23 3.1 Motivation . 23 3.2 Fafoom: implementation details . 24 3.2.1 Structure of the package . 24 3.2.2 Example of use: genetic algorithm based search . 26 3.3 Fafoom: benchmark calculations . 28 3.3.1 Amino acid dipeptides . 29 3.3.2 Drug-like molecules . 33 3.4 Fafoom: and beyond . 36 3.4.1 Applications . 36 3.4.2 Free choice of the external simulation package . 38 3.4.3 Extensibility of the kind of optimized degrees of freedom 38 3.4.4 Parallelization . 40 4 reduced molecular potential-energy landscapes 41 4.1 Motivation . 41 4.2 Description of the method . 43 4.3 The example of AcAla3NMe . 45 4.3.1 Reference data for comparison . 46 4.3.2 Sampling of the potential-energy surface . 46 4.3.3 Towards the network of states . 47 4.3.4 Calculations of the energy barriers . 48 4.4 Reduced PES of a synthetic peptide from first principles . 51 4.4.1 Sampling of the potential-energy surface . 51 4.4.2 Towards a network of states . 52 4.4.3 Calculations of the energy barriers . 53 v vi contents 5 global structure search from first principles: thiourea catalysts 57 5.1 Motivation . 57 5.1.1 Structure-dependent catalytic activity . 57 5.1.2 Thioureas acting as organocatalysts . 59 5.1.3 Studied problem . 60 5.2 Molecular systems . 60 5.3 Isolated molecules . 61 5.3.1 Energetics . 61 5.3.2 Energy barriers . 67 5.4 Catalyst-substrate complexes . 67 6 conclusions and outlook 71 6.1 Summary . 71 6.1.1 Fafoom - Flexible algorithm for optimization of molecules 71 6.1.2 Reduced representation of molecular landscapes . 71 6.1.3 Structure-dependent catalytic activity . 72 6.2 Conclusions . 72 6.3 Outlook . 73 a appendix 75 a.1 Fafoom parameter . 75 a.2 Fafoom benchmark: parameter lists . 77 a.3 (Thio)urea molecules: impact of different dispersion corrections 78 selbständigkeitserklärung 79 curriculum vitae 81 publications 83 acknowledgments 85 bibliography 87 INTRODUCTION 1 The properties of a chemical compound are determined by its structure [1]. This general formulation is one of the most fundamental concepts in chemistry, physics, and materials science. The chemical structure is given by the molecular composition, i.e. type and count of atoms, and their spatial arrangement. The interest in studying properties of a chemical compound drives the de- mand to identify its structure. Thus, a number of experimental techniques for determination of molecular structures have been developed. The most popu- lar technique is X-ray diffraction, which allows for resolving the arrangement of atoms in a solid crystal. Other methods, e.g. neutron scattering and elec- tron microscopy, can also be utilized to identify structures. An alternative to experimental structure determination is theoretical structure prediction. The idea of structure prediction is to obtain a structure with- out prior experimental knowledge. Structure prediction is an integral part of diverse research areas, such as: (i) biophysics, e.g. in protein structure prediction

Load more