Exploring Molecular Conformational Space


Dissertation submitted to the Department of Physics of Freie Universität Berlin for the academic degree of Doctor rerum naturalium

Submitted by Adriana Supady
Berlin, 2016

First reviewer (supervisor): Prof. Dr. Matthias Scheffler
Second reviewer: Prof. Dr. Roland Netz
Date of defense: 1 September 2016

ABSTRACT

Flexible organic molecules and biomolecules can adopt a variety of energetically favorable conformations that differ in chemical and physical properties. The identification of such low-energy conformers is a fundamental problem in molecular physics and computational chemistry. Here we describe our efforts to develop methods for exploring molecular conformational spaces at the first-principles level.

We present a genetic algorithm (GA) based search for sampling the conformational space of molecules. This GA is available in the Python library Fafoom and has been developed in this thesis. The GA search aims not only at finding the global minimum, but also at identifying all conformers within a certain energy window. The implementation of the GA search is designed to work with first-principles methods, facilitated by the incorporation of local optimization and the blacklisting of conformers to prevent repeated evaluations of very similar solutions. The performance of the GA search is evaluated for seven amino-acid dipeptides and eight drug-like molecules. The evaluation focuses on: (i) how well the GA search can reproduce the reference data; and (ii) how well the conformational space is covered. Our study shows that the GA search samples the conformational space of the evaluated molecules with high accuracy and efficiency.

To investigate the dynamics of the conformational ensemble, we propose a strategy to construct a reduced energy surface from low-energy minima and selected transition states.
The strategy selects pairs of conformers for the optimization of the transition states. The resulting energy barriers are then arranged into a barrier tree, a convenient representation of a high-dimensional energy surface. The method is evaluated for: (i) the alanine tetrapeptide, at the force-field level, where it matches the findings of free-energy simulations; and (ii) a synthetic peptide, employing first principles, where the resulting barrier-tree representation helps in interpreting the experiment.

Accurate predictions of properties, e.g. catalytic activity, require the identification of energetically favorable 3D structures. We investigate the relation between the adopted 3D structures and the catalytic activity in eight (thio)urea-based compounds. The conformational preferences of the (thio)urea-based compounds differ significantly from one another. The investigation of the interaction between an example thiourea catalyst and a model substrate reveals that the catalyst can activate the substrate only in its active form.

ZUSAMMENFASSUNG (SUMMARY)

Flexible organic and biological molecules can adopt different 3D conformations that exhibit different chemical and physical properties. The search for energetically favorable conformers is a fundamental problem of molecular physics and computational chemistry. In this thesis we present methods that serve to investigate the molecular conformational space and that use ab initio methods.

We present a search technique that explores the conformational space using a genetic algorithm (GA). This search technique was implemented as part of the Python library Fafoom, which was developed in the course of this doctoral thesis. The aim of the GA-based search technique is to find the global minimum and all local minima within a given energy window.
The efficient use of computationally intensive ab initio methods is supported by performing local optimizations and by avoiding the evaluation of already known solutions. The search technique was applied to explore the conformational space of seven dipeptides and eight drug-like molecules. Subsequently, the following points were examined: (i) how well the search technique can reproduce the reference data; and (ii) how well the conformational space has been explored. Our study shows that the GA-based search technique samples the conformational space of the investigated molecules with high accuracy and efficiency.

We present a strategy that offers a simplified representation of the energy surface in order to enable an investigation of the diversity of the conformational ensemble. The simplified representation consists of energetically favorable local minima and selected transition states. The resulting energy barriers are used to depict the high-dimensional energy surface clearly in the form of an energy tree. The following molecules were investigated with the method: (i) the alanine tetrapeptide, using molecular-mechanics calculations; and (ii) a synthetic peptide, using first principles. The results obtained for the alanine tetrapeptide agree with the findings from comparative simulations. The energy-tree diagram constructed for the synthetic peptide supports the interpretation of experimental data.

The determination of energetically favorable conformers is necessary for the correct prediction of properties. We investigate the relationship between the 3D structure and the catalytic activity of eight (thio)urea compounds. The differences between the structural preferences of the (thio)ureas are significant.
The investigation of the interaction between a thiourea-based catalyst and a model substrate showed that only one particular conformer of the catalyst can activate the substrate.

CONTENTS

1 Introduction
2 Investigating molecular structures
  2.1 Flexible organic molecules
    2.1.1 Describing molecular structures
    2.1.2 Geometrical similarity of structures
    2.1.3 Technical details
  2.2 Theoretical methods for treatment of molecular structures
    2.2.1 Force fields
    2.2.2 Towards quantum mechanics
    2.2.3 The Hartree-Fock approximation and beyond
    2.2.4 Density-Functional Theory
    2.2.5 Dispersion corrections
    2.2.6 Solvent models
  2.3 Energy landscapes
    2.3.1 Characterizing the Potential-Energy Surface
    2.3.2 Towards Free-Energy Surface
  2.4 Packages for molecular simulations
    2.4.1 FHI-aims
    2.4.2 ORCA
    2.4.3 Tinker
3 Flexible algorithm for optimization of molecules
  3.1 Motivation
  3.2 Fafoom: implementation details
    3.2.1 Structure of the package
    3.2.2 Example of use: genetic algorithm based search
  3.3 Fafoom: benchmark calculations
    3.3.1 Amino acid dipeptides
    3.3.2 Drug-like molecules
  3.4 Fafoom: and beyond
    3.4.1 Applications
    3.4.2 Free choice of the external simulation package
    3.4.3 Extensibility of the kind of optimized degrees of freedom
    3.4.4 Parallelization
4 Reduced molecular potential-energy landscapes
  4.1 Motivation
  4.2 Description of the method
  4.3 The example of AcAla3NMe
    4.3.1 Reference data for comparison
    4.3.2 Sampling of the potential-energy surface
    4.3.3 Towards the network of states
    4.3.4 Calculations of the energy barriers
  4.4 Reduced PES of a synthetic peptide from first principles
    4.4.1 Sampling of the potential-energy surface
    4.4.2 Towards a network of states
    4.4.3 Calculations of the energy barriers
5 Global structure search from first principles: thiourea catalysts
  5.1 Motivation
    5.1.1 Structure-dependent catalytic activity
    5.1.2 Thioureas acting as organocatalysts
    5.1.3 Studied problem
  5.2 Molecular systems
  5.3 Isolated molecules
    5.3.1 Energetics
    5.3.2 Energy barriers
  5.4 Catalyst-substrate complexes
6 Conclusions and outlook
  6.1 Summary
    6.1.1 Fafoom - Flexible algorithm for optimization of molecules
    6.1.2 Reduced representation of molecular landscapes
    6.1.3 Structure-dependent catalytic activity
  6.2 Conclusions
  6.3 Outlook
A Appendix
  A.1 Fafoom parameter
  A.2 Fafoom benchmark: parameter lists
  A.3 (Thio)urea molecules: impact of different dispersion corrections
Declaration of authorship
Curriculum vitae
Publications
Acknowledgments
Bibliography

1 INTRODUCTION

The properties of a chemical compound are determined by its structure [1]. This general formulation is one of the most fundamental concepts in chemistry, physics, and materials science. The chemical structure is given by the molecular composition, i.e. type and count of atoms, and their spatial arrangement.

The interest in studying properties of a chemical compound drives the demand to identify its structure. Thus, a number of experimental techniques for determination of molecular structures have been developed. The most popular technique is X-ray diffraction, which allows for resolving the arrangement of atoms in a solid crystal. Other methods, e.g. neutron scattering and electron microscopy, can also be utilized to identify structures.

An alternative to experimental structure determination is theoretical structure prediction. The idea of structure prediction is to obtain a structure without prior experimental knowledge. Structure prediction is an integral part of diverse research areas, such as: (i) biophysics, e.g. in protein structure prediction
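
The GA search summarized in the abstract combines three ingredients: a genetic algorithm over conformational degrees of freedom, a local optimization of every candidate, and a blacklist that prevents re-evaluating very similar structures. The following is a minimal sketch of that idea only, not Fafoom's actual API. Everything in it is an assumption made for illustration: the function names, the toy torsional energy that stands in for a first-principles calculation, the crude coordinate descent that stands in for a geometry relaxation, and the 10 degree similarity threshold.

```python
import math
import random


def toy_energy(torsions):
    """Hypothetical stand-in for a first-principles energy (arbitrary units):
    a threefold torsional term with minima at -180, -60 and 60 degrees."""
    return sum(1.0 + math.cos(3.0 * math.radians(t)) for t in torsions)


def local_optimize(torsions, step=5.0, max_iter=200):
    """Crude coordinate descent standing in for a real geometry relaxation."""
    current = list(torsions)
    best = toy_energy(current)
    for _ in range(max_iter):
        improved = False
        for i in range(len(current)):
            for delta in (-step, step):
                trial = list(current)
                # Keep angles in the interval [-180, 180).
                trial[i] = (trial[i] + delta + 180.0) % 360.0 - 180.0
                energy = toy_energy(trial)
                if energy < best - 1e-12:
                    current, best, improved = trial, energy, True
        if not improved:
            break
    return current, best


def is_similar(a, b, threshold=10.0):
    """Two structures count as similar if every torsion differs by less
    than `threshold` degrees, taking periodicity into account."""
    def diff(x, y):
        d = abs(x - y) % 360.0
        return min(d, 360.0 - d)
    return max(diff(x, y) for x, y in zip(a, b)) < threshold


def ga_search(n_torsions=3, pop_size=6, generations=40, window=0.5, seed=0):
    rng = random.Random(seed)
    blacklist = []  # relaxed geometries that were already evaluated
    found = []      # (energy, torsions) of every distinct conformer

    def random_torsions():
        return [rng.randrange(-180, 180, 5) for _ in range(n_torsions)]

    def evaluate(candidate):
        # Blacklist check first: skip the costly relaxation for near-duplicates.
        if any(is_similar(candidate, seen) for seen in blacklist):
            return
        relaxed, energy = local_optimize(candidate)
        if not any(is_similar(relaxed, seen) for seen in blacklist):
            found.append((energy, relaxed))
        blacklist.append(relaxed)

    for _ in range(pop_size):  # random initial population
        evaluate(random_torsions())

    for _ in range(generations):
        population = sorted(found)[:pop_size]  # keep the fittest
        mother = rng.choice(population)[1]
        father = rng.choice(population)[1]
        child = [rng.choice(pair) for pair in zip(mother, father)]  # crossover
        child[rng.randrange(n_torsions)] = rng.randrange(-180, 180, 5)  # mutation
        evaluate(child)

    # Report all conformers within the energy window above the best one.
    e_min = min(e for e, _ in found)
    return sorted((e, t) for e, t in found if e - e_min <= window)
```

Calling `ga_search()` returns a list of `(energy, torsions)` pairs. In this sketch the blacklist is checked before the relaxation, so the expensive step is skipped for candidates that are already close to a known structure, and once more afterwards, so that two different starting points relaxing into the same minimum are counted only once.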