Comprehensive Medicinal Chemistry III 30010. Fingerprints and other molecular descriptions for database analysis and searching

Dávid Bajusz1, Anita Rácz2,3, and Károly Héberger2

1 Medicinal Chemistry Research Group, Research Centre for Natural Sciences, Institute of Organic Chemistry, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary
E-mail: [email protected]
Phone: +36 1 382 69 74

2 Plasma Chemistry Research Group, Institute of Materials and Environmental Chemistry, Research Centre for Natural Sciences, Hungarian Academy of Sciences, Magyar tudósok krt. 2, H-1117 Budapest, Hungary
E-mail: [email protected] and [email protected]
Phone: +36 1 382 65 09

3 Department of Applied Chemistry, Szent István University, Villányi út 29-43, H-1118 Budapest, Hungary

Cite as follows: Dávid Bajusz, Anita Rácz, and Károly Héberger*, Chapter 3.14 – Chemical Data Formats, Fingerprints, and Other Molecular Descriptions for Database Analysis and Searching. In: Reference Module in Chemistry, Molecular Sciences and Chemical Engineering; Comprehensive Medicinal Chemistry III, Volume 3; Editors-in-Chief: Samuel Chackalamannil, David P. Rotella and Simon E. Ward; In silico methods, Eds: A. Davies, C. Edge. Available online 13 June 2017, Pages 329–378. Oxford: Elsevier (2017). http://dx.doi.org/10.1016/B978-0-12-409547-2.12345-5 ISBN: 9780128032008

Abstract

In this chapter we strive to provide a comprehensive but reasonably compact overview of the various possibilities for the computational representation of molecules. This includes a detailed introduction to the most commonly used chemical file formats (complemented with a few novel or more specific representations), a thorough overview of the theoretical backgrounds of various molecular fingerprints and descriptors, and a complete subchapter devoted to similarity measures and data fusion approaches.
Finally, we provide a list of the most important online chemical databases and conclude the chapter with a short outlook on present trends and future expectations.

Keywords: molecular fingerprint, similarity, distance metric, molecular descriptor, database analysis, data fusion, cheminformatics, drug design, chemical file format, virtual screening

1. Introduction

Molecules possess an abundance of properties. 3D structure, atom connectivity, shape and physicochemical descriptors such as molecular weight or logP (logarithm of the n-octanol/water partition coefficient) are just a few of the properties that are usually of interest in the overlapping domains of cheminformatics, molecular modeling and drug design. Many of these properties are quite intricate: for example, 3D structure can rarely be characterized with a single conformation (but rather with an ensemble of conformations), and even a simple property such as molecular weight depends on the isotope distribution of the constituent atoms. The affinity of the molecule towards various types of environment (such as a physiological solution, a mixture of solvents or a particular protein binding pocket), which is of particular interest in the mentioned fields, expands the property space of even a single compound beyond comprehension. As such, it is currently impossible to give a perfectly accurate and complete computational representation of a molecule that accounts for its every possible aspect; however, in most cases this is not necessary. Nonetheless, one should always be aware of the simplifications implied in the computational representation of molecules, especially when making predictions based on them. In other words, the main responsibility of the computational chemist (or drug designer or cheminformatician) is to know which aspects of a molecule need to be considered for the given application.
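To make the remark about molecular weight and isotope distribution concrete, the following minimal sketch compares the average (abundance-weighted) mass of phenylalanine, C9H11NO2, with its monoisotopic mass. The dictionary and function names are illustrative only, and the atomic masses are rounded literature values entered by hand rather than taken from this chapter.

```python
# Standard (abundance-weighted) atomic weights versus the masses of the most
# abundant isotopes, in unified atomic mass units. Rounded, hand-entered
# values; treat them as assumptions of this sketch.
AVERAGE_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOISOTOPIC_MASS = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}

# Elemental composition of phenylalanine, C9H11NO2.
PHENYLALANINE = {"C": 9, "H": 11, "N": 1, "O": 2}


def molecular_mass(composition, mass_table):
    """Sum the per-element masses weighted by the atom counts."""
    return sum(count * mass_table[element] for element, count in composition.items())


average = molecular_mass(PHENYLALANINE, AVERAGE_MASS)
monoisotopic = molecular_mass(PHENYLALANINE, MONOISOTOPIC_MASS)

print(f"average mass:      {average:.3f}")       # about 165.192
print(f"monoisotopic mass: {monoisotopic:.3f}")  # about 165.079
```

The two numbers differ by roughly 0.1 mass units for the same structure, so a database field labelled "molecular weight" is only meaningful once it is known which convention was used to fill it.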
With the relevant aspects identified, one can utilize the predictive power of computational chemistry techniques to provide invaluable guidance for medicinal chemistry endeavors.

In recent decades, medicinal chemistry has been armed with a growing number of large, feature-rich and annotated databases. These databases contain the ever-growing public (and proprietary) knowledge that is being aggregated by medicinal chemistry research, and they provide an invaluable foundation for retrospective analyses and prospective/predictive work. Databases containing chemical information can be queried in a number of ways, each of which requires specific types of molecular representations. Substructure searches require an efficient chemical query language (e.g. SMARTS), while similarity searches need a means to quantify the similarity of two molecules and appropriate structural representations that allow for such operations (e.g. molecular fingerprints). It is also common to query databases for specific values or ranges of quantifiable molecular properties such as molecular weight or logP.

In this chapter, we provide a comprehensive overview of molecular representations, including file formats, fingerprints and molecular descriptors, with an emphasis on those that are widely applied in works involving large databases. We also dedicate a subchapter to molecular similarity (and the diverse methodologies that have been developed to elaborate this sub-field), as a ubiquitously applied concept in cheminformatics and drug design. Last but not least, we provide a short overview of the most prominent online resources of molecular information that ultimately fuel most of the computational chemical research involving large compound sets.

2. Chemical file formats and data structures

The means to store chemical information started to develop in parallel with the first computers (in fact, before computers became standard work tools).
Since that time, generations of file formats have been developed (and have occasionally become obsolete). Since each file format was developed with a specific purpose, each encodes different aspects of the structure and properties of a molecule. Some of them are optimized for compactness, while others allow the storage of various properties or other information besides the 3D structure of the molecule. A complete overview of the available chemical file formats would be a futile effort, and probably a pointless one, as there is little relevance (from a cheminformatics point of view) in covering file formats that were developed for, e.g., specific molecular dynamics or quantum chemistry programs. Instead, we dedicate this subchapter to introducing the reader to today's most widely used, cross-platform, human-readable chemical file formats, with an emphasis on their use in database searching and other subfields of cheminformatics. Throughout these subsections, we will occasionally refer to the structural representation of L-phenylalanine (see Figure 1) in the various file formats as a common example.

Figure 1. Neutral (1) and zwitterionic (2) forms of L-phenylalanine. The α-carbon is annotated with the absolute configuration of the stereogenic center.

Before proceeding with the description of specific file formats, we make note of a convenient means for the interconversion of various chemical file formats. While many modeling software suites are capable of handling multiple file formats (and consequently, of converting between them), computer programs dedicated specifically to this task also exist. Probably the best known example is Open Babel, an open-source program for the interconversion of chemical structures between an abundance of file formats (over 110 for the current version).1

2.1 Line notation formats

The principle of line notation formats is to store chemical structures unambiguously, but as compactly as possible (usually on a single line).
In fact, the IUPAC Recommendations on Organic and Biochemical Nomenclature can be thought of as a line notation system as well.2 (Though ironically it is not necessarily more human-readable than e.g. the SMILES format, especially for large molecules.) In that "format", the representation of L-phenylalanine would be (2S)-2-amino-3-phenylpropanoic acid (or somewhat less intuitively, (2S)-2-azaniumyl-3-phenylpropanoate for the zwitterionic form). While IUPAC names can unambiguously describe arbitrarily complicated molecules, the names themselves are often very complicated as well. Also, a substructure can usually be specified in multiple ways (depending on other groups, the order of priority, etc.), which essentially removes text-based IUPAC name queries from the toolbox of substructure searching.

Among the available tools for chemical identification and documentation, today's standard is the CAS (Chemical Abstracts Service) Registry number.3,4 CAS maintains a constantly updated database (the CAS Registry) to store every reported chemical structure (ca. 111 million at the time of writing) with a uniquely assigned identifier, the CAS Registry number (e.g. for L-phenylalanine: 63-91-2). In addition to the database itself, the CAS Registry powers the two major chemical information services of CAS: SciFinder (for querying chemical structures and reactions, primarily in the literature)5 and STN (a search engine that provides access to patent content).6 While CAS Registry numbers are quite compact linear notations, they do not provide direct, human-readable information on molecular structure, nor on the relation or proximity to another substance (e.g. the CAS number for racemic phenylalanine is 150-30-1).

Efforts to provide compact means of chemical structure representation using line notations date back to the late 1940s when the
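As a small illustration of how little a CAS Registry number says about structure, the sketch below checks the two numbers quoted above (63-91-2 and 150-30-1) against the publicly documented CAS check-digit rule: the final digit equals the weighted sum of the preceding digits, with weights 1, 2, 3, ... counted from the right, taken modulo 10. The rule is not described in this chapter, and the function name is hypothetical; this is a sketch, not an official validator.

```python
def cas_check_digit_ok(cas_number: str) -> bool:
    """Return True if the last digit matches the CAS checksum rule.

    Hypothetical helper for illustration. It validates only the format
    (2-7 digits, 2 digits, 1 digit) and the check digit; it cannot tell
    whether the number is actually assigned to a substance.
    """
    parts = cas_number.split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return False
    if not (2 <= len(parts[0]) <= 7 and len(parts[1]) == 2 and len(parts[2]) == 1):
        return False
    body = parts[0] + parts[1]
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    weighted_sum = sum(
        weight * int(digit) for weight, digit in enumerate(reversed(body), start=1)
    )
    return weighted_sum % 10 == int(parts[2])


for number in ("63-91-2", "150-30-1", "63-91-3"):
    print(number, cas_check_digit_ok(number))
```

Both registry numbers pass the check, while the deliberately altered 63-91-3 fails. Note that the identifiers of L-phenylalanine and of the racemate share no recognizable pattern: the only information built into the number is the checksum.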