Blue Obelisk - Interoperability in Chemical Informatics


Citation for published version (APA): Guha, R., Howard, MT., Hutchison, GR., Murray-Rust, P., Rzepa, H., Steinbeck, C., Wegner, J., & Willighagen, EL. (2006). Blue Obelisk - Interoperability in chemical informatics. Journal of Chemical Information and Modeling, 46(3), 991-998. https://doi.org/10.1021/ci050400b

Document status and date: Published: 01/05/2006
Document version: Publisher's PDF, also known as Version of record
Document license: Taverne

This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.

J. Chem. Inf. Model. 2006, 46, 991-998

The Blue Obelisk - Interoperability in Chemical Informatics

Rajarshi Guha,† Michael T. Howard,‡ Geoffrey R. Hutchison,§ Peter Murray-Rust,∥ Henry Rzepa,⊥ Christoph Steinbeck,*,# Jörg Wegner,∇ and Egon L. Willighagen○

Pennsylvania State University, University Park, Pennsylvania 16804-3000; Jmol Project, U.S.A.; Cornell University, Ithaca, New York 14853; Cambridge University, Cambridge CB2 1TN, Great Britain; Imperial College, London SW7 2AZ, Great Britain; Cologne University Bioinformatics Center (CUBIC), Zülpicher Str. 47, D-50674 Köln, Germany; University of Tübingen, Tübingen, Germany; and Jmol project, The Netherlands

* Corresponding author phone: +49 (0)221 470-7426; fax: +49 (0)221 470-7786; e-mail: [email protected].
† Pennsylvania State University.
‡ Jmol project, http://www.jmol.org.
§ Cornell University.
∥ Cambridge University.
⊥ Imperial College.
# Cologne University Bioinformatics Center.
∇ University of Tübingen.
○ Jmol project, http://www.jmol.org.

Received September 12, 2005

10.1021/ci050400b CCC: $33.50 © 2006 American Chemical Society. Published on Web 02/22/2006

The Blue Obelisk Movement (http://www.blueobelisk.org/) is the name used by a diverse Internet group promoting reusable chemistry via open source software development, consistent and complementary chemoinformatics research, open data, and open standards. We outline recent examples of cooperation in the Blue Obelisk group: a shared dictionary of algorithms and implementations in chemoinformatics algorithms drawing from our various software projects; a shared repository of chemoinformatics data including elemental properties, atomic radii, isotopes, atom typing rules, and so forth; and Web services for the platform-independent use of chemoinformatics programs.

1. INTRODUCTION

While the past 20 or 30 years of development in chemoinformatics has created a plethora of published software systems and algorithms for solving chemical problems, little effort has been spent in providing the community with open components and data, to be reused and improved by communal efforts. Bioinformatics, with its much younger history, adopted the principles taught by success stories of the open source movement in general, and Linux in particular, from the very beginning. Recent years, however, have seen the emergence of open tools and databases also in chemical informatics [1-4]. These draw on the existing ideas of independent peer review and scientific collaboration, mixed with “open source” software development paradigms. Community involvement, including assessments, suggestions, critiques, and rapid evolution, is a core component of these efforts. The benefits of open source software have been discussed in great detail by Eric Raymond in his seminal work The Cathedral and the Bazaar and following works [5]. The Open Source Initiative (OSI) summarizes: “Open source promotes software reliability and quality by supporting independent peer review and rapid evolution of source code. To be OSI certified, the software must be distributed under a license that guarantees the right to read, redistribute, modify, and use the software freely.” [6]

In the beginning, most scientific software was free. It was so difficult to port that scientists did not bother about licenses; one was delighted if someone else could get it working on another machine. But the 1980s saw the value of chemical informatics and the need to “productize” it. Much of this was meritorious, as it brought informatics into the classroom and the research lab and helped pay for some chemistry research, but it also had hidden costs, which we are now facing. In particular, costs include non-interoperability and centralized control of informatics.

Now, several open chemistry and chemoinformatics projects (Table 1) have pooled forces to enhance interoperability between these tools in a movement we call “The Blue Obelisk” (BO). The name originates from an informal meeting place in San Diego, California, during the American Chemical Society 2005 Spring National Meeting (see Figure 1) and was coined by one of the authors. Because contributors to the component projects live around the world, few had met in person, instead collaborating and meeting via the Internet.

Table 1. Current Blue Obelisk Projects

project                            URL                            principal authors
CML, JUMBO [12]                    http://cml.sf.net/             P.M.-R., H.R.
JChemPaint [13]                    http://jchempaint.sf.net/      C.S., E.L.W.
Jmol                               http://jmol.sf.net/            M.T.H., E.L.W.
NMRShiftDB [3]                     http://www.nmrshiftdb.org/     C.S.
JOElib                             http://joelib.sf.net/          J.W.
Kalzium                            http://edu.kde.org/kalzium/    Carsten Niehaus
Octet                              http://octet.sf.net/           Rich Apodaca
Open Babel                         http://openbabel.sf.net/       G.R.H.
QSAR                               http://qsar.sf.net/            E.L.W., R.G., C.S., J.W.
The Chemistry Development Kit [1]  http://cdk.sf.net/             E.L.W., C.S.
WWMM                               http://wwmm.sf.net/            P.M.-R.

We identify three core areas for the Blue Obelisk Movement:

• Open Source. One can use other people’s code without further permission, including changing it for one’s own use and distributing it again.
• Open Standards. One can find visible community mechanisms for protocols and communicating information. The mechanisms for creating and maintaining these standards cover a wide spectrum of human organizations, including various degrees of consent. We have been heavily influenced by the mantra of the Internet Engineering Task Force: “rough consensus and running code”.
• Open Data. One can obtain all data in the public domain when wanted and reuse it for whatever purpose. This is an underused term, which we are resurrecting. It is independent of “open access” and has relevance to “closed access” as well.

As outlined above, these areas are independent of the concept of “open access” to read publications freely. Instead, the three points focus on access to the scientific data, algorithms, and implementations themselves, rather than the formatted manuscript. In particular, we believe that these concepts strongly continue the spirit of communal peer review and reproducibility at the heart of modern scientific research.

It is well known in software development that 80% of the costs are caused by maintaining software and not by the initial implementation [7]. This holds both for the in-house development in pharmaceutical companies and the development for commercial chemoinformatics suppliers. Besides [...]

[...] coding [10], and atom typing [11], however, are indispensable in academic chemoinformatics research to build better, more stable, and more reproducible chemical information systems. In this contribution, we outline several examples for how the Blue Obelisk projects address this need: a shared dictionary of algorithms and implementations in chemoinformatics algorithms drawing from our various software projects and a shared repository of chemoinformatics data including elemental properties, atomic radii, isotopes, atom typing rules, a set of Web-based chemoinformatics services,