1.3 Quantum Chemistry

Total Page:16

File Type:pdf, Size:1020Kb

1.3 Quantum Chemistry PDF hosted at the Radboud Repository of the Radboud University Nijmegen The following full text is an author's version which may differ from the publisher's version. For additional information about this publication click this link. http://hdl.handle.net/2066/72267 Please be advised that this information was generated on 2021-10-05 and may be subject to change. Representation of Molecules and Molecular Systems in Data Analysis and Modeling EEN WETENSCHAPPELIJKE PROEVE OP HET GEBIED VAN DE NATUURWETENSCHAPPEN, WISKUNDE EN INFORMATICA PROEFSCHRIFT TER VERKRIJGING VAN DE GRAAD VAN DOCTOR AAN DE RADBOUD UNIVERSITEIT NIJMEGEN OP GEZAG VAN DE RECTOR MAGNIFICUS PROF. MR. S.C.J.J. KORTMANN, VOLGENS BESLUIT VAN HET COLLEGE VAN DECANEN IN HET OPENBAAR TE VERDEDIGEN OP WOENSDAG 2 APRIL 2008 OM 13:30 UUR PRECIES DOOR EGON LENNERT WILLIGHAGEN GEBOREN OP 27 OKTOBER 1974 TE ARNHEM Promotores Prof. dr. L.M.C. Buydens Prof. dr. P. Murray-Rust (University of Cambridge, United Kingdom) Copromotor Dr. R. Wehrens Manuscriptcommissie Prof. E. Vlieg Prof. dr. J. Gasteiger (Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany) Dr. C. Steinbeck (European Bioinformatics Institute, United Kingdom) The work presented in this thesis was supported financially by the Netherlands Organization for Scientific Research (NWO). Copyright © 2008 by E.L. Willighagen All rights reserved ISBN 978-90-9022806-8 Contents 1 Introduction 7 1.1 Molecular Representations . .8 1.2 Chemical Graphs . .9 1.3 Quantum Chemistry . 10 1.4 Numerical Representations . 11 1.5 Chemometrics . 12 1.5.1 Example: CoMFA . 12 1.5.2 Example: classification of enzyme reactions . 13 1.6 Challenges . 14 1.6.1 Representation of Molecular Systems . 14 1.6.2 Data Storage and Communication . 15 1.7 Selected Problems . 16 Bibliography . 17 2 Molecular Chemometrics 23 2.1 Introduction . 24 2.2 Molecular Representation . 25 2.2.1 Molecular Descriptions . 26 2.2.2 Beyond the molecule . 28 2.3 Chemical Space, similarity and diversity . 29 2.4 Activity and Property Modeling . 30 2.4.1 Dimension Reduction . 32 2.4.2 Model Validation . 32 3 4 CONTENTS 2.5 Library Searching . 33 2.6 Conclusion . 35 Bibliography . 35 3 1D NMR in QSPR 45 3.1 Introduction . 46 3.2 Experimental . 48 3.2.1 Methods . 48 3.2.2 Data . 49 3.3 Results . 49 3.3.1 Data rank . 49 3.3.2 Predictivity . 50 3.3.3 Model interpretation . 53 3.4 Discussion . 57 3.5 Conclusions . 58 Bibliography . 59 4 Comparing Crystals 63 4.1 The Descriptor . 65 4.2 Data . 68 4.2.1 Cephalosporin data set . 69 4.2.2 Estrone data set . 70 4.3 Experimental . 73 4.4 Results . 74 4.4.1 Dissimilarity Classes . 74 4.4.2 Dendrograms and Partitionings . 74 4.4.3 Matching ESTRON10 . 76 4.5 Conclusions . 79 Bibliography . 79 5 Supervised SOMs 83 5.1 Introduction . 84 CONTENTS 5 5.2 Supervised self-organizing maps . 85 5.3 Experimental . 87 5.3.1 Data . 87 5.3.2 Representation in X and Y space . 88 5.3.3 Similarity calculations . 89 5.3.4 SOM training . 90 5.3.5 Software . 91 5.4 Applications . 92 5.4.1 Unit cell volume in Y space . 92 5.4.2 Adding space group information in Y space . 93 5.4.3 Analyzing simulated polymorphs . 96 5.5 Conclusions . 98 Bibliography . 99 6 Chemical Metadata in RSS 103 6.1 Introduction . 104 6.2 Implementations of RSS for chemical data sources . 106 6.2.1 Namespaces and RSS 1.0 . 106 6.2.2 Example 1. The ChemStock System . 110 6.2.3 Example 2. The Dutch Dictionary on Organic Chemistry . 110 6.2.4 Example 3. The World Wide Molecular Matrix . 111 6.3 Chemical postprocessing and aggregation of RSS metadata . 111 6.4 Discussion and Conclusions . 112 Bibliography . 115 7 Interoperability 117 7.1 Introduction . 118 7.2 The Importance of Open Specifications for Algorithms and Data . 120 7.3 The Blue Obelisk Dictionary . 125 7.3.1 The Dictionary . 125 7.3.2 Finding Implementations . 127 7.4 The Blue Obelisk Repository . 128 6 CONTENTS 7.5 Web Services . 130 7.6 Social Aspects . 130 7.7 Conclusion . 132 Bibliography . 133 8 Discussion and Outlook 137 8.1 Information Content . 137 8.2 Representation Characteristics . 138 8.3 Validation . 139 8.4 Reproducibility . 141 8.5 Data Storage and Communication . 141 8.6 Outlook . 142 8.6.1 Crystal Engineering . 143 8.6.2 Data Fusion . 143 8.7 Conclusion . 143 List of Abbreviations 145 Summary 149 Samenvatting 153 Curriculum Vitae 157 Publication List 159 Dankwoord 161 Chapter 1 Introduction The topic of this thesis is representation of molecules and molecular systems. Such a representation is needed to allow analysis and manipulation of chemical structures in the computer. This is of paramount importance in areas like drug design, synthesis plan- ning, property prediction, crystal structure engineering, structure elucidation, searching in chemical literature, exchange of chemical knowledge, and structure elucidation. Many different representations have been developed, each capturing different bits of informa- tion about the molecular system under study. Unfortunately, in many cases it is unclear which part of the information is essential for a certain application. For example, although the boiling points correlates well with the number of carbon atoms in a series of alkane homologues [1], the carbon count descriptor is not generally useful for predicting other properties, or even the same property for a more diverse set of molecules. From simple physico-chemical principles, it is clear why this is the case. However, for more complex problems there is very little a-priori knowledge that guides us in choosing appropriate descriptors. Nevertheless, in certain areas specific habits have evolved; for example, a large part of the quantitative structure-activity and structure- property relationship (QSAR and QSPR) community routinely calculates hundreds or thousands of simple molecular descriptors, and uses various variable-selection techniques to extract the most useful ones. Unfortunately, validation of this process is almost impossible due to the small size of data sets. It would a giant leap forward if we could say beforehand, based on the characteristics of the molecular system and our aim, what descriptors would be most informative. This is currently, however, still too far-fetched. Therefore, we are forced to judge the quality of the representation on the basis on the quality of the prediction: if we are able to correctly predict properties of new compounds, then we conclude that the representations contains relevant information. This thesis studies the role of representation in modeling properties of molecular systems of organic molecules and in the exchange of molecular information. The following paragraphs give an overview on useful representations. 7 8 Chapter 1 Introduction 1.1 Molecular Representations The two most common methods to represent organic molecules are the (systematic) name and the 2D drawing of the molecule. They identify the molecule of interest, but can- not be used for machine processing. To prevent ambiguities, conventions describing how molecules should be named and drawn are needed. IUPAC name recommendations, and line notations such.
Recommended publications
  • Open Babel Documentation Release 2.3.1
    Open Babel Documentation Release 2.3.1 Geoffrey R Hutchison Chris Morley Craig James Chris Swain Hans De Winter Tim Vandermeersch Noel M O’Boyle (Ed.) December 05, 2011 Contents 1 Introduction 3 1.1 Goals of the Open Babel project ..................................... 3 1.2 Frequently Asked Questions ....................................... 4 1.3 Thanks .................................................. 7 2 Install Open Babel 9 2.1 Install a binary package ......................................... 9 2.2 Compiling Open Babel .......................................... 9 3 obabel and babel - Convert, Filter and Manipulate Chemical Data 17 3.1 Synopsis ................................................. 17 3.2 Options .................................................. 17 3.3 Examples ................................................. 19 3.4 Differences between babel and obabel .................................. 21 3.5 Format Options .............................................. 22 3.6 Append property values to the title .................................... 22 3.7 Filtering molecules from a multimolecule file .............................. 22 3.8 Substructure and similarity searching .................................. 25 3.9 Sorting molecules ............................................ 25 3.10 Remove duplicate molecules ....................................... 25 3.11 Aliases for chemical groups ....................................... 26 4 The Open Babel GUI 29 4.1 Basic operation .............................................. 29 4.2 Options .................................................
    [Show full text]
  • Open Data, Open Source, and Open Standards in Chemistry: the Blue Obelisk Five Years On" Journal of Cheminformatics Vol
    Oral Roberts University Digital Showcase College of Science and Engineering Faculty College of Science and Engineering Research and Scholarship 10-14-2011 Open Data, Open Source, and Open Standards in Chemistry: The lueB Obelisk five years on Andrew Lang Noel M. O'Boyle Rajarshi Guha National Institutes of Health Egon Willighagen Maastricht University Samuel Adams See next page for additional authors Follow this and additional works at: http://digitalshowcase.oru.edu/cose_pub Part of the Chemistry Commons Recommended Citation Andrew Lang, Noel M O'Boyle, Rajarshi Guha, Egon Willighagen, et al.. "Open Data, Open Source, and Open Standards in Chemistry: The Blue Obelisk five years on" Journal of Cheminformatics Vol. 3 Iss. 37 (2011) Available at: http://works.bepress.com/andrew-sid-lang/ 19/ This Article is brought to you for free and open access by the College of Science and Engineering at Digital Showcase. It has been accepted for inclusion in College of Science and Engineering Faculty Research and Scholarship by an authorized administrator of Digital Showcase. For more information, please contact [email protected]. Authors Andrew Lang, Noel M. O'Boyle, Rajarshi Guha, Egon Willighagen, Samuel Adams, Jonathan Alvarsson, Jean- Claude Bradley, Igor Filippov, Robert M. Hanson, Marcus D. Hanwell, Geoffrey R. Hutchison, Craig A. James, Nina Jeliazkova, Karol M. Langner, David C. Lonie, Daniel M. Lowe, Jerome Pansanel, Dmitry Pavlov, Ola Spjuth, Christoph Steinbeck, Adam L. Tenderholt, Kevin J. Theisen, and Peter Murray-Rust This article is available at Digital Showcase: http://digitalshowcase.oru.edu/cose_pub/34 Oral Roberts University From the SelectedWorks of Andrew Lang October 14, 2011 Open Data, Open Source, and Open Standards in Chemistry: The Blue Obelisk five years on Andrew Lang Noel M O'Boyle Rajarshi Guha, National Institutes of Health Egon Willighagen, Maastricht University Samuel Adams, et al.
    [Show full text]
  • A Java-Based Platform for Evolutionary De Novo Molecular Design
    Article MoleGear: A Java-Based Platform for Evolutionary De Novo Molecular Design Yunhan Chu and Xuezhong He * Department of Chemical Engineering, Norwegian University of Science and Technology, N-7491 Trondheim, Norway; [email protected] * Correspondence: [email protected]; Tel.: +47-73593942 Received: 25 March 2019; Accepted: 10 April 2019; Published: 11 April 2019 Abstract: A Java-based platform, MoleGear, is developed for de novo molecular design based on the chemistry development kit (CDK) and other Java packages. MoleGear uses evolutionary algorithm (EA) to explore chemical space, and a suite of fragment-based operators of growing, crossover, and mutation for assembling novel molecules that can be scored by prediction of binding free energy or a weighted-sum multi-objective fitness function. The EA can be conducted in parallel over multiple nodes to support large-scale molecular optimizations. Some complementary utilities such as fragment library design, chemical space analysis, and graphical user interface are also integrated into MoleGear. The candidate molecules as inhibitors for the human immunodeficiency virus 1 (HIV-1) protease were designed by MoleGear, which validates the potential capability for de novo molecular design. Keywords: de novo design; evolutionary algorithm; drug molecules; fitness; multi-objective function 1. Introduction Computational chemistry plays an important role in the design of new drug-like molecules [1– 5], catalysts [6–8], and novel solvents of ionic liquids [9–11]. De novo molecular design has been an active research area of drug design/discovery over the last decades, and many approaches such as LUDI [12], LEA3D [13], Flux [14,15], and pharmacophore-linked fragment virtual screening (PFVS) [16] have been developed by using protein and ligand structures.
    [Show full text]
  • Molecular Structure Input on the Web Peter Ertl
    Ertl Journal of Cheminformatics 2010, 2:1 http://www.jcheminf.com/content/2/1/1 REVIEW Open Access Molecular structure input on the web Peter Ertl Abstract A molecule editor, that is program for input and editing of molecules, is an indispensable part of every cheminfor- matics or molecular processing system. This review focuses on a special type of molecule editors, namely those that are used for molecule structure input on the web. Scientific computing is now moving more and more in the direction of web services and cloud computing, with servers scattered all around the Internet. Thus a web browser has become the universal scientific user interface, and a tool to edit molecules directly within the web browser is essential. The review covers a history of web-based structure input, starting with simple text entry boxes and early molecule editors based on clickable maps, before moving to the current situation dominated by Java applets. One typical example - the popular JME Molecule Editor - will be described in more detail. Modern Ajax server-side molecule editors are also presented. And finally, the possible future direction of web-based molecule editing, based on tech- nologies like JavaScript and Flash, is discussed. Introduction this trend and input of molecular structures directly A program for the input and editing of molecules is an within a web browser is therefore of utmost importance. indispensable part of every cheminformatics or molecu- In this overview a history of entering molecules into lar processing system. Such a program is known as a web applications will be covered, starting from simple molecule editor, molecular editor or structure sketcher.
    [Show full text]
  • JRC QSAR Model Database
    JRC QSAR Model Database EURL ECVAM DataBase service on ALternative Methods to animal experimentation To promote the development and uptake of alternative and advanced methods in toxicology and biomedical sciences SDF - STRUCTURE DATA FORMAT: How to create from SMILES The European Commission’s science and knowledge service Joint Research Centre Directorate F Health, Consumers & Reference Materials Chemicals Safety & Alternative Methods Unit The European Commission’s science and knowledge service Joint Research Centre EUR 28708 EN This publication is a Tutorial by the Joint Research Centre (JRC), the European Commission’s science and knowledge service. It aims to provide user support. The scientific output expressed does not imply a policy position of the European Commission. Neither the European Commission nor any person acting on behalf of the Commission is responsible for the use that might be made of this publication. Contact information Email: [email protected] JRC Science Hub https://ec.europa.eu/jrc JRC107492 EUR 28708 EN PDF ISBN 978-92-79-71294-4 ISSN 1831-9424 doi:10.2760/952280 Print ISBN 978-92-79-71295-1 ISSN 1018-5593 doi:10.2760/668595 Luxembourg: Publications Office of the European Union, 2017 Ispra: European Commission, 2017 © European Union, 2017 The reuse of the document is authorised, provided the source is acknowledged and the original meaning or message of the texts are not distorted. The European Commission shall not be held liable for any consequences stemming from the reuse. How to cite this document: Triebe
    [Show full text]
  • Designing Universal Chemical Markup (UCM) Through the Reusable Methodology Based on Analyzing Existing Related Formats
    Designing Universal Chemical Markup (UCM) through the reusable methodology based on analyzing existing related formats Background: In order to design concepts for a new general-purpose chemical format we analyzed the strengths and weaknesses of current formats for common chemical data. While the new format is discussed more in the next article, here we describe our software s t tools and two stage analysis procedure that supplied the necessary information for the n i r development. The chemical formats analyzed in both stages were: CDX, CDXML, CML, P CTfile and XDfile. In addition the following formats were included in the first stage only: e r P CIF, InChI, NCBI ASN.1, NCBI XML, PDB, PDBx/mmCIF, PDBML, SMILES, SLN and Mol2. Results: A two stage analysis process devised for both XML (Extensible Markup Language) and non-XML formats enabled us to verify if and how potential advantages of XML are utilized in the widely used general-purpose chemical formats. In the first stage we accumulated information about analyzed formats and selected the formats with the most general-purpose chemical functionality for the second stage. During the second stage our set of software quality requirements was used to assess the benefits and issues of selected formats. Additionally, the detailed analysis of XML formats structure in the second stage helped us to identify concepts in those formats. Using these concepts we came up with the concise structure for a new chemical format, which is designed to provide precise built-in validation capabilities and aims to avoid the potential issues of analyzed formats.
    [Show full text]
  • Open Data, Open Source and Open Standards in Chemistry: the Blue Obelisk five Years On
    Open Data, Open Source and Open Standards in chemistry: The Blue Obelisk ¯ve years on Noel M O'Boyle¤1 , Rajarshi Guha2 , Egon L Willighagen3 , Samuel E Adams4 , Jonathan Alvarsson5 , Richard L Apodaca6 , Jean-Claude Bradley7 , Igor V Filippov8 , Robert M Hanson9 , Marcus D Hanwell10 , Geo®rey R Hutchison11 , Craig A James12 , Nina Jeliazkova13 , Andrew SID Lang14 , Karol M Langner15 , David C Lonie16 , Daniel M Lowe4 , J¶er^omePansanel17 , Dmitry Pavlov18 , Ola Spjuth5 , Christoph Steinbeck19 , Adam L Tenderholt20 , Kevin J Theisen21 , Peter Murray-Rust4 1Analytical and Biological Chemistry Research Facility, Cavanagh Pharmacy Building, University College Cork, College Road, Cork, Co. Cork, Ireland 2NIH Center for Translational Therapeutics, 9800 Medical Center Drive, Rockville, MD 20878, USA 3Division of Molecular Toxicology, Institute of Environmental Medicine, Nobels vaeg 13, Karolinska Institutet, 171 77 Stockholm, Sweden 4Unilever Centre for Molecular Sciences Informatics, Department of Chemistry, University of Cambridge, Lens¯eld Road, CB2 1EW, UK 5Department of Pharmaceutical Biosciences, Uppsala University, Box 591, 751 24 Uppsala, Sweden 6Metamolecular, LLC, 8070 La Jolla Shores Drive #464, La Jolla, CA 92037, USA 7Department of Chemistry, Drexel University, 32nd and Chestnut streets, Philadelphia, PA 19104, USA 8Chemical Biology Laboratory, Basic Research Program, SAIC-Frederick, Inc., NCI-Frederick, Frederick, MD 21702, USA 9St. Olaf College, 1520 St. Olaf Ave., North¯eld, MN 55057, USA 10Kitware, Inc., 28 Corporate Drive, Clifton Park, NY 12065, USA 11Department of Chemistry, University of Pittsburgh, 219 Parkman Avenue, Pittsburgh, PA 15260, USA 12eMolecules Inc., 380 Stevens Ave., Solana Beach, California 92075, USA 13Ideaconsult Ltd., 4.A.Kanchev str., So¯a 1000, Bulgaria 14Department of Engineering, Computer Science, Physics, and Mathematics, Oral Roberts University, 7777 S.
    [Show full text]
  • MACSIMUS Manual
    1 MACSIMUS manual benzocaine (ethylaminobenzoate) parameter_set = charmm21 HA | HA-CT-HA | HA-CT-HA | OSn.2 | Cp.7=On.5 | C6R--C6R-HA Most often used links: | | HA-C6R C6R-HA 2.2 Blend synopsis and options | | HA-C6R--C6R-NPn.5^-Hp.25 9.2 Cook synopsis and options | 9.2.5 Cook input data Hp.25 MACromolecule SIMUlation Software © Jiˇr´ıKolafa 1993{2020 MACSIMUS may be distributed under the terms of the GNU General Public Licence Credits: ray: Mark VandeWettering \reasonably intelligent raytracer" CHARMM force field (files: charmm*.par, charmm*/*.rsd) GROMOS force field (files: gromos*.par, gromos*/*.rsd) amoeba implementation by Z. Wagner moil support by J. Schofield several bug fixes by T. Trnka bug discovered by N. Parfenov Contents I Program `blend' version 2.4b 14 1 Introduction 16 1.1 Force fields...................................... 16 1.2 `blend' overview.................................... 16 1.3 Versions........................................ 17 2 Running blend 18 2.1 Environment...................................... 18 2.2 Synopsis........................................ 19 2.2.1 Global options................................. 19 2.2.2 par-options and parameter files....................... 20 2.2.3 mol-options and molecular files....................... 22 2.2.4 Extra-options ................................. 29 2.3 File extensions.................................... 34 2.4 Run-time control................................... 37 2.4.1 get data format for input........................... 37 2.4.2 Scrolling.................................... 38 2.4.3 Error handling................................ 39 2.4.4 Interrupts................................... 40 2.5 Showing molecules graphically............................ 40 2.5.1 X11 Graphics................................. 40 2.5.2 Playback output............................... 44 2.6 Energy minimization................................. 44 2.7 Missing coordinates.................................. 45 3 Force field and the parameter file 46 3.1 Structure of the parameter file...........................
    [Show full text]
  • Biomolecular Simulation Data Management In
    BIOMOLECULAR SIMULATION DATA MANAGEMENT IN HETEROGENEOUS ENVIRONMENTS Julien Charles Victor Thibault A dissertation submitted to the faculty of The University of Utah in partial fulfillment of the requirements for the degree of Doctor of Philosophy Department of Biomedical Informatics The University of Utah December 2014 Copyright © Julien Charles Victor Thibault 2014 All Rights Reserved The University of Utah Graduate School STATEMENT OF DISSERTATION APPROVAL The dissertation of Julien Charles Victor Thibault has been approved by the following supervisory committee members: Julio Cesar Facelli , Chair 4/2/2014___ Date Approved Thomas E. Cheatham , Member 3/31/2014___ Date Approved Karen Eilbeck , Member 4/3/2014___ Date Approved Lewis J. Frey _ , Member 4/2/2014___ Date Approved Scott P. Narus , Member 4/4/2014___ Date Approved And by Wendy W. Chapman , Chair of the Department of Biomedical Informatics and by David B. Kieda, Dean of The Graduate School. ABSTRACT Over 40 years ago, the first computer simulation of a protein was reported: the atomic motions of a 58 amino acid protein were simulated for few picoseconds. With today’s supercomputers, simulations of large biomolecular systems with hundreds of thousands of atoms can reach biologically significant timescales. Through dynamics information biomolecular simulations can provide new insights into molecular structure and function to support the development of new drugs or therapies. While the recent advances in high-performance computing hardware and computational methods have enabled scientists to run longer simulations, they also created new challenges for data management. Investigators need to use local and national resources to run these simulations and store their output, which can reach terabytes of data on disk.
    [Show full text]
  • Ontology-Based Classification of Molecules
    Ontology-Based Classification of Molecules: a Logic Programming Approach Despoina Magka Department of Computer Science, University of Oxford Wolfson Building, Parks Road, OX1 3QD, UK [email protected] Abstract. We describe a prototype that performs structure-based classification of molecular structures. The software we present implements a sound and com- plete reasoning procedure of a formalism that extends logic programming and builds upon the DLV deductive databases system. We capture a wide range of chemical classes that are not expressible with OWL-based formalisms such as cyclic molecules, saturated molecules and alkanes. In terms of performance, a no- ticeable improvement is observed in comparison with previous approaches. Our evaluation has discovered subsumptions that are missing from the the manually curated ChEBI ontology as well as discrepancies with respect to existing subclass relations. We illustrate thus the potential of an ontology language which is suit- able for the Life Sciences domain and exhibits an encouraging balance between expressive power and practical feasibility. Keywords: Knowledge representation and reasoning, Logic programming and answer set programming, Cheminformatics. 1 Introduction The volume of bioinformatics data produced by research laboratories worldwide is in- creasing at an astonishing rate turning the need to adequately catalogue, represent and index the vast amounts of Life Sciences data sources into a pressing challenge. Semantic technologies have achieved significant progress towards the federation of biochemical information [33, 3, 4] via the definition and use of domain vocabularies with formal semantics, also known as ontologies. OWL [15], a family of logic-based knowledge representation (KR) formalisms standardised by W3C, has played a pivotal role in the advent of Semantic technologies due to its significant ability to reason over ontolo- gies by means of logical inference.
    [Show full text]
  • (CDK): an Open-Source Java Library for Chemo- and Bioinformatics
    J. Chem. Inf. Comput. Sci. 2003, 43, 493-500 493 The Chemistry Development Kit (CDK): An Open-Source Java Library for Chemo- and Bioinformatics Christoph Steinbeck,*,† Yongquan Han,† Stefan Kuhn,† Oliver Horlacher,‡ Edgar Luttmann,§ and Egon Willighagen# Max-Planck-Institute of Chemical Ecology, Jena, Germany, TheraSTrat AG, Allschwil, Switzerland, Institute of Organic Chemistry, University of Paderborn, Germany, and Nijmegen, The Netherlands Received August 17, 2002 The Chemistry Development Kit (CDK) is a freely available open-source Java library for Structural Chemo- and Bioinformatics. Its architecture and capabilities as well as the development as an open-source project by a team of international collaborators from academic and industrial institutions is described. The CDK provides methods for many common tasks in molecular informatics, including 2D and 3D rendering of chemical structures, I/O routines, SMILES parsing and generation, ring searches, isomorphism checking, structure diagram generation, etc. Application scenarios as well as access information for interested users and potential contributors are given. 1. INTRODUCTION of software development, most widely recognized through the great success of the free Unix-like operating system Whoever pursues the endeavor of creating a larger GNU/Linux, a collaborative work of many individuals and software package in chemoinformatics or computational organizations, including the Free Software Foundation lead chemistry from scratch will soon be confronted with the by Richard Stallman and the Finish computer science student Syssiphus task of implementing the standard repertoire of Linus Torvalds who started the project. According to several chemoinformatical algorithms and components invented essays on this subject, open-source software, for which, by during the last 20 or 30 years.
    [Show full text]
  • Chemical File Format Conversion Tools : a N Overview
    International Journal of Engineering Research & Technology (IJERT) ISSN: 2278-0181 Vol. 3 Issue 2, February - 2014 Chemical File Format Conversion Tools : A n Overview Kavitha C. R Dr. T Mahalekshmi Research Scholar, Bharathiyar University Principal Dept of Computer Applications Sree Narayana Institute of Technology SNGIST Kollam, India Cochin, India Abstract— There are a lot of chemical data stored in large different chemical file formats. Three types of file format databases, repositories and other resources. These data are used conversion tools are discussed in section III. And the by different researchers in different applications in various areas conclusion is given in section IV followed by the references. of chemistry. Since these data are stored in several standard chemical file formats, there is a need for the inter-conversion of II. CHEMICAL FILE FORMATS chemical structures between different formats because all the formats are not supported by various software and tools used by A chemical is a collection of atoms bonded together the researchers. Therefore it becomes essential to convert one file in space. The structure of a chemical makes it unique and format to another. This paper reviews some of the chemical file gives it its physical and biological characteristics. This formats and also presents a few inter-conversion tools such as structure is represented in a variety of chemical file formats. Open Babel [1], Mol converter [2] and CncTranslate [3]. These formats are used to represent chemical structure records and its associated data fields. Some of the file Keywords— File format Conversion, Open Babel, mol formats are CML (Chemical Markup Language), SDF converter, CncTranslate, inter- conversion tools.
    [Show full text]