The Chemistry Development Kit (CDK) V2.0: Atom Typing, Depiction, Molecular Formulas, and Substructure Searching Egon L

Total Page:16

File Type:pdf, Size:1020Kb

The Chemistry Development Kit (CDK) V2.0: Atom Typing, Depiction, Molecular Formulas, and Substructure Searching Egon L Willighagen et al. J Cheminform (2017) 9:33 DOI 10.1186/s13321-017-0220-4 SOFTWARE Open Access The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching Egon L. Willighagen1* , John W. Mayfeld2 , Jonathan Alvarsson3 , Arvid Berg3, Lars Carlsson4 , Nina Jeliazkova5 , Stefan Kuhn6 , Tomáš Pluskal7 , Miquel Rojas‑Chertó8 , Ola Spjuth3 , Gilleain Torrance9 , Chris T. Evelo1 , Rajarshi Guha10 and Christoph Steinbeck11 Abstract Background: The Chemistry Development Kit (CDK) is a widely used open source cheminformatics toolkit, provid‑ ing data structures to represent chemical concepts along with methods to manipulate such structures and perform computations on them. The library implements a wide variety of cheminformatics algorithms ranging from chemical structure canonicalization to molecular descriptor calculations and pharmacophore perception. It is used in drug dis‑ covery, metabolomics, and toxicology. Over the last 10 years, the code base has grown signifcantly, however, result‑ ing in many complex interdependencies among components and poor performance of many algorithms. Results: We report improvements to the CDK v2.0 since the v1.2 release series, specifcally addressing the increased functional complexity and poor performance. We frst summarize the addition of new functionality, such atom typing and molecular formula handling, and improvement to existing functionality that has led to signifcantly better perfor‑ mance for substructure searching, molecular fngerprints, and rendering of molecules. Second, we outline how the CDK has evolved with respect to quality control and the approaches we have adopted to ensure stability, including a code review mechanism. Conclusions: This paper highlights our continued eforts to provide a community driven, open source cheminfor‑ matics library, and shows that such collaborative projects can thrive over extended periods of time, resulting in a high-quality and performant library. By taking advantage of community support and contributions, we show that an open source cheminformatics project can act as a peer reviewed publishing platform for scientifc computing software. Keywords: Java, Cheminformatics, Bioinformatics, Metabolomics, Depiction Background Obelisk, a movement promoting Open Data, Open Source, Te open source cheminformatics community has made and Open Standards in chemistry [1, 2]. Te CDK provid- signifcant steps forward recently [1] as evidenced by the ing data structures to represent chemical concepts along growing number of tools and underlying toolkits, along with methods to manipulate such structures and perform with the usage of these software components in a variety computations on them. Previously documented CDK ver- of applications. Te Chemistry Development Kit (CDK) sions have been widely adopted [3, 4]. Use of the CDK is one of the tools developed under the aegis of the Blue ranges from inclusion of CDK functionality in wrapper platforms such as Cinfony [5], incorporation within the R environment (rcdk [6]), and as plugins for Taverna [7], *Correspondence: [email protected] KNIME [8], Cytoscape (ChemViz2 [9]), and for Microsoft 1 Department of Bioinformatics ‑ BiGCaT, NUTRIM, Maastricht University, 6200 MD Maastricht, The Netherlands Excel (LICSS [10]). In contrast to scenarios that have made Full list of author information is available at the end of the article CDK functionality available in larger systems, a number of © The Author(s) 2017. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/ publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Willighagen et al. J Cheminform (2017) 9:33 Page 2 of 19 projects have employed the CDK as a general cheminfor- improved and formalized the development model for the matics toolkit. Examples include jCompoundMapper [11], project using unit testing, code review and guidelines for ScafoldHunter [12, 13], OMG [14], PaDEL [15], Chem- handling version control. Finally we report on the availa- Des [16], ReactPRED [17], SMSD [18–20], WhichCyp [21], bility of binary distributions of the library, allowing users MetaPrint2D [22], MetFrag [23], and the IUPHAR/BPS to include specifc modules (and their dependencies) of Guide to Pharmacology [24], BRENDA [25] and QSAR the CDK in their own projects (as opposed to developers DataBank [26] databases. A number of such tools were ini- who work on the CDK library itself). tially developed using older versions of the CDK and are updated to new releases as they are made available. Exam- New APIs and improved implementations ples include Bioclipse [27, 28] and AMBIT [29–31]. Te We here outline various new and improved APIs in the CDK has also played a role in a number of chemical stud- CDK library since the two previous publications in 2003 ies, such as fnding the maximally bridging rings in chemi- and 2006 [3, 4]. cal structures [32], prediction of organic reactions [33], and bioactivities of compounds [34]. Atom typing While the CDK has purported to be a general purpose Atom type perception is core cheminformatics function- cheminformatics toolkit, older versions were designed by ality: the atom types describe chemical features of atoms, a community with specifc applications in mind, primary such as the number of neighbors, possible formal charges, among them being structure elucidation. In addition, an (approximate) hybridization, electron distribution over implicit goal of previous versions was to have the CDK serve orbitals and so on. However, previous versions of the CDK as an educational resource to enable students of chemin- implemented atom type perception as part of diferent formatics to understand the underlying algorithms. Tis algorithms, resulting in duplicated and sometimes diver- resulted in certain functionalities, such as molecular fnger- gent typing schemes. As a result it was cumbersome to add printing [35, 36], receiving more attention than others, such new atom types and implement support for new charged as stereochemistry. Te outcome was signifcant variance in and radical species in a consistent manner. performance and features throughout the toolkit. Tis CDK version has a new, centralized atom typ- Te growth of open source software over the last 10 ing framework, removing the perception of atom types years is evidence of the ability of communities of devel- from various algorithms. Tis allows for a consistent and opers to develop systems and processes that lead to high extensive typing scheme, that can be also be tested inde- quality software systems for long term use. Te CDK is pendently of other code. Te new code defnes the atom no diferent. Te adoption of automatic build systems types using a list that specifes for each type the element and quality control methodologies such as unit test- symbol, hybridization, formal charge, number of lone ing, automated source code validation, and peer review pairs, and an enumeration of the bond orders (see Fig. 1). by fellow developers have greatly improved the stability Tis list of properties captures the information needed of the library. While it has slowed development some- for the various algorithms in the CDK. For example, what, it has allowed for cleaning up interdependencies hybridization information can be used in certain aroma- between modules of functionality, and importantly, has ticity models (see later), and the lone pair information is improved the scalability of the development model. Tis needed for resonance structure calculation needed, for π has resulted in signifcant new functionality in core appli- example, for Gasteiger -charges. cation programming interfaces (APIs) while maintaining the quality of code depending on those core APIs. Examples of new features supported by the improved development model include InChI functionality [37], greatly improved ring detection algorithms [38], improve- ments to the core atom type perception module that now covers a much more comprehensive set of elements, charge states and radical species than previous versions, a more comprehensive fngerprinting API, new depiction functionality, and many speed and stability improvements. Implementation and results Tis section describes the specifcs of new APIs and improvements to pre-existing methods that are avail- 3 able in the latest CDK. We then discuss how we have Fig. 1 Atom type information specifed for a sp ‑hybridized carbon Willighagen et al. J Cheminform (2017) 9:33 Page 3 of 19 A reference implementation, CDKAtomType- Matcher, has been written in such a way that per- ceives these atom types, and validates the perception automatically against the properties defned by the ontology. Tis class handles a variety of types of miss- ing information, as commonly resulting from various (fle) formats; for example, it can handle undefned hydrogen counts and undefned double bond positions if hybridization information is provided instead. Tat makes the perception code fexible but also more com- plex. Alternative algorithms for atom typing have not been explored. Tis reference implementation can be
Recommended publications
  • Istls Information Services to Life Science Internet Bioinformatics Resources Josef Maier [E-Mail: [email protected]] Last Checked August, 17Th, 2011
    IStLS Information Services to Life Science Internet Bioinformatics Resources Josef Maier [e-mail: [email protected]] Last checked August, 17th, 2011 IStLS Bioinformatics Resources http://www.istls.de/bioinfolinks.php Courses and lectures Bioinformatics - Online Courses and Tutorials http://www.bioinformatik.de/cgi-bin/browse/Catalog/Research_and_Education/Online_Courses_and_Tutorials/ EMBRACE Network of Excellence http://www.embracegrid.info/page.php EMBNet Quick Guides http://www.embnet.org/node/64 EMBNet Courses http://www.embnet.org/ Sequence Analysis with distributed Resources http://bibiserv.techfak.uni-bielefeld.de/sadr/ Tutorial Protein Structures (EXPASY) SwissModel http://swissmodel.expasy.org/course/course-index.htm CMBI Courses for protein structure http://swift.cmbi.ru.nl/teach/courses/index.html 2Can Support Portal - Bioinformatics educational resource http://www.ebi.ac.uk/2can Bioconductor Workshops http://www.bioconductor.org/workshops/ CBS Bioinformatics Courses http://www.cbs.dtu.dk/courses.php The European School In Bioinformatics (Biosapiens) http://www.biosapiens.info/page.php?page=esb Institutes Centers Networks Bioinformatics Institutes Germany WSI Wilhelm-Schickard-Institut für Informatik - Universitaet Tuebingen http://www.uni-tuebingen.de/en/faculties/faculty-of-science/departments/computer-science/department.html WSI Huson - Algorithms in Bioinformatics http://www-ab.informatik.uni-tuebingen.de/welcome.html WSI Prof. Zell - Computer Architecture http://www.ra.cs.uni-tuebingen.de/ WSI Kohlbacher - Div. for Simulation
    [Show full text]
  • Practical Chemoinformatics Muthukumarasamy Karthikeyan • Renu Vyas
    Practical Chemoinformatics Muthukumarasamy Karthikeyan • Renu Vyas Practical Chemoinformatics 1 3 Muthukumarasamy Karthikeyan Renu Vyas Digital Information Resource Centre Scientist (DST) National Chemical Laboratory Division of Chemical Engineering and Pune Process Development India National Chemical Laboratory Pune India ISBN 978-81-322-1779-4 ISBN 978-81-322-1780-0 (eBook) DOI 10.1007/978-81-322-1780-0 Springer New Delhi Dordrecht Heidelberg London New York Library of Congress Control Number: 2014931501 © Springer India 2014 This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recita- tion, broadcasting, reproduction on microfilms or in any other physical way, and transmission or infor- mation storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar meth- odology now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Duplica- tion of this publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s location, in its current version, and permission for use must always be obtained from Springer. Permissions for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to prosecution under the respective Copyright Law. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publica- tion does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
    [Show full text]
  • Open Data, Open Source, and Open Standards in Chemistry: the Blue Obelisk Five Years On" Journal of Cheminformatics Vol
    Oral Roberts University Digital Showcase College of Science and Engineering Faculty College of Science and Engineering Research and Scholarship 10-14-2011 Open Data, Open Source, and Open Standards in Chemistry: The lueB Obelisk five years on Andrew Lang Noel M. O'Boyle Rajarshi Guha National Institutes of Health Egon Willighagen Maastricht University Samuel Adams See next page for additional authors Follow this and additional works at: http://digitalshowcase.oru.edu/cose_pub Part of the Chemistry Commons Recommended Citation Andrew Lang, Noel M O'Boyle, Rajarshi Guha, Egon Willighagen, et al.. "Open Data, Open Source, and Open Standards in Chemistry: The Blue Obelisk five years on" Journal of Cheminformatics Vol. 3 Iss. 37 (2011) Available at: http://works.bepress.com/andrew-sid-lang/ 19/ This Article is brought to you for free and open access by the College of Science and Engineering at Digital Showcase. It has been accepted for inclusion in College of Science and Engineering Faculty Research and Scholarship by an authorized administrator of Digital Showcase. For more information, please contact [email protected]. Authors Andrew Lang, Noel M. O'Boyle, Rajarshi Guha, Egon Willighagen, Samuel Adams, Jonathan Alvarsson, Jean- Claude Bradley, Igor Filippov, Robert M. Hanson, Marcus D. Hanwell, Geoffrey R. Hutchison, Craig A. James, Nina Jeliazkova, Karol M. Langner, David C. Lonie, Daniel M. Lowe, Jerome Pansanel, Dmitry Pavlov, Ola Spjuth, Christoph Steinbeck, Adam L. Tenderholt, Kevin J. Theisen, and Peter Murray-Rust This article is available at Digital Showcase: http://digitalshowcase.oru.edu/cose_pub/34 Oral Roberts University From the SelectedWorks of Andrew Lang October 14, 2011 Open Data, Open Source, and Open Standards in Chemistry: The Blue Obelisk five years on Andrew Lang Noel M O'Boyle Rajarshi Guha, National Institutes of Health Egon Willighagen, Maastricht University Samuel Adams, et al.
    [Show full text]
  • A Java-Based Platform for Evolutionary De Novo Molecular Design
    Article MoleGear: A Java-Based Platform for Evolutionary De Novo Molecular Design Yunhan Chu and Xuezhong He * Department of Chemical Engineering, Norwegian University of Science and Technology, N-7491 Trondheim, Norway; [email protected] * Correspondence: [email protected]; Tel.: +47-73593942 Received: 25 March 2019; Accepted: 10 April 2019; Published: 11 April 2019 Abstract: A Java-based platform, MoleGear, is developed for de novo molecular design based on the chemistry development kit (CDK) and other Java packages. MoleGear uses evolutionary algorithm (EA) to explore chemical space, and a suite of fragment-based operators of growing, crossover, and mutation for assembling novel molecules that can be scored by prediction of binding free energy or a weighted-sum multi-objective fitness function. The EA can be conducted in parallel over multiple nodes to support large-scale molecular optimizations. Some complementary utilities such as fragment library design, chemical space analysis, and graphical user interface are also integrated into MoleGear. The candidate molecules as inhibitors for the human immunodeficiency virus 1 (HIV-1) protease were designed by MoleGear, which validates the potential capability for de novo molecular design. Keywords: de novo design; evolutionary algorithm; drug molecules; fitness; multi-objective function 1. Introduction Computational chemistry plays an important role in the design of new drug-like molecules [1– 5], catalysts [6–8], and novel solvents of ionic liquids [9–11]. De novo molecular design has been an active research area of drug design/discovery over the last decades, and many approaches such as LUDI [12], LEA3D [13], Flux [14,15], and pharmacophore-linked fragment virtual screening (PFVS) [16] have been developed by using protein and ligand structures.
    [Show full text]
  • Molecular Structure Input on the Web Peter Ertl
    Ertl Journal of Cheminformatics 2010, 2:1 http://www.jcheminf.com/content/2/1/1 REVIEW Open Access Molecular structure input on the web Peter Ertl Abstract A molecule editor, that is program for input and editing of molecules, is an indispensable part of every cheminfor- matics or molecular processing system. This review focuses on a special type of molecule editors, namely those that are used for molecule structure input on the web. Scientific computing is now moving more and more in the direction of web services and cloud computing, with servers scattered all around the Internet. Thus a web browser has become the universal scientific user interface, and a tool to edit molecules directly within the web browser is essential. The review covers a history of web-based structure input, starting with simple text entry boxes and early molecule editors based on clickable maps, before moving to the current situation dominated by Java applets. One typical example - the popular JME Molecule Editor - will be described in more detail. Modern Ajax server-side molecule editors are also presented. And finally, the possible future direction of web-based molecule editing, based on tech- nologies like JavaScript and Flash, is discussed. Introduction this trend and input of molecular structures directly A program for the input and editing of molecules is an within a web browser is therefore of utmost importance. indispensable part of every cheminformatics or molecu- In this overview a history of entering molecules into lar processing system. Such a program is known as a web applications will be covered, starting from simple molecule editor, molecular editor or structure sketcher.
    [Show full text]
  • A Web-Based 3D Molecular Structure Editor and Visualizer Platform
    Mohebifar and Sajadi J Cheminform (2015) 7:56 DOI 10.1186/s13321-015-0101-7 SOFTWARE Open Access Chemozart: a web‑based 3D molecular structure editor and visualizer platform Mohamad Mohebifar* and Fatemehsadat Sajadi Abstract Background: Chemozart is a 3D Molecule editor and visualizer built on top of native web components. It offers an easy to access service, user-friendly graphical interface and modular design. It is a client centric web application which communicates with the server via a representational state transfer style web service. Both client-side and server-side application are written in JavaScript. A combination of JavaScript and HTML is used to draw three-dimen- sional structures of molecules. Results: With the help of WebGL, three-dimensional visualization tool is provided. Using CSS3 and HTML5, a user- friendly interface is composed. More than 30 packages are used to compose this application which adds enough flex- ibility to it to be extended. Molecule structures can be drawn on all types of platforms and is compatible with mobile devices. No installation is required in order to use this application and it can be accessed through the internet. This application can be extended on both server-side and client-side by implementing modules in JavaScript. Molecular compounds are drawn on the HTML5 Canvas element using WebGL context. Conclusions: Chemozart is a chemical platform which is powerful, flexible, and easy to access. It provides an online web-based tool used for chemical visualization along with result oriented optimization for cloud based API (applica- tion programming interface). JavaScript libraries which allow creation of web pages containing interactive three- dimensional molecular structures has also been made available.
    [Show full text]
  • Getting Started in Jmol
    Getting Started in Jmol Part of the Jmol Training Guide from the MSOE Center for BioMolecular Modeling Interactive version available at http://cbm.msoe.edu/teachingResources/jmol/jmolTraining/started.html Introduction Physical models of proteins are powerful tools that can be used synergistically with computer visualizations to explore protein structure and function. Although it is interesting to explore models and visualizations created by others, it is much more engaging to create your own! At the MSOE Center for BioMolecular Modeling we use the molecular visualization program Jmol to explore protein and molecular structures in fully interactive 3-dimensional displays. Jmol a free, open source molecular visualization program used by students, educators and researchers internationally. The Jmol Training Guide from the MSOE Center for BioMolecular Modeling will provide the tools needed to create molecular renderings, physical models using 3-D printing technologies, as well as Jmol animations for online tutorials or electronic posters. Examples of Proteins in Jmol Jmol allows users to rotate proteins and molecular structures in a fully interactive 3-dimensional display. Some sample proteins designed with Jmol are shown to the right. Hemoglobin Proteins Insulin Proteins Green Fluorescent safely carry oxygen in the help regulate sugar in Proteins create blood. the bloodstream. bioluminescence in animals like jellyfish. Downloading Jmol Jmol Can be Used in Two Ways: 1. As an independent program on a desktop - Jmol can be downloaded to run on your desktop like any other program. It uses a Java platform and therefore functions equally well in a PC or Mac environment. 2. As a web application - Jmol has a web-based version (oftern refered to as "JSmol") that runs on a JavaScript platform and therefore functions equally well on all HTML5 compatible browsers such as Firefox, Internet Explorer, Safari and Chrome.
    [Show full text]
  • Spoken Tutorial Project, IIT Bombay Brochure for Chemistry Department
    Spoken Tutorial Project, IIT Bombay Brochure for Chemistry Department Name of FOSS Applications Employability GChemPaint GChemPaint is an editor for 2Dchem- GChemPaint is currently being developed ical structures with a multiple docu- as part of The Chemistry Development ment interface. Kit, and a Standard Widget Tool kit- based GChemPaint application is being developed, as part of Bioclipse. Jmol Jmol applet is used to explore the Jmol is a free, open source molecule viewer structure of molecules. Jmol applet is for students, educators, and researchers used to depict X-ray structures in chemistry and biochemistry. It is cross- platform, running on Windows, Mac OS X, and Linux/Unix systems. For PG Students LaTeX Document markup language and Value addition to academic Skills set. preparation system for Tex typesetting Essential for International paper presentation and scientific journals. For PG student for their project work Scilab Scientific Computation package for Value addition in technical problem numerical computations solving via use of computational methods for engineering problems, Applicable in Chemical, ECE, Electrical, Electronics, Civil, Mechanical, Mathematics etc. For PG student who are taking Physical Chemistry Avogadro Avogadro is a free and open source, Research and Development in Chemistry, advanced molecule editor and Pharmacist and University lecturers. visualizer designed for cross-platform use in computational chemistry, molecular modeling, material science, bioinformatics, etc. Spoken Tutorial Project, IIT Bombay Brochure for Commerce and Commerce IT Name of FOSS Applications / Employability LibreOffice – Writer, Calc, Writing letters, documents, creating spreadsheets, tables, Impress making presentations, desktop publishing LibreOffice – Base, Draw, Managing databases, Drawing, doing simple Mathematical Math operations For Commerce IT Students Drupal Drupal is a free and open source content management system (CMS).
    [Show full text]
  • Designing Universal Chemical Markup (UCM) Through the Reusable Methodology Based on Analyzing Existing Related Formats
    Designing Universal Chemical Markup (UCM) through the reusable methodology based on analyzing existing related formats Background: In order to design concepts for a new general-purpose chemical format we analyzed the strengths and weaknesses of current formats for common chemical data. While the new format is discussed more in the next article, here we describe our software s t tools and two stage analysis procedure that supplied the necessary information for the n i r development. The chemical formats analyzed in both stages were: CDX, CDXML, CML, P CTfile and XDfile. In addition the following formats were included in the first stage only: e r P CIF, InChI, NCBI ASN.1, NCBI XML, PDB, PDBx/mmCIF, PDBML, SMILES, SLN and Mol2. Results: A two stage analysis process devised for both XML (Extensible Markup Language) and non-XML formats enabled us to verify if and how potential advantages of XML are utilized in the widely used general-purpose chemical formats. In the first stage we accumulated information about analyzed formats and selected the formats with the most general-purpose chemical functionality for the second stage. During the second stage our set of software quality requirements was used to assess the benefits and issues of selected formats. Additionally, the detailed analysis of XML formats structure in the second stage helped us to identify concepts in those formats. Using these concepts we came up with the concise structure for a new chemical format, which is designed to provide precise built-in validation capabilities and aims to avoid the potential issues of analyzed formats.
    [Show full text]
  • Report on an NIH Workshop on Ultralarge Chemistry Databases Wendy A
    1 Report on an NIH Workshop on Ultralarge Chemistry Databases Wendy A. Warr Wendy Warr & Associates, 6 Berwick Court, Holmes Chapel, Cheshire, CW4 7HZ, United Kingdom. Email: [email protected] Introduction The virtual workshop took place on December 1-3, 2020. It was aimed at researchers, groups, and companies that generate, manage, sell, search, and screen databases of more than one billion small molecules (Figure 1). There were about 550 “attendees” from 37 different countries. Recent advances in computational chemistry have enabled researchers to navigate virtual chemical spaces containing billions of chemical structures, carrying out similarity searches, studying structure-activity relationships (SAR), experimenting with scaffold-hopping, and using other drug discovery methodologies.1 For clarity, one could differentiate “spaces” from “libraries”, and “libraries” from “databases”. Spaces are combinatorially constructed collections of compounds; they are usually very big indeed and it is not possible to enumerate all the precise chemical structures that are covered. Libraries are enumerated collections of full structures: usually fewer than 1010 molecules. Databases are a way to storing libraries, for example, in a relational database management system. Figure 1. Ultralarge chemical databases. (Source: Marcus Gastreich based on the publication by Hoffmann and Gastreich.) This report summarizes talks from about 30 practitioners in the field of ultralarge collections of molecules. The aim is to represent as accurately as possible the information that was delivered by the speakers; the report does not seek to be evaluative. 2 Welcoming remarks; defining a drug discovery gateway Susan Gregurick, Office of Data Science and Strategy, NIH, USA Data should be “findable, accessible, interoperable and reusable” (FAIR)2 and with this in mind, NIH has been creating, curating, integrating, and querying ultralarge chemistry databases.
    [Show full text]
  • Qsar Methods Development, Virtual and Experimental Screening for Cannabinoid Ligand Discovery
    QSAR METHODS DEVELOPMENT, VIRTUAL AND EXPERIMENTAL SCREENING FOR CANNABINOID LIGAND DISCOVERY by Kyaw Zeyar Myint BS, Biology, BS, Computer Science, Hampden-Sydney College, 2007 Submitted to the Graduate Faculty of School of Medicine in partial fulfillment of the requirements for the degree of Doctor of Philosophy University of Pittsburgh 2012 UNIVERSITY OF PITTSBURGH SCHOOL OF MEDICINE This dissertation was presented by Kyaw Zeyar Myint It was defended on August 20th, 2012 and approved by Dr. Ivet Bahar, Professor, Department of Computational and Systems Biology Dr. Billy W. Day, Professor, Department of Pharmaceutical Sciences Dr. Christopher Langmead, Associate Professor, Department of Computer Science, CMU Dissertation Advisor: Dr. Xiang-Qun Xie, Professor, Department of Pharmaceutical Sciences ii Copyright © by Kyaw Zeyar Myint 2012 iii QSAR METHODS DEVELOPMENT, VIRTUAL AND EXPERIMENTAL SCREENING FOR CANNABINOID LIGAND DISCOVERY Kyaw Zeyar Myint, PhD University of Pittsburgh, 2012 G protein coupled receptors (GPCRs) are the largest receptor family in mammalian genomes and are known to regulate wide variety of signals such as ions, hormones and neurotransmitters. It has been estimated that GPCRs represent more than 30% of current drug targets and have attracted many pharmaceutical industries as well as academic groups for potential drug discovery. Cannabinoid (CB) receptors, members of GPCR superfamily, are also involved in the activation of multiple intracellular signal transductions and their endogenous ligands or cannabinoids have attracted pharmacological research because of their potential therapeutic effects. In particular, the cannabinoid subtype-2 (CB2) receptor is known to be involved in immune system signal transductions and its ligands have the potential to be developed as drugs to treat many immune system disorders without potential psychotic side- effects.
    [Show full text]
  • Applications of the Quantum Algorithm for St-Connectivity
    Applications of the quantum algorithm for st-connectivity KAI DELORENZO1 , SHELBY KIMMEL1 , AND R. TEAL WITTER1 1Middlebury College, Computer Science Department, Middlebury, VT Abstract We present quantum algorithms for various problems related to graph connectivity. We give simple and query-optimal algorithms for cycle detection and odd-length cycle detection (bipartiteness) using a reduction to st-connectivity. Furthermore, we show that our algorithm for cycle detection has improved performance under the promise of large circuit rank or a small number of edges. We also provide algo- rithms for detecting even-length cycles and for estimating the circuit rank of a graph. All of our algorithms have logarithmic space complexity. 1 Introduction Quantum query algorithms are remarkably described by span programs [15, 16], a linear algebraic object originally created to study classical logspace complexity [12]. However, finding optimal span program algorithms can be challenging; while they can be obtained using a semidefinite program, the size of the program grows exponentially with the size of the input to the algorithm. Moreover, span programs are designed to characterize the query complexity of an algorithm, while in practice we also care about the time and space complexity. One of the nicest span programs is for the problem of undirected st-connectivity, in which one must decide whether two vertices s and t are connected in a given graph. It is “nice” for several reasons: It is easy to describe and understand why it is correct. • It corresponds to a quantum algorithm that uses logarithmic (in the number of vertices and edges of • the graph) space [4, 11].
    [Show full text]