Chemical Predictive Modelling to Improve Compound Quality

REVIEWS

John G. Cumming¹, Andrew M. Davis², Sorel Muresan³,⁴, Markus Haeberlein⁵,⁶ and Hongming Chen³

Abstract | The ‘quality’ of small-molecule drug candidates, encompassing aspects including their potency, selectivity and ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics, is a key factor influencing the chances of success in clinical trials. Importantly, such characteristics are under the control of chemists during the identification and optimization of lead compounds. Here, we discuss the application of computational methods, particularly quantitative structure–activity relationships (QSARs), in guiding the selection of higher-quality drug candidates, as well as cultural factors that may have affected their use and impact.

¹Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Alderley Park, Macclesfield SK10 4TG, UK.
²Respiratory, Inflammation and Autoimmunity Innovative Medicines Unit, AstraZeneca R&D, Pepparedsleden 1, 431 83 Mölndal, Sweden.
³Chemistry Innovation Centre, Discovery Sciences, AstraZeneca R&D, Pepparedsleden 1, 431 83 Mölndal, Sweden.
⁴Present address: AkzoNobel Surface Chemistry, Hamnvägen 2, 444 85 Stenungsund, Sweden.
⁵CNS & Pain Innovative Medicines, AstraZeneca R&D, 151 85 Södertälje, Sweden.
⁶Present address: Proteostasis Therapeutics, 200 Technology Square, Cambridge, Massachusetts 02139, USA.
Correspondence to J.G.C. and A.M.D. e-mails: John.Cumming@astrazeneca.com; [email protected]
doi:10.1038/nrd4128
948 | DECEMBER 2013 | VOLUME 12 | www.nature.com/reviews/drugdisc | © 2013 Macmillan Publishers Limited. All rights reserved

Reducing attrition rates remains a major challenge in drug development. Recent estimates indicate that the chances of a drug candidate successfully reaching the start of Phase II trials, where the role of the target biology in the disease can be tested, are only ~37%¹. Moreover, the probability of success in Phase II trials is only ~34%. Although a substantial proportion of failures in Phase II trials may be due to flaws in the underlying biological hypothesis, the physicochemical properties of small-molecule drug candidates also have an important impact, through their influence on ADMET (absorption, distribution, metabolism, excretion and toxicity) characteristics and on the effectiveness of the compounds at selectively engaging their targets in humans².

Given such issues, there has long been interest in the use of computational approaches to help guide the selection and optimization of compounds for synthesis and testing in order to reduce the risks of failure related to their physicochemical properties³. Many papers have been published that discuss the properties of a ‘quality compound’, that is, a compound that is more likely to robustly test the biological hypothesis in the clinic.

Such computational approaches can be broadly divided into physics-based and empirically based methods. Physics-based methods encompass, for example, molecular dynamics and the prediction of binding affinity by methods such as free energy perturbation and quantum chemical calculations. Empirical methods are based on observed patterns in existing data, which are used to guide the design of future compounds; examples of such methods include quantitative structure–activity relationships (QSARs), rule-based systems and expert systems. They do not rely on any understanding of the physics of the system, although they can indicate what the controlling physical properties might be. QSAR methods use statistical regression and classification-based approaches to identify quantitative patterns that are present within the existing data. The rules in rule-based methods may be either manually or automatically generated. Physics-based approaches are often used in conjunction with QSAR methods, but the large scale of data sets available can limit the degree to which a physics-based approach may be rigorously applied owing to limitations in computational resources. Empirical methods, however, are particularly suited for the analysis of the large volumes of data that are now available from the routine use of high- and medium-throughput in vitro biological and ADMET assays in drug discovery. We term the suite of available empirically based methods ‘chemical predictive modelling’ (TABLE 1).

In this Review, we describe the development of some of the most important chemical predictive modelling tools that are currently used in the industry and discuss some of their limitations as well as the cultural aspects that may prevent these approaches from realizing their potential.

Table 1 | Selected commonly used tools for chemical predictive modelling

Tool or toolkit                            URL

Cheminformatics toolkits
Daylight toolkit                           http://www.daylight.com
OpenEye Scientific Software toolkit        http://www.eyesopen.com
ChemAxon                                   http://www.chemaxon.com
The Chemistry Development Kit¹⁰²           http://sourceforge.net/projects/cdk/files/cdk
RDkit                                      http://www.rdkit.org
Dragon descriptors¹⁰³                      http://www.talete.mi.it
Batch Modules for ACD/Percepta             http://www.acdlabs.com/products/percepta/batch.php

Statistical tools and toolkits
The R Project for Statistical Computing    http://www.r-project.org
Bioclipse⁵⁹                                http://bioclipse.net
JMP statistical discovery software         http://www.jmp.com

Pipelining tools
Pipeline Pilot (Accelrys)                  http://accelrys.com/products/pipeline-pilot
Knime¹⁰⁴                                   http://www.knime.org

Data visualization tools
Spotfire                                   http://spotfire.tibco.com
Vortex (Dotmatics)                         http://www.dotmatics.com/products/vortex

What defines compound quality?

A universally accepted definition of compound quality has not been established. However, multiple landmark publications over the past 15 years or so have indicated the importance of various physicochemical properties, particularly lipophilicity.

The pioneering ‘rule of five’ guidelines published by Lipinski et al. in 1997 proposed simple physicochemical property-based guidelines for drug permeability⁴ (BOX 1). By analysing a set of drugs that had entered clinical trials, it was found that the following rules pertained to a large proportion of the compounds: molecular mass ≤500 Da; calculated LogP (cLogP) ≤5; number of hydrogen-bond donors ≤5; and number of hydrogen-bond acceptors ≤10. It was suggested that compounds that violate any two of the ‘rule of five’ conditions are unlikely to be oral drugs⁴.

The recognition that lead optimization often resulted in increased lipophilicity and molecular size prompted the definition of the ‘lead-like’ concept⁵,⁶. This suggested that screening libraries should be preferentially populated with smaller and less lipophilic compounds than those described by Lipinski’s ‘drug-like’ definitions. Leads that are smaller and less lipophilic than drug-like compounds would provide ‘headroom’ for lead optimization. These publications had a huge impact on how medicinal chemists defined compound quality and led to an increase in the use of in silico approaches for drug design; for example, medicinal chemists would computationally filter compound collections and compounds proposed for synthesis to only include those with calculated physicochemical properties that were sufficiently lead- or drug-like.

Over the past 15 years, many developments on these guidelines have been published⁷,⁸ to supplement and fine-tune recommended molecular property ranges for fragments⁹, target classes, disease areas¹⁰ and ADMET characteristics¹¹,¹². Some of the guidelines have been challenged and new ones proposed. Based on studies of the temporal invariance of physicochemical properties, it has been questioned whether the focus on molecular mass is justified, as lipophilicity is the more fundamental controlling property¹³. The importance of lipophi- [...] study of the toxicological outcomes of 245 compounds in development at Pfizer, which found that compounds with cLogP >3 and total polar surface area <75 Å² were six times more likely to show an adverse event in a rat or dog in vivo safety study than a compound with cLogP <3 and total polar surface area >75 Å² (REF. 12). Flexibility, molecular complexity and shape are additional properties that have received attention from some in the field¹⁴,¹⁵.

Where potency information is available, the most recent evolution of these guidelines is the proposal of various ligand efficiency concepts that build on the concepts of lead-likeness and of discriminating between optimal and non-optimal binders, as first suggested by Andrews¹⁶. Ligand efficiency¹⁷, ligand lipophilic efficiency (LLE)¹⁸ and LogP divided by ligand efficiency (LELP)¹⁹ are size-, lipophilicity- and size-plus-lipophilicity-corrected measures of potency that help to identify compounds that are maximizing the use of their chemical structure in desirable binding and are therefore likely to be better leads.

For example, scientists at Astex have reported that companies focusing on leads with ligand efficiency and lipophilic efficiency find more robust SARs and produce candidate drugs with a more acceptable compound property profile²⁰. Similar findings were reported by Leeson and St Gallay²¹; in their target-by-target comparison, companies that applied rigorous ligand efficiency and LLE optimization produced candidate drugs that were smaller and less lipophilic. Finally, Keserü and colleagues analysed data on the ADMET properties of compounds published by Pfizer and showed that LELP can discrim-

cLogP
The calculated logarithm of the 1‑octanol–water partition coefficient of the non-ionized