Navigating Chemical Space by Interfacing Generative Artificial Intelligence and Molecular Docking

Total Page:16

File Type:pdf, Size:1020Kb

Navigating Chemical Space by Interfacing Generative Artificial Intelligence and Molecular Docking bioRxiv preprint doi: https://doi.org/10.1101/2020.06.09.143289; this version posted June 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Navigating Chemical Space By Interfacing Generative Artificial Intelligence and Molecular Docking Ziqiao Xua, Orrette Wauchopeb, and Aaron T. Franka,c athe University of Michigan, Department of Chemistry, Ann Arbor, 48109, United States of America; bCity University of New York, Baruch College, Department of Natural Sciences, New York, 10010, United States of America; cthe University of Michigan, Biophysics, Ann Arbor, 48109, United States of America This manuscript was compiled on June 10, 2020 Here we report the testing and application of a simple, structure-aware framework to design target-specific screening libraries for drug development. Our approach combines advances in generative artificial intelligence (AI) with conventional molecular docking to rapidly explore chemical space conditioned on the unique physiochemical properties of the active site of a biomolecular target. As a proof-of- concept, we used our framework to construct a focused library for cyclin-dependent kinase type-2 (CDK2). We then used it to rapidly generate a library specific to the active site of the main protease (Mpro) of the SARS-CoV-2 virus, which causes COVID-19. By comparing approved and experimental drugs to compounds in our library, we also identified six drugs, namely, Naratriptan, Etryptamine, Panobinostat, Procainamide, Sertraline, and Lidamidine, as possible SARS-CoV-2 Mpro targeting compounds and, as such, potential drug repurposing candidates. To complement the open-science COVID-19 drug discovery initiatives, we make our SARS-CoV-2 Mpro library fully accessible to the research community (https://github.com/atfrank/SARS-CoV-2). Virtual Screening Libraries Biophysical Modeling Computational Drug Discovery | | Developing new drugs is costly and time- Fortunately, advances in generative artificial intel- consuming; by some estimates, bringing a new drug ligence now make it possible to design both chemi- to market requires 13 years at the cost of US$1 cally novel and synthetically feasible compounds in ≥ ≥ billion.(1) As such, there is a keen interest in accel- silico.(18–23) Noteworthy among these frameworks erating the drug development pipeline and reduce are those based on autoencoders. Autoencoders its cost. In principle, computer-aided drug devel- are unsupervised machine learning models that, by opment can both accelerate and lower the cost of virtue of their architecture, are able to learn a com- identifying viable drug candidates.(2) For instance, pact, latent space representation of chemical space during hit identification, where the goal is to identify when they are trained with molecular data. The task small molecules that may bind to the target, one of designing target-specific libraries can, therefore, could restrict testing to only the compounds that be cast as one of exploring the latent space of such are predicted to bind to the target with high affinity. models, conditioned on the unique properties of the However, the computational cost of screening large active site on the target of interest. Currently lack- chemical libraries can still be burdensome. The com- ing, however, are efficient, structure-aware strategies putation cost of virtual screening could be reduced for sampling the latent space of generative molec- by working with smaller and more focused, target- ular models. If developed, such techniques could specific virtual libraries, enriched in compounds that find utility in constructing active-site and target- are likely to bind to a specific site on the target of specific virtual libraries that are likely to contain hit interest.(3) Such libraries can be constructed by car- compounds that, if synthesized, might bind to the rying out de novo design, during which compounds targeted site. More immediately, they could be used are designed on-the-fly, guided by the unique physio- to construct a ligand-based pharmacophore model chemical properties of the active site of the target.(4– that can be used to screen millions of compounds in 17) However, because of the difficulty in formalizing a matter of seconds.(24) diverse reaction rules and implementing robust algo- Here, we tested and implemented a novel, rithms to intelligently apply them, up to this point, structure-based framework for constructing target- such de novo design frameworks often produce syn- specific libraries by sampling the latent space of gen- thetically inaccessible compounds. erative machine learning models conditioned on the 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.06.09.143289; this version posted June 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. SAMPLE DOCK O NH O NH O a d O O Chemical NH NH N NH NH Space N N NH N N N NH N b Tanimoto = 0.7491 Tanimoto = 0.7208 Tanimoto = 0.7101 PDB ID: 2R3I PDB ID: 2VTI Scaffold DOCK PDB ID: 2EXM DrugBank ID: DB08534 DrugBank ID: DB08133 (benzene) Design 1 (Target) DrugBank ID: DB08768 Score N N SAMPLE DOCK Filter Best O Design 2 N (VAE) (Target) Design NH NH N N O NH NH O DOCK Design n Tanimoto = 0.6826 (Target) Tanimoto = 0.6997 PDB ID: 1VYZ PDB ID: 1VYW DrugBank ID: DB02647 DrugBank ID: DB06944 4 N c 1.0 O N 0.8 O 3 Increasing Increasing NH NH Drug-likeness 0.6 Synthetic feasibility 2 N N N Density Density N N 0.4 N N N 1 O NH 0.2 0 Tanimoto = 0.6418 0.0 Tanimoto = 0.6835 Tanimoto = 0.6533 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 1 2 3 4 5 PDB ID: 2R3N PDB ID: 2BHH PDB ID: 2R3P QED SAS DrugBank ID: DB08539 DrugBank ID: DB03583 DrugBank ID: DB07065 Fig. 1. Illustration of (a) the “sample-and-dock” concept that can be used to implement (b) a structure-aware design pipeline. (c) Computed properties of compounds design to target the active site of CDK2. (d) Representative compounds in our sample-and-dock library that closely resemble known inhibitors of CDK2. Highlighted are sub-substructures that are identical to those found in the most similar compound in our sample-and-dock virtual library. unique physiochemical properties of the “active site” We chose CDK2 because it is an important thera- of a target. We reasoned that we could implement peutic target that has been extensively studied bio- such a pipeline by combining generative molecular chemically and structurally, and for which a large machine learning models with molecular computer library of known inhibitors is readily accessible.(25) docking algorithms. For instance, one could start by To achieve this, we first interfaced the junction-tree embedding a reference compound, e.g., benzene, into variational autoencoder (JTVAE)(26)withthedock- the latent space of a generative autoencoder model ing program, rDock.(27) JTVAE was chosen because and sample latent space points in its “neighborhood.” it has demonstrated a superior ability to generate syn- The sampled points can then be decoded into their thetically feasible molecules, and rDock was chosen corresponding 3-dimensional (3D) representations because it is fast, and the software is open-sourced. and then docked onto the target (Figure 1a). Of We then applied our JTVAE-rDock sample-and-dock these sampled compounds, the one that is predicted pipeline (Figure 1a and b) to CDK2. For this CDK2 to interact most favorably with the active site can be test case, we initiated sample-the-dock pipeline by identified and then used as the new reference. The first mapping the CDK2 active site using the crys- process can be repeated (Figure 1b), and in so do- tal structure of CDK2 complexed with Roniciclib ing, the latent-space of a generative molecular model (PDB: 5IEV).(28) The JTVAE-rDock sample-and- (and, so chemical space) explored in a semi-directed dock pipeline was then initialized using benzene as manner, conditioned on the properties of the active a starting scaffold. We chose to use benzene as the site of the target. The compounds sampled during starting scaffold because it is small and a chemi- this iterative (“sample-and-dock”) process could then cal motif that is ubiquitous in known drugs. Each be used to construct a target-specific virtual library. sample-and-dock cycle consisted of sampling and First, we implemented a sample-and-dock pipeline docking 20, unique compounds. For each compound, and then used it to construct a target-specific library 100 poses were generated during docking. The com- for the cyclin-dependent kinase 2 (CDK2) protein. pound with the highest predicted binding affinity 2 | Xu et al. bioRxiv preprint doi: https://doi.org/10.1101/2020.06.09.143289; this version posted June 11, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. a P-III P-III P-III P-II P-II P-II P-IV P-IV P-IV P-V P-V P-V P-I P-I P-I SD-2020-1 SD-2020-2 SD-2020-3 Docking Score: -30.66 kcal/mol Docking Score: -27.10 kcal/mol Docking Score: -28.36 kcal/mol Drug-likesness: 0.51 Drug-likesness: 0.59 Drug-likesness: 0.75 b 2.81 Å His41 2.70 Å 164 3.15 Å His 2.89 Å His164 2.70 Å 2.73 Å 3.08 Å 2.55 Å 2.69 Å 3.14 Å 2.82 Å Cys145 2.68 Å Cys145 3.11 Å 3.18 Å Ser144 Leu141 Leu141 Leu141 Ser144 Ser144 c NH 2 N O NH HN N S H O NH N H O Tanimoto: 0.65 Tanimoto: 0.64 NH Tanimoto: 0.61 HO Naratriptan Etryptamine Panobinostat DrugBank ID: DB00952 DrugBank ID: DB01546 DrugBank ID: DB06603 Cl O O O NH N 2 O N Cl H N N N N H H H N NH 2 H Tanimoto: 0.52 NH 2 Lidamidine Tanimoto: 0.56 Tanimoto: 0.56 Tanimoto: 0.54 Procainamide Parsalmide Sertraline DrugBank ID: DB01035 DrugBank ID: DB01104 Fig.
Recommended publications
  • Visualization of Chemical Space Using Principal Component Analysis
    World Applied Sciences Journal 29 (Data Mining and Soft Computing Techniques): 53-59, 2014 ISSN 1818-4952 © IDOSI Publications, 2014 DOI: 10.5829/idosi.wasj.2014.29.dmsct.10 Visualization of Chemical Space Using Principal Component Analysis B. Firdaus Begam and J. Satheesh Kumar Department of Computer Applications, School of Computer Science and Engineering, Bharathiar University, Coimbatore, Tamilnadu, India Abstract: Principal component analysis is one of the most widely used multivariate methods to visualize chemical space in new dimension by the chemist for analysing data. In Multivariate data analysis, the relationship between two variables with more number of characteristics can be considered. PCA provides a compact view of variation in chemical data matrix which helps in creating better Quantitative Structure Activity Relationship (QSAR) model. It highlights the dominating pattern in the matrix through principal component and graphical representation. This paper focuses on mathematical aspects of principal components and role of PCA on Maybridge dataset to identify dominating hidden patterns of drug likeness based on Lipinski RO5. Key words: Principal Component Analysis Load p lot Score plot Biplot INTRODUCTION a new model of selected objects and variables with minimal loss of information. It can also be used for Principal Component Analysis (PCA) is a multivariate prediction model as PCA acts as exploratory tool for data statistical approach to analyze data in lower dimensional analysis by calculating the variance among the variables space. PCA used vector space transformation technique which are uncorrelated [8]. to view datasets from higher-dimensional space to lower Chemical space (drug space) is defined as dimensional space. PCA is an application of linear algebra number of descriptors calculated for each molecule [2] which was first coined by Karl Pearson during 1901 [1].
    [Show full text]
  • Exploring Chemical Space
    Frontier of Chemistry Exploring Chemical Space Shuli Mao 04/12/2008 Shuli Mao @ Wipf Group Page 1 of 40 4/15/2008 What is chemical space? Chemical space is the space spanned by all possible (i.e. energetically stable) stoichiometrical combinations of electrons and atomic nuclei and topologies (isomers) in molecules and compounds in general. -----wilkipedia Chemical space is defined as the total descriptor space that encompasses all the small carbon-based molecules that could in principle be created. (Descriptors: molecular mass, liphophilicity and topological features, etc.) -----Dobson, C. M. Chemical space and biology Nature 2004, 432, 824. Shuli Mao @ Wipf Group Page 2 of 40 4/15/2008 Why we need to explore chemical space? The estimated number of small carbon-based compounds is on the order of 1060. However, only a minute fraction of this chemical space has been explored. Given the vastness of chemical space, we need to explore chemical space efficiently to find biologically relevant chemical space (space in which biologically active compounds reside). Dobson, C. M. Chemical space and biology Nature 2004, 432, 824. • To understand biological systems especially in unknown area • To develop potential drugs From 1994 to 2001, only 22 drugs that modulate new targets were approved. So far, less than 500 human proteins have been targeted by current drugs, which is a small portion of human proteins. Lipinski, C; Hopkins, A. Navigating chemical space for biology and medicine Nature 2004, 432, 855. Shuli Mao @ Wipf Group Page 3 of 40 4/15/2008 How to explore chemical space? Three approaches: (A) Target-Oriented Synthesis (B) Focused Synthesis (Medicinal Chemistry/ Combinatorial Chemistry) (C) Diversity-Oriented Synthesis Burke, M.
    [Show full text]
  • A Thermodynamic Atlas of Carbon Redox Chemical Space
    A thermodynamic atlas of carbon redox chemical space Adrian Jinicha,b,1, Benjamin Sanchez-Lengelinga, Haniu Rena, Joshua E. Goldfordc, Elad Noord, Jacob N. Sanderse, Daniel Segrèc,f,g,h, and Alán Aspuru-Guziki,j,k,l,1 aDepartment of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138; bDivision of Infectious Diseases, Weill Department of Medicine, Weill-Cornell Medical College, New York, NY 10021; cBioinformatics Program and Biological Design Center, Boston University, Boston, MA 02215; dInstitute of Molecular Systems Biology, ETH Zürich, 8093 Zürich, Switzerland; eDepartment of Chemistry and Biochemistry, University of California, Los Angeles, CA 90095; fDepartment of Biology, Boston University, Boston, MA 02215; gDepartment of Biomedical Engineering, Boston University, Boston, MA 02215; hDepartment of Physics, Boston University, Boston, MA 02215; iChemical Physics Theory Group, Department of Chemistry, University of Toronto, Toronto, ON, M5S 3H6 Canada; jDepartment of Computer Science, University of Toronto, Toronto, ON, M5T 3A1 Canada; kCIFAR AI Chair, Vector Institute, Toronto, ON, M5S 1M1 Canada; and lLebovic Fellow, Canadian Institute for Advanced Research, Toronto, ON M5S 1M1 Canada Edited by Pablo G. Debenedetti, Princeton University, Princeton, NJ, and approved November 17, 2020 (received for review April 9, 2020) Redox biochemistry plays a key role in the transduction of central metabolic redox substrates (3). In addition, since its chemical energy in living systems. However, the compounds standard potential is ∼100 mV lower than that of the typical observed in metabolic redox reactions are a minuscule fraction aldehyde/ketone functional group, it effectively decreases the of chemical space. It is not clear whether compounds that ended steady-state concentration of potentially damaging aldehydes/ up being selected as metabolites display specific properties that ketones in the cell (3).
    [Show full text]
  • Cheminformatic Analysis of Natural Products and Their Chemical Space
    NATURAL PRODUCTS IN DRUG DISCOVERY355 doi:10.2533/chimia.2007.355 CHIMIA 2007, 61,No. 6 Chimia 61 (2007) 355–360 ©Schweizerische Chemische Gesellschaft ISSN 0009–4293 Cheminformatic Analysis of Natural Products and their Chemical Space Stefan Wetzela ,Ansgar Schuffenhauerb ,Silvio Roggob ,Peter Ertlb ,and Herbert Waldmann*a Abstract: Cheminformatic methods allow the detailed characterization of particular and characteristic properties of natural products (NPs) and comparison with related characteristics of drugs and other compounds. An overview of the most important properties of natural products and analogues and their difference with respect to drugs and synthetic compounds is presented. Moreover,different approaches to charting the chemical space populated by natural products arereviewed and their underlying principles aredelineated. Some insights about NP chemical space aredescribed together with possible applications of methods charting chemical space. Strengths and weak- nesses of the different approaches will be discussed with respect to possible applications in compound collection design. Keywords: Chemical space ·ChemGPS ·Cheminformatics ·Natural products ·SCONP 1. Introduction al constitute biologically validated starting may be applied in the synthesis of com- points for library design and manyoftheir pounds with NP-likeproperties. There are Natural products (NPs) have been selected core structures have been recognized to be several publications describing the results of during evolution to bind to various proteins privileged structures.[4,5] Known examples statistical analyses of NP properties. In 1999 during their life-cycle, e.g. in biosynthesis for privileged structures in NPs that have Henkel et al. published afirst comparison and degradation and while exerting their successfully been applied in library design of molecular properties of natural products, mode of action.
    [Show full text]
  • Data-Driven Astrochemistry: One Step Further Within the Origin of Life Puzzle
    life Review Data-Driven Astrochemistry: One Step Further within the Origin of Life Puzzle Alexander Ruf 1,2,3,* ID , Louis L. S. d’Hendecourt 3 and Philippe Schmitt-Kopplin 1,2,* 1 Analytical BioGeoChemistry, Helmholtz Zentrum München, Ingolstaedter Landstr. 1, 85764 Neuherberg, Germany 2 Analytical Food Chemistry, Technische Universität München, Maximus-von-Imhof-Forum 2, 85354 Freising, Germany 3 Université Aix-Marseille, Laboratoire de Physique des Interactions Ioniques et Moléculaires (PIIM), UMR CNRS 7345, 13397 Marseille, France; [email protected] * Correspondence: [email protected] (A.R.); [email protected] (P.S.-K.); Tel.: +49-89-3187-3246 (P.S.-K.) Received: 1 April 2018; Accepted: 22 May 2018; Published: 1 June 2018 Abstract: Astrochemistry, meteoritics and chemical analytics represent a manifold scientific field, including various disciplines. In this review, clarifications on astrochemistry, comet chemistry, laboratory astrophysics and meteoritic research with respect to organic and metalorganic chemistry will be given. The seemingly large number of observed astrochemical molecules necessarily requires explanations on molecular complexity and chemical evolution, which will be discussed. Special emphasis should be placed on data-driven analytical methods including ultrahigh-resolving instruments and their interplay with quantum chemical computations. These methods enable remarkable insights into the complex chemical spaces that exist in meteorites and maximize the level of information on the huge astrochemical molecular diversity. In addition, they allow one to study even yet undescribed chemistry as the one involving organomagnesium compounds in meteorites. Both targeted and non-targeted analytical strategies will be explained and may touch upon epistemological problems. In addition, implications of (metal)organic matter toward prebiotic chemistry leading to the emergence of life will be discussed.
    [Show full text]
  • Mapping the Molecular Tree of Life Using Assembly Spaces
    Mapping the Molecular Tree of Life using Assembly Spaces Yu Liu, Cole Mathis, Stuart Marshall, Leroy Cronin* School of Chemistry, the University of Glasgow, University Avenue, Glasgow G12 8QQ, UK. Email: [email protected] The mapping of chemical space by the enumeration of graphs generates an infinite number of molecules, yet the experimental exploration of known chemical space shows that it appears to become sparser as the molecular weight of the compounds increases. What is needed is a way to explore chemical space that exploits the information encoded in known molecules to give access to unknown chemical space by building on the common conserved structures found in related families of molecules. Molecular assembly theory provides an approach to explore and compare the intrinsic complexity of molecules by the minimum number of steps needed to build up the target graphs, and here we show this can be applied to networks of molecules to explore the assembly properties of common motifs, rather than just focusing on molecules in isolation. This means molecular assembly theory can be used to define a tree of assembly spaces, allowing us to explore the accessible molecules connected to the tree, rather than the entire space of possible molecules. This approach provides a way to map the relationship between the molecules and their common fragments and thus measures the distribution of structural information collectively embedded in the molecules. We apply this approach to prebiotic chemistry, specifically the construction of RNA, and a family of opiates and plasticizers, as well as to gene sequences. This analysis allows us to quantify the amount of external information needed to assemble the tree and identify and predict new components in this family of molecules, based on the contingent information in the assembly spaces.
    [Show full text]
  • Alexander Ruf Dissertation
    TECHNISCHE UNIVERSITÄT MÜNCHEN Previously unknown organomagnesium compounds in astrochemical context Alexander Ruf Dissertation TECHNISCHE UNIVERSITÄT MÜNCHEN Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt Lehrstuhl für Analytische Lebensmittelchemie Previously unknown organomagnesium compounds in astrochemical context Alexander Ruf Vollständiger Abdruck der von der Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Prof. Dr. Erwin Grill Prüfer der Dissertation: 1. apl. Prof. Dr. Philippe Schmitt-Kopplin 2. Prof. Dr. Michael Rychlik 3. Prof. Eric Quirico, PhD (Université Grenoble Alpes) Die Dissertation wurde am 06.12.2017 bei der Technischen Universität München ein- gereicht und durch die Fakultät Wissenschaftszentrum Weihenstephan für Ernährung, Landnutzung und Umwelt am 18.01.2018 angenommen. Do we feel less open-minded, the more open-minded we are? A tribute to sensitivity and resolution... Acknowledgments This work has been prepared at the Helmholtz Zentrum München in the research unit Analytical BioGeoChemistry of apl. Prof. Dr. Philippe Schmitt-Kopplin, in collaboration with the Chair of Analytical Food Chemistry at the Technical Uni- versity of Munich. In the course of these years, I have relied on the courtesy and support of many to which I am grateful. The success of this PhD thesis would not have been possible without help and support of many wonderful people. First of all, I would like to thank the whole research group Analytical BioGeo- Chemistry for a very friendly, informal, and emancipated working atmosphere that formed day-by-day an enjoyable period of residence - it has felt like freedom! Small issues like having stimulating lunch discussions or going out into a bar, friendly peo- ple could be found herein to setting up a balance to scientific work.
    [Show full text]
  • The Next Level in Chemical Space Navigation: Going Far Beyond Enumerable Compound Libraries
    REVIEWS Drug Discovery Today Volume 24, Number 5 May 2019 Reviews The next level in chemical space INFORMATICS navigation: going far beyond enumerable compound libraries 1 2 Torsten Hoffmann and Marcus Gastreich 1 Taros Chemicals GmbH & Co. KG, Emil-Figge-Str. 76a, 44227 Dortmund, Germany 2 BioSolveIT GmbH, An der Ziegelei 79, 53757 Sankt Augustin, Germany Recent innovations have brought pharmacophore-driven methods for navigating virtual chemical spaces, the size of which can reach into the billions of molecules, to the fingertips of every chemist. There has been a paradigm shift in the underlying computational chemistry that drives chemical space search applications, incorporating intelligent reaction knowledge into their core so that they can readily deliver commercially available molecules as nearest neighbor hits from within giant virtual spaces. These vast resources enable medicinal chemists to execute rapid scaffold-hopping experiments, rapid hit expansion, and structure–activity relationship (SAR) exploitation in largely intellectual property (IP)-free territory and at unparalleled low cost. Introduction Lead optimization projects in drug discovery often reach a dead For a long time, computational chemists have attempted to end and result in failure. These projects not only try to exploit propose new ideas for lead molecules by tapping into novel IP hitherto undiscovered modes of action, but may also examine the spaces using techniques such as hit expansion, SAR exploitation, possibility of larger molecules as acting agents, such as peptides, and scaffold hopping, yet the past 25 years have shown us that macrocycles, biologics, and their conjugates. However, many re- current computational algorithms and methods simply might search programs simply suffer from a lack of fast access to novel not be capable of identifying highly innovative nearest neighbor chemical classes that may display highly potent and selective phar- molecules with much success.
    [Show full text]
  • Feature Optimization in High Dimensional Chemical Space: Statistical and Data Mining Solutions Jinuraj K
    K. R. et al. BMC Res Notes (2018) 11:463 https://doi.org/10.1186/s13104-018-3535-y BMC Research Notes RESEARCH NOTE Open Access Feature optimization in high dimensional chemical space: statistical and data mining solutions Jinuraj K. R.1, Rakhila M.1, Dhanalakshmi M.1, Sajeev R.2, Akshata Gad3, Jayan K.2, Muhammed Iqbal P.4, Andrew Titus Manuel3 and Abdul Jaleel U. C.5* Abstract Objectives: The primary goal of this experiment is to prioritize molecular descriptors that control the activity of active molecules that could reduce the dimensionality produced during the virtual screening process. It also aims to: (1) develop a methodology for sampling large datasets and the statistical verifcation of the sampling process, (2) apply screening flter to detect molecules with polypharmacological or promiscuous activity. Results: Sampling from large a dataset and its verifcation were done by applying Z-test. Molecular descriptors were prioritized using principal component analysis (PCA) by eliminating the least infuencing ones. The original dimen- sions were reduced to one-twelfth by the application of PCA. There was a signifcant improvement in statistical parameter values of virtual screening model which in turn resulted in better screening results. Further improvement of screened results was done by applying Eli Lilly MedChem rules flter that removed molecules with polypharma- cological or promiscuous activity. It was also shown that similarities in the activity of compounds were due to the molecular descriptors which were not apparent in prima facie structural studies. Keywords: Virtual screening, Z-test, PubChem bioassay, Molecular descriptors, Principal component analysis, Self- organizing maps, Eli Lilly MedChem rules, Molecular similarity Introduction Molecular descriptors were prioritized by applying Structure and ligand-based virtual screening methods are principal component analysis (PCA) from the available widely employed in drug discovery process [1–4].
    [Show full text]
  • A Close-Up Look at the Chemical Space of Commercially Available Building Blocks for Medicinal Chemistry
    Preprint___________________________ A close-up look at the chemical space of commercially available building blocks for medicinal chemistry Yuliana Zabolotna1, Dmitriy M.Volochnyuk3,6, Sergey V.Ryabukhin4,6, Dragos Horvath1, Kostiantyn Gavrylenko5,6, Gilles Marcou1, Yurii S.Moroz5,7, Oleksandr Oksiuta3,7, Alexandre Varnek2,1* ____________________________________________________________________________________________ Abstract: The ability to efficiently synthesize desired compounds can be a limiting factor for chemical space exploration in drug discovery. This ability is conditioned not only by the existence of well-studied synthetic protocols but also by the availability of corresponding reagents, so-called building blocks (BB). In this work, we present a detailed analysis of the chemical space of 400K purchasable BB. The chemical space was defined by corresponding synthons – fragments contributed to the final molecules upon reaction. They allow an analysis of BB physicochemical properties and diversity, unbiased by the leaving and protective groups in actual reagents. The main classes of BB were analyzed in terms of their availability, rule-of-two-defined quality, and diversity. Available BBs were eventually compared to a reference set of biologically relevant synthons derived from ChEMBL fragmentation, in order to illustrate how well they cover the actual medicinal chemistry needs. This was performed on a newly constructed universal generative topographic map of synthon chemical space, allowing to visualize both libraries and analyze their overlapping and library-specific regions. Keywords: building blocks, synthons, library design, chemical space analysis, GTM ____________________________________________________________________________________________ INTRODUCTION [1] University of Strasbourg, Laboratoire de Chemoinformatique , The success of drug discovery strongly depends on the 4, rue B. Pascal, Strasbourg 67081 (France) *e-mail: [email protected] quality of the screening compounds.
    [Show full text]
  • Quest for the Rings. in Silico Exploration of Ring Universe to Identify Novel Bioactive Heteroaromatic Scaffolds
    4568 J. Med. Chem. 2006, 49, 4568-4573 Quest for the Rings. In Silico Exploration of Ring Universe To Identify Novel Bioactive Heteroaromatic Scaffolds Peter Ertl,* Stephen Jelfs, Jo¨rg Mu¨hlbacher, Ansgar Schuffenhauer, and Paul Selzer NoVartis Institutes for BioMedical Research, CH-4002 Basel, Switzerland ReceiVed February 24, 2006 Bioactive molecules only contain a relatively limited number of unique ring types. To identify those ring properties and structural characteristics that are necessary for biological activity, a large virtual library of nearly 600 000 heteroaromatic scaffolds was created and characterized by calculated properties, including structural features, bioavailability descriptors, and quantum chemical parameters. A self-organizing neural network was used to cluster these scaffolds and to identify properties that best characterize bioactive ring systems. The analysis shows that bioactivity is very sparsely distributed within the scaffold property and structural space, forming only several relatively small, well-defined “bioactivity islands”. Various possible applications of a large database of rings with calculated properties and bioactivity scores in the drug design and discovery process are discussed, including virtual screening, support for the design of combinatorial libraries, bioisosteric design, and scaffold hopping. Introduction just the 32 most frequently occurring scaffolds. Several authors tried to classify rings according to their characteristics. Gibson Ring systems in molecules form a cornerstone of organic et al.5 characterized a set of 100 aromatic rings by calculated chemistry and therefore also medicinal chemistry and the drug properties that were selected because of their potential involve- discovery effort as a whole. Rings give molecules their basic - shape, determine whether molecules are rigid or flexible, and ment in the molecular recognition of drug receptor binding keep substituents in their proper positions.
    [Show full text]
  • Exploring Evolutionary and Chemical Space Using Chemoinformatic Tools and Traditional Methods in Pharmacognosy
    Digital Comprehensive Summaries of Uppsala Dissertations Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 282 from the Faculty of Pharmacy 282 Exploring evolutionary Exploring evolutionary and chemical space using and chemical space using chemoinformatic tools chemoinformatic tools and traditional methods in and traditional methods in pharmacognosy pharmacognosy ASTRID HENZ RYEN ASTRID HENZ RYEN ACTA ACTA UNIVERSITATIS UNIVERSITATIS UPSALIENSIS ISSN 1651-6192 UPSALIENSIS ISSN 1651-6192 UPPSALA ISBN 978-91-513-0843-2 UPPSALA ISBN 978-91-513-0843-2 2020 urn:nbn:se:uu:diva-399068 2020 urn:nbn:se:uu:diva-399068 Dissertation presented at Uppsala University to be publicly examined in C4:305, BMC, Dissertation presented at Uppsala University to be publicly examined in C4:305, BMC, Husargatan 3, Uppsala, Friday, 14 February 2020 at 09:15 for the degree of Doctor of Husargatan 3, Uppsala, Friday, 14 February 2020 at 09:15 for the degree of Doctor of Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty Philosophy (Faculty of Pharmacy). The examination will be conducted in English. Faculty examiner: Associate Professor Fernando B. Da Costa (School of Pharmaceutical Sciences of examiner: Associate Professor Fernando B. Da Costa (School of Pharmaceutical Sciences of Ribeiraõ Preto, University of Saõ Paulo). Ribeiraõ Preto, University of Saõ Paulo). Abstract Abstract Henz Ryen, A. 2020. Exploring evolutionary and chemical space using chemoinformatic Henz Ryen, A. 2020. Exploring evolutionary and chemical space using chemoinformatic tools and traditional methods in pharmacognosy. Digital Comprehensive Summaries of tools and traditional methods in pharmacognosy. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Pharmacy 282.
    [Show full text]