Fuchs Thesis

DISS. ETH NO. 25527

Development and Application of Bespoke Machine Learning Lipophilicity Models for Peptides

A thesis submitted to attain the degree of
DOCTOR OF SCIENCES of ETH ZURICH
(Dr. sc. ETH Zürich)

presented by
JENS-ALEXANDER FUCHS
Pharmacist (State Examination), University of Bonn
born on September 12, 1988
Citizen of Germany

accepted on the recommendation of
Prof. Dr. Gisbert Schneider, examiner
Prof. Dr. Stefanie Krämer, co-examiner

2018

© 2018 Jens A. Fuchs: Development and Application of Bespoke Machine Learning Lipophilicity Models for Peptides

This work is dedicated to my wife Lisa, my parents Gabi and Wolfgang, and my sister Jasmin. Your sympathy fosters my personality, my well-being, and my thoughts about life and science.

"Our species needs, and deserves, a citizenry with minds wide awake and a basic understanding of how the world works."
Carl Sagan

PUBLICATIONS

Parts of this thesis were published in:
• J. A. Fuchs, F. Grisoni, M. Kossenjans, J. A. Hiss, G. Schneider, "Lipophilicity prediction of peptides and peptide derivatives by consensus machine learning", Medicinal Chemistry Communications 2018, 9, 1538-1546.

The concepts of peptide quantification and generative modelling by artificial neural networks discussed in this thesis are published in:
• M. D. Allenspach*, J. A. Fuchs*, N. Doriot, J. A. Hiss, G. Schneider, C. Steuer, "Quantification of hydrolyzed peptides and proteins by amino acid fluorescence", Journal of Peptide Science 2018, e3113.
• A. Gupta, A. T. Müller, B. J. H. Huisman, J. A. Fuchs, P. Schneider, G. Schneider, "Generative recurrent networks for de novo drug design", Molecular Informatics 2018, 37, 1-2.

CONTENTS

PUBLICATIONS
LIST OF FIGURES AND TABLES
LIST OF ABBREVIATIONS
SUMMARY
ZUSAMMENFASSUNG
1 INTRODUCTION
  1.1 Lipophilicity: A Fundamental Concept for Pharmacokinetic and Pharmacodynamic Assessment in Drug Discovery
    Partition and Distribution Coefficients
    Experimental Approaches to Determine Partition and Distribution Coefficients
    In Silico Calculation of Partition and Distribution Coefficients
  1.2 Peptides in Drug Discovery
    Advantages and Drawbacks of Peptides
    Overcoming the Drawbacks of Peptides by Combining Biotechnology and Medicinal Chemistry
    Lipophilicity of Peptides and Peptide Mimetics
  1.3 Machine Learning for the Prediction of Pharmaceutically Relevant Properties
    Molecular Representation
    Unsupervised Algorithms
    Supervised Algorithms
    Model Evaluation
    Applicability Domain
  1.4 Evolutionary Algorithms
  1.5 Protein-Protein Interactions in Drug Discovery
    The Chemokine System
    CCR7 and CCL19/CCL21
2 AIMS OF THIS THESIS
3 MATERIALS AND METHODS
  3.1 Laboratory Methods
    Peptide Synthesis
    Peptide Analytics and Purification
    Shake-Flask Method
    Microscale Thermophoresis
  3.2 Computational Methods
    Software
    Molecular Representation and Descriptor Calculation
    Machine Learning
    Datasets
    De Novo Peptide Design
4 RESULTS AND DISCUSSION
  4.1 Baseline Models
    Introduction
    Feature Selection and Dimensionality Reduction
    Results for Modelling with Lasso Features vs. PCA Scores
    Predictions from Baseline Models for Peptides up to a Length of Six AA
    Discussion
  4.2 Expanded Models
    Introduction and Hypothesis
    Results for Modelling LIPOPEP vs. AZ
    Final Consensus Model Based on the Pooled Data
    Domain of Applicability
    Discussion
  4.3 Benchmarking the Final Consensus Model
    Introduction and Hypothesis
    Methods
    Results
    Discussion
  4.4 Focussed De Novo Generated Peptide Libraries for Studying Chemokine-Receptor/Ligand Interactions
    Introduction
    Fragmentation of CCR7_C24A
    De Novo Peptide Generation by Simulated Molecular Evolution and Ranking
    Binding Affinities of Selected Offspring
    Discussion
5 CONCLUSIONS AND OUTLOOK
6 ACKNOWLEDGEMENTS
BIBLIOGRAPHY
A SUPPLEMENTARY INFORMATION
  A.1 Supplementary Information to Chapter 1
  A.2 Supplementary Information to Chapter 4.1
  A.3 Supplementary Information to Chapter 4.2
  A.4 Supplementary Information to Chapter 4.3
  A.5 Supplementary Information to Chapter 4.4

LIST OF FIGURES

1.1 The Drug Discovery Pathway
1.2 LogD vs. pH Profile of Buprenorphine
1.3 Molecular Structures of Cyclosporin A, Desmopressin, Daptomycin and Carbetocin
1.4 Dataset Preparation and Machine Learning Workflow
1.5 1D-3D Molecular Representations
1.6 Principal Component Analysis
1.7 k-Means Clustering
1.8 Support Vector Machines
1.9 Ensemble Prediction by Cascaded Jury Networks
1.10 Cross-Validation and y-Randomisation
1.11 Covered Chemical Space and Applicability Domain
1.12 NMR Structures of CCL19 and CCL21
1.13 Sequence Alignment of the N-Termini of Homeostatic CC-Chemokine Receptors
1.14 Schematic Depiction of the CCR7/CCL19 Site 1 Interaction
4.1 Feature Selection by Lasso
4.2 Loading Plot of the Lasso-Selected Features
4.3 PCA Scree Plot
4.4 Heatmaps of SVR Hyperparametrisation
4.5 Baseline Model Predictions for the In-House Peptides
4.6 Y-Randomisation
4.7 Performances LIPOPEP vs. AZ
4.8 Differences between LIPOPEP and AZ
4.9 Consensus Results
4.10 Williams Plots
4.11 Retraining ACDlabs and Chemaxon; Flagging Molecules by ADMET Predictor
4.12 Benchmarking: Model Performances in Relation to Ionisability, Molecular Weight and Lipophilicity
4.13 Overview of the CCR7 Project
4.14 MST Curves of CCR7_C24A, CCR7_10.1 and CCR7_6.4
4.15 Fragmentation of CCR7_C24A
4.16 Properties of Virtual Peptide Libraries

LIST OF TABLES

1.1 Selected Experimental Methods for Direct and Indirect Lipophilicity Determination
1.2 Selected "Classic" Methods for logP Prediction
1.3 Selected QSPR Methods for Lipophilicity Prediction
1.4 LogD7.2 of Some Peptide Drugs
1.5 Prominent Kernel Functions
3.1 Summary of the Synthesised Peptides
3.2 SFM: Chromatographic Settings for Peptide Quantification
3.3 Summary of Datasets
4.1 Performances of Baseline Models on the LIPOPEP Set
4.2 Performances of Extended Models and the Consensus Model on the Pooled Data
4.3 Structures and logD7.4 of the Test Compounds Predicted with an Absolute Error > 2 log Units
4.4 Results of the Benchmark Analysis
4.5 Summary of the Synthesised and Tested Offspring

LIST OF ABBREVIATIONS

1D: one-dimensional
2D: two-dimensional
3D: three-dimensional
AA: amino acid
AAM: arithmetic average model
ACN: acetonitrile
ACP: anticancer peptide
AD: applicability domain
AE: absolute error
AMP: antimicrobial peptide
ANN(-E): artificial neural network (ensemble)
AZ: AstraZeneca
CCL19, CCL21: CC-chemokine ligand 19/21
CCR7: CC-chemokine receptor 7
CHI: chromatographic hydrophobicity index
CPC: centrifugal partition chromatography
CPP: cell-penetrating peptide
CR: chemokine receptor
CV: cross-validation
DCM: dichloromethane
DMF: dimethylformamide
EA: evolutionary algorithms
ECFP: extended-connectivity fingerprints
EV: external validation
FA: formic acid
FDA: Food and Drug Administration
Fmoc: 9-fluorenylmethoxycarbonyl
GAG: glycosaminoglycans
GP: Gaussian process
GPCR: G-protein-coupled receptor
HCTU: 2-(6-chloro-1H-benzotriazol-1-yl)-1,1,3,3-tetramethylaminium hexafluorophosphate
HPLC: high-performance liquid chromatography
HTS: high-throughput screening
IUPAC: International Union of Pure and Applied Chemistry
Kd: dissociation constant
Lasso: least absolute shrinkage and selection operator
LLE: lipophilic ligand efficiency
logDpH: logarithm of the distribution coefficient at a specified pH
logP: logarithm of the partition coefficient
MD: molecular dynamics
MHC-1: major histocompatibility complex 1
MOE: Molecular Operating Environment
ML: machine learning
MLR: multivariate linear regression
MS: mass spectrometry
MST: microscale thermophoresis
MW: molecular weight
NCE: new chemical entity
NMM: N-methylmorpholine
NMR: nuclear magnetic resonance
NN: nearest neighbour
OCHEM: Online Chemical Modelling Environment
PB: phosphate buffer
PBS: phosphate-buffered saline
PC: principal component
PCA: principal component analysis
PD: pharmacodynamics
PK: pharmacokinetics
pKa: negative decadic logarithm of the acid dissociation constant
PPI: protein-protein interaction; peptide-protein interaction
PSGL-1: P-selectin glycoprotein ligand-1
QSAR: quantitative structure-activity relationship
QSPR: quantitative structure-property relationship
RF: random forest
RMSE: root mean squared error
Ro5: rule of five
SFM: shake-flask method
SMILES: simplified molecular-input line-entry system
S/N: signal-to-noise ratio
SPPS: solid-phase peptide synthesis
std: standard deviation
SVM: support vector machine
SVR: support vector regression
TFA: trifluoroacetic acid
TRH: thyrotropin-releasing hormone
TIS: triisopropylsilane
UHPLC: ultra-high-performance liquid chromatography
USP: United States Pharmacopeia
vdW: van der Waals

SUMMARY

Lipophilicity is a key physicochemical property in drug design and discovery. In early-stage scenarios, lipophilicity is used to rationalise the selection of molecules from a large pool of compounds, steering development towards preferred regions of chemical space. The direct link between lipophilicity and the