Computational Analyses of Small Molecules Activity from Phenotypic Screens

Computational analyses of small molecules activity from phenotypic screens Azedine Zoufir Hughes Hall This dissertation is submitted for the degree of Doctor of Philosophy July 2018 Declaration This thesis is submitted as the result of my own work and includes nothing which is the outcome of work done in collaboration except where specifically indicated in the text. It is not substantially the same as any that I have submitted, or, is being concurrently submitted for a degree or diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the preface and specified in the text. I further state that no substantial part of my dissertation has already been submitted, or, is being concurrently submitted for any such degree, diploma or other qualification at the University of Cambridge or any other University or similar institution except as declared in the Preface and specified in the text. This dissertation does not exceed the word limit of 60,000 words. Azedine Zoufir July 2018 Summary Title: Computational analyses of small molecules activity from phenotypic screens Author: Azedine Zoufir Drug discovery is no longer relying on the one gene-one disease paradigm nor on target-based screening alone to discover new drugs. Phenotypic-based screening is regaining momentum to discover new compounds since those assays provide an environment closer to the physiological state of the disease and allow to better anticipate off-target effects and other factors that can limit the efficacy of the drugs. However, uncovering the mechanism of action of the compounds active in those assays relies on in vitro techniques that are expensive and time- consuming. In silico approaches are therefore beneficial to prioritise mechanism of action hypotheses to be tested in such systems. In this thesis, the use of machine learning algorithms for in silico ligand-target prediction for target deconvolution in phenotypic screening datasets was investigated. A computational workflow is presented in Chapter 2, that allows to improve the coverage of mechanism of action hypotheses obtained by combining two conceptually different target prediction algorithms. These models rely on the principle that two structurally similar compounds are likely to have the same target. In Chapter 3 of this thesis, it was shown that structural similarity and the similarity in phenotypic activity are correlated, and the fraction of phenotypically similar compounds that can be expected for an increase in structural similarity was subsequently quantified. Morgan fingerprints were also found to be less sensitive to the dataset employed in these analyses than two other commonly used molecular descriptors. In Chapter 4, the mechanism of action hypotheses obtained through target prediction was compared to those obtained by extracting experimental bioactivity data of compounds active in phenotypic assays. It was then showed that the mechanism of action hypotheses generated from these two types of approach agreed where a large number of compounds were active in the phenotypic assay. When there were fewer compounds active in the phenotypic assay, target prediction complemented the use of experimental bioactivity data and allowed to uncover alternative mechanisms of action for compounds active in these assays. Finally, the in silico target prediction workflow described in Chapter 2 was applied in Chapter 5 to deconvolute the activity of compounds in a kidney cyst growth reduction assay, aimed at discovering novel therapeutic opportunities for polycystic kidney disease. A metric was developed to rank predicted targets according to the activity of the compounds driving their prediction. Gene expression data and occurrences in the literature were combined with the target predictions to further narrow down the most probable mechanisms of action of cyst growth reducing compounds in the screen. Two target predictions were proposed as a potential mechanism for the reduction of kidney cyst growth, one of which agreed with docking studies. Acknowledgements I would like to thank my supervisor Dr Andreas Bender for allowing me to be a part of his lab. I thank him for his continuous guidance and patience through all the revisions of my work. I also thank my collaborators from the University of Leiden, Dr Tijmen Booij and Dr Leo Price for providing the kidney cyst screening data, and Dr Dorien Peters and Tareq Malas for providing their gene expression dataset. I am grateful to Dr Xitong Li and Dr Ellen Berg for providing the BioMAP dataset. I thank the European Research Council for funding my research. Next, I thank Dr Fredrik Svensson, Dr Krishna Bulusu, Dr Avid Afzal and Dr Deszo Modos, for providing excellent scientific advice and very constructive feedback on my work. I am grateful to the whole of the Bender group for being supportive colleagues and always friendly. Rich and Lewis are thanked for their technical help in using our computing server. Also, my time in this group would not have been the same without Sefer, Siti, Fatima, Nitin, Avid, Leen, Nadia, Fredrik, Deszo and Krishna, who all have been very supportive and helpful with me, and I am very grateful to all of them. I really enjoyed working among such a diverse and talented group of people. I also thank Susan Begg without whom the lab would not be running so smoothly. Last but not least, I thank my family and particularly my parents for their encouragements throughout my studies. My deepest gratitude goes to my friends Ain, Charles, Ben and Cristian for being there in those times when friends are needed, and for keeping me away of my thesis when I would become too preoccupied about it. Table of Contents TABLE OF CONTENTS ........................................................................................................ I LIST OF FIGURES .............................................................................................................. VI LIST OF TABLES ............................................................................................................. VIII ABBREVIATIONS ................................................................................................................. X CHAPTER 1 INTRODUCTION ........................................................................................... 1 1.1 FROM TARGET-BASED TO PHENOTYPIC-BASED DRUG DISCOVERY ........ 2 1.1.1 TARGET-BASED SCREENING AND LIMITATIONS ...................................................................... 2 1.1.2 PHENOTYPIC-BASED SCREENING COMPENSATES FOR THE LIMITATIONS OF TARGET- BASED SCREENING ........................................................................................................................................ 3 1.1.3 ASSAYS USED IN PHENOTYPIC-BASED SCREENING ................................................................. 4 1.1.4 IN VITRO DECONVOLUTION IN PHENOTYPIC SCREENS AND LIMITATIONS .......................... 5 1.2 MOLECULAR AND BIOLOGICAL SIMILARITY .................................................. 7 1.2.1 REPRESENTATION OF CHEMICALS .............................................................................................. 7 1.2.2 MOLECULAR SIMILARITY PRINCIPLE IN VIRTUAL SCREENING AND NEIGHBOURHOOD PROPERTY ..................................................................................................................................................... 10 1.3 IN SILICO DECONVOLUTION METHODS OF COMPOUND ACTIVITY IN PHENOTYPIC SCREENS ................................................................................................... 14 1.3.1 DATA-DRIVEN DECONVOLUTION .............................................................................................. 14 1.3.2 DECONVOLUTION METHODS BASED ON IN SILICO LIGAND-TARGET PREDICTIONS ........ 16 1.3.2.1 Bioactivity datasets and limitations relevant to target prediction ................... 16 i 1.3.2.2 Current target prediction methods ................................................................... 19 1.3.2.3 Applications to deconvolution of compounds active in phenotypic screens .... 26 1.4 CONCLUSIONS AND AIMS OF THE THESIS ........................................................ 28 CHAPTER 2 COMPUTATIONAL METHODS ................................................................ 30 2.1 WORKFLOW OVERVIEW ........................................................................................ 30 2.2 MOLECULAR FINGERPRINTS ................................................................................ 32 2.2.1 ECFP4 FINGERPRINTS ................................................................................................................. 32 2.2.2 MACCS KEYS AND PUBCHEM FINGERPRINTS ...................................................................... 35 2.3 SIMILARITY SCORING ............................................................................................. 35 2.3.1 STRUCTURAL SIMILARITY SCORING ......................................................................................... 35 2.3.2 BIOLOGICAL SIMILARITY SCORING .......................................................................................... 36 2.4 LIGAND-TARGET PREDICTION MODELS .......................................................... 37 2.4.1 CHEMBL TARGET PREDICTION MODEL .................................................................................. 38 2.4.1.1 Laplacian-corrected multinomial Naïve Bayes machine learning model ........ 38 2.4.1.2 Multinomial Naïve Bayes target prediction

Computational Analyses of Small Molecules Activity from Phenotypic Screens

Large-Scale Analysis of Genome and Transcriptome Alterations in Multiple Tumors Unveils Novel Cancer-Relevant Splicing Networks

PARSANA-DISSERTATION-2020.Pdf

Remodeling Adipose Tissue Through in Silico Modulation of Fat Storage For

RBM11 Sirna (H): Sc-91387

Recombinant Rabbit Anti-Human RAB8A Monoclonal Antibody, Clone NKG-S33-80-4 (CABT- Z616R) This Product Is for Research Use Only and Is Not Intended for Diagnostic Use

Mutation-Specific and Common Phosphotyrosine Signatures of KRAS G12D and G13D Alleles Anticipated Graduation August 1St, 2018

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Coordinate Regulation of Long Non-Coding Rnas and Protein-Coding Genes in Germ- Free Mice Joseph Dempsey, Angela Zhang and Julia Yue Cui*

1 Supporting Information for a Microrna Network Regulates

Anti-Inflammatory Role of Curcumin in LPS Treated A549 Cells at Global Proteome Level and on Mycobacterial Infection

Gene Symbol Category ACAN ECM ADAM10 ECM Remodeling-Related ADAM11 ECM Remodeling-Related ADAM12 ECM Remodeling-Related ADAM15 E

Noelia Díaz Blanco