Cranfield University Glaxosmithkline JESSICA BROTHWOOD

Cranfield University GlaxoSmithKline JESSICA BROTHWOOD DRUGGABLE AND BIOPHARMABLE GENOME ANNOTATION PIPELINE DEVELOPMENT Cranfield Health Applied Bioinformatics MSc Thesis Academic year: 2011-12 Supervisors: Dr Michael Cauchi (Cranfield) Dr Hannah Tipney (GSK) September 2012 Cranfield University GlaxoSmithKline Research & Development Ltd Cranfield Health Applied Bioinformatics MSc Thesis Academic Year 2011-12 Jessica Brothwood Druggable and biopharmable genome annotation pipeline development Supervisors: Dr Michael Cauchi (Cranfield University) Dr Hannah Tipney (GlaxoSmithKline) September 2012 This thesis is submitted in partial fulfilment of the requirements for the degree of Master of Science. © Cranfield University 2012. All rights reserved. No part of this publication may be reproduced without the written permission of the copyright holder. Abstract The identification of proteins which could be potential targets for new pharmaceutical products is invaluable for the continued improvement people’s quality of life and expansion of available treatment options. In order to aid the discovery of new drug targets, predictions of every human gene likely to be exploitable by compounds and biotechnology were generated using open source tools and publicly available data. An automated pipeline was produced in order to minimise the effort required to reproduce, update and expand this work. In total, using various different prediction techniques, over 15,000 genes were predicted to code potential targets. An optimistic estimate of the druggable genome at 5,097 genes was produced. These genes contain one or more of the same Pfam protein domains as a drug target (a protein displaying significant activity with a phase four drug from ChEMBL database). The preliminary techniques explored here estimate the biopharmable genome to encompass between 3,169 and 8,117 genes. However, as they failed to identify many of the known approved targets, it is likely this is not an accurate representation. An easy to run, updated and expandable prediction pipeline, which annotates genes with a predicted druggable target class as well as a ChEMBL target class, if available, has been produced in Perl and implemented within GlaxoSmithKline. i Acknowledgements This dissertation would not have been possible without advice, guidance and support of Dr Hannah Tipney, to whom I am extremely grateful. I would also like to thank Peter Woollard for his recommendations and help with Perl and Dr Michael Cauchi for his advice and support and for his assistance in producing the ROC curves. I am obliged to Kieran Todd for his invaluable work on the biopharmable genome. I would also like to thank Dr David Michalovich, for information on drug target classes, and Stefan Senger, for his explanations of compound activities, as well as everyone else in the Computational Biology department at GlaxoSmithKline for their warm welcome and help. The research leading to these results has received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement n° 115191, resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution (www.imi.europa.eu). ii Table of Contents Abstract .............................................................................................................................. i Acknowledgements .......................................................................................................... ii List of Figures .................................................................................................................. vi List of Tables ................................................................................................................... ix Abbreviations ................................................................................................................... x 1 Introduction ............................................................................................................... 1 1.1 Drug Discovery and Development .................................................................... 1 1.1.1 Target Identification and Validation........................................................... 1 1.1.2 Lead Compound Selection and Optimisation ............................................. 3 1.1.3 Preclinical Testing ...................................................................................... 3 1.1.4 Clinical Trials ............................................................................................. 3 1.2 Challenges Facing the Pharmaceutical Industry ................................................ 5 1.3 Drug Types ........................................................................................................ 6 1.3.1 Estimates of Drug-likeness ......................................................................... 7 1.3.2 Small Molecular Drug Targets ................................................................... 9 1.3.3 Biological Therapeutics ............................................................................ 10 1.4 The Druggable and Biopharmable Genome .................................................... 11 1.4.1 Genomic Sequencing ................................................................................ 11 1.4.2 Determining Target Druggability ............................................................. 13 1.4.3 Known Drug Targets ................................................................................ 15 1.4.4 The Druggable Genome............................................................................ 16 1.4.5 The Secretome and Biopharmable Genome ............................................. 18 1.5 Strategy to New Drug Discovery ..................................................................... 20 2 Aims and Objectives ............................................................................................... 22 3 Materials and Methods ............................................................................................ 23 3.1 Programming Languages ................................................................................. 23 3.2 Mutual Resources ............................................................................................ 23 3.3 Chemically Tractable Resources and Filters ................................................... 23 3.4 Chemically Tractable Genome Outputs ........................................................... 26 3.5 Biopharmable Resources and Filters ............................................................... 28 3.6 Biopharmable Genome Outputs ....................................................................... 29 iii 3.7 Prediction Method Evaluation ......................................................................... 33 4 Results ..................................................................................................................... 34 4.1 Druggable Genome .......................................................................................... 35 4.1.1 Prediction of ChEMBL database small molecule targets ......................... 36 4.1.2 Prediction of DrugBank database small molecule targets ........................ 38 4.1.3 Druggable Predictions .............................................................................. 39 4.1.4 Sensitivity and Specificity of methods predicting ChEMBL small molecule targets....................................................................................................... 42 4.1.5 Sensitivity and Specificity of methods predicting DrugBank small molecule targets....................................................................................................... 43 4.1.6 Receiver operating characteristic curves for methods predicting small molecule targets....................................................................................................... 44 4.1.7 Druggable Predictive Power ..................................................................... 48 4.1.8 Hopkins and Groom comparison .............................................................. 48 4.1.8 Predicted Druggable Target Classes ......................................................... 50 4.2 Biopharmable Genome .................................................................................... 52 4.2.1 Prediction of ChEMBL database biotechnology targets .......................... 53 4.2.2 Prediction of DrugBank database biotechnology targets ......................... 54 4.2.3 Biopharmable Predictions ........................................................................ 55 4.2.4 Sensitivity and Specificity of methods predicting ChEMBL biotechnology targets .................................................................................................................. 56 4.2.5 Sensitivity and Specificity of methods predicting DrugBank biotechnology targets .............................................................................................. 57 4.2.6 Receiver operating characteristic curves for predicting targets of biotechnology .......................................................................................................... 58 4.2.7 Biopharmable Predictive Power ............................................................... 60 4.2.8 Predicted Biopharmable Target Classes ................................................... 61 4.3 Results Summary ............................................................................................

Load more