Computational Approaches for Predicting Drug Targets

Computational Approaches for Predicting Drug Targets Tolulope Tosin Adeyelu A thesis submitted for the degree of Doctor of Philosophy January 2020 Institute of Structural and Molecular Biology University College London 1 I, Tolulope Tosin Adeyelu, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. −−−−−−−−−−−−−−−−−−−−−−−− Tolulope Tosin Adeyelu January, 2020 Abstract This thesis reports the development of several computational approaches to predict human disease proteins and to assess their value as drug targets, using in-house domain functional families (CATH FunFams). CATH-FunFams comprise evolution- ary related protein domains with high structural and functional similarity. External resources were used to identify proteins associated with disease and their genetic variations. These were then mapped to the CATH-FunFams together with information on drugs bound to any relatives within the FunFam. A number of novel approaches were then used to predict the proteins likely to be driving disease and to assess whether drugs could be repurposed within the FunFams for targeting these putative driver proteins. The first work chapter of this thesis reports the mapping of drugs to CATH- FunFams to identify druggable FunFams based on statistical overrepresentation of drug targets within the FunFam. 81 druggable CATH-FunFams were identified and the dispersion of their relatives on a human protein interaction network was analysed to assess their propensity to be associated with side effects. In the second work chapter, putative drug targets for bladder cancer were identified using a novel computational protocol that expands a set of known bladder cancer genes with genes highly expressed in bladder cancer and highly associated with known bladder cancer genes in a human protein interaction network. 35 new bladder cancer targets were identified in druggable FunFams, for some of which FDA approved drugs could be repurposed from other protein domains in the FunFam. In the final work chapter, protein kinases and kinase inhibitors were analysed. These are an important class of human drug targets. A novel classification proto- Abstract 3 col was applied to give a comprehensive classification of the kinases which was benchmarked and compared with other widely used kinase classifications. Drug information from ChEMBL was mapped to the Kinase-FunFams and analyses of protein network characteristics of the kinase relatives in each FunFam used to identify those families likely to be associated with side effects. Acknowledgements So it’s finally here! The journey of the past 4 years has several intriguing stories. I would love to give thanks to God Almighty for the grace to complete this great feat. I can’t but thank my supervisor whose role on the achievement of my Ph.D. cannot be overemphasised. She is both a mentor and a motivator. Thank you so much Prof. Christine Orengo for taking out time to accept, correct and mentor me all the way. You are such a rare gem. Thank you for being so accommodating, I couldn’t have asked for another. A big thank you to my thesis committee members (Prof. Snezana Djordje- vic and the thesis Chair, Prof. Andrew Martins) for their commitment towards the progress on my Ph.D. journey. As a saying goes, ”if I have come this far, it is because I stood on the shoulders of those who have go ahead of me”. This therefore makes me to acknowledge wonderful scientist who, out of their very busy sched- ule have found time to share knowledge and helped on this journey. A special thanks to Dr. Moya-Garcia Aurelio, who coached and collaborated with me on drug polypharmacology research and network biology. A sincere appreciation goes to all Post-docs in Orengo group for driving my passion in the area of protein domains classifications and function, and responding always to my questions everytime. A big thank you to Ian Silitoe, Jon Lees, Paul Ashford, Nathalie, Sayoni Das, Nicola Bordin. I will also like to appreciate everyone in the Orengo group for great mo- ments shared together in the group. Thank you Harry Scholes, Millie Pang, Su datt Lam, Vaishali and Joseph Bonello. I will also like to appreciate the members of the RCCG Faith Chapel who have been an umbrella and a family since I arrived the City of London in 2014. Your Acknowledgements 5 prayers and encouragement have kept me going. A sincere appreciation to Daddy and Mummy Feyibunmi. To my London family; the Ilegbusi, you guys know you rock my world and you are one of the reason for the achievement of this great feat. Thank you Mummy and Daddy Ilegbusi, John and Ike. To all my lovely friend turned family, I can’t but mention Folarin, Olawole, Tolu. Thank you so much guys for every moment shared together. I want to also appreciate the Adekunle Ajasin University Management team and the department of Biochemistry for being part of my success story. A heartfelt gratitude to my parent, Mr and Mrs Samuel Adeyelu, for their en- couraging words and prayers throughout the journey. To my siblings Adedamola and Olaide, thank you guys for being the best. To my adorable gift, my wife of inestimable value, Obiageli Jane Adeyelu, you have being a great strength to me on this journey and I can’t but thank you for those times I left you alone just to complete this, it is now an added feather and always remember, we did this together. Thank you for believing in me. Lastly, I can’t but appreciate the Federal Government of Nigeria who provided funding for my Ph.D. program through the Presidential Scholarship for Innovation and Development (PRESSID) program. To the amazing world of science in general, discovery and innovations are the heart of creativity. In that wise, there is more to discover and we keep pressing forward. Impact Statement The thesis describes the development and application of computational protocols for predicting drug targets and identifying druggable domain families. Proteins are one of the most targeted molecules, and because they mostly function by interaction with other proteins, a comprehensive network of protein interactions was analysed to reveal network properties that could be used to identify drug targets and to char- acterise the side effects associated with drug targets. In the first work chapter of this thesis, druggable domain families were identified based on overrepresentation of drug targets. This revealed a subset of domain families whose relatives can be targeted by the pharmaceutical industry. This work was published in Scientific Reports. One major impact of this study is reporting how drugs currently approved and marketed for targeting a particular domain can be repurposed to other relatives within the same druggable domain family. Drug repurposing is of considerable interest to some pharmaceutical companies to re- channel approved drugs to other orphan targets. Another key area with likely impact that has been addressed in this thesis is the issue of side effects associated with drugs. Side effects arising from drug usage is one of the major causes of death and management are quite costly. To show the application of this study to diseases lacking drugs, the second work chapter of this thesis reports the repurposing of FDA approved drugs in bladder cancer. This study therefore provides predicted targets that can be validated experimentally. Contents 1 Introduction To Thesis 18 1.1 Introduction . 18 1.2 Overview of protein interaction networks . 19 1.2.1 Experimental and computational approaches to construct- ing protein interaction networks . 20 1.2.2 Network representation of protein interactions . 27 1.2.3 Graph theory and general characteristics of networks . 28 1.3 Identification of Network Modules . 33 1.3.1 Local neighbourhood density search . 33 1.3.2 Cost-based local search . 34 1.3.3 Flow simulation . 35 1.3.4 Link Clustering . 35 1.4 Protein Networks application to Human Diseases . 35 1.4.1 Tissue specificity of Diseases . 37 1.4.2 Analysis of Disease Modules . 39 1.5 Resources used in this thesis for protein and network annotation . 41 1.5.1 Resources with information on protein interaction networks 41 1.5.2 Resources with information on drug target identification . 42 1.5.3 Resources with information on protein domain classifications 43 1.5.4 Sequence profiling tools . 46 1.5.5 Structure comparison approaches . 47 Contents 8 1.5.6 Resources providing functional annotation and pathway information . 49 1.6 Overview of Thesis . 50 2 Domain based approaches to drug polypharmacology 52 2.1 Introduction . 52 2.1.1 The druggable Genome . 54 2.1.2 Assessing druggability . 56 2.1.3 Drug side effects . 56 2.1.4 Systems polypharmacology . 57 2.1.5 Objectives of chapter . 58 2.2 Materials and Methods . 59 2.2.1 Drug-proteins dataset . 59 2.2.2 Identifying CATH-FunFams with overrepresentation of drug targets (Druggable CATH FunFams) . 60 2.2.3 CD-Hit and SSAP . 62 2.2.4 Ligand binding site conservation in the druggable CATH- FunFam . 63 2.2.5 Protein interaction data . 63 2.2.6 Transforming the protein network . 64 2.2.7 Network centrality measures . 64 2.3 Results and discussion . 66 2.3.1 Drug-Enrichment Analysis . 66 2.3.2 Proportion of known druggable classes in the druggable CATH-FunFams . 66 2.3.3 Structural similarities of the relatives in the druggable CATH-FunFams (CD-HIT and SSAP) . 68 2.3.4 Structural superposition and conservation of drug binding sites in CATH-FunFams . 69 2.3.5 Aggregation of drug targets in the human protein functional network .

Load more