Drug Repositioning and Indication Discovery Using Description Logics
Total Page:16
File Type:pdf, Size:1020Kb
Drug repositioning and indication discovery using description logics Samuel Claude Jean Croset Darwin College A thesis submitted on March, 2014 for the Degree of Doctor of Philosophy Abstract Drug repositioning is the discovery of new indications for approved or failed drugs. This practice is commonly done within the drug discovery process in order to ad- just or expand the application line of an active molecule. Nowadays, an increasing number of computational methodologies aim at predicting repositioning oppor- tunities in an automated fashion. Some approaches rely on the direct physical interaction between molecules and protein targets (docking) and some methods consider more abstract descriptors, such as a gene expression signature, in order to characterise the potential pharmacological action of a drug (Chapter 1). On a fundamental level, repositioning opportunities exist because drugs per- turb multiple biological entities, (on and off-targets) themselves involved in mul- tiple biological processes. Therefore, a drug can play multiple roles or exhibit various mode of actions responsible for its pharmacology. The work done for my thesis aims at characterising these various modes and mechanisms of action for approved drugs, using a mathematical framework called description logics. In this regard, I first specify how living organisms can be compared to com- plex black box machines and how this analogy can help to capture biomedical knowledge using description logics (Chapter 2). Secondly, the theory is imple- mented in the Functional Therapeutic Chemical Classification System (FTC - https://www.ebi.ac.uk/chembl/ftc/), a resource defining over 20,000 new cat- egories representing the modes and mechanisms of action of approved drugs. The FTC also indexes over 1,000 approved drugs, which have been classified into the mode of action categories using automated reasoning. The FTC is evaluated against a gold standard, the Anatomical Therapeutic Chemical Classification System (ATC), in order to characterise its quality and content (Chapter 3). Finally, from the information available in the FTC, a series of drug reposi- tioning hypotheses were generated and made publicly available via a web appli- cation (https://www.ebi.ac.uk/chembl/research/ftc-hypotheses). A sub- set of the hypotheses related to the cardiovascular hypertension as well as for Alzheimer's disease are further discussed in more details, as an example of an application (Chapter 4). The work performed illustrates how new valuable biomedical knowledge can be automatically generated by integrating and leveraging the content of publicly available resources using description logics and automated reasoning. The newly created classification (FTC) is a first attempt to formally and systematically characterise the function or role of approved drugs using the concept of mode of action. The open hypotheses derived from the resource are available to the community to analyse and design further experiments. Declaration This thesis: • is my own work and contains nothing which is the outcome of work done in collaboration with others, except where specified in the text; • is not substantially the same as any that I have submitted for a degree or diploma or other qualification at any other university; and • does not exceed the prescribed limit of 60,000 words (approximately 39,853 words). Samuel Claude Jean Croset March, 2014 Acknowledgements Writing a thesis is a challenging enterprise, and I would like to thank all the people who provided support along the journey. First to both my supervisors, Dietrich Rebholz-Schuhmann (DRS) and John Overington (JPO) for giving me the opportunity and freedom of doing my own science. Then to all the folks who provided me valuable advice and enlightening discussion, in particular Sarah Carl, Robert Hoehndorf, Anika Oellrich, Felix Krueger, Rita Santos, Christoph Grabmuller, Benjamin Stauch, John May, Gerard van Westen, George Papadatos, Grace Mugambate, Syed Asad Rahman and Pedro Ballester. I have interacted mostly with two research groups during my PhD time at the EMBL-EBI, the text-mining and the ChEMBL group; I would like to show my gratitude to all the members of both of these teams. I would also like to include my family in the acknowledgements, in particular my mother Evelyne, my father Serge and my little brother Elliott, for providing mental support and cheese. All my friends should be present in this list, both from here such as the predocs or out from of the country such as the Haute-Savoie folks. A special thanks to Sarah Carl (<3) and Jean-Baptiste Pettit, as well as to Jacek for the spiritual support. Finally I would like to sincerely thank the reader, for taking the time and energy to go through this document. I deeply apologize in advance for the numerous people I did not mention here, and I will offer a free pub lunch to thank anyone reading this manuscript - just send me an email ([email protected]) with the subject "THESIS ACKNOWLEDGEMENT" to claim your meal. Contents 1 Review of computational drug repositioning approaches 15 1.1 Relevance of drug repositioning . 16 1.1.1 Opportunities for finding new indications . 17 1.1.2 Drug repositioning faces legal and scientific challenges . 20 1.2 Drug repositioning and indication discovery: success stories . 21 1.2.1 Sildenafil: repositioning from clinical side-effects . 22 1.2.2 Thalidomide: repositioning a hazardous drug . 23 1.2.3 Raloxifene: expanding the application line . 24 1.3 Computational approaches towards drug repositioning . 25 1.3.1 Chemical structure-based approaches . 26 1.3.2 Gene expression and functional genomics-based approaches 29 1.3.3 Protein structure and molecular docking-based approaches 32 1.3.4 Phenotype and side-effect-based approaches . 34 1.3.5 Genetic variation-based approaches . 37 1.3.6 Disease network-based approaches . 38 1.3.7 Machine learning and concepts combination approaches . 41 1.3.8 Summary . 43 1.4 Thesis: biological process and molecular function for drug reposi- tioning . 46 1.4.1 Rationale . 47 1.4.2 Towards the specification, implementation and analysis . 49 1.4.2.1 Chapter 2 - Description logics and biomedical knowl- edge (Specification) . 50 1.4.2.2 Chapter 3 - The Functional Therapeutic Chemical Classification System (Implementation) . 50 1.4.2.3 Chapter 4 - Systematic drug repositioning analysis 51 1.4.2.4 Chapter 5 - Outlook and future work . 51 2 Description logics and biomedical knowledge (Specification) 53 2.1 Introduction . 54 2.2 Biomedical knowledge . 55 2.2.1 Contemporary formalism in biomedical sciences . 55 2.2.1.1 Formalism varies among natural sciences . 55 2.2.1.2 Organisms as complex machines . 58 2.2.2 Requirements for biomedical knowledge formalisation . 61 2.2.2.1 Mathematical framework . 61 2.2.2.2 Definitions . 62 2.2.2.3 Hierarchies and abstraction . 62 2.2.2.4 Distributed and scalable . 63 2.2.2.5 Molecular dynamism . 64 2.3 Description logics for biomedical knowledge representation . 65 2.3.1 Problems addressed by description logics . 65 2.3.2 Expressivity and complexity . 68 2.3.3 DLs' components and relation to life sciences . 69 2.3.3.1 Description logics entities . 70 2.3.3.2 Axioms . 72 2.3.3.3 Constructors . 74 2.3.4 Reasoning services . 78 2.3.5 The Web Ontology Language 2 (OWL2) . 80 2.4 Implementation with life-science information . 81 2.4.1 Integration with biomedical ontologies . 81 2.4.1.1 Open Biomedical Ontologies (OBO) . 81 2.4.1.2 Approximations and assumptions . 84 2.4.2 Integration with databases . 86 2.4.3 Brain library - implementing programmatic solutions . 88 2.5 Summary . 90 3 The Functional Therapeutic Chemical Classification System (Im- plementation) 93 3.1 Introduction . 94 3.2 Method and definitions . 97 3.2.1 Source code . 97 3.2.2 Categories creation . 98 3.2.2.1 Mode of Action categories . 98 3.2.2.2 Mechanism of Action categories . 99 3.2.3 Equivalent definitions . 99 3.2.3.1 Regulatory pattern . 99 3.2.3.2 Functional pattern . 102 3.2.4 Data integration . 103 3.2.4.1 Drugbank . 103 3.2.4.2 Gene Ontology Annotations (GOA) . 106 3.2.5 Knowledge base classification . 106 3.2.6 Evaluation methodology . 107 3.2.6.1 Evaluation Points . 108 3.2.6.2 True Positives . 109 3.2.6.3 False Negatives . 109 3.2.6.4 False Positives . 109 3.2.6.5 Precision . 109 3.2.6.6 Recall . 109 3.2.7 Semantic similarity . 110 3.2.8 Mode of action similarity against indication . 110 3.2.9 Knowledge base specification . 111 3.2.9.1 Core FTC classes . 112 3.2.9.2 Core FTC properties . 113 3.3 The classification . 116 3.4 Evaluation . 119 3.5 Exploration . 122 3.5.1 Polypharmacology spectrum . 122 3.5.2 Drugs with similar functions have similar indications . 124 3.6 Discussion . 129 3.6.1 Biological assumptions . 130 3.6.2 Interpreting the evaluation . 131 3.6.3 FTC identifiers . 132 3.7 Summary . 132 4 Systematic drug repositioning analysis 135 4.1 Introduction . 136 4.2 Structure, function and indication of drugs . 137 4.2.1 Structural descriptor selection . 137 4.2.2 Dissimilar structures have dissimilar functions . 140 4.2.3 The more specific an indication is, the more similar the function and structure are . 145 4.2.4 Isolation of drug repositioning hypotheses . 153 4.3 Open drug repositioning hypotheses . 155 4.3.1 Relationship between therapeutic areas . 156 4.3.2 Drug repositioning for hypertension and the cardiovascular system . 161 4.3.2.1 Antihypertensives . 163 4.3.2.2 Calcium channel blockers . 166 4.3.2.3 Adrenergic receptor antagonists . 169 4.3.2.4 Alpha-2 agonists .