Ontology-Based Methods for Disease Similarity Estimation and Drug
Total Page:16
File Type:pdf, Size:1020Kb
ONTOLOGY-BASED METHODS FOR DISEASE SIMILARITY ESTIMATION AND DRUG REPOSITIONING A DISSERTATION IN Computer Science and Mathematics Presented to the Faculty of the University of Missouri Kansas City in partial fulfillment of the requirements for the degree DOCTOR OF PHILOSOPHY by SACHIN MATHUR B.Tech., Computer Science, Jawaharlal Nehru Technological University, 2001 M.S., Computer Science, University of Missouri-Kansas City, 2004 Kansas City, Missouri 2012 ONTOLOGY-BASED METHODS FOR DISEASE SIMILARITY ESTIMATION AND DRUG REPOSITIONING SACHIN MATHUR, Candidate for the Doctor of Philosophy Degree University of Missouri-Kansas City, 2012 ABSTRACT Human genome sequencing and new biological data generation techniques have provided an opportunity to uncover mechanisms in human disease. Using gene-disease data, recent research has increasingly shown that many seemingly dissimilar diseases have similar/common molecular mechanisms. Understanding similarity between diseases aids in early disease diagnosis and development of new drugs. The growing collection of gene- function and gene-disease data has instituted a need for formal knowledge representation in order to extract information. Ontologies have been successfully applied to represent such knowledge, and data mining techniques have been applied on them to extract information. Informatics methods can be used with ontologies to find similarity between diseases which can yield insight into how they are caused. This can lead to therapies which can actually cure diseases rather than merely treating symptoms. Estimating disease similarity solely on the basis of shared genes can be misleading as variable combinations of genes may be associated with similar diseases, especially for complex diseases. This deficiency can be potentially overcome by looking for common or similar biological processes rather than only explicit gene matches between diseases. The use of semantic similarity between biological processes to estimate disease similarity could enhance the identification and iii characterization of disease similarity besides indentifying novel biological processes involved in the diseases. Also, if diseases have similar molecular mechanisms, then drugs that are currently being used could potentially be used against diseases beyond their original indication. This can greatly benefit patients with diseases that do not have adequate therapies especially people with rare diseases. This can also drastically reduce healthcare costs as development of new drugs is far more expensive than re-using existing ones. In this research we present functions to measure similarity between terms in an ontology, and between entities annotated with terms drawn from the ontology, based on both co-occurrence and information content. The new similarity measure is shown to outperform existing methods using biological pathways. The similarity measure is then used to estimate similarity among diseases using the biological processes involved in them and is evaluated using a manually curated and external datasets with known disease similarities. Further, we use ontologies to encode diseases, drugs and biological processes and demonstrate a method that uses a network-based algorithm to combine biological data about diseases with drug information to find new uses for existing drugs. The effectiveness of the method is demonstrated by comparing the predicted new disease-drug pairs with existing drug-related clinical trials. iv APPROVAL PAGE The faculty listed below, appointed by the Dean of the School of Graduate Studies, have examined a dissertation titled “Ontology-Based Methods for Disease Similarity Estimation and Drug Repositioning” presented by Sachin Mathur, candidate for the Doctor of Philosophy degree, and certify that in their opinion it is worthy of acceptance. Supervisory Committee Deendayal Dinakarpandian, M.D., Ph.D., M.S., Committee Chair School of Computing and Engineering Yugyung Lee, Ph.D. School of Computing and Engineering Appie Van de Liefvoort, Ph.D. School of Computing and Engineering Deepankar Medhi, Ph.D. School of Computing and Engineering Jie Chen, Ph.D. Department of Statistics Peter Smith, PhD. Department of Physiology, Kansas University Medical Center v TABLE OF CONTENTS ABSTRACT ........................................................................................................................... iii LIST OF ILLUSTRATIONS .................................................................................................. xi LIST OF TABLES ................................................................................................................ xiii ACKNOWLEDGEMENTS ................................................................................................. xiv CHAPTER 1. INTRODUCTION AND MOTIVATION ............................................................................ 1 1.1 Background ................................................................................................................... 1 1.2 Motivation ..................................................................................................................... 3 1.3 Organization of Dissertation .......................................................................................... 5 1.4 Contributions of the Dissertation .................................................................................. 5 2. ONTOLOGIES IN BIOMEDICAL DOMAIN ................................................................... 7 2.1 Definition of an Ontology ............................................................................................. 7 2.2 Need for Ontologies ...................................................................................................... 7 2.3 Ontologies in Biomedicine ............................................................................................ 9 2.3.1 Gene Ontology (GO) .............................................................................................. 9 2.3.2 Unified Medical Language System (UMLS) ....................................................... 10 2.4 Application of Ontologies to Biomedicine ...................................................................11 2.5 Open problems in Ontologies ...................................................................................... 14 vi 3. METHODS TO COMPUTE ONTOLOGICAL SIMILARITY ........................................ 16 3.1 Importance of Similarity Between Ontological Terms ................................................ 16 3.2 Importance of Measuring Similarity between Functions of Genes ............................. 17 3.3 Classification of Similarity Measures ......................................................................... 18 3.3.1 Node-Based Measures .......................................................................................... 18 3.3.2 Edge Based Measures ........................................................................................... 21 3.3.3 Hybrid Approach .................................................................................................. 21 3.4 Similarity Measures for Entities Annotated with Ontological Terms ......................... 22 3.4.1 Vector-Based ......................................................................................................... 22 3.4.2 Set-Based .............................................................................................................. 22 3.5 Existing Similarity Measures ...................................................................................... 23 3.5.1. Point-wise Mutual Information (PMI) ................................................................ 23 3.5.2 Resnik ................................................................................................................... 23 3.5.3 Jiang ...................................................................................................................... 24 3.5.4 Lin ........................................................................................................................ 25 3.5.5 Rel ........................................................................................................................ 25 3.5.6 Leacock & Chodorow (LC) .................................................................................. 26 3.5.7 GFSST .................................................................................................................. 26 3.5.8 Wang ..................................................................................................................... 26 3.5.9 Fuzzy Similarity Measure (FSM) ......................................................................... 27 3.5.10 Context Vector Based Method ............................................................................ 28 3.6 Limitations of Existing Similarity Measures ............................................................... 28 vii 4. PROPOSED APPROACH FOR ONTOLOGICAL TERM SIMILARITY ...................... 32 4.1 Proposed Approach ...................................................................................................... 32 4.2 Similarity between terms in an ontology ..................................................................... 33 4.3 Similarity between Entities ......................................................................................... 35 4.4 Comparison with other Similarity Measures ............................................................... 36 4.5 Ontology Perturbation