Development of Computational Approaches for Medical Image Retrieval, Disease Gene Prediction, and Drug Discovery
DEVELOPMENT OF COMPUTATIONAL APPROACHES FOR MEDICAL IMAGE RETRIEVAL, DISEASE GENE PREDICTION, AND DRUG DISCOVERY

by
YANG CHEN

Submitted in partial fulfillment of the requirements
For the degree of Doctor of Philosophy

Dissertation Advisor: Dr. Rong Xu, Dr. Guo-qiang Zhang

Department of Electrical Engineering and Computer Science
CASE WESTERN RESERVE UNIVERSITY
August 2015

CASE WESTERN RESERVE UNIVERSITY
SCHOOL OF GRADUATE STUDIES

We hereby approve the thesis/dissertation of
Yang Chen
candidate for the degree of Doctor of Philosophy*

Committee Chair: Guo-qiang Zhang
Committee Member: Rong Xu
Committee Member: Jing Li
Committee Member: M. Cenk Cavusoglu
Committee Member: Xiang Zhang

Date of Defense: June 23, 2015

*We also certify that written approval has been obtained for any proprietary material contained therein.

Contents

1 Introduction
  1.1 Domain knowledge guided strategy for developing computational approaches for biomedical applications
  1.2 Retrieving medically-relevant web images
  1.3 Detecting novel genetic basis for human diseases
  1.4 Predicting novel drug treatments based on disease genetics
  1.5 Contribution and organization of the dissertation

2 Ontology guided approach to retrieving medically-relevant web images: application on retrieving disease manifestation images
  2.1 Motivation
  2.2 Data and methods
    2.2.1 Discovering target body parts
    2.2.2 Detecting target body parts
    2.2.3 Combining detections for disease image classification
  2.3 Results
    2.3.1 Single-organ disease classification
    2.3.2 Multiple-organ disease classification
  2.4 Discussion
  2.5 Conclusions

3 Analyzing cross-species genetic networks to predict disease-associated genes: application on Plasmodium falciparum malaria
  3.1 Motivation
  3.2 Data and methods
    3.2.1 Construct cross-species gene network
    3.2.2 Predict candidate genes for malaria
    3.2.3 Evaluate the validity in predicting malaria genes
    3.2.4 Evaluate the ranks of druggable genes
    3.2.5 Extract and analyze malaria-specific pathways based on gene ranking
  3.3 Results
    3.3.1 Network-based approach allows the prioritization of known malaria genes from both human and parasite genomes
    3.3.2 Network-based approach prioritizes novel malaria genes other than the seeds
    3.3.3 Prioritized genes are enriched by druggable genes
    3.3.4 Pathway analysis shows functions of prioritized genes
  3.4 Discussion
  3.5 Conclusions

4 Combining multiple human phenotype networks to predict disease-associated genes: application on Crohn’s disease
  4.1 Motivation
  4.2 Data and methods
    4.2.1 Construct DMN using disease-manifestation associations in UMLS
    4.2.2 Compare phenotypic relationships in DMN with genetic disease associations
    4.2.3 Compare DMN with the widely-used disease phenotype network mimMiner
    4.2.4 Integrate networks
    4.2.5 Predict disease genes from the integrated network
    4.2.6 Evaluate gene prediction in cross validation analyses
    4.2.7 Evaluate gene prediction for different disease classes
    4.2.8 Investigate translational potential in drug discovery of the predicted genes for Crohn’s disease
  4.3 Results
    4.3.1 DMN network properties
    4.3.2 DMN partially correlates with the genetic disease networks
    4.3.3 DMN contains knowledge different from mimMiner
    4.3.4 Integrating DMN with mimMiner significantly improves the performance of disease gene predictions
    4.3.5 Our method achieves high but varying performance for different disease classes
    4.3.6 Our gene prediction method has the potential to guide the drug discovery for Crohn’s disease
  4.4 Discussion
  4.5 Conclusions

5 Studying disease comorbidity network to detect genetic evidences for disease links: application on colorectal cancer and obesity
  5.1 Motivation
  5.2 Data and methods
    5.2.1 Construct disease comorbidity network
    5.2.2 Prioritize the diseases that have strong associations with both obesity and CRC
    5.2.3 Identify gene overlaps through gene expression meta-analysis
  5.3 Results
    5.3.1 Local disease comorbidity network models the connection between obesity and CRC
    5.3.2 Osteoporosis shows high comorbidity associations with both CRC and obesity
    5.3.3 Innovative genes shared among osteoporosis, obesity and CRC are detected using gene expression meta-analysis
  5.4 Discussion
  5.5 Conclusions

6 Combining human disease genetics and mouse model phenotypes towards drug repositioning: application on Parkinson’s disease
  6.1 Motivation
  6.2 Data and methods
    6.2.1 Identify mouse model phenotypes for PD using disease genetics in OMIM
    6.2.2 Prioritize candidate PD drugs based on the similarities of mouse phenotype profiles between disease and drugs
    6.2.3 De novo evaluation in prioritizing FDA-approved PD drugs
    6.2.4 Evaluation in ranking novel PD drugs and comparison with an existing drug repositioning approach
    6.2.5 Test the top-ranked drugs using gene expression data analysis
  6.3 Results
    6.3.1 Our disease genetics-based phenotype prioritization algorithm identified PD-specific mouse model phenotypes
    6.3.2 Our approach prioritized FDA-approved PD drugs
    6.3.3 Our approach outperformed an existing approach in prioritizing novel PD drugs
    6.3.4 Gene expression analysis suggests quetiapine as a potential PD drug
  6.4 Discussion
  6.5 Conclusions

7 Conclusions and future work
  7.1 Conclusions
  7.2 Future work
    7.2.1 Disease image retrieval
    7.2.2 Disease gene prediction
    7.2.3 Drug repositioning

Appendices

List of Tables

2.1 Performance Comparison on Ten Eye Disease Image Test Sets.
2.2 Performance Comparison on Ten Ear Disease Image Test Sets.
2.3 Performance Comparison on Ten Mouth/Lip Disease Image Test Sets.
2.4 Performance Comparison on Ten Mouth/Lip Disease Image Test Sets.
3.1 Result of the leave-one-out cross validation for human genes. We left out one malaria gene from the seed list each time, and determined the rank of this excluded gene using our method. We showed the rank and percentage among all human genes.
3.2 Top 10 parasite genes in the leave-one-out cross validation.
3.3 Rank of other malaria-associated genes from literature.
3.4 Pathways prioritized over 50% in rank.
4.1 Global properties of DMN and the other disease networks, including HDNs (genetic disease networks) and mimMiner (widely-used phenotype network) based on OMIM text mining. The last three columns represent average shortest path, average cluster coefficient, and connected component, respectively.
4.2 Compare the edge overlaps $N$ between DMN and the genetic disease networks. Network $B'$ represents the randomized graph that preserves the properties of Network $B$. Column $N(A; B')$ represents the average number of edge overlaps comparing network $A$ and the randomized networks.
4.3 Compare the community structures between DMN and the genetic disease networks. $S_{A \to B}$ and $S_{B \to A}$ represent the two-way similarity in community partitions between network $A$ and $B$.
4.4 Compare DMN with mimMiner in nodes, edges and community structures.
4.5 Ratios of successful disease-gene association predictions in the leave-one-out cross validation experiment. All diseases were included in the experiment.
4.6 Success ratio of disease-gene association predictions for all diseases and monogenetic diseases in the nine disease classes.
4.7 Drug candidates for Crohn’s disease that are supported by literature.
5.1 Top five disease nodes in the local network that contains all paths from obesity to colorectal cancer. The diseases were ranked by degree and betweenness,