University of Cincinnati
Computational Selection and Prioritization of Disease Candidate Genes

A dissertation submitted to the Graduate School of the University of Cincinnati in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY in the Department of Biomedical Engineering of the College of Engineering

2008

by Jing Chen
B.E., National University of Singapore, 2002

Committee Chairs: Bruce J Aronow, Ph.D. and Anil G Jegga, D.V.M., M.S.
Committee Member: Marepalli Rao, Ph.D.

Abstract

Identifying causal genes underlying susceptibility to human disease is a problem of primary importance in the post-genomic era and in current biomedical research. Recently, such gene-discovery efforts have shifted from rare, monogenic conditions to common “oligogenic” or “multifactorial” conditions such as asthma, diabetes, cancers and neurological disorders. These conditions are referred to as multifactorial because susceptibility to them is attributed to the combinatorial effects of genetic variation at a number of different genes and their interaction with relevant environmental exposures. The expectation is that identification and characterization of the causal genes implicated in the inherited component of disease susceptibility will lead to substantial advances in our understanding of disease. These advances in turn can lead to improvements in diagnostic accuracy, prognostic precision, and the range and targeting of available therapeutic options, and ultimately realize the promise of personalized or “tailor-made” medicine.
The objective of my thesis therefore is to design, develop, and validate computational approaches for the identification and prioritization of these causal genes. The first approach tests the hypothesis that the majority of genes that impact or cause disease share membership in any of several functional relationships. We use a p-value-based meta-analysis method to prioritize the candidate genes based on functional annotation. For the first time, we use mouse phenotype annotations in human disease gene prioritization and demonstrate their utility. Since this approach is limited to genes with functional annotation, and because many human genes are yet to be functionally classified, we have developed another approach that is independent of gene functional annotations. We implemented a set of new algorithms to prioritize genes based on protein-protein interaction networks. Large-scale cross-validation was performed to compare and evaluate the methods and to determine the associated parameters. Our results demonstrate that the functional annotation-based method performs better than other approaches. Although the network-based method did not perform as well as the functional annotation-based method, it is much simpler to implement, apply, and execute. The best performance, however, was achieved by combining the results from the two methods, as demonstrated through an asthma test case.

Acknowledgements

First, I owe a debt of gratitude to my advisors and mentors, Dr. Bruce Aronow and Dr. Anil Jegga, for providing me with supervision, motivation and encouragement throughout my graduate studies. Their enthusiasm, high expectations, and trust pushed me toward a new level of professionalism. They have forever set standards for dedication and excellence to which I will always aspire. I owe many of my accomplishments to them. Without their care, supervision and friendship, I would not have been able to complete this work.

Thanks and gratitude also to Dr. Marepalli Rao for being a part of the graduation committee, for sharing his expertise in statistics, and for making complicated statistical concepts and problems easy to understand. I am grateful for his advice and help and, importantly, for making me realize the importance of statistics in bioinformatics research.

Thanks to Dr. Jarek Meller, Dr. Mario Medvedovic, Dr. Michael Wagner and Dr. Yan Xu in the Division of Biomedical Informatics at Cincinnati Children’s Hospital for giving advice and raising questions during my presentations and discussions. The journal club experience has been valuable and unforgettable for me.

As part of the graduate student population in Bioinformatics at the University of Cincinnati, I have been very lucky to have had many supportive fellow students. Sincere thanks to Sivakumar Gowrisankar, Ranga Chandra Gudivada, Johannes Freudenberg, and Mukta Phatak for their support in various aspects of the project and for always being available for help. I am also very grateful to Dr. Xiaohua Sheng for his suggestions on statistical analysis and to Eric Bardes for his advice on programming.

Finally, I wish to thank my parents for their love, support, and continued encouragement throughout the years. I wish to thank my son Lang, who troubles me with so much joy. I would also like to thank my wife, Huan Xu, for giving me her unconditional love and support throughout everything. This dissertation could not have been completed without her. I would like to dedicate this work to her.

Publications arising from this thesis

Papers
Chen J, Xu H, Aronow BJ, Jegga AG 2007. Improved human disease candidate gene prioritization using mouse phenotype. BMC Bioinformatics 8(1): 392.
Chen J, Aronow BJ, Jegga AG 2008. Disease candidate gene identification and prioritization using protein-protein interaction networks. (Submitted)
Chen J, Gowrisankar S, Xu H, Aronow BJ, Jegga AG 2008. In silico Prioritization of Novel Asthma Candidate Genes.
(Submitted)
Gudivada RC, Qu X, Chen J, Jegga AG, Neumann EK, Aronow BJ 2007. Identifying disease-causal genes using semantic web-based integration of genomic-phenomic data. (Accepted by Journal of Biomedical Informatics)

Book chapters
Chen J, Jegga AG 2007. Systems Biology Based Integrative Approaches to Identify and Prioritize Novel Disease Candidate Genes. Bios Publications, In Press.

Table of contents

Abstract
Acknowledgements
Publications arising from this thesis
Table of contents
List of figures
List of tables
Chapter 1. Overview
    1.1 Motivation
    1.2 Contributions of this Thesis
Chapter 2. Systems Biology Based Integrative Approaches to Identify and Prioritize Novel Disease Candidate Genes
    2.1 Background
        2.1.1 Connecting phenotype with genotype: Disease gene discovery
        2.1.2 Traditional Candidate Gene Approach
        2.1.3 Candidate disease gene prediction using protein-protein interactions
        2.1.4 Candidate gene prioritization based on functional annotations
    2.2 Current work: improved functional-based and novel network-based prioritization methods
    2.3 Limitations of candidate gene prioritization approaches
    2.4 Summary
Chapter 3. Discovery and Prioritization of Candidate Genes that Cause or Impact Disease Using an Integrative Genome-Transcriptome-Phenome-Bibliome Approach
    3.1 Background
    3.2 Materials and methods