Downloaded from Ensembl
Total Page:16
File Type:pdf, Size:1020Kb
UCSF UC San Francisco Electronic Theses and Dissertations Title Detecting genetic similarity between complex human traits by exploring their common molecular mechanism Permalink https://escholarship.org/uc/item/1k40s443 Author Gu, Jialiang Publication Date 2019 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California by Submitted in partial satisfaction of the requirements for degree of in in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, SAN FRANCISCO AND UNIVERSITY OF CALIFORNIA, BERKELEY Approved: ______________________________________________________________________________ Chair ______________________________________________________________________________ ______________________________________________________________________________ ______________________________________________________________________________ ______________________________________________________________________________ Committee Members ii Acknowledgement This project would not have been possible without Prof. Dr. Hao Li, Dr. Jiashun Zheng and Dr. Chris Fuller at the University of California, San Francisco (UCSF) and Caribou Bioscience. The Li lab grew into a multi-facet research group consist of both experimentalists and computational biologists covering three research areas including cellular/molecular mechanism of ageing, genetic determinants of complex human traits and structure, function, evolution of gene regulatory network. Labs like these are the pillar of global success and reputation of UCSF and UC Berkeley, and excellent incubator of research projects similar to the one documented in this manuscript. Secondly I would like to thank Dr. Ryan Hernandez, Dr. Irina Conboy for serving as my dissertation committee. I truly appreciate their advice and questions which helped me refine research scope and adjust research direction. Thirdly I would like to thank Dr. Patricia Babbitt and Dr. Adam Abate to be my academic advisors, providing advice ranging from materializable action such as rotation lab selection to philosophical discussion such as the history, evolution and future of life science and its comparison with physical science. Above all, the success of this project can be attributed to my family who supported my decision in 2014 to resign from the software industry and pursue my true interest. I am extremely grateful for their support bearing the risk with me as their only child. Finally, this dissertation is part of a story that One’s Revenge is Others’ Redemption. iii Detecting genetic similarity between complex human traits by exploring their common molecular mechanism Jialiang Gu Abstract The rapid accumulation of Genome Wide Association Studies (GWAS) and association studies of intermediate molecular traits provides new opportunities for comparative analysis of the genetic basis of complex human phenotypes. Using a newly developed statistical framework called Sherlock-II that integrates GWAS with eQTL (expression Quantitative Trait Loci) and metabolite-QTL data, we systematically analyzed 445 GWAS datasets, and identified 2114 significant gene-phenotype associations and 469 metabolites-phenotype associations (passing a Q-value cutoff of 1/3). This integrative analysis allows us to translate SNP-phenotype associations into functionally informative gene-phenotype association profiles. Genetic similarity analyses based on these profiles clustered phenotypes into sub-trees that reveal both expected and unexpected relationships. We employed a statistical approach to delineate sets of functionally related genes that contribute to the similarity between their association profiles. This approach suggested common molecular mechanisms that connect the phenotypes in a subtree. For example, we found that fasting insulin, fasting glucose, breast cancer, prostate cancer, and lung cancer clustered into a subtree, and identified cyclic AMP/GMP signaling that connects breast cancer and insulin, NAPDH oxidase/ROS generation that connects the three cancers, and apoptosis that connects all five phenotypes. Our approach can be used to assess genetic similarity and suggest mechanistic connections between phenotypes. It has the potential to improve the diagnosis and treatment of a disease by mapping mechanistic insights from one phenotype onto others based on common molecular underpinnings. iv Table of Contents 1. Introduction .................................................................................................................... 1 1.1. Importance of disease classification and the need for similarity study ............................... 1 1.2. Current state of disease classification and the need for a new similarity metric ................ 1 1.3. Example of new similarity metric in Precision Medicine Initiative .................................... 2 1.4. Our approach ..................................................................................................................... 3 2. Aims and Strategy ........................................................................................................... 8 3. Result ............................................................................................................................. 9 3.1. Sherlock-II: a new computational algorithm to infer IMT-phenotype association ............ 9 3.1.1. Metabolite phenotype associations ................................................................................... 10 3.1.2. Gene phenotype associations ............................................................................................ 13 3.1.3. Summary of gene/metabolite phenotype associations ...................................................... 13 3.2. Global analysis of the genetic similarity between phenotypes .......................................... 14 3.2.1. Hierarchical clustering of phenotypes based on gene phenotype association profile ....... 14 3.2.2. Gene Ontology terms reveal common mechanism connecting phenotypes ...................... 22 3.2.3. Two way bi-clustering detects genes associated with multiple phenotypes ..................... 25 4. Discussion ...................................................................................................................... 30 4.1. Advantage of similarity study on gene/metabolite level than on SNP level ...................... 30 4.2. Alternative mechanisms supported by the same genetic evidence .................................... 31 5. Method .......................................................................................................................... 32 5.1. Sherlock-II algorithm ....................................................................................................... 32 5.2. Global analysis to identify molecular similarity between phenotypes .............................. 38 v 5.2.1. Hierarchical clustering ..................................................................................................... 39 5.2.2. Partial-Pearson-Zscore analysis ........................................................................................ 39 5.2.3. Two-way bi-clustering algorithm ..................................................................................... 40 6. Visualization .................................................................................................................. 42 6.1. Interactive web-interface .................................................................................................. 42 6.1.1. Browse gene-phenotype associations ............................................................................... 43 6.1.2. Browse SNPs supporting gene-phenotype associations ................................................... 47 6.1.3. Database models ............................................................................................................... 54 7. Partially completed Attempts and future directions ....................................................... 56 7.1. Allele direction .................................................................................................................. 56 7.2. eQTL Mechanism ............................................................................................................. 58 7.3. Dimension reduction ......................................................................................................... 61 7.4. More phenotype connection supported by gene ontology terms ....................................... 65 References ............................................................................................................................ 70 Appendix ............................................................................................................................... 80 vi List of Figures Figure 1. 1 A schematic of the integrative analysis framework. .................................................... 7 Figure 1. 2 Four examples of metabolite-phenotype associations ................................................ 12 Figure 3. 1 example alignment ...................................................................................................... 16 Figure 3. 2 global hierarchical clustering of phenotypes based on gene association profiles. ..... 17 Figure 3. 3 overlapping genes between phenotype pairs .............................................................. 21 Figure 3. 4 Subclustering of phenotype-gene association matrix ................................................. 26 Figure 3. 5 all sub-clustering