Looking Beyond the Standard Genome-Wide Association Study
Looking beyond the standard genome-wide association study: Biologically-motivated methodological approaches to discover novel genetic variants associated with complex human traits and disease

by Cindy Im

A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Epidemiology

School of Public Health
University of Alberta

© Cindy Im, 2018

ABSTRACT

The standard approach for testing associations between common single nucleotide genetic variants (referred to as single nucleotide polymorphisms, or SNPs) and disease entails testing each SNP in the genome for disease association individually. This "hypothesis-free" approach has identified thousands of statistically significant associations between single SNPs and a wide range of diseases. However, complex forms of genetic variation, including epistatic interactions, gene-environment interactions, inheritance patterns, rare variants, and structural variants, represent a tremendous potential source of transcriptional complexity in the human genome and may contribute substantially to disease risk. These complex forms of genetic variation are not explored in conventional single-SNP genome-wide association studies, largely because of computational, methodological, and statistical constraints. In this dissertation, we look beyond the contributions of single SNPs to the genetic architecture of disease and consider novel approaches to investigate how untested classes of genomic variation in the human genome may advance our understanding of the genetic basis of disease.
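As an illustration of the standard single-SNP approach described above, the following Python sketch tests each SNP independently with an allelic chi-square test and keeps those passing the conventional genome-wide significance threshold of 5 × 10^-8. It assumes a case-control design with genotypes coded as 0, 1, or 2 copies of the minor allele; the function names and toy data are hypothetical, and this is not the analysis pipeline used in this thesis (GWAS in practice typically use regression models with covariate adjustment).

```python
import math
from typing import Dict, List, Tuple

# Conventional genome-wide significance threshold for single-SNP scans.
GENOME_WIDE_ALPHA = 5e-8


def allelic_chi2(case_genotypes: List[int], control_genotypes: List[int]) -> Tuple[float, float]:
    """Allelic 2x2 chi-square test for one SNP.

    Genotypes are minor-allele counts (0, 1 or 2) per individual.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    # Minor (a) and major (b) allele counts in cases and controls.
    a_case = sum(case_genotypes)
    b_case = 2 * len(case_genotypes) - a_case
    a_ctrl = sum(control_genotypes)
    b_ctrl = 2 * len(control_genotypes) - a_ctrl
    table = [[a_case, b_case], [a_ctrl, b_ctrl]]

    total = a_case + b_case + a_ctrl + b_ctrl
    row_sums = [a_case + b_case, a_ctrl + b_ctrl]
    col_sums = [a_case + a_ctrl, b_case + b_ctrl]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # skip empty columns (monomorphic SNP)
                chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def single_snp_scan(
    cases: Dict[str, List[int]],
    controls: Dict[str, List[int]],
    alpha: float = GENOME_WIDE_ALPHA,
) -> List[Tuple[str, float]]:
    """Test every SNP on its own; return (snp_id, p) hits sorted by p-value."""
    hits = []
    for snp_id in cases:
        _, p = allelic_chi2(cases[snp_id], controls[snp_id])
        if p < alpha:
            hits.append((snp_id, p))
    return sorted(hits, key=lambda hit: hit[1])
```

For example, a hypothetical SNP carried homozygously by all 200 cases and by none of 200 controls yields a chi-square statistic of 800, far beyond the threshold, whereas a SNP with identical genotype distributions in both groups yields a statistic of 0 and a p-value of 1. Each SNP is tested in isolation, which is exactly why interactions between SNPs go unexamined in this design.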
More specifically, the studies presented in this thesis describe a novel methodological framework to detect patterns of epistasis (interactions among multiple SNPs) and haplotypes (SNP alleles arranged on the same chromosome) associated with complex disease traits. Such patterns may also model the regulation of trait-related gene transcription events, thereby elucidating central biomolecular mechanisms that influence disease trait pathogenesis. The framework may be summarized as follows: first, a "filter" is employed to restrict the set of investigated SNPs to those with putative biological functions; next, a novel, non-exhaustive statistical approach is implemented to discover candidate epistatic interaction and haplotype associations with disease traits among the filtered SNPs; finally, replication and biological inference analyses are conducted to assess the credibility of the complex genetic variant discoveries. Under this framework, we increase the prior probability of identifying epistatic interactions or haplotypes that are transcriptionally relevant, and, to improve power, we facilitate searches of the large space of interactions and haplotypes without limiting the number of tested associations on the basis of computational burden. Our results demonstrate the relevance of studies of epistasis in explaining the variability of bone mineral density (an integral determinant of bone health) in adult survivors of pediatric cancer exposed to bone-diminishing treatments, and the effects of haplotypes on the risk of primary biliary cholangitis (an incurable autoimmune disease of the liver) in a Japanese population.
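The filtering step of the framework can be illustrated with a small Python sketch showing how restricting the analysis to SNPs with putative biological function shrinks the space of candidate interactions. This is not the novel non-exhaustive search developed in this thesis: it assumes a hypothetical set of functionally annotated SNP identifiers, enumerates pairs exhaustively, and uses a simple Bonferroni correction (a swapped-in, illustrative choice) only to show how the multiple-testing burden scales with the number of tests.

```python
import math
from itertools import combinations
from typing import Iterable, List, Set, Tuple


def functional_filter(snp_ids: Iterable[str], functional_ids: Set[str]) -> List[str]:
    """Filter step: keep only SNPs in a (hypothetical) set of functional annotations."""
    return [snp for snp in snp_ids if snp in functional_ids]


def n_interactions(n_snps: int, order: int) -> int:
    """Number of distinct SNP combinations of a given order (2 = pairwise)."""
    return math.comb(n_snps, order)


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def candidate_pairs(filtered: List[str]) -> List[Tuple[str, str]]:
    """Enumerate all pairwise interaction candidates among filtered SNPs.

    Exhaustive enumeration is shown only for illustration; it grows
    quadratically (and much faster for higher orders) with the SNP count.
    """
    return list(combinations(filtered, 2))
```

For example, with a hypothetical 500,000 genotyped SNPs there are 124,999,750,000 pairwise combinations (a Bonferroni level of about 4 × 10^-13), while filtering to a hypothetical 5,000 functional SNPs leaves 12,497,500 pairs (about 4 × 10^-9). Three-way combinations of those 5,000 SNPs still number 20,820,835,000, which suggests why a functional filter alone may not make exhaustive higher-order searches tractable.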
We suggest that the genetic targets discovered in these analyses be considered for future basic research into biological mechanisms influencing bone mineral density and primary biliary cholangitis, with the expectation that such research will support the eventual objective of developing health applications for the prevention, diagnosis, or treatment of these conditions.

PREFACE

This thesis is an original work by Cindy Im, and is part of a larger research project led by Prof. Yutaka Yasui that received research ethics approval from the University of Alberta Health Research Ethics Board (HREB) under project name “Statistical analyses – Genome Wide Association Study”, No. Pro00042122, on August 30, 2013. As my PhD advisor, Prof. Yasui was responsible for supervising the ethical conduct of research and the overall direction of the methodological approaches implemented in this thesis.

Chapter 2 of this thesis has been published as C. Im, K.K. Ness, S.C. Kaste, W. Chemaitilly, W. Moon, Y. Sapkota, R.J. Brooke, M.M. Hudson, L.L. Robison, Y. Yasui, and C.L. Wilson, “Genome-wide search for higher order epistasis as modifiers of treatment effects on bone mineral density in childhood cancer survivors,” European Journal of Human Genetics, vol. 26, pp. 275-286. The data used for the analyses presented in Chapter 2 come from the St. Jude Lifetime Cohort Study (SJLIFE), which was conceived, designed, and implemented under the supervision of L.L. Robison, M.M. Hudson, K.K. Ness, S.C. Kaste, W. Chemaitilly, and C.L. Wilson (funded by the National Cancer Institute, #U01 CA195547). C.L. Wilson and W. Moon provided technical assistance by facilitating access to SJLIFE data. Under the supervision of Prof. Yasui, I was responsible for developing the study hypothesis; designing the analytic method; performing genetic data quality checks, phasing/imputation, and annotation; conducting the analysis; interpreting and summarizing the results; and composing the manuscript. C.L. Wilson provided clinical expertise throughout all project stages. All co-authors contributed critical revisions to the final manuscript.

Chapter 3 of this thesis was submitted as C. Im, W. Moon, R.J. Brooke, Y. Sapkota, and Y. Yasui, “Genome-wide search for higher order epistasis as modifiers of treatment effects on bone mineral density in childhood cancer survivors” to Advances in Neural Information Processing Systems 29. Under Prof. Yasui’s guidance, I contributed to the design of the simulation study, interpreted and summarized the results, and composed the manuscript. W. Moon provided technical assistance by running the simulation study iterations. All co-authors contributed critical revisions to the final manuscript.

Chapter 4 of this thesis has been published as C. Im, Y. Sapkota, W. Moon, M. Kawashima, M. Nakamura, K. Tokunaga, and Y. Yasui, “Genome-wide haplotype association analysis of primary biliary cholangitis risk in Japanese”, Scientific Reports, vol. 8, issue 1. The data used for the analyses presented in Chapter 4 come from the Japan PBC-GWAS (PBC: Primary Biliary Cirrhosis; GWAS: Genome-Wide Association Study) Consortium. Conception of the original study and initial data collection were led by M. Nakamura (primarily funded by the Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science program, #20590800, #23591006, #26293181). M. Kawashima provided technical assistance by facilitating access to the data, while Y. Sapkota and W. Moon supported the preliminary processing and phasing of the genotype data. Under the supervision of Prof. Yasui, I was responsible for developing the study hypothesis; performing genetic data quality checks and annotation; designing the analytic methods; conducting the analysis; interpreting and summarizing the results; and composing the manuscript. Y. Sapkota, M. Nakamura, and K. Tokunaga provided critical input on the interpretation and presentation of study results. All co-authors contributed revisions to the final manuscript.

DEDICATIONS

In loving memory of my father, Myo Soon Im.

ACKNOWLEDGEMENTS

There are many people whom I would like to thank for their contributions to this thesis. First, I would like to express my deepest appreciation for having had the opportunity to work with Prof. Yutaka Yasui, my Ph.D. program supervisor. This thesis would not have been possible without his brilliant insights, profound research perspective, and generous and thoughtful mentorship. I also sincerely thank the members of my PhD Supervisory Committee, Prof. Sambasivarao Damaraju and Prof. Irina Dinu, for their invaluable feedback. Their collaborative efforts were integral to this work, and I am thankful for their contributions to my professional development. I would also like to thank the wonderful faculty members and researchers at the University of Alberta (Profs. Keumhee Carriere Chough and Michael Kouritzin), St. Jude Children’s Research Hospital (Drs. Leslie Robison, Kiri Ness, Carmen Wilson, Yadav Sapkota, Russell J. Brooke, and Wonjong Moon), the University of Alabama at Birmingham (Dr. Noha Sharafeldin), and the University of Tokyo (Prof. Katsushi Tokunaga), who have not only influenced this work but also inspired me as a researcher and lifelong learner.

This work was supported by several institutions and funding agencies. I am indebted to St. Jude Children’s Research Hospital for providing me access to unparalleled research resources. I thank the Alberta Machine Intelligence Institute for providing me funding support throughout my PhD program.
I would like to specifically acknowledge that the research described in Chapter 2 was funded by the St. Jude Lifetime Cohort Study (U01 CA195547), the American Lebanese Syrian Associated Charities, the Rally Foundation for Childhood Cancer Research, and National Institutes of Health Grant R01CA216354, while the study described in Chapter 4 was supported by Grants-in-Aid for Scientific Research from the Japan Society for the Promotion of Science, a Grant-in-Aid for Clinical Research from the National Hospital Organization, Health Labor Science Research Grants from Research on Measures for Intractable Diseases, the Intractable Hepato-Biliary Diseases Study Group in Japan, and the Japan Agency for Medical Research and Development. Finally, I cannot express the