Identifying Genetic Pleiotropy through a Literature-wide Association Study (LitWAS) and a Phenotype Association Study (PheWAS) in the Age-related Eye Disease Study 2 (AREDS2)

Item Type: text; Electronic Thesis
Authors: Simmons, Michael
Publisher: The University of Arizona.
Rights: Copyright © is held by the author. Digital access to this material is made possible by the College of Medicine - Phoenix, University of Arizona. Further transmission, reproduction or presentation (such as public display or performance) of protected items is prohibited except with permission of the author.
Download date: 26/09/2021 00:42:07
Link to Item: http://hdl.handle.net/10150/623630

IDENTIFYING GENETIC PLEIOTROPY THROUGH A LITERATURE‐WIDE ASSOCIATION STUDY (LITWAS) AND A DEEP PHENOTYPE ASSOCIATION STUDY (DEEPAS) WITH DATA FROM THE AGE‐RELATED EYE DISEASE STUDY 2 (AREDS2)

A thesis submitted to the University of Arizona College of Medicine – Phoenix in partial fulfillment of the requirements for the Degree of Doctor of Medicine

Michael Simmons
Class of 2017

Mentor: Zhiyong Lu, PhD

Acknowledgements

We acknowledge Ayush Singhal PhD, Freekje VanAsten PhD, Tiarnan Keenan, FRCOphth, PhD, and Emily Chew MD for their support and indispensable contributions to this work. In particular, Dr. Singhal designed the text mining algorithm and conducted the LitWAS, and Dr. VanAsten performed the principal work in executing the DeePAS.

This research was supported by the NIH Medical Research Scholars Program, a public‐private partnership supported jointly by the NIH and generous contributions to the Foundation for the NIH from the Doris Duke Charitable Foundation, the Howard Hughes Medical Institute, the American Association for Dental Research, the Colgate‐Palmolive Company, and other private donors. No funds from the Doris Duke Charitable Foundation were used to support research that used animals.
Abstract

Background/Significance: Genetic association studies simplify the investigation of genotype‐phenotype relationships by considering only the presence of a given polymorphism and the presence or absence of a given downstream phenotype. Although such associations do not indicate causation, collections of phenotypes that share an association with a single genetic polymorphism may provide valuable mechanistic insights. In this thesis we explore such genetic pleiotropy with Deep Phenotype Association Studies (DeePAS) using data from the Age‐Related Eye Disease Study 2 (AREDS2). We also employ a novel text mining approach to extract pleiotropic associations from the published literature as a hypothesis generation mechanism.

Research Question: Is it possible to identify pleiotropic genetic associations across multiple published abstracts and validate these in data from AREDS2?

Methods: Data from the AREDS2 trial include 123 phenotypes, covering age‐related macular degeneration (AMD) features, other ocular conditions, cognitive function, and cardiovascular, neurological, gastrointestinal and endocrine disease. A previously validated relationship extraction algorithm was used to isolate descriptions of genetic associations with these phenotypes in MEDLINE abstracts. Results were filtered to exclude negated findings and to normalize variant mentions. Genotype data were available for 1826 AREDS2 participants. A DeePAS was performed by evaluating the association between selected SNPs and all available phenotypes. Associations that remained significant after Bonferroni correction were then evaluated for replication in AREDS.

Results: LitWAS analysis identified 9372 SNPs with literature support for at least two distinct phenotypes, with an average of 3.1 phenotypes per SNP. PheWAS analyses revealed that two variants of the ARMS2‐HTRA1 locus at 10q26, rs10490924 and rs3750846, were significantly associated with sub‐retinal hemorrhage in AMD (rs3750846: OR 1.79 [1.41‐2.27], p = 1.17 × 10⁻⁷).
This association remained significant even when the analysis was restricted to participants with neovascular AMD. Furthermore, odds ratios for the development of sub‐retinal hemorrhage in the presence of the rs3750846 SNP were similar between incident and prevalent AREDS2 sub‐populations (OR: 1.94 vs 1.75). This association was also replicated in data from the AREDS trial. None of the literature‐defined pleiotropic associations tested remained significant after multiple‐testing correction.

Conclusions: The rs3750846 variant of the ARMS2‐HTRA1 locus is associated with sub‐retinal hemorrhage. Automatic literature mining, when paired with clinical data, is a promising method for exploring genotype‐phenotype relationships.

Table of Contents

INTRODUCTION:
METHODS:
  OVERVIEW
  AREDS2 STUDY DESIGN
  AREDS STUDY DESIGN
  GENOTYPING
  PHENOTYPE DEFINITIONS
  LITERATURE‐WIDE ASSOCIATION STUDY
    Gene‐variant‐disease relationship extraction
    Normalization of variant descriptions
    Negation identification
      Overview:
      Negation Corpus
      Negation phrase and term dictionaries:
    Prioritization of SNPs for inclusion in the PheWAS
  PHENOTYPE ASSOCIATION STUDY
RESULTS:
  LITWAS
    Normalization of variant descriptions
    Negation identification
    LitWAS
  PHEWAS
    Patient characteristics
    Abbreviations: L, lutein; Z, zeaxanthin; DHA, docosahexaenoic acid; EPA, eicosapentaenoic acid
    PheWAS
FUTURE DIRECTIONS
CONCLUSIONS
REFERENCES

List of Figures and Tables

TABLES
TABLE 1. COMMON ABBREVIATIONS
TABLE 2. VARIANT NORMALIZATION EVALUATION
TABLE 3. NEGATION IDENTIFICATION EVALUATION
TABLE 4. LITERATURE‐WIDE ASSOCIATION STUDY RESULTS
TABLE 5. LITERATURE‐MINED ASSOCIATIONS WITH THE ARMS2 VARIANT, RS10490924
TABLE 6. AREDS2 BASELINE CHARACTERISTICS
TABLE 7. HTRA1‐ARMS2 (RS3750846) AND SUBRETINAL HEMORRHAGE IN AREDS2
TABLE 8. ARMS2 (RS3750846) AND HEMORRHAGE CHARACTERISTIC OF AMD IN AREDS AND AREDS2

FIGURES
FIGURE 1. STUDY OVERVIEW
FIGURE