FROM VARIANTS TO PATHWAYS: INTERROGATING THE GENETIC ARCHITECTURE OF AGE-RELATED MACULAR DEGENERATION

by

ANDREA ROSE WAKSMUNSKI

Submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Department of Genetics and Genome Sciences

CASE WESTERN RESERVE UNIVERSITY

May 2020

CASE WESTERN RESERVE UNIVERSITY

SCHOOL OF GRADUATE STUDIES

We hereby approve of the dissertation of

Andrea Rose Waksmunski

candidate for the degree of Doctor of Philosophy*.

Committee Chair

Dana C. Crawford, Ph.D.

Thesis Advisor

Jonathan L. Haines, Ph.D.

Committee Member

Thomas LaFramboise, Ph.D.

Committee Member

Mark Cameron, Ph.D.

Committee Member

Alexander Miron, Ph.D.

Date of Defense

March 30, 2020

*We also certify that written approval has been obtained for any proprietary material contained therein.

DEDICATION

To my Mom and Dad. Your love and faith shaped me into the person I am today. Thank you for your endless encouragement and teaching me to value learning, hard work, and problem-solving.

To my siblings, Alexis and Derek. Being triplets has given us a special bond. I am so honored to be your sister and eternally grateful for your unwavering love and support.

To my grandparents, Andrew and Rose Ondecko. You became angels during the first year of my Ph.D. program and have been watching over me ever since. May you continue to be my guardian angels in everything I do.

Table of Contents

LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
LIST OF ABBREVIATIONS
Abstract
CHAPTER 1
  Age-Related Macular Degeneration as a Public Health Concern
  Age-Related Macular Degeneration: Symptoms and Treatment Options
  Age-Related Macular Degeneration: Known Pathology
  Age-Related Macular Degeneration: Risk Factors
  Genetic Epidemiology of Age-Related Macular Degeneration: Methods and Populations
  The Amish as an Isolated, Founder Population
  The Amish in Genetics Research
  Summary and Unanswered Questions
CHAPTER 2
  Abstract
  Introduction
  Methods
    Study Demographics
    Blood Collection
    Nucleic Acid Extraction
    P503A Genotyping
    Amish Pedigrees
    Age at Exam
    CFH RNA Quantification and Analysis
    Modeling
    Western Blots and Quantitative Analysis
    ELISA
    Genetic Risk Score Analysis
  Results
    Identification of Additional CFH P503A Carriers in the Amish
    Age at Diagnosis by P503A Carrier Status
    CFH RNA Quantification and Analysis
    Protein Modeling
    CFH Protein Expression
    Genetic Risk Scores
  Discussion
CHAPTER 3
  Abstract
  Introduction
  Materials and Methods
    Study Participants
    Genotyping and Quality Control
    Association Analysis
    Linkage Analysis
    In Silico Functional Analysis of Genes from 1-HLOD Support Intervals of Linkage Regions
  Results
    Association Analysis
    Linkage Analysis
    In Silico Functional Analysis of Genes from 1-HLOD Support Intervals of Linkage Regions
  Discussion
CHAPTER 4
  Abstract
  Introduction
  Methods
    Study Subjects and GWAS Summary Statistics
    PARIS: Knowledge-Driven Pathway Analysis of GWAS Data
    Identification of Statistical Pathway Driver Genes
    Protein-Protein Interaction Network for Statistical Pathway Driver Genes
    Motif Analysis for Statistical Pathway Driver Genes
  Results
    In Silico Pathway Analysis
    Statistical driver genes among advanced AMD-associated pathways
  Discussion
CHAPTER 5
  Abstract
  Background
  Results
    Study Data for ADV, GA, and CNV Analyses
    Narrow-sense heritability explained by variants in SDGs for ADV, GA, and CNV
    Epistasis analyses
  Discussion
  Conclusions
  Methods
    Statistical driver genes for advanced AMD
    Variant Selection and Genotype Extraction
    Estimation of AMD heritability with GCTA GREML
    Pairwise LD analysis of SDGs and 34 AMD Loci
    Epistatic interaction analyses
CHAPTER 6
  Overview
  Further Investigation of CFH P503A and Implications for AMD Pathology
  Beyond Rare Variants and Loci for Age-Related Macular Degeneration
  Future Directions from Pathway Analyses and Statistical Driver Genes
  Considerations for Statistical Driver Genes and Missing Heritability
  Conclusion
APPENDIX
  Chapter 2 Appendix
  Chapter 3 Appendix
  Chapter 4 Appendix
BIBLIOGRAPHY

LIST OF TABLES

CHAPTER 3
Table 3.1 AMD associated variants identified with ROADTRIPS testing of Amish families.
Table 3.2 Allele frequencies for the AMD associated variants identified with ROADTRIPS testing of Amish families in outbred populations (IAMDGC and gnomAD release 2.1).
Table 3.3 Significant linkage loci identified from model-based multipoint linkage analyses of Amish families with disease allele frequency of 0.10.

CHAPTER 4
Table 4.1 Significantly associated pathways across multiple pathway databases for advanced AMD.
Table 4.2 Eight statistical pathway driver genes from significant KEGG, Reactome, and GO pathways.
Table 4.3 Sequence motifs with transcription factor (TF) binding sites near statistical driver genes.

CHAPTER 5
Table 5.1 Demographics of participants in the study data.
Table 5.2 Characteristics of the marker data extracted from the IAMDGC exome chip.
Table 5.3 Heritability estimates for advanced AMD (ADV), GA, and CNV based on our variant sets.
Table 5.4 Chip heritability estimates calculated by GREML for advanced AMD (ADV), GA, and CNV using quantitative covariates.
Table 5.5 Epistatic interactions from pairwise logistic regression-based epistasis testing between variants in or within 50 kb of the 2 novel SDGs (PPARA and PLCG2) and the 52 AMD-associated index variants from the 2016 IAMDGC GWAS.
Table 5.6 Epistatic interactions between variants in or within 50 kb of PLCG2 and PPARA from pairwise logistic regression-based epistasis testing.
Table 5.7 Statistical driver genes for advanced AMD identified with PARIS.

LIST OF FIGURES

CHAPTER 1
Figure 1.1 Progressive vision loss experienced by patients with advanced AMD.
Figure 1.2 The role of CFH in the complement pathway.

CHAPTER 2
Figure 2.1 All-connecting path pedigree for the 58 Amish individuals with the risk allele for CFH P503A.
Figure 2.2 Age at exam for carriers and non-carriers of the risk allele for CFH P503A.
Figure 2.3 Relative expression of CFH transcripts in carriers and non-carriers.
Figure 2.4 Visualization of protein models containing the amino acid substitution for CFH P503A in SCR8 domain of CFH.
Figure 2.5 Representative blots measuring relative CFH protein expression in plasma.
Figure 2.6 Quantification of relative CFH plasma expression measured by Western blots.
Figure 2.7 Quantification of relative CFH plasma expression measured by ELISA.
Figure 2.8 Weighted genetic risk scores for 35 of the 52 IAMDGC variants.
Figure 2.9 Highlighted counties of the state of Pennsylvania.

CHAPTER 3
Figure 3.1 All-connecting path pedigree of the 175 Amish individuals in this study.
Figure 3.2 Manhattan plot of p-values obtained from association testing using ROADTRIPS.
Figure 3.3 HLOD scores obtained from multipoint linkage analysis in MERLIN under dominant and recessive models on chromosome 8.
Figure 3.4 HLOD scores obtained from multipoint linkage analysis in MERLIN under dominant and recessive models on chromosome 18.
Figure 3.5 Conditional linkage analysis on chromosome 1 taking into account the Y402H and P503A carrier statuses.
Figure 3.6 Gene Ontology (GO) Enrichment Networks for Genes from 1-HLOD Support Intervals from (a) chromosome 8 and (b) chromosome 18.

CHAPTER 4
Figure 4.1 Comparison of significant genes from AMD-associated KEGG, Reactome, and GO pathways identified by PARIS.
Figure 4.2 PPI network generated for the proteins encoded by the eight statistical driver genes.
Figure 4.3 Identification of PLCG2 as a candidate gene for advanced AMD.
Figure 4.4 PPI network generated for PLCG2.
Figure 4.5 LocusZoom Plot of P-Values for the 65 PLCG2 variants in the IAMDGC advanced AMD case-control analysis.

ACKNOWLEDGEMENTS

“I've heard it said that people come into our lives for a reason bringing something we must learn, and we are led to those who help us most to grow if we let them and we help them in return.”

— Glinda the Good in “For Good” from WICKED: The Untold Story of the Witches of Oz

The work presented in this dissertation was molded by the inspiration, interactions, and support of many individuals. My research career began as a part of my undergraduate research experiences; therefore, I would like to thank my mentors at Juniata College, Vincent Buonaccorsi, Regina Lamendella, and Daniel Dries, and Terry Gaasterland at the University of California, San Diego for helping me realize my passion for research and setting a strong foundation for my graduate school career.

I also extend gratitude to the members of my thesis committee for their constructive insights and suggestions on this work: Jonathan Haines (advisor), Dana Crawford (chair), Thomas LaFramboise, Mark Cameron, and Alexander Miron. Having members of my committee from diverse backgrounds and departments encouraged me to think more broadly and critically of my work. I am also thankful for the previous and current members of the Departments of Genetics and Genome Sciences and Population & Quantitative Health Sciences for their support and positive influences on my graduate training. I especially would like to thank Anthony Wynshaw-Boris for suggesting I rotate in the Haines Lab. I also appreciate the constructive feedback and contributions of our research collaborators at the John P. Hussman Institute for Genomics at the University of Miami and the Department of Ophthalmology at the University of Pennsylvania as well as the members of the International Age-Related Macular Degeneration Genomics Consortium.

Endless thanks to the members of the Haines Lab for their years of encouragement, feedback, and support. I would especially like to thank my thesis advisor, Jonathan Haines, for being an incredible mentor and providing a balance of structured guidance and independence that greatly enriched my training. His encouraging nature also enabled me to pursue numerous opportunities in and out of the lab that shaped my personal and professional growth. Under his leadership, the Haines Lab epitomized team science and taught me the importance of collaboration. I want to thank all the previous and current members of the Haines Lab for their vital contributions to the research described herein and for positively shaping my graduate school experience, especially Jessica Cooke Bailey, Kristy Miskimen, Penny Miron, Michelle Grunin, Yeunjoo Song, Rob Igo, Sara Kennedy, Tyler Kinzy, Renee Laux, Denise Fuzzell, and Sarada Fuzzell. I am blessed to have had the opportunity to learn from and work with them.

Finally, and most importantly, I would like to thank my family for being my strongest supporters during my Ph.D. program and throughout my life. My parents always encouraged my siblings and me to pursue our dreams and instilled in us values and skills that have enriched our lives. I am also grateful to my siblings, Alexis and Derek, for their steadfast love and support. As triplets, we have always been there for one another, and I appreciate their constant encouragement. One of the joys of pursuing my Ph.D. in Cleveland was the chance to become closer to my Aunt Elinor and Uncle Francis. They provided a loving support network for me in Cleveland that helped me throughout my graduate training. I am so grateful for their positive presence in my life.

LIST OF ABBREVIATIONS

ADV advanced age-related macular degeneration

AGDB Anabaptist Genealogy Database

AIM ancestry-informative marker

AMD age-related macular degeneration

AREDS2 Age-Related Eye Disease Study 2

C2 complement C2

C3 complement C3

CARMS Clinical Age-Related Maculopathy Staging

CFH complement factor H

CGRRF1 cell growth regulator with ring finger domain 1

Chr Chromosome

CI confidence interval

cM Centimorgan

CNV choroidal neovascularization

d.f. degrees of freedom

DLGAP1 DLG associated protein 1

EGFR epidermal growth factor receptor

ELISA enzyme-linked immunosorbent assay

EM expectation maximization

EMT epithelial to mesenchymal transition

EUR European

FDA Food and Drug Administration

GA geographic atrophy

GCTA Genome-wide Complex Trait Analysis

gnomAD Genome Aggregation Database

GO Gene Ontology

GREML genomic-relatedness-based restricted maximum-likelihood

GRM genetic relationship matrix

GWAS genome-wide association study

GWIS genome-wide interaction study

h2 narrow-sense heritability

H2 broad-sense heritability

HCK HCK proto-oncogene, Src family tyrosine kinase

HGNC HUGO Gene Nomenclature Committee

HLOD heterogeneity LOD

IAMDGC International Age-Related Macular Degeneration Genomics Consortium

ITK IL2 inducible T cell kinase

Kb kilobasepairs

KEGG Kyoto Encyclopedia of Genes and Genomes

LCN9 lipocalin 9

LCP2 lymphocyte cytosolic protein 2

LD linkage disequilibrium

LIPC lipase C, hepatic type

LOAD late-onset Alzheimer’s disease

LYN LYN proto-oncogene, Src family tyrosine kinase

MAF minor allele frequency

MB megabasepairs

MBP myelin basic protein

MEGA Multi-Ethnic Genotyping Array

MEME Multiple EM for Motif Elucidation

MERLIN Multi-Point Engine for Rapid Likelihood Inference

MICA MHC class I polypeptide-related sequence A

MYP2 myopia-2

NHLBI National Heart, Lung, and Blood Institute

NOTCH4 notch receptor 4

OCT optical coherence tomography

OR odds ratio

oxLDL oxidized low-density lipoprotein

PARIS Pathway Analysis by Randomization Incorporating Structure

PBS phosphate buffered saline

PCA principal component analysis

PIK3R1 phosphoinositide-3-kinase regulatory subunit 1

PLAID PLCG2-associated antibody deficiency and immune dysregulation

PLCG2 phospholipase C-gamma-2

PPARA peroxisome proliferator activated receptor alpha

PPI protein-protein interaction

QC quality control

QQ quantile-quantile

RA rheumatoid arthritis

RAD51B RAD51 paralog B

RCA regulation of complement activation

REML restricted maximum-likelihood

RIPA radioimmunoprecipitation assay

ROADTRIPS RObust Association-Detection Test for Related Individuals with Population Substructure

RPE retinal pigment epithelium

RTEL1 regulator of telomere elongation helicase 1

RTEL1-TNFRSF6B RTEL1-TNFRSF6B readthrough

SCR short consensus repeat

SDG statistical driver gene

SD-OCT spectral domain optical coherence tomography

SE standard error

STRING Search Tool for Recurring Instances of Neighbouring Genes

SYK spleen associated tyrosine kinase

TEC Tec protein tyrosine kinase

TF transcription factor

TOPMed Trans-Omics for Precision Medicine

UTR untranslated region

VANTAGE Vanderbilt Technologies for Advanced Genomics

VEGF vascular endothelial growth factor

WHO World Health Organization

From Variants to Pathways: Interrogating the Genetic Architecture of Age-Related Macular Degeneration

Abstract

by

ANDREA ROSE WAKSMUNSKI

Vision loss is a highly feared medical condition because of its life-altering effects. Age-related vision loss is a mounting public health concern due to the growth of the elderly population. Age-related macular degeneration (AMD) is the leading cause of visual impairment in adults over 60. Family and twin studies provided significant evidence for the influence of genetic factors on AMD risk. The largest genome-wide association study (GWAS) for AMD identified 52 genomic variations associated with advanced AMD (ADV), but these variants only account for about two-thirds of AMD heritability. Therefore, we hypothesize that additional genomic loci contribute to AMD. Furthermore, GWAS alone do not directly implicate biological consequences for the associated variants.

In this work, we leveraged data from the Amish population and the International AMD Genomics Consortium (IAMDGC) to interrogate the genetic architecture of AMD. Studying the Amish population enabled us to characterize a rare AMD risk variant in complement factor H (CFH P503A) and to uncover novel genomic loci for AMD in the Amish. We also built upon the known AMD loci by performing pathway analyses of the IAMDGC GWAS data. Using multiple pathway databases in our analyses, we identified biological pathways in which nominally associated AMD variants aggregated. We also computationally characterized genes that were consistently contributing to our significant pathway signals, including two novel AMD loci (PPARA and PLCG2). Variants from these statistical driver genes do not strongly contribute to ADV heritability. However, our epistasis analyses identified modest interactions between the 52 IAMDGC variants and variants in PPARA and PLCG2, which led us to hypothesize that pathway analyses of GWAS data may be useful for identifying genetic variants that contribute to AMD in a non-additive manner.

This work demonstrates the utility of analyzing the genetics of a complex trait in an isolated population like the Amish and in a large dataset generated by an international consortium. We also highlight the importance of understanding biological contexts of genetic variation for AMD. We hope that our work will inspire future multidisciplinary studies to better understand the genetic architecture of AMD and to develop novel therapeutics for AMD patients.

CHAPTER 1

Introduction

Age-Related Macular Degeneration as a Public Health Concern

Humans experience the world through their senses, and of these senses, many individuals value sight as the most important because it affects every aspect of their lives and greatly shapes how they perceive their environment. Vision results from the transmission of light and other signals from the eye to the brain and enables us to see, read, and perform other daily tasks (1). Visual impairment occurs when this process is diminished or completely disrupted and can severely diminish an individual’s quality of life (2). Individuals with visual impairment are more likely to experience physical disabilities and injuries in their lifetime (3, 4). Adult-onset conditions that result in visual impairment contribute to over $35 billion in financial costs, including medical costs and loss of productivity (5). Medical costs can include outpatient services such as doctor and hospital visits as well as costs associated with prescription drugs, vitamin supplements, and other medications (5).

According to the 2019 World Report on Vision published by the World Health Organization (WHO), at least 2.2 billion individuals are blind or have some type of visual impairment (1). These cases of visual impairment are mostly attributable to lifestyle changes, the inability to access eye care, and the increasing size of the aging population (1). In modern society, we are privileged with medical, technological, and societal advancements that have enabled us to have longer life expectancies. This increase is attributable to reductions in both infant mortality and mortality rates over age 80 (6). However, increasing age leads to the increasing prevalence of age-related conditions as morbidity shifts toward chronic, non-communicable conditions (6). According to the WHO, “those who live long enough will experience at least one eye condition during their life” (1). Also, it has been estimated that adults over 65 comprise the fastest growing demographic in the United States (7). Taken together, this information suggests that age-related eye conditions will become more prevalent, and the challenges they pose more pronounced, in the coming years (8).

Age-related macular degeneration (AMD) is an adult-onset eye condition that leads to the progressive loss of central vision. This disrupts the patients’ ability to read, drive, recognize faces, and perform other essential daily tasks, which can severely impact their quality of life (2). It is the third-leading cause of blindness in the world after cataracts and glaucoma, accounting for 8.7 percent of the global cases of blindness (9-11). AMD is the leading cause of blindness in individuals over 60 (9, 10). In 2010, it was determined that about 2 percent of adults in the United States over age 50 had AMD (12). In 2004, the total medical costs for individuals with AMD were almost $600 million (5). In addition, it is expected that nearly 300 million individuals in the world will have AMD in 2040 (11). Therefore, with the increasing size of the aging population, AMD will become a significant socioeconomic and public health burden.

Age-Related Macular Degeneration: Symptoms and Treatment Options

Patients with AMD experience the loss of their central vision (Figure 1.1) because of the deterioration of the cell layers in the macula, which is part of the retina in the posterior part of the eye (13). Although some individuals only have AMD symptoms in one eye, in most cases (80 percent), individuals have AMD in both eyes (14, 15). AMD is clinically classified into two main subtypes: early and advanced AMD. Early AMD is characterized by the presence of drusen and pigmentary changes in the macula (13). Drusen are a hallmark of the aging retina and are considered a risk factor for AMD (10, 16, 17). These insoluble aggregates of extracellular waste are composed of lipids, proteins, and other cellular debris that accumulate in the region between the retinal pigment epithelium (RPE) and Bruch’s membrane in the macula (10).

Figure 1.1. Progressive vision loss experienced by patients with advanced AMD. AMD is characterized by the loss of central vision over time as the disease progresses to advanced stages. (A) Vision in an unaffected individual. (B) Vision of a patient with advanced AMD. Images were generated using the Impairment Simulator (http://www.inclusivedesigntoolkit.com/simsoftware/simsoftware.html).

Advanced AMD is further sub-classified based on the presence of geographic atrophy (“dry AMD”, GA) or choroidal neovascularization (“wet AMD”, CNV) in the region (10, 13). GA refers to the loss of photoreceptors and retinal pigment epithelial (RPE) cells in the macula following the accumulation of drusen and the expansion of the region between the RPE and Bruch’s membrane, which disrupt the flow of nutrients between the RPE and the photoreceptors (18). This subtype affects 80 percent of AMD patients (19). CNV accounts for about 10 percent of all AMD cases and results from the irregular formation of leaky blood vessels that bleed into the cell layers outside of the choroid (9, 10, 15). The nomenclature for this subtype has recently been updated to “macular neovascularization” to reflect the observation that vascularization can occur in the outer retina in this advanced stage of disease (15). Although early stages of AMD and GA are often asymptomatic, CNV is clinically characterized by the rapid loss of vision experienced by the patient (9, 10, 19).

Individuals at risk for developing AMD are encouraged to monitor slight or abrupt changes in their vision using the Amsler grid (20). The first method for assessing the health of the retina and diagnosing AMD was color fundus photography (13). Additional, more invasive techniques such as fluorescein angiography and indocyanine green angiography were developed to assess pigmentation and neovascularization in the macula (21). Fundus autofluorescence was developed as a noninvasive diagnostic tool, especially for monitoring the presence of geographic atrophy (19). More recently, the use of optical coherence tomography (OCT) has become common for imaging the retina (19). In contrast to previous diagnostic tools for AMD, OCT is a noninvasive, high-resolution technique that provides a cross-sectional view of the cell layers in the retina (21).

While there are numerous methods to monitor the presence and progression of AMD, there are few prophylactic or treatment options available for AMD patients (19). The Age-Related Eye Disease Study 2 (AREDS2) suggested that the consumption of antioxidant vitamins and mineral supplements may help slow the progression to advanced AMD subtypes (10, 19); however, these supplements are not helpful if a patient has already been diagnosed with advanced AMD (10). The AREDS2 formula includes antioxidants, zinc, lutein, zeaxanthin, and omega-3 fatty acids (22). For CNV, patients can be given anti-vascular endothelial growth factor (anti-VEGF) treatments to ablate the formation of the leaky blood vessels in the macula (23). Since the first anti-VEGF treatment was approved by the United States Food and Drug Administration (FDA) in 2004 and subsequent landmark clinical trials in 2006 showed that another similar treatment could be effective, anti-VEGF treatments have reduced the prevalence of blindness due to AMD by 50 percent (21). The first anti-VEGF treatment to be approved for AMD was pegaptanib (Eyetech Pharmaceuticals/Pfizer) followed by ranibizumab (Genentech/Novartis), bevacizumab (Genentech), aflibercept (Regeneron/Bayer), and brolucizumab (Novartis) (23, 24). Monthly intraocular injections of these treatments stabilize and, in some cases, improve vision in patients with CNV (19).

Age-Related Macular Degeneration: Known Pathology

AMD culminates following deleterious changes to the macula. The initial stages of AMD are largely shaped by the buildup of drusen deposits in the macula (23). While about 40 percent of drusen composition includes neutral lipids like esterified cholesterol, drusen also include collagen, glycoproteins, complement proteins, apolipoproteins, and metal ions like zinc and iron (25, 26). Drusen disrupt RPE function and impede the transport of cellular waste from the photoreceptors through the RPE and across Bruch’s membrane (23). These physiological changes can lead to RPE and photoreceptor cell death and geographic atrophy (26). Therefore, pathways and genes involved in lipid metabolism and transport, extracellular matrix organization, and apoptosis are believed to contribute to AMD pathology.

Additional staples of AMD pathophysiology include inflammation and angiogenesis. Equilibrium between pro-inflammatory and anti-inflammatory factors yields healthy homeostasis in the retina; however, aging diminishes the retina’s capacity to sustain this balance, which can lead to systemic and local para-inflammation (6). Distress of the RPE also promotes an inflammatory response in the retina that may initiate the production of angiogenic factors such as VEGFA and TGFBR1 (23, 27). Angiogenesis is a central biological process in the disease etiology of CNV and is the target of anti-VEGF therapies (23). It has also been suggested that angiogenesis occurs in response to atrophy as a means to repair the damaged macular tissue, analogous to wound healing (28).

Members of the complement pathway, especially complement factor H (CFH), have been repeatedly implicated in AMD etiology (29). CFH is a key inhibitor of the alternative pathway of complement activation in the innate immune system (30). In the complement pathway, the C3b protein is deposited on surfaces of cells marked for complement attack (31). As depicted in Figure 1.2, CFH blocks the activity of this membrane-bound protein on RPE cells by recruiting another complement factor (CFI) that cleaves C3b into its inactive form or by dissociating CFB from C3b (31). Perturbations in this protein are believed to obstruct the regulation of the complement pathway and promote immune responses in the retina that contribute to the pathology of AMD (30).

Figure 1.2. The role of CFH in the complement pathway. CFH inhibits C3b activity by engaging CFI, which cleaves C3b into its inactive form (iC3b), or by disrupting the interaction between CFB and C3b. This mechanism protects RPE cells from complement attack. Adapted from (31).

Age-Related Macular Degeneration: Risk Factors

AMD is a multifactorial condition with disease susceptibility and development driven by both environmental and genetic factors (10). The environmental factors, which contribute to about 20 to 40 percent of AMD risk, include age, cigarette smoking, hypertension, and consuming a high-fat diet (21, 26, 32). Because AMD is an age-related condition, its primary non-modifiable risk factor is age (33). The prevalence of AMD drastically increases in individuals over 75, especially those of European descent (11). Smoking is a significant modifiable risk factor for AMD because it increases an individual’s risk for developing AMD by about two-fold, and smoking cessation is associated with reduced AMD risk (34-36). The consumption of a diet high in saturated, monounsaturated, polyunsaturated, and vegetable fats and cholesterol confers risk for AMD, whereas consumption of foods high in omega-3 fatty acids reduces risk of AMD progression (22, 37-43). Epidemiological studies have also shown that hypertension increases risk for developing AMD (44-48).

The role of genetics in AMD risk was recognized following the results of family aggregation studies (49-52) and twin studies (36, 53-55). Individuals with a first-degree relative with AMD were about twice as likely to develop AMD as individuals with no affected first-degree relatives (50-52). There was also a higher concordance of AMD between monozygotic twins compared to dizygotic twins (36, 53-55). Twin studies estimated that 46-71 percent of advanced AMD variance is attributable to genetic factors (36). In contrast, early AMD heritability is estimated to be around 35-55 percent based on twin studies (54). These studies suggested that AMD clustered in families and that shared environmental and/or genetic factors contribute to its development. To date, 52 independent genomic variants from 34 distinct loci have been identified for AMD and explain more than half of AMD heritability (56).

Genetic Epidemiology of Age-Related Macular Degeneration: Methods and Populations

Following the conclusion that a trait is heritable, the aim of genetics research becomes isolating its genetic determinants. Trait heritability describes the proportion of phenotypic variance explained by the variation in genetic factors in a population (57). Although total genetic variance can be attributable to dominance effects, additive effects, and epistatic effects, narrow-sense heritability (h2) is only shaped by genetic variation with additive effects on the phenotypic variation (57). Genetic variants that contribute to the heritability of a trait can greatly vary in frequency as a consequence of their effect on the trait or due to allelic heterogeneity (58). For instance, high-effect variants on a disease trait may be rare due to selection as evidenced by the contribution of rare, highly penetrant variants to Mendelian disorders (59, 60). This is not necessarily the case with late-onset conditions like AMD because they manifest several years after reproductive age (58). However, rare variants may contribute to common disease etiology if mutation rates are high enough to keep them in the population and they have deleterious effects on the trait (58).
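The distinction between broad-sense and narrow-sense heritability drawn in this section can be written compactly with the standard variance decomposition from quantitative genetics. This is a sketch rather than a formula taken from the cited sources. The symbols V_P, V_G, V_E, V_A, V_D, and V_I are notation assumed here, and the decomposition assumes no gene-environment covariance or interaction.

```latex
\begin{align}
V_P &= V_G + V_E, \\
V_G &= V_A + V_D + V_I, \\
H^2 &= \frac{V_G}{V_P}, \qquad
h^2 = \frac{V_A}{V_P} \le H^2 .
\end{align}
```

Here V_P is the total phenotypic variance, V_G and V_E are its genetic and environmental components, and V_A, V_D, and V_I are the additive, dominance, and epistatic components of the genetic variance. Under these assumptions, variance that is purely non-additive (V_D and V_I) counts toward H2 but not toward h2.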

Genetic factors for AMD were first investigated with linkage analyses in AMD-affected families. These analyses were powerful for isolating broad regions of the genome that co-segregated with the trait locus in families (61). Although these studies identified several genomic loci linked to AMD status such as chromosomes 1q31 and 10q26 due to their strong effect on the phenotype, they were mostly unsuccessful in isolating individual genetic variants as risk factors for AMD (62-69). This was attributable to limitations of the technology and coverage of the genome at the time as well as the observation that linkage analyses were under-powered to identify genetic variants of low to modest effect (60). By contrast, genome-wide association studies (GWAS) capitalized on the common disease-common variant hypothesis, which suggests that the genetic etiologies of common diseases are shaped by multiple common variants (70, 71). Consequently, GWAS were well-powered to identify common variants with modest effects for a trait but were often underpowered to identify low-frequency and rare disease variants, which could have a range of effects on disease risk (60).


GWAS compare allele frequencies of hundreds of thousands to millions of genetic variants between cases and controls (72). Therefore, there is a high threshold for statistical significance to account for multiple testing in GWAS. The Bonferroni-corrected significance threshold for GWAS is p < 5 × 10⁻⁸ (73, 74). To identify statistical associations that reach this conservative threshold, GWAS require large sample sizes to detect variants of modest effect (75). The first GWAS in human genetics were performed for AMD and published in 2005 (76-78). In these studies, investigators identified a significant association between AMD and a variant (Y402H, rs1061170) in the complement factor H gene (CFH) (76-78). In addition to being significantly overrepresented in cases compared to controls, this risk variant predominantly occurred within high-risk haplotypes for AMD and was significantly over-transmitted in family-based data (76-78). The second major risk variant (A69S, rs10490924) for AMD was identified in LOC387715, which was later referred to as the ARMS2/HTRA1 locus (79). Subsequent association studies identified additional risk loci for AMD in genes involved in the complement pathway (C2, C3, C9, CFB, and CFI), angiogenesis (TGFBR1 and VEGFA), cell survival (RAD51B and TNFRSF10A), lipid metabolism (APOE, CETP, and LIPC), and extracellular matrix organization (ADAMTS9, B3GALTL, COL8A1, COL10A1, and TIMP3) (26, 80, 81). To identify novel risk loci for AMD, investigators from around the world pooled genetic data from 26 studies to create the International AMD Genomics Consortium (IAMDGC) dataset of about 50,000 individuals and over 11 million variants (56). Leveraging this dataset for AMD gene discovery, the largest GWAS for AMD (16,144 cases and 17,382 controls) was published in 2016 and identified 52 common and rare genomic variants for AMD including one variant near MMP9 that was specifically associated with CNV (56). The known genetic variants explain more than half of AMD heritability; therefore, a portion of the genetic underpinnings of AMD is not attributable to known genetic factors (56).
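The relationship between the number of tests and the genome-wide significance threshold, together with a basic allelic case-control comparison, can be sketched as follows. This is a minimal illustration rather than the analysis pipeline of any study cited here: the function names and allele counts are hypothetical, the threshold assumes one million independent tests at a family-wise error rate of 0.05, and the test is a plain Pearson chi-square on a 2 × 2 table of allele counts without covariates.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def allelic_chi_square(case_risk, case_other, control_risk, control_other):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Returns (statistic, p_value).
    """
    n = case_risk + case_other + control_risk + control_other
    row_case = case_risk + case_other
    row_control = control_risk + control_other
    col_risk = case_risk + control_risk
    col_other = case_other + control_other
    denominator = row_case * row_control * col_risk * col_other
    if denominator == 0:
        raise ValueError("every row and column of the table must be non-empty")
    cross = case_risk * control_other - case_other * control_risk
    statistic = n * cross ** 2 / denominator
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


threshold = bonferroni_threshold(0.05, 1_000_000)

# Hypothetical allele counts: 1,000 cases and 1,000 controls (2,000 alleles each).
stat, p_value = allelic_chi_square(600, 1400, 400, 1600)
print(f"threshold = {threshold:.1e}, chi-square = {stat:.2f}, p = {p_value:.2e}")
print("genome-wide significant:", p_value < threshold)
```

Because the threshold shrinks in proportion to the number of tests, a difference in allele frequency must be large, or the sample must be large, before a single variant clears it.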

Following the identification of additive contributions to AMD heritability with GWAS, the focus shifted to pathway-based approaches and non-additive interactions among genomic loci and environmental factors. Gene-environment interaction analyses provided evidence for interactions between genetic variants and environmental factors, such as smoking, in AMD risk (82-84). Risk attributable to genes defined in eight AMD-related pathways in the Gene Ontology (GO) database was assessed using Genome-wide Complex Trait Analysis (GCTA) on genetic data from 1,145 AMD cases and 668 controls (85). Genes from complement and other immune-related pathways strongly contributed to AMD risk, but other pathways like angiogenesis and apoptosis did not (85). Pathway analyses of nominally significant AMD genes from gene-based tests of individuals with extreme AMD phenotypes found enrichment for lipid metabolic pathways (86). As a part of their 2016 GWAS, the IAMDGC performed functional enrichment analysis with three different pathway databases and discovered that several extracellular matrix, immune, and lipid metabolic pathways were enriched in genes from the 34 susceptibility loci (56). A large genome-wide interaction study (GWIS) of the IAMDGC data identified two additional AMD loci near the CLUL1 and RLBP1 genes by accounting for gene × age interactions (87).

In contrast to case-control analyses of unrelated individuals in a population, the use of families in GWAS reduces the need for large sample sizes to identify disease-associated variants (60). If a causal variant persists with high frequency in affected families relative to the general population, the statistical power of the analysis to detect it increases (88). Additionally, rare variants that may be difficult to find in population-based case-control studies could be enriched in densely affected families (60). These benefits are amplified in population isolates, like the Amish, that are composed of potentially large, affected families (89, 90).

The Amish as an Isolated, Founder Population

The Amish are a genetically isolated and socially segregated population located in settlements across North and South America. Individuals in this community separate themselves from modern society and abstain from the use of modern technology including automobiles, electricity, and telephones (91, 92). This religious sect diverged from a broader Swiss Anabaptist movement in Europe in the 1690s (93, 94). Anabaptists, such as the Amish and Mennonites, practice adult baptism because they believe that the decision to join the faith should be made intentionally by adults, rather than on behalf of infants as is done in Catholicism (93). In 1693, a Mennonite Bishop named Jakob Ammann initiated the severance of the Amish from the Mennonites because he insisted on enforcing the shunning (called Meidung) of faith dissenters (91).

Since their founding during the Protestant Reformation, the Anabaptists were persecuted as heretics in Europe (91). In Pennsylvania, Anabaptists like the Amish were promised asylum by William Penn; consequently, about 200 Amish individuals immigrated to Lancaster County, Pennsylvania during the 1700s (91). Another migration of Amish immigrants from Europe, called the Alsatian immigrants, occurred in the 1800s and resulted in Amish settlements in Somerset County, Pennsylvania as well as in Ohio and Indiana (93). Amish individuals from Somerset County, Pennsylvania also moved to Ohio and Indiana in the 1800s to gain more farmland (95). Today, the three largest Amish settlements in the United States are in Lancaster County, Pennsylvania; Holmes County, Ohio; and LaGrange and Elkhart Counties, Indiana (95). Because the Amish married within their faith but were geographically separated, marriages between sub-populations of the Amish in North America were uncommon, which resulted in well-defined sub-isolates, such as the Midwest and Lancaster County Amish populations, with distinctive family surnames and disease incidence (93). Now, due to further movement and resettlement, Amish settlements can be found in 32 states, Canada, and 2 countries in South America (92, 96). Although some people leave the faith group, this population continues to expand today. As a consequence of the large families and high baptism or retention rates, the population doubling time for the Amish is about 21 years (97). By 2050, it is predicted that there will be over 900 settlements with over 900,000 Amish people in total (97).

The Amish constitute an isolated population because they originated from a small number of “founders” who separated from the parent population in a “founding event” and maintained their closed population for subsequent generations (89). The exodus of Amish individuals from Europe to North America several hundred years ago created a genetic bottleneck and established an isolated population that has been sustained across generations (89). A few hundred Amish individuals served as founders for the entire Amish population in North and South America; therefore, any two present-day Amish individuals share many chromosomal regions identical-by-descent from a restricted number of ancestors (91, 98). Because of the founder effect and the acceptance of second cousin marriages in the population, Amish communities exhibit levels of consanguinity indicative of third or fourth cousins regardless of their true familial relationships (93).


The cultural isolation experienced by the Amish reinforces their practice of endogamy and perpetuates limited gene flow into and out of the population (89). Specifically, while some individuals leave the Amish community, new memberships are not actively encouraged, and marriages to non-members are prohibited (93). Additionally, the de novo mutation rate is low in the Amish based on population-level data in the National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program (99). Therefore, little novel genetic diversity is introduced to the group, and the resulting population is genetically homogeneous (100).

The Amish in Genetics Research

Endogamy, bottlenecks, and selection in an isolated population, like the Amish, contribute to a decrease in genetic variability (100). This subsequently results in an increased prevalence of otherwise rare alleles in the population and an increased incidence of recessive genetic disorders (89). This also includes the enrichment of alleles that are extremely rare in the European “parent population” (89). For instance, the prevalence of particular autosomal recessive disorders is higher in the Amish compared to the general population of European descent (89). Some of these conditions are so rare that they only exist in specific sub-isolates of the Amish. For example, Troyer syndrome and Ellis-van Creveld syndrome are distinctive to the Amish and not found in the general population (91). Additionally, Amish lethal microcephaly has only been diagnosed in the Amish from Lancaster County, Pennsylvania (101). Protective alleles can also aggregate in the Amish population. For instance, a null variant (R19X) in the apoC-III gene (APOC3) that is associated with favorable lipid levels is present in the Lancaster County Amish population but rare in the general population (102, 103). Therefore, the potential enrichment of rare alleles in the Amish and their extensive genealogical records dating back to the founding immigrants from Europe in the 1700s and 1800s make them a valued population for genetics research (93).

The Amish became a central focus in genetics research following the pioneering work of Dr. Victor McKusick in the 1960s (93). While initial medical genetics studies in the Amish were focused on rare, autosomal recessive disorders (104), it became recognized that studying the genetics of complex traits, such as longevity (105-108), aging (109-112), Parkinson’s disease (113-117), Alzheimer’s disease (118-122), dementia (123-126), and refractive error (127-133), in the Amish could be highly informative. In addition to their reduced genetic variability, the Amish adhere to a conservative lifestyle that is uniform across most Amish settlements (91, 93, 134). As a part of their conservative lifestyle, tobacco use is highly discouraged; therefore, cigarette smoking, which is the main modifiable risk factor for AMD and other complex diseases, is significantly less common in the Amish than in non-Amish communities (135). Therefore, the Amish population is more genetically and environmentally homogeneous relative to the general population of European descent in the United States.

AMD has been repeatedly examined in the Amish to gain information about the clinical manifestation of AMD as well as to identify genetic factors associated with AMD in the Amish. Genetic risk scores of the then-known 19 AMD risk variants demonstrated that the Amish have a lower genetic burden for known AMD variants compared to the non-Amish population of European descent (136). This observation combined with their lower smoking rates might lead one to suspect that they have a lower prevalence of AMD compared to the general population. However, the prevalence of AMD in the Amish is comparable to that observed in the general population of European descent (unpublished). Whole exome sequencing and subsequent association analysis led to the discovery of a rare AMD risk variant in 19 individuals from Ohio and Indiana Amish communities (136). This risk variant (CFH P503A) was significantly associated with AMD and not found in a non-Amish cohort of 791 elderly controls and 1,456 AMD cases (136). Additionally, Amish individuals with extreme AMD phenotypes and genetic risk scores for the 19 then-known AMD variants were analyzed to identify potentially causal, rare variants for AMD, but this analysis failed to uncover any novel significant loci (86).
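A genetic risk score of the kind mentioned above is often computed as a weighted sum of risk-allele counts. The sketch below shows one common formulation, in which each variant is weighted by the natural logarithm of its odds ratio; this weighting is an assumption for illustration and is not necessarily the scheme used in the cited studies. The variant names, allele counts, and odds ratios are hypothetical.

```python
import math


def genetic_risk_score(allele_counts, odds_ratios):
    """Sum of (risk-allele count x ln(odds ratio)) over all variants.

    allele_counts: dict mapping variant name -> 0, 1, or 2 copies of the risk allele
    odds_ratios:   dict mapping variant name -> per-allele odds ratio
    """
    if set(allele_counts) != set(odds_ratios):
        raise ValueError("allele counts and odds ratios must cover the same variants")
    return sum(
        allele_counts[variant] * math.log(odds_ratios[variant])
        for variant in allele_counts
    )


# Hypothetical genotypes and effect sizes for three placeholder variants.
counts = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
effects = {"variant_a": 2.0, "variant_b": 1.5, "variant_c": 0.5}
score = genetic_risk_score(counts, effects)
print(f"genetic risk score = {score:.3f}")
```

Under this formulation, protective alleles (odds ratio below 1) contribute negative weights, so a lower score indicates a lower burden of known risk alleles.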

Evaluating genetic factors contributing to measurable endophenotypes of AMD pathology such as choroidal thickness and drusen volume was hypothesized to be a useful way to uncover novel biomarkers for AMD and to better understand the genetic etiology of AMD (137). The Amish Eye Study was started to uncover novel genetic variants and biomarkers for AMD in the Amish and estimate the heritability of these potential AMD biomarkers (95). Participants for this study were ascertained from Amish communities in Pennsylvania, Ohio, and Indiana if they were at least 50 years old and reported having at least one relative with AMD (95). Once enrolled, they underwent an eye examination, interview, and blood draw for DNA extraction (95). Based on spectral domain OCT (SD-OCT), the following AMD endophenotypes were characterized in the retinas of Amish study participants with and without AMD: choroidal thickness, subretinal drusenoid deposits, drusen volume, and geographic atrophy area (95). Using data from the Amish Eye Study, investigators found that deposits of reticular pseudodrusen appeared to change the choroidal vasculature, thickness, and intensity (138). Choroidal thickness was shown to be heritable in the Amish and to correlate modestly with AMD status (137). Analyses have also been performed in the Amish to assess the association between known AMD variants and drusen measurements taken from OCT (139). Additionally, retinal sensitivity was characterized using data from the Amish Eye Study to elucidate the effects of AMD on rod and cone function (140). The large number of studies in the Amish population demonstrates their invaluable contributions to decades of genetics research, including research on the genetic etiology of AMD.

Summary and Unanswered Questions

Vision loss is an impactful health burden and feared medical condition because of its pervasive effects on everyday life. AMD is a leading cause of vision loss in the world and will increase in prevalence with the surge of the aging population. Through family and twin studies, AMD was established to have a strong genetic component with 46-71 percent of the phenotypic variation being explained by genetic variation. In the last 15 years, numerous genetic loci have been identified for AMD risk, but the estimated heritability of AMD is not fully explained by known genetic variants. Additionally, the biological implications of the known AMD risk variants are mostly unexplored. Therefore, the genetic etiology of AMD and the biological effects of its risk variants are significant problems that need to be addressed.

In this work, we aimed to identify new genetic risk loci for AMD and to understand how these genetic changes might lead to AMD using methods from the fields of genetics, epidemiology, biostatistics, and bioinformatics. These approaches were applied to a large AMD case-control dataset from the International AMD Genomics Consortium (IAMDGC) and to genetic data generated from Amish populations in Ohio, Indiana, and Pennsylvania. In Chapter 2, we interrogated the functional consequences of a known AMD risk variant in the Amish. In Chapter 3, we utilized well-established genetic epidemiological methods to identify novel genetic variants and loci associated with AMD in the Ohio and Indiana Amish populations. In Chapter 4, we broadened our perspective beyond individual GWAS results to interrogate the pathways that may aggregate genetic variants for AMD and the statistical driver genes (SDGs) that drive the associations of these pathways. In Chapter 5, we calculated heritability estimates for the genetic variants within the SDGs identified in Chapter 4 and searched for evidence of epistatic interactions among them and known variants for AMD. In Chapter 6, we synthesize the information garnered from our Amish and IAMDGC-based studies described herein and consider how our results, which span from individual variants to complex biological pathways, contribute to the genetic architecture of AMD.


CHAPTER 2

Consequences of a rare complement factor H variant

for age-related macular degeneration in the Amish

Andrea R. Waksmunski¹,²,³, Kristy Miskimen³, Yeunjoo E. Song³, Michelle Grunin²,³, Renee Laux³, Denise Fuzzell³, Sarada Fuzzell³, Larry D. Adams⁴, Laura Caywood⁴, Michael Prough⁴, Alexander Miron⁵, Dwight Stambolian⁶, William K. Scott⁴, Margaret A. Pericak-Vance⁴, and Jonathan L. Haines¹,²,³

¹Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.; ²Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, U.S.A.; ³Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.; ⁴John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, U.S.A.; ⁵PlexSeq Diagnostics, Cleveland, Ohio, U.S.A.; ⁶Department of Ophthalmology, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.

A modified version of this chapter will be submitted for publication.


Abstract

Purpose: Genetic variants in the complement factor H gene (CFH) have been consistently implicated in age-related macular degeneration (AMD) risk. However, their functional effects are not fully characterized. A rare, AMD-associated variant in CFH (P503A, rs570523689) was identified in 19 Amish individuals, but its functional consequences have not yet been investigated.

Methods: We performed genotyping for the P503A risk variant in 1,326 Amish individuals to identify additional carriers. We also examined differences in age at AMD diagnosis (as a proxy for age of onset) and in genetic risk scores based on 35 of the 52 known AMD-associated variants between carriers and non-carriers. We quantified RNA and protein expression for CFH in blood samples from Amish carriers and non-carriers. Potential changes to the protein structure were interrogated computationally with the Phyre2 and Chimera software programs.

Results: We identified an additional 39 carriers who were relatives of the original 19 carriers. All 58 Amish individuals with the risk allele for CFH P503A are from Ohio or Indiana and are related through 12 common ancestors. While on average the carriers were younger when they were diagnosed with AMD, there were no significant age differences between AMD-affected carriers and non-carriers. Genetic risk scores were also not significantly different between carriers and non-carriers. CFH transcript and protein levels in blood samples from Amish carriers and non-carriers were also not significantly different.

Conclusions: We have identified 58 carriers of the risk allele for CFH P503A in the Ohio and Indiana Amish. We did not observe significant differences in AMD age of onset, genetic risk scores, or expression levels of CFH transcripts or CFH protein in blood samples from carriers and non-carriers. Our in silico protein modeling showed slight changes in CFH protein conformation that were predicted to alter interactions between residue 503 and neighboring residues. This suggests that CFH P503A may affect CFH binding or function rather than expression.

Introduction

Age-related macular degeneration (AMD) is among the leading causes of irreversible vision loss in the world (19). Individuals with AMD experience a loss of central vision as the disease progresses with drusen accumulation, photoreceptor death, and neovascularization in the macula (13). The results of family and twin studies demonstrated that there is a genetic component to AMD risk (36, 41, 50-55). Genome-wide association studies (GWAS) performed by the International AMD Genomics Consortium (IAMDGC) identified 52 common and rare genomic variants associated with AMD in the general population of European descent (56). Additional risk variants, including rare variants, have been associated with AMD (136, 141, 142). While these studies uncovered genetic factors that may play a role in AMD risk and development, they do not necessarily interrogate the biological consequences of these genetic changes. Furthermore, the results from these studies have highlighted the need for functional studies to characterize the biological effects of these variants.

Variants in the complement factor H (CFH) gene were among the first AMD-associated variants identified through GWAS (77, 78, 143) and have been repeatedly associated with AMD risk in the general population (56). Therefore, the CFH locus is considered a usual suspect for AMD-associated variants. Perturbations in the CFH protein are suspected to alter both systemic and local complement regulation and promote immune responses in the retina that contribute to AMD (10, 144, 145). Despite this, the biological implications of these variants are not fully understood, and the development of effective complement-based therapies has remained mostly unsuccessful (29, 146).

We previously identified a rare, missense variant in CFH that was significantly associated with AMD in the Amish. Studying this population isolate offers a distinct opportunity to identify and study novel AMD variants because the Amish are more environmentally and genetically homogeneous than the general population of European descent (93). In our previous study, we observed a nuclear family in Ohio composed of several AMD-affected family members who lacked the risk allele for the common CFH Y402H risk variant (136). Therefore, we performed whole exome sequencing of these individuals and subsequently identified CFH P503A (rs570523689) as a rare risk variant for AMD (p = 9.27 × 10⁻¹³) (136). The risk allele for this variant was identified in 19 Amish individuals (12 affected, 5 unaffected, and 2 of unknown AMD status) and was computationally predicted to be damaging to the CFH protein structure and function (136). However, the functional effects of this variant were not examined.

The purpose of this study was to elucidate functional consequences of CFH P503A in the Amish, which might inspire testable hypotheses for AMD etiology and development of therapeutics that could improve the lives of AMD patients. We hypothesize that rare variants, in particular, may reveal insights into disease etiology and may be effective targets for therapeutic intervention because they are often expected to be damaging to the protein structure and perturb cellular processes.


Methods

Study Demographics

The participants for this study were recruited from the Amish populations in Ohio, Indiana, and Pennsylvania. Individuals were at least 50 years old and reported having at least one close relative with a diagnosis of AMD. Informed consent was acquired from all study subjects in accordance with the tenets of the Declaration of Helsinki. Clinical data and biological materials (samples) from study participants were collected under an IRB-approved protocol. AMD affection status was based on self-reported AMD diagnosis and clinically confirmed diagnoses from eye exams performed at each respective clinical center (Ohio, Indiana, and Pennsylvania) where optical coherence tomography (OCT) images were obtained for both eyes. Each eye was graded based on a modified Clinical Age-Related Maculopathy Staging (CARMS) system (16, 137). Individuals with a grade of 3 or higher in at least one eye were considered AMD-affected. Individuals with grades of 2 or lower in both eyes were considered unaffected.
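The affection-status rule described above reduces to a comparison on the worse of the two eye grades. The following sketch encodes only that rule; the function name is hypothetical, and the handling of missing or ungradable eyes is deliberately left out because it is not described here.

```python
def amd_affection_status(grade_right_eye, grade_left_eye):
    """Classify a participant from the modified CARMS grade of each eye.

    A grade of 3 or higher in at least one eye -> "affected";
    grades of 2 or lower in both eyes -> "unaffected".
    """
    worse_grade = max(grade_right_eye, grade_left_eye)
    return "affected" if worse_grade >= 3 else "unaffected"


# Hypothetical grade pairs for illustration.
for grades in [(1, 1), (2, 2), (3, 1), (2, 4)]:
    print(grades, amd_affection_status(*grades))
```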

Blood Collection

Peripheral blood was collected from study participants by venipuncture under an IRB-approved protocol. We collected whole blood for DNA extraction and white blood cells for protein lysates using Vacutainer EDTA Tubes (BD). Plasma was collected by spinning whole blood from EDTA tubes at 2,500 × g for 10 minutes. Whole blood for RNA extraction was collected in PAXgene Blood RNA Tubes (Qiagen).


Nucleic Acid Extraction

Genomic DNA was extracted from 1 mL whole blood aliquots using the QIAsymphony DSP DNA Kit (Qiagen) on a QIAsymphony SP automated system (Qiagen). DNA concentration and 260/280 ratio were determined using a NanoDrop Spectrophotometer (Thermo Fisher). DNA integrity was confirmed using an e-Gel Precast Agarose Electrophoresis System (Thermo Fisher) with a 1% agarose gel. RNA was extracted from PAXgene Blood RNA Tubes using the QIAsymphony PAXgene Blood RNA Kit (Qiagen) on a QIAsymphony automated system. RNA concentration and 260/280 ratio were determined using a NanoDrop Spectrophotometer.

P503A Genotyping

We performed custom TaqMan genotyping assays (Thermo Fisher) following the manufacturer’s instructions using 12.5 ng of genomic DNA per reaction and TaqMan Genotyping Master Mix (Thermo Fisher). Our assays utilized custom probes for the CFH P503A variant that assessed the presence of the C allele (non-risk, P503) compared to the G allele (risk, A503) at position 1507 in the CFH transcript. The forward primer sequence of the probe is AATTACATGTGGGAAAGATGGATGGT. The reverse primer sequence of the probe is CTTTTGTGTATCATCTGGATAATCAATACAAACAT. Each 96-well plate included at least one known carrier of the risk allele for P503A as a positive control. Blanks were also included on each plate. The assays were run on a QuantStudio 7 PCR instrument, and genotype calling was performed with the QuantStudio analysis software. Assays were run on a total of 526 Ohio Amish, 303 Indiana Amish, and 497 Pennsylvania Amish.


Amish Pedigrees

Using data from the Anabaptist Genealogy Database (AGDB) (147), we constructed both all-connecting path and all-shortest path pedigrees for the 58 Amish carriers of the risk allele. An all-connecting path pedigree depicts all possible familial relationships among persons of interest in the pedigree. An all-shortest path pedigree shows the closest relationships among these individuals. The all-connecting path pedigree was visualized using Pedigraph (148).

Age at Exam

To determine if CFH P503A carrier status has an effect on age at onset for AMD, we examined the ages of the carriers and non-carriers when they received their first AMD diagnosis. This measurement served as a proxy for age of onset of AMD. We compared the ages using a Kaplan-Meier survival curve analysis in R between the carriers (n = 31) and non-carriers (n = 795). We also used a Wilcoxon rank-sum test to evaluate significant differences among carriers and non-carriers with and without AMD.
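The product-limit calculation that underlies a Kaplan-Meier curve can be sketched in a few lines. This is a pure-Python illustration and not the R code used in this study; the ages are hypothetical, and treating participants without a diagnosis as censored at their last exam is an assumption made only for the example.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  age at first AMD diagnosis, or age at last exam if never diagnosed
    events: True if the age marks a diagnosis, False if the observation is censored
    Returns a list of (time, survival_probability) at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_removed = 0
        # Gather everyone whose observation ends at this time.
        while i < len(data) and data[i][0] == current_time:
            if data[i][1]:
                n_events += 1
            n_removed += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / n_at_risk
            curve.append((current_time, survival))
        n_at_risk -= n_removed
    return curve


# Hypothetical ages (years) for five participants.
ages = [60, 65, 65, 70, 80]
diagnosed = [True, True, False, True, False]
curve = kaplan_meier(ages, diagnosed)
for age, probability in curve:
    print(f"age {age}: {probability:.2f} remain undiagnosed")
```

In practice, one curve would be estimated per group (carriers and non-carriers) and the curves compared with a formal test, as was done in R for this analysis.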

CFH RNA Quantification and Analysis

We examined mRNA expression of CFH in whole blood samples from 7 AMD-affected carriers, 14 unaffected carriers, 5 AMD-affected non-carriers, and 14 unaffected non-carriers. This included assays targeting the three protein-coding transcripts of CFH (CFH-201, CFH-202, and CFH-206) and the large retained intron of CFH (CFH-203). First-strand cDNA synthesis was carried out using the SuperScript VILO cDNA Synthesis Kit (Invitrogen) on 500 ng of total RNA. For the real-time PCR, commercially available TaqMan Assays for CFH (Hs00962360_m1, Hs00962373_m1, and Hs00962376_m1) were used to quantify CFH-202 and the other protein-coding transcripts of CFH (CFH-201 and CFH-206). A Custom Plus TaqMan RNA Assay was designed to target CFH-203 (ARFVKWR). Commercially available assays for ACTB (Hs99999903_m1) and TBP (Hs00427621_m1) were also used for real-time PCR. Assays were prepared using the TaqMan Fast Advanced Master Mix (Thermo Fisher) and run on a QuantStudio 7 PCR instrument. Expression levels of each mRNA were determined using the ΔΔCt method and normalized to the ACTB or TBP housekeeping genes.

Expression of CFH in the Amish blood samples was quantified relative to the expression measured for ARPE-19 cells in the same assays. Pancreatic and liver cells were included in our assays as controls. We evaluated differences among relative CFH transcript expression levels from carriers and non-carriers with and without AMD using Wilcoxon rank-sum tests between group pairs and using Kruskal-Wallis tests among all groups in each assay. Four outliers were removed from our analyses; therefore, our statistical tests were based on relative expression data from 6 AMD-affected carriers, 12 unaffected carriers, 5 AMD-affected non-carriers, and 13 unaffected non-carriers.
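The ΔΔCt calculation referred to above can be written out explicitly. The sketch below assumes the standard 2^(-ΔΔCt) form with approximately 100% amplification efficiency for both the target and the reference assay; the function name and Ct values are hypothetical, with the calibrator standing in for a reference sample such as the ARPE-19 cells.

```python
def relative_expression(ct_target_sample, ct_reference_sample,
                        ct_target_calibrator, ct_reference_calibrator):
    """Fold change of a target transcript in a sample relative to a calibrator.

    Each Ct is first normalized to a housekeeping (reference) gene, then the
    sample is compared with the calibrator: fold change = 2 ** -(ddCt).
    """
    delta_ct_sample = ct_target_sample - ct_reference_sample
    delta_ct_calibrator = ct_target_calibrator - ct_reference_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: target vs. housekeeping gene in a blood sample
# and in the calibrator sample.
fold_change = relative_expression(28.0, 20.0, 25.0, 19.0)
print(f"relative expression = {fold_change}")
```

A value below 1 indicates lower normalized expression in the sample than in the calibrator, and a value above 1 indicates higher expression.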

Protein Modeling

To understand the effects of CFH P503A on CFH protein structure, we modeled P503 and A503 versions of the amino acid sequence with the Phyre2 software program (149). Specifically, we evaluated the amino acid sequence for short consensus repeat (SCR) domain 8 (SCR8) of CFH with and without the amino acid substitution at position 58 in the sequence of 62 amino acids. We used the Chimera software program (150) to examine if the amino acid substitution changed the predicted number of contacts between residue 503 and the neighboring residues. Contacts included all types of direct interactions within the protein structure, including polar and nonpolar interactions as well as favorable and unfavorable interactions.

Western Blots and Quantitative Analysis

Plasma samples from study participants were diluted 1:10 in phosphate buffered saline (PBS). We added 1 µl of diluted plasma to radioimmunoprecipitation assay (RIPA) buffer (Thermo Fisher). Laemmli Sample Buffer (BioRad) containing β-mercaptoethanol was added to a final concentration of 1X. Samples were denatured at 100°C for 10 minutes and run on Novex WedgeWell 4-20% Tris-Glycine gels (Invitrogen). We ran 5 µg of human liver whole tissue lysate (Novus Biologicals) on each gel as a control for CFH expression. Separated proteins were transferred to PVDF membranes (Thermo Fisher) and probed for CFH (Abcam ab8842 sheep polyclonal). A rabbit anti-sheep secondary antibody conjugated to HRP was used for detection (Abcam ab6747). Signal was detected using ECL Solution (Advansta), and digital images were captured on an Odyssey Imaging System (LI-COR). Complete transfer and equal protein loading were confirmed using the Novex Reversible Membrane Protein Stain Kit (Thermo Fisher).

Quantitative analyses were performed using ImageJ (151) and the following protocol (152). Briefly, blot photos were transformed into 8-bit grayscale photos, and all measurements were recorded as “Grey Mean Value”. An area to examine was set based on the largest possible band size, and the identical examination/selection area was used for every band and every background measurement on a blot. Background measurement was taken from the blank area on the blot above each band. Measurements were inverted relative to their pixel density (i.e., 255 minus the measurement taken). Matched background was subtracted from the band selection, and the ratio of the band selection to the liver control was calculated for every band for each blot separately. Analysis was performed using Microsoft Excel, and expression values were compared between groups using two-sided t-tests assuming unequal variances.
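The densitometry arithmetic described above (inversion, background subtraction, and normalization to the liver control) can be sketched as follows. The function names and gray values are hypothetical; the actual calculations in this study were carried out in Microsoft Excel.

```python
def inverted_density(mean_gray_value):
    """Invert an 8-bit mean gray value so that darker bands give larger numbers."""
    return 255 - mean_gray_value


def net_band_density(band_mean, background_mean):
    """Background-corrected band density, both measured over the same-sized area."""
    return inverted_density(band_mean) - inverted_density(background_mean)


def relative_to_control(sample_band, sample_background,
                        control_band, control_background):
    """Ratio of a sample band to the liver-control band on the same blot."""
    control_net = net_band_density(control_band, control_background)
    if control_net <= 0:
        raise ValueError("control band must be darker than its background")
    return net_band_density(sample_band, sample_background) / control_net


# Hypothetical mean gray values (0 = black, 255 = white) from one blot.
ratio = relative_to_control(sample_band=120, sample_background=220,
                            control_band=55, control_background=215)
print(f"CFH signal relative to liver control = {ratio}")
```

Normalizing each band to the control on its own blot is what allows values from different blots to be compared between groups.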

ELISA

We developed our enzyme-linked immunosorbent assay (ELISA) protocol based on previously described methods (153). We tested the accuracy of our assay on normal blood donor plasma from the Hematopoietic Biorepository and Cellular Therapy Shared Resource at Case Western Reserve University in triplicate along with purified CFH and purified liver lysate controls. Our final ELISA protocol involved liver lysate as a standard in dilutions ranging from 1:1 to 1:4096 in serial dilutions for accuracy. Analyses were completed in Microsoft Excel according to standard methods (154). Each incubation was performed on a rocker for maximum efficiency.

We diluted plasma from Amish and non-Amish study participants at 1:4096 in 50 µl per well in triplicate. Samples were double-blinded from all researchers involved in the analysis and were randomly allocated to each plate regardless of P503A genotype or AMD status. All antibodies utilized in this protocol came from Abcam. We first diluted mouse monoclonal anti-CFH (85 ng/ml, ab118820) at 1:500 in 50 µl per well. Plates were incubated overnight at 4°C for maximum capture. We washed the plates 4 times with PBS and 0.05% TWEEN. We added 100 µl of blocking solution (1% BSA in PBS) and incubated the plates for 2 hours at room temperature. Doubling dilutions of the standard (liver lysate) and plasma samples were incubated at 50 µl per well in triplicate overnight at 4°C. We added 50 µl of the sheep polyclonal anti-CFH detection antibody (1.5 µl/ml, ab8842) to each well and incubated for 3 hours at room temperature. From this point on, the protocol was carried out in darkness. We performed 4 washes and added 50 µl of the HRP conjugate antibody (145 ng/ml, ab6747) to the plates. After a 30-minute incubation at room temperature, the plates were washed 4 times. We added 100 µl of chromogen (TMB) and incubated the plates at room temperature for 1 hour. We added 100 µl of 0.5 M sulfuric acid as the stop solution to end the reaction. Plates were read immediately at 450 nm on the FluoSTAR machine. Analyses were discrete; however, tests on the original protocol development plates confirmed that 450 nm was the ideal wavelength to read. Analyses to determine relative CFH protein expression in our assays were performed in Microsoft Excel.
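As a rough illustration of how a doubling-dilution standard can be turned into relative expression values, the sketch below builds the 1:1 to 1:4096 series and interpolates a sample reading against the standard curve. The interpolation scheme (piecewise linear in log2 concentration) is an assumption made for this example, not the documented Excel procedure, and the function names are hypothetical.

```python
from math import log2


def doubling_dilutions(max_factor=4096):
    """Dilution factors 1, 2, 4, ..., max_factor (1:1 through 1:4096 gives 13 points)."""
    factors, factor = [], 1
    while factor <= max_factor:
        factors.append(factor)
        factor *= 2
    return factors


def relative_concentration(sample_od, standard_curve):
    """Interpolate a sample OD against (dilution_factor, mean_od) standard points.

    Assumption: relative concentration of the standard at dilution factor f is 1/f,
    and interpolation is linear in log2(concentration) between neighboring points.
    """
    points = sorted((od, -log2(f)) for f, od in standard_curve)
    for (od_lo, c_lo), (od_hi, c_hi) in zip(points, points[1:]):
        if od_hi > od_lo and od_lo <= sample_od <= od_hi:
            fraction = (sample_od - od_lo) / (od_hi - od_lo)
            return 2 ** (c_lo + fraction * (c_hi - c_lo))
    raise ValueError("sample OD falls outside the range of the standard curve")
```

In practice each OD passed in would be the mean of the triplicate wells, and the interpolated value would be multiplied by the sample's own dilution factor.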

Genetic Risk Score Analysis

To determine if CFH P503A carriers have a higher burden of the currently known AMD genetic risk variants, we calculated genetic risk scores for 523 Amish study participants with and without the risk allele for P503A. Genotypes for the 52 IAMDGC risk variants for AMD (56) were acquired using the PlexSeq amplicon sequencing platform (PlexSeq Diagnostics, Cleveland, Ohio, USA). We performed extensive quality control and removed any variants that had a call rate less than 80 percent in our sample. This resulted in analyses of 35 of the 52 variants (Chapter 2 Appendix: Supplemental Table 1). In our analyses, we excluded individuals with missing genotypes for these 35 variants, which resulted in a sample size of 523 (14 carriers and 509 non-carriers). In our weighted risk scores, we used odds ratios calculated by the IAMDGC (56). We compared risk scores based on AMD and P503A carrier statuses using Wilcoxon rank-sum tests.
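The filtering and scoring steps above can be sketched as follows. This is a minimal illustration, assuming genotypes are coded as risk-allele counts (0, 1, or 2) with None for a missing call. The text does not state whether the IAMDGC odds ratios were used directly or log-transformed, so the weights are passed in by the caller. All names are hypothetical.

```python
def passes_call_rate(genotypes, threshold=0.80):
    """Keep a variant only if its call rate is at least the threshold (80 percent here)."""
    called = sum(g is not None for g in genotypes)
    return called / len(genotypes) >= threshold


def weighted_risk_score(risk_allele_counts, weights):
    """Sum of risk-allele counts times per-variant weights for one individual.

    Returns None when any genotype is missing, mirroring the exclusion of
    individuals with missing genotypes for the retained variants.
    """
    if any(count is None for count in risk_allele_counts):
        return None
    return sum(count * weight for count, weight in zip(risk_allele_counts, weights))
```

For example, an individual with counts [0, 1, 2] and weights [1.5, 2.0, 1.2] would receive a score of 4.4.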


Results

Identification of Additional CFH P503A Carriers in the Amish

We performed genotyping assays (Chapter 2 Appendix: Supplemental Figure 1) on 526 Ohio Amish, 303 Indiana Amish, and 497 Pennsylvania Amish and identified 39 additional carriers of the risk allele for CFH P503A. Therefore, in total, we have identified 58 carriers, including 57 heterozygotes and 1 homozygote. Of these 58 carriers, 20 have AMD (8 self-reported and 12 clinically confirmed), 33 do not have AMD (29 clinically confirmed and 4 self-reported), and 5 have an unknown AMD status. Using data from the AGDB, we found that these 58 individuals are related through a 1,000-person pedigree that traces back to 12 common ancestors (Figure 2.1). All 58 individuals with the risk allele for CFH P503A were from Ohio and Indiana. None of the nearly 500 Lancaster County Amish who were genotyped in our study had a copy of this risk allele.


Figure 2.1. All-connecting path pedigree for the 58 Amish individuals with the risk allele for CFH P503A. The ancestry of the 58 carriers can be traced to the 12 common ancestors (6 married couples). The pedigree was drawn using genealogy information from the AGDB and the Pedigraph software tool. Circles represent females, and squares represent males. Carriers are highlighted in orange.


Age at Diagnosis by P503A Carrier Status

To gauge whether individuals with the risk allele for P503A exhibited an earlier age of AMD onset than individuals without the risk allele, we compared the age at first AMD diagnosis in carriers and non-carriers. The carriers did not exhibit a significantly earlier age at first AMD diagnosis compared to non-carriers in our Kaplan-Meier survival curve analysis (p = 0.59); however, several of the carriers appear to have an earlier age of AMD onset compared to most of the non-carriers (Figure 2.2). The average ages at AMD diagnosis were 69.9 and 71.2 years for affected carriers (n = 7) and non-carriers (n = 154), respectively. These observations were consistent with the results from our pairwise Wilcoxon rank-sum tests between age at AMD diagnosis for carriers and non-carriers (Chapter 2 Appendix: Supplemental Figure 2).

Figure 2.2. Age at exam for carriers and non-carriers of the risk allele for CFH P503A. The AMD statuses of 826 Amish individuals were evaluated, including 31 of the carriers (7 with AMD and 24 without AMD) and 795 non-carriers (154 with AMD and 641 without AMD) from the Ohio, Indiana, and Pennsylvania Amish populations. Blue represents the carriers (Genotyped=CG), and red represents the non-carriers (Genotyped=CC). The y-axis depicts the proportion of carriers and non-carriers that were considered unaffected at their eye exam. The x-axis depicts the ages at which individuals received their first diagnosis of AMD based on their eye exam.
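For readers unfamiliar with the survival curves in Figure 2.2, the sketch below shows a product-limit (Kaplan-Meier) estimate of the proportion remaining unaffected by age. It assumes that unaffected individuals are treated as censored at their exam age. It does not reproduce the test behind the reported p-value, and the function name and inputs are hypothetical.

```python
def kaplan_meier(ages, diagnosed):
    """Return [(age, proportion_unaffected)] at each age with at least one diagnosis.

    ages:      age at first AMD diagnosis, or age at exam for unaffected individuals
    diagnosed: True if the individual was diagnosed at that age, False if censored
    """
    at_risk = len(ages)
    survival = 1.0
    curve = []
    for age in sorted(set(ages)):
        events = sum(1 for a, d in zip(ages, diagnosed) if a == age and d)
        leaving = sum(1 for a in ages if a == age)
        if events:
            survival *= 1 - events / at_risk
            curve.append((age, survival))
        at_risk -= leaving  # diagnosed and censored individuals both leave the risk set
    return curve
```

Running the function separately on carriers and non-carriers yields the two step curves that a plot like Figure 2.2 compares.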

CFH RNA Quantification and Analysis

To understand the effects of CFH P503A on CFH gene products, we examined mRNA expression of CFH in 40 whole blood samples from carriers and closely related non-carriers. Real-time quantitative PCR was performed with four different TaqMan assays targeting the three protein-coding transcripts of CFH (CFH-202, CFH-206, and CFH-201) and the large retained intron transcript of CFH (CFH-203). Four samples were removed as outliers from our analysis: one affected carrier, two unaffected carriers, and one unaffected non-carrier. The P503A variant falls within the CFH-202 and CFH-203 transcripts. The combined relative expression of the CFH-206 and CFH-202 transcripts was significantly higher in the non-carriers without AMD compared to the affected non-carriers (Figure 2.3A). The combined relative expression of the three protein-coding transcripts of CFH (CFH-201, CFH-206, and CFH-202) was significantly higher in the carriers and non-carriers without AMD compared to the affected non-carriers (Figure 2.3B). Relative expression of CFH-202 was significantly higher in the carriers and non-carriers without AMD compared to the affected non-carriers (Figure 2.3C). There were no significant differences among groups for the relative expression of CFH-203 (Figure 2.3D).


Figure 2.3. Relative expression of CFH transcripts in carriers and non-carriers. We measured expression of the following transcripts: (A) CFH-206 and CFH-202; (B) CFH-201, CFH-206, and CFH-202; (C) CFH-202; (D) CFH-203. Four samples were removed as outliers. Sample sizes for each group: n = 12 for Carrier/AMD; 6 for Carrier/Non-AMD; 5 for Non-carrier/AMD; 13 for Non-carrier/Non-AMD. Statistical differences among relative expression levels from carriers and non-carriers with and without AMD were evaluated using Wilcoxon rank-sum tests. A comparison across all groups in each assay was carried out with a Kruskal-Wallis test.


Protein Modeling

Because the risk allele of CFH P503A results in an amino acid substitution of an alanine for a proline, we computationally investigated whether this substitution yielded any structural changes to the CFH protein. In the risk-associated version of the structure (A503), there were fewer contacts between the amino acid at position 503 and the nearby residues when we modeled the domain (SCR8) in which the variant occurs (Figure 2.4). Specifically, there were 21 contacts predicted in Chimera for the P503 structure, and there were only 12 contacts predicted between A503 and neighboring residues. Although contacts with four neighboring residues (I455, Y475, K450, T504) remained consistent between models, there were no contacts between two residues (W499 and Q502) and A503, and there were reduced contacts between one residue (I492) and A503 compared to P503 (Figure 2.4).


Figure 2.4. Visualization of protein models containing the amino acid substitution for CFH P503A in SCR8 domain of CFH. Models were visualized with Chimera software (150). (A) SCR8 with P503 and interacting residues. Points of contact are depicted in red. (B) SCR8 with A503 and interacting residues. Points of contact are depicted in red. (C) Superposition of P503 (blue residue) and A503 (orange residue) protein models and neighboring contacts. Residues colored in green maintained the same contacts in the P503 and A503 structures. Red residues interacted with P503 but lost contact with A503. The yellow residue had reduced contacts with A503 compared to P503.


CFH Protein Expression

To determine whether the risk allele for CFH P503A affects CFH protein expression, we performed Western blot analyses and ELISAs with plasma from carriers and non-carriers who were affected or unaffected by AMD. In our preliminary data, we did not identify strong changes to CFH expression based on P503A carrier status or AMD status (Figures 2.5-2.7). Plasma from the homozygous carrier did not have a marked change in CFH expression compared to the other individuals we assayed (lane 2 in Figure 2.5B).

Figure 2.5. Representative blots measuring relative CFH protein expression in plasma. (A) Comparison of heterozygous carriers and non-carriers with and without AMD. (B) Comparison of carriers (homozygous* and heterozygous) and non-carriers with differing AMD diagnoses. Plasma from normal blood donors and liver CFH lysate were used as controls in both experiments.


Figure 2.6. Quantification of relative CFH plasma expression measured by Western blots. (A) Comparison of CFH protein expression in plasma from carriers of the risk allele versus non-carriers of the risk allele. (B) Comparison of CFH protein expression in plasma from individuals affected by AMD versus unaffected individuals. Data were quantified from four Western blot experiments. P-values were calculated using two-sided t-tests assuming unequal variance.


Figure 2.7. Quantification of relative CFH plasma expression measured by ELISA. (A) Comparison of relative expression in plasma from carriers of the risk allele versus non-carriers of the risk allele. (B) Comparison of relative expression in plasma from individuals affected by AMD versus individuals unaffected by AMD. Data shown were quantified from five ELISA experiments. P-values were calculated using two-sided t-tests assuming unequal variance.


Genetic Risk Scores

Previous genetic risk score calculations in the Amish demonstrated that Amish AMD cases and controls had a lower genetic burden for the 19 then-known AMD-associated variants compared to AMD cases and controls from the general population (136). Since that time, the IAMDGC has identified 52 independent genomic variants associated with AMD (56). Therefore, to determine if P503A carriers have a higher burden for known genetic variants for AMD compared to non-carriers, we calculated genetic risk scores for 35 of the 52 IAMDGC variants and weighted them by the odds ratios calculated by the IAMDGC (Chapter 2 Appendix: Supplemental Table 1). The weighted risk scores among individuals with AMD (mean = 23.01, 95% CI: 21.54-24.49) were significantly higher (p = 0.0037) than the scores for unaffected individuals (mean = 20.19, 95% CI: 19.78-20.60). The weighted risk scores among carriers (mean = 18.50, 95% CI: 16.32-20.68) were not significantly different (p = 0.055) from the scores for non-carriers (mean = 20.84, 95% CI: 20.38-21.30). Among AMD cases, P503A carriers did not have significantly lower genetic risk scores than non-carriers (p = 0.054; Figure 2.8). However, the weighted genetic risk scores among non-carriers were significantly higher (p = 0.001) among individuals with AMD compared to unaffected individuals (Figure 2.8). Unaffected carriers also had significantly lower (p = 0.045) scores compared to AMD-affected non-carriers (Figure 2.8).
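The group comparisons above rely on Wilcoxon rank-sum tests. As a hedged illustration of what such a test computes, the sketch below ranks the pooled scores (using midranks for ties) and derives a two-sided p-value from the normal approximation. It omits the tie and continuity corrections that statistical packages usually apply, so its p-values will not exactly match software output; the function name is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation.

    Returns (U statistic for x, z score, two-sided p-value).
    """
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j for tied values
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, z, erfc(abs(z) / sqrt(2))
```

With groups as small as the carrier subsets reported here, an exact test would generally be preferable to the normal approximation used in this sketch.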


Figure 2.8. Weighted genetic risk scores for 35 of the 52 IAMDGC risk variants. Comparisons among the following groups were evaluated with Wilcoxon rank-sum tests: CFH P503A carriers with AMD (n = 5), carriers without AMD (n = 9), non-carriers with AMD (n = 100), and non-carriers without AMD (n = 409).


Discussion

In this study, we aimed to characterize the effects of CFH P503A in the Amish population. All 58 individuals with the risk allele were from Ohio and Indiana Amish communities. We failed to find any Amish individuals from our Lancaster County cohort with the risk allele for CFH P503A. Therefore, we hypothesize that this risk allele segregated within the Midwest Amish subpopulation because of the second Amish migration in the United States in the 1800s. The Lancaster County Amish and the Midwest Amish communities are distinct as evidenced by differences in family surnames and settlement timelines; therefore, it is unsurprising that they could differ genetically as well (93). Using data from the Swiss Anabaptist Genealogical Association, we determined that the 12 common ancestors of the 58 carriers were from Europe as well as Berks and Somerset Counties in Pennsylvania (Figure 2.9).

Figure 2.9. Highlighted counties of the state of Pennsylvania. The risk allele for the CFH P503A variant was not identified in a Lancaster County Amish cohort of nearly 500 individuals. Data from the Swiss Anabaptist Genealogical Association suggest that the ancestry of the 12 common ancestors of the 58 carriers is rooted in Berks and Somerset Counties in Pennsylvania rather than Lancaster County, Pennsylvania.

Although this variant was initially identified in the Amish communities of Ohio and Indiana in the United States, carriers of CFH P503A have also been identified elsewhere, including in the DiscovEHR (155), TOPMed (156), and gnomAD (157) databases. In each of these databases, there are no more than 10 individuals with the risk allele. It is unclear if the individuals in the DiscovEHR study are heterozygous or homozygous, but the allele frequency of CFH P503A was less than 0.001 (155). The DiscovEHR browser was created from a collaboration between the Geisinger Health System and Regeneron Genetics Center (155). The study participants were originally recruited through the MyCode Community Health Initiative, which enrolled patients from Geisinger clinics in central and northeastern Pennsylvania (158). Pennsylvania has among the highest populations of Amish individuals in North America (95); therefore, it is possible that some of these carriers in the DiscovEHR study are Amish or at least of Amish descent.

TOPMed (https://bravo.sph.umich.edu/freeze5/hg38/variant/1-196713905-C-G) and gnomAD (https://gnomad.broadinstitute.org/variant/1-196683035-C-G) only have heterozygous CFH P503A carriers. In TOPMed, there are three heterozygotes for the risk allele. The six heterozygotes in gnomAD are between 60 and 65 years old, which is the age at which some people begin to develop AMD (9, 10). The individuals with the risk allele for CFH P503A in gnomAD are from the following ancestries: African, North-western European, and Other non-Finnish European. In the AMD literature, two studies identified AMD-affected individuals with one copy of the risk allele. In one study, a heterozygote for CFH P503A was identified among AMD cases from an aggregated dataset of individuals from Radboud University Medical Center in the Netherlands and the European Genetic Database (EUGENDA) (159). Similarly, one heterozygote with AMD was identified in a dataset that included study participants from Boston, France, and Baltimore (160). From the published description, it is unclear which geographic region this individual is from. Despite this, the identification of carriers in these other datasets suggests that the risk allele for CFH P503A is not Amish-specific.

The P503A variant is about 15 bp from the end of exon 10 in the CFH gene. Using the Human Splicing Finder v.3 (http://www.umd.be/HSF3/), we found that the risk allele appears to break an exonic splicing enhancer (ESE) site, which may alter splicing of CFH. We therefore interrogated whether the risk allele for CFH P503A alters expression of any of the CFH transcripts. The variant falls within the full-length protein-coding transcript of CFH (CFH-202) and the retained intron transcript (CFH-203) but does not lead to significant changes in expression of these transcripts in the blood. While relative CFH expression varied significantly between some groups in our assays, carrier status alone could not explain these differences. In contrast, other rare CFH variants (chr1:196648924G>A and chr1:196642295T>C) located in splice sites lead to reduced serum CFH protein levels in carriers of the risk allele (160, 161). Other studies have shown that transcript levels of CFH in the RPE-choroid are not significantly different among AMD cases and controls (143, 162) or among Y402H carriers and non-carriers (162). Additionally, individuals with the P503A risk allele were not significantly younger at their first AMD diagnoses compared to homozygotes for the non-risk allele. This is in contrast with carriers of another rare CFH variant (R1210C), who experienced AMD symptoms at significantly younger ages than non-carriers did (141).

We interrogated CFH protein expression in plasma samples from Amish individuals with and without the risk allele for CFH P503A. We hypothesized that CFH P503A causes misfolding of the protein, which would then be degraded through the unfolded protein response, resulting in lower CFH expression in carriers compared to non-carriers. However, our preliminary data suggest that the risk allele does not noticeably affect CFH protein expression in these individuals. The lack of an effect on CFH protein abundance has been similarly observed for other missense risk alleles in CFH for AMD, including the common, high-effect variant (Y402H, rs1061170) that has been repeatedly associated with AMD. Rather than affecting protein expression of CFH, the Y402H variant affects the binding of CFH to its ligands, including heparin and C-reactive protein (163). Y402H also lowers the binding affinity of plasma CFH to oxidized low-density lipoprotein (oxLDL), which may influence systemic and local oxidative stress (164). On the other hand, the protective AMD variant in CFH (I62) contributes to increased binding to C3b in the complement pathway (165).

The CFH P503A variant is a missense mutation that results in the conversion of a proline residue to an alanine residue in a domain of the CFH protein that contains binding sites for C3b, C-reactive protein, and cell surface proteins of pathogenic bacteria, which recruit CFH to prevent complement attack (166). Therefore, this variant may alter the binding affinities of CFH for some of its ligands in the complement cascade. Our protein modeling suggests that modest conformational changes occur in SCR8 of CFH due to P503A. Therefore, the binding capabilities of CFH P503 and A503 need further investigation.

While we observed that P503A status did not affect CFH expression levels, this does not eliminate the possibility that it is impeding CFH function, which may be more aptly observed by examining its effects on other members of the complement pathway. Eyes of deceased study participants who were homozygous for CFH Y402H have elevated levels of C5a, IL-18, and TNF-alpha that may lead to heightened activity of the complement and NF-κB pathways (167). Carriers of the risk allele for Y402H also have higher C-reactive protein expression in the RPE-choroid (162). Therefore, investigations of other members and ligands of the complement pathway should be performed for CFH P503A to determine if there are broader effects of the risk allele independent of CFH expression.

Although we interrogated the effects of CFH P503A in blood samples from Amish carriers and non-carriers, we were unable to assess these potential consequences in eye tissue. Because of the cultural beliefs of the Amish, we are unable to acquire eye tissue from study participants. As with other studies that examine the effects of CFH variants in the blood, our approach to study CFH P503A in the blood begins to interrogate the role of this variant in systemic complement activity. However, its effects may be different in the eye due to tissue-specific effects and ocular immune privilege (168). CFH is predominantly synthesized in the liver and secreted into the bloodstream to circulate throughout the body (169). CFH is also constitutively expressed by RPE cells in the human eye and protects these cells from complement attack (30). Perturbations in CFH are suspected to alter both systemic and local complement regulation and promote immune responses in the retina that contribute to AMD (10, 144, 145). Therefore, the potential consequences of CFH P503A on local complement activity need to be explored. Additionally, while we have identified the highest number of carriers of the risk allele for CFH P503A in a single cohort, we are limited by the small number of carriers (n = 57 heterozygotes and n = 1 homozygote) in our functional experiments and statistical analyses.

In this study, we characterized the effects of a rare, missense variant (CFH P503A) for AMD in Amish individuals. We did not observe significant differences in mRNA and protein expression of CFH in Amish blood samples from carriers and non-carriers of the risk allele. However, the substitution of an alanine amino acid for a proline amino acid at position 503 appears to change the number of contacts among neighboring residues in SCR8 of CFH. Therefore, this variant may affect binding affinities for ligands of CFH and other members of the complement pathway. Elucidating the impacts of risk variants, like CFH P503A, on both local and systemic complement activity could lead to new knowledge of the pathophysiology of AMD and promote the creation of novel therapeutics.


CHAPTER 3

Rare variants and loci for age-related macular degeneration in the Ohio and Indiana Amish

Andrea R. Waksmunski (1,2,3), Robert P. Igo, Jr. (3), Yeunjoo E. Song (3), Jessica N. Cooke Bailey (2,3), Renee Laux (3), Denise Fuzzell (3), Sarada Fuzzell (3), Larry D. Adams (4), Laura Caywood (4), Michael Prough (4), Dwight Stambolian (5), William K. Scott (4), Margaret A. Pericak-Vance (4), and Jonathan L. Haines (1,2,3)

(1) Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.; (2) Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, U.S.A.; (3) Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.; (4) John P. Hussman Institute for Human Genomics, University of Miami Miller School of Medicine, Miami, Florida, U.S.A.; (5) Department of Ophthalmology, University of Pennsylvania, Philadelphia, Pennsylvania, U.S.A.

This chapter was adapted from the following published manuscript:

Waksmunski AR, Igo RP, Song YE, Cooke Bailey JN, Laux R, Fuzzell D, Fuzzell S, Adams LD, Caywood L, Prough M, Stambolian D, Scott WK, Pericak-Vance MA, Haines JL. 2019. Rare variants and loci for age-related macular degeneration in the Ohio and Indiana Amish. Human Genetics 138: 1171–1182. doi:10.1007/s00439-019-02050-4

Copyright © 2019, Springer Nature


Abstract

Age-related macular degeneration (AMD) is a leading cause of blindness in the world. While dozens of independent genomic variants are associated with AMD, about one-third of AMD heritability is still unexplained. To identify novel variants and loci for AMD, we analyzed Illumina HumanExome chip data from 87 Amish individuals with early or late AMD, 79 unaffected Amish individuals, and 15 related Amish individuals with unknown AMD affection status. We retained 37,428 polymorphic autosomal variants across 175 samples for association and linkage analyses. After correcting for multiple testing (n = 37,428), we identified four variants significantly associated with AMD: rs200437673 (LCN9, p = 1.50 x 10^-11), rs151214675 (RTEL1, p = 3.18 x 10^-8), rs140250387 (DLGAP1, p = 4.49 x 10^-7), and rs115333865 (CGRRF1, p = 1.05 x 10^-6). These variants have not been previously associated with AMD and are not in linkage disequilibrium with the 52 known AMD-associated variants reported by the International AMD Genomics Consortium based on physical distance. Genome-wide significant linkage peaks were observed on chromosomes 8q21.11-q21.13 (maximum recessive HLOD = 4.03) and 18q21.2-21.32 (maximum dominant HLOD = 3.87; maximum recessive HLOD = 4.27). These loci do not overlap with loci previously linked to AMD. Through gene ontology enrichment analysis with ClueGO in Cytoscape, we determined that several genes in the 1-HLOD support interval of the chromosome 8 locus are involved in fatty acid binding and triglyceride catabolic processes, and the 1-HLOD support interval of the linkage region on chromosome 18 is enriched in genes that participate in serine-type endopeptidase inhibitor activity and the positive regulation of epithelial to mesenchymal transition. These results nominate novel variants and loci for AMD that require further investigation.

Introduction

Age-related macular degeneration (AMD) is the third leading cause of vision loss in the world (10). It is characterized by the deterioration of the central field of vision due to the accumulation of lipid deposits (drusen), inflammation, and neurodegeneration in the macula (26). AMD is a multifactorial disease with numerous influential genetic and environmental risk factors. Environmental factors constitute about 20 to 40 percent of total AMD variance and include age, smoking, hypertension, and diet (26, 32). More than half of AMD heritability is explained by 52 independent common and rare genetic variants across 34 genomic loci (56). Therefore, about half of AMD heritability is unaccounted for by known genetic polymorphisms and may be partially resolved with the identification of rare variants (60).

As the minor allele frequency (MAF) for a variant decreases, the sample size required to detect an association increases linearly with 1/MAF (60). Consequently, traditional genome-wide association studies (GWAS) have been largely unable to detect associations with rarer variation (typically MAF < 0.01) unless the sample size is in the tens of thousands. In contrast, GWAS on a sample of large families can increase the power to detect such variants through enrichment of rare variants segregating in families with many affected members (88). Isolated founder populations, like the Amish, can amplify the power of family-based studies through sharing of founder variants. The Amish are descendants of Swiss Anabaptists who settled in North America 200-300 years ago and maintained their culturally segregated community for generations by marrying within their faith group and adhering to a uniform, conservative lifestyle (91). Therefore, they have lower genetic and environmental heterogeneity compared to the general population of European descent. Also, Amish communities comprise large families with an average of seven to nine children in each nuclear family (93). Extensive genealogical data are available for this community in the Anabaptist Genealogy Database (AGDB), making this a valuable population in which to identify rare genetic variation by combining the advantages of isolated populations and large pedigrees (147).
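To make the 1/MAF scaling noted above concrete, the following sketch computes how many times larger a sample would need to be for a rarer variant relative to a reference variant, holding everything else (effect size, significance threshold, power) fixed. The reference MAF of 0.05 and the function name are illustrative assumptions, not values taken from the cited work.

```python
def relative_sample_size(maf, reference_maf=0.05):
    """Fold increase in required sample size versus a reference variant, assuming N scales with 1/MAF."""
    if not 0 < maf <= 0.5 or not 0 < reference_maf <= 0.5:
        raise ValueError("minor allele frequencies must lie in (0, 0.5]")
    return reference_maf / maf
```

Under this simple proportionality, a variant with MAF 0.001 would need roughly 50 times the sample required for one with MAF 0.05, which is why enrichment of rare alleles in large families and founder populations is attractive.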

To identify novel variants underlying the pathophysiology of AMD, we performed association and linkage analyses on Illumina HumanExome chip data for 175 related Amish individuals from Ohio and Indiana. We also examined in silico functional annotations of the genes from our linkage regions to determine whether these genes participate in biological processes that may be relevant for AMD pathology.

Materials and Methods

Study Participants

The Amish individuals were identified through published Amish community directories, referrals from other Amish community members, or close relationships to other research subjects. All study participants were at least 65 years old. The AMD affection status was defined by a self-reported diagnosis on a health questionnaire, which asked whether the individual had ever received an AMD diagnosis from a physician. We previously demonstrated that self-reported AMD statuses have positive and negative predictive values of around 90 percent in this population (136). Some Ohio Amish participants also received a clinical exam at the Wooster Eye Clinic in Wooster, Ohio, where optical coherence tomography images were obtained for both eyes. Participants’ eyes were graded on a scale from 0 to 5 using the modified Clinical Age-Related Maculopathy Staging (CARMS) system (16, 137). Individuals with grades 0, 1, and 2 in both eyes were designated as unaffected in this study, and individuals with grades 3 to 5 in at least one eye were designated as affected. Grades of 3 were indicative of early/intermediate AMD, and grades of 4 or 5 were classified as advanced AMD. For this study, 181 Amish samples included 87 individuals with confirmed or self-reported AMD, 79 unaffected individuals, and 15 related Amish individuals with unknown AMD phenotype. Using the Anabaptist Genealogy Database (147), we constructed a 2,118-person all-connecting path pedigree for the 175 Amish individuals who passed quality control and were closely related (Figure 3.1). This included 86 individuals affected with AMD, 77 unaffected individuals, and 12 relatives with an unknown AMD affection status. Informed consent was obtained from all study participants, and the study was approved by the institutional review boards at Vanderbilt University and Case Western Reserve University. Study procedures were performed in accordance with the tenets of the Declaration of Helsinki.
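The grading rule described above for clinically examined participants can be written as a small function. This is a sketch that assumes the stage is determined by the more severely graded eye, which is consistent with the "at least one eye" wording but is not spelled out in the text; the function name and return labels are hypothetical.

```python
def amd_status(grade_right, grade_left):
    """Classify a participant from modified CARMS grades (0 to 5) for both eyes."""
    for grade in (grade_right, grade_left):
        if grade not in range(6):
            raise ValueError("CARMS grades must be integers from 0 to 5")
    worst = max(grade_right, grade_left)  # assumption: the worse eye sets the stage
    if worst <= 2:
        return "unaffected"  # grades 0, 1, or 2 in both eyes
    if worst == 3:
        return "early/intermediate AMD"
    return "advanced AMD"  # grade 4 or 5 in at least one eye
```

For example, a participant graded 1 in one eye and 3 in the other would be designated affected with early/intermediate AMD.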


Figure 3.1. All-connecting path pedigree of the 175 Amish individuals in this study. The pedigree was drawn using information from the Anabaptist Genealogy Database (AGDB). Circles: females. Squares: males. The genotyped individuals are within the blue shaded box.

70

Genotyping and Quality Control

A total of 192 samples, which included 181 unique Amish samples, 5 Amish duplicates, and 6 Coriell HapMap trio controls of European descent, were genotyped using the Illumina HumanExome BeadChip v1.1 at the Vanderbilt Technologies for

Advanced Genomics (VANTAGE) DNA Resources Core. Genotype data were called and clustered with Illumina’s GenomeStudio (170). The accuracy of genotype calling using

GenomeStudio is improved by increasing the sample size (171). Therefore, an additional

2,280 non-Amish samples that were genotyped by VANTAGE with the same protocols and at the same time as the Amish samples were imported into GenomeStudio and clustered with the Amish samples. The data from these non-Amish samples were not used in the subsequent association and linkage analyses. All samples demonstrated call rates greater than 95 percent and were used for extensive quality control in GenomeStudio as summarized in Chapter 3 Appendix: Supplemental Figure 1 and Supplemental Table 1

(172). Briefly, we excluded variants for low call rate, GenTrain score for cluster quality, cluster separation, and intensity values (Chapter 3 Appendix: Supplemental Table 1).

Hemizygous variants were evaluated as well as variants with Mendelian and replication errors (Chapter 3 Appendix: Supplemental Table 1). Rare variant calling was performed manually with GenomeStudio’s GenCall algorithm and with zCall, a separate rare variant calling algorithm (173). zCall is utilized as a post-processing step for calling variants with no genotypes assigned after initial variant calling with GenomeStudio’s GenCall algorithm. It uses the intensity profile of the homozygous cluster of the common allele to determine the location of the other homozygous and heterozygous genotype clusters

(173). All variants with at least four new heterozygous calls in zCall were manually reviewed in GenomeStudio.

A total of 146,571 variants and 2,466 samples, including 181 Amish individuals, passed quality control in GenomeStudio and were evaluated for sex mismatch, race mismatch, and heterozygosity outliers using PLINK v1.07 (174). Sex mismatched samples were evaluated using heterozygosity rates for the X chromosome and call rates for the Y chromosome (Chapter 3 Appendix: Supplemental Figure 2). One unresolvable sex mismatch in our Amish samples was identified and removed from the study. An

XXY Amish male and an XO Amish female were identified and retained. Ancestry mismatch was assessed using the EIGENSTRAT software package and about 3,000 ancestry-informative markers (AIMs) from the exome chip (175). Principal components analysis (PCA) demonstrated that the combined sample set is multi-ethnic and that the

Amish samples clustered together with other Europeans as expected (Chapter 3

Appendix: Supplemental Figure 3). None of our Amish samples were in the extremes of the heterozygosity distribution calculated by PLINK v1.07 (174), so none of them were excluded as outliers (Chapter 3 Appendix: Supplemental Figure 4). The heterozygosity rate can indicate problematic, potentially contaminated DNA samples (172). Of the

242,901 variants genotyped, 37,428 were polymorphic in the Amish sample, passed extensive quality control, and served as input for our association and linkage analyses.

Variants were considered polymorphic if they had a non-zero minor allele frequency calculated by the RObust Association-Detection Test for Related Individuals with

Population Substructure (ROADTRIPS) software v2.0. We omitted five Amish individuals from our association and linkage analyses due to excessively distant

relatedness in the pedigree (Chapter 3 Appendix: Supplemental Figure 5). Of the remaining 175 Amish individuals, 86 are affected, 77 are unaffected, and 12 have an unknown AMD status.
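The heterozygosity screen mentioned above can be illustrated as follows. This is a sketch under stated assumptions: the rate is computed from the observed homozygous count and the non-missing genotype count (the quantities PLINK reports for its heterozygosity summary), the cutoff of three standard deviations is a common convention rather than a value given in this chapter, and the sample data are invented.

```python
from statistics import mean, stdev


def heterozygosity_rate(n_nonmissing: int, n_homozygous: int) -> float:
    """Fraction of non-missing genotypes that are heterozygous."""
    return (n_nonmissing - n_homozygous) / n_nonmissing


def flag_outliers(rates: dict[str, float], n_sd: float = 3.0) -> list[str]:
    """Return sample IDs whose rate lies more than n_sd SDs from the mean.

    The 3-SD default is an assumed convention, not the study's threshold.
    """
    values = list(rates.values())
    mu, sd = mean(values), stdev(values)
    return [sample for sample, rate in rates.items() if abs(rate - mu) > n_sd * sd]


# Invented toy data: 20 typical samples and one with excess heterozygosity.
toy = {f"S{i:02d}": (0.29 if i % 2 else 0.31) for i in range(20)}
toy["S20"] = 0.45
print(flag_outliers(toy))  # ['S20']
```

An unusually high rate is the pattern expected from a contaminated sample, which is why the distribution tails are inspected.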

Association Analysis

The samples in this study are highly interrelated (Figure 3.1). To account for these relationships and population structure, we performed association testing using the

ROADTRIPS software v2.0 (176). This program tests for association between genotypes and binary traits while adjusting for the relatedness and population structure of the samples. It estimates an empirical covariance matrix from the genotype data and uses it to account for population structure. ROADTRIPS computes three different test statistics for association: the RM, Rχ, and RW tests (176). For each of these tests, p-values are calculated from the asymptotic null distribution, a chi-squared distribution with 1 degree of freedom (176). We used the RM test statistic because it allows inclusion of both unaffected individuals and individuals with unknown phenotype in the analysis and is more powerful than the other two when samples are related and pedigree structure is known (176). We used kinship coefficients for the 175 Amish individuals calculated by KinInbcoef to account for the pedigree structure in our data (177). ROADTRIPS determines the weight of individuals with unknown phenotype in the analysis from a user-defined population prevalence for the trait and the phenotypes of their genotyped relatives (176). We set the population prevalence of AMD to 0.13 based on published estimates of its prevalence in individuals of European descent of an age typical of this pedigree (33, 178).
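To make the notion of a pedigree-based kinship coefficient concrete, the textbook recursion is sketched below on a small invented pedigree. This is not the KinInbcoef implementation; the pedigree, the identifiers, and the assumption that founders are unrelated and that each non-founder has both parents recorded are all illustrative.

```python
from functools import lru_cache

# Hypothetical toy pedigree: id -> (father, mother); founders map to (None, None).
PEDIGREE = {
    "gf": (None, None), "gm": (None, None),
    "mom": (None, None), "uncle_in_law": (None, None),
    "dad": ("gf", "gm"), "aunt": ("gf", "gm"),
    "kid": ("dad", "mom"), "cousin": ("uncle_in_law", "aunt"),
}


@lru_cache(maxsize=None)
def generation(ind: str) -> int:
    """Length of the longest path from a founder down to this individual."""
    father, mother = PEDIGREE[ind]
    if father is None:
        return 0
    return 1 + max(generation(father), generation(mother))


@lru_cache(maxsize=None)
def kinship(a: str, b: str) -> float:
    """Probability that alleles drawn at random from a and b are identical by descent."""
    if a == b:
        father, mother = PEDIGREE[a]
        if father is None:
            return 0.5  # non-inbred founder
        return 0.5 * (1.0 + kinship(father, mother))
    # Expand the individual from the later generation, who cannot be an
    # ancestor of the other, so the recursion is always valid.
    if generation(a) < generation(b):
        a, b = b, a
    father, mother = PEDIGREE[a]
    if father is None:
        return 0.0  # two distinct founders are assumed unrelated
    return 0.5 * (kinship(father, b) + kinship(mother, b))


print(kinship("dad", "aunt"))    # 0.25   (full siblings)
print(kinship("kid", "cousin"))  # 0.0625 (first cousins)
```

In a pedigree as deep and interconnected as the one in Figure 3.1, many loops contribute to each pairwise value, which is why a dedicated program is used in practice.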

Linkage Analysis

We performed multipoint linkage analyses on a panel of 5,668 autosomal variants chosen to be informative for stretches of identity by descent on the exome chip (179).

The 175 Amish individuals in this study are connected in a multigenerational pedigree of

2,118 Amish individuals (Figure 3.1). Because analyzing very large, complex pedigrees is computationally intractable, we used PedCut to partition the all-connecting path pedigree into densely affected sub-pedigrees with a maximum bit size of 24 (180).
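For readers unfamiliar with the bit size constraint, the quantity is conventionally defined as twice the number of non-founders minus the number of founders, and the cost of exact multipoint calculations grows exponentially with it. The sketch below computes it for an invented nuclear family; the function name and toy pedigree are hypothetical, and PedCut's own partitioning logic is not reproduced here.

```python
def bit_size(pedigree: dict[str, tuple]) -> int:
    """Bit size of a pedigree: 2 * (non-founders) - (founders).

    `pedigree` maps each individual to (father, mother); founders map to (None, None).
    """
    founders = sum(
        1 for father, mother in pedigree.values() if father is None and mother is None
    )
    non_founders = len(pedigree) - founders
    return 2 * non_founders - founders


# Invented example: two founders and three children.
TOY_FAMILY = {
    "dad": (None, None), "mom": (None, None),
    "kid1": ("dad", "mom"), "kid2": ("dad", "mom"), "kid3": ("dad", "mom"),
}
MAX_BITS = 24  # the threshold used in this chapter

print(bit_size(TOY_FAMILY))              # 4
print(bit_size(TOY_FAMILY) <= MAX_BITS)  # True
```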

Parametric heterogeneity LOD (HLOD) scores were determined with the Multi-Point

Engine for Rapid Likelihood Inference (MERLIN) software tool (181) under affecteds-only dominant and recessive models. The penetrance values for the dominant model were

0, 0.0001, or 0.0001 for 0, 1, or 2 copies of the disease allele, respectively. Under the recessive model, penetrance values were defined as 0, 0, or 0.0001 for 0, 1, or 2 copies of the disease allele, respectively. We set disease allele frequencies to 0.01 and 0.10 for each of these models. We performed multipoint linkage analyses across all autosomes with

HLOD scores evaluated at each marker and at every 1 centimorgan (cM; Haldane) with a bit size threshold of 24. For chromosomes with genome-wide significant multipoint

HLOD scores, we repartitioned the all-connecting path pedigree into sub-pedigrees with maximum bit sizes of 23 and 25 to test the robustness of our findings to changes in sub-pedigree construction. The model parameters for these analyses were consistent with those used in the initial multipoint analyses.
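The heterogeneity LOD at a given position combines the per-family LOD scores under the standard admixture model, in which a proportion alpha of families is linked: HLOD = max over alpha of the sum of log10(alpha * 10^LOD_i + 1 - alpha). A minimal sketch of that definition follows. The grid search over alpha is a simplification chosen for clarity, the per-family LOD values are invented, and MERLIN's internal maximization may differ.

```python
import math


def hlod(family_lods: list[float], grid: int = 1000) -> tuple[float, float]:
    """Return (HLOD, alpha) maximizing the admixture LOD over alpha in [0, 1].

    Uses a simple grid search; alpha = 0 always gives 0, so HLOD is never negative.
    """
    best_score, best_alpha = 0.0, 0.0
    for step in range(grid + 1):
        alpha = step / grid
        score = sum(
            math.log10(alpha * 10.0 ** lod + (1.0 - alpha)) for lod in family_lods
        )
        if score > best_score:
            best_score, best_alpha = score, alpha
    return best_score, best_alpha


# Invented per-family LODs: one family supports linkage, one argues against it.
score, alpha = hlod([2.0, -3.0])
print(round(score, 2), alpha)
```

When every family supports linkage, the maximum occurs at alpha = 1 and the HLOD equals the ordinary summed LOD; when families disagree, an intermediate alpha yields a higher score than the simple sum.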

For chromosome 1, we performed a multipoint linkage analysis with liability classes for carriers of the CFH Y402H (rs1061170) (76-78, 143) and P503A

(rs570523689) variants (136). CFH Y402H failed quality control in our dataset.

Therefore, we identified a surrogate variant (rs1329424, r² = 1.0) for CFH Y402H using the LDmatrix module of LDlink to compare linkage disequilibrium statistics among CFH variants that had passed quality control in our study and the Y402H variant in the CEU population from Phase 3 (Version 5) of the 1000 Genomes Project (182). We identified

CFH P503A carriers by performing customized TaqMan genotyping assays of the variant in our Amish cohort (unpublished data). Briefly, P503A genotyping was carried out using a Custom TaqMan SNP Genotyping Assay (Thermo Fisher) to interrogate the presence of a C (P503) or G (A503) at position 1507 in the CFH transcript. Assays were carried out per the manufacturer’s instructions using TaqMan Genotyping Master Mix (Thermo

Fisher) and 10 ng of genomic DNA per reaction. Reactions were run on a QuantStudio 7

PCR machine, and genotypes were called by the QuantStudio analysis software. In our conditional linkage analysis, penetrance values were 0.13, 0.312, and 0.624 for 0, 1, and

2 copies of the disease allele, respectively, for carriers of only the surrogate for Y402H.

We chose these values based on the population prevalence of AMD in the outbred

European-descent population (13 percent) (33, 178) and the expected prevalence based on the published odds ratio (OR) for Y402H, which is 2.4 per copy of the risk allele in individuals of European descent (77, 183). For carriers of only the P503A variant or both variants, penetrance values were 0.13, 0.6, and 0.6 for 0, 1, and 2 copies of the disease allele, respectively. The penetrance value for the risk allele of P503A was derived from the number of affected P503A carriers we identified in a previous study (136). For non-carriers of both variants, penetrance values were consistent with those used for the multipoint linkage analyses of the other autosomes.
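The Y402H liability-class penetrances quoted above can be reproduced with simple arithmetic. The sketch below is one reading that yields the stated values: it treats the per-copy odds ratio as a multiplier on the population prevalence and scales it linearly with copy number. That interpretation, and the function name, are assumptions on our part.

```python
def liability_penetrances(prevalence: float, per_copy_or: float) -> tuple[float, ...]:
    """Penetrances for 0, 1, and 2 copies of the risk allele.

    Assumption: the per-copy OR multiplies the baseline prevalence,
    and two copies carry twice the one-copy multiplier.
    """
    return (
        prevalence,                    # 0 copies: baseline prevalence
        prevalence * per_copy_or,      # 1 copy
        prevalence * 2 * per_copy_or,  # 2 copies
    )


values = liability_penetrances(prevalence=0.13, per_copy_or=2.4)
print([round(v, 3) for v in values])  # [0.13, 0.312, 0.624]
```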

In Silico Functional Analysis of Genes from 1-HLOD Support Intervals of Linkage

Regions

To determine whether genes present in the 1-HLOD support intervals (Chapter 3

Appendix: Supplemental Tables 2-4) of our linkage regions on chromosomes 8 and 18 are functionally related to one another, we extracted gene boundaries from Ensembl

(build 37) and performed gene ontology (GO) enrichment analyses with the ClueGO v2.5.3 plug-in (184) of Cytoscape v3.7.1 (https://cytoscape.org/).

Specifically, terms from the following classifications in GO were considered: Immune

System Process, Biological Process, Molecular Function, and Cellular Component. We included all evidence codes (experimental, non-experimental, author statements from publication, and curator statements) in the analysis. We calculated p-values using right-sided hypergeometric tests and the Benjamini-Hochberg correction for multiple testing.

We chose medium network specificity to identify representative pathways for our genes of interest. This level of specificity examines GO levels 3-8, requires at least three genes per GO term, and ensures that mapped genes represent at least four percent of the total associated genes. We also used a Kappa score of 0.4 for our GO term network connectivity threshold. GO terms were iteratively compared, merged into functional groups, and visualized in Cytoscape.
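The two statistical steps named above, a right-sided hypergeometric test per GO term followed by Benjamini-Hochberg adjustment, can be sketched in a few lines. This illustrates the general technique rather than ClueGO's implementation; the function names and the example counts are invented.

```python
from math import comb


def right_sided_hypergeom_p(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric.

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query list, k: query genes annotated to the term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented example: 2 of 3 query genes hit a term annotated to 4 of 10 background genes.
print(round(right_sided_hypergeom_p(k=2, K=4, n=3, N=10), 4))  # 0.3333
print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
```

The test is right-sided because only over-representation of a term among the query genes is of interest.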

Results

Association Analysis

Of the 37,428 polymorphic autosomal variants that passed quality control, four variants met the Bonferroni correction threshold (p < 1.34 × 10⁻⁶): rs200437673 (chromosome 9, LCN9, p = 1.50 × 10⁻¹¹), rs151214675 (chromosome 20, RTEL1, p = 3.18 × 10⁻⁸), rs140250387 (chromosome 18, DLGAP1, p = 4.49 × 10⁻⁷), and rs115333865 (chromosome 14, CGRRF1, p = 1.05 × 10⁻⁶) (Figure 3.2, Table 3.1). P-values were not corrected for inflation because the genomic control factor was 1.05 (Chapter 3 Appendix:

Supplemental Figure 6). These variants demonstrate novel associations with AMD and do not map to the AMD susceptibility loci identified by the International AMD Genomics

Consortium (IAMDGC) based on physical distance (56). While these variants were identified in the Amish, we found that they are rare to low-frequency (MAF < 0.01) in outbred populations of European descent, including the IAMDGC dataset and the gnomAD database (Table 3.2).
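The Bonferroni threshold follows directly from the number of variants tested. The short check below assumes a family-wise alpha of 0.05, which is not stated explicitly in the text but reproduces the reported threshold, and uses the four p-values reported above.

```python
N_VARIANTS = 37_428  # polymorphic variants tested
ALPHA = 0.05         # assumed family-wise error rate

threshold = ALPHA / N_VARIANTS
print(f"{threshold:.3g}")  # 1.34e-06

# P-values reported in this section.
reported = {
    "rs200437673": 1.50e-11,
    "rs151214675": 3.18e-8,
    "rs140250387": 4.49e-7,
    "rs115333865": 1.05e-6,
}
significant = [rsid for rsid, p in reported.items() if p < threshold]
print(significant)
```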

Figure 3.2. Manhattan plot of p-values obtained from association testing using ROADTRIPS. The red line denotes the Bonferroni correction threshold for multiple testing (p < 1.34 × 10⁻⁶). Four variants passed this threshold and are summarized in Table 3.1.

Table 3.1. AMD associated variants identified with ROADTRIPS testing of Amish families. P-values were obtained from the RM test. Variant positions are given for build 37 (hg19) of the human genome.

                                                                       Allele Count
rsID          Chr.   Position       Alleles   Gene(s)                  Unaffecteds   Unknowns   P
rs200437673   9      138,555,237    A/G       LCN9                     0             1          1.50 × 10⁻¹¹
rs151214675   20     62,293,235     G/A       RTEL1, RTEL1-TNFRSF6B    1             0          3.18 × 10⁻⁸
rs140250387   18     3,534,424      A/G       DLGAP1                   1             0          4.49 × 10⁻⁷
rs115333865   14     55,004,449     G/A       CGRRF1                   2             1          1.05 × 10⁻⁶

Consequence column (row alignment not recoverable): Missense, Missense, Missense, Synonymous, Non-Coding Transcript.
Allele Count, Affecteds column (row alignment not recoverable): 5, 6, 3, 10.

Abbreviations: AMD, age-related macular degeneration; Chr., chromosome; ROADTRIPS, RObust Association-Detection Test for Related Individuals with Population Substructure; LCN9, lipocalin 9; RTEL1, regulator of telomere elongation helicase 1; RTEL1-TNFRSF6B, RTEL1-TNFRSF6B readthrough (NMD candidate); DLGAP1, DLG associated protein 1; CGRRF1, cell growth regulator with ring finger domain 1

Table 3.2. Allele frequencies in outbred populations (IAMDGC and gnomAD release 2.1) for the AMD associated variants identified with ROADTRIPS testing of Amish families.

                             Allele Frequency
rsID          Chromosome     IAMDGC Controls   IAMDGC Advanced AMD Cases   gnomAD Non-Finnish Europeans
rs200437673   9              0.004118          0.0043691                   0.004341
rs151214675   20             0.00024779        0.00016824                  0.0001936
rs140250387   18             0.001146          0.0012338                   0.001611
rs115333865   14             0.019112          0.019993                    0.01879

Abbreviations: AMD, age-related macular degeneration; ROADTRIPS, RObust Association-Detection Test for Related Individuals with Population Substructure; IAMDGC, International Age-Related Macular Degeneration Genomics Consortium; gnomAD, Genome Aggregation Database

Linkage Analysis

Multipoint linkage analyses under autosomal dominant and recessive models were performed to identify genomic loci linked to AMD in 16 Amish sub-pedigrees (Chapter 3

Appendix: Supplemental Figures 7-8 and Supplemental Tables 5-6). Under the recessive model with a disease allele frequency of 0.10, we identified genome-wide significant signals (HLOD > 3.6) on chromosome 8 (Table 3.3; Figure 3.3). We also identified significant signals on chromosome 18 under the dominant and recessive models (Table

3.3; Figure 3.4). These signals were robust to varying sub-pedigree structure (Chapter 3

Appendix: Supplemental Figures 9-10). Although we initially observed genome-wide significant signals (HLOD > 3.6) on chromosome 2 under the dominant and recessive models (Chapter 3 Appendix: Supplemental Table 7 and Supplemental Figure 11) and on

chromosome 15 under the recessive model (Chapter 3 Appendix: Supplemental Table 7 and Supplemental Figure 12), these signals were greatly reduced when we repeated our analyses with different sub-pedigree structures (Chapter 3 Appendix: Supplemental

Figures 13-14).

Table 3.3. Significant linkage loci identified from model-based multipoint linkage analyses of Amish families with disease allele frequency of 0.10. The peak HLOD score is the maximum HLOD score obtained for the designated chromosome.

Model       Chromosome   Peak HLOD Score   Region with HLOD > 3.6 (cM)   1-HLOD Support Interval (cM)
Recessive   8            4.03              97.28-99.61                   96.745-105.65
Dominant    18           3.87              75.82-85.41                   70.88-87.23
Recessive   18           4.27              74.39-78.51                   74.025-78.87; 81.31-82.18; 83.46-84.80

HLOD, heterogeneity LOD; cM, centimorgans
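A 1-HLOD support interval comprises the map positions whose HLOD score lies within one unit of the chromosome's peak score. The sketch below extracts such intervals from a scored grid. It is an illustration under simplifying assumptions: positions are evaluated only at the supplied grid points with no interpolation, the data are invented, and the intervals in Table 3.3 were not necessarily computed this way.

```python
def support_intervals(
    positions: list[float], hlods: list[float], drop: float = 1.0
) -> list[tuple[float, float]]:
    """Return contiguous (start, end) runs where HLOD >= peak - drop."""
    cutoff = max(hlods) - drop
    intervals = []
    start = end = None
    for pos, score in zip(positions, hlods):
        if score >= cutoff:
            if start is None:
                start = pos  # open a new run
            end = pos
        elif start is not None:
            intervals.append((start, end))  # close the current run
            start = None
    if start is not None:
        intervals.append((start, end))
    return intervals


# Invented HLOD curve on a 1 cM grid with a peak of 4.0.
toy_positions = [70.0, 71.0, 72.0, 73.0, 74.0, 75.0, 76.0, 77.0, 78.0, 79.0]
toy_hlods = [0.5, 1.0, 3.2, 4.0, 3.5, 2.0, 3.1, 3.3, 1.0, 0.2]
print(support_intervals(toy_positions, toy_hlods))  # [(72.0, 74.0), (76.0, 77.0)]
```

As in the toy output, a curve that dips below the cutoff and recovers produces more than one segment, which matches the multiple ranges listed for the recessive model on chromosome 18.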

Figure 3.3. HLOD scores obtained from multipoint linkage analysis in MERLIN under dominant and recessive models on chromosome 8. The black line denotes genome-wide significance (HLOD Score > 3.6). A genome-wide significant HLOD score of 4.03 was observed under the recessive model. Tick marks along the upper x-axis correspond to the marker positions.

Figure 3.4. HLOD scores obtained from multipoint linkage analysis in MERLIN under dominant and recessive models on chromosome 18. The black line denotes genome-wide significance (HLOD Score > 3.6). The maximum dominant HLOD score was 3.87. The maximum recessive HLOD score was 4.27. Tick marks along the upper x-axis correspond to the marker positions.

In our multipoint linkage analysis under the dominant model with a disease allele frequency of 0.10, we observed a region on chromosome 1 that was nearly genome-wide significant (maximum HLOD = 3.50 at 234.60 cM; Chapter 3 Appendix: Supplemental

Table 5). The peak of this region is about 16 MB from the complement factor H (CFH) gene, which has been repeatedly associated with AMD risk in the general population

(56). We performed a conditional linkage analysis on the markers on chromosome 1 with

liability classes for carriers of two AMD risk variants in CFH (Y402H and P503A) to determine whether these markers were driving the peak linkage signal on chromosome 1.

Of the 86 affected individuals in the linkage analysis, eight carry at least one copy of the

P503A variant, and 63 have at least one copy of the surrogate for Y402H. Three of the

P503A carriers are also heterozygous for the surrogate of Y402H. Our conditional linkage analysis on chromosome 1 using distinct classes for carriers of the surrogate for

CFH Y402H and/or the CFH P503A variant demonstrated that the CFH locus was contributing to the peak signal we observed in the unconditioned analysis (Figure 3.5).

Figure 3.5. Conditional linkage analysis on chromosome 1 taking into account the

Y402H and P503A carrier statuses. The black line denotes the HLOD scores from the multipoint linkage analysis performed without liability classes for the CFH variants. The orange line represents the distribution of HLOD scores from the conditional linkage

analysis. The peak HLOD score in the unconditioned analysis was 3.503 at 234.6 cM

(212,619,339 bp, build 37). The peak HLOD score in the conditioned analysis was

2.1836 at 214.47 cM (197,070,697 bp, build 37). The CFH gene boundaries are

196,621,008-196,716,634 bp (Ensembl, build 37).
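The physical distances implied by these coordinates can be checked directly. The helper below is a hypothetical convenience function that uses only the build 37 positions quoted in this caption.

```python
CFH_START, CFH_END = 196_621_008, 196_716_634  # CFH gene boundaries (build 37)
UNCONDITIONED_PEAK = 212_619_339               # peak of the unconditioned analysis
CONDITIONED_PEAK = 197_070_697                 # peak of the conditioned analysis


def distance_to_gene(position: int, start: int, end: int) -> int:
    """Distance in bp from a position to the nearest gene boundary (0 if inside)."""
    if position < start:
        return start - position
    if position > end:
        return position - end
    return 0


print(distance_to_gene(UNCONDITIONED_PEAK, CFH_START, CFH_END))  # 15902705
print(distance_to_gene(CONDITIONED_PEAK, CFH_START, CFH_END))    # 354063
```

The first value corresponds to the roughly 16 MB separation between the unconditioned peak and CFH noted in the text; after conditioning, the residual peak lies within about 0.4 MB of the gene.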

In Silico Functional Analysis of Genes from 1-HLOD Support Intervals of Linkage

Regions

We performed GO enrichment analyses using right-sided hypergeometric tests on the genes from the 1-HLOD support intervals of our linkage regions to determine whether they are functionally related through shared GO terms. P-values were corrected for multiple testing using the Benjamini-Hochberg method. The region we identified with significant evidence of linkage on chromosome 8q21.11-q21.13 includes 19 genes (Chapter 3

Appendix: Supplemental Table 2). The 1-HLOD support interval of this peak contains 80 unique genes (Chapter 3 Appendix: Supplemental Table 2), including those that are involved in fatty acid binding and triglyceride catabolic processes (Figure 3.6a; Chapter 3

Appendix: Supplemental Table 8). The region we found with significant evidence of linkage on chromosome 18q21.2-21.32 under the dominant model includes 47 genes

(Chapter 3 Appendix: Supplemental Table 3). The 1-HLOD support interval of this peak contains 102 unique genes (Chapter 3 Appendix: Supplemental Table 3), including those that are implicated in serine-type endopeptidase inhibitor activity and positive regulation of epithelial to mesenchymal transition (Figure 3.6b; Chapter 3 Appendix: Supplemental

Table 9). The significant linkage region we observed on chromosome 18q21.2-21.31 under the recessive model includes 19 genes (Chapter 3 Appendix: Supplemental Table

4). The 1-HLOD support interval of this peak contains 46 unique genes (Chapter 3

Appendix: Supplemental Table 4), which were not overrepresented in any particular GO terms in our enrichment analysis. When we analyzed all the genes from our 1-HLOD support intervals on chromosome 8 and 18 together in ClueGO, the same GO terms from our chromosome-specific analyses were identified.

Figure 3.6. Gene Ontology (GO) Enrichment Networks for Genes from 1-HLOD

Support Intervals from (a) chromosome 8 and (b) chromosome 18. The size of each node illustrates the number of genes mapped to the depicted GO term, and the color of the node shows the significance of the term in each enrichment analysis.

The leading term of each GO group is depicted in bold and represents the most significant ontology from the group. Full descriptions of these terms are available in

Chapter 3 Appendix: Supplemental Tables 8 and 9.

Discussion

We performed association and linkage analyses on Amish families from Ohio and

Indiana to uncover novel variants and loci for AMD. Using ROADTRIPS and kinship coefficients derived from the relationships detailed in the all-connecting path pedigree, we identified four novel variants associated with AMD in the Amish. Because we analyzed exome chip data, we recognize that these variants may not be the functional variants of these loci and that we might be observing their association signals because they are in linkage disequilibrium with the true functional variants for AMD. These four variants are independent and physically distant from the 52 AMD associated variants identified by the IAMDGC (56). Association signals were not detected in the 5 directly genotyped variants in the LCN9 gene on chromosome 9 or the 27 directly genotyped variants in the RTEL1/RTEL1-TNFRSF6B locus on chromosome 20 from the

IAMDGC GWAS for advanced AMD. However, a few nominal signals (p < 0.05) were found in the IAMDGC loci for DLGAP1 (n = 3 out of 144) on chromosome 18 and CGRRF1 (n = 1 out of 6) on chromosome 14 (56). These three DLGAP1 variants are not in linkage disequilibrium (LD) with one another (r² < 0.2) but are within 500 KB of the variant we identified in the Amish. The CGRRF1 variant that achieved a p-value less than 0.05 in the IAMDGC GWAS and the variant we identified with ROADTRIPS in the Amish are 500 bp apart but are not in LD (r² = 0.0001 in the CEU population). The variant we found on chromosome 9 (rs200437673) is about 36.6 MB,

65.1 MB, 61.9 MB, and 30.9 MB away from the lead variants identified by the IAMDGC on this chromosome (rs1626340, rs71507014, rs10781182, and rs2740488, respectively) (56). The variant we identified on chromosome 20 (rs151214675) is about

5.6 and 17.7 MB away from the lead variants from the IAMDGC GWAS on chromosome

20 (rs201459901 and rs142450006, respectively) (56). The IAMDGC did not identify any AMD associated variants on chromosome 18 (56). The lead variant in the AMD locus identified by the IAMDGC on chromosome 14 (rs61985136) (56) is almost 14 MB away from the variant we identified in the Amish (rs115333865). The four variants we identified in the Amish are rare to low-frequency in the IAMDGC data and the gnomAD database. While allele frequencies for these variants are fairly similar in advanced AMD cases and controls from the IAMDGC, these variants were significantly associated with

AMD in this Amish cohort. It was previously demonstrated that the Amish population has a lower genetic burden of known AMD variants (136); therefore, this may suggest that the Amish have different etiology for AMD than the general population of European descent.
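Because several of the comparisons above rest on the r² measure of linkage disequilibrium, a brief sketch of its standard definition may be helpful: r² = D² / (p_A (1 - p_A) p_B (1 - p_B)), where D = p_AB - p_A p_B. The function below assumes phased haplotype frequencies are available; all numbers in the example are invented.

```python
def r_squared(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    return d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))


# Invented frequencies: A and B always travel together (perfect LD).
print(round(r_squared(p_ab=0.3, p_a=0.3, p_b=0.3), 6))  # 1.0
# Invented frequencies: haplotype frequency equals the product (no LD).
print(round(r_squared(p_ab=0.06, p_a=0.2, p_b=0.3), 6))  # 0.0
```

Two variants can therefore sit only a few hundred base pairs apart and still show an r² near zero, particularly when one of them is rare.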

Of the AMD associated variants that we identified in this study, only the RTEL1 variant (rs151214675) is cataloged in the ClinVar database, but it is considered a variant of unknown significance (https://www.ncbi.nlm.nih.gov/clinvar/RCV000504185/). This variant also maps to the naturally occurring RTEL1-TNFRSF6B read-through transcript of this locus, which is a candidate for nonsense-mediated mRNA decay and unlikely to yield a protein product. RTEL1 encodes a regulator of a telomere elongation helicase, which maintains telomere length and genomic stability (185-187). Telomere length has been hypothesized as a marker of aging because telomeres shorten with age and the presence of short telomeres directs the cell to enter senescence (188). Oxidative stress can also contribute to the reduction of telomere length in cells and has been characterized as a contributing factor to AMD pathophysiology (188, 189). Lipocalins constitute a family of

extracellular proteins that are responsible for transporting small lipids such as fatty acids, retinoids, and steroids (190). There are no LCN9 variants documented in the GWAS

Catalog (https://www.ebi.ac.uk/gwas/). However, another lipocalin family member

(lipocalin-2, LCN2) may modulate inflammation in retinal degeneration by promoting cell survival responses and regulating the production of inflammatory proteins (191).

Additionally, tear lipocalins constitute a large group of lipid-binding proteins in tears and may serve as potential biomarkers for diabetic retinopathy and Alzheimer’s disease (192,

193). DLGAP1 encodes a guanylate kinase-associated protein involved in protein-protein interactions with scaffold proteins in the post-synaptic density of excitatory synapses

(194, 195). DLGAP1 also is one of nine genes located in a chromosomal region (myopia-

2, MYP2) that demonstrated significant genetic linkage with autosomal dominant high myopia (196, 197). The proteins encoded by DLGAP1 and lipocalin genes (LCN1 and

LCN2) were also found to be expressed in the human choroid-retinal pigment epithelial complex (198). The variant we identified in DLGAP1 is about 3 MB away from an AMD variant (rs9973159) identified in a genome-wide association study accounting for age-stratified effects in the IAMDGC data (87). CGRRF1 encodes a cell growth regulator that has been associated with eye morphology in (199). While the protein product of this gene is not well-characterized, it has been proposed as a modulator of Evi, which is a transmembrane protein involved in Wnt protein secretion (200). Canonical Wnt signaling has been implicated in retinal inflammation and may have a role in AMD pathology

(201). Additional studies will be required to elucidate the roles these genes might have in

AMD etiology.

In our linkage screens in the Amish, we identified a novel susceptibility locus for

AMD on chromosome 8q21.11-q21.13 under our recessive model. This region does not overlap with AMD loci previously identified with genome-wide linkage screens, which occur on chromosomes 8p21 and 8q11.2 (63). In their recent GWAS, the IAMDGC identified a susceptibility locus for AMD on chromosome 8p21.3 with the most significant signals coming from rs13278062 and rs79037040 in the

TNFRSF10A/LOC389641 and TNFRSF10A genes (56, 80). In our linkage analyses of chromosome 8, we observed a maximum HLOD score of 0.67 in this gene region under the recessive model with disease allele frequency of 0.10. The strongest single variant p-value observed by the IAMDGC in our linkage region was 6.54 × 10⁻⁴, which does not reach classical GWAS significance (p < 5 × 10⁻⁸). Common variants associated with optic nerve degeneration in glaucoma have also been identified in 8q22, which is near our linkage region (202).

Based on our GO enrichment analysis with ClueGO, the genes from the 1-HLOD support interval of our significant linkage region on chromosome 8q21.11-q21.13 have functional annotations related to triglyceride catabolic processes and fatty acid binding.

Components of lipid metabolism have been previously implicated in the genetic etiology and pathophysiology of AMD. Lipids comprise about 40 percent of the composition of drusen, which are a hallmark of AMD (203). In their most recent GWAS, the IAMDGC determined that several lipid pathways from the Reactome and GO pathway databases were enriched for genes from the 34 AMD susceptibility loci they identified (56). Known

AMD-associated genes (APOE and LIPC) (56) are also described as members of the triglyceride, neutral lipid, acylglycerol, and glycerolipid catabolic processes

(GO:0019433, GO:0046461, GO:0046464, and GO:0046503, respectively), which were enriched in genes from our linkage region. Additionally, although consuming high amounts of saturated and monounsaturated fats has been associated with AMD risk (42), dietary intake of omega-3-fatty acids has been attributed to reducing the risk of AMD

(40-43). There is not a consistent association between triglyceride levels and AMD risk.

Some studies observed lower triglyceride levels in patients with early AMD (46), the choroidal neovascularization subtype of advanced AMD (204), or any type of AMD

(205-207). However, other studies found higher triglyceride levels were associated with

AMD status (208, 209). Recently, levels of triglycerides were associated with a decreased risk for AMD and smaller drusen area (210). Further functional studies would need to be performed to definitively implicate the role of fatty acid binding and triglyceride levels in

AMD pathology.

From our multipoint linkage analyses with dominant and recessive models of inheritance, we identified novel AMD loci on chromosome 18q21.2-21.32 and 18q21.2-

21.31, respectively. These regions do not overlap with the AMD associated variant

(rs140250387) we identified in our association test. Although both analyses interrogate the genetic etiology of the trait, linkage analyses identify chromosomal segments co-segregating with traits in families (61), whereas association analyses compare allele frequencies between individuals with and without the trait of interest (211). Therefore, their results can be independent of one another. The IAMDGC did not identify any AMD susceptibility loci on chromosome 18 in either of their recent GWAS, and another study using IAMDGC data identified an AMD gene x age interaction on chromosome 18p, which is independent of our linkage peak (56, 80, 87). The most notable single marker p-value observed in the IAMDGC GWAS was 1.27 × 10⁻⁴, which occurred for a variant that falls within our linkage regions under both dominant and recessive models. Previous

GWAS have found suggestive AMD variants on chromosomes 18q22.1 (84) and 18q23

(212), which are located outside of the 1-HLOD support interval of our linkage region.

The C allele of the 18q22.1 variant (rs17073641) has a strong protective effect in never smokers but increases the risk of AMD in smokers (84). The 18q23 variant (rs1789110), which occurs near the myelin basic protein (MBP) gene, demonstrated suggestive evidence of association with geographic atrophy (212).

The 1-HLOD support interval of our significant linkage region on chromosome

18q21.2-21.32 contains genes that have functional annotations in GO processes such as positive regulation of epithelial to mesenchymal transition (EMT) and serine-type endopeptidase inhibitor activity. A known AMD risk gene, TGFBR1 (56, 80), has been previously described as a part of the positive regulation of EMT (GO:0010718). The

EMT of retinal pigment epithelial (RPE) cells is considered one of several biological processes that are responsible for the formation of subretinal fibrosis in the macula following choroidal neovascularization in advanced AMD (213). EMT has also been proposed as a mechanism employed by RPE cells to survive in the stressful macular microenvironment during dry AMD progression (214). Therefore, it has been suggested that therapeutically targeting RPE cells in EMT may help treat patients with advanced

AMD and subretinal fibrosis (214-216). Several of the serine proteinase inhibitor (serpin) family members from the 1-HLOD support interval have also been described in the context of AMD-related pathologies. The proteins encoded by SERPINB5 (maspin) and

SERPINB13 (headpin) are known for their anti-angiogenic properties, which may have

implications for the choroidal neovascularization subtype of advanced AMD (217-219).

A genetic variant near SERPINB2 was identified as a risk factor for AMD in smokers and a protective factor in nonsmokers in gene-environment interaction analyses (84). Another gene from our 1-HLOD support interval, TCF4, has repeatedly been associated with

Fuchs endothelial dystrophy, which is characterized by the deterioration of the corneal endothelium (220-224). C3 and TIMP3 are known AMD-associated genes identified by the

IAMDGC that are also described in the following GO terms from our analyses: endopeptidase regulator activity (GO:0061135), peptidase inhibitor activity

(GO:0030414), and endopeptidase inhibitor activity (GO:0004866). Additionally, the only subtype-specific variant identified for the CNV subtype of advanced AMD was located near the MMP9 gene on chromosome 20, which is a member of the endopeptidase family that participates in extracellular matrix degradation (56).

Although the major peak in our initial linkage analysis on chromosome 1 was diminished in our conditional linkage analysis, a suggestive linkage peak (HLOD ~ 2) remained around 213-217 cM. This genomic region corresponds to a locus of approximately 5 Mb that contains multiple genes including CFH and CFHR1-5. These genes are located within the regulation of complement activation (RCA) gene cluster on chromosome 1 and encode members of the factor H/CFHR family of proteins (225).

Polymorphisms and deletions in CFH and CFHRs have been previously associated with AMD (56, 76-78, 143, 226, 227), including rare variants in CFH (136, 141) and a protective deletion of CFHR1 and CFHR3 (228-230). More detailed mapping of this region in our cohort will be necessary to determine which polymorphisms in these genes are contributing to our linkage signal.

In this study, we performed association and linkage screens in Amish families from Ohio and Indiana. We identified four novel rare variants associated with increased AMD risk in the Amish and novel susceptibility loci on chromosomes 8q21.11-q21.13 (maximum recessive HLOD = 4.03) and 18q21 (maximum dominant HLOD = 3.87; maximum recessive HLOD = 4.27). These findings suggest novel genetic factors for AMD in the Amish and demonstrate the benefits of analyzing the genetic architecture of a complex trait, like AMD, in closely related individuals from an isolated population.

While family-based studies do not require sample sizes as large as traditional GWAS, the sample size for this study (n = 175) limited its power to detect novel AMD variants and loci. This could be remedied with the ascertainment of additional members of many Amish communities. Additionally, more genetic data could be obtained from whole exome sequencing or dense genotyping for a panel such as the Multi-Ethnic Genotyping Array (MEGA) of about 2 million exonic markers for GWAS and ancestry-specific variants. Additional studies are needed to replicate the associations of these variants and loci and validate their roles in the development or progression of AMD in outbred populations. Analyses could also be performed for the subtypes or endophenotypes of AMD to identify subtype-specific trends or associations with other binary and quantitative traits that may be pertinent for AMD etiology. Such studies may help resolve the missing heritability of AMD and provide novel insights into AMD pathophysiology.

CHAPTER 4

Pathway analysis integrating genome-wide and

functional data identifies PLCG2 as a candidate gene for

age-related macular degeneration

Andrea R. Waksmunski1,2,3, Michelle Grunin2,3, Tyler G. Kinzy3, Robert P. Igo, Jr.3,

Jonathan L. Haines1,2,3¶, and Jessica N. Cooke Bailey2,3¶; for the International Age-Related

Macular Degeneration Genomics Consortium (IAMDGC)^

1Department of Genetics and Genome Sciences, Case Western Reserve University,

Cleveland, Ohio, U.S.A.; 2Cleveland Institute for Computational Biology, Case Western

Reserve University, Cleveland, Ohio, U.S.A.; 3Department of Population and Quantitative

Health Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.

¶ JLH and JNCB are Joint Senior Authors.

^The full list of IAMDGC members is provided in (231).

This chapter was adapted from the following published manuscript:

Waksmunski AR, Grunin M, Kinzy TG, Igo RP, Haines JL, Cooke Bailey JN, for the

International Age-Related Macular Degeneration Genomics Consortium. 2019. Pathway

Analysis Integrating Genome-Wide and Functional Data Identifies PLCG2 as a

Candidate Gene for Age-Related Macular Degeneration. Investigative Ophthalmology &

Visual Science 60(12):4041-4051. doi: https://doi.org/10.1167/iovs.19-27827.

Copyright 2019 The Authors

Abstract

Purpose: Age-related macular degeneration (AMD) is the leading cause of blindness among the elderly worldwide. Although genome-wide association studies (GWAS) have identified AMD risk variants, their roles in disease etiology are not well-characterized, and they only explain a portion of AMD heritability.

Methods: We performed pathway analyses using summary statistics from the International AMD Genomics Consortium’s 2016 GWAS and multiple pathway databases to identify biological pathways wherein genetic association signals for AMD may be aggregating. We determined which genes contributed most to significant pathway signals across the databases. We characterized these genes by constructing protein-protein interaction networks and performing motif analysis.

Results: We determined that eight genes (C2, C3, LIPC, MICA, NOTCH4, PLCG2, PPARA, and RAD51B) “drive” the statistical signals observed across pathways curated in the KEGG, Reactome, and GO databases. We further refined our definition of statistical driver gene to identify PLCG2 as a candidate gene for AMD due to its significant gene-level signals (p < 0.0001) across KEGG, Reactome, GO, and NetPath pathways.

Conclusions: We performed pathway analyses on the largest available collection of advanced AMD cases and controls in the world. Eight genes strongly contributed to significant pathways from the three larger databases, and one gene (PLCG2) was central to significant pathways from all four databases. This is the first study to identify PLCG2 as a candidate gene for AMD based solely on genetic burden. Our findings reinforce the utility of integrating in silico genetic and biological pathway data to investigate the genetic architecture of AMD.

Introduction

Vision loss is one of the most feared medical conditions because of its profound effect on day-to-day quality of life (232, 233). Age-related macular degeneration (AMD) is the most common cause of blindness in individuals over age 60 and is responsible for almost 10 percent of all cases of blindness in the world (10). AMD is a late-onset disease that results from the accumulation of drusen, inflammation, and photoreceptor loss in the macular region of the eye (10). This progressive disease is categorized as either early/intermediate or advanced AMD; the latter is further sub-classified as geographic atrophy (“dry AMD”, GA) or choroidal neovascularization (“wet AMD”, CNV) (10).

Early AMD is often asymptomatic and dry AMD is initially asymptomatic, but as the disease progresses, patients’ central vision begins to blur and diminish (10). Wet AMD is characterized by the growth of abnormal blood vessels in the macula, which ultimately results in severe vision loss (10).

Although both genetic and environmental factors shape AMD susceptibility, between 46 and 71 percent of the phenotypic variance of the disease is attributable to genetic factors (36). To understand the genetic architecture of AMD, the International Age-Related Macular Degeneration Genomics Consortium (IAMDGC) performed a large-scale genome-wide association study (GWAS) for advanced AMD cases and controls. They identified 52 independent genetic variants across 34 susceptibility loci for advanced AMD that are estimated to explain nearly two-thirds of AMD heritability (56).

Therefore, about one-third of AMD heritability is still unexplained by the known loci.

Although other studies have identified additional risk loci with modest effect for advanced AMD (87, 234), more comprehensive approaches beyond GWAS must be used to find the remaining heritable variation for AMD.

Rather than investigating associations between single genetic variants and a phenotype, pathway analysis of GWAS data interrogates alterations in biological pathways for a trait of interest. Generally, this is done by aggregating summary statistics for these variants into genes, which are then grouped into pathways based on data in curated pathway databases (235). We hypothesize that applying this more comprehensive approach may help elucidate the genetic etiology of advanced AMD that has been indiscernible from GWAS. In this study, we performed in silico pathway analysis using the Pathway Analysis by Randomization Incorporating Structure (PARIS) software to identify biological pathways and processes enriched in genetic variation potentially associated with AMD in individuals of European descent. Because nomenclature, foci, and definitions vary across pathway databases (236), we utilized multiple databases to complement and validate our findings. Additionally, we sought to determine the central causal genes that “drive” the statistical signals observed for significant pathways identified by PARIS.

Methods

Study Subjects and GWAS Summary Statistics

The participants for this study were previously ascertained by cohorts in the IAMDGC as described (56). This included 16,144 individuals with advanced AMD and 17,832 unaffected individuals. Of the advanced AMD cases, 3,235 individuals have GA only and 10,749 have CNV only. The remaining cases have both GA and CNV. All of the cases and controls used for our analyses were of European ancestry. All participants provided informed consent, and the study protocol was approved by institutional review boards as previously described (56). Data were previously collected in accordance with the tenets of the Declaration of Helsinki. The summary statistics we analyzed in this study were obtained in the 2016 GWAS performed by the IAMDGC (56). Specifically, these data include p-values for 445,115 directly genotyped common and rare variants from the advanced AMD case-control results. The genotypes for these variants were generated from an Illumina HumanCoreExome array that was designed with additional genome-wide and custom content for AMD (56).

PARIS: Knowledge-Driven Pathway Analysis of GWAS Data

To identify biological pathways enriched in genetic variants possibly contributing to advanced AMD risk, we performed in silico pathway analysis using the PARIS V2.4 software (237, 238). PARIS uses variant summary statistics from GWAS, clusters them into features defined by the linkage disequilibrium (LD) structure of the genome based on a reference catalog of common genetic variants, and assigns significance to pathways based on permutation of the genome (237, 238). In our analyses, we performed 100,000 permutations. PARIS also assigns empirical p-values to the genes comprising a pathway based on permutation testing of features within each of the genes (237, 238).

We performed PARIS using multiple pathway databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) (239), Reactome (240), Gene Ontology (GO) (241), and NetPath (242). KEGG, Reactome, and GO databases are extensive, curated biological pathway data repositories. NetPath is a specialized database that covers signaling pathways. Pathways with a p-value less than 0.0001 were prioritized for further investigation. This permutation p-value was calculated using the following equation: p = (1 + b)/(1 + M), where M is the number of permutations and b is the number of randomly sampled permutation scores that are greater than the observed score. To determine if the pathway associations we observed were driven by known AMD loci, we re-performed our pathway analyses excluding variants from the 34 susceptibility loci identified by the IAMDGC (defined by the 52 genomic variants) and their proxies (r² ≥ 0.5) within 500 kilobases (56).
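The permutation p-value defined above can be illustrated with a minimal Python sketch. The function name and the toy null scores are assumptions made for illustration; this is not the PARIS implementation, only the stated formula p = (1 + b)/(1 + M) applied to a list of permutation scores.

```python
def permutation_p_value(observed_score, null_scores):
    """Empirical p-value p = (1 + b) / (1 + M).

    M is the number of permutation scores and b is the number of
    those scores strictly greater than the observed score.
    """
    m = len(null_scores)
    b = sum(1 for score in null_scores if score > observed_score)
    return (1 + b) / (1 + m)


# Toy null distribution (hypothetical values, not study data).
toy_null = [0, 1, 2, 3]
p_toy = permutation_p_value(1, toy_null)  # b = 2, M = 4

# With M = 100,000 permutations and no permuted score exceeding the
# observed one (b = 0), the smallest attainable p-value is 1 / 100,001.
p_floor = permutation_p_value(10, [0] * 100_000)
```

Under this formula the p-value can never be zero, and with 100,000 permutations its floor (about 0.00001) lies below the 0.0001 threshold used to prioritize pathways.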

Identification of Statistical Pathway Driver Genes

Due to disparate nomenclature and composition of pathways in the databases, we identified genes that overlapped across significant pathways within a database and across databases (regardless of pathway). This served to internally validate and complement our results. To interrogate the significant signals obtained from the pathways identified by PARIS, we queried which significant (p < 0.0001) genes overlapped among the significant (p < 0.0001) pathways within a pathway database. These genes were compared across the analyses done with each of the pathway databases (KEGG, Reactome, GO, and NetPath) to find statistical driver genes that had significant signals across three or more databases for the advanced AMD results.
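The cross-database comparison can be sketched as a counting exercise in Python. This is a minimal illustration under stated assumptions: the input is taken to be, for each database, the set of significant genes already drawn from its significant pathways, and the function name, the gene labels, and the example sets are hypothetical placeholders rather than study results.

```python
from collections import Counter


def statistical_driver_genes(significant_genes_by_db, min_databases=3):
    """Return genes that are significant in at least min_databases databases."""
    counts = Counter()
    for genes in significant_genes_by_db.values():
        counts.update(set(genes))  # count each gene once per database
    return sorted(gene for gene, n in counts.items() if n >= min_databases)


# Hypothetical per-database sets of significant genes (placeholders).
example = {
    "KEGG": {"GENE_A", "GENE_B", "GENE_C"},
    "Reactome": {"GENE_A", "GENE_B"},
    "GO": {"GENE_A", "GENE_B", "GENE_D"},
    "NetPath": {"GENE_A"},
}

drivers_three_plus = statistical_driver_genes(example)
drivers_all_four = statistical_driver_genes(example, min_databases=4)
```

Raising `min_databases` from three to four mirrors the stricter definition of a statistical driver gene used later in the chapter.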

Protein-Protein Interaction Network for Statistical Pathway Driver Genes

We searched the Search Tool for Recurring Instances of Neighbouring Genes (STRING) database (243) version 10.5 for protein-protein interactions involving the proteins encoded by the genes identified as statistical driver genes. The STRING database is composed of known and predicted protein-protein interactions based on data from curated interactions databases, high-throughput lab experiments, co-expression, and text mining in the literature. We used the high confidence (0.700) minimum required interaction score to construct the protein-protein networks of interactions based on experimental data, database entries, and co-expression.
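As a rough illustration of the confidence filter, the Python sketch below keeps only interactions at or above the 0.700 score for one query protein. It does not call STRING; the edge list, protein labels, scores, function name, and the ranking of partners by score are all assumptions for illustration. The cap on partners corresponds to the limits reported with the network figures.

```python
def high_confidence_partners(edges, query, min_score=0.700, max_partners=50):
    """Return up to max_partners (partner, score) pairs for query, best first."""
    partners = []
    for protein_a, protein_b, score in edges:
        if score < min_score:
            continue  # below the high-confidence cutoff
        if protein_a == query:
            partners.append((protein_b, score))
        elif protein_b == query:
            partners.append((protein_a, score))
    # Ranking by score is an assumption made for this sketch.
    partners.sort(key=lambda pair: pair[1], reverse=True)
    return partners[:max_partners]


# Hypothetical edge list: names and scores are placeholders, not STRING data.
example_edges = [
    ("QUERY", "P1", 0.95),
    ("P2", "QUERY", 0.70),
    ("QUERY", "P3", 0.65),
    ("P1", "P2", 0.99),
]
partners = high_confidence_partners(example_edges, "QUERY")
```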

Motif Analysis for Statistical Pathway Driver Genes

We extracted reference genome sequences for the statistical driver genes using the UCSC Genome Browser Table Browser (244). We included 600 nucleotides upstream from the first exon and the 5’ untranslated region (UTR) in the sequences for each gene. To identify potential sequence motifs for each of these gene sets, we utilized the MEME software suite (245). Sequences were considered motifs if their lengths were between 6 and 50 nucleotides. MEME was not required to find a motif in every sequence, but motifs were required to have an E-value no greater than 0.0001. Each motif from the gene sets was then investigated in Tomtom, which looks for transcription factors (TF) that are associated with the motif. TF binding motifs were evaluated based on the known human TF database from JASPAR (246) using HOCOMOCO (247). To validate the motifs found and to test the null hypothesis that the motifs were random and unrelated to the statistical driver genes, we used a random gene set generator to draw ten random sets of eight genes and performed the same analyses via MEME and Tomtom on each set. We removed motifs and TFs that appeared in both the random and actual gene sets from further analysis.

Results

In Silico Pathway Analysis

We identified several biological pathways and processes from KEGG, Reactome, GO, and NetPath databases (Table 4.1; Chapter 4 Appendix: Supplemental Tables 1-4) to be significantly associated with advanced AMD using PARIS. A pathway was considered significant if it had a pathway-level p-value less than 0.0001. The vast majority of pathways in the four databases were not significant (Table 4.1). When we re-performed our pathway analyses excluding the 34 known AMD loci (56), ~40 percent of the previously significant KEGG (n = 10) and GO (n = 53) pathways and over 60 percent of the Reactome (n = 32) pathways remained significant (Chapter 4 Appendix: Supplemental Tables 1-3). The single NetPath pathway that was significant in our initial analysis (Wnt, Chapter 4 Appendix: Supplemental Table 4) was no longer significant in this sensitivity analysis (p = 0.00215).

Table 4.1. Significantly associated pathways across multiple pathway databases for advanced AMD. Pathways were considered significant if they obtained an empirical p-value less than 0.0001.

Database    Count of Significant Pathways    Total Entries in Database    Proportion of Significant Pathways in Database
NetPath     1                                26                           0.038
KEGG        25                               293                          0.085
Reactome    50                               1,748                        0.029
GO          145                              12,765                       0.011
p < 0.0001
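The proportions in Table 4.1 and the retained fractions quoted for the sensitivity analysis follow directly from the reported counts, as this short Python check shows. The variable names are illustrative; the counts are those given in the table and in the preceding paragraph.

```python
# (significant pathways, total entries) per database, from Table 4.1.
table_4_1 = {
    "NetPath": (1, 26),
    "KEGG": (25, 293),
    "Reactome": (50, 1748),
    "GO": (145, 12765),
}
proportions = {
    db: round(significant / total, 3)
    for db, (significant, total) in table_4_1.items()
}

# Pathways still significant after excluding the 34 known AMD loci.
retained_counts = {"KEGG": 10, "GO": 53, "Reactome": 32}
retained_fraction = {
    db: round(count / table_4_1[db][0], 2)
    for db, count in retained_counts.items()
}
```

The retained fractions come to 0.40 for KEGG, 0.37 for GO, and 0.64 for Reactome, consistent with the "~40 percent" and "over 60 percent" statements.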

Statistical driver genes among advanced AMD-associated pathways

Because pathway structure and terminology vary across databases, we determined which genes were significantly contributing to the overall pathway signals detected by PARIS. We compared the significant genes in significant pathways from KEGG, Reactome, and GO (Figure 4.1; Table 4.2) and identified eight such genes. Upon removing variants from our analyses that fell within the 34 known AMD susceptibility loci as defined in Supplementary Table 5 in the IAMDGC GWAS (56), we found that two genes (PPARA and PLCG2) remained statistical driver genes across associated pathways from KEGG, Reactome, and GO.

Figure 4.1. Comparison of significant genes from AMD-associated KEGG, Reactome, and GO pathways identified by PARIS. Eight genes demonstrated significant signals across all three comparisons and are summarized in Table 4.2.

Table 4.2. Eight statistical pathway driver genes from significant KEGG, Reactome, and GO pathways. The cross-database comparison of significant genes from significantly associated pathways.

Statistical Pathway Driver Genes Implicated in the 2016 IAMDGC GWAS Loci
Gene      Chromosome    Full Gene Name (HGNC)
C2        6             complement C2
MICA      6             MHC class I polypeptide-related sequence A
NOTCH4    6             notch receptor 4
RAD51B    14            RAD51 paralog B
LIPC      15            lipase C, hepatic type
C3        19            complement C3

Novel Genes Identified with Pathway Analysis with PARIS
Gene      Chromosome    Full Gene Name (HGNC)
PLCG2     16            phospholipase C gamma 2
PPARA     22            peroxisome proliferator activated receptor alpha

To identify evidence of protein-protein interactions (PPI) for the proteins encoded by the eight statistical driver genes in our analyses (C2, C3, LIPC, MICA, NOTCH4, PPARA, PLCG2, and RAD51B), we queried the STRING database. Each of these proteins has multiple binding partners identified through functional studies or in silico predictions (Figure 4.2). When considering no more than 50 interaction partners for each of the eight proteins, we found three distinct clusters of protein-protein interactions (Figure 4.2). One cluster connects MICA, PLCG2, LIPC, C2, C3, and other immune-related proteins (Figure 4.2A); another connects NOTCH4, PPARA, and other signaling proteins (Figure 4.2B); and the third contains RAD51B and other DNA repair proteins (Figure 4.2C).

Figure 4.2. PPI network generated for the proteins encoded by the eight statistical driver genes. No more than 50 interactions from the STRING database were displayed for each input protein. This threshold of interactions enabled the connection of all eight queried proteins to a network. Three distinct networks were defined by the proteins encoded by the statistical driver genes: (A) Network connecting MICA, PLCG2, LIPC, C2, C3, and other immune-related proteins. (B) Network connecting NOTCH4, PPARA, and other signaling proteins. (C) Network connecting RAD51B and other DNA repair proteins. Types of interaction sources include co-expression (black), experimental data (magenta), and curation in databases (cyan).

Using the Multiple EM (Expectation Maximization) for Motif Elucidation (MEME) software suite, we identified sequence motifs with known transcription factor binding sites near the eight statistical driver gene sequences from the UCSC Genome Browser Table Browser (244). Five motifs were present for most of the statistical driver genes and contain binding sites for transcription factors (Table 4.3). Only one sequence motif ([GCA][AC][CT]AG[AT]G[CA][TGA]A[AG][AT][CA]T[CA][CG][GA]T[CG][TG][CA]A[AG]AAA[ATG][AG]AAA[AT][CA][AC]A[AC]A[AC][AT][AT]A) was near all eight statistical driver genes and contained binding sites for 12 transcription factors.
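The bracketed consensus notation used for these motifs has the same syntax as a regular-expression character class, so a motif can be scanned against a sequence with Python's `re` module. The sketch below is only an illustration: exact regular-expression matching is stricter than the probabilistic motif models that MEME and Tomtom work with, and the helper names and the padded test sequence are assumptions.

```python
import re

# Consensus of the motif found near all eight statistical driver genes.
MOTIF = (
    "[GCA][AC][CT]AG[AT]G[CA][TGA]A[AG][AT][CA]T[CA][CG][GA]T[CG][TG][CA]"
    "A[AG]AAA[ATG][AG]AAA[AT][CA][AC]A[AC]A[AC][AT][AT]A"
)
TOKEN = re.compile(r"\[[ACGT]+\]|[ACGT]")  # one token per motif position


def motif_width(motif):
    """Number of sequence positions covered by a bracketed consensus."""
    return len(TOKEN.findall(motif))


def one_instance(motif):
    """Build one concrete sequence by taking the first base at each position."""
    return "".join(token.strip("[]")[0] for token in TOKEN.findall(motif))


def find_motif(motif, sequence):
    """Return 0-based start positions of non-overlapping exact matches."""
    return [match.start() for match in re.finditer(motif, sequence)]


# Hypothetical sequence: five T's of padding on each side of one instance.
sequence = "TTTTT" + one_instance(MOTIF) + "TTTTT"
hits = find_motif(MOTIF, sequence)
```

`motif_width(MOTIF)` can be used to confirm that a reported consensus falls within the 6 to 50 nucleotide range described in the Methods.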

Table 4.3. Sequence motifs with transcription factor (TF) binding sites near statistical driver genes. For each motif, we identified TFs associated with the motif sequence using Tomtom. The p-value represents the strength of the match between the sequence motif identified adjacent to the statistical driver genes and the curated sequences of the TF binding motifs in the HOCOMOCO database.

Motif 1
Consensus sequence: G[CG][TG]TG[AT]ACC[CAT][AG]G[GT][AG]GG[CT][GT][GT][AT][GA][CG]TT[GC]C[AT]G[TA]GAGCC[GT]AGA[TA]C[GA][CG][GT][CT]C[AT][CG]
Statistical driver genes: C2, LIPC, MICA, NOTCH4, PPARA, RAD51B
Transcription factor (p-value): KLF5 (0.0095), KLF12 (0.011), THA11 (0.012), ZN563 (0.013), IRF2 (0.013), NFIA (0.017), ZN449 (0.019), ELF2 (0.020), ZBTB6 (0.021), RARG (0.024)

Motif 2
Consensus sequence: [CT][TA]G[GT]C[TC]AA[CA][AG][CT][AG][GC][TA][GC]AAACCC[CA][GC][TA][CA][TA]C[TC]A[CT][TC][AC]AA[AG]ATA[CT][AT][AG][AC]AAA[AT]TA[GT][TCG]
Statistical driver genes: C2, LIPC, MICA, NOTCH4, PPARA, RAD51B
Transcription factor (p-value): PIT1 (0.0052), SOX5 (0.0093), AIRE (0.010), CEBPE (0.011)

Motif 3
Consensus sequence: [GA][CG]CTG[CT][AT][GA][TA]CC[CA]AGCT[AGC][CT][TA][CGA][GT][GT][GT][AT][GC]G[CTC][TG][GA]AG[GT]CAG[GA][AT]G[AC][AC][TGC]
Statistical driver genes: C2, LIPC, MICA, NOTCH4, PPARA, RAD51B
Transcription factor (p-value): MAFB (0.0089), MAFF (0.010), HTF4 (0.011), MAFK (0.012), FOXA2 (0.012), TFE2 (0.014), BACH2 (0.021)

Motif 4
Consensus sequence: [GA]C[CT]T[CT][GC][GA]CC[TC]CCCAAA[GC][TC]GCTGGGAT[TC]AC[AG]GGCGT[GC]A[GA]CC
Statistical driver genes: C2, LIPC, MICA, NOTCH4, PPARA, PLCG2, RAD51B
Transcription factor (p-value): TFAP4 (0.0047), ZN322 (0.0062), ZNF41 (0.011), CRX (0.013), ZIC3 (0.015), NKX21 (0.020), GLI3 (0.024)

Motif 5
Consensus sequence: [GCA][AC][CT]AG[AT]G[CA][TGA]A[AG][AT][CA]T[CA][CG][GA]T[CG][TG][CA]A[AG]AAA[ATG][AG]AAA[AT][CA][AC]A[AC]A[AC][AT][AT]A
Statistical driver genes: C2, C3, LIPC, MICA, NOTCH4, PPARA, PLCG2, RAD51B
Transcription factor (p-value): HEN1 (0.0025), ZSC31 (0.0029), PKNX1 (0.0034), NKX21 (0.0037), PBX3 (0.0066), TYY1 (0.010), NR2C1 (0.011), VDR (0.014), CREB1 (0.016), RFX2 (0.021), ATF1 (0.021), CEBPE (0.022)

We further restricted our definition of statistical pathway driver gene to include genes that also strongly contributed to AMD-associated pathways from NetPath. This enabled us to further support PLCG2 as a candidate gene for advanced AMD (Figure 4.3). This gene encodes a phosphodiesterase that is involved in phosphatidylinositol signaling and several other immune, metabolic, and signaling pathways curated in KEGG, Reactome, GO, and NetPath (Figure 4.3). We interrogated potential interaction partners for the PLCG2 protein by constructing a PPI network for PLCG2 using the STRING database (Figure 4.4). We also determined if PLCG2 harbored any suggestive associations with AMD in the IAMDGC data. None of the p-values for the 65 individual PLCG2 variants we analyzed with PARIS reached genome-wide significance (p < 5 × 10⁻⁸), but several of them (n = 14) were nominally associated (p < 0.05) with advanced AMD (Figure 4.5). The single-variant association results from PLCG2 are not highly correlated based on LD structure using the 1000 Genomes Project (Figure 4.5), which indicates that the concentration of nominally significant results in this gene is not merely due to LD.
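To give a sense of scale for 14 nominally associated variants out of 65, the Python sketch below compares that count with what a simple null model would predict. This calculation was not part of the study: it assumes fully independent variants with uniformly distributed null p-values, which the low but nonzero LD in the region only approximates, and the function name is illustrative.

```python
from math import comb


def binomial_upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_variants = 65        # PLCG2 variants analyzed with PARIS
n_nominal = 14         # variants with p < 0.05
alpha = 0.05

expected_by_chance = n_variants * alpha  # about 3.25 variants
tail_probability = binomial_upper_tail(n_variants, n_nominal, alpha)
```

Under these simplifying assumptions, roughly three nominal associations would be expected by chance, so observing 14 is unlikely under the null; the gene-level empirical p-value from PARIS remains the formal test.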

Figure 4.3. Identification of PLCG2 as a candidate gene for advanced AMD. A comparison of the significant genes from significant KEGG, GO, NetPath, and Reactome pathways in our PARIS pathway analysis converged on one gene (PLCG2) which encodes a protein that is common to several pathways.

Figure 4.4. PPI network generated for PLCG2. No more than 10 interactions were displayed. Types of interaction sources include co-expression (black), experimental data (magenta), and curation in databases (cyan). Protein names include PLCG2 (Phospholipase C Gamma 2), EGFR (Epidermal Growth Factor Receptor), HCK (HCK Proto-Oncogene, Src Family Tyrosine Kinase), ITK (IL2 Inducible T Cell Kinase), LCP2 (Lymphocyte Cytosolic Protein 2), LYN (LYN Proto-Oncogene, Src Family Tyrosine Kinase), PIK3R1 (Phosphoinositide-3-Kinase Regulatory Subunit 1), SYK (Spleen Associated Tyrosine Kinase), and TEC (Tec Protein Tyrosine Kinase).

Figure 4.5. LocusZoom Plot of P-Values for the 65 PLCG2 variants in the IAMDGC advanced AMD case-control analysis. These variants were either within the gene boundaries (human genome build 37) of PLCG2 or within 50 kilobases of these boundaries. P-values were generated by the IAMDGC in their advanced AMD case-control genome-wide association study published in 2016 (56). LD estimates (r²) are based on the European (EUR) population from the 1000 Genomes Project (November 2014 release).

Discussion

Using knowledge-driven pathway analysis on GWAS data, we uncovered pathways that were enriched in variation potentially associated with AMD in individuals of European descent. Our study is the first to perform such analyses on the largest available advanced AMD case-control association dataset. We found several signaling, immune, metabolic, and disease-related pathways from the KEGG, Reactome, GO, and NetPath databases that are associated with advanced AMD. Our sensitivity analysis demonstrated that several of the pathways from KEGG, Reactome, and GO (Chapter 4 Appendix: Supplemental Tables 1-3) remained associated with advanced AMD following the exclusion of the 34 AMD susceptibility loci described earlier (56). This suggests that modest effects aggregating in these pathways may contribute to the missing heritability of AMD. Although the Wnt pathway from NetPath was no longer significant in our sensitivity analysis, the Wnt signaling pathway from GO remained associated with AMD.

This discrepancy results from differences in the pathway definitions. These pathways are nearly identical in size (n = 45 and 41 genes for NetPath and GO, respectively); however, only two genes overlap between them (PLCG2 and FZD4). Furthermore, the Wnt signaling pathway in KEGG (n = 140 genes) and the signaling by Wnt pathway in Reactome (n = 294 genes) only achieved pathway-level p-values of 0.032 and 0.037 in our analyses, respectively. These pathway definition differences further justify our use of multiple curated databases in our analyses to uncover AMD-associated pathways and genes driving their statistical significance.

Due to varying nomenclature for pathways across databases and as a way of internal validation, we focused on eight statistical driver genes (C2, C3, LIPC, MICA, NOTCH4, PPARA, PLCG2, and RAD51B) that were consistently significant across GO, Reactome, and KEGG pathways. PPARA and PLCG2 were not previously identified as a part of the 34 IAMDGC loci associated with AMD risk. The strongest single-marker p-values observed in PLCG2 and PPARA were 2.05 × 10⁻⁴ and 3.10 × 10⁻⁵, respectively, and do not meet the classical GWAS significance levels. In our sensitivity analysis, PPARA and PLCG2 remained statistical driver genes in pathways from KEGG, Reactome, and GO, suggesting that pathway analysis can identify novel AMD genes.

Additionally, the aggregation of nominally significant independent variants in PLCG2 suggests that the gene-wide significance of PLCG2 is greater than that of the individual variants and emphasizes the power of pathway analysis for identifying gene-wide signals rather than single-variant associations.

DNA motif analysis identified five sequence motifs adjacent to the eight statistical driver genes in their promoter regions. These motifs represent sites of known transcription factor binding and suggest that the expression of these genes may be controlled by similar mechanisms. One motif ([GCA][AC][CT]AG[AT]G[CA][TGA]A[AG][AT][CA]T[CA][CG][GA]T[CG][TG][CA]A[AG]AAA[ATG][AG]AAA[AT][CA][AC]A[AC]A[AC][AT][AT]A) was adjacent to the start positions of all eight statistical driver genes and contains known binding sites of several transcription factors (Table 4.3). Functional studies are required to confirm these in silico findings and elucidate the transcriptional mechanisms of these statistical driver genes in the context of AMD.

One gene, PLCG2, was central to multiple pathways in all four databases and remained significant after our sensitivity analysis. PLCG2 encodes a signaling enzyme (phospholipase C gamma 2, PLCG2) that utilizes calcium to catalyze the hydrolysis of PIP2 into second messengers IP3 and DAG (248). These molecules initiate intracellular calcium flux and activate protein kinase C, respectively (248). The enzymatic activity of PLCG2 results from tyrosine phosphorylation performed by growth factor receptors, immune receptors, and G protein-coupled receptors as well as the activity of lipid-derived second messengers in the cell (248). This enzyme is highly expressed in cells of hematopoietic origin and is responsible for regulating immune responses and platelet adhesion and spreading (249-253).

The PLCG2 protein interacts with several members (HCK, LYN, PIK3R1, and SYK) of the microglia pathogen phagocytosis pathway in humans (254). Its interaction partners also play roles in oxidative stress, angiogenesis, and platelet activation. BLNK and BTK are central to facilitating B cell apoptosis following oxidative stress (255, 256).

Exposure to oxidative stress activates EGFR, which promotes retinal epithelial cell health and survival through EGFR/Akt, PI3K, and ERK/MAPK signaling pathways (257, 258).

EGFR downstream signaling also contributes to retinal pigment epithelial cell proliferation and migration in wound healing (259, 260). PIK3R1 is a regulatory subunit of PI3K in the PI3K/Akt/mTOR pathway, which is a possible target for treating ocular neovascularization (261). PI3K and Tec protein kinases regulate platelet activation (262), and signaling cascades from LCP2 (also called SLP-76) and SYK are responsible for separating blood and lymphatic vasculatures in the human body (263). These interactions and processes, coupled with PLCG2’s role in the VEGF pathway (264, 265), could be pertinent for understanding the role of PLCG2 and its interaction partners in the choroidal neovascularization subtype of advanced AMD. In the CNV-only case-control GWAS performed by the IAMDGC, no PLCG2 variants were genome-wide significant; however, 13 variants were nominally associated with CNV (p < 0.05) (56). Of the 65 PLCG2 variants analyzed by PARIS, 31 exhibited lower p-values in the CNV-specific IAMDGC GWAS than in the combined advanced AMD IAMDGC GWAS.

Heterozygous gain-of-function mutations in PLCG2 result in constitutive phospholipase activity and PLCG2-associated Antibody Deficiency and Immune Dysregulation (PLAID), which is characterized by immunodeficiency and autoimmunity (266). This gene was recently identified as a candidate gene for rheumatoid arthritis (RA) due to its overexpression in RA patients compared to controls (267). Genetic risk scores for RA are associated with increased AMD risk (268), and individuals with RA are at a higher risk of developing AMD (269). Gene expression profiles for PLCG2 as well as PPARA have been observed in AMD and control retinas (270). In unaffected retinal tissue, PPARA is highly expressed, whereas PLCG2 is lowly expressed (271, 272).

Neither of these genes was significantly expressed in the age-adjusted analysis of CNV retinas (270). PLCG2 is highly expressed in microglia (273) and has been previously implicated in the genetic etiology of late-onset Alzheimer’s disease (LOAD) (274, 275).

Specifically, genome-wide association studies identified a protective effect for a rare variant in the coding region of PLCG2 on LOAD (274, 275). This variant (rs72824905) is considered hypermorphic because the mutant enzyme experiences a small increase in enzymatic activity compared to wild type enzyme, which would imply that mildly activating PLCG2 could be a therapeutic intervention for LOAD (273). This variant was not significantly associated with advanced AMD in the IAMDGC GWAS (p = 0.57) (56).

Functional studies would need to be performed to determine if PLCG2’s enzymatic activity could be modulated by a similar mechanism in patients with AMD.

Although PLCG2 has not been previously associated with AMD in a case-control GWAS, variants in this gene were associated with AMD when accounting for birth control pill usage in women with CNV (276). These associations were undetectable when gene-environment interactions between PLCG2 variants and exogenous estrogen exposure were not considered (276). Other interaction studies have identified PLCG2 variants as genetic modifiers of previously identified associations among menopausal hormone therapy, mammographic density, and breast cancer risk, which could suggest sex-specific effects of genetic variants in this gene for disease risk (277, 278).

While our study provides in silico evidence for the roles of these statistical driver genes and pathways in AMD, it does not biologically confirm them. Functional studies are required to determine causality for these genes and pathways in patients with AMD.

Knowledge-driven pathway analyses are subject to the quality and coverage of the knowledge in a given database. We attempted to circumvent this limitation by utilizing multiple databases in our analyses and integrating our results. The GWAS data used in this study were generated from individuals of European descent. Consequently, these findings may not be applicable to non-European populations. The IAMDGC GWAS dataset is considered the largest available dataset for advanced AMD cases and controls in the world. We are unaware of any comparable datasets available for replication.


CHAPTER 5

Statistical driver genes as a means to uncover missing heritability for age-related macular degeneration

Andrea R. Waksmunski1,2,3*, Michelle Grunin2,3*, Tyler G. Kinzy3, Robert P. Igo, Jr.3, Jonathan L. Haines1,2,3¶, and Jessica N. Cooke Bailey2,3¶

1Department of Genetics and Genome Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.; 2Cleveland Institute for Computational Biology, Case Western Reserve University, Cleveland, Ohio, U.S.A.; 3Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, Ohio, U.S.A.

*ARW and MG are Joint First Authors.

¶ JLH and JNCB are Joint Senior Authors.

A modified version of this chapter was submitted for publication.


Abstract

Background: Age-related macular degeneration (AMD) is a progressive retinal disease contributing to blindness worldwide. Multiple estimates for AMD heritability (h2) exist; however, a substantial proportion of h2 is not attributable to known genomic loci. The International AMD Genomics Consortium (IAMDGC) gathered the largest dataset of advanced AMD (ADV) cases and controls available and identified 34 loci containing 52 independent risk variants defining known AMD h2. To better define AMD heterogeneity, we used Pathway Analysis by Randomization Incorporating Structure (PARIS) on the IAMDGC data and identified 8 statistical driver genes (SDGs), including 2 novel SDGs not discovered by the IAMDGC. We chose to further investigate these pathway-based risk genes and determine their contribution to ADV h2, as well as the differential ADV subtype h2.

Results: We performed genomic-relatedness-based restricted maximum-likelihood (GREML) analyses on ADV, geographic atrophy (GA), and choroidal neovascularization (CNV) subtypes to investigate the h2 of genotyped variants on the full DNA array chip, 34 risk loci (n = 2,758 common variants), and 52 variants from the IAMDGC 2016 GWAS, as well as combinations of those with the h2 of the 8 SDGs, specifically the novel 2 SDGs, PPARA and PLCG2. Full chip h2 was 44.05% for ADV, 46.37% for GA, and 62.03% for CNV. The lead 52 variants’ h2 (ADV: 14.52%, GA: 8.02%, CNV: 13.62%) and 34 loci h2 (ADV: 13.73%, GA: 8.81%, CNV: 12.89%) indicate that known variants contribute ~14% to ADV h2. SDG variants account for a small percentage of ADV, GA, and CNV heritability, but estimates based on the combination of SDGs and the 34 known loci are similar to those calculated for known loci alone. We identified modest epistatic interactions among variants in the 2 SDGs and the 52 IAMDGC variants, including modest interactions between variants in PPARA and PLCG2.

Conclusions: Pathway analyses, which leverage biological relationships among genes in a pathway, may be useful in identifying additional loci that contribute to the heritability of complex disorders in a non-additive manner. Heritability analyses of these loci, especially amongst disease subtypes, may provide clues to the importance of specific genes to the genetic architecture of AMD.

Background

Genome-wide association studies (GWAS) have been instrumental in identifying genomic variants associated with complex traits for over 10 years (279). GWAS detect such associations by comparing allele frequencies in individuals with and without a trait of interest in a specific population (211). These methods have been successfully applied to find large numbers of disease-associated variants that contribute to the trait’s heritability (60). Heritability is defined as the fraction of phenotypic variance explained by genetic variation in the context of a specific range of environmental variation (60).

Broad-sense heritability (H2) is the proportion of phenotypic variance attributable to all genetic effects, including dominance and epistasis, whereas narrow-sense heritability (h2) is the proportion of phenotypic variance attributable to additive genetic effects. Common variants may capture up to about two-thirds of narrow-sense heritability for age-related macular degeneration (AMD), but despite this, much of AMD heritability is still unexplained by known variants (57). This missing heritability has been widely discussed, especially with regard to complex diseases, and may be attributable to non-additive effects of genomic variants that are not discernible from traditional GWAS (60).
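The two definitions can be written compactly with the standard quantitative-genetic variance decomposition. The symbols V_P (phenotypic), V_G (total genetic), V_A (additive), V_D (dominance), V_I (epistatic), and V_E (environmental) are introduced here for illustration and are not notation used elsewhere in this chapter; gene-environment covariance and interaction terms are assumed to be negligible.

```latex
V_P = V_G + V_E = V_A + V_D + V_I + V_E, \qquad
H^2 = \frac{V_G}{V_P}, \qquad
h^2 = \frac{V_A}{V_P}
```

Under this decomposition, H2 is never smaller than h2, and any gap between them reflects the dominance and epistatic components that additive models do not capture.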


GWAS have been remarkably successful for identifying genomic loci contributing substantially to age-related macular degeneration (AMD) risk. This progressive, adult-onset condition is among the leading causes of blindness in the world in individuals over 60 and is expected to become a significant health burden as the aging population increases in size (11). AMD leads to the decline of central vision in patients as a result of lipid deposits (drusen), photoreceptor loss, and inflammation in the macula (10, 26). It is clinically classified into early, intermediate, and advanced stages based on disease severity. Advanced AMD (ADV) is further subcategorized into geographic atrophy (GA; “dry” AMD) or choroidal neovascularization (CNV; “wet” AMD).

The International AMD Genomics Consortium (IAMDGC) performed the largest case-control GWAS to date for ADV and identified 52 independent common and rare variants from 34 susceptibility loci, the highest risk loci being the CFH and ARMS2/HTRA1 genes (56). These 52 genomic variants contribute to about half of the genomic heritability for ADV, which leaves nearly half of ADV heritability unexplained (56).

In contrast to traditional case-control GWAS, in silico pathway analyses of GWAS summary statistics identify biological pathways, which are defined by interactions of genes for a common biological function, harboring excesses of genomic variants that may be associated with a trait (235, 280). They accomplish this by grouping variants into features that are then merged into pathways based on curated information in publicly available pathway databases (235, 280). Because pathway analyses focus broadly on the collection of nominal genetic variants in biological pathways, they are not limited to assessing additive effects of individual variants on the trait and may be leveraged to identify genetic variance with non-additive effects. Ultimately, these analyses provide insights into trait-associated biological processes and suggest which genes are most pertinent for these pathway-level associations (237, 238). However, they do not estimate the contribution of genetic variants in these genes and pathways to the trait’s heritability (85).
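The general idea of testing a pathway for an excess of associated features can be illustrated with a small permutation test. This is a deliberately simplified sketch and not the PARIS algorithm: the function name, the input layout, and the choice to match random draws only on the number of features are all assumptions made for illustration.

```python
import random


def pathway_enrichment_p(feature_is_sig, pathway_features, n_perm=1000, seed=0):
    """Empirical p-value for an excess of significant features in a pathway.

    feature_is_sig   -- dict mapping feature id -> True if the feature holds at
                        least one nominally associated variant (assumed input)
    pathway_features -- ids of the features annotated to the pathway
    """
    rng = random.Random(seed)
    all_features = sorted(feature_is_sig)
    size = len(pathway_features)
    observed = sum(feature_is_sig[f] for f in pathway_features)
    as_extreme = 0
    for _ in range(n_perm):
        # Random "pathway" with the same number of features as the real one.
        draw = rng.sample(all_features, size)
        if sum(feature_is_sig[f] for f in draw) >= observed:
            as_extreme += 1
    # Add-one correction keeps the empirical p-value above zero.
    return (as_extreme + 1) / (n_perm + 1)


# Toy genome: 200 features, the first 20 of which are "significant".
toy = {i: i < 20 for i in range(200)}
p_enriched = pathway_enrichment_p(toy, list(range(10)))    # all 10 significant
p_null = pathway_enrichment_p(toy, list(range(100, 110)))  # none significant
```

In this toy setting the pathway made entirely of significant features receives a very small empirical p-value, while the pathway with none receives a p-value of 1.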

To uncover genomic loci undetectable by GWAS, we performed in silico, knowledge-driven pathway analyses of the summary statistics from the IAMDGC 2016 GWAS (231) using Pathway Analysis by Randomization Incorporating Structure (PARIS) (237, 238). In our comprehensive approach, we utilized multiple pathway databases to determine which genes were consistently contributing to significant pathway signals for ADV (231). We identified eight statistical driver genes (SDGs) that were significantly contributing to the significant AMD-associated pathways from PARIS: C2, C3, LIPC, MICA, NOTCH4, PLCG2, PPARA, and RAD51B. Of these eight SDGs, two genes (PLCG2 and PPARA) fell outside of the 34 AMD susceptibility loci identified by the IAMDGC GWAS (56); we showed that these loci may be associated with ADV risk (231).

While the 2016 IAMDGC GWAS uncovered several AMD loci that explain a large portion of AMD heritability (56), their study did not investigate potential non-additive effects of AMD risk genes. Pathway analyses of GWAS data consider known biological relationships among genes in a pathway; therefore, we were able to identify two novel AMD genes (PLCG2 and PPARA) that were not found in the IAMDGC GWAS. To further examine the potential role of the 2 novel SDGs, we calculated the proportion of ADV variance explained by (i) common variants in PPARA and PLCG2, (ii) the 8 SDGs identified by pathway analysis, and (iii) known and novel AMD loci identified by the IAMDGC. We also applied this approach to the subtypes of ADV (GA and CNV) to elucidate whether these variants contribute more to the heritability of AMD in a particular subtype of ADV. We further interrogated the possible epistatic interactions among lead variants in the known AMD genes as well as our novel SDGs to elucidate if their contributions to AMD heritability could be attributable to non-additive effects.

Results

Study Data for ADV, GA, and CNV Analyses

We aimed to determine the proportion of ADV, GA-specific, and CNV-specific heritability explained by variants in and within 50 kilobasepairs (kb) of the SDGs identified by PARIS (Tables 5.1 and 5.2). We extracted 2,173 variants from all 8 previously identified SDGs and 234 variants from the 2 novel SDGs (PPARA and PLCG2) based on their gene boundaries (Methods). However, we found that several of the variants in the SDGs either had very low minor allele frequencies (MAF) or had a low genotype call rate in the samples we analyzed and, therefore, they were removed prior to genetic relationship matrix (GRM) creation (Table 5.2).

Table 5.1. Demographics of participants in the study data.

Data             ADV       GA        CNV
Total Samples    33,976    21,067    28,581
Cases            16,144    3,235     10,749
Controls         17,832    17,832    17,832

Values represent counts of samples in the IAMDGC data.


Table 5.2. Characteristics of the marker data extracted from the IAMDGC exome chip.

Data                                    ADV        GA         CNV
IAMDGC Chip                             553,261    553,261    553,261
8 SDGs ± 50 kb                          1,122      1,122      1,122
2 Novel SDGs ± 50 kb                    79         79         79
Lead IAMDGC Variants                    52         52         52
34 AMD Loci                             2,758      2,758      2,758
34 AMD Loci and 2 Novel SDGs ± 50 kb    2,835      2,992      2,992
34 AMD Loci and 8 SDGs ± 50 kb          3,351      4,411      3,351

Values represent counts of pruned variants based on MAF and call rate. The 8 SDGs are C2, C3, LIPC, MICA, NOTCH4, PLCG2, PPARA, and RAD51B. The 2 novel SDGs are PLCG2 and PPARA. The different numbers of variants for the expanded loci analyses were attributable to the different MAFs observed for each of the advanced AMD subtypes: ADV, GA, and CNV.

Narrow-sense heritability explained by variants in SDGs for ADV, GA, and CNV

To determine whether the SDGs contribute to the missing heritability of ADV or its subtypes (GA and CNV), we performed GREML analyses of variants from the SDGs. We found that the percent of ADV risk explained by the common, high call rate variants in the 8 SDGs was 3.76% (S.E. = 0.39) (Table 5.3). This was higher than the estimate observed for GA (2.53%) and comparable to the CNV h2 estimate (3.71%) (Table 5.3). The 2 novel SDGs contribute 0.097%, 0.12%, and 0.18% to ADV, GA, and CNV risk, respectively (Table 5.3).


Table 5.3. Heritability estimates for advanced AMD (ADV), GA, and CNV based on our variant sets.

                                        ADV                  GA                   CNV
Data                                    Estimate   S.E.      Estimate   S.E.      Estimate   S.E.
                                        (%)        (%)       (%)        (%)       (%)        (%)
IAMDGC Chip                             44.1       1.29      46.4       2.13      62         1.47
8 SDGs ± 50 kb                          3.76       0.39      2.53       0.36      3.71       0.41
2 Novel SDGs ± 50 kb                    0.1        0.05      0.12       0.08      0.18       0.08
Lead IAMDGC Variants                    14.5       2.48      8.02       1.5       13.6       2.36
34 AMD Loci                             13.7       0.83      8.81       0.75      12.9       0.82
34 AMD Loci and 2 Novel SDGs ± 50 kb    13.5       0.81      8.6        0.7       12.2       0.74
34 AMD Loci and 8 SDGs ± 50 kb          13.1       0.73      8.78       0.63      12.6       0.74

For the analyses including the SDGs and/or 34 loci, we pruned the variants based on minor allele frequency and genotype call rate in the samples. For the estimates calculated for the full IAMDGC chip and lead IAMDGC variants, we did not apply variant filtering. kb: kilobasepairs


To compare the SDG heritability estimates for ADV, GA, and CNV to those observed for known AMD loci, we performed GREML analyses of the common variants with high call rates within the 34 loci identified by the IAMDGC (56) for each disease subtype. We found that common variants from the known loci contribute 13.73% (S.E. = 0.83) to ADV risk, 8.81% (S.E. = 0.75) to GA risk, and 12.89% (S.E. = 0.82) to CNV risk (Table 5.3). The 52 lead variants identified by the IAMDGC alone explain 14.52% (S.E. = 2.48) of ADV risk. By comparison, the h2 estimates for these variants were lower for GA and CNV (8.02% and 13.62%, respectively).

Given the individual estimates for the SDGs and 34 loci, we performed GREML analyses on combinations of the SDGs and 34 loci for ADV, GA, and CNV. In all our analyses, we found that the h2 estimates were very similar (Table 5.3). Together, the 34 loci and 8 SDGs contribute to 13.06%, 8.78%, and 12.59% h2 for ADV, GA, and CNV, respectively (Table 5.3). The h2 estimates derived from the 34 loci and 2 SDGs are comparable to these values (ADV: 13.51%, GA: 8.60%, and CNV: 12.20%) (Table 5.3).

To interrogate existing linkage among the SDG variants and those in the 34 AMD loci, we performed pairwise linkage disequilibrium (LD) analyses of these variants. No pairs of variants in high LD (r2 > 0.7) were found between any of the variants in the 8 SDGs and the expanded 34 loci. This indicates that the variants we compared did not have preexisting LD outside of the known AMD loci and were independent of each other.
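The LD screen described above can be illustrated with a small function that computes r2 for one pair of variants as the squared Pearson correlation of their allele counts and compares it with the 0.7 cutoff. This is a minimal sketch under the assumption that genotypes are coded as 0/1/2 allele counts with no missing calls; the function name and toy genotype vectors are hypothetical and are not data from this study.

```python
def genotype_r2(x, y):
    """Squared Pearson correlation between two allele-count vectors (0/1/2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # A monomorphic variant carries no correlation information.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Toy genotypes for six individuals (invented for illustration).
variant_a = [0, 1, 2, 0, 1, 2]
variant_b = [0, 1, 2, 0, 1, 2]  # identical to variant_a
variant_c = [0, 1, 2, 2, 1, 0]  # uncorrelated with variant_a

r2_ab = genotype_r2(variant_a, variant_b)
r2_ac = genotype_r2(variant_a, variant_c)
high_ld_pairs = [name for name, r2 in (("a-b", r2_ab), ("a-c", r2_ac)) if r2 > 0.7]
```

Only the first toy pair exceeds the cutoff, so only that pair would be flagged as being in high LD.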

To replicate chip heritability calculations from the 2016 GWAS published by the IAMDGC, we performed GREML analyses of the full datasets (i.e., chip heritability) for ADV, GA, and CNV. The chip heritability for ADV was 44.05% (S.E. = 1.29) (Table 5.4). We achieved similar values (44.16% (S.E. = 1.29)) when we re-performed our analyses with the first 10 principal components (PCs) calculated for the full chip and ADV cases and controls. Chip heritability estimates were higher for GA and CNV (46.37% and 62.03%, respectively) than the heritability estimate for the combined ADV dataset (Table 5.4). These values were similar to the estimates calculated for GA and CNV chip heritability including PCs for those respective datasets (46.50% and 62.18%, respectively). Values decreased after incorporating age data available from the IAMDGC (age at diagnosis for cases and age at exam for controls) and the first 10 PCs into the GREML analyses for chip heritability of ADV and its subtypes (Table 5.4).

Table 5.4. Chip heritability estimates calculated by GREML for advanced AMD (ADV), GA, and CNV using quantitative covariates.

                             ADV                  GA                   CNV
Analyses                     Estimate   S.E.      Estimate   S.E.      Estimate   S.E.
                             (%)        (%)       (%)        (%)       (%)        (%)
Full Chip (no covariates)    44.05      1.29      46.37      2.13      62.03      1.47
with Age                     57.96      1.31      45.41      2.16      59.71      1.51
with 10 PCs                  44.16      1.29      46.50      2.13      62.18      1.48
with Age + 10 PCs            42.06      1.31      41.24      1.97      60.21      1.51

These estimates were based on 15,656 advanced AMD (ADV) cases (2,964 GA-specific cases and 10,340 CNV-specific cases) and 17,832 controls with age information in the IAMDGC data or with an age over 50. Covariates included age information available from the IAMDGC (age at diagnosis for cases and age at exam for controls) and 10 PCs calculated for each of the datasets.

To further verify that contributions to ADV heritability from the common variants in the 8 SDGs were unlikely due to chance, we selected randomized variants from the autosomal genome that met the same MAF criteria we had used before. We recreated the GRM for the ADV data and these variants and performed the same GREML analyses we had on the 8 SDGs. The 1,122 randomized variants explain 1.76% (S.E. = 0.21) of ADV risk, indicating that the ADV heritability estimate for the 8 SDGs (3.76%) was not likely due to chance. In addition, we ran several (>5) randomized variant analyses with different sets of 79 random variants to mimic the number of variants in the 2 novel SDGs and found that they explain at most 0.076% (S.E. = 0.048). Although this is near the estimate we calculated for the 2 novel SDGs alone (0.097%), it is unlikely that the ADV heritability estimate for these 2 SDGs was due to chance.
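Drawing a matched random variant set of this kind can be sketched as follows. The tuple layout, function name, and toy variant list are assumptions made for illustration; the sketch only shows the sampling step and does not reproduce the GRM construction or GREML analysis.

```python
import random


def sample_matched_variants(variants, n, min_maf=0.01, max_missing=0.01, seed=None):
    """Draw n distinct variant ids that pass the same MAF and call-rate filters.

    variants -- iterable of (variant_id, maf, missing_rate) tuples (assumed layout)
    """
    eligible = [
        vid for vid, maf, missing in variants
        if maf >= min_maf and missing <= max_missing
    ]
    if n > len(eligible):
        raise ValueError("not enough eligible variants to sample from")
    return random.Random(seed).sample(eligible, n)


# Toy autosomal variant list: every 50th variant is monomorphic (MAF = 0).
toy_variants = [(f"var{i}", (i % 50) / 100, 0.0) for i in range(5000)]

# Same set sizes as the 8-SDG and 2-novel-SDG analyses in the text.
random_1122 = sample_matched_variants(toy_variants, 1122, seed=1)
random_79 = sample_matched_variants(toy_variants, 79, seed=2)
```

Repeating the draw with different seeds gives independent random sets, which mirrors the use of several sets of 79 random variants described above.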

Epistasis analyses

To interrogate possible interactions among the 52 AMD variants identified by the IAMDGC and the variants in the novel SDGs (PPARA and PLCG2), we performed logistic regression-based epistasis analyses using PLINK (174, 281). Although we did not identify any significant interactions, several modest epistatic interactions (p < 0.001) were uncovered between known AMD variants and variants in PPARA and PLCG2 (Table 5.5). Interactions between PPARA and PLCG2 variants were also identified but did not reach the significance threshold correcting for multiple testing (Table 5.6).
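The comparison of the reported interaction p-values with the multiple-testing threshold can be checked with a few lines of code. The sketch below uses the p-values and the threshold (2.33 × 10⁻⁶) reported in Tables 5.5 and 5.6, and it also shows how a 1 d.f. chi-square statistic maps to a p-value. The function name is hypothetical, and the chapter does not state how the threshold was derived, so no derivation is assumed here.

```python
import math


def chi2_1df_pvalue(stat):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    # For 1 d.f., the statistic is a squared standard normal deviate.
    return math.erfc(math.sqrt(stat / 2.0))


threshold = 2.33e-6  # multiple-testing threshold from Tables 5.5 and 5.6

# Interaction p-values as reported in Tables 5.5 and 5.6.
reported_p = [0.0002, 0.0005, 0.0007, 0.0008, 0.0009, 0.001, 0.00075, 0.00095]

any_significant = any(p < threshold for p in reported_p)
smallest_ratio = min(reported_p) / threshold  # how far the best p is from the cutoff
p_at_critical_value = chi2_1df_pvalue(3.841458820694124)  # 5% critical value
```

None of the reported p-values falls below the threshold; the smallest one is still roughly 86 times larger than the cutoff.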


Table 5.5. Epistatic interactions from pairwise logistic regression-based epistasis testing between variants in or within 50 kb of the 2 novel SDGs (PPARA and PLCG2) and the 52 AMD-associated index variants from the 2016 IAMDGC GWAS.

AMD Locus from IAMDGC GWAS                       AMD SDG Locus                               Epistatic Interaction
Chr   Location (bp)   rsID         Locus Name     Chr   Location (bp)   rsID          Locus Name   OR     P
1     196704632       rs10922109   CFH            22    46681141        rs35883013    PPARA        1.55   0.0002
9     76617720        rs10781182   MIR6130/RORB   16    81932165        rs4889432     PLCG2        1.09   0.0005
5     39327888        rs62358361   C9             16    81821531        rs8043845     PLCG2        0.66   0.0007
6     3255581         rs204993     C2/C3/CFB      16    81900628        rs4243218     PLCG2        1.08   0.0008
14    68769199        rs61985136   RAD51B         22    46685754        rs182313981   PPARA        0.63   0.0009
19    45411941        rs429358     APOE           22    46685754        rs182313981   PPARA        0.53   0.001

Locations are given in base pairs (bp) based on build 37 of the human genome. None of these interactions are significant after correcting for multiple testing (p < 2.33 × 10⁻⁶). Chr, chromosome; p, p-values from the logistic regression test, distributed as χ2 with 1 d.f.


Table 5.6. Epistatic interactions between variants in or within 50 kb of PLCG2 and PPARA from pairwise logistic regression-based epistasis testing.

PLCG2 Variants                                  PPARA Variants                                Epistatic Interaction
Chr   Location (bp)   rsID         Locus Name    Chr   Location (bp)   rsID         Locus Name   OR     P
16    81979125        rs4580153    PLCG2         22    46633037        rs41479847   PPARA        1.38   0.00075
16    81817239        rs12921780   PLCG2         22    46638486        rs78864133   PPARA        1.16   0.00095

Locations are given in base pairs (bp) based on build 37 of the human genome. Neither of these interactions is significant after correcting for multiple testing (p < 2.33 × 10⁻⁶). Chr., chromosome; p, p-values from the logistic regression test, distributed as χ2 with 1 d.f.


Discussion

In this study, we estimated the proportion of ADV heritability attributable to the SDGs we previously identified by pathway analysis of the summary statistics from the IAMDGC 2016 GWAS (231). This included common variants from the 8 SDGs that exhibited significant signals across significant pathways (p < 0.0001) from KEGG, Reactome, and GO pathway databases and the 2 SDGs (PPARA and PLCG2) that fell outside of the known AMD loci identified by the IAMDGC in their recent GWAS. To compare our results with those obtained by the IAMDGC (56), we calculated heritability estimates for the whole DNA array chip, 34 AMD loci, 52 lead variants from the 34 loci, and combinations of the SDGs and the 34 loci. The estimates and 95% confidence intervals we calculated for ADV (41.52%–46.58%) and GA (42.20%–50.54%) chip heritability overlap with the 95% confidence intervals for chip heritability determined by the IAMDGC for ADV (44.5%–48.8%, (56)) and GA (47.2%–57.4%, (56)). By contrast, our estimate for CNV chip heritability (59.15%–64.91%) was much higher than what was calculated by the IAMDGC (42.2%–46.5%, (56)).
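The intervals quoted above are consistent with Wald-type 95% intervals (estimate ± 1.96 × S.E.) built from the no-covariate chip estimates in Table 5.4. The chapter does not state how the intervals were computed, so this construction is an assumption; the function names are hypothetical, and the IAMDGC intervals are copied from the text.

```python
def wald_ci(estimate, se, z=1.96):
    """Wald-type confidence interval, rounded to two decimals."""
    return (round(estimate - z * se, 2), round(estimate + z * se, 2))


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# (estimate %, S.E. %) for full-chip heritability without covariates (Table 5.4).
chip = {"ADV": (44.05, 1.29), "GA": (46.37, 2.13), "CNV": (62.03, 1.47)}
ours = {name: wald_ci(est, se) for name, (est, se) in chip.items()}

# 95% intervals attributed to the IAMDGC in the text.
iamdgc = {"ADV": (44.5, 48.8), "GA": (47.2, 57.4), "CNV": (42.2, 46.5)}
overlap = {name: intervals_overlap(ours[name], iamdgc[name]) for name in ours}
```

Under this assumption the ADV and GA intervals overlap with the IAMDGC intervals, while the CNV intervals do not, which matches the comparison made in the paragraph above.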

Our estimate for ADV heritability based on the 52 lead variants is lower than that observed for these variants by the IAMDGC (14.52% vs. 27.2%, respectively) (56). This difference is likely due to the alternative methods used to estimate these values. The IAMDGC (56) calculated their estimates by a theoretical, population-based formula based on the log odds ratios and allele frequencies of the 52 variants and the assumed trait prevalence. This formula assumes that all markers are independent, and therefore that all contributions to genetic variance are additive (282). The IAMDGC also assumed disease prevalence of 1%, 5%, or 10% in their analyses (56). The addition of the two lead variants of the PLCG2 and PPARA genes does not increase our heritability estimates for ADV (14.54%). These results reinforce the notion that the variants in these SDGs in isolation are not significant but in aggregate contribute strongly to the statistical signals we previously observed for AMD-associated pathways. Their association with AMD is likely not additive but rather a consequence of their interactions within AMD-associated pathways, demonstrating the benefit of using pathway analysis to identify genetic variance with non-additive effects.

Heritability estimates based on each of our variant sets varied based on the advanced AMD subtype analyzed (ADV, GA, or CNV). With the exception of the chip heritability estimates, the values estimated for ADV and CNV were much higher than those calculated for GA. Based on the GCTA GREML power calculator (http://cnsgenomics.com/shiny/gctaPower/), we had good (over 80%) power to detect genetic variance for GA. It has been previously shown that particular AMD-associated variants contribute to a particular subtype. For instance, the IAMDGC identified the first subtype-specific variant for CNV near the MMP9 gene on chromosome 20 (56). Additional genes involved in extracellular matrix maintenance have been implicated in ADV subtypes but not intermediate AMD (283). Although the HTRA1/ARMS2 locus contributes generally to ADV risk (including both subtypes), it has been consistently associated with increased CNV-specific risk (212, 284-286), and smokers with the risk allele for CFH Y402H have an increased risk of developing wet AMD specifically (83).

Based on our calculations of the heritability explained collectively by the SDGs and the 34 AMD loci, we hypothesize that the contributions of the common variants in these regions may not be purely additive. Additionally, we suspect that the contributions of the common variants in PPARA and PLCG2 drive the heritability estimates for the combinations of variants we tested given the nearly identical estimates for the combination of the common variants from the 8 SDGs and 34 loci relative to the combination of the common variants from the 2 SDGs and 34 loci. In the IAMDGC GWAS, the locus boundaries were defined by distance and LD structure from the lead variant in each locus (56). Therefore, based on the definitions of the 8 SDGs and 34 loci, we expanded the number of variants covered in six of the 34 loci (Table 5.2) in our combined analysis (34 AMD Loci and 8 SDGs ± 50 kb) in this study. Based on our pairwise LD analysis, these additional variants are mostly independent of the variants in the known loci. Only a few variants were in linkage disequilibrium (r2 > 0.7), but these variants were only connected to one variant in an SDG locus. By contrast, in the analysis of the common variants from the 34 loci and 2 SDGs, the 34 loci themselves were not expanded despite the addition of the 2 SDGs.

The variance explained by genetic variants in genes from AMD-related pathways defined in the literature has been previously explored using GCTA (85). The 19 then-known AMD-associated variants explained 13.3% of AMD risk in general, and significant additional heritability was attributable to variants in inflammatory and complement pathways when accounting for the known risk variants (85). Other pathways, including angiogenesis and apoptosis, did not significantly contribute to AMD heritability estimates (85). By contrast, in our approach, pathways were identified via in silico pathway analysis of large-scale GWAS data with PARIS and multiple curated pathway databases. We then focused specifically on the SDGs that significantly contribute to AMD-associated pathways in our analysis.


While we determined that common variants from the SDGs contribute to ADV, GA, and CNV heritability, this study had several limitations. Even with our estimates, there is a substantial portion of ADV, GA, and CNV variance left unexplained by the loci interrogated in this study. Additional sources of heritability not examined in this study include rare variants, structural variants, further investigations into epistasis, and epigenetic effects. Seven rare variants were among the 52 independent, genome-wide significant markers identified by the IAMDGC in their recent GWAS (56). In this study, we excluded rare variants (MAF < 1%) from our GREML analyses for the known loci and the SDGs, removing about half of the variants from the 8 SDGs and about two-thirds of the variants from PPARA and PLCG2 alone. Rare variants were not excluded in our previous pathway analyses because PARIS does not take MAF into account when identifying associated pathways (237, 238). Therefore, we were unable to consider the complete contributions of these variants to disease heritability.

Given the non-additive nature of our heritability estimates for the combinations of known AMD loci and the SDGs for ADV and its subtypes, we hypothesize that the variants in these loci may be interacting with one another. Our epistasis analyses did not reveal any significant epistatic interactions among the known AMD variants and the variants in the 2 novel SDGs. However, this is only an initial look at the possible representation of epistatic interactions between these genes. Our identification of modest interactions among these loci, including between the novel SDGs, suggests that there may be region-wide interactions that are individually too weak to discover using these analyses. Further studies should be performed to confirm potential epistatic interactions between variants in the known loci and the SDGs. Additionally, because we performed our analyses on the largest dataset of ADV cases and controls currently available, we are unable to replicate our findings with a comparable, independent dataset. As with many genetic studies, our study only included individuals of European descent. Additional work should be done to elucidate the contributions of the SDGs to AMD heritability in diverse populations.

Conclusions

Our study elucidated the contribution of pathway SDGs and known AMD loci to the heritability of ADV and its subtypes. Heritability estimates for particular ADV subtypes were previously uncharacterized. The SDGs we analyzed in this study were previously identified from pathway analyses utilizing multiple pathway databases. This more comprehensive approach uncovered an appreciable portion of ADV heritability that had not been previously characterized. While they do not demonstrate an additive amount of heritability to that estimated for the 34 AMD susceptibility loci identified by the IAMDGC, we suspect that this is due to interaction effects or the exclusion of rare variants from our analyses. It has been previously shown that additional AMD loci (RLBP1 and CLUL1) can be identified by accounting for gene × age interaction effects (87). We propose that identifying statistical driver genes from in silico pathway analyses of GWAS data may be a valid approach to recognizing patterns of heritability (including non-additive contributions) from large-scale genomic data that are undetectable by GWAS. We applied this approach to ADV and its subtypes, but it could be applied to uncover novel loci associated with other complex traits for which GWAS have been performed. Additionally, pathway analysis provides biological context for the loci in GWAS, which could aid in understanding the underlying mechanisms of traits and developing targeted treatments for diseases.

Methods

Statistical driver genes for advanced AMD

We performed in silico pathway analyses using Pathway Analysis by Randomization Incorporating Structure (PARIS, v2.4) (238) on the largest available ADV case-control GWAS results from the IAMDGC (56). This included summary statistics for 445,115 directly genotyped variants on 16,144 advanced AMD cases and 17,832 controls (56). The ADV cases include GA-specific cases (n = 3,235), CNV-specific cases (n = 10,749), and individuals with both GA and CNV (n = 2,160) (56). Samples were genotyped with the Illumina HumanCoreExome Array as previously described (56) and are accessible through dbGAP (Accession: phs001039.v1.p1). Our knowledge-driven pathway analyses utilized three pathway databases (KEGG, Reactome, and GO) and led to the discovery of eight statistical driver genes for ADV (Table 5.7) (231). Statistical driver genes (SDGs) were defined as genes that were strongly contributing (gene-level p < 0.0001) to the statistical signal of the significant pathways (pathway-level p < 0.0001) identified by PARIS. Two of these SDGs (PPARA and PLCG2) remained significant following the exclusion of the 34 known AMD loci identified by the IAMDGC from the pathway analysis because they fall outside of the known loci boundaries.
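The SDG definition above amounts to a two-level filter on pathway results. The sketch below applies that filter to a list of result records; the record layout, the function name, and the toy records are assumptions for illustration and do not reflect PARIS's actual output format or any result from this study.

```python
from collections import defaultdict


def statistical_driver_genes(records, gene_alpha=1e-4, pathway_alpha=1e-4):
    """Map each qualifying gene to the sorted list of databases supporting it.

    records -- iterable of dicts with keys "database", "pathway", "pathway_p",
               "gene", and "gene_p" (hypothetical layout)
    """
    support = defaultdict(set)
    for rec in records:
        # A gene qualifies only inside a pathway that is itself significant.
        if rec["pathway_p"] < pathway_alpha and rec["gene_p"] < gene_alpha:
            support[rec["gene"]].add(rec["database"])
    return {gene: sorted(dbs) for gene, dbs in support.items()}


# Invented records for illustration only; these are not results from this study.
toy_records = [
    {"database": "KEGG", "pathway": "pw1", "pathway_p": 5e-5, "gene": "GENE_A", "gene_p": 2e-5},
    {"database": "GO", "pathway": "pw2", "pathway_p": 1e-5, "gene": "GENE_A", "gene_p": 2e-5},
    {"database": "Reactome", "pathway": "pw3", "pathway_p": 3e-2, "gene": "GENE_B", "gene_p": 1e-6},
    {"database": "KEGG", "pathway": "pw1", "pathway_p": 5e-5, "gene": "GENE_C", "gene_p": 4e-3},
]
drivers = statistical_driver_genes(toy_records)
```

Here only GENE_A qualifies: GENE_B sits in a non-significant pathway and GENE_C does not meet the gene-level cutoff. Counting the supporting databases per gene gives one way to express how consistently a gene contributes across databases.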


Table 5.7. Statistical driver genes for advanced AMD identified with PARIS.

Gene      Chromosome   Full Gene Name
C2        6            complement C2
MICA      6            MHC class I polypeptide-related sequence A
NOTCH4    6            notch receptor 4
RAD51B    14           RAD51 paralog B
LIPC      15           lipase C, hepatic type
PLCG2*    16           phospholipase C gamma 2
C3        19           complement C3
PPARA*    22           peroxisome proliferator activated receptor alpha

The eight statistical driver genes were identified with pathway analysis using PARIS and multiple biological pathway databases (231). Two of these genes (PLCG2 and PPARA) were not previously identified as a part of the IAMDGC susceptibility loci and are marked with an asterisk (*). Full gene names are based on the HUGO Gene Nomenclature Committee (HGNC).

Variant Selection and Genotype Extraction

For our heritability estimates, we extracted genotypes for variants in one of these seven variant criteria subsets (Table 5.2):

• IAMDGC Chip: Variants that were directly genotyped by the IAMDGC on the Illumina HumanCoreExome chip with custom content as previously described (56)

• 8 SDGs ± 50 kb: Variants in or within 50 kilobasepairs (kb) of the eight SDGs (C2, C3, LIPC, MICA, NOTCH4, PLCG2, PPARA, and RAD51B) (231)

• 2 Novel SDGs ± 50 kb: Variants in or within 50 kb of the PLCG2 and PPARA genes (231)

• Lead IAMDGC Variants: Variants that were identified as one of the 52 lead variants from the IAMDGC 2016 GWAS (56)

• 34 AMD Loci: Variants in the 34 susceptibility loci identified by the IAMDGC 2016 GWAS (Supplementary Table 5 in (56))

• 34 AMD Loci and 2 Novel SDGs ± 50 kb: Variants that occur in the 34 susceptibility loci identified by the IAMDGC 2016 GWAS (Supplementary Table 5 in (56)) and variants that fall in or within 50 kb of the PLCG2 and PPARA genes (231)

• 34 AMD Loci and 8 SDGs ± 50 kb: Variants in the 34 susceptibility loci identified by the IAMDGC 2016 GWAS (Supplementary Table 5 in (56)) and variants that fall in or within 50 kb of the eight SDGs (231)

Gene and loci boundaries were based on build GRCh37 of the human genome.

Using PLINK v1.90 beta (174, 281), we filtered these variants on MAF and genotyping call rate, excluding variants with MAF < 0.01 or a missing call rate > 0.01.

These data were extracted from the ADV case-control data, GA-specific case-control data (GA), and CNV-specific case-control data (CNV) (Table 5.1).
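The filtering rule above can be illustrated with a minimal Python sketch. This is not the PLINK implementation; the function name `filter_variants`, the genotype coding (0/1/2 allele counts with NaN for missing calls), and the array layout are assumptions made for illustration only.

```python
import numpy as np

def filter_variants(genotypes, maf_min=0.01, max_missing=0.01):
    """Return a boolean mask of variants that pass both filters.

    genotypes: (n_samples, n_variants) array of allele counts (0, 1, 2),
    with np.nan marking a missing call (an assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    missing_rate = 1.0 - called.mean(axis=0)          # per-variant missingness
    n_called = np.maximum(called.sum(axis=0), 1)      # avoid division by zero
    allele_freq = np.nansum(g, axis=0) / (2.0 * n_called)
    maf = np.minimum(allele_freq, 1.0 - allele_freq)  # minor allele frequency
    # A variant is excluded if it fails either criterion.
    return (maf >= maf_min) & (missing_rate <= max_missing)
```

A variant is retained only when it satisfies both thresholds, which mirrors excluding variants that fail either one.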

Estimation of AMD heritability with GCTA GREML

Genetic relationship matrices (GRMs) were constructed using Genome-wide Complex Trait Analysis (GCTA v1.91.3beta) (287) for each category of late AMD disease states (ADV (GA and CNV combined), GA, and CNV), and for each subset of variants we selected (Table 5.2). These subsets included variants within PPARA and PLCG2, variants within eight previously identified SDGs, the 34 loci, the 52 lead IAMDGC variants, and the full directly genotyped IAMDGC chip. To obtain invertible and reliable GRMs, variants with a minor allele frequency (MAF) < 0.01 or a missing genotype call rate > 0.01 were excluded when constructing the GRMs for the 34 AMD loci and the SDGs. Without this filtering step, GCTA is unable to create functional (invertible) GRMs for subsequent restricted maximum-likelihood (REML) analyses, which estimate the proportion of phenotypic variance attributable to additive genetic variance (287).
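To make the role of the GRM concrete, the sketch below computes a relationship matrix from standardized genotypes, following the commonly cited definition in which each variant is centered by twice its allele frequency and scaled by its expected variance. It is a simplified illustration rather than GCTA's code: the function name `compute_grm` is hypothetical, and the sketch assumes complete genotype data, whereas GCTA handles missing calls itself.

```python
import numpy as np

def compute_grm(genotypes):
    """Return an (n x n) additive genetic relationship matrix.

    genotypes: (n_individuals, n_variants) array of allele counts (0, 1, 2)
    with no missing values (a simplifying assumption).
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0                  # allele frequency per variant
    if np.any((p <= 0.0) | (p >= 1.0)):
        # A monomorphic variant has zero variance and cannot be standardized.
        raise ValueError("monomorphic variants must be filtered out first")
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # standardize each variant
    return z @ z.T / g.shape[1]               # average over variants
```

The explicit check for monomorphic variants shows one reason a frequency filter is needed before standardization: a variant with no variation leads to division by zero.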

We used the genomic-relatedness-based restricted maximum-likelihood (GREML) approach in the GCTA v1.91.3beta software on each of the three advanced AMD datasets (ADV, GA, and CNV) and each of the seven variant sets to estimate narrow-sense heritability (h²). The GREML approach does not elucidate dominance variance or epistatic interactions. We also estimated chip heritability using age information available from the IAMDGC (age at diagnosis for cases and age at exam for controls) and ten PCs derived from the full chip in the GREML analysis of the full chip. In our analyses, we also estimated heritability based on a population prevalence of ADV in individuals of European descent (0.5%, (11)), as the IAMDGC samples in this study were filtered to unrelated individuals of European ancestry (56). To validate that the estimates we observed were not likely to be artifacts, we repeated our analyses of the ADV data with a random set of 1,112 variants (equal to the number of common variants in the 8 SDGs) that met the same filtering criteria as the common variants in the 8 SDGs. In addition, we performed the same analysis using random sets of 79 variants (equal to the number of variants in the 2 SDGs alone) that met the same filtering criteria as the common variants in the 2 SDGs.
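Estimating heritability for a case-control trait under an assumed population prevalence typically involves rescaling the observed-scale estimate to the liability scale. The sketch below shows the widely used transformation for ascertained case-control samples. The function name `liability_scale_h2` and its argument names are hypothetical, and the sketch is an illustration of the general formula, not a reproduction of GCTA's internal code.

```python
from statistics import NormalDist

def liability_scale_h2(h2_observed, prevalence, case_fraction):
    """Rescale an observed-scale heritability estimate to the liability scale.

    prevalence (K): assumed population prevalence of the disease.
    case_fraction (P): proportion of cases in the analyzed sample.
    """
    nd = NormalDist()
    threshold = nd.inv_cdf(1.0 - prevalence)   # liability threshold for disease
    z = nd.pdf(threshold)                      # normal density at the threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))
```

For the prevalence used in the text, the call would take `prevalence=0.005`; the observed-scale estimate and the case fraction depend on the dataset being analyzed.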

Pairwise LD analysis of SDGs and 34 AMD Loci

To evaluate the linkage and independence of the variants in the 8 SDGs and the 34 AMD loci (Table 5.2), we performed pairwise LD analysis using two computational approaches: LDMatrix (182) and SNiPA (288). Results from each analysis were filtered to retain pairs of variants with r² > 0.7. We further assessed whether each variant in the pair was considered part of an SDG, part of one of the AMD loci, or part of both groups of variants.
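The idea of flagging variant pairs above an r² threshold can be sketched as follows. LDMatrix and SNiPA compute LD from reference-panel data; the sketch below instead uses the squared correlation of genotype allele counts as a stand-in, so it is an approximation under that assumption. The function name `high_ld_pairs` and the input layout are hypothetical.

```python
import numpy as np

def high_ld_pairs(genotypes, variant_ids, threshold=0.7):
    """List variant pairs whose genotype-based r^2 exceeds the threshold.

    genotypes: (n_samples, n_variants) array of allele counts (0, 1, 2),
    complete and polymorphic (assumptions of this sketch).
    """
    g = np.asarray(genotypes, dtype=float)
    r2 = np.corrcoef(g, rowvar=False) ** 2     # squared pairwise correlations
    pairs = []
    for i in range(len(variant_ids)):
        for j in range(i + 1, len(variant_ids)):
            if r2[i, j] > threshold:
                pairs.append((variant_ids[i], variant_ids[j], float(r2[i, j])))
    return pairs
```

Each returned pair could then be annotated by whether its members fall in an SDG, an AMD locus, or both.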

Epistatic interaction analyses

To investigate heritability attributable to non-additive interactions among the lead AMD variants from the IAMDGC GWAS (56) and the novel SDGs (231), we performed pairwise logistic regression-based epistasis analyses using PLINK v1.90 beta (174, 281).

Analyses were performed using the full set of variants in or within 50 kb of the 2 SDGs and the 52 lead variants from the 2016 IAMDGC GWAS (56); therefore, the threshold for significance was set at 2.33 × 10⁻⁶ for multiple testing correction (Bonferroni correction for 21,431 tests). If both variants in an epistatic interaction were from the same gene/locus, we obtained their LD in the European population using LDlink (https://ldlink.nci.nih.gov/, (182)). If the r² was greater than 0.7, we determined the signals to be the same and not independent.
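The significance threshold follows from dividing a family-wise error rate by the number of tests. The short sketch below reproduces that arithmetic. The family-wise error rate of 0.05 is an assumption (it is not stated explicitly above, but it is consistent with the reported threshold), and the function name is hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# Assumed family-wise alpha of 0.05 across the 21,431 pairwise tests.
threshold = bonferroni_threshold(0.05, 21431)
print(f"{threshold:.2e}")
```

An interaction p-value would be declared significant only if it falls below this per-test threshold.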

CHAPTER 6

Discussion and Future Directions

Overview

Age-related macular degeneration (AMD) is a multifactorial, progressive eye disease that affects millions of people around the world. With molecular and statistical approaches, we sought to investigate the genetic architecture of AMD utilizing genetic data generated from the Amish and the general population of European descent.

Historically, genetic linkage and association analyses provided the foundation for our understanding of AMD genetics. However, their findings generally posed more questions than answers about the consequences of these variants. With the advancement and diversity of genomic technologies and statistical methods, we are better equipped to isolate and contextualize the genetic factors that contribute to the risk and development of AMD.

Further Investigation of CFH P503A and Implications for AMD Pathology

In Chapter 2, we aimed to characterize the effects of a rare risk variant for AMD (CFH P503A), which was originally identified in 19 Amish individuals (18 heterozygotes and 1 homozygote) from Ohio and Indiana. While computational approaches suggested that the variant was damaging to the CFH protein (136), no functional studies had been performed to identify the consequences of this variant. In our study, we identified an additional 39 related Amish individuals from Ohio and Indiana who were heterozygous for the risk allele. All 58 carriers are related to one another through a large, all-connecting path pedigree and can trace their lineage to 12 common ancestors. Our functional experiments did not uncover significant differences in CFH mRNA or protein expression in blood samples from carriers and non-carriers. There were also no significant differences between the ages at which carriers and individuals without the risk allele were diagnosed with AMD. Genetic risk scores among carriers were not significantly different from those calculated for non-carriers. However, in silico modeling of the SCR8 domain showed evidence of slight conformational changes to CFH structure that changed the predicted number of interactions among amino acids in the region around CFH P503A.

Based on the results from our computational protein modeling of CFH P503A, the effects of P503A on CFH binding and function in the complement pathway should be characterized in vitro. CFH P503A falls within binding sites for several proteins including C-reactive protein, C3b, and proteins of pathogenic bacteria that deposit on cell surfaces (166). If P503A results in altered binding affinities for these proteins or other cofactors in the complement pathway like CFI, changes may be observed in expression levels of these proteins rather than CFH itself. The common, high-effect CFH variant Y402H (rs1061170), which lies in the domain adjacent to P503A, affects CFH’s affinity for binding ligands, like heparin, because the substitution of the smaller histidine residue for the tyrosine residue changes the spatial distribution of basic residues in this region (163, 164, 289). The retinas of carriers of the risk allele for the Y402H variant have altered levels of inflammatory proteins and complement pathway members (162, 167). Therefore, understanding the effects of P503A on CFH binding may lead to new knowledge about the impacts of this variant on protein function and suggest implications for AMD disease etiology.

Finally, to thoroughly understand the local effects of CFH P503A in AMD-affected individuals, our experiments should be repeated in relevant eye tissue. For instance, the P503A variant could be introduced into and modeled in an established RPE cell line (ARPE-19) (290). Alternatively, induced pluripotent stem cells could be developed from blood samples from P503A carriers and non-carriers and then differentiated into RPE cells to model the variant’s effect in vitro (291). To directly determine the effects of CFH P503A on the human eye while accounting for the retinal microenvironment, a retinal organoid model could be developed for P503A using induced pluripotent stem cells reprogrammed from carriers’ blood samples (292). Such investigations would suggest local impacts of the P503A variant on the eyes of carriers and partially explain its contributions to AMD pathophysiology in affected carriers of the risk allele.

Beyond Rare Variants and Loci for Age-Related Macular Degeneration

In Chapter 3, we utilized family-based association and linkage analyses to uncover novel rare variants and loci for AMD in the Midwest Amish. This included novel variants in LCN9, RTEL1, CGRRF1, and DLGAP1 as well as loci on chromosomes 8q and 18q. Our ability to detect significant (p < 1.34 × 10⁻⁶) AMD-associated variants in a cohort of 175 related individuals (86 affecteds, 77 unaffecteds, and 12 relatives of unknown AMD status) demonstrates the utility of performing family-based association analyses with genetic data generated from an isolated population like the Amish to identify AMD loci. Our affecteds-only linkage analyses found significant evidence of linkage on chromosomes 8 and 18 that was robust to repartitioning of the all-connecting path pedigree structure. Gene ontology (GO) enrichment analyses of our significant loci on chromosomes 8 and 18 showed that genes in these loci participate in diverse biological processes such as fatty acid binding, triglyceride metabolic processes, peptidase activities, and the regulation of epithelial to mesenchymal transition. These analyses provide statistical evidence for genomic loci and biological pathways that may be implicated in AMD; however, functional studies are needed to validate their contributions to AMD pathophysiology.

While our linkage results provide evidence for AMD-linked regions on chromosomes 8 and 18, there are multiple genes within each of our linkage peaks (Chapter 3 Appendix: Supplemental Tables 2-4). Therefore, additional fine mapping of these regions is necessary to elucidate which variants are directly associated with AMD. Additionally, the identification of novel AMD-associated variants may be facilitated by analyzing whole exome or whole genome sequencing data rather than exome chip data. Alternatively, with the creation of an Anabaptist-specific reference panel for genetics research (293), improved coverage of the genome could be achieved with imputation.

In our study, we performed linkage and association analyses based on AMD status (affected, unaffected, and unknown affection status). With data from retinal imaging methods, we could perform similar analyses on quantitative measurements to identify genetic associations for AMD endophenotypes, such as drusen volume or choroidal thickness. This information could increase our appreciation for particular endophenotypes as biomarkers for AMD and improve AMD diagnostics.

In addition to identifying novel genetic variants with additive effects, possible epistatic interactions among these novel loci and the known AMD loci should be further explored in the Amish. About one-third of AMD heritability is unaccounted for by variation in the 34 loci; therefore, the identification of epistatic interactions among genomic variants could partially resolve the missing heritability of AMD (60). Novel epistatic interactions may also be uncovered through genome-wide interaction analyses in the Amish.

Future Directions from Pathway Analyses and Statistical Driver Genes

In Chapter 4, we utilized in silico pathway analyses on GWAS data from the IAMDGC to identify biological pathways associated with advanced AMD (ADV) and statistical driver genes that strongly contributed to our significant pathway signals. We found eight statistical driver genes (C2, C3, LIPC, MICA, NOTCH4, PLCG2, PPARA, and RAD51B) that were consistently significant across significant pathways in the KEGG, Reactome, and GO databases. These eight genes encode proteins that are involved in protein-protein interactions with other immune, signaling, and DNA repair proteins. Motif analysis also showed that they are all downstream of one consensus sequence motif with known binding sites for 12 transcription factors. Only one statistical driver gene (PLCG2) was consistently significant across these larger pathway databases and NetPath, which is a specialized database of signaling pathways.

Our multi-database pathway analyses were performed on the combined ADV case-control results from the IAMDGC 2016 GWAS (56). Therefore, knowledge-driven pathway analyses should be performed on the GA-specific and CNV-specific case-control results from the IAMDGC GWAS (56) to establish evidence for subtype-specific pathways and statistical driver genes. Such information could lead to the development of more targeted therapies for advanced AMD patients. While anti-VEGF treatments are available for CNV patients, the identification of additional targets of intervention could increase treatment options for these patients. Moreover, no therapeutics are available for patients with GA; therefore, there is a significant need for therapies targeted to this AMD subtype (19).

Because the PLCG2 protein is an enzyme involved in several immune, metabolic, and signaling pathways as well as the VEGF pathway, we hypothesized that it could serve as an appropriate drug target for treating CNV. This enzyme has also been previously suggested as a drug target for late-onset Alzheimer’s disease (LOAD) (274). Therefore, functional studies should be performed to determine if modulation of PLCG2’s enzymatic activity could alter phenotypes associated with AMD or LOAD. Additionally, the suspected role of PLCG2 in these late-onset, degenerative diseases will hopefully promote further research on the possible shared etiology of both diseases.

Considerations for Statistical Driver Genes and Missing Heritability

Following our identification of statistical driver genes (SDGs) for advanced AMD in Chapter 4, we calculated heritability estimates for the variants in or near the SDGs in Chapter 5. We compared our results to estimates for the 34 known loci identified by the IAMDGC and the “chip heritability”. Our results suggested that the variants in the SDGs do not contribute to the heritability estimates in an additive manner. Therefore, we performed pairwise logistic regression-based epistasis analyses of these variants to determine if there were epistatic interactions among them. Although we found modest epistatic interactions among the 52 IAMDGC variants and the 2 novel SDGs, more extensive analyses between variants in the 34 AMD loci and the SDGs should be performed to fully elucidate possible epistatic interactions among these loci.

Genetic variants from 34 loci explain more than half of AMD heritability (h²), but about one-third of AMD h² is still unidentified (56). We performed pathway analyses and identified statistical driver genes based on their significant contributions to biological pathways. Our results from this approach reinforce the hypothesis that missing heritability may be explained by a number of phenomena outside of single genetic variants with additive effects on the phenotype (60). These can include non-additive effects, such as dominance effects and epistatic interactions, and additional genetic variants that are poorly interrogated in GWAS, such as structural variants and rare variants (60). In Chapter 5, we concluded that knowledge-driven pathway analyses of GWAS data and the identification of statistical driver genes may serve as a means to uncover the missing heritability of AMD. Therefore, we postulate that this approach may be useful for other complex traits, such as Alzheimer’s disease and glaucoma.

In addition to strictly genetic contributions to disease development, it is crucial to interrogate possible gene-environment interactions that are not calculable using the methods we have described here. Based on epidemiological studies, environmental factors, such as smoking and diet, contribute 20-40 percent of the phenotypic variance of AMD (33-35, 37, 38, 41, 47, 48). Previous studies have identified joint effects of environmental exposures and genetic variation for AMD (82-84, 276), and a genome-wide interaction analysis of the IAMDGC data provided evidence for two novel AMD loci based on gene × age interactions (87). Thus, further study of gene × environment interactions could yield testable hypotheses for understanding variation in disease age-of-onset, progression, and severity.

Conclusion

In the research studies that comprise this dissertation, we identified novel genetic factors and biological pathways for AMD using genetic data from both the Amish and the general population of European descent. These studies required multidisciplinary, collaborative efforts and would not have been possible without the contributions of investigators and research participants from all over the world. This research also highlights the importance of developing a working relationship between investigators and research participants like the Amish. It is essential for investigators to consider the concerns and values of the culture of their research participants (294) so that future genetic epidemiological studies are also built upon trusting relationships between investigators and research participants.

The work described herein integrated information derived from molecular techniques and statistical methods to inform our understanding of AMD genetics and pathophysiology. Our identification of novel AMD loci and characterization of known AMD loci bolster support for the complexity of the genetic architecture of AMD. Our hope is that the research we performed will lay the foundation for future work that will ultimately improve the lives of patients with AMD. With greater knowledge of AMD pathophysiology, more effective therapeutic interventions can be developed to prevent or alleviate the symptoms of AMD. The introduction of new treatments may subsequently reduce the cases of blindness and partially mitigate the global health burden of AMD.

APPENDIX

Chapter 2 Appendix

Supplemental Figure 1. Representative allelic discrimination plot from QuantStudio software for the custom CFH P503A TaqMan assay. Allele 2 is the guanine (G) risk allele, and allele 1 is the cytosine (C) non-risk allele. Blue dots correspond to homozygous individuals for the risk allele (G/G). Green dots are heterozygous individuals (C/G). Red dots represent homozygous individuals for the non-risk allele (C/C). Samples with undetermined genotypes appear as black X’s in the plot.

Supplemental Figure 2. Comparisons of age at exam for carriers and for non-carriers. Individuals who have not been diagnosed with AMD are significantly younger than those that have been diagnosed with AMD. However, the age at AMD diagnosis for carriers compared to non-carriers with AMD is not significantly different. P-values were generated using pairwise Wilcoxon rank-sum tests. Comparisons among all groups were performed with the Kruskal-Wallis test.

Supplemental Table 1. IAMDGC AMD-associated variants included in risk score calculations. We obtained genotypes for the 52 AMD-associated variants identified by the IAMDGC in their 2016 GWAS (56). Following extensive quality control and removal of variants that had a call rate less than 80 percent in our sample, we calculated genetic risk scores on 35 AMD-associated variants. Variant positions are given in basepairs according to build 37 of the human genome. Alleles are reported as the common allele in the IAMDGC GWAS followed by the rare allele in the IAMDGC GWAS (56). The odds ratios were previously calculated by the IAMDGC in their 2016 GWAS (56).

rsID          Chromosome  Position     Alleles  Odds Ratio  Gene/Locus
rs148553336   1           196,613,173  T/C      0.29        CFH
rs570618      1           196,657,064  G/T      2.38        CFH
rs10922109    1           196,704,632  C/A      0.38        CFH
rs35292876    1           196,706,642  C/T      2.42        CFH
rs121913059   1           196,716,375  C/T      20.28       CFH
rs61818925    1           196,815,450  G/T      0.6         CFH (CFHR3/CFHR1)
rs191281603   1           196,958,651  C/G      1.07        CFH
rs11884770    2           228,086,920  C/T      0.9         COL4A3
rs140647181   3           99,180,668   T/C      1.59        COL8A1
rs10033900    4           110,659,067  C/T      1.15        CFI
rs62358361    5           39,327,888   G/T      1.8         C9
rs2746394     6           31,946,792   G/A      1.39        C2/CFB/SKIV2L
rs181705462   6           31,947,027   G/T      1.55        C2/CFB/SKIV2L
rs204993      6           32,155,581   A/G      1.13        C2/CFB/SKIV2L (PBX2)
rs943080      6           43,826,627   T/C      0.88        VEGFA
rs7803454     7           99,991,548   C/T      1.13        PILRB/PILRA
rs1142        7           104,756,326  C/T      1.11        KMT2E/SRPK2
rs13278062    8           23,082,971   T/G      0.9         TNFRSF10A
rs71507014    9           73,438,605   GC/G     1.1         TRPM3
rs10781182    9           76,617,720   G/T      1.11        MIR6130/RORB
rs12357257    10          24,999,593   G/A      1.11        ARHGAP21
rs3750846     10          124,215,565  T/C      2.81        ARMS2/HTRA1
rs61941274    12          112,132,610  G/A      1.51        ACAD10
rs61985136    14          68,769,199   T/C      0.9         RAD51B
rs2842339     14          68,986,999   A/G      1.14        RAD51B
rs17231506    16          56,994,528   C/T      1.16        CETP
rs5817082     16          56,997,349   C/CA     0.84        CETP
rs72802342    16          75,234,872   C/A      0.79        CTRB2/CTRB1
rs11080055    17          26,649,724   C/A      0.91        TMEM97/VTN
rs12019136    19          5,835,677    G/A      0.71        C3 (NRTN/FUT6)
rs147859257   19          6,718,146    T/G      2.86        C3
rs2230199     19          6,718,387    C/G      1.43        C3
rs429358      19          45,411,941   T/C      0.7         APOE
rs5754227     22          33,105,817   T/C      0.77        SYN3/TIMP3
rs8135665     22          38,476,276   C/T      1.14        SLC16A8
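As an illustration of how odds ratios such as those in this table can be combined into a score, the sketch below implements one common formulation of a weighted genetic risk score: the sum, over variants, of the allele count multiplied by the natural log of the odds ratio. This is a generic sketch and is not necessarily the exact scoring scheme used in Chapter 2. The function name `weighted_grs`, the dictionary layout, and the assumption that each dosage counts the allele to which the odds ratio refers are all hypothetical.

```python
import math

def weighted_grs(dosages, odds_ratios):
    """Weighted genetic risk score: sum of dosage * ln(odds ratio).

    dosages: dict mapping rsID to the count (0, 1, or 2) of the allele
    to which the odds ratio refers (an assumed coding).
    odds_ratios: dict mapping rsID to its reported odds ratio.
    Variants without a genotype for the individual are skipped.
    """
    return sum(
        dosages[rsid] * math.log(odds_ratio)
        for rsid, odds_ratio in odds_ratios.items()
        if rsid in dosages
    )

# Odds ratios for three variants taken from the table above.
example_ors = {"rs570618": 2.38, "rs10922109": 0.38, "rs3750846": 2.81}
# Hypothetical genotype for one individual.
example_dosages = {"rs570618": 1, "rs10922109": 0, "rs3750846": 2}
score = weighted_grs(example_dosages, example_ors)
```

On the log scale, variants with odds ratios below 1 contribute negative weights and variants with odds ratios above 1 contribute positive weights.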

Chapter 3 Appendix

[Figure: flowchart with the following steps: Genotype Calling and Clustering in GenomeStudio; Quality Control in GenomeStudio; Rare Variant Calling with zCall; Manual Review of Variants in GenomeStudio; Validation of Variants and Samples with PLINK; Exclusion of Monomorphic Variants]

Supplemental Figure 1. Outline of quality control steps. Genotype calling and clustering, quality control, and manual rare variant calling were performed in GenomeStudio. Rare variant calling was also performed separately in zCall. Variants with four or more heterozygous calls in zCall were flagged for manual review in GenomeStudio. Additional quality control measures were utilized with PLINK, and monomorphic variants were excluded from downstream analyses. The color scheme indicates the software tool used at that step.

Supplemental Figure 2. Sex mismatch determined for 2,466 post-quality control samples. Male samples are expected to have values above 0.8, and females are expected to have values less than 0.2. One true sex mismatch in this Amish sample was identified and removed from subsequent analyses. Other apparent sex mismatches were among the non-Amish samples, which were all excluded from our association and linkage analyses.

Supplemental Figure 3. Principal components analysis demonstrating genetic ancestry distribution for the 2,466 post-QC samples. The Amish study participants are highlighted in purple.

Supplemental Figure 4. Heterozygosity distribution of the 2,466 samples. Few samples had heterozygosity values greater than 0.8 indicating them as outliers. None of the Amish samples in this study had heterozygosity values in this range.

Supplemental Figure 5. All-connecting path pedigree of the 180 Amish individuals genotyped on the exome chip drawn using the Pedigraph software tool and information from the Anabaptist Genealogy Database (AGDB). Circles represent females, and squares represent males. The colored lines connect children to their parents. The five individuals we omitted from our analyses due to extensive distant relatedness are surrounded by the blue box.

Supplemental Figure 6. Quantile-Quantile (QQ) plot of p-values obtained from association testing using ROADTRIPS. P-values were obtained from the RM test in ROADTRIPS. The genomic control parameter was 1.05.

Supplemental Figure 7. Genome-wide HLOD scores for the autosomes under affecteds-only (A) dominant and (B) recessive models with disease allele frequency of 0.10. The green line in each plot designates the genome-wide significance threshold (HLOD > 3.6).

Supplemental Figure 8. Genome-wide HLOD scores for the autosomes under affecteds-only (A) dominant and (B) recessive models with disease allele frequency of 0.01. The green line in each plot designates the genome-wide significance threshold (HLOD > 3.6).

Supplemental Figure 9. HLOD scores for chromosome 8 under the recessive model assuming risk allele frequency of 0.10 with different sub-pedigree structures. Linkage peaks were as follows for the maximum pedigree sizes tested: 23 bits, 98.01 cM (HLOD = 2.84); 24 bits, 98.80 cM (HLOD = 4.027); 25 bits, 77.53 cM (HLOD = 2.00).

Supplemental Figure 10. HLOD scores for chromosome 18 under (A) dominant and (B) recessive models assuming disease allele frequency of 0.10 with different sub-pedigree structures. Linkage peaks from the dominant model were as follows for the maximum pedigree sizes tested: 23 bits, 76.38 cM (HLOD = 2.29); 24 bits, 84.18 cM (HLOD = 3.87); 25 bits, 81.31 cM (HLOD = 3.30). Linkage peaks from the recessive model were as follows for the maximum pedigree sizes tested: 23 bits, 76.38 cM (HLOD = 2.40); 24 bits, 76.38-76.50 cM (HLOD = 4.27); 25 bits, 76.38 cM (HLOD = 3.54).

Supplemental Figure 11. HLOD scores obtained from multipoint linkage analysis in MERLIN under the affecteds-only dominant and recessive models on chromosome 2. The black line denotes genome-wide significance (HLOD Score > 3.6). The maximum dominant HLOD score was 4.02, and the maximum recessive HLOD score was 4.89. Tick marks along the upper x-axis correspond to the marker positions.

Supplemental Figure 12. HLOD scores obtained from multipoint linkage analysis in MERLIN under the affecteds-only dominant and recessive models on chromosome 15. The black line denotes genome-wide significance (HLOD Score > 3.6). The maximum HLOD score was 3.90, which was obtained under the recessive model. Tick marks along the upper x-axis correspond to the marker positions.

Supplemental Figure 13. Disparate HLOD scores for chromosome 2 with different sub-pedigree structures. HLOD scores were obtained for markers on chromosome 2 under (A) dominant and (B) recessive models assuming disease allele frequency of 0.01 with different sub-pedigree structures. Linkage peaks from the dominant model were as follows for the maximum pedigree sizes tested: 23 bits, 114.02 cM (HLOD = 1.97); 24 bits, 245.36 cM (HLOD = 4.02); 25 bits, 104.49 cM (HLOD = 2.18). Linkage peaks from the recessive model were as follows for the maximum pedigree sizes tested: 23 bits, 111.18 cM (HLOD = 1.97); 24 bits, 245.36 cM (HLOD = 4.89); 25 bits, 245.36 cM (HLOD = 2.27).

Supplemental Figure 14. Disparate recessive HLOD scores for chromosome 15 with different sub-pedigree structures. Parametric HLOD scores were calculated based on varying pedigree structures for markers on chromosome 15 under a recessive model. Linkage peaks from the recessive model were as follows for the maximum pedigree sizes tested: 23 bits, 91.69 cM (HLOD = 0.0056); 24 bits, 91.27 cM (HLOD = 3.90); 25 bits, 67.17 cM (HLOD = 1.02).

Supplemental Table 1. Quality control steps performed in GenomeStudio. Each entry describes the variant exclusion criteria and number of variants excluded at that step. In total, 96,330 variants were excluded. Low cluster separation: cluster separation scores < 0.2. Low call frequency: call frequency < 0.85. Low intensity values: intensity values < 0.2.

GenomeStudio Quality Control Step                                       Number of Variants Excluded
Reset GenCall to 0.15 and Filter Variants with Call Frequency < 0.85    967
Low Cluster Separation, Call Frequency, and Non-Zero GenTrain Score     24,548
Low Intensity Values                                                    2,605
Mendelian Error, Low Call Frequency, and Hemizygous                     61
Low Call Frequency and Hemizygous                                       65,781
More Stringent Low Call Frequency (<0.95)                               2,368

Supplemental Table 2. Genes located in the 1-HLOD support interval of the maximum HLOD score on chromosome 8 under the recessive model and disease allele frequency of 0.10. Genomic positions are in base pairs (Ensembl, human genome build 37/hg19). Genes highlighted in bold are located in the region with HLOD score > 3.6.

Chromosome  Start Position  Stop Position  Gene Name
8           73,269,963      73,270,050     RNA5SP271
8           73,449,626      73,850,584     KCNB2
8           73,530,957      73,531,618     HAUS1P3
8           73,921,099      73,960,357     TERF1
8           73,931,180      73,931,279     RNU6-285P
8           73,976,775      74,036,323     SBSPON
8           74,202,506      74,208,024     RPL7
8           74,206,847      74,237,516     RDH10
8           74,332,239      74,353,761     STAU2-AS1
8           74,332,604      74,659,943     STAU2
8           74,563,524      74,563,837     VENTXP6
8           74,692,332      74,791,145     UBE2W
8           74,817,592      74,827,245     GYG1P1
8           74,851,404      74,884,522     TCEB1
8           74,884,672      74,895,018     TMEM70
8           74,894,360      74,894,702     RPS20P21
8           74,903,587      74,941,322     LY96
8           74,960,912      74,961,019     RNU6-1300P
8           74,968,117      74,968,882     RPS3AP32
8           74,991,772      74,991,878     RNU6-1197P
8           75,146,935      75,233,563     JPH1
8           75,233,365      75,401,107     GDAP1
8           75,460,778      75,460,852     MIR5681A
8           75,736,772      75,767,264     PI15
8           75,896,750      75,946,793     CRISPLD1
8           76,135,639      76,236,976     CASC9
8           76,214,531      76,214,809     HIGD1AP6
8           76,288,944      76,289,249     PKMP4
8           76,320,149      76,479,078     HNF4G
8           77,177,778      77,177,967     RNU2-54P
8           77,316,293      77,319,373     LINC01109
8           77,318,889      77,436,591     LINC01111
8           77,403,435      77,595,513     ZFHX4-AS1
8           77,515,475      77,516,087     MRPL9P1
8           77,593,454      77,779,521     ZFHX4
8           77,892,494      77,913,280     PEX2
8           77,925,987      77,926,264     HIGD1AP18
8           79,310,750      79,310,857     RNU6-1220P
8           79,428,374      79,517,502     PKIA
8           79,578,282      79,632,000     ZC2HC1A
8           79,587,978      79,717,758     IL7
8           79,672,377      79,674,666     PRKRIRP7
8           80,483,388      80,484,570     RPL3P9
8           80,523,049      80,578,410     STMN2
8           80,676,245      80,680,098     HEY1
8           80,712,368      80,712,430     RNU7-85P
8           80,830,952      80,942,524     MRPS28
8           80,870,571      81,143,467     TPD52
8           81,116,841      81,117,135     RN7SL41P
8           81,153,624      81,153,708     MIR5708
8           81,317,751      81,317,844     RNU6-1213P
8           81,397,854      81,438,500     ZBTB10
8           81,471,105      81,471,992     RPSAP47
8           81,497,820      81,498,008     RNU2-71P
8           81,507,050      81,507,938     SLC25A51P3
8           81,525,779      81,526,061     RN7SL107P
8           81,540,686      81,787,016     ZNF704
8           81,557,174      81,557,408     CKS1BP7
8           81,661,796      81,661,928     RNU11-6P
8           81,688,901      81,689,197     RN7SL308P
8           81,724,377      81,724,816     HMGB1P41
8           81,880,045      82,024,303     PAG1
8           82,167,105      82,167,464     UBE2HP1
8           82,192,598      82,197,012     FABP5
8           82,352,561      82,359,758     PMP2
8           82,370,576      82,373,814     FABP9
8           82,390,654      82,395,498     FABP4
8           82,433,917      82,434,467     FTH1P11
8           82,437,216      82,443,613     FABP12
8           82,517,804      82,543,369     IMPA1P
8           82,529,455      82,530,795     NIPA2P4
8           82,539,504      82,539,852     RPS26P34
8           82,546,592      82,547,897     SLC10A5P1
8           82,570,196      82,598,928     IMPA1
8           82,605,842      82,608,409     SLC10A5
8           82,613,569      82,645,138     ZFAND1
8           82,644,669      82,671,750     CHMP4C
8           82,711,816      82,755,101     SNX16
8           82,720,007      82,720,961     HNRNPA1P36
8           83,203,859      83,204,614     HNRNPA1P4

Supplemental Table 3. Genes located in the 1-HLOD support interval of the maximum HLOD score on chromosome 18 under the dominant model and disease allele frequency of 0.10. Genomic positions are in base pairs (Ensembl, human genome build 37/hg19). Genes highlighted in bold are located in the region with HLOD score > 3.6.

Chromosome  Start Position  Stop Position  Gene Name
18          48,405,419      48,474,691     ME2
18          48,494,361      48,514,491     ELAC1
18          48,494,410      48,611,415     SMAD4
18          48,633,124      48,633,416     RN7SL695P
18          48,684,652      48,685,753     SRSF10P1
18          48,700,920      48,744,674     MEX3C
18          48,810,103      48,810,266     RNU1-46P
18          48,947,433      48,948,044     SS18L2P2
18          49,139,009      49,139,480     RSL24D1P9
18          49,371,377      49,371,998     RPS8P3
18          49,866,542      51,057,784     DCC
18          51,131,624      51,132,095     RPL29P32
18          51,679,079      51,751,158     MBD2
18          51,748,654      51,748,782     SNORA37
18          51,795,774      51,847,636     POLI
18          51,850,728      51,884,334     STARD6
18          51,884,287      51,911,588     C18orf54
18          51,933,259      51,933,484     SNRPGP2
18          52,254,988      52,266,724     DYNAP
18          52,385,091      52,562,747     RAB27B
18          52,411,981      52,412,812     RPSAP57
18          52,568,740      52,626,739     CCDC68
18          52,621,634      52,621,963     MAP1LC3P
18          52,813,777      52,813,876     RNA5SP459
18          52,889,562      53,332,018     TCF4
18          53,146,452      53,146,529     MIR4529
18          53,303,064      53,303,541     RPL21P126
18          54,264,439      54,318,831     TXNL1
18          54,318,574      54,698,828     WDR7
18          54,695,063      54,706,076     WDR7-OT1
18          54,721,813      54,739,350     LINC-ROR
18          54,814,293      54,817,531     BOD1L2
18          54,951,833      54,951,939     RNU6-737P
18          55,018,044      55,038,962     ST8SIA3
18          55,102,917      55,158,529     ONECUT2
18          55,215,515      55,254,004     FECH
18          55,267,888      55,289,445     NARS
18          55,313,658      55,470,333     ATP8B1
18          55,422,626      55,422,732     RNU6-742P
18          55,505,708      55,506,198     RSL24D1P11
18          55,686,021      55,686,293     HMGN1P30
18          55,711,599      56,068,772     NEDD4L
18          56,118,306      56,118,390     MIR122
18          56,148,479      56,296,189     ALPK2
18          56,298,491      56,299,352     RPL9P31
18          56,338,618      56,417,371     MALT1
18          56,357,588      56,359,385     MRPL37P1
18          56,470,253      56,470,561     RN7SL112P
18          56,472,500      56,472,606     RNU6-219P
18          56,529,832      56,653,712     ZNF532
18          56,593,429      56,593,528     RNU2-69P
18          56,663,966      56,736,570     OACYLP
18          56,806,709      56,826,068     SEC11C
18          56,887,400      56,898,006     GRP
18          56,934,267      56,941,318     RAX
18          56,942,388      56,985,881     CPLX4
18          56,995,055      57,027,194     LMAN1
18          57,098,172      57,364,612     CCBE1
18          57,428,790      57,429,137     RPS26P54
18          57,445,804      57,446,221     GLUD1P4
18          57,567,180      57,571,538     PMAIP1
18          57,636,670      57,638,628     NFE2L3P1
18          57,640,146      57,640,437     RN7SL342P
18          57,677,224      57,678,375     SDCCAG3P1
18          57,684,010      57,684,673     FAM60CP
18          57,685,857      57,685,963     RNU6-567P
18          57,816,808      57,817,597     RPS3AP49
18          57,830,800      57,830,940     RNU4-17P
18          58,038,564      58,040,001     MC4R
18          58,187,074      58,187,398     MRPS5P4
18          58,330,348      58,331,602     CTBP2P3
18          59,000,815      59,223,006     CDH20
18          59,058,828      59,058,936     RNU6-116P
18          59,072,091      59,072,438     RPL30P14
18          59,475,296      59,561,480     RNF152
18          59,624,506      59,625,247     RPIAP1
18          59,710,800      59,854,351     PIGN
18          59,854,491      59,974,355     KIAA1468
18          59,992,520      60,058,516     TNFRSF11A
18          60,082,911      60,083,460     RPL17P44
18          60,109,261      60,110,359     ACTBP9
18          60,190,240      60,254,942     ZCCHC2
18          60,317,543      60,317,834     RN7SL705P
18          60,382,672      60,647,666     PHLPP1
18          60,398,459      60,398,565     RNU6-142P
18          60,790,579      60,987,361     BCL2
18          60,994,959      61,034,743     KDSR
18          61,056,423      61,089,716     VPS4B
18          61,143,994      61,172,318     SERPINB5
18          61,164,222      61,164,633     ATP5G1P6
18          61,223,393      61,234,244     SERPINB12
18          61,254,223      61,271,873     SERPINB13
18          61,304,493      61,311,532     SERPINB4
18          61,314,813      61,391,127     SERPINB11
18          61,322,431      61,329,197     SERPINB3
18          61,420,169      61,472,604     SERPINB7
18          61,538,926      61,571,124     SERPINB2
18          61,564,408      61,603,345     SERPINB10
18          61,616,535      61,649,008     HMSD
18          61,637,159      61,672,278     SERPINB8
18          61,739,392      61,739,887     RPL12P39
18          61,747,243      61,816,264     LINC00305

Supplemental Table 4. Genes located in the 1-HLOD support intervals (1-3) of the maximum HLOD score on chromosome 18 under the recessive model and disease allele frequency of 0.10. Genomic positions are in base pairs (Ensembl, human genome build 37/hg19).

Chromosome  Start Position  Stop Position  Gene Name

Interval 1
18          51,748,654      51,748,782     SNORA37
18          51,795,774      51,847,636     POLI
18          51,850,728      51,884,334     STARD6
18          51,884,287      51,911,588     C18orf54
18          51,933,259      51,933,484     SNRPGP2
18          52,254,988      52,266,724     DYNAP
18          52,385,091      52,562,747     RAB27B
18          52,411,981      52,412,812     RPSAP57
18          52,568,740      52,626,739     CCDC68
18          52,621,634      52,621,963     MAP1LC3P
18          52,813,777      52,813,876     RNA5SP459
18          52,889,562      53,332,018     TCF4
18          53,146,452      53,146,529     MIR4529
18          53,303,064      53,303,541     RPL21P126
18          54,264,439      54,318,831     TXNL1
18          54,318,574      54,698,828     WDR7
18          54,695,063      54,706,076     WDR7-OT1
18          54,721,813      54,739,350     LINC-ROR
18          54,814,293      54,817,531     BOD1L2
18          54,951,833      54,951,939     RNU6-737P
18          55,018,044      55,038,962     ST8SIA3
18          55,102,917      55,158,529     ONECUT2
18          55,215,515      55,254,004     FECH
18          55,267,888      55,289,445     NARS
18          55,313,658      55,470,333     ATP8B1
18          55,422,626      55,422,732     RNU6-742P
18          55,505,708      55,506,198     RSL24D1P11
18          55,686,021      55,686,293     HMGN1P30
18          55,711,599      56,068,772     NEDD4L
18          56,118,306      56,118,390     MIR122

Interval 2
18          57,445,804      57,446,221     GLUD1P4
18          57,567,180      57,571,538     PMAIP1
18          57,636,670      57,638,628     NFE2L3P1
18          57,640,146      57,640,437     RN7SL342P
18          57,677,224      57,678,375     SDCCAG3P1
18          57,684,010      57,684,673     FAM60CP
18          57,685,857      57,685,963     RNU6-567P
18          57,816,808      57,817,597     RPS3AP49
18          57,830,800      57,830,940     RNU4-17P

Interval 3
18          58,187,074      58,187,398     MRPS5P4
18          58,330,348      58,331,602     CTBP2P3
18          59,000,815      59,223,006     CDH20
18          59,058,828      59,058,936     RNU6-116P
18          59,072,091      59,072,438     RPL30P14
18          59,475,296      59,561,480     RNF152
18          59,624,506      59,625,247     RPIAP1

Supplemental Table 5. Maximum HLOD scores obtained from the model-based multipoint linkage analyses on chromosomes 1-22 with disease allele frequency of 0.10. Under the dominant model, penetrance values were 0, 0.0001, or 0.0001 for 0, 1, or 2 copies of the risk allele, respectively. For the recessive model, we set the penetrance values to 0, 0, or 0.0001 for 0, 1, or 2 copies of the risk allele, respectively.

            Dominant Model                      Recessive Model
Chromosome  Maximum HLOD Score  Position (cM)   Maximum HLOD Score  Position (cM)
1           3.5                 234.6           2.02                237.65
2           1.7                 244.26          2.97                245.36
3           2.33                32.34           2.73                85.87
4           1.65                206.48          1.22                109.79
5           3.13                15.15           3.06                179.54
6           2.06                34.07           3.32                31.73
7           2.56                86.09           3.09                52.43
8           2.59                131.52          4.03                98.8
9           1.91                8.24            2.35                17.93
10          1.9                 46.26           2.51                46.7
11          3.4                 9.61            1.44                95.87
12          1                   18.72           0.55                24.22-24.33
13          2.95                86.27           3.31                64.77
14          2.03                56.59           1.69                37.46
15          0.78                100.76-100.88   1.06                91.69
16          1.65                129.66-129.85   2.38                129.85
17          1.42                108.2           2.06                104.41
18          3.87                84.18           4.27                76.38-76.50
19          0.61                58.65           0.77                58.65
20          1.43                0.18            0.93                0.18
21          2                   59.3            1.58                59.3
22          0.63                40              0.06                10.40-11.04
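The penetrance settings stated in the caption of Supplemental Table 5 can be written out explicitly. The following is a minimal Python sketch, not the linkage software used for these analyses. The names `PENETRANCE`, `genotype_frequencies`, and `implied_affected_fraction` are hypothetical, and Hardy-Weinberg genotype proportions are an assumption made here only to convert the stated disease allele frequency into genotype frequencies. The resulting number is simply the arithmetic consequence of the stated parameters and should not be read as an estimate of AMD prevalence.

```python
# Penetrance by number of risk-allele copies (0, 1, 2), as stated in the
# caption of Supplemental Table 5. All names below are illustrative.
PENETRANCE = {
    "dominant": (0.0, 0.0001, 0.0001),
    "recessive": (0.0, 0.0, 0.0001),
}

DISEASE_ALLELE_FREQUENCY = 0.10  # value stated for Supplemental Table 5


def genotype_frequencies(q):
    """Return frequencies of 0, 1, and 2 risk-allele copies.

    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    p = 1.0 - q
    return (p * p, 2.0 * p * q, q * q)


def implied_affected_fraction(model, q=DISEASE_ALLELE_FREQUENCY):
    """Sum of genotype frequency times penetrance for the chosen model."""
    freqs = genotype_frequencies(q)
    return sum(f * w for f, w in zip(freqs, PENETRANCE[model]))


if __name__ == "__main__":
    for model, values in PENETRANCE.items():
        print(model, values, implied_affected_fraction(model))
```

The two models differ only in the penetrance assigned to carriers of a single risk allele, which is the distinction the dominant and recessive columns of the table rest on.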


Supplemental Table 6. Maximum HLOD scores obtained from the model-based multipoint linkage analyses on chromosomes 1-22 with disease allele frequency of 0.01. Values are given for both dominant and recessive models. Under the dominant model, penetrance values for these analyses were 0, 0.0001, or 0.0001 for 0, 1, or 2 copies of the disease allele, respectively. For the recessive model, we set the penetrance values to 0, 0, or 0.0001 for 0, 1, or 2 copies of the disease allele, respectively.

            Dominant Model                      Recessive Model
Chromosome  Maximum HLOD Score  Position (cM)   Maximum HLOD Score  Position (cM)
1           0.91                213.97          0.91                292.2
2           4.02                245.36          4.89                245.36
3           3.22                85.87           3.19                85.87
4           0.92                109.45          0.84                109.23-109.45
5           2.05                56.46           2.2                 179.54
6           1.18                45.45           1.33                39.82
7           2.83                155.89          2.45                51.56
8           3.41                108.68          2.82                111.01
9           1.13                10.07           0.63                11.81
10          2.1                 46.7            1.64                47.74
11          0.95                95.87           1.54                161.84
12          1.12                61.65           1.29                61.65-61.73
13          2.2                 86.27           1.82                62.96
14          0.48                0.98            0.36                0.98
15          3.16                91.41           3.9                 91.27-91.34
16          1.4                 127.68          1.29                117.18
17          2.83                48.07-48.21     3.18                48.07
18          2.38                77.17           2.55                76.61
19          0.65                57.8-58.15      0.2                 58.65
20          1.22                109.66          1.32                90.13
21          0.53                59.2-59.3       0.09                64.59
22          0.19                38.93-39.47     0.01                37.46-37.92

Supplemental Table 7. Significant linkage loci identified from model-based multipoint linkage analyses of Amish families with disease allele frequency of 1 percent. The peak HLOD score is the maximum HLOD score obtained for the designated chromosome.

Model      Chromosome  Peak HLOD Score  Region with HLOD > 3.6 (cM)  1-HLOD Support Interval (cM)
Dominant   2           4.02             243.4-246.5                  242.51-247.13
Recessive  2           4.89             242.61-247.75                243.05-247.13
Recessive  15          3.9              90.88-91.88                  90.13-94.23
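Both interval columns in Supplemental Table 7 can be read off an HLOD curve. The sketch below is illustrative Python and not the procedure or software used in this work. The function names and the toy HLOD values are hypothetical, positions are treated as a discrete grid without interpolation, and the 1-HLOD support interval is taken here to mean the contiguous region around the peak in which the HLOD score stays within 1 unit of the maximum.

```python
def region_around_peak(positions, hlods, threshold):
    """Return (start, end) of the contiguous region around the peak
    in which every HLOD score is >= threshold."""
    peak_index = max(range(len(hlods)), key=lambda i: hlods[i])
    lo = hi = peak_index
    # Walk outward from the peak until the score drops below the threshold.
    while lo > 0 and hlods[lo - 1] >= threshold:
        lo -= 1
    while hi < len(hlods) - 1 and hlods[hi + 1] >= threshold:
        hi += 1
    return positions[lo], positions[hi]


def one_hlod_support_interval(positions, hlods):
    """Region around the peak with HLOD >= (maximum HLOD - 1)."""
    return region_around_peak(positions, hlods, max(hlods) - 1.0)


# Toy curve with made-up values (NOT data from this study).
toy_positions_cm = [240.0, 241.0, 242.0, 243.0, 244.0, 245.0, 246.0, 247.0, 248.0]
toy_hlods = [1.0, 2.0, 3.1, 3.7, 4.0, 3.8, 3.2, 2.9, 1.5]

significant_region = region_around_peak(toy_positions_cm, toy_hlods, 3.6)
support_interval = one_hlod_support_interval(toy_positions_cm, toy_hlods)
```

Because the support interval is defined relative to the peak while the significance region uses a fixed threshold, either one can be the wider of the two, as the rows of the table show.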


Supplemental Table 8. GO terms from the in silico functional analysis of the 80 genes in the 1-HLOD support interval of the linkage region on chromosome 8 under the recessive model. All 80 of the genes were recognized by ClueGO. GO groups represent the merged terms from the GO analysis.

GO ID       GO Term                          Term P-Value Corrected  Group P-Value Corrected  GO Groups
GO:0005504  fatty acid binding               1.14x10^-04             1.14x10^-04              Group0
GO:0019433  triglyceride catabolic process   5.46x10^-06             3.16x10^-05              Group1
GO:0046461  neutral lipid catabolic process  5.91x10^-06             3.16x10^-05              Group1
GO:0046464  acylglycerol catabolic process   5.91x10^-06             3.16x10^-05              Group1
GO:0046503  glycerolipid catabolic process   2.11x10^-05             3.16x10^-05              Group1

Percent Associated Genes (order as extracted; row assignment not recoverable from the source layout): 9.3, 6.12, 7.69, 7.69, 5.06
Associated Genes Found (per-term assignment not recoverable from the source layout): PMP2, FABP4, FABP5, FABP9, FABP12

Supplemental Table 9. GO terms from the in silico functional analysis of the 102 genes in the 1-HLOD support interval of the linkage region on chromosome 18 under the dominant model. Of these 102 genes, 100 were recognized by ClueGO, and 2 were missing. GO groups represent the merged terms from the GO analysis.

GO ID       GO Term                                                      Term P-Value Corrected  Group P-Value Corrected  GO Groups  Percent Associated Genes
GO:0010718  positive regulation of epithelial to mesenchymal transition  1.65x10^-05             1.65x10^-05              Group0     7.27
GO:0004867  serine-type endopeptidase inhibitor activity                 1.81x10^-13             1.21x10^-10              Group1     8.87
GO:0061135  endopeptidase regulator activity                             5.55x10^-11             1.21x10^-10              Group1     4.64
GO:0030414  peptidase inhibitor activity                                 7.08x10^-11             1.21x10^-10              Group1     4.66
GO:0004866  endopeptidase inhibitor activity                             7.68x10^-11             1.21x10^-10              Group1     4.80

Associated Genes Found (per-term assignment not recoverable from the source layout): TCF4, SMAD4, HMSD, SERPINB2, SERPINB3, SERPINB4, SERPINB5, SERPINB7, SERPINB8, SERPINB10, SERPINB11, SERPINB12, SERPINB13

Chapter 4 Appendix

Supplemental Table 1. Significant KEGG pathways identified by PARIS. The total numbers of genes and features (simple and complex) defined in each pathway are given. The p-value is the pathway-level p-value calculated empirically by PARIS. Pathways in bold (n = 10) remained significant following the exclusion of the 34 AMD susceptibility loci listed in Supplementary Table 5 of the IAMDGC GWAS (56).

Pathway Name  Total Genes  Total Features  Total Simple Features  Total Complex Features  P-value
MicroRNAs in cancer  297  1,172  747  425  1.00E-05
Viral carcinogenesis  206  897  608  289  1.00E-05
Epstein-Barr virus infection  202  858  566  292  1.00E-05
Herpes simplex infection  186  830  593  237  1.00E-05
Non-alcoholic fatty liver disease (NAFLD)  151  582  391  191  1.00E-05
Systemic lupus erythematosus  136  450  321  129  1.00E-05
Antigen processing and presentation  79  255  185  70  1.00E-05
Pertussis  75  380  283  97  1.00E-05
Leishmaniasis  74  340  238  102  1.00E-05
Adipocytokine signaling pathway  70  519  381  138  1.00E-05
Complement and coagulation cascades  69  546  432  114  1.00E-05
Staphylococcus aureus infection  57  430  355  75  1.00E-05
Graft-versus-host disease  43  110  66  44  1.00E-05
Allograft rejection  39  108  62  46  1.00E-05
Homologous recombination  28  197  125  72  1.00E-05
Natural killer cell mediated cytotoxicity  134  714  499  215  2.00E-05
NF-kappa B signaling pathway  91  486  334  152  2.00E-05
Legionellosis  55  295  239  56  2.00E-05
Tuberculosis  179  758  519  239  3.00E-05
Fc epsilon RI signaling pathway  70  601  411  190  3.00E-05
Epithelial cell signaling in Helicobacter pylori infection  68  307  180  127  4.00E-05
Metabolic pathways  1,213  6,366  4,144  2,222  6.00E-05
Autoimmune thyroid disease  54  176  101  75  6.00E-05
Hepatitis C  133  642  444  198  7.00E-05
PPAR signaling pathway  69  545  419  126  1.00E-04
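The captions in this appendix describe the pathway-level p-values as calculated empirically. As a generic illustration of what an empirical p-value is, the Python sketch below compares an observed statistic with statistics obtained under randomization. It is not PARIS and makes no claim about how PARIS computes or reports its values: the function name, the toy numbers, and the (r + 1) / (n + 1) convention are all assumptions of this sketch.

```python
def empirical_p_value(observed, null_statistics):
    """Empirical p-value using the (r + 1) / (n + 1) convention.

    r is the number of randomized statistics at least as large as the
    observed one, and n is the number of randomizations. This convention
    is an assumption of the sketch, not a description of PARIS.
    """
    n = len(null_statistics)
    r = sum(1 for s in null_statistics if s >= observed)
    return (r + 1) / (n + 1)


# Toy example with made-up numbers (NOT data from this study): the count
# of significant features in a pathway versus counts from 9 randomizations.
toy_observed = 12
toy_null = [3, 5, 12, 7, 14, 2, 9, 4, 6]
toy_p = empirical_p_value(toy_observed, toy_null)
```

Under this convention the smallest attainable value is 1 / (n + 1), so the resolution of an empirical p-value is limited by the number of randomizations performed.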


Supplemental Table 2. Significant Reactome pathways identified by PARIS. The total numbers of genes and features (simple and complex) defined in each pathway are given. The p-value is the pathway-level p-value calculated empirically by PARIS. Pathways in bold (n = 32) remained significant following the exclusion of the 34 AMD susceptibility loci listed in Supplementary Table 5 of the IAMDGC GWAS (56).

Pathway Name  Total Genes  Total Features  Total Simple Features  Total Complex Features  P-value
Immune System  1,230  6,377  4,251  2,126  1.00E-05
Gene Expression  1,224  4,193  2,625  1,568  1.00E-05
Innate Immune System  688  3,786  2,557  1,229  1.00E-05
Adaptive Immune System  664  3,191  2,027  1,164  1.00E-05
Metabolism of lipids and lipoproteins  586  4,075  2,891  1,184  1.00E-05
Hemostasis  496  3,746  2,496  1,250  1.00E-05
Cell Cycle Mitotic  445  1,396  788  608  1.00E-05
Organelle biogenesis and maintenance  318  1,342  861  481  1.00E-05
Cellular responses to stress  292  1,051  654  397  1.00E-05
Platelet activation signaling and aggregation  221  1,743  1,200  543  1.00E-05
Fatty acid triacylglycerol and ketone body metabolism  186  1,249  852  397  1.00E-05
Cellular Senescence  158  477  258  219  1.00E-05
Factors involved in megakaryocyte development and platelet production  139  758  471  287  1.00E-05
Oxidative Stress Induced Senescence  124  260  145  115  1.00E-05
Immunoregulatory interactions between a Lymphoid and a non-Lymphoid cell  123  413  315  98  1.00E-05
Regulation of lipid metabolism by Peroxisome proliferator-activated receptor alpha (PPARalpha)  115  862  603  259  1.00E-05
PPARA activates gene expression  112  804  555  249  1.00E-05
Complement cascade  85  337  274  63  1.00E-05
Transcriptional regulation of white adipocyte differentiation  78  491  364  127  1.00E-05
Initial triggering of complement  68  233  206  27  1.00E-05
Circadian Clock  65  476  336  140  1.00E-05
Lipid digestion mobilization and transport  58  716  583  133  1.00E-05
Neurotransmitter Release Cycle  50  581  418  163  1.00E-05
Mitochondrial biogenesis  49  381  292  89  1.00E-05
Transcriptional activation of mitochondrial biogenesis  42  318  241  77  1.00E-05
Activation of gene expression by SREBF (SREBP)  41  333  250  83  1.00E-05
Lipoprotein metabolism  30  557  466  91  1.00E-05
RORA activates gene expression  27  345  261  84  1.00E-05
Regulation of Complement cascade  25  231  186  45  1.00E-05
Dopamine Neurotransmitter Release Cycle  23  376  277  99  1.00E-05
Serotonin Neurotransmitter Release Cycle  18  357  270  87  1.00E-05
Chylomicron-mediated lipid transport  17  324  286  38  1.00E-05
Dectin-2 family  10  94  65  29  1.00E-05
Activation of C3 and C5  8  133  119  14  1.00E-05
Alternative complement activation  5  88  81  7  1.00E-05
PLC-mediated hydrolysis of PIP2  1  43  34  9  1.00E-05
Metabolism  1,583  9,253  6,275  2,978  2.00E-05
Assembly of the primary cilium  188  792  481  311  3.00E-05
HDL-mediated lipid transport  15  232  182  50  3.00E-05
Regulation of cholesterol biosynthesis by SREBP (SREBF)  54  368  264  104  4.00E-05
Synthesis of pyrophosphates in the cytosol  10  30  13  17  4.00E-05
Cation-coupled Chloride cotransporters  7  100  82  18  4.00E-05
Regulation of PLK1 Activity at G2/M Transition  81  271  150  121  5.00E-05
ER-Phagosome pathway  64  193  124  69  5.00E-05
YAP1- and WWTR1 (TAZ)-stimulated gene expression  29  343  274  69  5.00E-05
Cell Cycle  523  1,729  1,006  723  6.00E-05
transcription pathway  51  528  347  181  6.00E-05
TGF-beta receptor signaling in EMT (epithelial to mesenchymal transition)  16  79  41  38  6.00E-05
Cytokine Signaling in Immune system  307  1,367  917  450  8.00E-05
Inositol phosphate metabolism  44  339  206  133  8.00E-05

Supplemental Table 3. Significant Gene Ontology (GO) pathways identified by PARIS. The total numbers of genes and features (simple and complex) defined in each pathway are given. The p-value is the pathway-level p-value calculated empirically by PARIS. Pathways in bold (n = 53) remained significant following the exclusion of the 34 AMD susceptibility loci listed in Supplementary Table 5 of the IAMDGC GWAS (56).

Pathway Name  Total Genes  Total Features  Total Simple Features  Total Complex Features  P-value
nucleus  3,407  15,965  10,464  5,501  1.00E-05
nucleoplasm  2,512  11,345  7,328  4,017  1.00E-05
poly(A) RNA binding  1,124  3,613  2,236  1,377  1.00E-05
extracellular space  907  6,346  4,501  1,845  1.00E-05
extracellular region  845  5,477  3,898  1,579  1.00E-05
gene expression  796  2,627  1,534  1,093  1.00E-05
innate immune response  622  3,728  2,579  1,149  1.00E-05
blood coagulation  466  3,457  2,295  1,162  1.00E-05
DNA binding  450  2,189  1,415  774  1.00E-05
ATP binding  232  1,596  1,138  458  1.00E-05
transcription initiation from RNA polymerase II promoter  186  1,174  751  423  1.00E-05
blood microparticle  127  601  454  147  1.00E-05
microtubule cytoskeleton  117  567  389  178  1.00E-05
cell cycle arrest  103  540  344  196  1.00E-05
regulation of immune response  87  372  280  92  1.00E-05
integrin binding  73  661  481  180  1.00E-05
double-strand break repair via homologous recombination  63  315  201  114  1.00E-05
single-stranded DNA binding  62  277  158  119  1.00E-05
RNA processing  60  285  161  124  1.00E-05
synaptic vesicle  58  525  377  148  1.00E-05
circadian rhythm  50  304  203  101  1.00E-05
cholesterol homeostasis  48  548  453  95  1.00E-05
regulation of circadian rhythm  46  297  229  68  1.00E-05
Wnt signaling pathway  41  244  170  74  1.00E-05
complement activation  40  242  200  42  1.00E-05
lipid binding  40  246  187  59  1.00E-05
[...] signaling pathway  38  353  249  104  1.00E-05
DNA-dependent ATPase activity  32  257  182  75  1.00E-05
DNA recombination  28  201  134  67  1.00E-05
ligand-activated sequence-specific DNA binding RNA polymerase II transcription factor activity  26  257  201  56  1.00E-05
regulation of complement activation  26  250  204  46  1.00E-05
B cell receptor signaling pathway  25  273  177  96  1.00E-05
RNA polymerase II repressing transcription factor binding  23  139  106  33  1.00E-05
reciprocal meiotic recombination  21  182  125  57  1.00E-05
triglyceride homeostasis  20  337  274  63  1.00E-05
triglyceride catabolic process  18  278  233  45  1.00E-05
reverse cholesterol transport  17  329  280  49  1.00E-05
high-density lipoprotein particle remodeling  15  289  249  40  1.00E-05
MHC class II protein complex  15  42  23  19  1.00E-05
lipid homeostasis  14  90  67  23  1.00E-05
lipid transporter activity  14  138  117  21  1.00E-05

high-density lipoprotein particle  13  80  67  13  1.00E-05
integral component of mitochondrial inner membrane  13  60  38  22  1.00E-05
negative regulation of macrophage derived foam cell differentiation  13  188  153  35  1.00E-05
beta-2-microglobulin binding  11  35  25  10  1.00E-05
positive regulation of G2/M transition of mitotic cell cycle  11  100  64  36  1.00E-05
intrinsic component of membrane  10  106  87  19  1.00E-05
low-density lipoprotein particle remodeling  10  254  216  38  1.00E-05
replication fork  10  119  82  37  1.00E-05
four-way junction DNA binding  9  138  103  35  1.00E-05
natural killer cell lectin-like receptor binding  9  25  15  10  1.00E-05
apical junction complex  8  59  28  31  1.00E-05
high-density lipoprotein particle assembly  8  122  109  13  1.00E-05
phospholipid homeostasis  8  108  83  25  1.00E-05
triglyceride lipase activity  8  205  178  27  1.00E-05
phospholipase activity  7  201  174  27  1.00E-05
positive regulation of fatty acid beta-oxidation  7  101  90  11  1.00E-05
positive regulation of lipid storage  7  97  86  11  1.00E-05
very-low-density lipoprotein particle remodeling  7  222  197  25  1.00E-05
fatty acid transport  6  64  54  10  1.00E-05
negative regulation of cholesterol storage  6  133  112  21  1.00E-05
negative regulation of metalloenzyme activity  6  52  40  12  1.00E-05
positive regulation of apoptotic cell clearance  6  123  114  9  1.00E-05
chylomicron remnant clearance  5  186  168  18  1.00E-05
complement activation, alternative pathway  5  77  69  8  1.00E-05
negative regulation of appetite  5  63  56  7  1.00E-05
negative regulation of sequestering of triglyceride  5  70  63  7  1.00E-05
positive regulation of G-protein coupled receptor protein signaling pathway  5  92  84  8  1.00E-05
uniplex complex  5  47  34  13  1.00E-05
gamma-delta T cell activation  4  18  11  7  1.00E-05
mitochondrial calcium ion homeostasis  4  44  34  10  1.00E-05
Rad51B-Rad51C-Rad51D-XRCC2 complex  4  99  68  31  1.00E-05
regulation of synaptic transmission, GABAergic  4  269  224  45  1.00E-05
endothelial cell apoptotic process  3  38  23  15  1.00E-05
negative regulation of glycolytic process  3  91  76  15  1.00E-05
negative regulation of receptor biosynthetic process  3  68  63  5  1.00E-05
phosphatidylcholine catabolic process  3  179  158  21  1.00E-05
positive regulation of fatty acid oxidation  3  87  74  13  1.00E-05
positive regulation of glucose transport  3  72  64  8  1.00E-05
apolipoprotein A-I receptor activity  2  92  82  10  1.00E-05
regulation of lipid transport by positive regulation of transcription from RNA polymerase II promoter  2  55  51  4  1.00E-05
regulation of triglyceride biosynthetic process  2  71  65  6  1.00E-05
C5L2 anaphylatoxin chemotactic receptor binding  1  67  62  5  1.00E-05
collagen type VIII trimer  1  19  12  7  1.00E-05
intermediate-density lipoprotein particle remodeling  1  171  154  17  1.00E-05
regulation of cellular ketone metabolic process by positive regulation of transcription from RNA polymerase II promoter  1  54  51  3  1.00E-05
regulation of glycolytic process by positive regulation of transcription from RNA polymerase II promoter  1  54  51  3  1.00E-05
cytosol  2,539  14,625  9,511  5,114  2.00E-05
DNA repair  192  840  562  278  2.00E-05
cellular lipid metabolic process  148  1,049  732  317  2.00E-05
circadian regulation of gene expression  52  410  305  105  2.00E-05
double-stranded DNA binding  52  355  227  128  2.00E-05
phagocytic vesicle membrane  22  160  128  32  2.00E-05
cholesterol transporter activity  16  335  275  60  2.00E-05
very-low-density lipoprotein particle  15  68  59  9  2.00E-05
phospholipid efflux  14  286  237  49  2.00E-05
mitochondrial calcium ion transport  8  58  40  18  2.00E-05

steroid hormone receptor activity  7  106  80  26  2.00E-05
induction of bacterial agglutination  3  78  75  3  2.00E-05
negative regulation of excitatory postsynaptic membrane potential  3  11  3  8  2.00E-05
negative regulation of natural killer cell activation  1  10  8  2  2.00E-05
zymogen binding  1  76  75  1  2.00E-05
transcription regulatory region DNA binding  169  972  657  315  3.00E-05
peptidase activity  49  335  247  88  3.00E-05
apolipoprotein binding  12  378  321  57  3.00E-05
fatty acid biosynthetic process  10  207  176  31  3.00E-05
negative regulation of cell differentiation  10  35  19  16  3.00E-05
immune response to tumor cell  2  15  12  3  3.00E-05
regulation of cholesterol efflux  2  21  17  4  3.00E-05
triglyceride transport  2  22  18  4  3.00E-05
oxygen metabolic process  1  23  16  7  3.00E-05
organelle organization  293  1,113  671  442  4.00E-05
lipid metabolic process  88  570  411  159  4.00E-05
rough endoplasmic reticulum  15  43  17  26  4.00E-05
MHC class II receptor activity  12  33  16  17  4.00E-05
metalloendopeptidase inhibitor activity  10  123  80  43  4.00E-05
positive regulation of phospholipid efflux  2  45  43  2  4.00E-05
catalytic step 2 spliceosome  83  165  78  87  5.00E-05
Rab GTPase binding  65  484  329  155  5.00E-05
cholesterol efflux  24  354  289  65  5.00E-05
cysteine-type endopeptidase inhibitor activity involved in apoptotic process  24  100  70  30  5.00E-05
triglyceride binding  1  20  17  3  5.00E-05
extrinsic apoptotic signaling pathway  29  130  92  38  6.00E-05
integral component of lumenal side of endoplasmic reticulum membrane  29  92  67  25  6.00E-05
positive regulation of SMAD protein import into nucleus  10  30  13  17  6.00E-05
phospholipid transporter activity  9  278  228  50  6.00E-05
glomerular basement membrane development  8  133  93  40  6.00E-05
ubiquitin conjugating enzyme binding  6  171  122  49  6.00E-05
apolipoprotein A-I-mediated signaling pathway  5  110  93  17  6.00E-05
protein phosphatase regulator activity  2  5  2  3  6.00E-05
acetylcholine secretion  1  10  8  2  6.00E-05
sodium:chloride symporter activity  1  58  53  5  6.00E-05
extracellular vesicular exosome  2,770  16,721  11,457  5,264  7.00E-05
5'-flap endonuclease activity  6  19  10  9  7.00E-05
protein binding  8,440  46,197  31,190  15,007  8.00E-05
positive regulation of transcription from RNA polymerase II promoter  616  3,988  2,720  1,268  8.00E-05
regulation of Cdc42 protein signal transduction  4  65  57  8  8.00E-05
maintenance of cell polarity  3  57  49  8  8.00E-05
peptide antigen-transporting ATPase activity  2  21  15  6  8.00E-05
inositol phosphate metabolic process  46  341  206  135  9.00E-05
calcium channel inhibitor activity  10  98  72  26  9.00E-05
antigen processing and presentation of exogenous peptide antigen via MHC class I, TAP-independent  9  19  12  7  9.00E-05
phagocytic cup  7  54  45  9  1.00E-04
signaling pattern recognition receptor activity  7  163  140  23  1.00E-04
malate metabolic process  3  29  19  10  1.00E-04

Supplemental Table 4. Significant NetPath pathways identified by PARIS. The total numbers of genes and features (simple and complex) defined in each pathway are given. The p-value is the pathway-level p-value calculated empirically by PARIS. This pathway was no longer significant (p < 0.0001) in our pathway analysis that excluded the 34 AMD susceptibility loci listed in Supplementary Table 5 of the IAMDGC GWAS (56).

Pathway Name  Total Genes  Total Features  Total Simple Features  Total Complex Features  P-value
Wnt  45  309  225  84  3.00E-05

BIBLIOGRAPHY

1. World Health Organization. World Report on Vision. Geneva: World Health Organization; 2019.

2. Mitchell J, Bradley C. Quality of life in age-related macular degeneration: a review of the literature. Health Qual Life Outcomes. 2006;4:97.

3. Jette AM, Branch LG. Impairment and disability in the aged. J Chronic Dis. 1985;38(1):59-65.

4. Laforge RG, Spector WD, Sternberg J. The Relationship of Vision and Hearing Impairment to One-Year Mortality and Functional Decline. Journal of Aging and Health. 2016;4(1):126-48.

5. Rein DB, Zhang P, Wirth KE, Lee PP, Hoerger TJ, McCall N, et al. The economic burden of major adult visual disorders in the United States. Arch Ophthalmol. 2006;124(12):1754-60.

6. Rozing MP, Durhuus JA, Krogh Nielsen M, Subhi Y, Kirkwood TB, Westendorp RG, et al. Age-related macular degeneration: A two-level model hypothesis. Prog Retin Eye Res. 2019:100825.

7. West SK, Munoz B, Rubin GS, Schein OD, Bandeen-Roche K, Zeger S, et al. Function and visual impairment in a population-based study of older adults. The SEE project. Salisbury Eye Evaluation. Invest Ophthalmol Vis Sci. 1997;38(1):72-82.


8. National Research Council. Aging and the Macroeconomy: Long-Term Implications of an Older Population. Washington, DC: The National Academies Press; 2012. 256 p.

9. Levine JA, Schmier JK. Economic Impact of Progression of Age-related Macular Degeneration. US Ophthalmic Review. 2013;06(01):52.

10. Ayoub T, Patel N. Age-related macular degeneration. J R Soc Med. 2009;102(2):56-61.

11. Wong WL, Su X, Li X, Cheung CMG, Klein R, Cheng C-Y, et al. Global prevalence of age-related macular degeneration and disease burden projection for 2020 and 2040: a systematic review and meta-analysis. The Lancet Global Health. 2014;2(2):e106-e16.

12. National Eye Institute. Age-Related Macular Degeneration (AMD) Tables 2019 [updated 2019]. Available from: https://www.nei.nih.gov/learn-about-eye-health/resources-for-health-educators/eye-health-data-and-statistics/age-related-macular-degeneration-amd-data-and-statistics/age-related-macular-degeneration-amd-tables.

13. Mitchell P, Liew G, Gopinath B, Wong TY. Age-related macular degeneration. The Lancet. 2018;392(10153):1147-59.

14. Wang JJ, Mitchell P, Smith W, Cumming RG. Bilateral involvement by age related maculopathy lesions in a population. Br J Ophthalmol. 1998;82(7):743-7.

15. Spaide RF, Jaffe GJ, Sarraf D, Freund KB, Sadda SR, Staurenghi G, et al. Consensus Nomenclature for Reporting Neovascular Age-Related Macular Degeneration Data: Consensus on Neovascular Age-Related Macular Degeneration Nomenclature Study Group. Ophthalmology. 2019.

16. Seddon JM, Sharma S, Adelman RA. Evaluation of the clinical age-related maculopathy staging system. Ophthalmology. 2006;113(2):260-6.

17. Ardeljan D, Chan CC. Aging is not a disease: distinguishing age-related macular degeneration from aging. Prog Retin Eye Res. 2013;37:68-89.

18. Fleckenstein M, Mitchell P, Freund KB, Sadda S, Holz FG, Brittain C, et al. The Progression of Geographic Atrophy Secondary to Age-Related Macular Degeneration. Ophthalmology. 2018;125(3):369-90.

19. Flaxel CJ, Adelman RA, Bailey ST, Fawzi A, Lim JI, Vemulakonda GA, et al. Age-Related Macular Degeneration Preferred Practice Pattern(R). Ophthalmology. 2020;127(1):P1-P65.

20. Tripathy K, Salini B. Amsler Grid. 2019. In: StatPearls [Internet]. Treasure Island, FL: StatPearls Publishing. Available from: https://www.ncbi.nlm.nih.gov/books/NBK538141/.

21. Lim LS, Mitchell P, Seddon JM, Holz FG, Wong TY. Age-related macular degeneration. The Lancet. 2012;379(9827):1728-38.

22. Age-Related Eye Disease Study 2 Research Group. Lutein + zeaxanthin and omega-3 fatty acids for age-related macular degeneration: the Age-Related Eye Disease Study 2 (AREDS2) randomized clinical trial. JAMA. 2013;309(19):2005-15.

23. van Lookeren Campagne M, LeCouter J, Yaspan BL, Ye W. Mechanisms of age-related macular degeneration and therapeutic opportunities. J Pathol. 2014;232(2):151-64.


24. Yannuzzi NA, Freund KB. Brolucizumab: evidence to date in the treatment of neovascular age-related macular degeneration. Clin Ophthalmol. 2019;13:1323-9.

25. Grassmann F, Ach T, Brandl C, Heid IM, Weber BHF. What Does Genetics Tell Us About Age-Related Macular Degeneration? Annu Rev Vis Sci. 2015;1:73-96.

26. Fritsche LG, Fariss RN, Stambolian D, Abecasis GR, Curcio CA, Swaroop A. Age-related macular degeneration: genetics and biology coming together. Annu Rev Genomics Hum Genet. 2014;15:151-71.

27. Zarbin MA. Current concepts in the pathogenesis of age-related macular degeneration. Arch Ophthalmol. 2004;122(4):598-614.

28. Kent DL. Age-related macular degeneration: beyond anti-angiogenesis. Mol Vis. 2014;20:46-55.

29. Cooke Bailey JN, Sobrin L, Pericak-Vance MA, Haines JL, Hammond CJ, Wiggs JL. Advances in the genomics of common eye diseases. Hum Mol Genet. 2013;22(R1):R59-65.

30. McHarg S, Clark SJ, Day AJ, Bishop PN. Age-related macular degeneration and the role of the complement system. Mol Immunol. 2015;67(1):43-50.

31. Clark SJ, Bishop PN. Role of Factor H and Related Proteins in Regulating Complement Activation in the Macula, and Relevance to Age-Related Macular Degeneration. J Clin Med. 2015;4(1):18-31.

32. Seddon JM. Genetic and environmental underpinnings to age-related ocular diseases. Invest Ophthalmol Vis Sci. 2013;54(14):ORSF28-30.


33. Smith W, Assink J, Klein R, Mitchell P, Klaver CC, Klein BE, et al. Risk factors for age-related macular degeneration: Pooled findings from three continents. Ophthalmology. 2001;108(4):697-704.

34. Thornton J, Edwards R, Mitchell P, Harrison RA, Buchan I, Kelly SP. Smoking and age-related macular degeneration: a review of association. Eye (Lond). 2005;19(9):935-44.

35. Khan JC, Thurlby DA, Shahid H, Clayton DG, Yates JRW, Bradley M, et al. Smoking and age related macular degeneration: the number of pack years of cigarette smoking is a major determinant of risk for both geographic atrophy and choroidal neovascularisation. Brit J Ophthalmol. 2006;90(1):75-80.

36. Seddon JM, Cote J, Page WF, Aggen SH, Neale MC. The US twin study of age-related macular degeneration: relative roles of genetic and environmental influences. Arch Ophthalmol. 2005;123(3):321-7.

37. Mares-Perlman JA, Brady WE, Klein R, VandenLangenberg GM, Klein BE, Palta M. Dietary fat and age-related maculopathy. Arch Ophthalmol. 1995;113(6):743-8.

38. Seddon JM, Rosner B, Sperduto RD, Yannuzzi L, Haller JA, Blair NP, et al. Dietary fat and risk for advanced age-related macular degeneration. Arch Ophthalmol. 2001;119(8):1191-9.

39. Seddon JM, Cote J, Rosner B. Progression of age-related macular degeneration: association with dietary fat, transunsaturated fat, nuts, and fish intake. Arch Ophthalmol. 2003;121(12):1728-37.


40. SanGiovanni JP, Chew EY. The role of omega-3 long-chain polyunsaturated fatty

acids in health and disease of the retina. Prog Retin Eye Res. 2005;24(1):87-138.

41. Seddon JM, George S, Rosner B. Cigarette smoking, fish consumption, omega-3

fatty acid intake, and associations with age-related macular degeneration: the US

Twin Study of Age-Related Macular Degeneration. Arch Ophthalmol.

2006;124(7):995-1001.

42. Delcourt C, Carriere I, Cristol JP, Lacroux A, Gerber M. Dietary fat and the risk

of age-related maculopathy: the POLANUT study. Eur J Clin Nutr.

2007;61(11):1341-4.

43. SanGiovanni JP, Chew EY, Agron E, Clemons TE, Ferris FL, Gensler G, et al.

The relationship of dietary omega-3 long-chain polyunsaturated fatty acid intake

with incident age-related macular degeneration - AREDS report no. 23. Arch

Ophthalmol. 2008;126(9):1274-9.

44. Sperduto RD, Hiller R. Systemic hypertension and age-related maculopathy in the

Framingham Study. Arch Ophthalmol. 1986;104(2):216-9.

45. Hyman L, Schachat AP, He Q, Leske MC. Hypertension, cardiovascular disease,

and age-related macular degeneration. Age-Related Macular Degeneration Risk

Factors Study Group. Arch Ophthalmol. 2000;118(3):351-8.

46. Klein R, Klein BE, Marino EK, Kuller LH, Furberg C, Burke GL, et al. Early age-

related maculopathy in the cardiovascular health study. Ophthalmology.

2003;110(1):25-33.

47. Katsi VK, Marketou ME, Vrachatis DA, Manolis AJ, Nihoyannopoulos P,

Tousoulis D, et al. Essential hypertension in the pathogenesis of age-related

macular degeneration: a review of the current evidence. J Hypertens.

2015;33(12):2382-8.

48. Klein R, Peto T, Bird A, Vannewkirk MR. The epidemiology of age-related

macular degeneration. Am J Ophthalmol. 2004;137(3):486-95.

49. Heiba IM, Elston RC, Klein BE, Klein R. Sibling correlations and segregation

analysis of age-related maculopathy: the Beaver Dam Eye Study. Genet

Epidemiol. 1994;11(1):51-67.

50. Seddon JM, Ajani UA, Mitchell BD. Familial Aggregation of Age-related

Maculopathy. Am J Ophthalmol. 1997;123(2):199-206.

51. De Jong PT, Klaver CC, Wolfs RC, Assink JJ, Hofman A. Familial Aggregation

of Age-related Maculopathy. Am J Ophthalmol. 1997;124(6):862-3.

52. Klaver CC, Wolfs RC, Assink JJ, van Duijn CM, Hofman A, de Jong PT. Genetic

risk of age-related maculopathy. Population-based familial aggregation study.

Arch Ophthalmol. 1998;116(12):1646-51.

53. Meyers SM, Greene T, Gutman FA. A Twin Study of Age-related Macular

Degeneration. Am J Ophthalmol. 1995;120(6):757-66.

54. Hammond CJ, Webster AR, Snieder H, Bird AC, Gilbert CE, Spector TD.

Genetic influence on early age-related maculopathy: a twin study.

Ophthalmology. 2002;109(4):730-6.

55. Klein ML, Mauldin WM, Stoumbos VD. Heredity and age-related macular

degeneration. Observations in monozygotic twins. Arch Ophthalmol.

1994;112(7):932-7.

56. Fritsche LG, Igl W, Cooke Bailey JN, Grassmann F, Sengupta S, Bragg-Gresham

JL, et al. A large genome-wide association study of age-related macular

degeneration highlights contributions of rare and common variants. Nat Genet.

2016;48(2):134-43.

57. Visscher PM, Hill WG, Wray NR. Heritability in the genomics era--concepts and

misconceptions. Nat Rev Genet. 2008;9(4):255-66.

58. Saint Pierre A, Genin E. How important are rare variants in common disease?

Brief Funct Genomics. 2014;13(5):353-61.

59. Pritchard JK, Cox NJ. The allelic architecture of human disease genes: common

disease-common variant...or not? Hum Mol Genet. 2002;11(20):2417-23.

60. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al.

Finding the missing heritability of complex diseases. Nature.

2009;461(7265):747-53.

61. Morton NE. Sequential tests for the detection of linkage. Am J Hum Genet.

1955;7(3):277-318.

62. Klein ML, Schultz DW, Edwards A, Matise TC, Rust K, Berselli CB, et al. Age-

related macular degeneration. Clinical features in a large family and linkage to

chromosome 1q. Arch Ophthalmol. 1998;116(8):1082-8.

63. Seddon JM, Santangelo SL, Book K, Chong S, Cote J. A genomewide scan for

age-related macular degeneration provides evidence for linkage to several

chromosomal regions. Am J Hum Genet. 2003;73(4):780-90.

64. Schick JH, Iyengar SK, Klein BE, Klein R, Reading K, Liptak R, et al. A whole-

genome screen of a quantitative trait of age-related maculopathy in sibships from

the Beaver Dam Eye Study. Am J Hum Genet. 2003;72(6):1412-24.

65. Majewski J, Schultz DW, Weleber RG, Schain MB, Edwards AO, Matise TC, et

al. Age-related macular degeneration--a genome scan in extended families. Am J

Hum Genet. 2003;73(3):540-50.

66. Iyengar SK, Song D, Klein BE, Klein R, Schick JH, Humphrey J, et al. Dissection

of genomewide-scan data in extended families reveals a major locus and

oligogenic susceptibility for age-related macular degeneration. Am J Hum Genet.

2004;74(1):20-39.

67. Weeks DE, Conley YP, Tsai HJ, Mah TS, Schmidt S, Postel EA, et al. Age-

related maculopathy: a genomewide scan with continued evidence of

susceptibility loci within the 1q31, 10q26, and 17q25 regions. Am J Hum Genet.

2004;75(2):174-89.

68. Abecasis GR, Yashar BM, Zhao Y, Ghiasvand NM, Zareparsi S, Branham KE, et

al. Age-related macular degeneration: a high-resolution genome scan for

susceptibility loci in a population enriched for late-stage disease. Am J Hum

Genet. 2004;74(3):482-94.

69. Fisher SA, Abecasis GR, Yashar BM, Zareparsi S, Swaroop A, Iyengar SK, et al.

Meta-analysis of genome scans of age-related macular degeneration. Hum Mol

Genet. 2005;14(15):2257-64.

70. Schork NJ, Murray SS, Frazer KA, Topol EJ. Common vs. rare allele hypotheses

for complex diseases. Curr Opin Genet Dev. 2009;19(3):212-9.

71. Reich DE, Lander ES. On the allelic spectrum of human disease. Trends Genet.

2001;17(9):502-10.

72. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, et

al. Genome-wide association studies for complex traits: consensus, uncertainty

and challenges. Nat Rev Genet. 2008;9(5):356-69.

73. Hoggart CJ, Clark TG, De Iorio M, Whittaker JC, Balding DJ. Genome-wide

significance for dense SNP and resequencing data. Genet Epidemiol.

2008;32(2):179-85.

74. Risch N, Merikangas K. The Future of Genetic Studies of Complex Human

Diseases. Science. 1996;273(5281):1516-7.

75. Wray NR, Maier R. Genetic Basis of Complex Genetic Disease: The Contribution

of Disease Heterogeneity to Missing Heritability. Curr Epidemiol Rep.

2014;1(4):220-7.

76. Edwards AO, Ritter R, 3rd, Abel KJ, Manning A, Panhuysen C, Farrer LA.

Complement factor H polymorphism and age-related macular degeneration.

Science. 2005;308(5720):421-4.

77. Haines JL, Hauser MA, Schmidt S, Scott WK, Olson LM, Gallins P, et al.

Complement factor H variant increases the risk of age-related macular

degeneration. Science. 2005;308(5720):419-21.

78. Klein RJ, Zeiss C, Chew EY, Tsai JY, Sackler RS, Haynes C, et al. Complement

factor H polymorphism in age-related macular degeneration. Science.

2005;308(5720):385-9.

79. Rivera A, Fisher SA, Fritsche LG, Keilhauer CN, Lichtner P, Meitinger T, et al.

Hypothetical LOC387715 is a second major susceptibility gene for age-related

macular degeneration, contributing independently of complement factor H to

disease risk. Hum Mol Genet. 2005;14(21):3227-36.

80. Fritsche LG, Chen W, Schu M, Yaspan BL, Yu Y, Thorleifsson G, et al. Seven

new loci associated with age-related macular degeneration. Nat Genet.

2013;45(4):433-9, 9e1-2.

81. Seddon JM, Yu Y, Miller EC, Reynolds R, Tan PL, Gowrisankar S, et al. Rare

variants in CFI, C3 and C9 are associated with high risk of advanced age-related

macular degeneration. Nat Genet. 2013;45(11):1366-70.

82. Schmidt S, Hauser MA, Scott WK, Postel EA, Agarwal A, Gallins P, et al.

Cigarette smoking strongly modifies the association of LOC387715 and age-

related macular degeneration. Am J Hum Genet. 2006;78(5):852-64.

83. DeAngelis MM, Ji F, Kim IK, Adams S, Capone A, Jr., Ott J, et al. Cigarette

smoking, CFH, APOE, ELOVL4, and risk of neovascular age-related macular

degeneration. Arch Ophthalmol. 2007;125(1):49-54.

84. Naj AC, Scott WK, Courtenay MD, Cade WH, Schwartz SG, Kovach JL, et al.

Genetic factors in nonsmokers with age-related macular degeneration revealed

through genome-wide gene-environment interaction analysis. Ann Hum Genet.

2013;77(3):215-31.

85. Hall JB, Bailey JNC, Hoffman JD, Pericak-Vance MA, Scott WK, Kovach JL, et

al. Estimating cumulative pathway effects on risk for age-related macular

degeneration using mixed linear models. BMC Bioinformatics. 2015;16.

86. Sardell RJ, Bailey JN, Courtenay MD, Whitehead P, Laux RA, Adams LD, et al.

Whole exome sequencing of extreme age-related macular degeneration

phenotypes. Mol Vis. 2016;22:1062-76.

87. Winkler TW, Brandl C, Grassmann F, Gorski M, Stark K, Loss J, et al.

Investigating the modulation of genetic effects on late AMD by age and sex:

Lessons learned and two additional loci. PLOS One. 2018;13(3):e0194321.

88. Auer PL, Lettre G. Rare variant association studies: considerations, challenges

and opportunities. Genome Med. 2015;7(1):16.

89. Hatzikotoulas K, Gilly A, Zeggini E. Using population isolates in genetic

association studies. Brief Funct Genomics. 2014;13(5):371-7.

90. Zuk O, Schaffner SF, Samocha K, Do R, Hechter E, Kathiresan S, et al. Searching

for missing heritability: designing rare variant association studies. Proc Natl Acad

Sci U S A. 2014;111(4):E455-64.

91. Strauss KA, Puffenberger EG. Genetics, medicine, and the Plain people. Annu

Rev Genomics Hum Genet. 2009;10:513-36.

92. Kraybill DB, Johnson-Weiner KM, Nolt SM. The Amish. Baltimore: Johns

Hopkins University Press; 2013.

93. McKusick VA, Hostetler JA, Egeland JA. Genetic studies of the Amish,

background and potentialities. Bull Johns Hopkins Hosp.

1964;115:203-22.

94. McKusick VA. The Amish. Endeavour. 1980;4(2):52-7.

95. Nittala MG, Song YE, Sardell R, Adams LD, Pan S, Velaga SB, et al. AMISH

EYE STUDY: Baseline Spectral Domain Optical Coherence Tomography

Characteristics of Age-Related Macular Degeneration. Retina. 2019;39(8):1540-

50.

96. Young Center for Anabaptist and Pietist Studies. Amish Population Profile, 2018.

Elizabethtown College; 2018 [cited 2020]. Available from:

http://groups.etown.edu/amishstudies/statistics/amish-population-profile-2018.

97. Donnermeyer JF, Anderson C, Cooksey EC. The Amish Population: County

Estimates and Settlement Patterns. Journal of Amish and Plain Anabaptist

Studies. 2013;1(1):72-109.

98. Cross HE, Crosby AH. Amish Contributions to Medical Genetics. Mennonite

Quarterly Review. 2008;82(3):449-67.

99. Kessler MD, Loesch DP, Perry JA, Heard-Costa NL, Taliun D, Cade BE, et al. De

novo mutations across 1,465 diverse genomes reveal mutational insights and

reductions in the Amish founder population. Proc Natl Acad Sci U S A.

2020;117(5):2560-9.

100. Bray SM, Mulle JG, Dodd AF, Pulver AE, Wooding S, Warren ST. Signatures of

founder effects, admixture, and selection in the Ashkenazi Jewish population.

Proc Natl Acad Sci U S A. 2010;107(37):16222-7.

101. Inglish P. Inherited Physical Disorders and the Amish Baby Boom: Owlcation;

2016 [cited 2020]. Available from:

https://owlcation.com/stem/Some-Inherited-Physical-Conditions-and-Illnesses-of-the-Amish.

102. Pollin TI, Damcott CM, Shen H, Ott SH, Shelton J, Horenstein RB, et al. A null

mutation in human APOC3 confers a favorable plasma lipid profile and apparent

cardioprotection. Science. 2008;322(5908):1702-5.

103. Crawford DC, Dumitrescu L, Goodloe R, Brown-Gentry K, Boston J, McClellan

B, Jr., et al. Rare variant APOC3 R19X is associated with cardio-protective

profiles in a diverse population-based survey as part of the Epidemiologic

Architecture for Genes Linked to Environment Study. Circ Cardiovasc Genet.

2014;7(6):848-53.

104. Francomano CA, McKusick VA, Biesecker LG. Medical genetic studies in the

Amish: historical perspective. Am J Med Genet C Semin Med Genet.

2003;121C(1):1-4.

105. Mitchell BD, Hsueh WC, King TM, Pollin TI, Sorkin J, Agarwala R, et al.

Heritability of life span in the Old Order Amish. Am J Med Genet.

2001;102(4):346-52.

106. Sorkin J, Post W, Pollin TI, O'Connell JR, Mitchell BD, Shuldiner AR. Exploring

the genetics of longevity in the Old Order Amish. Mech Ageing Dev.

2005;126(2):347-50.

107. Yerges-Armstrong LM, Chai S, O'Connell JR, Curran JE, Blangero J, Mitchell

BD, et al. Gene Expression Differences Between Offspring of Long-Lived

Individuals and Controls in Candidate Longevity Regions: Evidence for PAPSS2

as a Longevity Gene. J Gerontol A Biol Sci Med Sci. 2016;71(10):1295-9.

108. Ben-Avraham D, Govindaraju DR, Budagov T, Fradin D, Durda P, Liu B, et al.

The GH receptor exon 3 deletion is a marker of male-specific exceptional

longevity associated with increased GH sensitivity and taller stature. Sci Adv.

2017;3(6):e1602025.

109. Edwards DR, Gilbert JR, Jiang L, Gallins PJ, Caywood L, Creason M, et al.

Successful aging shows linkage to chromosomes 6, 7, and 14 in the Amish. Ann

Hum Genet. 2011;75(4):516-28.

110. Courtenay MD, Gilbert JR, Jiang L, Cummings AC, Gallins PJ, Caywood L, et al.

Mitochondrial haplogroup X is associated with successful aging in the Amish.

Hum Genet. 2012;131(2):201-8.

111. Edwards DR, Gilbert JR, Hicks JE, Myers JL, Jiang L, Cummings AC, et al.

Linkage and association of successful aging to the 6q25 region in large Amish

kindreds. Age (Dordr). 2013;35(4):1467-77.

112. Khan SS, Shah SJ, Klyachko E, Baldridge AS, Eren M, Place AT, et al. A null

mutation in SERPINE1 protects against biological aging in humans. Sci Adv.

2017;3(11):eaao1617.

113. Lee SL, Murdock DG, McCauley JL, Bradford Y, Crunk A, McFarland L, et al. A

genome-wide scan in an Amish pedigree with parkinsonism. Ann Hum Genet.

2008;72(Pt 5):621-9.

114. Cummings AC, Lee SL, McCauley JL, Jiang L, Crunk A, McFarland LL, et al. A

genome-wide linkage screen in the Amish with Parkinson disease points to

chromosome 6. Ann Hum Genet. 2011;75(3):351-8.

115. Davis MF, Cummings AC, D'Aoust LN, Jiang L, Velez Edwards DR, Laux R, et

al. Parkinson disease loci in the mid-western Amish. Hum Genet.

2013;132(11):1213-21.

116. Racette BA, Good LM, Kissel AM, Criswell SR, Perlmutter JS. A population-

based study of parkinsonism in an Amish community. Neuroepidemiology.

2009;33(3):225-30.

117. Racette BA, Rundle M, Wang JC, Goate A, Saccone NL, Farrer M, et al. A multi-

incident, Old-Order Amish family with PD. Neurology. 2002;58(4):568-74.

118. Pericak-Vance MA, Johnson CC, Rimmler JB, Saunders AM, Robinson LC,

D'Hondt EG, et al. Alzheimer's disease and apolipoprotein E-4 allele in an Amish

population. Ann Neurol. 1996;39(6):700-4.

119. Holder J, Warren AC. Prevalence of Alzheimer's disease and apolipoprotein E

allele frequencies in the Old Order Amish. J Neuropsychiatry Clin Neurosci.

1998;10(1):100-2.

120. van der Walt JM, Scott WK, Slifer S, Gaskell PC, Martin ER, Welsh-Bohmer K,

et al. Maternal lineages and Alzheimer disease risk in the Old Order Amish. Hum

Genet. 2005;118(1):115-22.

121. Cummings AC, Jiang L, Velez Edwards DR, McCauley JL, Laux R, McFarland

LL, et al. Genome-wide association and linkage study in the Amish detects a

novel candidate late-onset Alzheimer disease gene. Ann Hum Genet.

2012;76(5):342-51.

122. D'Aoust LN, Cummings AC, Laux R, Fuzzell D, Caywood L, Reinhart-Mercer L,

et al. Examination of candidate exonic variants for association to Alzheimer

disease in the Amish. PLOS One. 2015;10(2):e0118043.

123. Ashley-Koch AE, Shao Y, Rimmler JB, Gaskell PC, Welsh-Bohmer KA, Jackson

CE, et al. An autosomal genomic screen for dementia in an extended Amish

family. Neurosci Lett. 2005;379(3):199-204.

124. Hahs DW, McCauley JL, Crunk AE, McFarland LL, Gaskell PC, Jiang L, et al. A

genome-wide linkage analysis of dementia in the Amish. Am J Med Genet B

Neuropsychiatr Genet. 2006;141B(2):160-6.

125. McCauley JL, Hahs DW, Jiang L, Scott WK, Welsh-Bohmer KA, Jackson CE, et

al. Combinatorial Mismatch Scan (CMS) for loci associated with dementia in the

Amish. BMC Med Genet. 2006;7:19.

126. Johnson CC, Rybicki BA, Brown G, D'Hondt E, Herpolsheimer B, Roth D, et al.

Cognitive impairment in the Amish: a four county survey. Int J Epidemiol.

1997;26(2):387-94.

127. Stambolian D, Ciner EB, Reider LC, Moy C, Dana D, Owens R, et al. Genome-

wide scan for myopia in the Old Order Amish. Am J Ophthalmol.

2005;140(3):469-76.

128. Peet JA, Cotch MF, Wojciechowski R, Bailey-Wilson JE, Stambolian D.

Heritability and familial aggregation of refractive error in the Old Order Amish.

Invest Ophthalmol Vis Sci. 2007;48(9):4002-6.

129. Wojciechowski R, Bailey-Wilson JE, Stambolian D. Fine-mapping of candidate

region in Amish and Ashkenazi families confirms linkage of refractive error to a

QTL on 1p34-p36. Mol Vis. 2009;15:1398-406.

130. Wojciechowski R, Stambolian D, Ciner E, Ibay G, Holmes TN, Bailey-Wilson

JE. Genomewide linkage scans for ocular refraction and meta-analysis of four

populations in the Myopia Family Study. Invest Ophthalmol Vis Sci.

2009;50(5):2024-32.

131. Wojciechowski R, Bailey-Wilson JE, Stambolian D. Association of matrix

metalloproteinase gene polymorphisms with refractive error in Amish and

Ashkenazi families. Invest Ophthalmol Vis Sci. 2010;51(10):4989-95.

132. Wojciechowski R, Yee SS, Simpson CL, Bailey-Wilson JE, Stambolian D. Matrix

metalloproteinases and educational attainment in refractive error: evidence of

gene-environment interactions in the Age-Related Eye Disease Study.

Ophthalmology. 2013;120(2):298-305.

133. Musolf AM, Simpson CL, Alexander TA, Portas L, Murgia F, Ciner EB, et al.

Genome-wide scans of myopia in Pennsylvania Amish families reveal significant

linkage to 12q15, 8q21.3 and 5p15.33. Hum Genet. 2019;138(4):339-54.

134. Cross HE. Population studies and the Old Order Amish. Nature.

1976;262(5563):17-20.

135. Ferketich AK, Katz ML, Kauffman RM, Paskett ED, Lemeshow S, Westman JA,

et al. Tobacco use among the Amish in Holmes County, Ohio. J Rural Health.

2008;24(1):84-90.

136. Hoffman JD, Cooke Bailey JN, D'Aoust L, Cade W, Ayala-Haedo J, Fuzzell D, et

al. Rare complement factor H variant associated with age-related macular

degeneration in the Amish. Invest Ophthalmol Vis Sci. 2014;55(7):4455-60.

137. Sardell RJ, Nittala MG, Adams LD, Laux RA, Cooke Bailey JN, Fuzzell D, et al.

Heritability of Choroidal Thickness in the Amish. Ophthalmology.

2016;123(12):2537-44.

138. Velaga SB, Nittala MG, Vupparaboina KK, Jana S, Chhablani J, Haines J, et al.

Choroidal Vascularity Index and Choroidal Thickness in Eyes with Reticular

Pseudodrusen. Retina. 2019.

139. Chavali VR, Diniz B, Huang J, Ying GS, Sadda SR, Stambolian D. Association of

OCT derived drusen measurements with AMD associated-genotypic SNPs in

Amish population. J Clin Med. 2015;4(2):304-17.

140. Nittala MG, Velaga SB, Hariri A, Pfau M, Birch DG, Haines J, et al. Retinal

Sensitivity Using Microperimetry in Age-Related Macular Degeneration in an

Amish Population. Ophthalmic Surg Lasers Imaging Retina. 2019;50(9):e236-

e41.

141. Raychaudhuri S, Iartchouk O, Chin K, Tan PL, Tai AK, Ripke S, et al. A rare

penetrant mutation in CFH confers high risk of age-related macular degeneration.

Nat Genet. 2011;43(12):1232-6.

142. Waksmunski AR, Igo RP, Jr., Song YE, Cooke Bailey JN, Laux R, Fuzzell D, et

al. Rare variants and loci for age-related macular degeneration in the Ohio and

Indiana Amish. Hum Genet. 2019;138(10):1171-82.

143. Hageman GS, Anderson DH, Johnson LV, Hancox LS, Taiber AJ, Hardisty LI, et

al. A common haplotype in the complement regulatory gene factor H (HF1/CFH)

predisposes individuals to age-related macular degeneration. Proc Natl Acad Sci

U S A. 2005;102(20):7227-32.

144. Warwick A, Khandhadia S, Ennis S, Lotery A. Age-Related Macular

Degeneration: A Disease of Systemic or Local Complement Dysregulation? J Clin

Med. 2014;3(4):1234-57.

145. Scholl HP, Charbel Issa P, Walier M, Janzer S, Pollok-Kopp B, Borncke F, et al.

Systemic complement activation in age-related macular degeneration. PLOS One.

2008;3(7):e2593.

146. Weber BH, Charbel Issa P, Pauly D, Herrmann P, Grassmann F, Holz FG. The

role of the complement system in age-related macular degeneration. Dtsch

Arztebl Int. 2014;111(8):133-8.

147. Agarwala R, Biesecker LG, Schaffer AA. Anabaptist genealogy database. Am J

Med Genet C Semin Med Genet. 2003;121C(1):32-7.

148. Garbe JR, Da Y. Pedigraph: A Software Tool for the Graphing and Analysis of

Large Complex Pedigree. 2008; User manual Version 2.4. Available from:

https://animalgene.umn.edu/sites/animalgene.umn.edu/files/pedigraph_manual.pdf.

149. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ. The Phyre2 web

portal for protein modeling, prediction and analysis. Nat Protoc. 2015;10(6):845-

58.

150. Pettersen EF, Goddard TD, Huang CC, Couch GS, Greenblatt DM, Meng EC, et

al. UCSF Chimera--a visualization system for exploratory research and analysis. J

Comput Chem. 2004;25(13):1605-12.

151. Schneider CA, Rasband WS, Eliceiri KW. NIH Image to ImageJ: 25 years of

image analysis. Nat Methods. 2012;9(7):671-5.

152. Davarinejad H. Quantifications of western blots with Image J. 2017. Available

from: http://www.yorku.ca/yisheng/Internal/Protocols/ImageJ.pdf.

153. Ansari M, McKeigue PM, Skerka C, Hayward C, Rudan I, Vitart V, et al. Genetic

influences on plasma CFH and CFHR1 concentrations and their role in

susceptibility to age-related macular degeneration. Hum Mol Genet.

2013;22(23):4857-69.

154. Grunin M, Burstyn-Cohen T, Hagbi-Levi S, Peled A, Chowers I. Chemokine

receptor expression in peripheral blood monocytes from patients with neovascular

age-related macular degeneration. Invest Ophthalmol Vis Sci. 2012;53(9):5292-

300.

155. Dewey FE, Murray MF, Overton JD, Habegger L, Leader JB, Fetterolf SN, et al.

Distribution and clinical impact of functional variants in 50,726 whole-exome

sequences from the DiscovEHR study. Science. 2016;354(6319).

156. The NHLBI Trans-Omics for Precision Medicine (TOPMed) Whole Genome

Sequencing Program. BRAVO variant browser: University of Michigan and

NHLBI; 2018. [Available from: https://bravo.sph.umich.edu/freeze5/hg38/].

157. Karczewski KJ, Francioli LC, Tiao G, Cummings BB, Alföldi J, Wang Q, et al.

Variation across 141,456 human exomes and genomes reveals the spectrum of

loss-of-function intolerance across human protein-coding genes. 2019.

158. Carey DJ, Fetterolf SN, Davis FD, Faucett WA, Kirchner HL, Mirshahi U, et al.

The Geisinger MyCode community health initiative: an electronic health record-

linked biobank for precision medicine research. Genet Med. 2016;18(9):906-13.

159. Kersten E, Geerlings MJ, den Hollander AI, de Jong EK, Fauser S, Peto T, et al.

Phenotype Characteristics of Patients With Age-Related Macular Degeneration

Carrying a Rare Variant in the Complement Factor H Gene. JAMA Ophthalmol.

2017;135(10):1037-44.

160. Triebwasser MP, Roberson ED, Yu Y, Schramm EC, Wagner EK, Raychaudhuri

S, et al. Rare Variants in the Functional Domains of Complement Factor H Are

Associated With Age-Related Macular Degeneration. Invest Ophthalmol Vis Sci.

2015;56(11):6873-8.

161. Wagner EK, Raychaudhuri S, Villalonga MB, Java A, Triebwasser MP, Daly MJ,

et al. Mapping rare, deleterious mutations in Factor H: Association with early

onset, drusen burden, and lower antigenic levels in familial AMD. Sci Rep.

2016;6:31531.

162. Johnson PT, Betts KE, Radeke MJ, Hageman GS, Anderson DH, Johnson LV.

Individuals homozygous for the age-related macular degeneration risk-conferring

variant of complement factor H have elevated levels of CRP in the choroid. Proc

Natl Acad Sci U S A. 2006;103(46):17456-61.

163. Ferreira VP, Pangburn MK, Cortes C. Complement control protein factor H: the

good, the bad, and the inadequate. Mol Immunol. 2010;47(13):2187-97.

164. Shaw PX, Zhang L, Zhang M, Du H, Zhao L, Lee C, et al. Complement factor H

genotypes impact risk of age-related macular degeneration by interaction with

oxidized phospholipids. Proc Natl Acad Sci U S A. 2012;109(34):13757-62.

165. Tortajada A, Montes T, Martinez-Barricarte R, Morgan BP, Harris CL, de

Cordoba SR. The disease-protective complement factor H allotypic variant Ile62

shows increased binding affinity for C3b and enhanced cofactor activity. Hum

Mol Genet. 2009;18(18):3452-61.

166. Yu J, Wiita P, Kawaguchi R, Honda J, Jorgensen A, Zhang K, et al. Biochemical

analysis of a common human polymorphism associated with age-related macular

degeneration. Biochemistry. 2007;46(28):8451-61.

167. Cao S, Wang JC, Gao J, Wong M, To E, White VA, et al. CFH Y402H

polymorphism and the complement activation product C5a: effects on NF-kappaB

activation and inflammasome gene regulation. Br J Ophthalmol. 2016;100(5):713-

8.

168. Zhou R, Caspi RR. Ocular immune privilege. F1000 Biol Rep. 2010;2.

169. Anderson DH, Radeke MJ, Gallo NB, Chapin EA, Johnson PT, Curletti CR, et al.

The pivotal role of the complement system in aging and age-related macular

degeneration: hypothesis re-visited. Prog Retin Eye Res. 2010;29(2):95-112.

170. Illumina. GenomeStudio Software. Available from:

http://www.illumina.com/techniques/microarrays/array-data-analysis-experimental-design/genomestudio.html.

171. Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, et al. Best

practices and joint calling of the HumanExome BeadChip: the CHARGE

Consortium. PLOS One. 2013;8(7):e68095.

172. Guo Y, He J, Zhao S, Wu H, Zhong X, Sheng Q, et al. Illumina human exome

genotyping array clustering and quality control. Nat Protoc. 2014;9(11):2643-62.

173. Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, et al. zCall:

a rare variant caller for array-based genotyping: genetics and population analysis.

Bioinformatics. 2012;28(19):2543-5.

174. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al.

PLINK: a tool set for whole-genome association and population-based linkage

analyses. Am J Hum Genet. 2007;81(3):559-75.

175. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D.

Principal components analysis corrects for stratification in genome-wide

association studies. Nat Genet. 2006;38(8):904-9.

176. Thornton T, McPeek MS. ROADTRIPS: case-control association testing with

partially or completely unknown population and pedigree structure. Am J Hum

Genet. 2010;86(2):172-84.

177. Bourgain C. KinInbcoef: Calculation of kinship and inbreeding coefficients. 2003.

178. Shalev V, Sror M, Goldshtein I, Kokia E, Chodick G. Statin Use and the Risk of

Age Related Macular Degeneration in a Large Health Organization in Israel.

Ophthalmic Epidemiol. 2011;18(2):83-90.

179. Abecasis Lab. Exome Chip Design Wiki Site. 2013. Available from:

https://genome.sph.umich.edu/wiki/Exome_Chip_Design.

180. Liu F, Kirichenko A, Axenovich TI, van Duijn CM, Aulchenko YS. An approach

for cutting large and complex pedigrees for linkage analysis. Eur J Hum Genet.

2008;16(7):854-60.

181. Abecasis GR, Cherny SS, Cookson WO, Cardon LR. Merlin--rapid analysis of

dense genetic maps using sparse gene flow trees. Nat Genet. 2002;30(1):97-101.

182. Machiela MJ, Chanock SJ. LDlink: a web-based application for exploring

population-specific haplotype structure and linking correlated alleles of possible

functional variants. Bioinformatics. 2015;31(21):3555-7.

183. Sofat R, Casas JP, Webster AR, Bird AC, Mann SS, Yates JR, et al. Complement

factor H genetic variant and age-related macular degeneration: effect size,

modifiers and relationship to disease subtype. Int J Epidemiol. 2012;41(1):250-62.

184. Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, et al.

ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and

pathway annotation networks. Bioinformatics. 2009;25(8):1091-3.

185. Ding H, Schertzer M, Wu X, Gertsenstein M, Selig S, Kammori M, et al.

Regulation of murine telomere length by Rtel: an essential gene encoding a

helicase-like protein. Cell. 2004;117(7):873-86.

186. Barber LJ, Youds JL, Ward JD, McIlwraith MJ, O'Neil NJ, Petalcorin MI, et al.

RTEL1 maintains genomic stability by suppressing homologous recombination.

Cell. 2008;135(2):261-71.

187. Codd V, Nelson CP, Albrecht E, Mangino M, Deelen J, Buxton JL, et al.

Identification of seven loci affecting mean telomere length and their association

with disease. Nat Genet. 2013;45(4):422-7.

188. von Zglinicki T, Martin-Ruiz CM. Telomeres as biomarkers for ageing and age-

related diseases. Curr Mol Med. 2005;5(2):197-203.

189. Shaw PX, Stiles T, Douglas C, Ho D, Fan W, Du H, et al. Oxidative stress, innate

immunity, and age-related macular degeneration. AIMS Mol Sci. 2016;3(2):196-

221.

190. Suzuki K, Lareyre JJ, Sanchez D, Gutierrez G, Araki Y, Matusik RJ, et al.

Molecular evolution of epididymal lipocalin genes localized on mouse

chromosome 2. Gene. 2004;339:49-59.

191. Parmar T, Parmar VM, Perusek L, Georges A, Takahashi M, Crabb JW, et al.

Lipocalin 2 Plays an Important Role in Regulating Inflammation in Retinal

Degeneration. J Immunol. 2018;200(9):3128-41.

192. Kallo G, Emri M, Varga Z, Ujhelyi B, Tozser J, Csutak A, et al. Changes in the

Chemical Barrier Composition of Tears in Alzheimer's Disease Reveal Potential

Tear Diagnostic Biomarkers. PLOS One. 2016;11(6):e0158000.

193. Wang JC, Ku HY, Chen TS, Chuang HS. Detection of low-abundance biomarker

lipocalin 1 for diabetic retinopathy using optoelectrokinetic bead-based

immunosensing. Biosens Bioelectron. 2017;89(Pt 2):701-9.

194. Feng W, Zhang M. Organization and dynamics of PDZ-domain-related

supramodules in the postsynaptic density. Nat Rev Neurosci. 2009;10(2):87-99.

195. Li J, Wilkinson B, Clementel VA, Hou J, O'Dell TJ, Coba MP. Long-term

potentiation modulates synaptic phosphorylation networks and reshapes the

structure of the postsynaptic interactome. Sci Signal. 2016;9(440):rs8.

196. Young TL, Ronan SM, Drahozal LA, Wildenberg SC, Alvear AB, Oetting WS, et

al. Evidence that a locus for familial high myopia maps to chromosome 18p. Am J

Hum Genet. 1998;63(1):109-19.

197. Scavello GS, Jr., Paluru PC, Zhou J, White PS, Rappaport EF, Young TL.

Genomic structure and organization of the high grade Myopia-2 locus (MYP2)

critical region: mutation screening of 9 positional candidate genes. Mol Vis.

2005;11:97-110.

198. Skeie JM, Mahajan VB. Proteomic landscape of the human choroid-retinal

pigment epithelial complex. JAMA Ophthalmol. 2014;132(11):1271-81.

199. Lee MK, Shaffer JR, Leslie EJ, Orlova E, Carlson JC, Feingold E, et al. Genome-

wide association study of facial morphology reveals novel associations with

FREM1 and PARK2. PLOS One. 2017;12(4):e0176566.

200. Glaeser K, Urban M, Fenech E, Voloshanenko O, Kranz D, Lari F, et al. ERAD-

dependent control of the Wnt secretory factor Evi. EMBO J. 2018;37(4).

201. Zhou T, Hu Y, Chen Y, Zhou KK, Zhang B, Gao G, et al. The pathogenic role of

the canonical Wnt pathway in age-related macular degeneration. Invest

Ophthalmol Vis Sci. 2010;51(9):4371-9.

202. Wiggs JL, Yaspan BL, Hauser MA, Kang JH, Allingham RR, Olson LM, et al.

Common variants at 9p21 and 8q22 are associated with increased susceptibility to

optic nerve degeneration in glaucoma. PLOS Genet. 2012;8(4):e1002654.

203. Wang L, Clark ME, Crossman DK, Kojima K, Messinger JD, Mobley JA, et al.

Abundant lipid and protein components of drusen. PLOS One. 2010;5(4):e10329.

204. Merle BM, Benlian P, Puche N, Bassols A, Delcourt C, Souied EH, et al.

Circulating omega-3 Fatty acids and neovascular age-related macular

degeneration. Invest Ophthalmol Vis Sci. 2014;55(3):2010-9.

205. Paun CC, Ersoy L, Schick T, Groenewoud JM, Lechanteur YT, Fauser S, et al.

Genetic Variants and Systemic Complement Activation Levels Are Associated

With Serum Lipoprotein Levels in Age-Related Macular Degeneration. Invest

Ophthalmol Vis Sci. 2015;56(13):7766-73.

206. Roh MI, Kim JH, Byeon SH, Koh HJ, Lee SC, Kwon OW. Estimated prevalence

and risk factor for age-related maculopathy. Yonsei Med J. 2008;49(6):931-41.

207. Semba RD, Cotch MF, Gudnason V, Eiriksdottir G, Harris TB, Sun K, et al.

Serum carboxymethyllysine, an advanced glycation end product, and age-related

macular degeneration: the Age, Gene/Environment Susceptibility-Reykjavik

Study. JAMA Ophthalmol. 2014;132(4):464-70.

208. Munch IC, Linneberg A, Larsen M. Precursors of age-related macular

degeneration: associations with physical activity, obesity, and serum lipids in the

Inter99 Eye Study. Invest Ophthalmol Vis Sci. 2013;54(6):3932-40.

209. Nowak M, Swietochowska E, Marek B, Szapska B, Wielkoszynski T, Kos-Kudla

B, et al. Changes in lipid metabolism in women with age-related macular

degeneration. Clin Exp Med. 2005;4(4):183-7.

210. Colijn JM, den Hollander AI, Demirkan A, Cougnard-Gregoire A, Verzijden T,

Kersten E, et al. Increased High-Density Lipoprotein Levels Associated with Age-

Related Macular Degeneration: Evidence from the EYE-RISK and European Eye

Epidemiology Consortia. Ophthalmology. 2019;126(3):393-406.

211. Hirschhorn JN, Daly MJ. Genome-wide association studies for common diseases

and complex traits. Nat Rev Genet. 2005;6(2):95-108.

212. Sobrin L, Ripke S, Yu Y, Fagerness J, Bhangale TR, Tan PL, et al. Heritability

and Genome-Wide Association Study to Assess Genetic Differences between

Advanced Age-related Macular Degeneration Subtypes. Ophthalmology.

2012;119(9):1874-85.

213. Ishikawa K, Kannan R, Hinton DR. Molecular mechanisms of subretinal fibrosis

in age-related macular degeneration. Exp Eye Res. 2016;142:19-25.

214. Ghosh S, Shang P, Terasaki H, Stepicheva N, Hose S, Yazdankhah M, et al. A

Role for betaA3/A1-Crystallin in Type 2 EMT of RPE Cells Occurring in Dry

Age-Related Macular Degeneration. Invest Ophthalmol Vis Sci.

2018;59(4):AMD104-AMD13.

215. Ishikawa K, Sreekumar PG, Spee C, Nazari H, Zhu D, Kannan R, et al. alphaB-

Crystallin Regulates Subretinal Fibrosis by Modulation of Epithelial-

Mesenchymal Transition. Am J Pathol. 2016;186(4):859-73.

216. Kobayashi M, Tokuda K, Kobayashi Y, Yamashiro C, Uchi SH, Hatano M, et al.

Suppression of Epithelial-Mesenchymal Transition in Retinal Pigment Epithelial

Cells by an MRTF-A Inhibitor. Invest Ophthalmol Vis Sci. 2019;60(2):528-37.

217. Zhang M, Volpert O, Shi YH, Bouck N. Maspin is an angiogenesis inhibitor. Nat

Med. 2000;6(2):196-9.

218. Shellenberger TD, Mazumdar A, Henderson Y, Briggs K, Wang M,

Chattopadhyay C, et al. Headpin: a serpin with endogenous and exogenous

suppression of angiogenesis. Cancer Res. 2005;65(24):11501-9.

219. Pescosolido N, Barbato A, Pascarella A, Giannotti R, Genzano M, Nebbioso M.

Role of Protease-Inhibitors in Ocular Diseases. Molecules. 2014;19(12):20557-

69.

220. Baratz KH, Tosakulwong N, Ryu E, Brown WL, Branham K, Chen W, et al. E2-2

protein and Fuchs's corneal dystrophy. N Engl J Med. 2010;363(11):1016-24.

221. Li YJ, Minear MA, Rimmler J, Zhao B, Balajonda E, Hauser MA, et al.

Replication of TCF4 through Association and Linkage Studies in Late-Onset

Fuchs Endothelial Corneal Dystrophy. PLOS One. 2011;6(4).

222. Riazuddin SA, McGlumphy EJ, Yeo WS, Wang J, Katsanis N, Gottsch JD.

Replication of the TCF4 intronic variant in late-onset Fuchs corneal dystrophy

and evidence of independence from the FCD2 locus. Invest Ophthalmol Vis Sci.

2011;52(5):2825-9.

223. Thalamuthu A, Khor CC, Venkataraman D, Koh LW, Tan DT, Aung T, et al.

Association of TCF4 gene polymorphisms with Fuchs' corneal dystrophy in the

Chinese. Invest Ophthalmol Vis Sci. 2011;52(8):5573-8.

224. Kuot A, Hewitt AW, Griggs K, Klebe S, Mills R, Jhanji V, et al. Association of

TCF4 and CLU polymorphisms with Fuchs' endothelial dystrophy and

implication of CLU and TGFBI proteins in the disease process. Eur J Hum Genet.

2012;20(6):632-8.

225. Male DA, Ormsby RJ, Ranganathan S, Giannakis E, Gordon DL. Complement

factor H: sequence analysis of 221 kb of human genomic DNA containing the

entire fH, fHR-1 and fHR-3 genes. Mol Immunol. 2000;37(1-2):41-52.

226. Martinez-Barricarte R, Recalde S, Fernandez-Robredo P, Millan I, Olavarrieta L,

Vinuela A, et al. Relevance of complement factor H-related 1 (CFHR1) genotypes

in age-related macular degeneration. Invest Ophthalmol Vis Sci.

2012;53(3):1087-94.

227. Lores-Motta L, Paun CC, Corominas J, Pauper M, Geerlings MJ, Altay L, et al.

Genome-Wide Association Study Reveals Variants in CFH and CFHR4

Associated with Systemic Complement Activation: Implications in Age-Related

Macular Degeneration. Ophthalmology. 2018;125(7):1064-74.

228. Hughes AE, Orr N, Esfandiary H, Diaz-Torres M, Goodship T, Chakravarthy U.

A common CFH haplotype, with deletion of CFHR1 and CFHR3, is associated

with lower risk of age-related macular degeneration. Nat Genet.

2006;38(10):1173-7.

229. Hageman GS, Hancox LS, Taiber AJ, Gehrs KM, Anderson DH, Johnson LV, et

al. Extended haplotypes in the complement factor H (CFH) and CFH-related

(CFHR) family of genes protect against age-related macular degeneration:

Characterization, ethnic distribution and evolutionary implications. Ann Med.

2006;38(8):592-604.

230. Fritsche LG, Lauer N, Hartmann A, Stippa S, Keilhauer CN, Oppermann M, et al.

An imbalance of human complement regulatory proteins CFHR1, CFHR3 and

factor H influences risk for age-related macular degeneration (AMD). Hum Mol

Genet. 2010;19(23):4694-704.

231. Waksmunski AR, Grunin M, Kinzy TG, Igo RP, Jr., Haines JL, Cooke Bailey JN,

et al. Pathway Analysis Integrating Genome-Wide and Functional Data Identifies

PLCG2 as a Candidate Gene for Age-Related Macular Degeneration. Invest

Ophthalmol Vis Sci. 2019;60(12):4041-51.

232. Brown GC, Sharma S, Brown MM, Kistler J. Utility Values and Age-related

Macular Degeneration. Arch Ophthalmol. 2000;118(1):47-51.

233. Scott AW, Bressler NM, Ffolkes S, Wittenborn JS, Jorkasky J. Public Attitudes

About Eye and Vision Health. JAMA Ophthalmol. 2016;134(10):1111-8.

234. Yu Y, Bhangale TR, Fagerness J, Ripke S, Thorleifsson G, Tan PL, et al.

Common variants near FRK/COL10A1 and VEGFA are associated with advanced

age-related macular degeneration. Hum Mol Genet. 2011;20(18):3699-709.

235. Yaspan BL, Veatch OJ. Strategies for pathway analysis from GWAS data. Curr

Protoc Hum Genet. 2011;Chapter 1:Unit 1.20.

236. Green ML, Karp PD. The outcomes of pathway database computations depend on

pathway ontology. Nucleic Acids Res. 2006;34(13):3687-97.

237. Yaspan BL, Bush WS, Torstenson ES, Ma DQ, Pericak-Vance MA, Ritchie MD,

et al. Genetic analysis of biological pathway data through genomic randomization.

Hum Genet. 2011;129(5):563-71.

238. Butkiewicz M, Bailey JNC, Frase A, Dudek S, Yaspan BL, Ritchie MD, et al.

Pathway analysis by randomization incorporating structure-PARIS: an update.

Bioinformatics. 2016;32(15):2361-3.

239. Kanehisa M, Goto S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic

Acids Res. 2000;28(1):27-30.

240. Haw RA, Croft D, Yung CK, Ndegwa N, D'Eustachio P, Hermjakob H, et al. The

Reactome BioMart. Database (Oxford). 2011;2011:bar031.

241. The Gene Ontology Consortium, Ashburner M, Ball CA, Blake JA, Botstein D,

Butler H, et al. Gene ontology: tool for the unification of biology. Nat Genet.

2000;25(1):25-9.

242. Kandasamy K, Mohan SS, Raju R, Keerthikumar S, Kumar GS, Venugopal AK,

et al. NetPath: a public resource of curated signal transduction pathways. Genome

Biol. 2010;11(1):R3.

243. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et

al. STRING v10: protein-protein interaction networks, integrated over the tree of

life. Nucleic Acids Res. 2015;43(Database issue):D447-52.

244. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al.

The UCSC Table Browser data retrieval tool. Nucleic Acids Res.

2004;32(Database issue):D493-6.

245. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, et al. MEME

SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(Web

Server issue):W202-8.

246. Mathelier A, Fornes O, Arenillas DJ, Chen CY, Denay G, Lee J, et al. JASPAR

2016: a major expansion and update of the open-access database of transcription

factor binding profiles. Nucleic Acids Res. 2016;44(D1):D110-D5.

247. Kulakovskiy IV, Vorontsov IE, Yevshin IS, Sharipov RN, Fedorova AD,

Rumynskiy EI, et al. HOCOMOCO: towards a complete collection of

transcription factor binding models for human and mouse via large-scale ChIP-

Seq analysis. Nucleic Acids Res. 2018;46(D1):D252-D9.

248. Rhee SG, Bae YS. Regulation of phosphoinositide-specific phospholipase C

isozymes. J Biol Chem. 1997;272(24):15045-8.

249. Homma Y, Takenawa T, Emori Y, Sorimachi H, Suzuki K. Tissue- and cell type-

specific expression of mRNAs for four types of inositol phospholipid-specific

phospholipase C. Biochem Biophys Res Commun. 1989;164(1):406-12.

250. Inoue O, Suzuki-Inoue K, Dean WL, Frampton J, Watson SP. Integrin

alpha(2)beta(1) mediates outside-in regulation of platelet spreading on collagen

through activation of Src kinases and PLC-gamma 2. J Cell Biol.

2003;160(5):769-80.

251. Marshall AJ, Niiro H, Yun TJ, Clark EA. Regulation of B-cell activation and

differentiation by the phosphatidylinositol 3-kinase and phospholipase Cgamma

pathway. Immunol Rev. 2000;176:30-46.

252. Ohmori T, Yatomi Y, Wu Y, Osada M, Satoh K, Ozaki Y. Wheat germ

agglutinin-induced platelet activation via platelet endothelial cell adhesion

molecule-1: involvement of rapid phospholipase C gamma 2 activation by Src

family kinases. Biochemistry. 2001;40(43):12992-3001.

253. Wonerow P, Pearce AC, Vaux DJ, Watson SP. A critical role for phospholipase

Cgamma2 in alphaIIbbeta3-mediated platelet spreading. J Biol Chem.

2003;278(39):37520-9.

254. Zhang B, Gaiteri C, Bodea LG, Wang Z, McElwee J, Podtelezhnikov AA, et al.

Integrated systems approach identifies genetic nodes and networks in late-onset

Alzheimer's disease. Cell. 2013;153(3):707-20.

255. Han W, Takano T, He J, Ding J, Gao S, Noda C, et al. Role of BLNK in oxidative

stress signaling in B cells. Antioxid Redox Signal. 2001;3(6):1065-73.

256. Uckun F, Ozer Z, Vassilev A. Bruton's tyrosine kinase prevents activation of the

anti-apoptotic transcription factor STAT3 and promotes apoptosis in neoplastic B-

cells and B-cell precursors exposed to oxidative stress. Br J Haematol.

2007;136(4):574-89.

257. Chen XD, Su MY, Chen TT, Hong HY, Han AD, Li WS. Oxidative stress affects

retinal pigment epithelial cell survival through epidermal growth factor

receptor/AKT signaling pathway. Int J Ophthalmol. 2017;10(4):507-14.

258. Defoe DM, Grindstaff RD. Epidermal growth factor stimulation of RPE cell

survival: contribution of phosphatidylinositol 3-kinase and mitogen-activated

protein kinase pathways. Exp Eye Res. 2004;79(1):51-9.

259. Xu KP, Yu FSX. Cross talk between c-Met and epidermal growth factor receptor

during retinal pigment epithelial wound healing. Invest Ophthalmol Vis Sci.

2007;48(5):2242-8.

260. Yan F, Hui YN, Li YJ, Guo CM, Meng H. Epidermal growth factor receptor in

cultured human retinal pigment epithelial cells. Ophthalmologica.

2007;221(4):244-50.

261. Sasore T, Kennedy B. Deciphering combinations of PI3K/AKT/mTOR pathway

drugs augmenting anti-angiogenic efficacy in vivo. PLOS One.

2014;9(8):e105280.

262. Manne BK, Badolia R, Dangelmaier C, Eble JA, Ellmeier W, Kahn M, et al.

Distinct pathways regulate Syk protein activation downstream of immune tyrosine

activation motif (ITAM) and hemITAM receptors in platelets. J Biol Chem.

2015;290(18):11557-68.

263. Abtahian F, Guerriero A, Sebzda E, Lu MM, Zhou R, Mocsai A, et al. Regulation

of blood and lymphatic vascular separation by signaling proteins SLP-76 and Syk.

Science. 2003;299(5604):247-51.

264. Guo DQ, Jia Q, Song HY, Warren RS, Donner DB. Vascular Endothelial-Cell

Growth-Factor Promotes Tyrosine Phosphorylation of Mediators of Signal-

Transduction That Contain SH2 Domains - Association with Endothelial-Cell

Proliferation. J Biol Chem. 1995;270(12):6729-33.

265. Xia P, Aiello LP, Ishii H, Jiang ZY, Park DJ, Robinson GS, et al. Characterization

of vascular endothelial growth factor's effect on the activation of protein kinase C,

its isoforms, and endothelial cell growth. J Clin Invest. 1996;98(9):2018-26.

266. Ombrello MJ, Remmers EF, Sun G, Freeman AF, Datta S, Torabi-Parizi P, et al.

Cold urticaria, immunodeficiency, and autoimmunity related to PLCG2 deletions.

N Engl J Med. 2012;366(4):330-8.

267. Afroz S, Giddaluru J, Vishwakarma S, Naz S, Khan AA, Khan N. A

Comprehensive Gene Expression Meta-analysis Identifies Novel Immune

Signatures in Rheumatoid Arthritis Patients. Front Immunol. 2017;8:74.

268. Grassmann F, Kiel C, Zimmermann ME, Gorski M, Grassmann V, Stark K, et al.

Genetic pleiotropy between age-related macular degeneration and 16 complex

diseases and traits. Genome Med. 2017;9(1):29.

269. Keenan TDL, Goldacre R, Goldacre MJ. Associations between age-related macular

degeneration, osteoarthritis and rheumatoid arthritis: record linkage study. Retina.

2015;35(12):2613-8.

270. Ratnapriya R, Sosina OA, Starostik MR, Kwicklis M, Kapphahn RJ, Fritsche LG,

et al. Retinal transcriptome and eQTL analyses identify genes associated with

age-related macular degeneration. Nat Genet. 2019;51(4):606-10.

271. Rickman CB, Ebright JN, Zavodni Z, Yu L, Wang T, Daiger SP, et al. Defining

the human macula transcriptome and 14 candidate retinal disease genes using

EyeSAGE. Invest Ophthalmol Vis Sci. 2006;47(6):2305-16.

272. Farkas MH, Grant GR, White JA, Sousa ME, Consugar MB, Pierce EA.

Transcriptome analyses of the human retina identify unprecedented transcript

diversity and 3.5 Mb of novel transcribed sequence via significant alternative

splicing and novel genes. BMC Genomics. 2013;14:486.

273. Magno L, Lessard CB, Martins M, Lang V, Cruz P, Asi Y, et al. Alzheimer's

disease phospholipase C-gamma-2 (PLCG2) protective variant is a functional

hypermorph. Alzheimer's Res Ther. 2019;11(1):16.

274. Sims R, van der Lee SJ, Naj AC, Bellenguez C, Badarinarayan N, Jakobsdottir J,

et al. Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-

mediated innate immunity in Alzheimer's disease. Nat Genet. 2017;49(9):1373-

84.

275. Conway OJ, Carrasquillo MM, Wang X, Bredenberg JM, Reddy JS, Strickland

SL, et al. ABI3 and PLCG2 missense variants as risk factors for

neurodegenerative diseases in Caucasians and African Americans. Mol

Neurodegener. 2018;13.

276. Courtenay MD, Cade WH, Schwartz SG, Kovach JL, Agarwal A, Wang GF, et al.

Set-Based Joint Test of Interaction Between SNPs in the VEGF Pathway and

Exogenous Estrogen Finds Association With Age-Related Macular Degeneration.

Invest Ophthalmol Vis Sci. 2014;55(8):4873-9.

277. Rudolph A, Hein R, Lindstrom S, Beckmann L, Behrens S, Liu J, et al. Genetic

modifiers of menopausal hormone replacement therapy and breast cancer risk: a

genome-wide interaction study. Endocr Relat Cancer. 2013;20(6):875-87.

278. Rudolph A, Fasching PA, Behrens S, Eilber U, Bolla MK, Wang Q, et al. A

comprehensive evaluation of interaction between genetic variants and use of

menopausal hormone therapy on mammographic density. Breast Cancer Res.

2015;17:110.

279. Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10

Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum

Genet. 2017;101(1):5-22.

280. White MJ, Yaspan BL, Veatch OJ, Goddard P, Risse-Adams OS, Contreras MG.

Strategies for Pathway Analysis Using GWAS and WGS Data. Curr Protoc Hum

Genet. 2019;100(1):e79.

281. Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-

generation PLINK: rising to the challenge of larger and richer datasets.

Gigascience. 2015;4:7.

282. So HC, Gui AH, Cherny SS, Sham PC. Evaluating the heritability explained by

known susceptibility variants: a survey of ten complex diseases. Genet Epidemiol.

2011;35(5):310-7.

283. DeAngelis MM, Owen LA, Morrison MA, Morgan DJ, Li M, Shakoor A, et al.

Genetics of age-related macular degeneration (AMD). Hum Mol Genet.

2017;26(R1):R45-R50.

284. Dewan A, Liu M, Hartman S, Zhang SS, Liu DT, Zhao C, et al. HTRA1 promoter

polymorphism in wet age-related macular degeneration. Science.

2006;314(5801):989-92.

285. Deangelis MM, Ji F, Adams S, Morrison MA, Harring AJ, Sweeney MO, et al.

Alleles in the HtrA serine peptidase 1 gene alter the risk of neovascular age-

related macular degeneration. Ophthalmology. 2008;115(7):1209-15 e7.

286. Shuler RK, Jr., Hauser MA, Caldwell J, Gallins P, Schmidt S, Scott WK, et al.

Neovascular age-related macular degeneration and its association with

LOC387715 and complement factor H polymorphism. Arch Ophthalmol.

2007;125(1):63-7.

287. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genome-wide

complex trait analysis. Am J Hum Genet. 2011;88(1):76-82.

288. Arnold M, Raffler J, Pfeufer A, Suhre K, Kastenmuller G. SNiPA: an interactive,

genetic variant-centered annotation browser. Bioinformatics. 2015;31(8):1334-6.

289. Ormsby RJ, Ranganathan S, Tong JC, Griggs KM, Dimasi DP, Hewitt AW, et al.

Functional and structural implications of the complement factor H Y402H

polymorphism associated with age-related macular degeneration. Invest

Ophthalmol Vis Sci. 2008;49(5):1763-70.

290. Dunn KC, Aotaki-Keen AE, Putkey FR, Hjelmeland LM. ARPE-19, a human

retinal pigment epithelial cell line with differentiated properties. Exp Eye Res.

1996;62(2):155-69.

291. Brandl C. Generation of Functional Retinal Pigment Epithelium from Human

Induced Pluripotent Stem Cells. Methods Mol Biol. 2019;1834:87-94.

292. Boutin ME, Hampton C, Quinn R, Ferrer M, Song MJ. 3D Engineering of Ocular

Tissues for Disease Modeling and Drug Testing. Adv Exp Med Biol.

2019;1186:171-93.

293. Hou L, Kember RL, Roach JC, O'Connell JR, Craig DW, Bucan M, et al. A

population-specific reference panel empowers genetic studies of Anabaptist

populations. Sci Rep. 2017;7(1):6079.

294. Blanchet Garneau A, Farrar H, Fan H, Kulig J. Applying cultural safety beyond

Indigenous contexts: Insights from health research with Amish and Low German

Mennonites. Nurs Inq. 2018;25(1).
