Brigham Young University BYU ScholarsArchive

Undergraduate Honors Theses

2019-08-05

EPISTASIS IN ALZHEIMER’S DISEASE: INTERACTIONS BETWEEN CLU-MS4A4E AND WWC1-TLN2

Omoremi Hamblin


BYU ScholarsArchive Citation Hamblin, Omoremi, "EPISTASIS IN ALZHEIMER’S DISEASE: INTERACTIONS BETWEEN CLU-MS4A4E AND WWC1-TLN2" (2019). Undergraduate Honors Theses. 88. https://scholarsarchive.byu.edu/studentpub_uht/88

Honors Thesis

EPISTASIS IN ALZHEIMER’S DISEASE: INTERACTIONS BETWEEN CLU-MS4A4E AND WWC1-TLN2

by Omoremi Hamblin

Submitted to Brigham Young University in partial fulfillment of graduation requirements for University Honors

Department of Public Health Brigham Young University August 2019

Advisor: John Kauwe

Honors Coordinator: Len Novilla


ABSTRACT

EPISTASIS IN ALZHEIMER’S DISEASE: INTERACTIONS BETWEEN CLU-MS4A4E AND WWC1-TLN2

Omoremi Hamblin

Department of Public Health

Bachelor of Science

Alzheimer’s disease is the most common form of dementia among people aged 65 and older. Because of the complexity of this disease, many studies have taken an epistatic approach to uncovering its underlying genetic basis. Epistasis refers to gene-gene interaction in which the effects of one gene mask the effects of another gene.

In this study, we attempted to replicate two earlier studies that identified three gene-gene interactions. We used the 2019 Alzheimer’s Disease Genetics Consortium (ADGC) dataset and fit logistic regression models with and without the interaction terms.

The interactions we considered were between CLU rs11136000 and MS4A4E rs670139 (p-value = 0.537, OR = 0.97, 95% CI = 0.88, 1.07), WWC1 rs3733980 and TLN2 rs7175766 in males (p-value = 0.745, OR = 0.97, 95% CI = 0.83, 1.14), and WWC1 rs1477307 and TLN2 rs4077746 in males (p-value = 0.996, OR = 0.999, 95% CI = 0.85, 1.17). Analyses in this dataset failed to replicate any of the interactions, suggesting that previously reported associations may be the result of more complex epistatic interactions, genetic heterogeneity, or false-positive associations due to limited sample sizes.


ACKNOWLEDGEMENTS

I am grateful to Dr. John Kauwe for creating such an amazing mentoring experience for me. His words of encouragement supported me throughout the duration of this project. I would also like to thank Dr. Evan Thacker for his support and guidance during this project. I am grateful for his prompt, clear, and detailed feedback on this thesis. My sincere appreciation goes to Dr. Novilla for all her help throughout my time here at BYU. Her support and faith in me have helped me accomplish more than I ever thought I would be able to, including the completion of this project. I am thankful to Dr. Justin Miller and Liz Vance for their help in making the completion of this thesis a success.

I would like to thank my husband and mom, Glen and Beam, for the support they provided while I worked on this project. The support and contributions of these individuals led to the successful completion of my honors thesis.


TABLE OF CONTENTS

Title
Abstract
Acknowledgements
Table of Contents

I. Introduction

II. Methods
a. Data Description
b. Dataset and Filtering
c. Statistical Analysis

III. Results

Discussion
References

Appendix
a. R code used


Introduction

Alzheimer’s disease (AD) is the most common form of dementia and affects more than 50 million people worldwide (Nature, 2018). It is an irreversible, degenerative brain disorder that slowly destroys memory and thinking skills and eventually hinders the ability to carry out the simplest tasks (NIA, 2017). AD is characterized by the development of beta-amyloid plaques and neurofibrillary tau tangles (Perl, 2010). It leads to the progressive loss of connections between neurons (Niikura, Tajima, & Kita, 2006). Decades have been spent investigating AD, but only a few drugs have been shown to improve symptoms in some patients (NIA, 2018). These drugs are not very effective, however, and do not reverse the underlying pathophysiologic processes.

There are two types of AD: early-onset Alzheimer’s disease (EOAD), in which symptoms manifest between age 30 and the mid-60s, and late-onset Alzheimer’s disease (LOAD), in which a person starts showing symptoms in their mid-60s or later. This paper focuses on LOAD. The causes of LOAD are not yet known, but it is thought to be influenced by a combination of genetic, environmental, and lifestyle factors (NIA, 2015). Apolipoprotein e4 (APOE e4) on chromosome 19 has been found to be a significant risk factor for LOAD (NIH, 2015). APOE e4 is associated with an increase in the level of amyloid deposition in the brain and with an earlier age of onset of AD (DiBattista, Heinsinger, & Rebeck, 2016), though not earlier than 65 years old. APOE e4 can be inherited and passed down through generations. An individual may have 0, 1, or 2 APOE e4 alleles. The more APOE e4 alleles an individual has, the higher their risk of developing AD. Although APOE e4 increases one’s risk of developing the disease, carrying it does not guarantee that a person will have AD later in life.

Epistasis occurs when multiple genes interact to create a single phenotype (Churchill, 2001). In some cases, gene 1 masks the effects of gene 2, making the effects of gene 2 negligible. In other epistatic interactions, the genes combine to amplify an existing trait or form a new trait (Miko, 2008). These gene-gene interactions play a critical role in the etiology of complex diseases. It is therefore important to consider epistasis in AD in order to better understand the complex genetic interactions responsible for the underlying pathophysiological processes (Combarros, Cortina-Borja, Smith, & Lehmann, 2009).

Clusterin (CLU) is a protein encoded by the CLU gene, which is found on human chromosome 8 (Figueroa, Gordon, Yao, & Levine, 2019). Many studies have reported its association with AD in healthy patients (Wu, Yu, Li, & Tan, 2012; Ebbert et al., 2016). CLU is currently the third most strongly associated AD risk gene in humans (Tan et al., 2016). It has been suggested to be involved in neurodegenerative disorders and other biological events (NCBI, 2019). Membrane spanning 4-domains A4E (MS4A4E) is a gene found on human chromosome 11 (alzgene.org). The WW and C2 domain containing 1 (WWC1) gene encodes the WWC1 protein and is found on chromosome 5 in humans (NCBI, 2019). Some WWC1 alleles have been suggested to be associated with enhanced memory in some humans (NCBI, 2019). Talin 2 (TLN2) is a gene found on human chromosome 15. This gene encodes the TLN2 protein, which is similar to the TLN1 protein. It plays a significant role in microfilament formation, which is important for cell shape and structure (NCBI, 2019). This study addresses the interactions between CLU rs11136000 and MS4A4E rs670139, WWC1 rs3733980 and TLN2 rs7175766 in males, and WWC1 rs1477307 and TLN2 rs4077746 in males.

Two previous studies (Ebbert et al., 2017; Gusareva et al., 2018) reported evidence of three gene-gene interactions that increase AD risk. Ebbert et al. found an interaction between rs11136000 and rs670139 (CLU-MS4A4E, p = 0.004). Gusareva et al. reported significant interactions between rs3733980 and rs7175766 (WWC1-TLN2, Bonferroni-corrected p = 0.0165) and between rs1477307 and rs4077746 (WWC1-TLN2, Bonferroni-corrected p = 9.02 × 10⁻³) in males only.

This paper expands on the previous studies and attempts to detect statistical epistasis between CLU rs11136000 and MS4A4E rs670139, WWC1 rs3733980 and TLN2 rs7175766, and WWC1 rs1477307 and TLN2 rs4077746 with respect to AD risk, using data from 25,772 individuals from the Alzheimer’s Disease Genetics Consortium (ADGC).

Methods

Data Description

We used single nucleotide polymorphism (SNP) data from the ADGC dataset, which was compiled from 31 studies. This ADGC dataset was prepared on February 13, 2019 by Kevin L. Boehme, Shubhabrata “Joey” Mukherjee, and John Kauwe. The dataset had a total of 38,876 individuals. We excluded 7,000 genetically related subjects, and our analysis considered only the 31,876 unrelated individuals. Subjects were considered to be related if they were within second cousins of each other. There were 12,501 control subjects, 13,271 cases, and 6,104 subjects with missing case-control status. Cases were those who had been clinically diagnosed with any severity of AD. The total number of subjects in our study was 25,772.

Dataset and Filtering

Our analysis started with genome-wide association study (GWAS) data from all 31,876 individuals in the ADGC dataset as described by Naj et al. (2011). This ADGC dataset is a collection of 31 merged datasets spanning 1984 to 2019 and was established to help identify genetic markers of late-onset AD (Boehme, Mukherjee, Crane, & Kauwe, 2014). ADGC imputed the 31 datasets to the Haplotype Reference Consortium (HRC) reference panel, which includes 64,976 haplotypes and 1,291,057,196 SNPs (Loh et al., 2016; Naj et al., 2017). Genotyped markers with a minor allele frequency below 1% and markers deviating from Hardy-Weinberg equilibrium (HWE) at α < 10⁻⁶ were removed. Principal components were calculated using Eigensoft (Patterson, Price, & Reich, 2006; Price et al., 2006) to account for population-specific variation in allele distribution. All aspects of the study were approved by institutional review boards, and each participant signed a written consent form allowing their genetic data to be used for research purposes.
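To illustrate the two marker filters described above, the following R sketch computes the minor allele frequency and a Hardy-Weinberg test for a single biallelic SNP. It is only an illustration under stated assumptions: the function name hwe_maf and the genotype counts are hypothetical, and the test shown is a simple chi-square goodness-of-fit test, which may differ from the test used in the ADGC quality-control pipeline.

```r
# Sketch of the MAF and HWE marker filters (hypothetical helper, not ADGC code).
# Inputs are genotype counts for one biallelic SNP: AA, Aa, aa.
hwe_maf <- function(n_AA, n_Aa, n_aa, maf_min = 0.01, hwe_alpha = 1e-6) {
  n <- n_AA + n_Aa + n_aa
  p <- (2 * n_AA + n_Aa) / (2 * n)          # frequency of allele A
  maf <- min(p, 1 - p)                      # minor allele frequency
  observed <- c(n_AA, n_Aa, n_aa)
  expected <- n * c(p^2, 2 * p * (1 - p), (1 - p)^2)  # HWE expectations
  chi_sq <- sum((observed - expected)^2 / expected)
  p_hwe <- pchisq(chi_sq, df = 1, lower.tail = FALSE)
  # Keep the marker only if it passes both thresholds from the text
  list(maf = maf, p_hwe = p_hwe, keep = maf >= maf_min && p_hwe >= hwe_alpha)
}

# Made-up counts that sit exactly at Hardy-Weinberg proportions
res <- hwe_maf(n_AA = 360, n_Aa = 480, n_aa = 160)
res$maf
res$keep
```

A marker with made-up counts like these would be retained, because its minor allele frequency is well above 1% and it shows no departure from HWE.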

After filtering, 13,271 cases and 12,501 control subjects contained genotypic information for CLU rs11136000, MS4A4E rs670139, WWC1 rs3733980, WWC1 rs1477307, TLN2 rs4077746 and TLN2 rs7175766.


Statistical Analysis

We imported our data into R statistical software (R) as a .txt file. We then converted all missing data points to NA, which is R’s representation of missing data, so that the missing data points were excluded from our analyses.

We converted AD status from a numeric variable to a categorical variable in preparation for a logistic regression (2 = affected, 1 = unaffected). AD status was the dependent (outcome) variable.

We ran a logistic regression model to determine whether the interactions between the SNPs of interest were associated with AD risk. We conducted a one-sided test because we were only interested in testing whether the gene interactions resulted in an increase in AD risk. Because our goal was to replicate the previous studies, a two-sided test was not necessary for our purposes. Our research question was “Does the interaction between genes result in increased AD risk?” The hypotheses were:

H0: gene-gene interaction does not result in increased AD risk

Ha: gene-gene interaction results in increased AD risk

We set our alpha level at 0.05. If the p-value was less than 0.05, we would reject the null hypothesis and conclude that the gene-gene interaction resulted in increased AD risk. If the p-value was greater than 0.05, we would fail to reject the null hypothesis and conclude that there was no statistically significant evidence that the gene-gene interaction resulted in increased AD risk.
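The relationship between a one-sided and a two-sided test of an interaction term can be sketched in R as follows. This is a minimal illustration on simulated data, not the ADGC analysis: the variable names, sample size, and effect sizes are all hypothetical. Note that the p-values printed by summary() for a glm are two-sided Wald p-values, so a one-sided p-value for Ha (interaction coefficient greater than zero) has to be derived from the z statistic.

```r
# Simulated example (hypothetical data) of a one-sided test for an interaction
set.seed(1)
n <- 2000
snp1 <- rbinom(n, 2, 0.4)   # minor-allele counts 0/1/2, made-up frequency
snp2 <- rbinom(n, 2, 0.3)
status <- rbinom(n, 1, plogis(-0.2 + 0.1 * snp1 + 0.1 * snp2))

fit <- glm(status ~ snp1 * snp2, family = binomial)
coefs <- coef(summary(fit))

z <- coefs["snp1:snp2", "z value"]
p_two <- coefs["snp1:snp2", "Pr(>|z|)"]   # two-sided p-value from summary()
p_one <- pnorm(z, lower.tail = FALSE)     # one-sided p-value for Ha: beta > 0

c(z = z, two_sided = p_two, one_sided = p_one)
```

When the estimated interaction is in the hypothesized direction (z > 0), the one-sided p-value is half the two-sided value; when it is in the opposite direction, the one-sided p-value is one minus half the two-sided value.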

The logistic regression model was

AD status = SNP1 + SNP2 + age + sex + PC1-10 + APOE4dose + (SNP1*SNP2)

where

AD status represented whether the subject had Alzheimer’s disease or not

SNP1 and SNP2 represented the genetic variants we tested in the participants

Age was included as a covariate to control for confounding

Sex was included as a covariate to control for confounding

PC1-10 were the first ten principal components, included to adjust for population structure. This extra measure was taken to control for genetic similarity among study participants whose relatedness goes beyond second cousins.

APOE4dose represented the number of APOE e4 alleles carried by the subject (0 = no alleles, 1 = 1 allele, 2 = 2 alleles)
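Written out on the log-odds scale, the model above takes the following form. The coefficient symbols (β and γ) are our own notation, introduced only for illustration, and the male-only models omit the sex term.

```latex
\[
\operatorname{logit}\bigl(P(\text{AD status} = \text{affected})\bigr)
  = \beta_0
  + \beta_1\,\text{SNP1}
  + \beta_2\,\text{SNP2}
  + \beta_3\,(\text{SNP1} \times \text{SNP2})
  + \beta_4\,\text{age}
  + \beta_5\,\text{sex}
  + \beta_6\,\text{APOE4dose}
  + \sum_{k=1}^{10} \gamma_k\,\text{PC}_k
\]
\[
\text{OR}_{\text{interaction}} = e^{\beta_3}
\]
```

Under this notation, the test of epistasis is a test of whether β3 differs from zero, and the reported interaction odds ratio is the exponentiated interaction coefficient.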

We included all subjects in the dataset for the first interaction analysis, between CLU rs11136000 and MS4A4E rs670139. We then stratified the dataset by sex. Only males were considered for the interactions between WWC1 rs3733980 and TLN2 rs7175766 and between WWC1 rs1477307 and TLN2 rs4077746, because Gusareva et al. (2018) reported these interactions in males only.

Results

A logistic regression analysis was run for the complete dataset. We first tested the main effects of the variants and found that MS4A4E rs670139, WWC1 rs3733980, and TLN2 rs4077746 were each associated with AD risk (MS4A4E rs670139, p-value = 4.17e-05, OR = 1.14, 95% CI = 1.07, 1.22; WWC1 rs3733980, p-value = 0.005, OR = 0.91, 95% CI = 0.85, 0.97; TLN2 rs4077746, p-value = 1.45e-05, OR = 1.16, 95% CI = 1.08, 1.24). WWC1 rs3733980 showed a statistically significant protective association with AD, consistent with the previous reports mentioned earlier (NCBI, 2019). Since we were primarily interested in the epistasis between the variants, we then conducted a logistic regression analysis with each interaction term.

The first interaction, between CLU rs11136000 and MS4A4E rs670139, failed to show a significant association with AD risk (p = 0.54, OR = 0.97, 95% CI = 0.88, 1.07). Since the p-value was greater than 0.05, we failed to reject the null hypothesis and concluded that there was no statistically significant evidence for an interaction between CLU rs11136000 and MS4A4E rs670139. The interaction between WWC1 rs3733980 and TLN2 rs7175766 in males also failed to demonstrate a significant association (p = 0.75, OR = 0.97, 95% CI = 0.83, 1.14), providing no evidence that the interaction between these two variants results in an increased risk for AD in males. The last analysis we conducted was a logistic regression with an interaction between WWC1 rs1477307 and TLN2 rs4077746 in males (p = 0.996, OR = 1.00, 95% CI = 0.85, 1.17). The non-significant p-value likewise provides no evidence of an interaction between these two variants associated with AD risk in males.
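The odds ratios and confidence intervals reported here are exponentiated regression coefficients and interval limits. As a minimal sketch on simulated data (all names and values below are hypothetical), the following R code builds such a table using Wald intervals from confint.default(); the code in the Appendix uses confint(), which for glm objects returns profile-likelihood intervals, so the two can differ slightly.

```r
# Simulated example (hypothetical data): odds ratios with 95% Wald intervals
set.seed(2)
n <- 2000
snp1 <- rbinom(n, 2, 0.4)
snp2 <- rbinom(n, 2, 0.3)
status <- rbinom(n, 1, plogis(-0.2 + 0.1 * snp1 + 0.1 * snp2))

fit <- glm(status ~ snp1 * snp2, family = binomial)

# exp() moves coefficients and interval limits from the log-odds scale
# to the odds-ratio scale
or_table <- exp(cbind(OR = coef(fit), confint.default(fit)))
or_table["snp1:snp2", ]

# The same Wald interval computed by hand for the interaction term
est <- coef(summary(fit))["snp1:snp2", "Estimate"]
se  <- coef(summary(fit))["snp1:snp2", "Std. Error"]
manual_ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)
```

An interval that contains 1 on the odds-ratio scale corresponds to a two-sided p-value above 0.05 for that term.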


Table 1: Results of the logistic regression analyses

Term                     Odds Ratio   Lower 95% CI   Upper 95% CI   P-Value
rs11136000 (CLU)         1.0002       0.9322         1.0729         0.99461
rs670139 (MS4A4E)        1.1411       1.0712         1.2153         4.17e-05 ***
rs3733980 (WWC1)         0.9095       0.8507         0.9721         0.0053 **
rs7175766 (TLN2)         1.0502       0.9808         1.1243         0.1595
rs1477307 (WWC1)         1.0539       0.9839         1.1286         0.1339
rs4077746 (TLN2)         1.1582       1.0838         1.2376         1.45e-05 ***
rs11136000*rs670139      0.9700       0.8807         1.0683         0.5365
rs3733980*rs7175766      0.9745       0.8337         1.1386         0.7451
rs1477307*rs4077746      0.9995       0.8541         1.1695         0.9955

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Discussion

In this study, we failed to detect evidence of epistasis for the three gene-gene interactions (CLU rs11136000 and MS4A4E rs670139, WWC1 rs3733980 and TLN2 rs7175766, and WWC1 rs1477307 and TLN2 rs4077746) as risk factors for AD in the ADGC dataset. These findings do not support the conclusions drawn in the previous reports by Ebbert et al. (2017) and Gusareva et al. (2018). This variability among studies could be a result of genetic heterogeneity, the complex nature of epistasis, or false positives in the previous studies due to limited sample size.

Although recent literature suggests that much of the unidentified genetic makeup of AD is due to epistasis (Mez, 2016; Ebbert et al., 2014; Bullock et al., 2013), the complex nature of gene-gene interactions makes it difficult for epistasis to be accurately measured and defined (Urbanowicz, Kiralis, Fisher, & Moore, 2012; Gilbert-Diamond & Moore, 2011). Models for epistatic interactions are challenging to create because such models require large datasets to analyze combinations of variables simultaneously. When an insufficient number of samples is used, results have poor statistical power, which can lead to frequent false negatives in epistatic studies.

Similarly, the numerous comparisons required to assess epistasis may generate false positive findings (Mackay & Moore, 2014). Inadequate sample size can also lead to false positives and is identified through statistical power analyses (Christley, 2010).

Current research describes a troubling occurrence known as the “winner’s curse,” in which the estimated effect of an association is inflated compared to the true genetic effect and to the effects later measured in follow-up studies (Huang, Ritchie, Brozynska, & Inouye, 2018; Palmer & Pe’er, 2017). The level of power necessary to accurately detect epistasis is currently unknown, and as such, replication studies are a necessary part of validating epistasis. As our results show, studies that appear to have sufficient power should be re-evaluated when larger datasets become available.
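The winner’s curse can be illustrated with a small simulation. The R sketch below is not based on the ADGC data: the true effect size, standard error, and number of simulated studies are hypothetical values chosen to represent an underpowered discovery study.

```r
# Simulated illustration of the winner's curse (all values hypothetical)
set.seed(3)
true_beta <- 0.10   # assumed true log odds ratio
se <- 0.08          # assumed standard error in a small study

# Effect estimates from 10,000 imagined studies of the same true effect
est <- rnorm(10000, mean = true_beta, sd = se)

# A study "wins" if its one-sided Wald test is significant at alpha = 0.025
significant <- est / se > qnorm(0.975)

power    <- mean(significant)       # share of studies reaching significance
mean_all <- mean(est)               # close to the true effect
mean_sig <- mean(est[significant])  # inflated relative to the true effect

c(power = power, mean_all = mean_all, mean_sig = mean_sig)
```

Averaged over all simulated studies, the estimate is close to the true effect, but among the studies that reach significance the average estimate is larger than the true effect, which is why a follow-up study in a larger dataset can fail to reproduce the original effect size.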


Heterogeneity in the genetic causes of AD is certainly present (Mez, 2016) and further erodes power to detect statistical epistasis. Finally, even when statistical evidence for epistasis is detected, it does not necessarily indicate the presence of a physical biological interaction between the implicated genes (Ebbert et al., 2015). Statistical patterns can be a product of a variety of underlying mechanisms. Therefore, the complexity of biological and statistical epistasis could also account for disparities in replication studies. Increasing sample sizes provides better statistical power. Likewise, increasing the amount of multidimensional -omics data will help focus efforts on specific candidate interactions. We anticipate that as more multidimensional -omics data become available, our ability to understand the role of epistasis in AD risk will improve and help in the development of novel approaches to prevent and treat the disease.


References

Alzgene. (2011). MS4A4E. Retrieved from http://www.alzgene.org/geneoverview.asp?geneid=733

Bullock, J.M., Medway, C., Cortina-Borja, M., Turton, J.C., Prince, J.A., Ibrahim-Verbaas, C.A.,...Morgan, K. (2013). Discovery by the Epistasis Project of an epistatic interaction between the GSTM3 gene and the HHEX/IDE/KIF11 locus in the risk of Alzheimer’s disease. Neurobiol Aging. 34(4), 1309.e1301-1307. doi:10.1016/j.neurobiolaging.2012.08.010

Christley, R. (2010). Power and Error: Increased Risk of False Positive Results in Underpowered Studies (Vol. 3).

Churchill, G.A. (2001). Epistasis. Encyclopedia of Genetics. Retrieved from https://www.sciencedirect.com/topics/neuroscience/epistasis

Combarros, O., Cortina-Borja, M., Smith, A.D., Lehmann, D.J. (2009). Epistasis in sporadic Alzheimer’s disease. Neurobiology of Aging, 30(9), 1333-1349. doi: 10.1016/j.neurobiolaging.2007.11.027

Cortina-Borja, M., Smith, A. D., Combarros, O., & Lehmann, D. J. (2009). The synergy factor: a statistic to measure interactions in complex diseases. BMC Res Notes, 2, 105.

DiBattista, A. M., Heinsinger, N. M., & Rebeck, G. W. (2016). Alzheimer’s Disease Genetic Risk Factor APOE-ε4 Also Affects Normal Brain Function. Current Alzheimer’s Research, 13(11), 1200. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/27033053

Ebbert, M. T., Ridge, P. G., Wilson, A. R., Sharp, A. R., Bailey, M., Norton, M. C., . . . Kauwe, J. S. (2014). Population-based analysis of Alzheimer's disease risk alleles implicates genetic

Ebbert, M. T. W., Boehme, K. L., Wadsworth, M. E., Staley, L. A., Mukherjee, S., Crane, P. K., . . . Kauwe, J. S. K. (2015). Interaction between variants in CLU and MS4A4E modulates alzheimer’s disease risk. doi:10.1016/j.jalz.2015.08.163

Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: a flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behav Res Methods, 39(2), 175-191.

Figueroa, D. M., Gordon, E. M., Yao, X., & Levine, S. J. (2019). Apolipoproteins as context-dependent regulators of lung inflammation. In R.A. Johnston & B.T. Suratt (Eds.), Mechanisms and Manifestations of Obesity in Lung Disease (pp. 301-326). Academic Press. doi:10.1016/B978-0-12-813553-2.00013-0


Huang, Q. Q., Ritchie, S. C., Brozynska, M., & Inouye, M. (2018). Power, false discovery rate and Winner's Curse in eQTL studies. Nucleic acids research, 46(22), e133-e133. doi:10.1093/nar/gky780

Gilbert-Diamond, D., & Moore, J. H. (2011). Analysis of Gene-Gene Interactions. Curr Protoc Hum Genet, 0 1, Unit1.14. doi:10.1002/0471142905.hg0114s70

Gusareva, E. S., Twizere, J., Sleegers, K., Dourlen, P., Abisambra, J. F., Meier, S., . . . Van Steen, K. (2018). Male-specific epistasis between WWC1 and TLN2 genes is associated with alzheimer's disease. Neurobiology of Aging, 72, 188.e12. doi:10.1016/j.neurobiolaging.2018.08.001

Loh, P.-R., Danecek, P., Palamara, P. F., Fuchsberger, C., A Reshef, Y., K Finucane, H., . . . L Price, A. (2016). Reference-based phasing using the Haplotype Reference Consortium panel. Nature genetics, 48(11), 1443-1448. doi:10.1038/ng.3679

Mackay, T. F., & Moore, J. H. (2014). Why epistasis is important for tackling complex human disease genetics. Genome Medicine, 6(6), 42. doi:10.1186/gm561

Mez, J. (2016). The executive prominent/memory prominent spectrum in Alzheimer’s disease is highly heritable. 41, 115-121. doi:10.1016/j.neurobiolaging.2016.02.015

Miko, I. (2008). Epistasis: Gene interaction and phenotype effects. Nature Education 1(1):197

Naj, A. C., Below, J., Zhao, Y., Chen, H.-H., van der Lee, S. J., Hamilton-Nelson, K. L., . . . Schellenberg, G. D. (2017). GENOME-WIDE RARE VARIANT IMPUTATION AND TISSUE-SPECIFIC TRANSCRIPTOMIC ANALYSIS IDENTIFY NOVEL RARE VARIANT CANDIDATE LOCI IN LATE-ONSET ALZHEIMER’S DISEASE: THE ALZHEIMER’S DISEASE GENETICS CONSORTIUM. Alzheimer's & Dementia, 13(7, Supplement), P189. doi:https://doi.org/10.1016/j.jalz.2017.07.045

Naj, A. C., Jun, G., Beecham, G. W., Wang, L. S., Vardarajan, B. N., Buros, J., . . . Schellenberg, G. D. (2011). Common variants at MS4A4/MS4A6E, CD2AP, CD33 and EPHA1 are associated with late-onset Alzheimer's disease. Nat Genet, 43(5), 436-441. doi:10.1038/ng.801

National Institute of Aging (NIA). (2015). Alzheimer’s Disease Genetics Fact Sheet. Retrieved from https://www.nia.nih.gov/health/alzheimers-disease-genetics-fact-sheet

National Institute of Aging (NIA). (2017). What is Alzheimer’s Disease. Retrieved from https://www.nia.nih.gov/health/what-alzheimers-disease

National Institute of Aging (NIA). (2018). How is Alzheimer’s Disease Treated? Retrieved from https://www.nia.nih.gov/health/how-alzheimers-disease-treated

Nature Outlook Alzheimer’s Disease (26 July 2018 / Vol 559/ Issue No 7715)


NCBI. (2019). CLU. Retrieved from https://www.ncbi.nlm.nih.gov/gene/1191

NCBI. (2019). MS4A4E. Retrieved from https://www.ncbi.nlm.nih.gov/gene/643680

NCBI. (2019). TLN2. Retrieved from https://www.ncbi.nlm.nih.gov/gene/83660

NCBI. (2019). WWC1. Retrieved from https://www.ncbi.nlm.nih.gov/gene/23286

Niikura, T., Tajima, H., Kita, Y. (2006). Neuronal Cell Death in Alzheimer’s Disease and a Neuroprotective Factor, Humanin. Retrieved from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2430668/

Palmer, C., & Pe'er, I. (2017). Statistical correction of the Winner's Curse explains replication variability in quantitative trait genome-wide association studies. PLoS genetics, 13(7), e1006916-e1006916. doi:10.1371/journal.pgen.1006916

Patterson, N., Price, A. L., & Reich, D. (2006). Population structure and eigenanalysis. PLoS Genet, 2(12), e190. doi:10.1371/journal.pgen.0020190

Perl, D. P. (2010). Neuropathology of Alzheimer’s Disease. doi:10.1002/msj.20157

Urbanowicz, R. J., Kiralis, J., Fisher, J. M., & Moore, J. H. (2012). Predicting the difficulty of pure, strict, epistatic models: metrics for simulated model selection. BioData Mining, 5(1), 15. doi:10.1186/1756-0381-5-15

Wu, Z.C., Yu, J.T., Li, Y., & Tan, L. (2012). Clusterin in Alzheimer’s disease. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/22397031


Appendix

R code used

#Packages used below
library(MASS)       #supplies the profile-likelihood confint() method for glm in older R versions
library(tidyverse)  #used for %>% and filter()

#Read in ADGC data
ADGC <- read.table("final_adgc.txt", header = TRUE)

#Convert the -9s (missing-value code) to NA
ADGC$status[ADGC$status == -9] <- NA
ADGC$apoe4dose[ADGC$apoe4dose == -9] <- NA

#Make the status variable a factor
ADGC$status <- as.factor(ADGC$status)

##Logistic Regression for entire ADGC Dataset--BINOMIAL
ADGC_Results <- glm(status ~ pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10 +
                      apoe4dose + sex + rs670139 + rs1477307 + rs3733980 + rs4077746 +
                      rs7175766 + rs11136000 + age_onset,
                    family = binomial, data = ADGC)
summary(ADGC_Results)

#Odds ratio and confidence intervals for the full logistic regression model
ci_all <- exp(cbind(coef(ADGC_Results), confint(ADGC_Results)))
ci_all  #print the table itself; summary() would only summarise its columns

#AD status = SNP1 + SNP2 + age + sex + PC1-10 + APOE4dose + (SNP1*SNP2)

#First interaction rs11136000, rs670139
int1 <- glm(status ~ pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10 +
              apoe4dose + sex + rs11136000 + rs670139 + age_onset +
              (rs11136000 * rs670139),
            family = binomial, data = ADGC)
summary(int1)

########### First Interaction CI #############
c1 <- exp(cbind(coef(int1), confint(int1)))
c1

## Male only interactions
ADGC_Males <- ADGC %>% filter(sex == 1)

#Second interaction rs3733980, rs7175766 (Males Only)
int2_males <- glm(status ~ pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10 +
                    apoe4dose + rs3733980 + rs7175766 + age_onset +
                    (rs3733980 * rs7175766),
                  family = binomial, data = ADGC_Males)
summary(int2_males)

############ Second Interaction CI ##############
c2 <- exp(cbind(coef(int2_males), confint(int2_males)))
c2

#Third interaction rs1477307, rs4077746 (Males Only)
int3_males <- glm(status ~ pc1 + pc2 + pc3 + pc4 + pc5 + pc6 + pc7 + pc8 + pc9 + pc10 +
                    apoe4dose + rs1477307 + rs4077746 + age_onset +
                    (rs1477307 * rs4077746),
                  family = binomial, data = ADGC_Males)
summary(int3_males)

########### Third Interaction CI ###########
c3 <- exp(cbind(coef(int3_males), confint(int3_males)))
c3
