PLOS ONE

Overrepresentation of Glutamate Signaling in Alzheimer's Disease: Network-Based Pathway Enrichment using Meta-analysis of Genome-Wide Association Studies --Manuscript Draft--

Manuscript Number: PONE-D-13-13386R1 Article Type: Research Article Full Title: Overrepresentation of Glutamate Signaling in Alzheimer's Disease: Network-Based Pathway Enrichment using Meta-analysis of Genome-Wide Association Studies Short Title: Glutamate signaling is overrepresented in AD GWAS Corresponding Author: Giancarlo V. De Ferrari, Ph.D. Universidad Andres Bello Santiago, Santiago CHILE Keywords: Alzheimer's disease; meta-analysis; network-based pathway analysis; glutamate signaling; Apolipoprotein E; genome-wide association studies Abstract: Genome-wide association studies (GWAS) have successfully identified several risk loci for Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (> 248,000 bp) associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that ontological sub-networks involved in glutamate signaling were significantly overrepresented in AD (p < 2.7x10- 11, p < 1.9x10-11; GW and GATES, respectively). Glutamate signaling sub-networks were further replicated in an additional public dataset from the Alzheimer's disease Neuroimaging Initiative (ADNI) consisting in 499 cases and 194 controls (p < 5.1x10- 8). Interestingly, components of the glutamate signaling sub-networks are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder. Order of Authors: Eduardo Pérez-Palma Bernabé I. Bustos Camilo Villamán Marcelo A. Alarcón Miguel E. Avila Giorgia D. Ugarte Ariel E. Reyes Carlos Opazo Giancarlo V. De Ferrari, Ph.D. Suggested Reviewers: Andrew Singleton, PhD Chief Laboratory of Neurogenetics, NIA, NIA-NIH [email protected] Specialist on genome-wide association studies John Hardy, PhD Professor of , University College London

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation [email protected] Specialist on genome-wide association studies Opposed Reviewers: Amanda Myers, PhD Assistant Professor, University of Miami [email protected] Conflict of interest Response to Reviewers: Perez-Palma, Bustos et al. Your Reference: PONE-D-13-13386, Revise Response to Reviewers

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Yes Reviewer #2: Partly

Please explain (optional). Reviewer #2: Pathways analysis is an intriguing approach to complex disorders with the problem of missing heritability, thus this is a worthy project. However, I do have some minor concerns regarding the quality of the data. First, rather than only imputing the TGEN sample (which was genotyped on a different platform), it would be better to impute all three samples. Second, there was no mention of the QC procedures on the Pfizer dataset, and only the MIND threshold was mentioned regarding the TGEN dataset (and this threshold, 0.05, was different from the NIA-LOAD/NCRAD threshold of 0.02. Why were different thresholds used across samples?) Third, the NIA- LOAD/NCRAD contained family data. Instead of correcting for the kinship coefficient, it would be cleaner to simply analyze one individual from each family and exclude the other individuals. Fourth, what was the rational/inclusion criteria for choosing only the 4 samples for the expression analysis?

ANSWERS TO REVIEWER #2 First, rather than only imputing the TGEN sample (which was genotyped on a different platform), it would be better to impute all three samples. R: It was not possible to impute all the samples since the Pfizer dataset comes from the summary statistics results downloaded directly from the supplementary information (provided in the original publication), and thus no genotypes were available for imputation. Nonetheless, we did impute the NCRAD dataset. Briefly, while in the previous submission we applied a “kinship correction” to correct for familial bias (see below), here we removed the familial component by selecting only unrelated individuals, as it has been described in Antunez et al. (2011). Such data was subsequently imputed for missing genotypes (as it was done in the TGEN1 dataset) in order to increase the number of genetic variants shared between datasets.

Second, there was no mention of the QC procedures on the Pfizer dataset, and only the MIND threshold was mentioned regarding the TGEN dataset (and this threshold, 0.05, was different from the NIA-LOAD/NCRAD threshold of 0.02. Why were different thresholds used across samples?) R: We corrected the MIND threshold for the NCRAD dataset (MIND = 0.05) in order to make it comparable to the TGEN dataset. Since we were not able to obtain the genotypic data, we did not modify the QC procedures in the Pfizer dataset, which were described in the original publication by Hu X et al. (2011). We note that the missing call rate per sample is an informative indicator of sample quality and that different studies generally fail samples with missing call rates >5% (see for instance Laurie et al., 2010; Génin et al., 2011; Merjonen et al., 2011). Finally, we did not observe significant changes in the association results when comparing the NCRAD data with different MIND thresholds (0.01 and 0.05).

Third, the NIA-LOAD/NCRAD contained family data. Instead of correcting for the kinship coefficient, it would be cleaner to simply analyze one individual from each family and exclude the other individuals. R: As suggested by the reviewer, and following the recommendation of Antunez et al.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation (2011), using family trees provided with the dbGAP data, we excluded all related controls and kept only 1 case per family in order to clean the data. This procedure also allowed imputing NCRAD data.

Fourth, what was the rational/inclusion criteria for choosing only the 4 brain samples for the expression analysis?" R: At the time of our previous submission, we took the 4 available brain donors with expression data from the Allen Institute for Brain Science. In this revised version, we have updated the expression analysis including data for 2 new individuals, which is the complete series of brain donors.

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No

Please explain (optional). Reviewer #2: The results were declared significant after being compared via t-test to ten permutations. Although computationally intensive, a much larger amount of permutations (i.e. 1000 permutations) would be required to convincingly establish statistical significance. This would represent a more accurate null distribution. R: We note that every time JActive modules is interrogated in order to report a significant sub-network, the program internally creates a null distribution by permuting 10,000 times the p-values over the genes introduced in a Monte Carlo procedure. Therefore, each set of p-values interrogated is indeed permuted 100,000 times. We included a detailed explanation of this procedure in the Methods and Results sections of the manuscript.

4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors below. Reviewer #1: Yes Reviewer #2: No

Please explain (optional). Reviewer #2: Some of the main analytical concepts presented in the paper are difficult to follow. For example, a more explicit description of exactly what the authors did for the FPAN analysis would be helpful. (Did they input all of the top genes from the association analysis into the "multiple names" tab in STRING?) An explanation of the Simes (GATES) procedure, and how it differs from the genewise min-p is needed. Additionally, an introduction to the cytoscape package and specifically what this does is necessary. Finally, although overall the English is quite good, there are a few typos (i.e. major instead of major), as well as a few awkwardly worded sentences. R: First, the final input for each analysis (i.e. gene-wise and GATES) was the intersection of genes annotated between top meta-analysis associations (results with a cutoff p-value < 0.05) and genes present in the FPAN (cut-off interaction score 0.7, high confidence). In this regard, not all genes associated with AD had high confidence functional interactions on STRING. This issue was further explained in the newer version of the manuscript. Second, the main difference between the GW and GATES procedure is that while the former uses the single strongest p-value present inside a gene without any correction, the latter corrects the p-values by gene-size and LD structure. The procedure has also been explained in detail in the text. Third, we expanded the Method section to include a general description of Cytoscape and its use in network interaction studies. Finally, a native speaking colleague has revised the text to improve the message of our manuscript.

5. Additional Comments to the Author (optional) Please offer any additional comments here, including concerns about dual publication or research or publication ethics. Reviewer #1:

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation 1) In methods association analysis section, it is not clear how TGEN1 data was cleaned. Is it comparable to NIA-LOAD/NCRAD dataset? R: In this newer version of the manuscript, the QC procedures to clean both TGEN1 and NCRAD have been standardized in order to make the studies comparable (see also above). QC parameters in both studies are now: MAF 0.05; GENO 0.02; MIND 0.05; and HWE 1x10E-6. This information was included in the Method section.

2) Could the author explain why they used different interaction score cutoff (0.7 for Meta-AD, but 0.9 for Meta-AD-GATES) for the prior network? R: The initial idea was to compare results with different levels of confidence (interaction), but as corrections were made in sample size (to eliminate the familial contribution) the use of the STRING cutoff = 0.9 became less informative and too stringent and therefore it was eliminated. In the revised manuscript all module searches (i.e. gene-wise, GATES, permuted and without genome-wide) have been performed using the same FPAN with an interaction score cutoff of 0.7, which reflects High Confidence Interactions, as defined by the STRING database.

3) In the module search section, it is not clear how the significant sub-network was defined and identified. Please give more details about how the Monte Carlo procedure was performed for the standardization of the score. Is the permutation for permuting phenotype, or just permuting the p values across genes? R: First, a significant sub-network was defined as a connected group of genes enriched with minor associations in AD disease with a degree that could not be reached by chance. These sub-networks have been identified using JActive modules, which is a procedure that internally compares the sub-network structure with 10,000 randomly generated sub-networks (Monte Carlo). A detailed explanation in relation to this subject was added to the manuscript. Finally, the permutation refers to just permuting p-values across the genes, since other controls related to bias in gene-size and LD were also included (GATES procedures).

4) Figure 4A), how to get the average number of significant modules for Meta-AD or Meta-AD-GATES? In addition, there is no description for C and D panels. R: In this version of the manuscript, Figure 4 corresponds to Figure 3, and it has been edited to accommodate the ADNI replication dataset. The average number of significant modules is obtained after 10 JActive iteration searches. It is shown in the corresponding figure and provided also in the text (Pathway Analysis section of Results). Note that Meta-AD-GW and Meta-AD-GATES have changed for clarity to Meta-GW and Meta-GATES, respectively. We finally acknowledge that in the previous version there was no description of panels C and D in Figure 4, which was a mistake that has been corrected.

References: Antunez C, Boada M, Gonzalez-Perez A, Gayan J, Ramirez-Lorca R, et al. (2011) The membrane-spanning 4-domains, subfamily A (MS4A) gene cluster contains a common variant associated with Alzheimer's disease. Genome Med 3: 33.

Génin E, Schumacher M, Roujeau JC, Naldi L, Liss Y, Kazma R, Sekula P, Hovnanian A, Mockenhaupt M. Genome-wide association study of Stevens-Johnson Syndrome and Toxic Epidermal Necrolysis in Europe. Orphanet J Rare Dis. 2011 Jul 29;6:52. Hu X, Pickering E, Liu YC, Hall S, Fournier H, et al. (2011) Meta-analysis for genome- wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS One 6: e16616.

Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, Boehm F, Caporaso NE, Cornelis MC, Edenberg HJ, Gabriel SB, Harris EL, Hu FB, Jacobs KB, Kraft P, Landi MT, Lumley T, Manolio TA, McHugh C, Painter I, Paschall J, Rice JP, Rice KM, Zheng X, Weir BS; GENEVA Investigators. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet Epidemiol. 2010 Sep;34(6):591-602.

Merjonen P, Keltikangas-Järvinen L, Jokela M, Seppälä I, Lyytikäinen LP, Pulkki- Råback L, Kivimäki M, Elovainio M, Kettunen J, Ripatti S, Kähönen M, Viikari J, Palotie A, Peltonen L, Raitakari OT, Lehtimäki T. Hostility in adolescents and adults: a genome-wide association study of the Young Finns. Transl Psychiatry. 2011 Jun

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation 21;1:e11.

Additional Information: Question Response Competing Interest The authors have declared that no competing interests exist.

For yourself and on behalf of all the authors of this manuscript, please declare below any competing interests as described in the "PLoS Policy on Declaration and Evaluation of Competing Interests."

You are responsible for recognizing and disclosing on behalf of all authors any competing interest that could be perceived to bias their work, acknowledging all financial support and any other relevant financial or competing interests.

If no competing interests exist, enter: "The authors have declared that no competing interests exist."

If you have competing interests to declare, please fill out the text box completing the following statement: "I have read the journal's policy and have the following conflicts"

* typeset Financial Disclosure This work was supported by Grants from the Chilean Government FONDECYT 1100942 and FONDAP 15090007 to G.V.D. E.P.-P. and M.E.A. are supported by Describe the sources of funding that have doctoral fellowships from CONICYT. B.I.B. is supported by a doctoral fellowship by supported the work. Please include MECESUP UAB0802. relevant grant numbers and the URL of any funder's website. Please also include The funders had no role in study design, data collection and analysis, decision to this sentence: "The funders had no role in publish, or preparation of the manuscript. study design, data collection and analysis, decision to publish, or preparation of the manuscript." If this statement is not correct, you must describe the role of any sponsors or funders and amend the aforementioned sentence as needed.

* typeset Ethics Statement Written informed consent was obtained for all participants and prior Institutional Review Board approval was obtained at each participating institution. All research involving human participants must have been approved by the authors'

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation institutional review board or equivalent committee(s) and that board must be named by the authors in the manuscript. For research involving human participants, informed consent must have been obtained (or the reason for lack of consent explained, e.g. the data were analyzed anonymously) and all clinical investigation must have been conducted according to the principles expressed in the Declaration of Helsinki. Authors should submit a statement from their ethics committee or institutional review board indicating the approval of the research. We also encourage authors to submit a sample of a patient consent form and may require submission of completed forms on particular occasions.

All animal work must have been conducted according to relevant national and international guidelines. In accordance with the recommendations of the Weatherall report, "The use of non- human primates in research" we specifically require authors to include details of animal welfare and steps taken to ameliorate suffering in all work involving non-human primates. The relevant guidelines followed and the committee that approved the study should be identified in the ethics statement.

Please enter your ethics statement below and place the same text at the beginning of the Methods section of your manuscript (with the subheading Ethics Statement). Enter "N/A" if you do not require an ethics statement.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Cover Letter

Centro de Investigaciones Biomédicas Facultad Ciencias Biológicas Facultad de Medicina República Nº 239, piso 2 Interior

January 10, 2013

James Bennett Potash, M.D., M.P.H. Academic Editor PLOS ONE

Your Reference: PONE-D-13-13386, Revise

Dear Dr. Bennett, Please, find attached the revised version of the manuscript by Eduardo Pérez-Palma, Bernabé Bustos, and collaborators. While answering the queries raised by the reviewers, we were able to replicate our initial findings in a new dataset and thus we believe that our findings are much stronger and well supported. We have changed the title to accommodate the new data: Overrepresentation of Glutamate Signaling in Alzheimer's Disease: Network-Based Pathway Enrichment using Meta-analysis of Genome-Wide Association Studies. We followed all the queries of the reviewers and answered to all of them (see below). We hope you will find this revised version amenable for acceptance in your prestigious journal. Sincerely,

Giancarlo De Ferrari Valentini, PhD Associate Professor Center for Biomedical Research FONDAP Center for Cell Regulation Universidad Andres Bello Santiago, CHILE [email protected] [email protected]

CAMPUS CASONA LAS CONDES CAMPUS REPUBLICA CAMPUS VIÑA DEL MAR FERNANDEZ CONCHA 700, LAS CONDES AV. REPUBLICA 252, SANTIAGO * LOS FRESNOS 91, MIRAFLORES TELEFONO: (2) 6618500 FAX: (2) 2153767 TELEFONO: (2) 6618000 FAX: (2) 6711936 TELEFONO: (32) 328800 FAX: (32) 673082 e-mail: [email protected] e-mail: [email protected] * ALVARES 2138, CHORRILLOS www.unab.cl TELEFONO: (32) 673204 FAX: (32) 630427 e-mail: [email protected]

Manuscript Click here to download Manuscript: gdeferrari_glutamate signaling in AD.pdf

Overrepresentation of Glutamate Signaling in Alzheimer's Disease: Network-Based Pathway Enrichment using Meta-analysis of Genome- Wide Association Studies

Short title: Glutamate signaling is overrepresented in AD GWAS

Eduardo Pérez-Palma,*1 Bernabé I. Bustos,*1 Camilo F. Villamán,1 Marcelo A. Alarcón,1,2 Miguel E. Avila,1,2 Giorgia D. Ugarte,1 Ariel E. Reyes,1 Carlos Opazo3 and Giancarlo V. De Ferrari1,†, for the Alzheimer’s Disease Neuroimaging Initiative‡ and the NIA- LOAD/NCRAD Family Study Group

*These authors contributed equally to this work

1Center for Biomedical Research and FONDAP Center for Genome Regulation, Faculty of Biological Sciences and Faculty of Medicine, Universidad Andres Bello, Santiago, Chile. 2Faculty of Biological Sciences, Universidad de Concepción, Concepción, Chile. 3Oxidation Biology Laboratory, The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Melbourne, Australia.

†Correspondence: Giancarlo V. De Ferrari, PhD Center for Biomedical Research and FONDAP Center for Genome Regulation Universidad Andres Bello, Republica #239, PO Box 8370134, Santiago, Chile. E-mail: [email protected] Phone number: (56-2) 7703249; FAX: (56-2) 6980414

‡Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided

1 data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply /ADNI_Acknowledgement_List.pdf

2

Abstract

Genome-wide association studies (GWAS) have successfully identified several risk loci for

Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 4,569 individuals (2,540 cases and 2,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (> 248,000 bp) associated with the disease near the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene-based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that ontological sub-networks involved in glutamate signaling were significantly overrepresented in AD (p < 2.7x10-11, p < 1.9x10-11; GW and GATES, respectively).

Glutamate signaling sub-networks were further replicated in an additional public dataset from the Alzheimer’s disease Neuroimaging Initiative (ADNI) consisting in 499 cases and

194 controls (p < 5.1x10-8). Interestingly, components of the glutamate signaling sub- networks are coordinately expressed in disease-related tissues, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional biological networks should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder.

3

Introduction

Alzheimer's disease (AD [MIM 104300]) is the most common neurodegenerative disorder in the human population [1]. Clinically, AD is characterized by a progressive loss of cognitive abilities and memory impairment. At a biological level, it is thought that the presence of extracellular deposits of the β-amyloid peptide (Aβ) and intracellular neurofibrillary tangles composed of hyperphosphorylated Tau protein leads to synaptic loss and neuronal death [1,2]. Genetically, AD is complex and heterogeneous.[3,4] A small percentage of AD cases (1–2% of all cases) have an early-onset familial form of presentation, with symptoms appearing before 65 years of age, and most cases are late- onset or “sporadic” with no apparent familial recurrence of the disease [4]. While familial-

AD has been associated with mutations in the genes coding for the amyloid precursor protein (APP) and the presenilins (PSEN1 and PSEN2) proteins, the only genetic factor extensively replicated for sporadic AD is the apolipoprotein E-ε4 (APOE-ε4) allele [4-6], which is present in ca. 60% of the cases [1,7-9]. However, the APOE-ε4 allele is not causative, since it has been found in individuals that would not develop the disease, suggesting that other genetic contributions remain to be identified.

During the past decade, the scientific efforts focused in identifying these genetic hallmarks reported more than 2,900 Single Nucleotide Polymorphisms (SNP) within ~

4,700 genes associated with AD[10] (see also AlzGene.org). More recently, the use of high density DNA genotyping microarrays in genome-wide associations studies (GWAS), combined with powerful statistical procedures, have expanded the search for novel susceptibility loci for the disease [11]. Nevertheless, these genetic approaches currently exhibit some limitations. First, they present a high rate of false positives and require major

4 sample sizes in order to be replicated. In fact, different simulations have shown that authentic associations have only a 26% chance of falling into the top 1,000 p-values in a

GWAS [12]. Second, they only examine the association of a single genetic variant at a time, therefore failing in the detection of minor associations that can still be present and confer risk in a cumulative way. Finally, current top hits associated with AD, including the bridging integrator 1 (BIN1) [13], clusterin (CLU) [14,15], the ATP-binding cassette sub- family A member 7 (ABCA7) [16], the complement component (3b/4b) 1 (CR1)

[15], and the phosphatidylinositol binding clathrin assembly protein (PICALM) [14], do not account for the entire genetic contribution of the disease or surpass the risk conferred solely by the APOE-ε4 allele.

Therefore, considering that the etiology of complex diseases might depend on functional protein-protein interaction networks [17,18], here we performed a meta-analysis followed by network-based pathway analyses on publicly available GWAS in AD and used significant genetic information to identify glutamate signaling as a key ontological pathway of the disease.

5

Subjects and Methods

GWAS datasets included in the analysis

We selected three publicly available GWAS in AD performed on unrelated case-control and familial samples from European-descent populations (Table 1). The GWAS datasets are: i) the Translational Genomics Research Institute (TGen1) study on AD [19], including

829 AD cases and 535 control individuals; ii) the National Institute on Aging - Late Onset

Alzheimer's Disease and the National Cell Repository for Alzheimer's Disease (NIA-

LOAD/NCRAD) [20,21], including 5,220 subjects from which we only considered for analysis a subset of 3,689 individuals (1,837 cases and 1,852 controls) that were self- declared non-Hispanic European Americans, passed principal components analyses and had non-missing phenotypes. Given that this subset is composed by familial data, using the provided family trees, we excluded all related controls and kept only one case per family giving a final number of 978 AD cases and 702 controls individuals that were eligible for our study; and iii) the Pfizer Pharmaceutical Company (Pfizer) study on AD [13], including

733 AD cases and 792 control individuals. The TGen1 data was downloaded from the

TGen website (https://www.tgen.org/research/research-divisions/neurogenomics.aspx) and the NIA-LOAD/NCRAD data was retrieved through the database of Genotypes and

Phenotypes (dbGaP; http://www.ncbi.nlm.nih.gov/gap) [22] of the National Institute of

Health (NIH), under the accession number phs000168.v1.p1. The Pfizer data was gathered from the supplementary information accompanying the original publication [13]. Detailed information regarding recruitment, diagnosis, affection status and age at the time of enrollment can be found in the original studies [13,19,21]. Written informed consent was obtained for all participants and prior Institutional Review Board approval was obtained at each participating institution. Additionally, data used in the preparation of this article were

6 obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database

(http://adni.loni.usc.edu). The ADNI was launched in 2003 by the National Institute on

Aging (NIA), the National Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug Administration (FDA), private pharmaceutical companies and non- profit organizations, as a $60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment

(MCI) and early Alzheimer’s disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical

Center and University of California – San Francisco. ADNI is the result of efforts of many co-investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects but ADNI has been followed by ADNI-GO and

ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with early or late MCI, and people with early AD. The follow up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2.

7

Imputation

In order to maximize information on linkage disequilibrium (LD) structure between the studies, the TGen1 and the NIA-LOAD/NCRAD datasets were imputed by comparison with the CEU reference panel (unrelated individuals) from the HapMap III phased data

(release 2) [23]. Imputation was carried out using the Markov Chain Haplotyping method implemented in MACH 1.0 following author recommendations [24].

Association analysis

Quality control (QC) procedures such as minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE), missing rate per individuals (MIND) and per SNPs (GENO) were performed on the TGen1 and the NIA-LOAD/NCRAD dataset using PLINK v.1.07

(http://pngu.mgh.harvard.edu/purcell/plink/) [25] with threshold values of 0.05, 1x10-6,

0.05 and 0.02, respectively. We applied a logistic regression analysis, using an additive model on the imputed datasets data with MACH2DAT [24]. SNPs with r2 values less than

0.29 were removed from further analysis. Similarly, in the Pfizer dataset, the standard error

(SE) per SNP was estimated from the p-values reported in the study [13]. Briefly, p-values were transformed into the corresponding Z score with the INVNORMAL function implemented in STATA v.10 (StataCorp, College Station, TX), and then the SE was calculated taking the log of the odds ratios (OR) divided by the corresponding Z score. In the ADNI replication dataset, we performed QC and association analysis based on a quantitative trait locus (QTL) method as described previously [26]. MAF, HWE, MIND and GENO QC values of 0.05, 1x10-6, 0.1 and 0.1 were applied, respectively. In order to control for population stratification we conducted Principal Component Analysis (PCA) with EIGENSTRAT [27]. After this step, 63 individuals were excluded from further

8 analysis, leaving 693 individuals from the initial 756 subjects. The QTL association analysis was carried out using an additive genetic linear regression model with PLINK using different co-variables including age at baseline visit, education, gender and APOE status (ε4 allele present or not). Finally, the results for each dataset was assessed for genomic inflation and visualized in Quantile-Quantile (Q-Q) plots using the statistical R

(www.r-project.org) [28].

Meta-analysis

Meta-analysis was performed using the inverse variance method implemented in PLINK v.1.07 [25]. We checked that all statistics values (p-values, OR and SE) for each dataset prior the meta-analysis were computed for the same allele. Annotation of the results was done with the RefSeq Genes for the human genome assembly Build 36.3/Hg18 available at the UCSC Table Browser [29] using own Perl scripts (available upon request). In order to consider a SNP inside a gene we defined a threshold of +/- 5,000 bp relative to the transcription start and end sites. The annotated output of the meta-analysis used for the pathway approach is available in Table S1. PRISMA guidelines were followed (showed in

Checklist S1) [30].

Single-gene p-value generation

Genetic association values with a cut-off threshold of p < 0.05, from either the Meta- analysis or the ADNI replication dataset, were transformed into single-gene associations using two independent approaches: i) Selection of a gene-wise p-value (GW) [17], defined as the single strongest association inside a gene; and ii) Calculation of a gene-based association p-value using the extended Simes procedure (GATES), which extract a gene-

9 level association from the combination of the SNPs p-values within a gene. This approach does not relies on genotype or phenotypic data and has been shown to correct type 1 error rates in both simulated and permuted datasets, regardless of the gene size or LD structure

[31]. GATES is an open-source tool named Knowledge-Based Mining System for Genome- wide Genetic Studies (KGG; http:// http://bioinfo1.hku.hk:13080/kggweb/).

Functional Protein Association Network (FPAN)

To evaluate single-gene associations in a network context the complete human functional protein association network (FPAN) was retrieved from the STRING 9.0 database (Search

Tool for the Retrieval of Interacting Genes/Proteins; http://string-db.org/) [32]. FPAN contains highly curated known and predicted interactions emerging from different evidence channels such as: genomic context, co-expression and curated literature. Raw text- formatted protein-protein functional interactions were downloaded from STRING. To avoid redundancy and false positives, alternative proteins and their interactions were consolidated into one gene using own Perl scripts (available upon request). We kept only the interactions with a combined score > 0.7 (provided by STRING), which stand for high confidence interactions. The final FPAN generated was composed by 14,793 nodes (genes) and

229,357 edges (interactions) and was introduced as an input to Cytoscape [33], which is an open source platform for visualizing complex networks that not only allows the integration of additional attribute data (i.e. gene annotations, expression profiles and interactions source and confidence), but also provides a comprehensive set of tools to perform integrated pathways analysis. Thus, the p-values of GW and GATES procedures were introduced as a floating-point attribute into the FPAN (Table S2).

10

Module search (Sub-networks)

The module search was carried out with the Cytoscape JActive Modules Plug-in [34] with a gene overlap threshold of 50 %. JActive modules is designed to detect if a certain group of connected nodes are significantly enriched with a statistical parameter such as the single gene p-value, which in our study comes from either the meta-analysis or the ADNI replication dataset. Briefly, starting from one node a sub-network grows to its connected genes by computing an aggregated score (S) derived from the conversion of the single-gene p-value (if present) into their corresponding z-score (with the inverse normal cumulative distribution function). This score is compared internally with a background distribution created from the scores of 10,000 random modules of the same size in a Monte Carlo procedure. If the aggregated score cease growing above the expected by chance, the algorithm stops and the growing sub-network is reported as a result. As in the original publication, modules with S > 3 (3 standard deviation above the mean of randomized scores) and with a size below 50 were considered significant [17]. To standardize the aggregated S score obtained in each sub-network the search was performed 10 times for each analysis (Meta-GW, Meta-GATES, ADNI-GW and ADNI-GATES). Finally, the same procedure was conducted with their corresponding permuted p-values over the entire genes present in the FPAN (Permuted analysis) and without genome wide significant results (p- values <10-8), in this case with real and permuted data, respectively (WGW analysis).

Statistical differences between permuted and non-permuted analyses were assessed through two-sided t-test.

11

Gene Ontology (GO) and KEGG pathway enrichment Analysis

To examine if the structure of significant sub-networks obtained in the Meta or the ADNI replication dataset were biologically meaningful, gene lists of the first 10 significant modules were tested for pathway enrichment using information from Gene Ontology (GO; http://geneontology.org/) [35] and the Kyoto Encyclopedia of Genes and Genomes database

(KEGG; http://www.genome.jp/kegg/ ) [36]. We initially used the ontology structure and annotations using the package Ontologizer [37], only considering categories with less than

500 members to avoid associations to major categories that are less informative (i.e. signaling) and excluding the ones "Inferred by Electronic Annotation" (IEA), from

"Reviewed Computational Analysis" (RCA) and with "No biological Data available" (ND), which are characterized by a high rate of false positives [38]. In this case, we used the parent-child-union algorithm to call for overrepresentation adjusting the p-values with a

Westfall-Young Single-Step multiple test correction, to avoid additional false positives

[39], and considering a GO term significantly over-represented when the adjusted p-value was below 0.01. Similarly, to determine overrepresentation, KEGG pathway enrichment was assessed in the complete set of pathways and components [40], using an hypergeometric test with the phyper function contained in the R statistical package [41].

Gene expression heatmaps and cluster analysis

To evaluate the expression pattern of genes of interest from the network-based analysis, human gene expression profiles were downloaded from the Allen Brain Atlas (ABA) website (http://www.brain-map.org) [42]. We used the Gene Search web-tool to enter a list of genes arising from the intersection of sub-networks and analyzed their expression profiles through 27 brain regions. We averaged the expression levels from 6 brain donor

12 individuals (ids. H0351.2001, H0351.2002, H0351.1009, H0351.1012, H0351.1015 and

H0351.1016) and used the collapseRows R script [43] to generate a gene-wise expression dataset. Expression heatmaps and hierarchical clusters were analyzed using Cluster v.3.0

(http://www.geo.vu.nl/~huik/cluster.htm) and visualized with the aid of JavaTreeView v.1.1.6r2 (http://jtreeview.sourceforge.net) [44]. Identification of genes co-expressed and correlation analyses were performed with Cluster v.3.0, using Euclidean distance in conjunction with centroid linkage algorithms, and a correlation coefficient cutoff of r > 0.7 to denote highly correlated gene clusters.

13

Results

Meta-Analysis

The complete strategy implemented in the present study is shown in Figure 1.

General features of the datasets used for the meta-analysis (TGen1, NIA-LOAD/NCRAD and Pfizer) and the replication step (ADNI) are described in Table 1. The genetic information of 4,569 individuals (2,540 AD cases and 2,029 controls) was considered after passing QC thresholds based on MAF, HWE, MIND and GENO parameters calculated in

PLINK. Additionally, we imputed a total of 1,231,704 and 1,245,964 QC-passing SNPs for the TGen1 and NIA-LOAD/NCRAD datasets, respectively. To account for bias still present after QC procedures, SNP association p-values were further assessed for genomic inflation, which is represented in Q-Q plots (Figure S1). All the datasets yielded an inflation factor

(λ) between the acknowledged margins of 0.9 to 1.1, where the contribution of population structure to the genome-wide association is negligible [45]. Taking into account these considerations, we performed a meta-analysis using the inverse variance method, selecting p-values and ORs from the fixed effects model, assuming that these studies have been conducted under similar conditions and subjects [46-48]. The combined analysis showed a normal distribution of the p-values with an excess of significant signals seen only at the end of the curve, indicating likely true association events (λ = 1.05, Figure S1).

Whole-genome meta-analysis results are depicted as a Manhattan plot (Figure 2),

-8 with a significance threshold defined above –log10(5x10 ), which marks the beginning of genome-wide significant values [49]. In agreement with current reports, the strongest associations were located in a broad genomic region (> 250,000 bp) in the vicinity of the

APOE locus in chromosome 19 (Table 2). In particular, highly significant genome-wide associations signals were observed in the coding region of the translocase of the

14 mitochondrial outer membrane gene (TOMM40: rs2075650, p = 8.54x10-116, OR = 4.48; rs157580, p = 9.6x10-35, OR = 0.51 and rs8106922, p = 1.17x10-25, OR = 0.57), upstream of the apolipoprotein C-I gene (APOC1: rs439401, p = 8.82x10-29, OR = 0.54), inside the poliovirus receptor related 2 isoform delta gene (PVRL2: rs6859, p = 7.87x10-28, OR = 1.7 and rs3852861 p = 5.32x10-11, OR = 0.64) and between TOMM40 and the APOE gene

(rs405509, p = 2.29x10-27, OR = 0.57). In addition, in the same chromosomal region we observed genome-wide association, downstream of the basal cell adhesion molecule isoform 1 gene (BCAM: rs10402271, p = 1.98x10-17, OR = 1.46 and rs10405693, p =

2.83x10-12, OR = 1.49) and in the region of the B-cell CLL/lymphoma 3 gene (BCL3: rs8103315, p = 1.87x10-8, OR = 1.5). We note that the strength of the association signal for some SNPs is derived from the Pfizer and NIA-LOAD/NCRAD datasets only, since these were either not genotyped in the TGen1 sample or poorly imputed due to the intrinsic array density in that particular region of chromosome 19 in the Affimetrix GeneChip 500K (See

"N" column, Table 2). Interestingly, among the top 25 associations detected, novel genome- wide marginally significant signals outside chromosome 19 were observed: in chromosome

12 (intergenic: rs249153, p = 4.38 x10-07, OR = 1.41; intergenic: rs249166 and rs249167, both with p = 6.91 x10-07 and OR = 1.40) and in chromosome 5 (intergenic: rs13178362, p

= 6.60x10-07, OR = 0.75), followed by trends inside the membrane-spanning 4-domains, subfamily A, member 3 gene in chromosome 11 (MS4A3: rs474951, p = 1.25x10-6, OR =

0.79 and rs528823, p = 1.55x10-6, OR = 0.79) and also within the Fanconi anemia group D2 gene in chromosome 3 (FANCD2: rs1552244, p = 1.63x10-06, OR = 0.76; rs9849434, p =

1.88x10-06, OR = 0.71).

15

Pathway Analysis

To test the hypothesis that highly-connected sub-networks enriched with minor associations might be significantly overrepresented in AD, we performed a gene-oriented pathway analysis by loading meta-analysis results into a high confidence functional protein association network (FPAN), gathered from the STRING database [32]. First, to avoid noise, the analysis was restricted to SNPs with p-value < 0.05 (Table S1). Second, we calculated whole-gene association values using two alternative approaches: (i) the extraction of a gene-wise p-value, corresponding to the strongest association signal within a gene (Meta-GW) [17]; and (ii) the derivation of a more stringent gene-based p-value using the extended Simes test (Meta-GATES), which combines all association signals within a gene and controls for the bias that could be generated by gene-size or LD structure among markers [31]. Thus, we observed 66,204 SNP association p-values < 0.05 tagging 7,527 genes. Of these, we loaded only 4,891 and 4,647 p-values (from GW and GATES procedures, respectively) into FPAN (Table S2), which was composed by 14,793 genes having 229,357 non-redundant high-confidence interactions (note that not all genes with minor association values are informative in the FPAN). Third, we conducted a sub-network search using the information from GW and GATES procedures permuting the p-values

10,000 times to create an expected background distribution. Fourth, we obtained the mean score for GW and GATES by repeating the search 10 times using a Monte Carlo procedure.

Fifth, the above results were controlled by 2 further module searches (see Figure 1), this time including permuted data or results without genome wide (WGW) significance. While in the former search, the p-values were permuted 10 times over the entire FPAN to determine if the sub-network structure and its score could be obtained by chance (Meta-

GW-Permuted, Meta-GATES-Permuted); in the latter, we discarded the possibility that the

16 sub-networks could be the result of bias due to the strong genome-wide significant p-values within the APOE locus.

The result of the global search for overrepresented modules in AD is presented in

Figure 3. First, the total number of significant sub-networks obtained in the Meta-GW analysis (average number = 32.3, SD = 6.14) was significantly higher (p = 2.51x10-7), than those arising from chance (Figure 3A; Meta-GW Permuted; average = 13.7, SD = 4.05).

Likewise, the total number of significant sub-networks in Meta-GATES was similar to

Meta-GW (average number = 34.9, SD = 0.99) and significantly higher (p = 1.35x10-9) than those obtained by chance (Figure 3A; Meta-GATES Permuted; average = 6.1, SD = 7.99).

Second, the sub-network scores in each procedure were always significantly higher among real vs. permuted data (Figure 3B). Third, the number and scores of the modules obtained in the WGW control was similar to the ones obtained with the whole set of associations, indicating that the strongest associations of the meta-analysis did not influence the present observations (Figure 3A and B). Fourth, the modules obtained in the Meta-GW and Meta-

GATES searches, remained consistent in significance and structure across iterations, changing only in their respective rank/score. Remarkably, the most significant sub-network detected in each approach (Meta-GW SN1: S = 6.14, p-value = 8.16x10-8; and Meta-

GATES SN1: S =5.62, p-value = 1.77x10-5; Figure 3B and Table 3) was identical in structure differing only in the presence of the guanine nucleotide binding protein (G protein), alpha z polypeptide gene (GNAZ), which was absent in Meta-GATES SN1 (Table

S3). Finally, we note that the list of genes contained in each sub-network was not replicated in the permutation analyses and that only 2 out of 688 permuted genes were also seen either in Meta-GW SN1 or Meta-GATES SN1 (Figure S2). Altogether, these results indicate that the quantity, significance and structure of the modules identified could not be reached by

17 chance, strongly suggesting that these sub-networks could be biologically meaningful in the etiology of AD.

Glutamate signaling is overrepresented in AD

To examine the above-mentioned hypothesis, gene ontology (GO) term enrichment was assessed in the top 10 sub-networks identified in each analysis using the package

Ontologizer (see Methods). Table 3 presents the top 3 Meta-GW and Meta-GATES sub- networks, as a function of biological process (BP), cellular component (CC) and molecular function (MF) categories. Meta-GW results indicated that: SN1 was heavily composed by genes acting at the synapse (21/461, p = 2.87x10-18), participating in the signaling pathway (11/46, 2.67x10-11) and specifically related to glutamate receptor activity

(13/24, p = 1.21x10-18); SN2 was mostly over-represented by genes belonging to the axon guidance biological process (33/341, p = 1.20x10-9), located mostly at the cell leading edge

(12/245, p = 9.35x10-11); and SN3 was over-represented by the roundabout signaling pathway (3/3, p = 1.6x10-7). On the other hand, the results with the alternative and more stringent Meta-GATES procedure showed that SN1 had identical ontological enrichment patterns as observed for Meta-GW SN1, being glutamate receptor activity the most significant category (13/24, 1.69x10-18). Interestingly, Meta-GATES SN2 contained several genes involved in lipid metabolism including categories such as steroid metabolic process and lipid localization (11/269, p = 2.13x10-9 and 13/221, p = 3.81x10-9, respectively).

Finally, Meta-GATES SN3 was composed of genes participating in transmembrane receptor protein kinase activity (12/82, p = 1.18x10-12), growth factor binding (9/99, p =

3.42x10-12), protein autophosphorylation (16/174, 2.30x10-12) and located mainly at the synapse (7/461, 3.28x10-6). Sub-networks features and components are described in Table

18

S3 and the complete set of ontologies overrepresented in the first top 10 sub-networks is provided in Table S4.

Replication of glutamate signaling in the ADNI dataset

Considering that glutamate signaling pathway components were consistently present in significant sub-networks enriched with minor associations to AD, both in the Meta-GW and Meta-GATES analyses, and since both procedures were originated from a single set of

SNP associations, we next interrogated the ADNI dataset under the same pipeline (Figure

1), as an attempt to replicate the results in an independent sample of AD individuals. This additional dataset was composed of 693 subjects of which 499 were cases and 194 were controls (Table 1). Genetic association values were calculated replicating the quantitative trait locus (QTL) method reported in the original study [26], which is based on the composite memory score, a measure of the level of memory impairment, reported for each patient (see Methods). Although, the ADNI case cohort includes subjects with mild cognitive impairment (MCI), the phenotype is considered a transitional state with significant risk of progression to clinically diagnostic AD [50], which validates their inclusion. After the corresponding QC procedures, the ADNI dataset showed no significant genomic inflation (λ = 1.02, Figure S1). According to was described in the original publication, our results indicate that QTL testing yielded 25,785 SNP associations (p-value

<0.05), tagging 4,915 genes (Table S5), did not reach genome wide significant levels.

Marginal associations were observed within the dual specificity phosphatase 23 (DUSP23) gene in chromosome 1 (rs1129923, p = 1.07x10-6), the 3'-phosphoadenosine 5'- phosphosulfate synthase 1 (PAPSS1) gene (rs9569, p = 6.84x10-6) and in the phosphatidylinositol-4-phosphate 3-kinase, catalytic subunit type 2 gamma (PIK3C2G)

19 gene (rs10841025, p = 9.01x10-6), as well as association signals in intergenic regions on chromosome 17 and 3 (rs9890008, p = 4.25x10-6 and rs4857008, p = 5.88x10-6, respectively).

We introduced 3,244 and 3,113 p-values, ADNI-GW and ADNI-GATES respectively (Table S2), into the same FPAN used for the Meta-analysis and module search was carried out with their respective null datasets (ADNI-GW-Permuted, ADNI-GATES-

Permuted), since in the absence of genome wide significant results, the WGW control was not necessary. In general agreement with the meta-analysis data, global results indicated that the number of significant modules obtained in either ADNI-GW (average number =

30.3, SD = 2.00) or ADNI-GATES (average number = 24.4, SD = 1.7), was significantly higher (p = 7.01x10-3 and p = 1.80x10-5, respectively), than those obtained by chance

(Permuted; Figure 3C). Interestingly, when comparing the scores of the first 10 sub- networks only the ones belonging to the ADNI-GW analysis remained significantly above their respective permuted ones (Figure 3D) and thus were considered for further analysis

(Table S3).

GO term enrichment in ADNI indicated that genes belonging to categories such as voltage-gated calcium channel complex and ion channel complex were significantly overrepresented in AD (p = 1.24x10-8 and p = 9.24x10-7, respectively; see also Table S3 and Table S4 for complete ontological results). Moreover, we replicated multiple modules enriched with glutamate signaling genes (Table 4), including modules SN3 (S = 5.23, p =

5.09x10-8), SN4 (S= 4.94, p = 7.14x10-8) and SN7 (S = 4.38, p-value = 3.40x10-8).

Individual sub-network structure is presented in Figure S3.

20

KEGG pathway enrichment

KEGG provided information regarding 280 pathways involving 6,733 genes.

Although in comparison with the GO database the amount of information provided by

KEGG is substantially reduced, the fact that each annotation is manually curated makes any association much more reliable [51]. Notably, throughout this analysis we detected that glutamate signaling was again the main overrepresented biological process in both Meta-

GW SN1 (p = 1.10x10-28) and Meta-GATES SN1 (p = 5.94x10-29) sub-networks (Table

S6). Glutamatergic synapse, as a KEGG pathway category, was also significantly associated in ADNI sub-networks SN3, SN4 and SN7 (p = 1.20x10-06, p = 1.32x10-06 and p

= 8.18x10-10, respectively; Table S6). Finally, logical and structural relationships of all sub- networks enriched with glutamate signaling genes from the meta-analysis and the ADNI- replication dataset allowed us to define a list of genes of interest, which were shared at least by 3 sub-networks (Figure 4). The list was composed by 20 signaling components, including membrane-anchored ionotropic (GRIN2A, GRIN2B, GRID2, GRIA1 and

GRIA2) and metabotropic glutamate receptors (GRM1, GRM3, GRM7 and GRM8), intracellular downstream effectors CAMK2A and AKAP5, as well as scaffold proteins

SHANK1 and SHANK2, which are required for proper formation and function of neuronal synapses.[52] The functional relationships of these signaling components in the context of a glutamatergic synapse is shown in Figure 5.

Expression of glutamate-signaling genes in the

At the physiological level, to explore if there was a transcriptional relationship among the glutamate signaling genes previously identified, we examined their expression profiles in

27 normal human brain regions, using the information from the Allen Brain Atlas [42], as a

21 reference. Clearly, the expression pattern of these components clustered in brain regions tightly related to AD pathology, such as the , and white matter [53-55] (Figure 6). While it is known that glutamate signaling is active in these brain domains [56-58], it was interesting to find out clusters with high- and low- expression levels. For instance, GRIA1, GRIA2, CAMK2A, GRIN2B and GRM7 were found among highly expressed gene clusters in the hippocampal formation (r = 0.94), particularly in the CA2-CA3 and CA4 region (Figure 6A), while GRM3, LPHN3, GRID2 and SLC9A9 were grouped in a low-expression cluster (r = 0.87). Low-expression clusters were also observed in the hypothalamus (LSP1, GRM3, GRM1, GRIN2A, ITPR1, AKAP5 and ATP2B2; Figure 6B), the dorsal thalamus (SHANK2, SHANK1, BAI3, PIAS1,

GRIA1, GRID2 and GRIA2; Figure 6C) and also distinguished in the white matter

(GRIA2, AKAP5, GRIN2A, GRIN2B, CAMK2A, SHANK2, GRM8, GRM1, GRM7,

BAI3, LPHN3, ITPR1, SHANK1 and ATP2B2; Figure 6D). Interestingly, there was an inverse relationship in the expression pattern of a subset of these genes, since components highly expressed in the hippocampus were found in low-expression clusters in the white matter, and vice versa (i.e. GRIA2, CAMK2 and GRIN2B vs. GRM3 and SLC9A9).

Altogether these results indicate that glutamate signaling components are differentially expressed in restricted brain domains for proper neuronal or glial functional activity (see also Figure 5).

22

Discussion

In agreement with previous GWAS in AD [8,13-16,19] our meta-analysis detected strong genome-wide association signals in a 250 kb window of chromosome 19, centered in the coding/regulatory region of the TOMM40 gene, in close proximity to the APOE locus, and that also included significant signals in the PVRL2, APOC1, BCAM and BCL3 genes.

While it has been suggested that the association of such extended region may reflect that other variants in LD with APOE may be of pathogenic importance, particularly a poly-T track in the TOMM40 gene [59,60], recent studies have shown that APOE alleles account for essentially all the inherited risk of AD associated in this region [61]. Besides the signal in chromosome 19, we detected marginal associations of 2 novel SNPs in the MS4A3 gene, located in a wide LD region containing a cluster of SNPs in the MS4A6A/MS4A4E loci in the long arm of chromosome 11 (i.e. rs610932, rs670139, rs1562990, rs4938933 and rs983392), which reached genome-wide significant levels by other recent studies

[16,62,63].

Assuming that the APOE locus is the major genetic hallmark associated with the disease and that it does not explain the entire susceptibility of AD [1,7-9], we conducted a network-based pathway analysis with our meta-analysis results to explore the biology behind variants with minor effect size. Initially, to integrate the whole genetic contribution from the meta-analysis we used a gene-wise p-value (GW, single min p-value method) that has been widely applied in detecting novel associations using GWAS data [17,64].

Although this approach has a certain bias for pathways enriched with larger genes and does not consider intergenic associations or LD structure, we note that the random permutation of p-values yielded a distribution of results expected by chance from where the actual data could be compared. Likewise, with the search for significant sub-networks with real and

23 permuted data, and additionally with the WGW control, we believe that the actual contribution of the aforementioned problems to the final result is strongly surpassed by the combination of true minor effect size variants. Still, we considered appropriated the introduction of the GATES procedure that is specifically designed to directly address the gene size and LD structure issues and thus we ended up with a more stringent gene-oriented p-value. Notably, through this approach we replicated essentially the same sub-network

(i.e. GW and GATES SN1), which was populated by genes related to glutamate signaling, differing only in the absence of GNAZ gene whose only association in the meta-analysis

(Table S1; rs4820537, p = 0.02096) was found not informative in the GATES procedure.

Glutamate signaling was further replicated in the ADNI dataset and this time it reached significant association levels in three sub-networks (ADNI-GW SN3, SN4 and SN7).

From a biological point of view, the relationship of these genetic observations with current knowledge about AD is straightforward. Glutamate signaling has been reported to regulate multiple biological processes, including fast excitatory synaptic transmission, neuronal growth and differentiation, synaptic plasticity, learning and memory [65,66].

Degenerating neurons and synapses in AD are usually located within regions that project to or from areas displaying high densities of Aβ plaques and tangles [67] and in this regard, glutamatergic neurons located in the hippocampus, as well as in other areas of the brain, are severely affected by these neurotoxic insults [65,67]. Likewise, it has been established that there is a relationship between glutamate receptor signaling and soluble Aβ oligomers in the hippocampus, affecting their expression and recycling, which leads to long term depression, synaptic loss and ultimately to cognitive deficit [68,69]. Moreover, sustained activation of glutamate receptors at the synapse rise Ca2+ influxes and second

24 messenger levels activating neuronal nitric oxide synthase (NO), increasing reactive nitrogen and oxygen species, thus contributing to neuronal damage independently of the presence of Aβ oligomers [70]. Alternatively, and from a genomic perspective, here we provide strong evidence that common genetic variants within a complete set of genes acting as ionotropic/metabotropic glutamate receptors and its downstream effectors are associated in a network context with AD. Accordingly, it has been recently reported that pathways related to neurotransmitter receptor-mediated calcium signaling and long-term potentiation are similarly associated with mild cognitive impairment and AD [26]. In addition we have found that the expression pattern of these glutamate signaling genes cluster in specific brain regions, which are affected during the development of the disease, such as the hippocampal formation and the hypothalamus. Therefore, it will be interesting to learn if the genes identified through our network-based pathway approach are spatially and coordinately modulated at the transcriptional or post-transcriptional level as a result of various trophic or toxic stimuli. Finally, our data extends the notion that the remaining genetic risk for complex traits, such as AD, is likely explained by the accumulation of functional genetic variants inside an entire pathway, rather than by punctual independent mutations.

25

Acknowledgments

We would like to thank the Translational Genomics Research Institute (TGen) and the Pfizer Inc. Pharmaceutical Company (Pfizer), and all contributors associated who kindly provided access to the genotypic and phenotypic association data. Likewise we thank the Joint Addiction, Aging, and Mental Health (JAAMH) Data Access Committee of the Database of Genotypes and Phenotypes (dbGaP) that allowed to access the National

Institute of Aging (NIA) and National Cell Repository For Alzheimer disease (NIA-

LOAD/NCRAD) dataset under the accession number phs000168.v1.p1, which was collected and analyzed under the supervision of Richard Mayeux MD, MSc, Columbia

University, New York, NY, USA and Tatiana Foroud PhD, from the NCRAD and Indiana

University, Indianapolis, IN, USA. Full list of co-investigators is provided at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000168.v1.p1.

The genotyping, cleaning and harmonization of the NIA-LOAD/NCRAD dataset was carried out at the Johns Hopkins University Center for Inherited Disease Research (CIDR),

Baltimore, MD, USA. Additionally, data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health

Grant U01 AG024904) and DOD ADNI (Department of Defense award number

W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National

Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation;

BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company; Eisai Inc.; Elan

Pharmaceuticals, Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO Ltd.; Janssen

Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson

26

Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co., Inc.; Meso

Scale Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfizer

Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical Company. The

Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in

Canada. Private sector contributions are facilitated by the Foundation for the National

Institutes of Health (www.fnih.org). The grantee organization is the Northern California

Institute for Research and Education, and the study is coordinated by the Alzheimer's

Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Finally, we wish to thank the participation of the patients and their families that ultimately have made possible this study. Our work was supported by Grants from the Chilean

Government FONDECYT 1100942 and FONDAP 15090007 to G.V.D. CONICYT supported doctoral fellowships for E.P.-P. and M.E.A. MECESUP UAB0802 doctoral fellowship supports B.I.B.

27

References

1. Bettens K, Sleegers K, Van Broeckhoven C (2010) Current status on Alzheimer disease molecular genetics: from past, to present, to future. Hum Mol Genet 19: R4-R11. 2. Lambert JC, Schraen-Maschke S, Richard F, Fievet N, Rouaud O, et al. (2009) Association of plasma amyloid beta with risk of dementia: the prospective Three-City Study. Neurology 73: 847-853. 3. Hardy J, Selkoe DJ (2002) The amyloid hypothesis of Alzheimer's disease: progress and problems on the road to therapeutics. Science 297: 353-356. 4. Guerreiro RJ, Gustafson DR, Hardy J (2012) The genetic architecture of Alzheimer's disease: beyond APP, PSENs and APOE. Neurobiol Aging 33: 437-456. 5. Saunders AM, Strittmatter WJ, Schmechel D, George-Hyslop PH, Pericak-Vance MA, et al. (1993) Association of apolipoprotein E allele epsilon 4 with late-onset familial and sporadic Alzheimer's disease. Neurology 43: 1467-1472. 6. Strittmatter WJ, Saunders AM, Schmechel D, Pericak-Vance M, Enghild J, et al. (1993) Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci U S A 90: 1977-1981. 7. Kamboh MI (2004) Molecular genetics of late-onset Alzheimer's disease. Ann Hum Genet 68: 381-404. 8. Coon KD, Myers AJ, Craig DW, Webster JA, Pearson JV, et al. (2007) A high-density whole- genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer's disease. J Clin Psychiatry 68: 613-618. 9. Meyer MR, Tschanz JT, Norton MC, Welsh-Bohmer KA, Steffens DC, et al. (1998) APOE genotype predicts when--not whether--one is predisposed to develop Alzheimer disease. Nat Genet 19: 321-322. 10. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE (2007) Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat Genet 39: 17-23. 11. Cowperthwaite MC, Mohanty D, Burnett MG (2010) Genome-wide association studies: a powerful tool for neurogenomics. Neurosurg Focus 28: E2. 12. Zaykin DV, Zhivotovsky LA (2005) Ranks of genuine associations in whole-genome scans. Genetics 171: 813-823. 13. Hu X, Pickering E, Liu YC, Hall S, Fournier H, et al. (2011) Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS One 6: e16616. 14. Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, et al. (2009) Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nat Genet 41: 1088-1093. 15. Lambert JC, Heath S, Even G, Campion D, Sleegers K, et al. (2009) Genome-wide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nat Genet 41: 1094-1099. 16. Hollingworth P, Harold D, Sims R, Gerrish A, Lambert JC, et al. (2011) Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease. Nat Genet 43: 429-435. 17. Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, et al. (2009) Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet 18: 2078-2090. 18. Lechner M, Hohn V, Brauner B, Dunger I, Fobo G, et al. (2012) CIDeR: multifactorial interaction networks in human diseases. Genome Biol 13: R62.

28

19. Reiman EM, Webster JA, Myers AJ, Hardy J, Dunckley T, et al. (2007) GAB2 alleles modify Alzheimer's risk in APOE epsilon4 carriers. Neuron 54: 713-720. 20. Lee JH, Cheng R, Graff-Radford N, Foroud T, Mayeux R (2008) Analyses of the National Institute on Aging Late-Onset Alzheimer's Disease Family Study: implication of additional loci. Arch Neurol 65: 1518-1526. 21. Wijsman EM, Pankratz ND, Choi Y, Rothstein JH, Faber KM, et al. (2011) Genome-wide association of familial late-onset Alzheimer's disease replicates BIN1 and CLU and nominates CUGBP2 in interaction with APOE. PLoS Genet 7: e1001308. 22. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, et al. (2007) The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 39: 1181-1186. 23. The_International_HapMap_Consortium (2005) A haplotype map of the human genome. Nature 437: 1299-1320. 24. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34: 816-834. 25. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575. 26. Ramanan VK, Kim S, Holohan K, Shen L, Nho K, et al. (2012) Genome-wide pathway analysis of memory impairment in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort implicates gene candidates, canonical pathways, and networks. Brain Imaging Behav 6: 634-648. 27. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904-909. 28. Team R (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing Vienna Austria. 29. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, et al. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32: D493-496. 30. Moher D, Liberati A, Tetzlaff J, Altman DG, Group P (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6: e1000097. 31. Li MX, Gui HS, Kwan JS, Sham PC (2011) GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet 88: 283-293. 32. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 39: D561-568. 33. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, et al. (2007) Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2: 2366-2382. 34. Ideker T, Ozier O, Schwikowski B, Siegel AF (2002) Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18 Suppl 1: S233-240. 35. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258-261. 36. Arakawa K, Kono N, Yamada Y, Mori H, Tomita M (2005) KEGG-based pathway visualization tool for complex omics data. In Silico Biol 5: 419-423. 37. Grossmann S, Bauer S, Robinson PN, Vingron M (2007) Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics 23: 3024-3031.

29

38. Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, et al. (2010) The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 38: W214-220. 39. Westfall PH, Young SS (1993) Resampling-based multiple testing: Examples and methods for p- value adjustment: Wiley-Interscience. 40. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27-30. 41. Team R (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing Vienna Austria. 42. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, et al. (2012) An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489: 391-399. 43. Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, et al. (2011) Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinformatics 12: 322. 44. Saldanha AJ (2004) Java Treeview--extensible visualization of microarray data. Bioinformatics 20: 3246-3248. 45. Melum E, Franke A, Karlsen TH (2009) Genome-wide association studies--a summary for the clinical gastroenterologist. World J Gastroenterol 15: 5377-5396. 46. Fleiss JL, Gross AJ (1991) Meta-analysis in epidemiology, with special reference to studies of the association between exposure to environmental tobacco smoke and lung cancer: a critique. J Clin Epidemiol 44: 127-139. 47. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7: 177-188. 48. Ades AE, Lu G, Higgins JP (2005) The interpretation of random-effects meta-analysis in decision models. Med Decis Making 25: 646-654. 49. Barsh GS, Copenhaver GP, Gibson G, Williams SM (2012) Guidelines for genome-wide association studies. PLoS Genet 8: e1002812. 50. Aisen PS, Petersen RC, Donohue MC, Gamst A, Raman R, et al. (2010) Clinical Core of the Alzheimer's Disease Neuroimaging Initiative: progress and plans. Alzheimers Dement 6: 239-246. 51. Holmans P (2010) Statistical methods for pathway analysis of genome-wide data for association with complex genetic traits. Adv Genet 72: 141-179. 52. Grabrucker AM, Schmeisser MJ, Schoen M, Boeckers TM (2011) Postsynaptic ProSAP/Shank scaffolds in the cross-hair of synaptopathies. Trends Cell Biol 21: 594-603. 53. Fotuhi M, Do D, Jack C (2012) Modifiable factors that alter the size of the hippocampus with ageing. Nat Rev Neurol 8: 189-202. 54. de Jong LW, Ferrarini L, van der Grond J, Milles JR, Reiber JH, et al. (2011) Shape abnormalities of the striatum in Alzheimer's disease. J Alzheimers Dis 23: 49-59. 55. de Leeuw FE, Barkhof F, Scheltens P (2005) Progression of cerebral white matter lesions in Alzheimer's disease: a new window for therapy? J Neurol Neurosurg Psychiatry 76: 1286- 1288. 56. Alix JJ, Domingues AM (2011) White matter synapses: form, function, and dysfunction. Neurology 76: 397-404. 57. Tamminga CA, Southcott S, Sacco C, Wagner AD, Ghose S (2012) Glutamate dysfunction in hippocampus: relevance of dentate and CA3 signaling. Schizophr Bull 38: 927-935. 58. Xu J, Kurup P, Nairn AC, Lombroso PJ (2012) Striatal-enriched protein tyrosine phosphatase in Alzheimer's disease. Adv Pharmacol 64: 303-325. 59. Cruchaga C, Nowotny P, Kauwe JS, Ridge PG, Mayo K, et al. (2011) Association and expression analyses with single-nucleotide polymorphisms in TOMM40 in Alzheimer disease. Arch Neurol 68: 1013-1019.

30

60. Roses AD, Lutz MW, Amrine-Madsen H, Saunders AM, Crenshaw DG, et al. (2010) A TOMM40 variable-length polymorphism predicts the age of late-onset Alzheimer's disease. Pharmacogenomics J 10: 375-384. 61. Jun G, Vardarajan BN, Buros J, Yu CE, Hawk MV, et al. (2012) Comprehensive Search for Alzheimer Disease Susceptibility Loci in the APOE Region. Arch Neurol: 1-10. 62. Antunez C, Boada M, Gonzalez-Perez A, Gayan J, Ramirez-Lorca R, et al. (2011) The membrane- spanning 4-domains, subfamily A (MS4A) gene cluster contains a common variant associated with Alzheimer's disease. Genome Med 3: 33. 63. Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, et al. (2013) Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet 45: 1452-1458. 64. Torkamani A, Topol EJ, Schork NJ (2008) Pathway analysis of seven common diseases assessed by genome-wide association. Genomics 92: 265-272. 65. Mattson MP (2008) Glutamate and neurotrophic factors in neuronal plasticity and disease. Ann N Y Acad Sci 1144: 97-112. 66. Yang JL, Sykora P, Wilson DM, 3rd, Mattson MP, Bohr VA (2011) The excitatory neurotransmitter glutamate stimulates DNA repair to increase neuronal resiliency. Mech Ageing Dev 132: 405-411. 67. Revett TJ, Baker GB, Jhamandas J, Kar S (2012) Glutamate system, amyloid ss peptides and tau protein: functional interrelationships and relevance to Alzheimer disease pathology. J Psychiatry Neurosci 38: 6-23. 68. Almeida CG, Tampellini D, Takahashi RH, Greengard P, Lin MT, et al. (2005) Beta-amyloid accumulation in APP mutant neurons reduces PSD-95 and GluR1 in synapses. Neurobiol Dis 20: 187-198. 69. Shankar GM, Bloodgood BL, Townsend M, Walsh DM, Selkoe DJ, et al. (2007) Natural oligomers of the Alzheimer amyloid-beta protein induce reversible synapse loss by modulating an NMDA-type glutamate receptor-dependent signaling pathway. J Neurosci 27: 2866-2875. 70. Nakamura T, Lipton SA (2009) Cell death: protein misfolding and neurodegenerative diseases. Apoptosis 14: 455-468.

31

Figure Legends

Figure 1. Study strategy. Genotype imputation was carried out in the Tgen1 and NIA-

LOAD/NCRAD datasets (asterisk). The meta-analysis was conducted using the inverse variance method in PLINK, after pruning bad genotypes and samples using standard quality control (QC) tests. The ADNI dataset was included for replication following similar QC procedures at this step. Meta-analysis (dark grey arrows) and ADNI associations (light grey arrows) results were annotated and single gene p-values calculated using the Gene-Wise

(GW) method or the more stringent GATES procedure (threshold p < 0.05). We next introduced this information to FPAN (from STRING database). Module search was performed 10 times, side by side with the permuted data and without genome-wide- significant (WGW) results, which served as internal controls. Significant sub-networks

(white squares) were compared and assayed for gene ontology (GO) term and KEGG pathways enrichment to obtain the final overrepresented pathways associated with AD, inside each sub-network. Equal results between Meta and WGW analysis (“=”) that could not be obtained with the permuted control (“≠”) were expected.

Figure 2. Genome wide meta-analysis results in AD. Manhattan plot showing the p- values obtained in the meta-analysis. The end and beginning of a chromosome is denoted by the change of color pattern of the SNPs (black, grey and brown dots). Genome-wide significance threshold is denoted by a red line (5.0x10-8). The Y axis has been truncated to show all associated SNPs inside the APOE loci and to improve visualization of suggestive associations.

32

Figure 3. Sub-network search results. (A) The number of significant sub-networks (size

< 50 and score > 3) in Meta-GW (light green) and Meta-GATES (dark green) is shown compared with same values permuted across the FPAN: Meta-GW-Permuted (light grey) and Meta-GATES-Permuted (dark grey). (B) Score comparison of the top 10 sub-networks obtained in the corresponding module searches presented in (A). (C) The number of significant sub-networks in the replication step for ADNI-GW (light blue) and ADNI-

GATES (dark blue) analysis, in comparison with their corresponding permuted controls:

ADNI-GW-Permuted (light grey) and ADNI-GW-Permuted (dark grey). (D) Score comparison of the top 10 sub-networks obtained for each module searches presented in (C).

Caped bar/points denote SD; Significant differences between real and permuted data observed in GW and GATES analysis are denoted by an asterisk and those between real and permuted data observed only in GW analysis are denoted by a plus sign (two-sided

Student’s t-test; p < 0.01).

Figure 4. Structure and relationships between glutamate signaling sub-networks overrepresented in AD. Sub-network gene composition (nodes) and interactions (edges) are shown for: Meta-SN1 in the upper left corner with green edges (which includes GW and GATES modules, GNAZ* gene only present in GW); ADNI-GW SN3 in the botton left corner with dark blue edges; ADNI-GW SN4 in the bottom right corner with blue edges and ADNI-GW SN7 in the upper right corner with light blue edges. Genes shared by at least 2 sub-networks are located at the center in bold font and cross interactions between genes inside each module are denoted by light grey edges. Node color represents the OR behavior in a gradient from green to red values (i.e. green: OR<1; red OR>1; white: OR =

1), denoting protection and risk, respectively. Similarly, node size is proportional to the -

33 log10 p-value obtained from the meta-analysis (if absent, node size is the minimum).

Triangle shaped nodes marks genes belonging to the glutamate signaling pathway GO term

(GO:0007215); Diamond shaped nodes denotes genes belonging to KEGG Glutamatergic synapse pathway (hsa04724); Squares Square shaped nodes denotes genes belonging to both gene ontology term GO:0007215 and KEGG hsa04724 pathway.

Figure 5. Overrepresented glutamate signaling components at the functional synapse.

The original version of the hsa04724 KEGG pathway (Glutamatergic synapse) is shown.

Black arrows denotes direct molecular interaction or relation between gene products (green squares) or other types of molecules (unfilled circles), while black arrows with dashed lines denotes an indirect effect between the each nodes. Relationships with other KEGG pathways are shown with the presence of white round rectangles. Gene symbols in components belonging to Meta-GW and META-GATES SN1 are denoted in red.

Figure 6. Gene expression analysis of glutamate signaling components in selected human brain regions. Heatmap and dendrogram of normalized expression levels of the 20 genes of interest displaying significant clustering in: (A) hippocampal formation (HIF); (B) hypothalamus (HY); (C) Dorsal Thalamus (DT); and (D) white matter (WM). Heatmaps were generated using normalized Z score gene-wise expression values, which were averaged from 6 brain donor individuals (ids. H0351.2001, H0351.2002, H0351.1009,

H0351.1012, H0351.1015 and H0351.1016). Bright red and green color indicates high (Z >

2) and low expression (Z < 2). Highly correlated gene clusters (Euclidean distance correlation coefficient r > 0.7) are denoted by colored lines in the dendrograms: green clusters, indicates low expression patterns; red clusters show high levels of expression of

34 correlated genes. Gene expression patterns in the corresponding substructures are shown for

HIF: (DG); Cornu Ammonis 1 (CA1); Cornu Ammonis 2 (CA2); Cornu

Ammonis 3 (CA3); Cornu Ammonis 4 (CA4) and Subiculum (S). For HY: Anterior

Hypothalamic Area (AHA); Lateral hypothalamic Area (LHA); Paraventricular Nucleus of the Hypothalamus (PVH); Supraoptic Nucleus (SO); Lateral Hypothalamic Area,

Mammillary Region (LHM); Mammillary Body (MB); Posterior Hypothalamic Area

(PHA); Supramammillary Nucleus (SuM); Tuberomammillary Nucleus (TM); Preoptic

Region (PrOR); Arcuate Nucleus of the Hypothalamus (ARH); Dorsomedial Hypothalamic

Nucleus (DMH); Lateral Hypothalamic Area, Tuberal Region (LHT); Lateral Tuberal

Nucleus (LTu); Perifornical Nucleus (PeF); Ventromedial Hypothalamic Nucleus (VMH).

For DT: Anterior Group of Nuclei (DTA); Caudal Group of intralaminar Nuclei (ILc);

Dorsal Lateral Geneiculate Nucleus (LGd); Lateral Group of Nuclei, Dorsal Division

(DTLd); Lateral Group of Nuclei, Ventral Division (DTLv); Medial Geniculate Complex

(MG); Medial Group of Nuclei (DTM); Posterior Group of Nuclei (DTP); Rostral Group of

Intralaminar Nuclei (ILr). For WM: Cc: ; Cgb: Cingulum bundle.

35

Figure S1. Quantile-Quantile (Q-Q) plots for GWAS datasets and combined meta- analysis. Comparison of the association results for each SNP (black dots) with those expected by chance (red line) in TGen1 (A), NIA-LOAD/NCRAD (B), Pfizer (C) the final meta-analysis (D) and in the ADNI replication dataset. In each dataset, the genomic inflation factor (λ) is shown. Values of λ between 0.9 and 1.1 are considered unbiased by the population structure.

Figure S2. Gene structure comparison between modules detected with real and permuted data. Gene coincidences between Meta-GW SN1 (49 genes, light grey circle) and Meta-GATES SN1 (48 genes, dark grey circle) are shown in a Venn diagram and compared with the total number of genes in the first 10 modules of each permuted analysis:

Meta-GW SN1 to SN10 (397 genes, light grey circle) and Meta-GATES SN1 to SN10 (354 genes, dark grey circle).

Figure S3. Glutamate signaling sub-networks overrepresented in AD. Meta-GW SN1 in conjunction with Meta-GATES SN1, and ADNI-GW SN3, ADNI-GW SN4, ADNI-GW

SN7 sub-networks are shown in A through D, respectively. Nodes represent genes and edges their corresponding interactions extracted from FPAN based upon the information in the STRING database. Network legend is provided at the bottom panel: the node color represents the OR behavior in a gradient from green to red values (i.e. green: OR<1; red

OR>1; white: OR = 1), denoting protection and risk, respectively. Similarly, node size and edge thickness are proportional to the -log10 p-value obtained in the meta-analysis (if absent, node size is the minimum) and the combined score of interaction. Asterisk in

GNAZ gene is a reminder that this gene is only present in Meta-GW SN1.

36

Table 1. Genome-Wide Association Studies main features. Datasets Samples Ethnicity Cases Controls Platform SNPs TGen1 1,364 Caucasian 829 535 Affymetrix 500K GeneChip 1,231,704a NIA-LOAD/NCRAD 1,680 Caucasian 978 702 Illumina 610-Quad 1,245,964a Pfizer 1,525 Caucasian 733 792 Illumina 550K, 610-Quad 439,113 Total Meta-analysis 4,569 Caucasian 2540 2029 - 1,216,213b ADNI (replication) 693 Caucasian 499c 194 Illumina 610-Quad 524,993 aImputed. bSNPs shared between the 3 studies. cAD + mild cognitive impairment (MCI) individuals.

37

Table 2. Meta-analysis 25 top hits. Chr BP SNP A1 A2 N P OR Gene 19 50,087,459 rs2075650 G A 3 8.54E-116 4.48 TOMM40, PVRL2 19 50,087,106 rs157580 G A 2 9.60E-35 0.51 TOMM40, PVRL2 19 50,106,291 rs439401 T C 2 8.82E-29 0.54 APOC1, APOE 19 50,073,874 rs6859 A G 3 7.87E-28 1.70 PVRL2 19 50,100,676 rs405509 G T 2 2.29E-27 0.57 TOMM40, APOE 19 50,093,506 rs8106922 G A 2 1.17E-25 0.57 TOMM40 19 50,021,054 rs10402271 G T 3 1.98E-17 1.46 BCAM 19 50,018,504 rs10405693 T C 2 2.83E-12 1.49 BCAM 19 50,074,901 rs3852861 T G 2 5.32E-11 0.64 PVRL2 19 49,857,752 rs714948 A C 3 2.03E-10 1.60 PVR 19 49,946,008 rs8103315 A C 2 1.87E-08 1.50 BCL3 19 50,054,507 rs377702 A G 3 1.99E-07 1.29 PVRL2 19 49,933,947 rs2927438 A G 2 3.32E-07 1.35 intergenic 19 50,421,115 rs346763 A G 2 3.67E-07 1.80 EXOC3L2 12 93,848,520 rs249153 C T 2 4.38E-07 1.41 intergenic 19 50,053,064 rs440277 A G 2 4.79E-07 0.76 PVRL2 19 50,043,586 rs1871047 G A 3 4.87E-07 0.79 PVRL2 5 30,214,770 rs13178362 C T 3 6.60E-07 0.75 intergenic 12 93,852,604 rs249166 T G 2 6.91E-07 1.40 intergenic 12 93,854,404 rs249167 A T 2 6.91E-07 1.40 intergenic 19 50,417,946 rs10422797 C T 2 1.02E-06 1.82 EXOC3L2 11 59,595,197 rs474951 G T 3 1.25E-06 0.79 MS4A3 11 59,593,673 rs528823 T C 3 1.55E-06 0.79 MS4A3 3 10,110,577 rs1552244 G A 3 1.63E-06 0.76 FANCD2, FANCD2OS 3 10,108,710 rs9849434 A G 2 1.88E-06 0.71 FANCD2, FANCD2OS Chr: Chromosome; BP: Physical Position (Base Pair, NCBI 36.3/Hg18); A1: Allele 1 (Affected); A2: Allele 2 (Reference); N: Number of datasets with information; p-value: Fixed effect model p-value; OR: Odd Ratio.

38

Table 3. GO terms enriched in Meta-GW and Meta-GATES top 3 Sub-Networks. SN GO ID GO Name GISN GIP TG p-value Meta-GW SN1 BP GO:0007215 glutamate receptor signaling pathway 11 46 49 2.67E-11 GO:0007610 behavior 17 497 49 1.02E-08 GO:0003001 generation of a signal involved in cell-cell signaling 12 332 49 1.03E-08 CC GO:0045202 synapse 21 461 49 2.87E-18 GO:0044456 synapse part 19 347 49 5.90E-18 GO:0031234 extrinsic to internal side of plasma membrane 9 50 49 2.13E-13 MF GO:0008066 glutamate receptor activity 13 24 49 1.21E-18 GO:0035254 glutamate receptor binding 8 22 49 2.03E-10 GO:0031683 G-protein beta/gamma-subunit complex binding 8 22 49 4.63E-10 SN2 BP GO:0007411 axon guidance 33 341 46 1.20E-09 GO:0040012 regulation of locomotion 14 435 46 1.16E-07 GO:0035385 Roundabout signaling pathway 3 3 46 3.81E-07 CC GO:0031252 cell leading edge 12 245 46 9.35E-11 GO:0005955 calcineurin complex 3 4 46 1.69E-07 GO:0031256 leading edge membrane 8 87 46 4.67E-07 MF GO:0005042 netrin receptor activity 4 4 46 6.30E-10 GO:0048495 Roundabout binding 3 4 46 6.77E-08 SN3 BP GO:0035385 Roundabout signaling pathway 3 3 45 1.60E-07 GO:0061364 apoptotic process involved in luteolysis 3 3 45 1.74E-07 GO:0021889 interneuron differentiation 4 11 45 3.15E-06 MF GO:0048495 Roundabout binding 3 4 45 9.46E-07 Meta-GATES SN1 BP GO:0007215 glutamate receptor signaling pathway 11 46 48 1.86E-11 GO:0007610 behavior 17 497 48 6.58E-09 GO:0003001 generation of a signal involved in cell-cell signaling 12 332 48 7.92E-09 CC GO:0045202 synapse 21 461 48 1.69E-18 GO:0044456 synapse part 19 347 48 3.70E-18 GO:0097060 synaptic membrane 16 202 48 2.55E-13 MF GO:0008066 glutamate receptor activity 13 24 48 1.21E-18 GO:0035254 glutamate receptor binding 8 22 48 1.14E-10 GO:0031683 G-protein beta/gamma-subunit complex binding 7 22 48 8.22E-09 SN2 BP GO:0008202 steroid metabolic process 11 269 50 2.13E-09 GO:0010876 lipid localization 13 221 50 3.81E-09 GO:0006869 lipid transport 12 195 50 8.12E-09 CC GO:0032994 protein-lipid complex 5 36 50 2.51E-07 MF GO:0048495 Roundabout binding 3 4 50 2.02E-06 GO:0071814 protein-lipid complex binding 4 23 50 3.22E-06

39

GO:1901681 sulfur compound binding 7 163 50 8.26E-06 SN3 BP GO:0046777 protein autophosphorylation 16 174 21 2.30E-12 GO:0040012 regulation of locomotion 10 435 21 6.55E-08 GO:0051270 regulation of cellular component movement 10 443 21 1.14E-07 CC GO:0045202 synapse 7 461 21 3.28E-06 MF GO:0019199 transmembrane receptor protein kinase activity 12 82 21 1.18E-12 GO:0019838 growth factor binding 9 99 21 3.42E-12 GO:0004713 protein tyrosine kinase activity 16 136 21 1.09E-10 SN: Sub-network; GO ID:Gene ontology term ID; GIP: Genes in population; GISN: Genes in sub-network; TG: Total genes in SN; BP: Biological process; CC:Cellular component; MF: Molecular function

40

Table 4. Glutamate positive sub-networks in ADNI-GW analysis. SN Type GO ID GO Name GISN GIP TG p-value SN3 BP GO:0035637 multicellular organismal signaling 19 483 48 5.87E-12 GO:0019226 transmission of nerve impulse 19 467 48 6.22E-10 GO:0007215 glutamate receptor signaling pathway 7 44 48 5.09E-08 CC GO:0044447 axoneme part 6 20 48 9.54E-11 GO:0044463 cell projection part 12 235 48 1.12E-10 GO:0008328 ionotropic glutamate receptor complex 6 21 48 3.24E-08 MF GO:0008066 glutamate receptor activity 5 19 48 1.78E-06 SN4 BP GO:0031280 Neg. regulation of cyclase activity 4 19 22 2.59E-06 GO:0051350 Neg. regulation of lyase activity 4 20 22 3.17E-06 GO:0030809 Neg. regulation of nucleotide biosynthetic process 4 23 22 2.38E-06 MF GO:0008066 glutamate receptor activity 6 19 22 7.14E-08 SN7 BP GO:0007215 glutamate receptor signaling pathway 7 44 40 5.09E-08 CC GO:0008328 ionotropic glutamate receptor complex 4 21 40 2.09E-05 MF GO:0008066 glutamate receptor activity 5 19 40 3.40E-08 SN: Sub-network; GO ID: Gene ontology term ID; GIP: Genes in population; GISN: Genes in sub-network; TG: Total genes in SN; BP: Biological process; CC: Cellular component; MF: Molecular function

41

Figure 1 Click here to download Figure:AD gdeferrari_figure Datasets 1.eps

TGen1* NIA-NCRAD Pfizer /LOAD*

QC, Inverse Imputation* ADNI Meta-Analysis Variance (MACH) (Replication) (PLINK)

SNP to Gene Gene-Wise p-value GATES p-value associations (GW) (GATES) (p<0.05)

Controls: FPAN 14,793 Genes -Permuted (STRING) 229,357 Interactions -WGW Score > 0.7

Subnetwork Search (Cytoscape JActive Modules) 10

Permuted Meta WGW

≠ =

Pathway evaluation (GO and KEGG) Figure 2 Click here to download Figure: gdeferrari_figure 2.eps

APOE locus

(p)

10 0 20 40 120 40 20 0

-log 0 5 1 5 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 2122 Chromosome Figure 3 Click here to download Figure: gdeferrari_figure 3.eps

A B * * 40 6.5 * * 6 30 5.5 * * * * + 5 * + + 20 4.5 MeanScore

10 4 N° of significant of N° Subnetworks 3.5

0 3 S -GW SN1 SN2 SN3 SN4 SN5 SN6 SN7 SN8 SN9 SN10 WGW -GATE S Meta Permuted -GW WGW W PermutedMeta Meta -GATE ATES eta-G M Meta ta-G Me C D 40 6.5 + * * 6 + 30 + 5.5 + 5 + + 20 + + + + 4.5 MeanScore

4 10 N° of significant of N° Subnetworks 3.5

0 3

GW SN1 SN2 SN3 SN4 SN5 SN6 SN7 SN8 SN9 SN10 muted er GATES ADNI- P Permuted ADNI- NI-GW AD I-GATES ADN Figure 4 Click here to download Figure: gdeferrari_figure 4.eps

EDN1

GNA12

GPR112 TAAR2 SLC1A3 GPRC5B RAPH1 APP GRM5

ADORA1

DNMBP Meta SN1 APBB1 ADNI SN7

GFAP MAP2K4 EVL SUCLG1 GPR37L1 HOMER1 HOMER3 GNAS

RGS7 PIAS2 GRIK1

ROBO1 * ENAH GNAZ BSN GNAO1

GNB5 CTNND1 GNA14

RYR2 GRM6 FYB GNAQ FAT1 GNA15

HOMER2

LCP2 FAT3 CNR1 PLCB4 ARHGAP28

FJX1 GRM4

ESRP1 PRKCA CAMK2B

LAMA4 GRIA4 GRM1 MSRB3 GRM3 CAMK2G CAMK2D GRM7 GRIN2B ISCA2

GRM8 COMMD9 GRIN2A

ACYP1 ITPR1 CMIP GRID2 ALDH9A1 DNAH1 DNAH9 PDE7B LPHN3 GRIA2 Shared genes DNAH8 DNAH2 ALDH7A1 ACSS1 ABCA1 DNAH17 GRIA1

DEFB112 LSP1 CAMK2A DNAH3

E2F1 ACTG1 DLG2 PIAS1 BAI3 RIN2 DLGAP4 SHANK1

PPP1R9A CACNA1C ATP2B2 AKAP5 SHANK2 GRM2

CACNG2 TPTE2 CASP8 DLGAP1 SLC9A9 SH3GL2 RAB5A BIN1

KCND2 GRIK2 NEK10

BRIP1 XRCC4 ERBB4 CD209 SRC

RIN3 TOX3 GPR124 GRB10 DLG1

ADAM10 KCNAB1 BRCA2

SLC4A7 PAK2 FGFR2

APBA1

ADRB1 USH2A ADNI SN3 KCND3 ADNI SN4

KCNA2

GPR98 MTMR2

KCNJ10

CRB1 Figure 5 Click here to download Figure: gdeferrari_figure 5.eps Figure 6 Click here to download Figure: gdeferrari_figure 6.eps

A) HIF B) HY DG CA1 S CA2 CA3 CA4 SO SO LHM MB LHA LHT AHA SuM PrOR DMH VMH PVH PHA TM ARH PeF LTu

GRIA1 SHANK2 GRIA2 GRM8 CAMK2A r = 0.94 LPHN3 GRIN2B SLC9A9 GRM7 GRM7 AKAP5 GRID2 BAI3 BAI3 GRIN2A GRIA1 LSP1 GRIA2 GRM1 PIAS1 SHANK2 CAMK2A PIAS1 GRIN2B ATP2B2 SHANK1 SHANK1 LSP1 ITPR1 GRM3 GRM8 GRM1 GRM3 GRIN2A r = 0.70 LPHN3 r = 0.87 ITPR1 GRID2 AKAP5 SLC9A9 ATP2B2

C) DT D) WM ILc ILc LGd DTLd DTLv DTA DTM DTP ILr MG cc cgb ATP2B2 SLC9A9 GRM1 GRM3 GRIN2B PIAS1 AKAP5 LSP1 CAMK2A GRID2 ITPR1 GRIA1 GRIN2A GRIA2 LSP1 AKAP5 GRM8 GRIN2A GRM7 GRIN2B SLC9A9 CAMK2A GRM3 SHANK2 LPHN3 GRM8 r = 0.80 SHANK2 GRM1 SHANK1 GRM7 BAI3 BAI3 PIAS1 r = 0.87 LPHN3 GRIA1 ITPR1 GRID2 SHANK1 GRIA2 ATP2B2 Supporting Information Figure S1 Click here to download Supporting Information: gdeferrari_figure S1.eps

Supporting Information Figure S2 Click here to download Supporting Information: gdeferrari_figure S2.eps

Supporting Information Figure S3 Click here to download Supporting Information: gdeferrari_figure S3.eps

Supporting Information Table S1 Click here to download Supporting Information: TABLE S1.xlsx

Supporting Information Table S2 Click here to download Supporting Information: TABLE S2.xlsx

Supporting Information Table S3 Click here to download Supporting Information: TABLE S3.xlsx

Supporting Information Table S4 Click here to download Supporting Information: TABLE S4.xlsx

Supporting Information Table S5 Click here to download Supporting Information: TABLE S5.xlsx

Supporting Information Table S6 Click here to download Supporting Information: TABLE S6.xlsx

Supporting Information PRISMA Checklist Click here to download Supporting Information: PRISMA 2009 Checklist.pdf

Supporting Information PRISMA Flow Diagram Click here to download Supporting Information: PRISMA 2009 Flow Diagram.pdf

Revised Manuscript with Track Changes

Overrepresentation of Glutamate Signaling in aAlzheimer's Disease: Formatted: Justified Network-Based Pathway Enrichment Approach using Meta-analysis of Genome-Wide Association Studies in Alzheimer's Disease

Formatted: Font: Not Bold Short title: Glutamate signaling is overrepresented in AD GWAS

Formatted: Justified * *1 * *1 1 1 Eduardo Pérez-Palma , , Bernabé I. Bustos , , Camilo F. Villamán ,Villamán, Marcelo Formatted: English (U.S.) A. Alarcón1Alarcón,1,2, Miguel E. Avila1Avila,1,2, Giorgia D. Ugarte,1 Ariel E. Formatted: English (U.S.) Formatted: English (U.S.) 1 1 3 1,†,† Reyes ,Reyes, Carlos Opazo and Giancarlo V. De Ferrari , for the Alzheimer’s Disease Formatted: English (U.S.) Neuroimaging Initiative‡ and the NIA-LOAD/NCRAD Family Study Group Formatted: English (U.S.) Formatted: English (U.S.)

Formatted: English (U.S.) * These authors contributed equally to this work Formatted: English (U.S.) †Corresponding author Formatted: English (U.S.) Formatted: English (U.S.)

Formatted: English (U.S.) 1 Center for Biomedical Research and FONDAP Center for Genome Regulation, Faculty of Formatted: English (U.S.) Biological Sciences and Faculty of Medicine, Universidad Andres Bello, Santiago, Chile. Formatted: Justified 2 Faculty of Biological Sciences, Universidad de Concepción, Concepción, Chile. Formatted: Spanish (Chile) 3 3 Oxidation Oxidation Biology Laboratory, The Florey Institute of Neuroscience and Mental Health, The University of Melbourne, Melbourne, Australia.

† Address correspondence to: Correspondence: Formatted: Spanish (Chile) Giancarlo V. De Ferrari, PhD Center for Biomedical Research and FONDAP Center for Genome Regulation Universidad Andres Bello, Republica #239, PO Box 8370134, Santiago, Chile.

E-mail: [email protected] Formatted: Font: Calibri, 11 pt, Spanish (Chile) Phone number: (56-2) 7703249; FAX: (56-2) 6980414 FAX: (56-2) 6980414

1

‡Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply /ADNI_Acknowledgement_List.pdf

2

Abstract Formatted: Justified

Genome-wide association studies (GWAS) have successfully identified several risk loci for

Alzheimer's disease (AD). Nonetheless, these loci do not explain the entire susceptibility of the disease, suggesting that other genetic contributions remain to be identified. Here, we performed a meta-analysis combining data of 6,5784,569 individuals (3,3992,540 cases and

3,1792,029 healthy controls) derived from three publicly available GWAS in AD and replicated a broad genomic region (> 250248,000 bp) highly associated with the disease in the vicinity ofnear the APOE/TOMM40 locus in chromosome 19. To detect minor effect size contributions that could help to explain the remaining genetic risk, we conducted network-based pathway analyses either by extracting gene-wise p-values, (GW), defined as the single strongest association signal within a gene, or calculated a more stringent gene- based association p-value using the extended Simes (GATES) procedure. Comparison of these strategies revealed that five ontological sub-networks involved in glutamate signaling were significantly overrepresented in AD (Module Search Score > 5). Besides from finding twop < 2.7x10-11, p < 1.9x10-11; GW and GATES, respectively). Glutamate signaling sub- networks containing biological pathways related to lipid and apolipoprotein metabolism (p

< 5x10-6) previously associated with were further replicated in an additional public dataset from the Alzheimer’s disease, we identified three novel Neuroimaging Initiative (ADNI) consisting in 499 cases and 194 controls (p < 5.1x10-8). Interestingly, components of the glutamate signaling sub-networks highly enriched with ontological categories involved in glutamate receptor signaling (p < 1.5x10-6), which are coordinately expressed in phenotypedisease-related tissues and, which are tightly related to known pathological hallmarks of AD. Our findings suggest that genetic variation within glutamate signaling contributes to the remaining genetic risk of AD and support the notion that functional

3

biological networks, rather than genes acting alone, should be targeted in future therapies aimed to prevent or treat this devastating neurological disorder.

Formatted: Justified

4

Introduction

Alzheimer's disease (AD) [MIM 104300]) is the most common neurodegenerative disorder in the human population [1].[1]. Clinically, AD is characterized by a progressive loss of cognitive abilities and memory impairment. At a biological level, it is thought that the presence of extracellular deposits of the -amyloid peptide (A) and intracellular neurofibrillary tangles composed of hyperphosphorylated Tau protein leads to synaptic loss and neuronal death [1,2]. Genetically, AD is complex and heterogeneous [3,4].[1,2].

Genetically, AD is complex and heterogeneous.[3,4] A small percentage of AD cases (1–

2% of all cases) have an early-onset familial form of presentation, with symptoms appearing before 65 years of age, and most cases are late-onset or “sporadic” with no apparent familial recurrence of the disease [4].[4]. While familial-AD has been associated with mutations in the genes coding for the amyloid precursor protein (APP) and the presenilins (PSEN1 and PSEN2) proteins, the only genetic factor extensively replicated for sporadic AD is the apolipoprotein E-ε4 (APOE-ε4) allele [4-6][4-6], which is present in ca.

60% of the cases [1,7-9][1,7-9]. However, the APOE-4 allele is not causative, since it has been found in individuals that would not develop the disease, suggesting that other genetic contributions remainsremain to be identified.

During the past decade, the scientific efforts focused in identifying these genetic hallmarks reported more than 2,900 Single Nucleotide Polymorphisms (SNP) in close to within ~ 4,700 genes associated with AD [10][10] (see also AlzGene.org). More recently, the use of high density DNA genotyping microarrays in genome-wide associations studies

(GWAS), combined with powerful statistical procedures, have expanded the search for novel susceptibility loci for the disease [11].[11]. Nevertheless, these genetic approaches

5

currently exhibit some limitations. First, they present a high rate of false positives and require mayormajor sample sizes in order to be replicated, and in. In fact, different simulations showhave shown that authentic associations have only a 26% chance of falling into the top 1,000 p-values in a GWAS [12].[12]. Second, they only examine the association of a single genetic variant at a time, therefore failing in the detection of minor associations that can still be present and confer risk in a cumulative way. Finally, current top hits associated with AD, including the bridging integrator 1 (BIN1) [13], clusterin

(CLU) [13,14][14,15], the ATP-binding cassette sub-family A member 7 (ABCA7) [16], the complement component (3b/4b) receptor 1 (CR1) [14], the [15], and the phosphatidylinositol binding clathrin assembly protein (PICALM) [13], the ATP-binding cassette sub-family A member 7 (ABCA7) [15] and the bridging integrator 1 (BIN1)

[16][14], do not account for the entire genetic contribution toof the disease and still none of them supersedeor surpass the risk conferred solely by the APOE-ε4 allele.

Therefore, considering that the etiology of complex diseases might depend on functional protein-protein interaction networks [17,18][17,18], here we performed a meta- analysis on threefollowed by network-based pathway analyses on publicly available GWAS in AD and used significant genetic information to conduct a network-based pathway analysis that led us to the identification ofidentify glutamate signaling as a key ontological process, which is believed to be affected during the onset or developmentpathway of the disease.

Formatted: Justified

6

Subjects and Methods Formatted: Font: Not Bold

GWAS datasets included in the analysis

We selected three publicly available GWAS in AD performed on unrelated case-control and familial samples from European-descent populations (Table 1). The GWAS datasets are: 1i) the Translational Genomics Research Institute (TGen1) study on AD [19][19], including 829 AD cases and 535 control individuals; 2ii) the National Institute on Aging -

Late Onset Alzheimer's Disease and the National Cell Repository for Alzheimer's Disease

(NIA-LOAD/NCRAD) [20,21] study,[20,21], including 5,220 subjects from which we only considered for analysis a subset of 3,689 individuals (1,837 AD cases and 1,852 control individuals; and 3controls) that were self-declared non-Hispanic European Americans, passed principal components analyses and had non-missing phenotypes. Given that this subset is composed by familial data, using the provided family trees, we excluded all related controls and kept only one case per family giving a final number of 978 AD cases and 702 controls individuals that were eligible for our study; and iii) the Pfizer

Pharmaceutical Company (Pfizer) study on AD [16][13], including 733 AD cases and 792 control individuals. The TGen1 data was downloaded from the TGen website

(https://www.tgen.org/research/research-divisions/neurogenomics.aspx) and the NIA- Formatted: Spanish (International Sort) Formatted: Font: Calibri, 11 pt, Spanish LOAD/NCRAD data was retrieved through the database of Genotypes and Phenotypes (Chile) Formatted: Spanish (International Sort) (dbGaP; http://www.ncbi.nlm.nih.gov/gap) [22]http://www.ncbi.nlm.nih.gov/gap) [22] of Formatted: Spanish (International Sort) the National Institute of Health (NIH), under the accession number phs000168.v1.p1. The

Pfizer data was gathered from the supplementary information accompanying the original publication [16].[13]. Detailed information regarding recruitment, diagnosis, affection status and age at the time of enrollment can be found in the original studies

[16,19,21].[13,19,21]. Written informed consent was obtained for all participants and prior

7

Institutional Review Board approval was obtained at each participating institution.

Additionally, data used in the preparation of this article were obtained from the

Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu).

The ADNI was launched in 2003 by the National Institute on Aging (NIA), the National

Institute of Biomedical Imaging and Bioengineering (NIBIB), the Food and Drug

Administration (FDA), private pharmaceutical companies and non-profit organizations, as a

$60 million, 5-year public-private partnership. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). Determination of sensitive and specific markers of very early AD progression is intended to aid researchers and clinicians to develop new treatments and monitor their effectiveness, as well as lessen the time and cost of clinical trials. The

Principal Investigator of this initiative is Michael W. Weiner, MD, VA Medical Center and

University of California – San Francisco. ADNI is the result of efforts of many co- investigators from a broad range of academic institutions and private corporations, and subjects have been recruited from over 50 sites across the U.S. and Canada. The initial goal of ADNI was to recruit 800 subjects but ADNI has been followed by ADNI-GO and

ADNI-2. To date these three protocols have recruited over 1500 adults, ages 55 to 90, to participate in the research, consisting of cognitively normal older individuals, people with early or late MCI, and people with early AD. The follow up duration of each group is specified in the protocols for ADNI-1, ADNI-2 and ADNI-GO. Subjects originally recruited for ADNI-1 and ADNI-GO had the option to be followed in ADNI-2.

Formatted: Line spacing: single, Pattern: Clear, Tab stops: Not at 3.07"

8

Imputation Formatted: Justified, Tab stops: Not at 3.07" Formatted: Font: Not Bold The In order to maximize information on linkage disequilibrium (LD) structure between the studies, the TGen1 dataset wasand the NIA-LOAD/NCRAD datasets were imputed by comparison with the CEU reference panel (unrelated individuals) from the HapMap III phased data (release 2) [23][23]. In order to maximize information on linkage disequilibrium (LD) structure between the studies, imputation was carried out using the

Markov Chain Haplotyping method implemented in MACH 1.0 following author recommendations [24].

. Imputation was carried out using the Markov Chain Haplotyping method implemented in

MACH 1.0 following author recommendations [24].

Formatted: Justified, Indent: Left: 0", Hanging: 0.49" Association analysis

Quality control (QC) procedures such as minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE), missing rate per individuals (MIND) and per SNPs (GENO) were performed on the NIA-LOAD/NCRAD dataset using PLINK v.1.07

(http://pngu.mgh.harvard.edu/purcell/plink/) [25] with threshold values of 0.05, 1x10-6,

0.02 and 0.02, respectively. Similarly, the TGen1 dataset was filtered using a MIND value of 0.05. We applied a logistic regression analysis, using an additive model on the imputed

TGen1 data with MACH2DAT [24]. SNPs with r2 values less than 0.29 were removed from further analysis. The association analysis of the NIA-LOAD/NCRAD dataset included corrections for ancestry origins and family relatedness among individuals using information from the principal component analysis and by kinship estimations [21]. Kinship coefficients were calculated with Kinship v.2.0

(http://faculty.washington.edu/wijsman/software.shtml), using a set of 25,728 SNPs

9

selected to be maximally informative and relatively-uniformly spaced (minimum interval between markers ca. 100 kb) on the reference genomic sequence (Build 36.3/Hg18) [26].

Similarly, in the Pfizer dataset, the standard error (SE) per SNP was estimated from the p- values reported in the study [16]. Briefly, p-values were transformed into the corresponding

Z score with the INVNORMAL function implemented in STATA Statistical Software v.10

(StataCorp, College Station, TX), and then the SE was calculated taking the log of the odds ratios (OR) divided by the corresponding Z score. Finally, the results for each dataset was assessed for genomic inflation, which was normalized near 1 and visualized in Quantile-

Quantile (Q-Q) plots using the statistical software R (www.r-project.org) [27].

Meta-analysis

Meta-analysis was performed using the inverse variance method implemented in PLINK v.1.07 [25]. We checked that all statistics values: i.e. p-values, OR and SE, for each dataset prior the meta-analysis were computed for the same allele. The annotation of the results was done with the RefSeq Genes for the human genome assembly Build 36.3/Hg18 available at the UCSC Table Browser [28] using own Perl scripts (available upon request).

In order to consider a SNP inside a gene we defined a threshold of +/- 5,000 bp relative to the transcription start and end sites. The complete annotated output of the meta-analysis is available as Table S1. PRISMA guidelines were followed (showed in Checklist S1) [29].

Quality control (QC) procedures such as minor allele frequency (MAF), Hardy-Weinberg equilibrium (HWE), missing rate per individuals (MIND) and per SNPs (GENO) were performed on the TGen1 and the NIA-LOAD/NCRAD dataset using PLINK v.1.07

(http://pngu.mgh.harvard.edu/purcell/plink/) [25] with threshold values of 0.05, 1x10-6,

0.05 and 0.02, respectively. We applied a logistic regression analysis, using an additive

10

model on the imputed datasets data with MACH2DAT [24]. SNPs with r2 values less than

0.29 were removed from further analysis. Similarly, in the Pfizer dataset, the standard error

(SE) per SNP was estimated from the p-values reported in the study [13]. Briefly, p-values were transformed into the corresponding Z score with the INVNORMAL function implemented in STATA v.10 (StataCorp, College Station, TX), and then the SE was calculated taking the log of the odds ratios (OR) divided by the corresponding Z score. In the ADNI replication dataset, we performed QC and association analysis based on a quantitative trait locus (QTL) method as described previously [26]. MAF, HWE, MIND and GENO QC values of 0.05, 1x10-6, 0.1 and 0.1 were applied, respectively. In order to control for population stratification we conducted Principal Component Analysis (PCA) with EIGENSTRAT [27]. After this step, 63 individuals were excluded from further analysis, leaving 693 individuals from the initial 756 subjects. The QTL association analysis was carried out using an additive genetic linear regression model with PLINK using different co-variables including age at baseline visit, education, gender and APOE status (ε4 allele present or not). Finally, the results for each dataset was assessed for genomic inflation and visualized in Quantile-Quantile (Q-Q) plots using the statistical R

(www.r-project.org) [28].

Meta-analysis

Meta-analysis was performed using the inverse variance method implemented in PLINK v.1.07 [25]. We checked that all statistics values (p-values, OR and SE) for each dataset prior the meta-analysis were computed for the same allele. Annotation of the results was done with the RefSeq Genes for the human genome assembly Build 36.3/Hg18 available at the UCSC Table Browser [29] using own Perl scripts (available upon request). In order to

11

consider a SNP inside a gene we defined a threshold of +/- 5,000 bp relative to the transcription start and end sites. The annotated output of the meta-analysis used for the pathway approach is available in Table S1. PRISMA guidelines were followed (showed in

Checklist S1) [30].

Single-gene p-value generation

Genetic association values with a cut-off threshold of p < 0.05, from either the Meta- analysis or the ADNI replication dataset, were transformed into single-gene associations using two independent approaches: i) Selection of a gene-wise p-value (GW) [17], defined as the single strongest association inside a gene; and ii) Calculation of a gene-based association p-value using the extended Simes procedure (GATES), which extract a gene- level association from the combination of the SNPs p-values within a gene. This approach does not relies on genotype or phenotypic data and has been shown to correct type 1 error rates in both simulated and permuted datasets, regardless of the gene size or LD structure

[31]. GATES is an open-source tool named Knowledge-Based Mining System for Genome- wide Genetic Studies (KGG; http:// http://bioinfo1.hku.hk:13080/kggweb/).

Formatted: Font: Bold Formatted: Justified Functional Protein Association Network (FPAN)

Meta-analysis results with a threshold of p < 0.05 were transformed into a single-gene associations by two parallel analysis: i) Selection of a gene-wise p-value (Meta-AD) [17]; and ii) Calculation of a gene-based association p-value using the using the extended Simes procedure (Meta-AD-GATES) [30]. We subsequently introduced these single-gene p- values as a floating point attribute into the complete human functional protein association network (FPAN), which contains highly curated known and predicted interactions from the

12

STRING 9.0 database (Search Tool for the Retrieval of Interacting Genes/Proteins) [31].

Briefly, in the Meta-AD analysis we considered data with a score of interaction above 0.7, which stand for high confidence interactions composed by 14,793 nodes (genes) and

229,357 edges (interactions). Alternatively, in the Meta-AD-GATES analysis we considered only interactions with a score above 0.9 (highest confidence) yielding 9,955 nodes and 59,675 edges. Sub-networks were analyzed and visualized using Cytoscape software [32]. The complete set of gene-wise and GATES p-values introduced to FPAN is provided as Table S3.

To evaluate single-gene associations in a network context the complete human functional protein association network (FPAN) was retrieved from the STRING 9.0 database (Search

Tool for the Retrieval of Interacting Genes/Proteins; http://string-db.org/) [32]. FPAN contains highly curated known and predicted interactions emerging from different evidence channels such as: genomic context, co-expression and curated literature. Raw text- formatted protein-protein functional interactions were downloaded from STRING. To avoid redundancy and false positives, alternative proteins and their interactions were consolidated into one gene using own Perl scripts (available upon request). We kept only the interactions with a combined score > 0.7 (provided by STRING), which stand for high confidence interactions. The final FPAN generated was composed by 14,793 nodes (genes) and

229,357 edges (interactions) and was introduced as an input to Cytoscape [33], which is an open source platform for visualizing complex networks that not only allows the integration of additional attribute data (i.e. gene annotations, expression profiles and interactions source and confidence), but also provides a comprehensive set of tools to perform

13

integrated pathways analysis. Thus, the p-values of GW and GATES procedures were introduced as a floating-point attribute into the FPAN (Table S2).

Module search (subSub-networks) Formatted: Justified Formatted: Font: Not Bold, Font color: Red The module search was carried out with the Cytoscape JActive Modules Plug-in with an

[34] with a gene overlap threshold of 50 %%. JActive modules is designed to detect if a certain group of connected nodes are significantly enriched with a statistical parameter such as the single gene p-value, which in our study comes from either the meta-analysis or the

ADNI replication dataset. Briefly, starting from one node a sub-network grows to its connected genes by computing an aggregated score (S) derived from the conversion of the single-gene p-value (if present) into their corresponding z-score (with the inverse normal cumulative distribution function). This score is compared internally with a background distribution created from the scores of 10,000 random modules of the same size in a Monte

Carlo procedure. If the aggregated score cease growing above the expected by chance, the algorithm stops and the growing sub-network is reported as a result. As in the original publication, modules with S > 3 (3 standard deviation above the mean of randomized scores) and with a size below 50 were aconsidered significant sub-network is defined as any module with a score > 3 and a maximum number of genes involved of fifty [17][17].

To standardize the scores aggregated S score obtained byin each sub-network (derived from a Monte Carlo procedure) [33] the search was performed 10 times for each analysis (Meta-

ADGW, Meta-AD-GATES). The, ADNI-GW and ADNI-GATES). Finally, the same procedure was conducted with: their corresponding permuted p-values over the entire genes present in the FPAN (Meta-AD-Permuted, Meta-AD-GATES-Permuted analysis) and without genome wide (WGW) significant results (p-values <10-8)), in this case with real

14

(Meta-AD-WGW, Meta-AD-GATES-WGW) and and permuted data (Meta-AD-WGW-

Permuted, Meta-AD-GATES-WGW-Permuted),, respectively. (WGW analysis). Statistical differences between permuted and non-permuted analyses were assessed through two-sided t-test.

Formatted: Line spacing: single, Pattern: Clear Formatted: Font color: Auto

15

Gene Ontology (GO) and KEGG pathway enrichment Analysis Formatted: Justified

To examine if the structure of significant sub-networks were biologically meaningful, gene lists of each module were tested against all genes present in the corresponding FPAN

(Meta-AD = 14,793; Meta-AD-GATES = 9,955) for GO-term enrichment with the package Ontologizer [34]. We used the ontology structure and annotations provided by the

GO website [35], only considering categories with less than 500 members to avoid associations to major categories that are less informative (i.e. signaling) and excluding the ones "Inferred by Electronic Annotation" (IEA), from "Reviewed Computational Analysis"

(RCA) and with "No biological Data available" (ND), which are characterized by a high rate of false positives [36]. We finally used the parent-child-union algorithm to call for overrepresentation adjusting the p-values with a Westfall-Young Single-Step multiple test correction to avoid additional false positives [37] considering a GO term significantly over- represented when the adjusted p-value was below 0.01.

To examine if the structure of significant sub-networks obtained in the Meta or the ADNI replication dataset were biologically meaningful, gene lists of the first 10 significant modules were tested for pathway enrichment using information from Gene Ontology (GO; http://geneontology.org/) [35] and the Kyoto Encyclopedia of Genes and Genomes database

(KEGG; http://www.genome.jp/kegg/ ) [36]. We initially used the ontology structure and annotations using the package Ontologizer [37], only considering categories with less than

500 members to avoid associations to major categories that are less informative (i.e. signaling) and excluding the ones "Inferred by Electronic Annotation" (IEA), from

"Reviewed Computational Analysis" (RCA) and with "No biological Data available" (ND), which are characterized by a high rate of false positives [38]. In this case, we used the parent-child-union algorithm to call for overrepresentation adjusting the p-values with a

16

Westfall-Young Single-Step multiple test correction, to avoid additional false positives

[39], and considering a GO term significantly over-represented when the adjusted p-value was below 0.01. Similarly, to determine overrepresentation, KEGG pathway enrichment was assessed in the complete set of pathways and components [40], using an hypergeometric test with the phyper function contained in the R statistical package [41].

Formatted: Font: Bold Formatted: Justified Gene expression heatmaps and cluster analysis

To evaluate the expression pattern of selected genes of interest from the network-based analysis, human gene expression profiles were downloaded from the Allen Brain Atlas

(ABA) website (http://www.brain-map.org) [38].[42]. We used the Gene Search web-tool Formatted: Font: Calibri, 11 pt, Spanish (Chile) to enter a list of genes arising from the intersection of sub-networks and analyzed their expression profiles through 27 brain regions. We averaged the expression levels from 46 brain donor individuals (ids. H0351.2001, H0351.2002, H0351.1009, H0351.1012,

H0351.1015 and H0351.10121016) and used the collapseRows R script [39][43] to generate a gene-wise expression dataset. Expression heatmaps and hierarchical clusters were analyzed using Cluster v.3.0 software (http://www.geo.vu.nl/~huik/cluster.htm) and Formatted: Font: Calibri, 11 pt, Spanish (Chile) visualized with the aid of JavaTreeView v.1.1.6r2 software

(http://jtreeview.sourceforge.net) [40]. Identification of genes co-expressed and correlation Formatted: Font: Calibri, 11 pt, Spanish (Chile) analyses were performed with Cluster v.3.0 software[44]. Identification of genes co- expressed and correlation analyses were performed with Cluster v.3.0, using Euclidean distance in conjunction with centroid linkage algorithms, and a correlation coefficient cutoff of r > 0.7 to denote highly correlated gene clusters. Formatted: Font: Bold

17

Formatted: Justified

18

Results

Meta-Analysis

The complete strategy implemented in the present study is shown in Figure 1 Formatted: Justified, Indent: First line: 0.49" and. General features of the datasets used for the meta-analysis (TGen1, NIA-

LOAD/NCRAD and Pfizer) and the replication step (ADNI) are described in Table 1. A totalThe genetic information of 6,5784,569 individuals (3,3992,540 AD cases and

3,1792,029 controls) werewas considered after passing QC thresholds based on MAF,

HWE, MIND and GENO parameters calculated in PLINK (see Methods). To increase sample size and thus the statistic power of the meta-analysis, we implemented the following procedures: First, in the TGen1 data we imputed a total of 1,389,697 SNPs, since it only had 30 % of overlapping SNPs with respect to NIA-LOAD/NCRAD and Pfizer datasets, which were genotyped with different platforms (Affymetrix vs. Illumina, respectively). The output resulted in a total of 1,231,704 QC-passing SNPs (TGen1, Table 1). Second, the

NIA-LOAD/NCRAD dataset, consisted in 5,220 subjects from which we only considered for further analysis a subset of 3,689 individuals (1,837 cases and 1,852 controls) that were self-declared non-Hispanic European Americans, passed principal components analyses and had non-missing phenotypes. Given that this subset is composed by familial data, we performed a kinship correction, since it has been suggested that is the most accurate procedure to counter the type-I error due cryptic relatedness [21]. Finally, to account for bias still present after QC procedures, the SNPs. Additionally, we imputed a total of

1,231,704 and 1,245,964 QC-passing SNPs for the TGen1 and NIA-LOAD/NCRAD datasets, respectively. To account for bias still present after QC procedures, SNP association p-values were further assessed for genomic inflation, which is represented in Q-

Q plots (Figure 2). Notably, allS1). All the datasets yielded an inflation factor (λ) between

19

the acknowledged margins of 0.9 to 1.1, where the contribution of population structure to the genome-wide association is negligible [41].[45]. Taking into account these considerations, we performed a meta-analysis using the inverse variance method and, selecting p-values and ORORs from the fixed effects model, assuming that these studies have been conducted under similar conditions and subjects [42-44]. As shown in Figure

2D, the[46-48]. The combined analysis showed a normal distribution of the p-values with an excess of significant signals seen only at the end of the curve, indicating likely true association events (λ = 1.05, Figure S1).

Whole-genome meta-analysis results are depicted as a Manhattan plot (Figure 32), Formatted: Justified

-8 with a significance threshold defined above –log10(5x10 ), which marks the beginning of genome-wide significant values [45].[49]. In agreement with current reports, the strongest associations were located in a broad genomic region (> 250,000 bp) in the vicinity of the

APOE locus in chromosome 19 (Table 2). In particular, highly significant genome-wide associations signals were observed in the coding region of the translocase of the mitochondrial outer membrane gene (TOMM40: rs2075650, p = 3.9x10-1328.54x10-116, OR

= 3.1; 4.48; rs157580, p = 9.6x10-35, OR = 0.51 and rs8106922, p = 4.5x10-34, OR = 0.6 and rs157580, p = 2.4x10-33, OR = 0.6), 1.17x10-25, OR = 0.57), upstream of the apolipoprotein

C-I gene (APOC1: rs439401, p = 8.82x10-29, OR = 0.54), inside the poliovirus receptor related 2 isoform delta gene (PVRL2: rs6859, p = 7.87x10-28, OR = 1.7 and rs3852861 p =

5.32x10-11, OR = 0.64) and between TOMM40 and the APOE gene (rs405509, p = 1.3x10-

34, OR = 1.7) and upstream of the APOC1 gene (rs439401, p = 1.0x10-25, OR =

0.6).2.29x10-27, OR = 0.57). In addition, in the same chromosomal region we observed genome-wide association signals inside the poliovirus receptor related 2 isoform delta gene

(PVRL2: rs6859, p = 4.1x10-34, OR = 1.6; rs440277, p = 1.3x10-7, OR = 0.8),, downstream

20

of the basal cell adhesion molecule isoform 1 gene (BCAM: rs10402271, p = 4.2x10-

161.98x10-17, OR = 1.446 and rs10405693, p = 2.83x10-12, OR = 1.49) and in the region of the B-cell CLL/lymphoma 3 gene (BCL3: rs8103315, p = 4.4x10-91.87x10-8, OR = 1.45).

We note that the strength of the association signal for some SNPs is derived from the Pfizer and NIA-LOAD/NCRAD datasets only, since these were either not genotyped in the

TGen1 sample or poorly imputed due to the intrinsic array density in that particular region of chromosome 19 in the Affimetrix GeneChip 500K (See "N" column, Table 2).

Interestingly, among the top 2025 associations we detected 2, novel genome-wide marginally significant signals outside chromosome 19 were observed: in chromosome 11,

12 (intergenic: rs249153, p = 4.38 x10-07, OR = 1.41; intergenic: rs249166 and rs249167, both with p = 6.91 x10-07 and OR = 1.40) and in chromosome 5 (intergenic: rs13178362, p

= 6.60x10-07, OR = 0.75), followed by trends inside the membrane-spanning 4-domains, subfamily A, member 3 (hematopoietic cell-specific) genegene in chromosome 11

(MS4A3: rs474951, p = 1.7x10-725x10-6, OR = 0.8;79 and rs528823, p = 2.1x10-7, OR =

0.8); followed by trends in an intergenic region of chromosome 12 (rs10860303, p =

-7 -6 9.4x10 1.55x10 , OR = 0.879) and also within the MDS1 and EVI1 complex locusFanconi Formatted: Default Paragraph Font, Font: Calibri, 11 pt, Spanish (Chile) anemia group D2 gene in chromosome 3 (MECOM: rs9809961, p = 3.2x10-6, OR = 1.3). Formatted: Default Paragraph Font, Font: Calibri, 11 pt, Spanish (Chile) Formatted: st, Font: Calibri, 11 pt, Spanish Finally, suggestive associations within current top genes associated with AD (see (Chile) alzgene.org) were observed for PICALM (rs3851179, p = 2.9x10-5FANCD2: rs1552244, p

= 1.63x10-06, OR = 0.9) and MS4A6A (rs61093276; rs9849434, p = 1.6x10-488x10-06, OR

= 0.9) (Table S271).

21

Pathway Analysis Formatted: Justified

To test the hypothesis that highly-connected sub-networks enriched with minor associations might be significantly overrepresented in AD, we performed a gene-oriented pathway analysis by loading the meta-analysis dataresults into a high confidence functional protein association network (FPAN), following 2 gathered from the STRING database [32].

First, to avoid noise, the analysis was restricted to SNPs with p-value < 0.05 (Table S1).

Second, we calculated whole-gene association values using two alternative approaches: (i)

Meta-AD: based on the extraction of a gene-wise p-value (i.e., corresponding to the single strongest association signal within a gene [17]),(Meta-GW) [17]; and (ii) the derivation of a more stringent gene-based p-value using the extended Simes test (Meta-GATES), which was integratedcombines all association signals within a gene and controls for the bias that could be generated by gene-size or LD structure among markers [31]. Thus, we observed

66,204 SNP association p-values < 0.05 tagging 7,527 genes. Of these, we loaded only

4,891 and 4,647 p-values (from GW and GATES procedures, respectively) into FPAN using STRING scores > 0.7 and resulting in (Table S2), which was composed by 14,793 genes withhaving 229,357 non-redundant high-confidence interactions; and (ii) Meta-AD-

GATES: based on the derivation of a more stringent gene-based association p-value, using the extended Simes test (GATES), which offers an effective control of type-I error regardless of gene-size and LD structure among markers [30]. This time we integrated the data into FPAN using STRING scores > 0.9 to account only for the highest confidence interactions (note that not all genes with minor association values are informative in the database obtaining 9,955 genes with 59,675 interactions. Formally, in either approach

FPAN). Third, we conducted a sub-network search with the Cytoscape JActive Modules

Plug-in [33] (see Methods), restricting the analysis to those genes with p-values < 0.05 and

22

adding this information as a floating-point attribute to the genes present in the FPAN. We repeated this procedureusing the information from GW and GATES procedures permuting the p-values 10,000 times to obtain a create an expected background distribution. Fourth, we obtained the mean score for each analysis (i.e. Meta-AD, Meta-AD-GATES). NextGW and GATES by repeating the search 10 times using a Monte Carlo procedure. Fifth, the corresponding above results were controlled by 2 additionalfurther module searches (see

Figure 1), this time including permuted data andor results without genome wide (WGW) significance. While in the former search, the p-values were permuted 10 times over the entire FPAN to determine if the sub-network structure and its score could be obtained by chance (Meta-ADGW-Permuted, Meta-AD-GATES-Permuted); in the latter, we discarded the possibility that the sub-networks could be the result of bias due to the strong genome- wide significant p-values bias (i.e.within the APOE locus), conducting the same number of searches without these signals with real (Meta-AD-WGW, Meta-AD-GATES-WGW) or permuted data (Meta-AD-WGW-Permuted, Meta-AD-GATES-WGW-Permuted) (.

The result of the global search for overrepresented modules in AD is presented in

Figure 4).

First, comparison between the gene-wise and GATES procedures allowed us to Formatted: Justified obtain a similar number of SNP associations tagging the same number of genes: 3,214 vs.

3,164, respectively (Table S3). Second, . First, the total number of significant sub-networks obtained in the Meta-ADGW analysis (average number = 4732.3, SD = 6.2814) was significantly higher (p = 1.4x10-62.51x10-7), than those arising from chance (Figure 4A3A;

Meta-AD-GW Permuted; average = 22.113.7, SD = 5.974.05). Likewise, the total number of significant sub-networks in the Meta-AD-GATES analysiswas similar to Meta-GW

(average number = 22.534.9, SD = 1.64) was0.99) and significantly higher (p = 3.43x10-

23

5),1.35x10-9) than those obtained by chance (Figure 4C3A; Meta-AD-GATES- Permuted; average = 16.6.1, SD = 2.98). Third7.99). Second, the number of sub-networks was network scores in each procedure were always significantly higher (p = 2.3x10-6) when comparing the Meta-AD-WGW among real vs. the Meta-AD-WGW-permuted results and such differences repeated again using the GATES procedure (p = 2.5x10-4). Fourth, we observed no differences between the numbers of sub-networks detected either in the Meta-

AD or Meta-AD-GATES when including WGW searchesdata (Figure 4A and C),3B).

Third, the number and scores of the modules obtained in the WGW control was similar to the ones obtained with the whole set of associations, indicating that the strongest associations of the meta-analysis did not influence the number of significant modules. present observations (Figure 3A and B). Fourth, the modules obtained in the Meta-GW and

Meta-GATES searches, remained consistent in significance and structure across iterations, changing only in their respective rank/score. Remarkably, the most significant sub-network detected in each approach (Meta-GW SN1: S = 6.14, p-value = 8.16x10-8; and Meta-

GATES SN1: S =5.62, p-value = 1.77x10-5; Figure 3B and Table 3) was identical in structure differing only in the presence of the guanine nucleotide binding protein (G protein), alpha z polypeptide gene (GNAZ), which was absent in Meta-GATES SN1 (Table

S3). Finally, we note that the sub-networks scores in each procedure were significantly higher among real vs. permuted data (Figure 4B and 4D) and that the list of genes included couldcontained in each sub-network was not be replicated in the permutation analysis.analyses and that only 2 out of 688 permuted genes were also seen either in Meta-

GW SN1 or Meta-GATES SN1 (Figure S2). Altogether, these results indicate that the quantity, significance and structure of the modules identified could not be reached by

24

chance, strongly suggesting that these sub-networks could be biologically meaningful in the etiology of AD.

Gene

Glutamate signaling is overrepresented in AD

To examine the above-mentioned hypothesis, gene ontology (GO) term enrichment Formatted: Justified amongwas assessed in the top-5 10 sub-networks identified during the iteration process inin each analysis using the gene-wise procedure (Meta-AD, SN1 through SN5;package

Ontologizer (see Methods). Table 3 presents the top 3 Meta-GW and Figure S1), which remained consistent in significanceMeta-GATES sub-networks, as a function of biological process (BP), cellular component (CC) and structure and that changed only their respective rank/score in the Meta-AD-WGW,molecular function (MF) categories. Meta-GW results indicated that: (i) SN1 was heavily composed by genes acting at the synapse (17/328, p =

9.4x10-1721/461, p = 2.87x10-18), participating in the glutamate receptor signaling pathway

(11/46, 2.67x10-11) and specifically related to glutamate receptor activity (1113/24, p =

5.7x10-16); (ii)1.21x10-18); SN2 was mostly over-represented by genes belonging to cell projection organization (13/225the axon guidance biological process (33/341, p =

1.9x1020x10-9), located mostly at the cell leading edge (10/231, p = 1.8x10-8) and the netrin class of receptors (3/4, p = 3.9x10-7), which are involved in axon guidance; (iii) 12/245, p =

9.35x10-11); and SN3 was over-represented by integrin mediated signaling (10/69, p =

4,3x10-11); (iv) SN4 has a strong core of genes participating in protein dephosphorylation

(16/131, p = 8.5x10-17); and (v) SN5the roundabout signaling pathway (3/3, p = 1.6x10-7).

On the other hand, the results with the alternative and more stringent Meta-GATES procedure showed that SN1 had identical ontological enrichment patterns as observed for

Meta-GW SN1, being glutamate receptor activity the most significant category (13/24,

25

1.69x10-18). Interestingly, Meta-GATES SN2 contained several genes involved in cell metabolism in GO categories such as sterol metabolic and lipid biosynthetic processes

(15/112, p = 5.3x10-10 and 13/460, p = 1.5x10-6, respectively). lipid metabolism including categories such as steroid metabolic process and lipid localization (11/269, p = 2.13x10-9 and 13/221, p = 3.81x10-9, respectively). Finally, Meta-GATES SN3 was composed of genes participating in transmembrane receptor protein kinase activity (12/82, p = 1.18x10-

12), growth factor binding (9/99, p = 3.42x10-12), protein autophosphorylation (16/174,

2.30x10-12) and located mainly at the synapse (7/461, 3.28x10-6). Sub-networks features and components are described in Table S3 and the complete set of ontologies overrepresented in the first top 10 sub-networks is provided in Table S4.

Similar GO term enrichment analysis with the alternative and more stringent

GATES procedure (i.e. Meta-AD-GATES, SNG1 through SNG3; Table 3 and Figure S2) revealed that glutamate signaling was again significantly overrepresented in two out of the three significant sub-networks (Figure 4D). Indeed, glutamate receptor signaling pathway, as a GO biological process, was the sole component in the SNG1 module (6/40, p = 1,5x10-

6) and the third significant GO category in the SNG2 sub-network (8/40, p = 9.7x10-8), which was also composed by fluid transport (7/32, p = 1.2x10-8) and multicellular organismal signaling (16/412, p = 2.5x10-8) processes. Likewise, glutamate receptor activity was also found overrepresented as a molecular function in the SNG2 sub-network

(6/18, p = 4.8x10-8), along with cyclase and lyase activities (5/12, p = 1.6x10-9 and 5/72, p

= 2.3x10-5, respectively). Finally, a full set of lipid metabolic categories was found to be significantly over-represented in the SNG3 sub-network, including regulation of plasma lipoprotein particle levels (10/42, p = 2.6x10-13) and apolipoprotein binding (7/14, p =

2.4x10-12), confirming and extending previous GO categories observed with the gene-wise

26

p-value strategy (SN5). Gene composition of the significant sub-networks obtained is presented in Table S4 and their interactions arising from the FPAN are depicted in Figure 5.

Logical relationships of gene components between SN5 and SNG3 modules, related to lipid/apolipoprotein metabolism showed that despite the large number of genes included in each sub-network (49 and 44 genes, respectively; Table S4) there were only two genes:

APOE and ABCA1, shared among them (Figure 5 and Figure S3A). Conversely, logical relationships among sub-networks SN1, SNG1 and SNG2, related to glutamate signaling, revealed that there were 20 components shared by at least 2 modules and 5 genes between all three modules (Figure 5 and Figure S3B). These 20 genes of interest included the ionotropic glutamate receptors GRIN2A, GRIN2B and GRIK1, the metabotropic glutamate receptors GRM1, GRM3, GRM4, GRM7 and GRM8, its downstream effectors HOMER1 and HOMER2, as well as the scaffold proteins SHANK1 and SHANK2, which are required for proper formation and function of neuronal synapses [46].

Replication of glutamate signaling in the ADNI dataset

Considering that glutamate signaling pathway components were consistently present in significant sub-networks enriched with minor associations to AD, both in the Meta-GW and Meta-GATES analyses, and since both procedures were originated from a single set of

SNP associations, we next interrogated the ADNI dataset under the same pipeline (Figure

1), as an attempt to replicate the results in an independent sample of AD individuals. This additional dataset was composed of 693 subjects of which 499 were cases and 194 were controls (Table 1). Genetic association values were calculated replicating the quantitative

27

trait locus (QTL) method reported in the original study [26], which is based on the composite memory score, a measure of the level of memory impairment, reported for each patient (see Methods). Although, the ADNI case cohort includes subjects with mild cognitive impairment (MCI), the phenotype is considered a transitional state with significant risk of progression to clinically diagnostic AD [50], which validates their inclusion. After the corresponding QC procedures, the ADNI dataset showed no significant genomic inflation (λ = 1.02, Figure S1). According to was described in the original publication, our results indicate that QTL testing yielded 25,785 SNP associations (p-value

<0.05), tagging 4,915 genes (Table S5), did not reach genome wide significant levels.

Marginal associations were observed within the dual specificity phosphatase 23 (DUSP23) gene in chromosome 1 (rs1129923, p = 1.07x10-6), the 3'-phosphoadenosine 5'- phosphosulfate synthase 1 (PAPSS1) gene (rs9569, p = 6.84x10-6) and in the phosphatidylinositol-4-phosphate 3-kinase, catalytic subunit type 2 gamma (PIK3C2G) gene (rs10841025, p = 9.01x10-6), as well as association signals in intergenic regions on chromosome 17 and 3 (rs9890008, p = 4.25x10-6 and rs4857008, p = 5.88x10-6, respectively).

We introduced 3,244 and 3,113 p-values, ADNI-GW and ADNI-GATES respectively (Table S2), into the same FPAN used for the Meta-analysis and module search was carried out with their respective null datasets (ADNI-GW-Permuted, ADNI-GATES-

Permuted), since in the absence of genome wide significant results, the WGW control was not necessary. In general agreement with the meta-analysis data, global results indicated that the number of significant modules obtained in either ADNI-GW (average number =

30.3, SD = 2.00) or ADNI-GATES (average number = 24.4, SD = 1.7), was significantly higher (p = 7.01x10-3 and p = 1.80x10-5, respectively), than those obtained by chance

28

(Permuted; Figure 3C). Interestingly, when comparing the scores of the first 10 sub- networks only the ones belonging to the ADNI-GW analysis remained significantly above their respective permuted ones (Figure 3D) and thus were considered for further analysis

(Table S3).

GO term enrichment in ADNI indicated that genes belonging to categories such as voltage-gated calcium channel complex and ion channel complex were significantly overrepresented in AD (p = 1.24x10-8 and p = 9.24x10-7, respectively; see also Table S3 and Table S4 for complete ontological results). Moreover, we replicated multiple modules enriched with glutamate signaling genes (Table 4), including modules SN3 (S = 5.23, p =

5.09x10-8), SN4 (S= 4.94, p = 7.14x10-8) and SN7 (S = 4.38, p-value = 3.40x10-8).

Individual sub-network structure is presented in Figure S3.

29

KEGG pathway enrichment

KEGG provided information regarding 280 pathways involving 6,733 genes.

Although in comparison with the GO database the amount of information provided by

KEGG is substantially reduced, the fact that each annotation is manually curated makes any association much more reliable [51]. Notably, throughout this analysis we detected that glutamate signaling was again the main overrepresented biological process in both Meta-

GW SN1 (p = 1.10x10-28) and Meta-GATES SN1 (p = 5.94x10-29) sub-networks (Table

S6). Glutamatergic synapse, as a KEGG pathway category, was also significantly associated in ADNI sub-networks SN3, SN4 and SN7 (p = 1.20x10-06, p = 1.32x10-06 and p

= 8.18x10-10, respectively; Table S6). Finally, logical and structural relationships of all sub- networks enriched with glutamate signaling genes from the meta-analysis and the ADNI- replication dataset allowed us to define a list of genes of interest, which were shared at least by 3 sub-networks (Figure 4). The list was composed by 20 signaling components, including membrane-anchored ionotropic (GRIN2A, GRIN2B, GRID2, GRIA1 and

GRIA2) and metabotropic glutamate receptors (GRM1, GRM3, GRM7 and GRM8), intracellular downstream effectors CAMK2A and AKAP5, as well as scaffold proteins

SHANK1 and SHANK2, which are required for proper formation and function of neuronal synapses.[52] The functional relationships of these signaling components in the context of a glutamatergic synapse is shown in Figure 5.

Expression of glutamate-signaling componentsgenes in the human brain Formatted: Justified

To assess for underlying At the physiological level, to explore if there was a transcriptional relationships of these genes of interestrelationship among the glutamate signaling genes previously identified, we examined their expression profiles in 27 normal human brain

30

regions, using the information from the Allen Brain Atlas [42], as a reference [38] (Figure

S4). Notably, . Clearly, the expression pattern of these components clustered specifically in brain regions tightly related withto AD pathology, such as the hippocampal formation, the striatumhypothalamus and white matter [47-49][53-55] (Figure 6). While it is well known that glutamate signaling is active in these brain regions [50-52]domains [56-58], it iswas interesting to note that find out clusters with high- and low-expression clusters were identified.levels. For instance, GRIA1, GRIA2, CAMK2A, GRIN2B and GRM7 were found among highly expressed gene clusters in the hippocampal formation seven components, including GRIN2A, GRIN2B, GRM7, SHANK1, HOMER1, GNAQ and

MAP2K4, were found in a cluster of highly expressed genes (r = 0.8294), particularly in the CA2-CA3 and CA4 region (Figure 6A). On the other hand, 8 genes were grouped into 2 low-expression clusters (cluster 1: GRM3, GRM4), while GRM3, LPHN3, GRID2 and

SLC9A9 were grouped in a low-expression cluster (r = 0.87). Low-expression clusters were also observed in the hypothalamus (LSP1, GRM3, GRM1, GRIN2A, ITPR1, AKAP5 and

ATP2B2; Figure 6B), the dorsal thalamus (SHANK2, SHANK1, BAI3, PIAS1, GRIA1,

GRID2 and GRIA2; Figure 6C) and also distinguished in the white matter (GRIA2,

AKAP5, GRIN2A, GRIN2B, CAMK2A, SHANK2, GRM8 and LPHN3; cluster 2:

SHANK2, GNAS, GNAO1 and NGF). Similarly, two clusters of high- and low-expression genes were also identified in the striatum (GRM3 and GRM4 in a high-expression cluster, r

= 0.95; and GRM1, GRIK1, PIK3CA, MAP2K4, HOMER2 and, GRM1, GRM7, BAI3,

LPHN3, ITPR1, SHANK1 in a low-expression cluster, r = 0.77and ATP2B2; Figure

6B6D). Interestingly, we observedthere was an inverse relationship in the expression pattern in of a subset of these two brain regionsgenes, since genes groupedcomponents highly expressed in the hippocampus were found in low-expression clusters in the

31

hippocampal formation are found in high-expression clusters in the striatum (i.e. white matter, and vice versa (i.e. GRIA2, CAMK2 and GRIN2B vs. GRM3 and GRM4), and vice-versa (i.e. SHANK1 and MAP2K4). Finally, two clusters of high- and low-expression genes were clearly distinguished in the white matter (Figure 6C), suggesting that only a subset of components are essential for glutamate signaling in this brain region. SLC9A9).

Altogether these results indicate that glutamate signaling components are differentially expressed in restricted brain domains for proper neuronal or glial functional activity (see also Figure 5).

32

Discussion Formatted: Justified

In agreement with previous GWAS in AD [8,13-16,19][8,13-16,19] our meta- analysis detected strong genome-wide association signals in a 250 kb window of chromosome 19, centered in the coding/regulatory region of the TOMM40 gene, in close proximity to the APOE locus, and that also included significant signals in the PVRL2,

APOC1, BCAM and BCL3 genes. While it has been suggested that the association of such extended region may reflect that other variants in LD with APOE may be of pathogenic importance, particularly a poly-T track in the TOMM40 gene [53,54][59,60], recent studies have shown that APOE alleles account for essentially all the inherited risk of AD associated in this region [55]. Remarkably, besides[61]. Besides the signal in chromosome 19, we detected marginal associations of 2 novel SNPs in the MS4A3 gene, located in a wide LD region in the long arm of chromosome 11, distant ca. 180 kb fromcontaining a cluster of

SNPs in the MS4A6A/MS4A4E regionloci in the long arm of chromosome 11 (i.e. rs610932, rs670139, rs1562990 and, rs4938933 and rs983392), which reached genome- wide significant levels by other recent studies [15,56][16,62,63].

Assuming that the APOE locus is the major genetic hallmark associated with the disease and that it does not explain the entire susceptibility of AD [1,7-9][1,7-9], we conducted a network-based pathway analysis with our meta-analysis results to explore the biology behind variants with minor effect size variants from our meta-analysis. Initially, to integrate the whole genetic contribution from the meta-analysis we used a gene-wise p- value (GW, single min p-value method) that has been widely applied in detecting novel associations using GWAS data [17,57].[17,64]. Although this approach has a certain bias

33

for pathways enriched with larger genes and does not consider intergenic associations or

LD structure, we note that the random permutation of p-values yielded a distribution of results expected by chance from where the actual data wascould be compared. Likewise, taking into accountwith the search for significant sub-networks with real and permuted data, and additionally with the WGW control, we believe that the actual contribution of thesethe aforementioned problems to the final result is supersededstrongly surpassed by the combination of true minor effect size variants. Still, we considered appropriated the introduction of the GATES procedure that is specifically designed to address directly address the gene size and LD structure issues and thus we ended up with a more stringent gene-oriented p-value. We introduced these new p-values into FPAN, but this time selected Formatted: Font color: Red only the highest confidence interactions in the STRING database (score >0.9). This approach served as an internal control of our network-based pathway analysis and led us to replicate lipid/apolipoprotein metabolism as a key biological process involved in AD etiology [58] and to identify GO sub-networks significantly enriched with genes participating in glutamate signaling at the synapse.Notably, through this approach we replicated essentially the same sub-network (i.e. GW and GATES SN1), which was populated by genes related to glutamate signaling, differing only in the absence of GNAZ gene whose only association in the meta-analysis (Table S1; rs4820537, p = 0.02096) was found not informative in the GATES procedure. Glutamate signaling was further replicated in the ADNI dataset and this time it reached significant association levels in three sub- networks (ADNI-GW SN3, SN4 and SN7).

From a biological point of view, the relationship of these genetic observations with Formatted: Font color: Red current knowledge about AD is straightforward. Indeed, since the identification of the

APOE-ε4 allele as the mayor risk factor in AD [5,6], lipid/apolipoprotein metabolism

34

(represented here as SN5 and SNG3 modules) has been intensively studied. For instance, aberrant lipid metabolism in AD brains is thought to affect the stability of lipid rafts where

APP processing by β- and γ-secretase is enhanced [59]. Likewise, APOE and APOE receptors have been shown to interact and modulate the aggregation and/or clearance of Aβ oligomers [60]. Considering that APOE alleles show dissimilar efficiencies in lipid homeostasis [61] and that other genetic risk variants have been reported associated with AD within this pathway – specifically in the SORL1 [62] and the LRP6 [63] genes – we consider that the identification of additional minor risk variants among genes reported in

SN5 and SNG3 modules is feasible.

Glutamate signaling has been reported to regulate multiple biological processes, Formatted: Justified including fast excitatory synaptic transmission, neuronal growth and differentiation, synaptic plasticity and learning and memory [64,65]., learning and memory [65,66].

Degenerating neurons and synapses in AD brains are usually located within regions that project to or from areas displaying high densities of A plaques and tangles [66]. In[67] and in this regard, glutamatergic neurons located in the hippocampus, as well as in other areas of the brain, are severely affected by these neurotoxic insults [64,66].[65,67].

Likewise, it has been established that there is a relationship between glutamate receptor signaling and soluble Aβ oligomers in the hippocampus, affecting their expression and recycling and, which leads to long term depression, synaptic loss and ultimately to cognitive deficit [67,68].[68,69]. Moreover, sustained activation of glutamate receptors at the synapse rise Ca2+ influxes and second messenger levels activating neuronal nitric oxide synthase (NO) and), increasing reactive nitrogen and oxygen species, thus contributing to neuronal damage independently of the presence of Aβ oligomers [69]. Interestingly, here

35

we have found[70]. Alternatively, and from a genomic perspective, here we provide strong evidence that common genetic variants within a complete set of genes acting as ionotropic/metabotropic glutamate receptors and its downstream effectors are associated in a network context with AD. Accordingly, and as an independent confirmation of our findings, it has been recently reported that pathways related to neurotransmitter receptor- mediated calcium signaling and long-term potentiation are similarly associated with mild cognitive impairment and AD [70].[26]. In addition we have found that the expression Formatted: Font: Times New Roman, 12 pt, English (U.S.) pattern of these glutamate signaling genes cluster specifically in the hippocampal formation Formatted: Font color: Red and striatum. Therefore, considering that thesein specific brain regions, which are known to be affected during the development of the disease, our results indicate thatsuch as the hippocampal formation and the hypothalamus. Therefore, it will be interesting to learn if the genes identified through theour network-based pathway approach are interacting within a functional networkspatially and are coordinately expressed in phenotype-related tissues

(timely and regionally), supportingmodulated at the transcriptional or post-transcriptional level as a result of various trophic or toxic stimuli. Finally, our data extends the notion that the remaining genetic risk for complex traits, such as AD, is likely explained by the accumulation of functional genetic variants inside an entire pathway, rather than by punctual independent mutations.

36

Formatted: Justified

37

Acknowledgments Formatted: Font color: Auto Formatted: Font: Bold We would like to thank the Translational Genomics Research Institute (TGen) and the Pfizer Inc. Pharmaceutical Company (Pfizer), and all contributors associated who kindly provided access to the genotypic and phenotypic association data. Likewise we thank the Joint Addiction, Aging, and Mental Health (JAAMH) Data Access Committee of the Database of Genotypes and Phenotypes (dbGaP) that allowed to access the National

Institute of Aging (NIA) and National Cell Repository For Alzheimer disease (NIA-

LOAD/NCRAD) dataset under the accession number phs000168.v1.p1, which was collected and analyzed under the supervision of Richard Mayeux MD, MSc, Columbia

University, New York, NY, USA and Tatiana Foroud PhD, from the NCRAD and Indiana

University, Indianapolis, IN, USA. Full list of co-investigators is provided at http://www.ncbi.nlm.nih.gov/projects/gap/cgi- bin/study.cgi?study_id=phs000168.v1.p1.http://www.ncbi.nlm.nih.gov/projects/gap/cgi- bin/study.cgi?study_id=phs000168.v1.p1. The genotyping, cleaning and harmonization of the NIA-LOAD/NCRAD dataset was carried out at the Johns Hopkins University Center for Inherited Disease Research (CIDR), Baltimore, MD, USA. Additionally, data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative

(ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug

Discovery Foundation; BioClinica, Inc.; Biogen Idec Inc.; Bristol-Myers Squibb Company;

Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; GE Healthcare; Innogenetics, N.V.; IXICO

38

Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson &

Johnson Pharmaceutical Research & Development LLC.; Medpace, Inc.; Merck & Co.,

Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Novartis Pharmaceuticals

Corporation; Pfizer Inc.; Piramal Imaging; Servier; Synarc Inc.; and Takeda Pharmaceutical

Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern

California Institute for Research and Education, and the study is coordinated by the

Alzheimer's Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern

California. Finally, we wish to thank the participation of the patients and their families that ultimately have made possible this study. ThisOur work was supported by Grants from the

Chilean Government FONDECYT 1100942 and FONDAP 15090007 to G.V.D. E.P.-P. and M.E.A. areCONICYT supported by doctoral fellowships from CONICYT. B.I.B. is supported by afor E.P.-P. and M.E.A. MECESUP UAB0802 doctoral fellowship by

MECESUP UAB0802supports B.I.B.

Formatted: Justified

39

References

1. Bettens K, Sleegers K, Van Broeckhoven C (2010) Current status on Alzheimer disease molecular Formatted: Spanish (Chile) genetics: from past, to present, to future. Hum Mol Genet 19: R4-R11. 2. Lambert JC, Schraen-Maschke S, Richard F, Fievet N, Rouaud O, et al. (2009) Association of plasma amyloid beta with risk of dementia: the prospective Three-City Study. Neurology 73: 847-853. 3. Hardy J, Selkoe DJ (2002) The amyloid hypothesis of Alzheimer's disease: progress and problems on the road to therapeutics. Science 297: 353-356. 4. Guerreiro RJ, Gustafson DR, Hardy J (2012) The genetic architecture of Alzheimer's disease: beyond APP, PSENs and APOE. Neurobiol Aging 33: 437-456. 5. Saunders AM, Strittmatter WJ, Schmechel D, George-Hyslop PH, Pericak-Vance MA, et al. (1993) Association of apolipoprotein E allele epsilon 4 with late-onset familial and sporadic Alzheimer's disease. Neurology 43: 1467-1472. 6. Strittmatter WJ, Saunders AM, Schmechel D, Pericak-Vance M, Enghild J, et al. (1993) Apolipoprotein E: high-avidity binding to beta-amyloid and increased frequency of type 4 allele in late-onset familial Alzheimer disease. Proc Natl Acad Sci U S A 90: 1977-1981. 7. Kamboh MI (2004) Molecular genetics of late-onset Alzheimer's disease. Ann Hum Genet 68: 381-404. 8. Coon KD, Myers AJ, Craig DW, Webster JA, Pearson JV, et al. (2007) A high-density whole- genome association study reveals that APOE is the major susceptibility gene for sporadic late-onset Alzheimer's disease. J Clin Psychiatry 68: 613-618. 9. Meyer MR, Tschanz JT, Norton MC, Welsh-Bohmer KA, Steffens DC, et al. (1998) APOE genotype predicts when--not whether--one is predisposed to develop Alzheimer disease. Nat Genet 19: 321-322. 10. Bertram L, McQueen MB, Mullin K, Blacker D, Tanzi RE (2007) Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat Genet 39: 17-23. 11. Cowperthwaite MC, Mohanty D, Burnett MG (2010) Genome-wide association studies: a powerful tool for neurogenomics. Neurosurg Focus 28: E2. 12. Zaykin DV, Zhivotovsky LA (2005) Ranks of genuine associations in whole-genome scans. Genetics 171: 813-823. 13. Hu X, Pickering E, Liu YC, Hall S, Fournier H, et al. (2011) Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS One 6: e16616. 14. Harold D, Abraham R, Hollingworth P, Sims R, Gerrish A, et al. (2009) Genome-wide association study identifies variants at CLU and PICALM associated with Alzheimer's disease. Nat Genet 41: 1088-1093. 1415. Lambert JC, Heath S, Even G, Campion D, Sleegers K, et al. (2009) Genome-wide association study identifies variants at CLU and CR1 associated with Alzheimer's disease. Nat Genet 41: 1094-1099. 1516. Hollingworth P, Harold D, Sims R, Gerrish A, Lambert JC, et al. (2011) Common variants at ABCA7, MS4A6A/MS4A4E, EPHA1, CD33 and CD2AP are associated with Alzheimer's disease. Nat Genet 43: 429-435. 16. Hu X, Pickering E, Liu YC, Hall S, Fournier H, et al. (2011) Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS One 6: e16616.

40

17. Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, et al. (2009) Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet 18: 2078-2090. 18. Lechner M, Hohn V, Brauner B, Dunger I, Fobo G, et al. (2012) CIDeR: multifactorial interaction networks in human diseases. Genome Biol 13: R62. 19. Reiman EM, Webster JA, Myers AJ, Hardy J, Dunckley T, et al. (2007) GAB2 alleles modify Alzheimer's risk in APOE epsilon4 carriers. Neuron 54: 713-720. 20. Lee JH, Cheng R, Graff-Radford N, Foroud T, Mayeux R (2008) Analyses of the National Institute on Aging Late-Onset Alzheimer's Disease Family Study: implication of additional loci. Arch Neurol 65: 1518-1526. 21. Wijsman EM, Pankratz ND, Choi Y, Rothstein JH, Faber KM, et al. (2011) Genome-wide association of familial late-onset Alzheimer's disease replicates BIN1 and CLU and nominates CUGBP2 in interaction with APOE. PLoS Genet 7: e1001308. 22. Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, et al. (2007) The NCBI dbGaP database of genotypes and phenotypes. Nat Genet 39: 1181-1186. 23. The_International_HapMap_Consortium (2005) A haplotype map of the human genome. Nature 437: 1299-1320. 24. Li Y, Willer CJ, Ding J, Scheet P, Abecasis GR (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet Epidemiol 34: 816-834. 25. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559-575. 26. Choi Y, Wijsman EM, Weir BS (2009) Case-control association testing in the presence of unknown relationships. Genet Epidemiol 33: 668-678. 27.Ramanan VK, Kim S, Holohan K, Shen L, Nho K, et al. (2012) Genome-wide pathway analysis of Formatted: Space After: 0 pt memory impairment in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort implicates gene candidates, canonical pathways, and networks. Brain Imaging Behav 6: 634-648. 27. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904-909. 28. Team R (2012) R: A language and environment for statistical computing. R Foundation for Statistical Computing Vienna Austria. 2829. Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, et al. (2004) The UCSC Table Browser data retrieval tool. Nucleic Acids Res 32: D493-496. 2930. Moher D, Liberati A, Tetzlaff J, Altman DG, Group P (2009) Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med 6: e1000097. 3031. Li MX, Gui HS, Kwan JS, Sham PC (2011) GATES: a rapid and powerful gene-based association test using extended Simes procedure. Am J Hum Genet 88: 283-293. 3132. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, et al. (2011) The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 39: D561-568. 3233. Cline MS, Smoot M, Cerami E, Kuchinsky A, Landys N, et al. (2007) Integration of biological networks and gene expression data using Cytoscape. Nat Protoc 2: 2366-2382. 3334. Ideker T, Ozier O, Schwikowski B, Siegel AF (2002) Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics 18 Suppl 1: S233-240. 35. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258-261.

41

36. 34Arakawa K, Kono N, Yamada Y, Mori H, Tomita M (2005) KEGG-based pathway visualization tool for complex omics data. In Silico Biol 5: 419-423. 37. Grossmann S, Bauer S, Robinson PN, Vingron M (2007) Improved detection of overrepresentation of Gene-Ontology annotations with parent child analysis. Bioinformatics 23: 3024-3031. 38. 35. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, et al. (2004) The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res 32: D258-261. 36. Warde-Farley D, Donaldson SL, Comes O, Zuberi K, Badrawi R, et al. (2010) The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res 38: W214-220. 3739. Westfall PH, Young SS (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment: Wiley-Interscience. 3840. Kanehisa M, Goto S (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28: 27-30. 41. Team R (2010) R: A language and environment for statistical computing. R Foundation for Statistical Computing Vienna Austria. 42. Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, et al. (2012) An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489: 391-399. 3943. Miller JA, Cai C, Langfelder P, Geschwind DH, Kurian SM, et al. (2011) Strategies for aggregating gene expression data: the collapseRows R function. BMC Bioinformatics 12: 322. 4044. Saldanha AJ (2004) Java Treeview--extensible visualization of microarray data. Bioinformatics 20: 3246-3248. 4145. Melum E, Franke A, Karlsen TH (2009) Genome-wide association studies--a summary for the clinical gastroenterologist. World J Gastroenterol 15: 5377-5396. 4246. Fleiss JL, Gross AJ (1991) Meta-analysis in epidemiology, with special reference to studies of the association between exposure to environmental tobacco smoke and lung cancer: a critique. J Clin Epidemiol 44: 127-139. 4347. DerSimonian R, Laird N (1986) Meta-analysis in clinical trials. Control Clin Trials 7: 177-188. 4448. Ades AE, Lu G, Higgins JP (2005) The interpretation of random-effects meta-analysis in decision models. Med Decis Making 25: 646-654. 4549. Barsh GS, Copenhaver GP, Gibson G, Williams SM (2012) Guidelines for genome-wide association studies. PLoS Genet 8: e1002812. 4650. Aisen PS, Petersen RC, Donohue MC, Gamst A, Raman R, et al. (2010) Clinical Core of the Alzheimer's Disease Neuroimaging Initiative: progress and plans. Alzheimers Dement 6: 239-246. 51. Holmans P (2010) Statistical methods for pathway analysis of genome-wide data for association with complex genetic traits. Adv Genet 72: 141-179. 52. Grabrucker AM, Schmeisser MJ, Schoen M, Boeckers TM (2011) Postsynaptic ProSAP/Shank scaffolds in the cross-hair of synaptopathies. Trends Cell Biol 21: 594-603. 4753. Fotuhi M, Do D, Jack C (2012) Modifiable factors that alter the size of the hippocampus with ageing. Nat Rev Neurol 8: 189-202. 4854. de Jong LW, Ferrarini L, van der Grond J, Milles JR, Reiber JH, et al. (2011) Shape abnormalities of the striatum in Alzheimer's disease. J Alzheimers Dis 23: 49-59. 4955. de Leeuw FE, Barkhof F, Scheltens P (2005) Progression of cerebral white matter lesions in Alzheimer's disease: a new window for therapy? J Neurol Neurosurg Psychiatry 76: 1286- 1288.

42

5056. Alix JJ, Domingues AM (2011) White matter synapses: form, function, and dysfunction. Neurology 76: 397-404. 5157. Tamminga CA, Southcott S, Sacco C, Wagner AD, Ghose S (2012) Glutamate dysfunction in hippocampus: relevance of dentate gyrus and CA3 signaling. Schizophr Bull 38: 927-935. 5258. Xu J, Kurup P, Nairn AC, Lombroso PJ (2012) Striatal-enriched protein tyrosine phosphatase in Alzheimer's disease. Adv Pharmacol 64: 303-325. 5359. Cruchaga C, Nowotny P, Kauwe JS, Ridge PG, Mayo K, et al. (2011) Association and expression analyses with single-nucleotide polymorphisms in TOMM40 in Alzheimer disease. Arch Neurol 68: 1013-1019. 5460. Roses AD, Lutz MW, Amrine-Madsen H, Saunders AM, Crenshaw DG, et al. (2010) A TOMM40 variable-length polymorphism predicts the age of late-onset Alzheimer's disease. Pharmacogenomics J 10: 375-384. 5561. Jun G, Vardarajan BN, Buros J, Yu CE, Hawk MV, et al. (2012) Comprehensive Search for Alzheimer Disease Susceptibility Loci in the APOE Region. Arch Neurol: 1-10. 5662. Antunez C, Boada M, Gonzalez-Perez A, Gayan J, Ramirez-Lorca R, et al. (2011) The membrane-spanning 4-domains, subfamily A (MS4A) gene cluster contains a common variant associated with Alzheimer's disease. Genome Med 3: 33. 5763. Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, et al. (2013) Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer's disease. Nat Genet 45: 1452-1458. 64. Torkamani A, Topol EJ, Schork NJ (2008) Pathway analysis of seven common diseases assessed by genome-wide association. Genomics 92: 265-272. 58. Jones L, Holmans PA, Hamshere ML, Harold D, Moskvina V, et al. (2010) Genetic evidence implicates the immune system and cholesterol metabolism in the aetiology of Alzheimer's disease. PLoS One 5: e13950. 59. Shobab LA, Hsiung GY, Feldman HH (2005) Cholesterol in Alzheimer's disease. Lancet Neurol 4: 841-852. 60. Bu G (2009) Apolipoprotein E and its receptors in Alzheimer's disease: pathways, pathogenesis and therapy. Nat Rev Neurosci 10: 333-344. 61. Rapp A, Gmeiner B, Huttinger M (2006) Implication of apoE isoforms in cholesterol metabolism by primary rat hippocampal neurons and astrocytes. Biochimie 88: 473-483. 62. Rogaeva E, Meng Y, Lee JH, Gu Y, Kawarai T, et al. (2007) The neuronal sortilin-related receptor SORL1 is genetically associated with Alzheimer disease. Nat Genet 39: 168-177. 63. De Ferrari GV, Papassotiropoulos A, Biechele T, Wavrant De-Vrieze F, Avila ME, et al. (2007) Common genetic variation within the low-density lipoprotein receptor-related protein 6 and late-onset Alzheimer's disease. Proc Natl Acad Sci U S A 104: 9434-9439. 6465. Mattson MP (2008) Glutamate and neurotrophic factors in neuronal plasticity and disease. Ann N Y Acad Sci 1144: 97-112. 6566. Yang JL, Sykora P, Wilson DM, 3rd, Mattson MP, Bohr VA (2011) The excitatory neurotransmitter glutamate stimulates DNA repair to increase neuronal resiliency. Mech Ageing Dev 132: 405-411. 6667. Revett TJ, Baker GB, Jhamandas J, Kar S (2012) Glutamate system, amyloid ss peptides and tau protein: functional interrelationships and relevance to Alzheimer disease pathology. J Psychiatry Neurosci 38: 6-23. 6768. Almeida CG, Tampellini D, Takahashi RH, Greengard P, Lin MT, et al. (2005) Beta-amyloid accumulation in APP mutant neurons reduces PSD-95 and GluR1 in synapses. Neurobiol Dis 20: 187-198.

43

6869. Shankar GM, Bloodgood BL, Townsend M, Walsh DM, Selkoe DJ, et al. (2007) Natural oligomers of the Alzheimer amyloid-beta protein induce reversible synapse loss by modulating an NMDA-type glutamate receptor-dependent signaling pathway. J Neurosci 27: 2866-2875. 6970. Nakamura T, Lipton SA (2009) Cell death: protein misfolding and neurodegenerative Formatted: Space After: 10 pt diseases. Apoptosis 14: 455-468.

70. Ramanan VK, Kim S, Holohan K, Shen L, Nho K, et al. (2012) Genome-wide pathway analysis of Formatted: Space After: 0 pt memory impairment in the Alzheimer's Disease Neuroimaging Initiative (ADNI) cohort implicates gene candidates, canonical pathways, and networks. Brain Imaging Behav 6: 634-648.

44

45

Figure Legends Formatted: Justified

Figure 1. Study strategy. To properly combine AD GWAS datasets we performed the kinship correction for theGenotype imputation was carried out in the Tgen1 and NIA-

LOAD/NCRAD dataset, due to its familial composition (upper-left box), and a genotype imputation for the TGen1 dataset (upper-right box). Afterdatasets (asterisk). The meta- analysis was conducted using the inverse variance method in PLINK, after pruning bad genotypes and samples using standard quality control (QC) tests, the inverse variance method was applied to perform the meta-analysis.. The ADNI dataset was included for replication following similar QC procedures at this step. Meta-analysis (dark grey arrows) and ADNI associations obtained (light grey arrows) results were annotated and the Gene-

Wise and GATES single gene p-values were calculated using the Gene-Wise (GW) method or the more stringent GATES procedure (threshold p< < 0.05) for each gene and then). We next introduced this information to their respective FPAN (Middle blue square; scores: >0.7 and >0.9). Finally, modulefrom STRING database). Module search was performed 10 times, side by side with the permuted data and without genome-wide-significant (WGW) results, which served as internal controls. The significantSignificant sub-networks obtained(white squares) were compared and assayed for gene ontology (GO) term and

KEGG pathways enrichment with Ontologizer obtainingto obtain the final overrepresented

GO termspathways associated with AD, inside each sub-network associated to AD. Equal results between Meta and WGW analysis (“=”) that could not be obtained with the permuted control (“≠”) were expected.

Figure 2. Quantile-Quantile (Q-Q) plots for GWAS datasets and combined meta- analysis. Comparison of the association results for each SNP (black dots) with those

46

expected by chance (red line) in TGen1 imputed (A), NIA-LOAD/NCRAD (B), Pfizer (C) and the final meta-analysis (D). In each dataset, the genomic inflation factor (λ) is shown.

Values of λ between 0.9 and 1.1 are considered unbiased by the population structure.

Figure 32. Genome wide meta-analysis results in AD. Manhattan plot showing the p- Formatted: Justified values obtained in the meta-analysis. The end and beginning of a chromosome is denoted by the change of color pattern of the SNPs (black, grey and brown dots). Genome-wide significance threshold is denoted by a red line (5.0x10-8). The Y axis has been truncated to show all associated SNPs inside the APOE loci and to improve visualization of suggestive associations (.rs474951 and rs528823 inside MS4A3 gene). Formatted: Font: Arial, 9 pt, Font color: Black

Formatted: Font color: Red

Figure 43. Sub-network search results. (A) Total meanThe number of significant sub- networks (size < 50 and score > 3) obtained for Gene-Wise (right panelin Meta-GW (light green) and Meta-GATES (left panel) procedures using Cytoscape JActive modules sorfwaredark green) is shown compared with: real data (Meta-AD, Meta-AD-GATES); real data without genome wide significant results (Meta-AD-WGW, Meta-AD-GATES-WGW), real data same values permuted (across the FPAN: Meta-ADGW-Permuted, (light grey) and Meta-AD-GATES-Permuted) and real data permuted without genome wide significant results (Meta-AD-WGW-Permuted, Meta-AD-GATES-WGW-Permuted). (dark grey). (B)

Score comparison of the top 10 sub-networks obtained in the corresponding module searches presented in (A). (C) The number of significant sub-networks in the replication step for ADNI-GW (light blue) and ADNI-GATES (dark blue) analysis, in comparison with their corresponding permuted controls: ADNI-GW-Permuted (light grey) and ADNI-

GW-Permuted (dark grey). (D) Score comparison of the top 10 sub-networks obtained for

47

each module searches presented in (C). Caped bar/points denote SD; Significant differences between real and permuted data observed in GW and GATES analysis are denoted by an asterisk and those between real and permuted data observed only in GW analysis are denoted by a plus sign (two-sided Student’s t-test; p < 0.01) and caped bar denotes SD.).

Formatted: Font color: Red

Figure 5. OntologicalFigure 4. Structure and relationships between glutamate signaling sub-networks overrepresented in AD. Selected sub-networks from the Gene- Formatted: Font: Bold

Wise (SN1 and SN5) and GATES approximation (SNG1, SNG2, SNG3) arise from the

FPAN represented by the Sub-network in the middle based upon the information provided by STRING database. Top sub-network gene composition (nodes) and interactions (edges) are shown for: Meta-SN1 in the upper left corner with green edges (which includes interactions of SN1 (blue edges); SNG1 (green edges) and SNG2 (redGW and GATES modules, GNAZ* gene only present in GW); ADNI-GW SN3 in the botton left corner with dark blue edges) which are all related to glutamate signaling. Bottom sub-network includes interactions of SN5 (orange; ADNI-GW SN4 in the bottom right corner with blue edges) and SNG3 (yellow and ADNI-GW SN7 in the upper right corner with light blue edges) both enriched with genes involved in lipid metabolism.. Genes shared by at least 2 sub- networks are in bold font. Network legend is provided in the middle-right panel based on the meta-analysis results, were the nodelocated at the center in bold font and cross interactions between genes inside each module are denoted by light grey edges. Node color represents the OR behavior in a gradient from green to red values (i.e. green: OR<1; red

OR>1; white: OR = 1), denoting protection and risk, respectively. Similarly, node size is proportional to the -log10 p-value obtained from the meta-analysis (if absent, node size is Formatted: Not Superscript/ Subscript the minimum). Triangle shaped nodes marks genes belonging to the glutamate signaling

48

pathway GO term (GO:0007215); Diamond shaped nodes denotes genes belonging to

KEGG Glutamatergic synapse pathway (hsa04724); Squares Square shaped nodes denotes genes belonging to both gene ontology term GO:0007215 and KEGG hsa04724 pathway.

Formatted: Font: Bold

Figure 5. Overrepresented glutamate signaling components at the functional synapse.

The original version of the hsa04724 KEGG pathway (Glutamatergic synapse) is shown.

Black arrows denotes direct molecular interaction or relation between gene products (green squares) or other types of molecules (unfilled circles), while black arrows with dashed lines denotes an indirect effect between the each nodes. Relationships with other KEGG pathways are shown with the presence of white round rectangles. Gene symbols in components belonging to Meta-GW and META-GATES SN1 are denoted in red.

Formatted: Justified

Figure 6. Gene expression analysis of glutamate signaling components in selected human brain regions.

Figure 6. Gene expression analysis of glutamate signaling components in selected human brain regions. Heatmap and dendrogram of normalized expression levels of the 20 genes of interest shared between SN1, SNG1 and SNG2 showingdisplaying significant clustering in the hipocampal: (A) hippocampal formation (HIF) (A), striatum (STR)); (B) and hypothalamus (HY); (C) Dorsal Thalamus (DT); and (D) white matter (WM) (C).

Heatmaps were generated using normalized Z score gene-wise expression values, which were averaged from 46 brain donor individuals (ids. H0351.2001, H0351.2002,

H0351.1009 and, H0351.1012)., H0351.1015 and H0351.1016). Bright red and green color indicates high expression (Z > 2) and bright green low expression (Z < 2).low expression

(Z < 2). Highly correlated gene clusters (Euclidean distance correlation coefficient r > 0.7)

49

are denoted by colored lines in the dendrograms: green clusters, indicates low expression patterns; red clusters show high levels of expression of correlated genes. Gene expression patterns in the corresponding substructures are shown for HIF: Dentate Gyrus (DG); Cornu

Ammonis 1 (CA1); Cornu Ammonis 2 (CA2); Cornu Ammonis 3 (CA3); Cornu Ammonis

4 (CA4) and Subiculum (S). For STR: Body of the CaudateFor HY: Anterior Hypothalamic

Area (AHA); Lateral hypothalamic Area (LHA); Paraventricular Nucleus (BCd); Head of the CaudateHypothalamus (PVH); Supraoptic Nucleus (HCd); Tail of the CaudateSO);

Lateral Hypothalamic Area, Mammillary Region (LHM); Mammillary Body (MB);

Posterior Hypothalamic Area (PHA); Supramammillary Nucleus (TCd);SuM);

Tuberomammillary Nucleus Accumbens (Acb); Putamen (Pu) and for(TM); Preoptic

Region (PrOR); Arcuate Nucleus of the Hypothalamus (ARH); Dorsomedial Hypothalamic

Nucleus (DMH); Lateral Hypothalamic Area, Tuberal Region (LHT); Lateral Tuberal

Nucleus (LTu); Perifornical Nucleus (PeF); Ventromedial Hypothalamic Nucleus (VMH).

For DT: Anterior Group of Nuclei (DTA); Caudal Group of intralaminar Nuclei (ILc);

Dorsal Lateral Geneiculate Nucleus (LGd); Lateral Group of Nuclei, Dorsal Division

(DTLd); Lateral Group of Nuclei, Ventral Division (DTLv); Medial Geniculate Complex

(MG); Medial Group of Nuclei (DTM); Posterior Group of Nuclei (DTP); Rostral Group of

Intralaminar Nuclei (ILr). For WM: Cc: Corpus callosum; Cgb: Cingulum bundle. Highly correlated gene clusters (Euclidean distance correlation coefficient r > 0.7) are denoted by colored lines in the dendrograms: green clusters, indicates low expression patterns; red clusters show high levels of expression of correlated genes.

50

Figure S1. Quantile-Quantile (Q-Q) plots for GWAS datasets and combined meta- Formatted: Justified analysis. Comparison of the association results for each SNP (black dots) with those expected by chance (red line) in TGen1 (A), NIA-LOAD/NCRAD (B), Pfizer (C) the final meta-analysis (D) and in the ADNI replication dataset. In each dataset, the genomic inflation factor (λ) is shown. Values of λ between 0.9 and 1.1 are considered unbiased by the population structure. Formatted: Font: Not Bold

Figure S1. Gene-wise ontologicalFigure S2. Gene structure comparison between modules detected with real and permuted data. Gene coincidences between Meta-GW

SN1 (49 genes, light grey circle) and Meta-GATES SN1 (48 genes, dark grey circle) are shown in a Venn diagram and compared with the total number of genes in the first 10 modules of each permuted analysis: Meta-GW SN1 to SN10 (397 genes, light grey circle) and Meta-GATES SN1 to SN10 (354 genes, dark grey circle).

Figure S3. Glutamate signaling sub-networks overrepresented in AD. Selected sub- Formatted: Justified networks SN1, SN2, SN3, SN4 and SN5 (AMeta-GW SN1 in conjunction with Meta-

GATES SN1, and ADNI-GW SN3, ADNI-GW SN4, ADNI-GW SN7 sub-networks are shown in A through ED, respectively).. Nodes represent genes and edges their corresponding interactions extracted from FPAN based upon the information provided byin the STRING database. Network legend is provided inat the bottom-left panel were: the node color represents the OR behavior in a gradient from green to red values (i.e. green:

OR<1; red OR>1; white: OR = 1), denoting protection and risk, respectively. Similarly,

51

node size isand edge thickness are proportional to the -log10 p-value obtained fromin the meta-analysis (if absent, node size is the minimum).) and the combined score of interaction.

Asterisk in GNAZ gene is a reminder that this gene is only present in Meta-GW SN1.

Formatted: Font: Bold

Figure S2. GATES ontological sub-networks overrepresented in AD. Significant sub- networks SNG1, SNG2 and SNG3. Nodes represent genes and edges their corresponding interactions extracted from FPAN based upon the information provided by STRING database. Network legend is provided in the bottom-left panel were the node color represents the OR behavior in a gradient from green to red values (i.e. green: OR<1; red

OR>1; white: OR = 1), denoting protection and risk, respectively. Similarly, node size is proportional to the -log10 p-value obtained from the meta-analysis (if absent, node size is the minimum).

Figure S3. Logical relationships among sub-networks. (A) Venn diagram for the sub- networks related to lipid/apolipoprotein metabolism ontologies: SN5 composed by 49 genes (blue circle) and SNG3 composed by 44 genes (yellow circle), sharing APOE and

ABCA1 genes. (B) Venn diagram for the sub-networks associated to Glutamate signaling pathway: SN1 (blue circle), SNG1 (red circle) and SNG2 (green circle) composed by 43, 33 and 37 genes respectively. Five genes are shared by all sub-networks (GRM1, ITPR1,

GRIN2B, HOMER1 and GRIK1), while SN1 and SNG1 have 3 more genes in common

(SHANK1, SHANK2 and PIK3CA), SN1 and SNG2 are far more related having 11 more genes present in both modules (GRM8, LPHN3, GRM7, HOMER2, MAP2K4, GRM4,

GNAS, GRIN2A, GNAQ, GNAO1 and GRM3), in contrast to SNG2 and SNG3 that only have one additional gene shared (NGF).

52

Figure S4. Gene expression analysis of glutamate signaling components in human brain regions. Normalized expression levels of the 20 genes of interest shared between

SN1, SNG1 and SNG2 in the 27 major normal human brain regions analyzed. Heatmaps were generated using normalized Z score gene-wise expression values averaged from 4 brain donor individuals (H0351.2001, H0351.2002, H0351.1009 and H0351.1012). FL:

Frontal Lobe; Ins: Insula; CgG: Cingulate gyrus; HiF: hippocampal formation; Br: Brain;

OL: ; PL: ; TL: ; Amg: ; BF: Basal

Forebrain; GP: Globus Pallidus; Str: Striatum; ET: Epithalamus; Hy: Hypothalamus; DT:

Dorsal Thalamus; MES: Mesencephalon; CbCx: Cerebellar Cortex; CbN: Cerebellar

Nuclei; Bpons: Basal Part of Pons; PTg: Pontine Tegmentum; MY: Myelencephalon; WM:

White Matter; SS: Sulci & Spaces; PHG: ; Cl: Claustrum; SbT:

Subthalamus; VT: Ventral Thalamus.

53

Table 1. Datasets used in this study Formatted: Font: 9 pt, Font color: Black Formatted: Space After: 0 pt, Line spacing: Table 1. Genome-Wide Association Studies main single features. Formatted: Font: Times New Roman, 12 pt, DatasetsStu IndSamples Ethnici Case Contro Platform SNPs Not Bold dy ty s ls Formatted: Font: Times New Roman, 12 pt TGen1 1,364 Caucas 829 535 Affymetrix 500K G 1,231,7 Deleted Cells ian eneChip 04a NIA-‐ 3,6891,680 Caucas 1,837 1,8527 Illumina 610-‐Quad 524,20 Deleted Cells LOAD/NCRA ian 978 02 01,245, Deleted Cells a D 964 Formatted Table Pfizer 1,525 Caucas 733 792 Illumina 550K, 610- 413,28 Deleted Cells ian Quad 8439,1 13 Deleted Cells TOTAL Total 6,5784,569 Caucas 3,399 3,1792 - 516,42 Deleted Cells b Meta-analysis ian 2540 029 4 1,216 Formatted: Font: 9 pt, Font color: Black ,213b Formatted Table ADNI 693 Caucas 524,99 499c 194 Illumina 610‐Quad (replication) ian 3 Formatted Table a imputed b a b c SNPs Imputed. SNPs shared in at least 2between the 3 studies . AD + mild cognitive impairment (MCI) individuals. Formatted: Font color: Auto Formatted: Font color: Auto Formatted Table

54

55

Formatted ... Formatted Table ... Formatted ... Formatted ... Formatted ... Table 2. Meta-analysis 25 top 20 associationshits. Formatted ... Formatted ... CHR CBP SNP A1 A2 N p- value P OR Gene Formatted ... hr Formatted ... 19 5008745950, rs2075650 G A 3 3.89E- 3.1405TOMM40, PVRL2 Formatted ... 087,459 1328.54E- 4.48 116 Formatted ... 19 5010067650, rs157580rs40 TG GA 2 1.25E- 1.6591APOE, TOMM40, PVRL2 Formatted ... 087,106 5509 349.60E-35 0.51 Formatted ... 19 50,106,291 rs439401 T C 2 8.82E-29 0.54 APOC1, APOE Formatted ... 19 5007387450, rs6859 A G 3 4.11E- 1.6106PVRL2 Formatted 073,874 347.87E-28 70 ... 19 5009350650, rs405509rs81 G AT 2 4.54E- 0.5870TOMM40, APOE Formatted ... 100,676 06922 342.29E-27 57 Formatted ... 19 5008710650, rs8106922rs1 G A 2 2.37E- 0.5851PVRL2, TOMM40 Formatted ... 093,506 57580 331.17E-25 57 Formatted 19 50106291 rs439401 T C 2 1.01E-25 0.6263 APOC1 ... Formatted ... 19 5002105450, rs10402271 G T 3 4.19E- 1.3550BCAM Formatted ... 021,054 161.98E-17 46 Formatted 19 50,018,504 rs10405693 T C 2 2.83E-12 1.49 BCAM ... 19 50,074,901 rs3852861 T G 2 5.32E-11 0.64 PVRL2 Formatted ... 19 4985775249, rs714948 A C 3 2.06E-0903E- 1.4254PVR Formatted ... 857,752 10 60 Formatted ... 19 4994600849, rs8103315 A C 2 4.40E- 1.3869BCL3 Formatted 946,008 091.87E-08 50 ... 19 5032804150, rs377702rs11 TA CG 3 1.05E99E-07 1.3609PPP1R37PVRL2 Formatted ... 054,507 14832 29 Formatted ... 19 49,933,947 rs2927438 A G 2 3.32E-07 1.35 intergenic Formatted ... 19 50,421,115 rs346763 A G 2 3.67E-07 1.80 EXOC3L2 Formatted 12 93,848,520 rs249153 C T 2 4.38E-07 1.41 intergenic ... 19 5005306450, rs440277 A G 2 1.27E4.79E- 0.7886PVRL2 Formatted ... 053,064 07 76 Formatted ... 1119 5959519750, rs1871047rs4 G TA 3 1.74E4.87E- 0.8089PVRL2MS4A3 Formatted ... 043,586 74951 07 79 Formatted 115 5959367330, rs528823rs13 TC CT 3 2.14E6.60E- 0.8099MS4A3intergenic ... 214,770 178362 07 75 Formatted ... 19 50342226 rs1048699 T C 3 3.43E-07 1.3480 NKPD1, PPP1R37 Formatted ... Formatted 1912 4993394793, rs249166rs29 AT G 2 7.58E6.91E- 1.2613Intergenicintergenic ... 852,604 27438 07 40 Formatted ... 12 9709773393, rs10860303rs TA CT 2 9.40E6.91E- 0.7967Intergenicintergenic Formatted ... 854,404 249167 07 1.40 Formatted ... 19 5005450750, rs10422797rs AC GT 32 1.40E02E-06 1.2111PVRL2EXOC3L2 Formatted 417,946 377702 82 ... 11 59,595,197 rs474951 G T 3 1.25E-06 0.79 MS4A3 Formatted ... 1911 4992965259, rs528823rs29 CT TC 3 1.76E55E-06 0.8191MS4A3Intergenic Formatted ... 593,673 65101 79 Formatted ... 3 17036772210rs9809961rs1 CG TA 3 3.23E1.63E- 1.2539MECOMFANCD2, Formatted ,110,577 552244 06 0.76 FANCD2OS ... 193 5004358610, rs9849434rs1 GA AG 32 4.21E1.88E- 0.8342FANCD2, Formatted ... 108,710 871047 06 71 FANCD2OSPVRL2 Formatted ... Formatted ... 56 Formatted ...

Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ...

Keys: CHRChr: Chromosome; BP: Physical position;Position (Base Pair, NCBI 36.3/Hg18); A1: Allele 1 Formatted: Font: 8 pt (Affected); A2: Allele 2 (referenceReference); N: Number of datasets containingwith information; p-value: Formatted: Font: 8 pt Fixed effect model p-value; OR: Odd Ratio. Formatted Table Formatted: Font: 8 pt Formatted: Font: 8 pt Formatted: Font: 8 pt Formatted: Font: 8 pt

Table 3. Gene ontology terms overrepresented in AD

57

Formatted ... Formatted ... Formatted Table ... Formatted ... Formatted ... Formatted ... Table 3. GO terms enriched in Meta-GW and Meta-GATES top 3 Sub-Networks. Formatted ... SN TypeGO ID GO IDName GISNGO GIP TGGISp-value Formatted ... Name N Formatted ... Meta-AD (gene-wise p value)GW Formatted ... SN1 BP GO:0007215 glutamate receptor signaling pathway 11 2.38E67E- 46 1149 11 Formatted ... Formatted GO:0007610 behavior 17 497 49 1.02E-08 ... GO:0003001 generation of a signal involved in cell-cell signaling 31112 1.86E03E-08 Inserted Cells 11332 49 ... GO:0023061 signal release 311 11 1.86E-08 Formatted ...

CC GO:00444560045202 synapse part 32821 9.40E- 2.87E-18 Formatted ... 17461 1749 Formatted Table ... GO:00452020044456 synapse part 43719 5.33E- Formatted 18347 49 1690E-18 ... GO:00970600031234 synapticextrinsic to internal side of plasma 1889 2.23E13E- Formatted ... membrane 1550 49 13 Formatted ... MF GO:0008066 glutamate receptor activity 13 5.65E- Inserted Cells ... 24 1149 161.21E-18 Formatted Table GO:0035254 glutamate receptor binding 198 2.18E03E- ... 722 49 10 Formatted ... 1.31E- GO:0031683 G-protein beta/gamma-subunit complex binding 218 4.63E-10 Formatted ... 622 0749 Formatted ... SN2 BP GO:00313440007411 regulation of cell projection organizationaxon 22533 1.94E20E- guidance 13341 46 09 Inserted Cells ... GO:00511300040012 positive regulation of cellular component 42414 3.59E- 1.16E-07 Formatted ... organizationlocomotion 16435 0846 Formatted ... GO:0035385 Roundabout signaling pathway 3 6.85E3.81E- Formatted 3 46 07 ... CC GO:0031252 cell leading edge 23112 1.83E- 9.35E-11 Inserted Cells ... 10245 0846 Formatted Table ... GO:0005955 calcineurin complex 3 1.69E-07 4 46 Formatted ... GO:0031256 leading edge membrane 8 87 46 4.67E-07 Formatted ... MF GO:0005042 netrin receptor activity 4 3.90E- 6.30E-10 Formatted ... 34 0746 Formatted ... GO:0048495 Roundabout binding 3 4 46 6.77E-08 Formatted ... SN3 BP GO:00072290035385 integrin-mediatedRoundabout signaling pathway 693 4.31E- 1.60E-07 103 1145 Formatted ... CC GO:0008305 integrin complex 29 10 2.61E-15 Inserted Cells ... GO:0043235 receptor complex 157 10 1.21E-08 Formatted ... SN4 BP GO:0006470 protein dephosphorylation 131 16 8.45E-17 Formatted Table ... GO:0016311 dephosphorylation 211 17 3.89E-11 Formatted ... GO:00461460061364 tetrahydrobiopterin metabolicapoptotic process 73 2.83E- 1.74E-07 involved in luteolysis 3 0545 Formatted ... MF GO:0019198 transmembrane receptor protein phosphatase activity 18 9 7.76E-14 Formatted ... GO:00425780021889 phosphoric ester hydrolase activityolfactory bulb 3134 3.05E15E- Inserted Cells ... interneuron differentiation 1811 45 06 Formatted ... SN5 BP GO:0016125 sterol metabolic process 112 15 5.32E-10 Formatted MF GO:00082020048495 steroid metabolic processRoundabout binding 2633 6.69E9.46E- ... 164 45 07 Inserted Cells ... GO:0008610 lipid biosynthetic process 460 13 1.48E-06 Formatted ... Formatted Table ... 58 Formatted ...

Formatted ... Formatted ... Inserted Cells ... Formatted ... Formatted ... Inserted Cells ... Formatted Table ... Formatted ... Formatted ... Formatted ... Formatted ... Formatted ... Inserted Cells ... Formatted ... Formatted ... Inserted Cells ... Formatted ... Formatted Table ... Formatted ... Formatted ... Formatted ... Formatted ... Inserted Cells ... Formatted ... Formatted ... Formatted ... Formatted ... Inserted Cells ... Formatted Table ... Formatted ... Formatted ... Inserted Cells ... Formatted ... Formatted Table ... Formatted ... Formatted ... Inserted Cells ... Formatted Table ... Formatted ... Formatted ... Inserted Cells ... Formatted ... Formatted ... Formatted Table ... Formatted ... Formatted CC GO:0008076 voltage-gated potassium channel complex 60 8 6.88E-08 ...

GO:0034702 ion channel complex 178 8 7.51E-06 Formatted ...

Meta-AD-GATES Formatted ... SNG1SN1 BP GO:0007215 glutamate receptor signaling pathway 4011 1.47E- Inserted Cells ... 646 48 0686E-11 Formatted Table ... SNG2 BP GO:00420440007610 fluid transportbehavior 3217 1.18E- 6.58E-09 7497 0848 Formatted ... GO:00356370003001 multicellular organismalgeneration of a signal 41212 2.47E- 7.92E-09 Inserted Cells ... involved in cell-cell signaling 16332 0848 Formatted ... GO:0007215 glutamate receptor signaling pathway 40 8 9.71E-08 Formatted ... MFCC GO:00099750045202 cyclase activitysynapse 1221 1.57E- Formatted 5461 48 0969E-18 ... Formatted GO:0044456 synapse part 19 347 48 3.70E-18 ... Inserted Cells ... GO:0097060 synaptic membrane 16 202 48 2.55E-13 Formatted Table MF GO:0008066 glutamate receptor activity 1813 4.75E- 1.21E-18 ... 624 0848 Formatted ... GO:00168290035254 lyase activityglutamate receptor binding 728 2.34E- 1.14E-10 Formatted ... 522 0548 G-protein beta/gamma-subunit complex Inserted Cells ... GO:0031683 7 8.22E-09 binding 22 48 Formatted ... regulation of plasma lipoprotein particle SNG3SN2 BP GO:00970060008202 4211 2.55E- Formatted Table ... levelssteroid metabolic process 10269 50 1313E-09 Formatted ... GO:0010876 lipid localization 12613 2.45E- 3.81E-09 16221 1150 Formatted ... GO:0006869 lipid transport 11212 1.10E8.12E- Formatted ... 16195 50 09 Inserted Cells ... CC GO:0032994 protein-lipid complex 215 1.90E- 2.51E-07 Formatted 436 0650 ... MF GO:00341850048495 apolipoproteinRoundabout binding 143 2.39E- Formatted ... 74 50 1202E-06 Inserted Cells ... GO:00703250071814 lipoprotein particle receptorprotein-lipid complex 154 2.20E3.22E- Formatted Table ... binding 523 50 06 Formatted ... GO:1901681 sulfur compound binding 7 163 50 8.26E-06 Formatted SN3 BP GO:0046777 protein autophosphorylation 16 174 21 2.30E-12 ... Formatted GO:0040012 regulation of locomotion 10 435 21 6.55E-08 ... Inserted Cells ... GO:0051270 regulation of cellular component movement 10 443 21 1.14E-07 Formatted ... CC GO:0045202 synapse 7 461 21 3.28E-06 Formatted MF GO:00053190019199 lipid transportertransmembrane receptor protein 3012 5.43E- 1.18E-12 ... kinase activity 682 0621 Formatted ... GO:0019838 growth factor binding 9 99 21 3.42E-12 Inserted Cells ... GO:0004713 protein tyrosine kinase activity 16 136 21 1.09E-10 Formatted Table ... SN: Sub-network; GO ID:Gene ontology term ID; GIP: Genes in population; GISN: Genes in sub-network; TG: Total Formatted ... genes in SN; BP: Biological process; CC:Cellular component; MF: Molecular function Formatted ... SN: sub-networtk; GO: gene ontology term; GIP: genes in population; GISN: genes in sub-network; BP: biological process; Formatted ... CC: cellular component; MF: molecular function. Formatted ... Inserted Cells ... Formatted ... Formatted ... Formatted ... Formatted Table ... 59 Formatted ...

Inserted Cells ... Formatted ...

Table 4. Glutamate positive sub-networks in ADNI-GW analysis. SN Type GO ID GO Name GISN GIP TG p-value SN3 BP GO:0035637 multicellular organismal signaling 19 483 48 5.87E-12 GO:0019226 transmission of nerve impulse 19 467 48 6.22E-10 GO:0007215 glutamate receptor signaling pathway 7 44 48 5.09E-08 CC GO:0044447 axoneme part 6 20 48 9.54E-11 GO:0044463 cell projection part 12 235 48 1.12E-10 GO:0008328 ionotropic glutamate receptor complex 6 21 48 3.24E-08 MF GO:0008066 glutamate receptor activity 5 19 48 1.78E-06 SN4 BP GO:0031280 Neg. regulation of cyclase activity 4 19 22 2.59E-06 GO:0051350 Neg. regulation of lyase activity 4 20 22 3.17E-06 GO:0030809 Neg. regulation of nucleotide biosynthetic process 4 23 22 2.38E-06 MF GO:0008066 glutamate receptor activity 6 19 22 7.14E-08 SN7 BP GO:0007215 glutamate receptor signaling pathway 7 44 40 5.09E-08 CC GO:0008328 ionotropic glutamate receptor complex 4 21 40 2.09E-05 MF GO:0008066 glutamate receptor activity 5 19 40 3.40E-08 SN: Sub-network; GO ID: Gene ontology term ID; GIP: Genes in population; GISN: Genes in sub-network; TG: Total genes in SN; BP: Biological process; CC: Cellular component; MF: Molecular function

60

Response to Reviewers

Centro de Investigaciones Biomédicas Facultad Ciencias Biológicas Facultad de Medicina República Nº 239, piso 2 Interior

Perez-Palma, Bustos et al. Your Reference: PONE-D-13-13386, Revise Response to Reviewers

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Yes Reviewer #2: Partly

Please explain (optional). Reviewer #2: Pathways analysis is an intriguing approach to complex disorders with the problem of missing heritability, thus this is a worthy project. However, I do have some minor concerns regarding the quality of the data. First, rather than only imputing the TGEN sample (which was genotyped on a different platform), it would be better to impute all three samples. Second, there was no mention of the QC procedures on the Pfizer dataset, and only the MIND threshold was mentioned regarding the TGEN dataset (and this threshold, 0.05, was different from the NIA-LOAD/NCRAD threshold of 0.02. Why were different thresholds used across samples?) Third, the NIA-LOAD/NCRAD contained family data. Instead of correcting for the kinship coefficient, it would be cleaner to simply analyze one individual from each family and exclude the other individuals. Fourth, what was the rational/inclusion criteria for choosing only the 4 brain samples for the expression analysis?

ANSWERS TO REVIEWER #2 First, rather than only imputing the TGEN sample (which was genotyped on a different platform), it would be better to impute all three samples. R: It was not possible to impute all the samples since the Pfizer dataset comes from the summary statistics results downloaded directly from the supplementary information (provided in the original publication), and thus no genotypes were available for imputation. Nonetheless, we did impute the NCRAD dataset. Briefly, while in the previous submission we applied a “kinship correction” to correct for familial bias (see below), here we removed the familial component by selecting only unrelated individuals, as it has been described in Antunez et al. (2011). Such data was subsequently imputed for missing genotypes (as it was done in the TGEN1 dataset) in order to increase the number of genetic variants shared between datasets.

Second, there was no mention of the QC procedures on the Pfizer dataset, and only the MIND threshold was mentioned regarding the TGEN dataset (and this threshold, 0.05, was different from the NIA-LOAD/NCRAD threshold of 0.02. Why were different thresholds used across samples?)

CAMPUS CASONA LAS CONDES CAMPUS REPUBLICA CAMPUS VIÑA DEL MAR FERNANDEZ CONCHA 700, LAS CONDES AV. REPUBLICA 252, SANTIAGO * LOS FRESNOS 91, MIRAFLORES TELEFONO: (2) 6618500 FAX: (2) 2153767 TELEFONO: (2) 6618000 FAX: (2) 6711936 TELEFONO: (32) 328800 FAX: (32) 673082 e-mail: [email protected] e-mail: [email protected] * ALVARES 2138, CHORRILLOS www.unab.cl TELEFONO: (32) 673204 FAX: (32) 630427 e-mail: [email protected]

Centro de Investigaciones Biomédicas Facultad Ciencias Biológicas Facultad de Medicina República Nº 239, piso 2 Interior

R: We corrected the MIND threshold for the NCRAD dataset (MIND = 0.05) in order to make it comparable to the TGEN dataset. Since we were not able to obtain the genotypic data, we did not modify the QC procedures in the Pfizer dataset, which were described in the original publication by Hu X et al. (2011). We note that the missing call rate per sample is an informative indicator of sample quality and that different studies generally fail samples with missing call rates >5% (see for instance Laurie et al., 2010; Génin et al., 2011; Merjonen et al., 2011). Finally, we did not observe significant changes in the association results when comparing the NCRAD data with different MIND thresholds (0.01 and 0.05).

Third, the NIA-LOAD/NCRAD contained family data. Instead of correcting for the kinship coefficient, it would be cleaner to simply analyze one individual from each family and exclude the other individuals. R: As suggested by the reviewer, and following the recommendation of Antunez et al. (2011), using family trees provided with the dbGAP data, we excluded all related controls and kept only 1 case per family in order to clean the data. This procedure also allowed imputing NCRAD data.

Fourth, what was the rational/inclusion criteria for choosing only the 4 brain samples for the expression analysis?" R: At the time of our previous submission, we took the 4 available brain donors with expression data from the Allen Institute for Brain Science. In this revised version, we have updated the expression analysis including data for 2 new individuals, which is the complete series of brain donors.

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes Reviewer #2: No

Please explain (optional). Reviewer #2: The results were declared significant after being compared via t-test to ten permutations. Although computationally intensive, a much larger amount of permutations (i.e. 1000 permutations) would be required to convincingly establish statistical significance. This would represent a more accurate null distribution. R: We note that every time JActive modules is interrogated in order to report a significant sub-network, the program internally creates a null distribution by permuting 10,000 times the p-values over the genes introduced in a Monte Carlo procedure. Therefore, each set of p-values interrogated is indeed permuted 100,000 times. We included a detailed explanation of this procedure in the Methods and Results sections of the manuscript.

4. Is the manuscript presented in an intelligible fashion and written in standard English?

CAMPUS CASONA LAS CONDES CAMPUS REPUBLICA CAMPUS VIÑA DEL MAR FERNANDEZ CONCHA 700, LAS CONDES AV. REPUBLICA 252, SANTIAGO * LOS FRESNOS 91, MIRAFLORES TELEFONO: (2) 6618500 FAX: (2) 2153767 TELEFONO: (2) 6618000 FAX: (2) 6711936 TELEFONO: (32) 328800 FAX: (32) 673082 e-mail: [email protected] e-mail: [email protected] * ALVARES 2138, CHORRILLOS www.unab.cl TELEFONO: (32) 673204 FAX: (32) 630427 e-mail: [email protected]

Centro de Investigaciones Biomédicas Facultad Ciencias Biológicas Facultad de Medicina República Nº 239, piso 2 Interior

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors below. Reviewer #1: Yes Reviewer #2: No

Please explain (optional). Reviewer #2: Some of the main analytical concepts presented in the paper are difficult to follow. For example, a more explicit description of exactly what the authors did for the FPAN analysis would be helpful. (Did they input all of the top genes from the association analysis into the "multiple names" tab in STRING?) An explanation of the Simes (GATES) procedure, and how it differs from the genewise min-p is needed. Additionally, an introduction to the cytoscape package and specifically what this does is necessary. Finally, although overall the English is quite good, there are a few typos (i.e. major instead of major), as well as a few awkwardly worded sentences. R: First, the final input for each analysis (i.e. gene-wise and GATES) was the intersection of genes annotated between top meta-analysis associations (results with a cutoff p-value < 0.05) and genes present in the FPAN (cut-off interaction score 0.7, high confidence). In this regard, not all genes associated with AD had high confidence functional interactions on STRING. This issue was further explained in the newer version of the manuscript. Second, the main difference between the GW and GATES procedure is that while the former uses the single strongest p-value present inside a gene without any correction, the latter corrects the p-values by gene-size and LD structure. The procedure has also been explained in detail in the text. Third, we expanded the Method section to include a general description of Cytoscape and its use in network interaction studies. Finally, a native speaking colleague has revised the text to improve the message of our manuscript.

5. Additional Comments to the Author (optional) Please offer any additional comments here, including concerns about dual publication or research or publication ethics. Reviewer #1: 1) In methods association analysis section, it is not clear how TGEN1 data was cleaned. Is it comparable to NIA-LOAD/NCRAD dataset? R: In this newer version of the manuscript, the QC procedures to clean both TGEN1 and NCRAD have been standardized in order to make the studies comparable (see also above). QC parameters in both studies are now: MAF 0.05; GENO 0.02; MIND 0.05; and HWE 1x10E-6. This information was included in the Method section.

2) Could the author explain why they used different interaction score cutoff (0.7 for Meta-AD, but 0.9 for Meta-AD-GATES) for the prior network? R: The initial idea was to compare results with different levels of confidence (interaction), but as corrections were made in sample size (to eliminate the familial contribution) the use of the STRING cutoff = 0.9 became less informative and too

CAMPUS CASONA LAS CONDES CAMPUS REPUBLICA CAMPUS VIÑA DEL MAR FERNANDEZ CONCHA 700, LAS CONDES AV. REPUBLICA 252, SANTIAGO * LOS FRESNOS 91, MIRAFLORES TELEFONO: (2) 6618500 FAX: (2) 2153767 TELEFONO: (2) 6618000 FAX: (2) 6711936 TELEFONO: (32) 328800 FAX: (32) 673082 e-mail: [email protected] e-mail: [email protected] * ALVARES 2138, CHORRILLOS www.unab.cl TELEFONO: (32) 673204 FAX: (32) 630427 e-mail: [email protected]

Centro de Investigaciones Biomédicas Facultad Ciencias Biológicas Facultad de Medicina República Nº 239, piso 2 Interior

stringent and therefore it was eliminated. In the revised manuscript all module searches (i.e. gene-wise, GATES, permuted and without genome-wide) have been performed using the same FPAN with an interaction score cutoff of 0.7, which reflects High Confidence Interactions, as defined by the STRING database.

3) In the module search section, it is not clear how the significant sub-network was defined and identified. Please give more details about how the Monte Carlo procedure was performed for the standardization of the score. Is the permutation for permuting phenotype, or just permuting the p values across genes? R: First, a significant sub-network was defined as a connected group of genes enriched with minor associations in AD disease with a degree that could not be reached by chance. These sub-networks have been identified using JActive modules, which is a procedure that internally compares the sub-network structure with 10,000 randomly generated sub-networks (Monte Carlo). A detailed explanation in relation to this subject was added to the manuscript. Finally, the permutation refers to just permuting p-values across the genes, since other controls related to bias in gene-size and LD were also included (GATES procedures).

4) Figure 4A), how to get the average number of significant modules for Meta-AD or Meta-AD-GATES? In addition, there is no description for C and D panels. R: In this version of the manuscript, Figure 4 corresponds to Figure 3, and it has been edited to accommodate the ADNI replication dataset. The average number of significant modules is obtained after 10 JActive iteration searches. It is shown in the corresponding figure and provided also in the text (Pathway Analysis section of Results). Note that Meta-AD-GW and Meta-AD-GATES have changed for clarity to Meta-GW and Meta-GATES, respectively. We finally acknowledge that in the previous version there was no description of panels C and D in Figure 4, which was a mistake that has been corrected.

References: Antunez C, Boada M, Gonzalez-Perez A, Gayan J, Ramirez-Lorca R, et al. (2011) The membrane- spanning 4-domains, subfamily A (MS4A) gene cluster contains a common variant associated with Alzheimer's disease. Genome Med 3: 33.

Génin E, Schumacher M, Roujeau JC, Naldi L, Liss Y, Kazma R, Sekula P, Hovnanian A, Mockenhaupt M. Genome-wide association study of Stevens-Johnson Syndrome and Toxic Epidermal Necrolysis in Europe. Orphanet J Rare Dis. 2011 Jul 29;6:52. Hu X, Pickering E, Liu YC, Hall S, Fournier H, et al. (2011) Meta-analysis for genome-wide association study identifies multiple variants at the BIN1 locus associated with late-onset Alzheimer's disease. PLoS One 6: e16616.

Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, Boehm F, Caporaso NE, Cornelis MC, Edenberg HJ, Gabriel SB, Harris EL, Hu FB, Jacobs KB, Kraft P, Landi MT, Lumley T, Manolio TA, McHugh C, Painter I, Paschall J, Rice JP, Rice KM, Zheng X, Weir BS; GENEVA Investigators. Quality control and quality assurance in genotypic data for genome-wide association studies. Genet Epidemiol. 2010 Sep;34(6):591-602.

CAMPUS CASONA LAS CONDES CAMPUS REPUBLICA CAMPUS VIÑA DEL MAR FERNANDEZ CONCHA 700, LAS CONDES AV. REPUBLICA 252, SANTIAGO * LOS FRESNOS 91, MIRAFLORES TELEFONO: (2) 6618500 FAX: (2) 2153767 TELEFONO: (2) 6618000 FAX: (2) 6711936 TELEFONO: (32) 328800 FAX: (32) 673082 e-mail: [email protected] e-mail: [email protected] * ALVARES 2138, CHORRILLOS www.unab.cl TELEFONO: (32) 673204 FAX: (32) 630427 e-mail: [email protected]

Centro de Investigaciones Biomédicas Facultad Ciencias Biológicas Facultad de Medicina República Nº 239, piso 2 Interior

Merjonen P, Keltikangas-Järvinen L, Jokela M, Seppälä I, Lyytikäinen LP, Pulkki-Råback L, Kivimäki M, Elovainio M, Kettunen J, Ripatti S, Kähönen M, Viikari J, Palotie A, Peltonen L, Raitakari OT, Lehtimäki T. Hostility in adolescents and adults: a genome-wide association study of the Young Finns. Transl Psychiatry. 2011 Jun 21;1:e11.

CAMPUS CASONA LAS CONDES CAMPUS REPUBLICA CAMPUS VIÑA DEL MAR FERNANDEZ CONCHA 700, LAS CONDES AV. REPUBLICA 252, SANTIAGO * LOS FRESNOS 91, MIRAFLORES TELEFONO: (2) 6618500 FAX: (2) 2153767 TELEFONO: (2) 6618000 FAX: (2) 6711936 TELEFONO: (32) 328800 FAX: (32) 673082 e-mail: [email protected] e-mail: [email protected] * ALVARES 2138, CHORRILLOS www.unab.cl TELEFONO: (32) 673204 FAX: (32) 630427 e-mail: [email protected]