Gene-Based and Pathway-Based Genome-Wide Association Study of Alcohol Dependence
Total Page:16
File Type:pdf, Size:1020Kb
• 111 • Shanghai Archives of Psychiatry, 2015, Vol. 27, No. 2 •Original research article• Gene-based and pathway-based genome-wide association study of alcohol dependence Lingjun ZUO1, Clarence K. ZHANG2,3*, Frederick G. SAYWARD4,5, Kei-Hoi CHEUNG4, Kesheng WANG6, John H. KRYSTAL1, Hongyu ZHAO2, Xingguang LUO1* Background: The organization of risk genes within signaling pathways may provide clues about the converging neurobiological effects of risk genes for alcohol dependence. Aims: Identify risk genes and risk gene pathways for alcohol dependence. Methods: We conducted a pathway-based genome-wide association study (GWAS) of alcohol dependence using a gene-set-rich analytic approach. Approximately one million genetic markers were tested in the discovery sample which included 1409 European-American (EA) alcohol dependent individuals and 1518 EA healthy comparison subjects. An additional 681 African-American (AA) cases and 508 AA healthy subjects served as the replication sample. Results: We identified several genome-wide replicable risk genes and risk pathways that were significantly associated with alcohol dependence. After applying the Bonferroni correction for multiple testing, the ‘cell- extracellular matrix interactions’ pathway (p<2.0E-4 in EAs) and the PXN gene (which encodes paxillin) (p=3.9E-7 in EAs) within this pathway were the most promising risk factors for alcohol dependence. There were also two nominally replicable pathways enriched in alcohol dependence-related genes in both EAs (0.015≤p≤0.035) and AAs (0.025≤p≤0.050): the ‘Na+/Cl- dependent neurotransmitter transporters’ pathway and the ‘other glycan degradation’ pathway. Conclusions: These findings provide new evidence highlighting several genes and biological signaling processes that may be related to the risk for alcohol dependence. Key words: gene-based GWAS; pathway-based GWAS; cell-extracellular matrix interaction pathway; PXN; paxillin; alcohol dependence [Shanghai Arch Psychiatry. 2015; 27(2): 111-118. doi: http://dx.doi.org/10.11919/j.issn.1002-0829.215031] 1. Introduction complex interplay of variants within genes, genomic [8] Conventional genome-wide association studies regions, or gene pathways. Gene variants that (GWASs) focused on the impact of single nucleotide individually contribute slightly to alcoholism risk but that polymorphisms (SNPs) have identified a large number may have a more important effect in moderating the impact of other risk genes may be missed by the single- of significant or suggestive risk genes for alcohol [9] dependence and alcohol consumption.[1-7] However, SNP analytic strategy. This problem may be reduced single-SNP analysis often identifies only a few of the by employing gene-based and pathway-based analytic most significant SNPs of the genome and they can approaches. only explain a small proportion of the genetic risk Gene- and pathway-based methods have many for diseases. Accumulating evidence suggests that advantages over the single-SNP approach.[9,10] First, susceptibility to alcohol dependence emerges from a the functions of many individual SNPs are not well 1 Department of Psychiatry, Yale University School of Medicine, New Haven, CT, United States 2 Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT, United States 3 Biostatistics Resource, Keck Laboratory, Department of Genetics, Yale University School of Medicine, New Haven, CT, United States 4 Center for Medical Informatics, Yale University School of Medicine, New Haven, CT, United States 5 Cooperative Studies Program Coordinating Center, VA Connecticut Healthcare System, West Haven, CT, United States. 6 Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, TN, United States *correspondence: XG Luo, [email protected]; CK Zhang, [email protected] A full-text Chinese translation of this article will be available at http://dx.doi.org/10.11919/j.issn.1002-0829.215031 on June 6, 2015. Shanghai Archives of Psychiatry, 2015, Vol. 27, No. 2 • 112 • characterized but the functions of whole genes and data were all approved by the respective institutional particular gene pathways are more clearly characterized; review boards, all subjects participating in the projects many functional studies (e.g., gene expression studies) provided written informed consent, and the current have been conducted at the gene or pathway level analysis was approved by the institutional review board making it possible to assess the association of biological of Yale University. functions with specific genes and pathways. Second, locus heterogeneity (i.e., alleles at different loci cause Figure 1. Enrollment of subjects in the study diseases in different populations) make it difficult to replicate association findings for a single marker, but replication at the gene or pathway level might still be SAGE dataset COGA dataset possible when locus heterogeneity exists because a (Study of Addiction: (Collaborative Study On The Genetics and Environment) Genetics of Alcoholism) gene or pathway can harbor multiple alleles of the 4316 subjects 1957 subjects heterogeneous risk markers. Finally, because the numbers of genes and pathways across the genome are 1477 overlapping subjects much less than the number of single markers, gene- excluded based and pathway-based analyses can substantially reduce the number of comparisons considered and, 4796 subjects thus, lead to better statistical power. In the present study, we aimed to identify the risk 680 subjects do not meet inclusion criteria genes for alcohol dependence and the pathways that are enriched in alcohol dependence-related genes. In view of the fact that the effects of an entire gene that 4116 subjects integrates many SNPs would be different from those of a single SNP, and the effects of an entire pathway that integrates many genes would be different from those 2927 European Americans 1189 African-Americans of a single gene, it is anticipated that the results from gene-based analyses might not be completely consistent 1518 controls 681 cases 508 controls with those from pathway-based analyses that use the 1409 cases same dataset, and, similarly, the results from gene- and pathway-based analyses might not be completely consistent with those from SNP-based analyses in 2.2 Genotyping previous GWASs on the same datasets. In other words, All subjects were genotyped on the Illumina Human 1M gene- and pathway-based analysis may lead to novel beadchip. Phenotype and genotype data were rigorously findings. cleaned before association analysis. Subjects with poor genotypic data, allele discordance, problematic sample identification (relatedness, misidentification, 2. Materials and Methods misspecification), duplicated identifiers, gender or 2.1 Subjects chromosomal anomalies, ethnicity issues (including missing information, non-EA or AA, mismatch between The identification of the GWAS data used in this analysis self- and genetically-inferred ethnicity), or with a is shown in Figure 1. Data from 1409 European-American missing genotype call rate ≥2% across all SNPs were (EA) cases with alcohol dependence (based on DSM-IV excluded. Furthermore, SNPs with allele discordance, criteria),1518 EA healthy controls,681 African-American chromosomal anomalies or batch effect, SNPs with an (AA) cases, and 508 AA healthy controls were included overall missing genotype call rate ≥2%, monomorphic in this analysis. Detailed demographic data on these [11,12] SNPs, SNPs with minor allele frequencies <0.01 in subjects were presented in previous GWASs. These either EAs or AAs, and SNPs that deviated from Hardy- data came from the merged SAGE (Study of Addiction: Weinberg equilibrium (p<10−4) within EA or AA controls Genetics and Environment) and COGA (Collaborative were also excluded. This selection process yielded Study On The Genetics of Alcoholism) datasets, which are 805,814 SNPs in EAs and 895,714 SNPs in AAs.[11,12] available on the database of Genotypes and Phenotypes (dbGaP)( https://www.ncbi.nlm.nih.gov/gap). The SAGE dataset (dbGaP access number: phs000092. 2.3 Statistical methods v1.p1) included COGA, COGEND (Collaborative Genetic The genotyping data on autosomes were extracted from Study of Nicotine Dependence), and FSCD (Family an Oracle database and stored efficiently in flat files for Study of Cocaine Dependence) subsets. This COGA gene- and pathway-based analysis. subset included in the SAGE dataset was a subset of the main dbGaP COGA dataset (access number: phs000125.v1.p1), so when we merged the SAGE and 2.3.1 Gene-based GWAS analysis COGA datasets, one copy of 1477 overlapping subjects The genotype was configured into a genotype score of were excluded.[11] The projects that collected these 1, 2, or 3: 1 represented a minor allele homozygote, 2 • 113 • Shanghai Archives of Psychiatry, 2015, Vol. 27, No. 2 represented a heterozygote, and 3 represented a major was significantly associated with alcohol dependence allele homozygote. SNPs were mapped to known genes/ (p=3.9E-7). If flanking regions were reduced to ±10kb, exons/introns boundaries obtained from the National PXN (±10kb) remained significantly associated with Center for Biotechnology Information (NCBI). Principal alcohol dependence in EAs (p<E-8), and the other 19 component analysis was applied to the SNPs within top-ranked risk genes remained nominally significant the defined gene boundary and then the components (p<0.05). which explained at least 85% of the variation were used