Two-Dimensional Enrichment Analysis for Mining High-Level Imaging Genetic Associations
Brain Informatics (2017) 4:27–37
DOI 10.1007/s40708-016-0052-4

Xiaohui Yao · Jingwen Yan · Sungeun Kim · Kwangsik Nho · Shannon L. Risacher · Mark Inlow · Jason H. Moore · Andrew J. Saykin · Li Shen

Received: 21 January 2016 / Accepted: 29 April 2016 / Published online: 13 May 2016
© The Author(s) 2016. This article is published with open access at Springerlink.com

Abstract Enrichment analysis has been widely applied in genome-wide association studies, where gene sets corresponding to biological pathways are examined for significant associations with a phenotype to help increase statistical power and improve biological interpretation. In this work, we expand the scope of enrichment analysis into brain imaging genetics, an emerging field that studies how genetic variation influences brain structure and function measured by neuroimaging quantitative traits (QT). Given the high dimensionality of both imaging and genetic data, we propose to study Imaging Genetic Enrichment Analysis (IGEA), a new enrichment analysis paradigm that jointly considers meaningful gene sets (GS) and brain circuits (BC) and examines whether any given GS–BC pair is enriched in a list of gene–QT findings. Using gene expression data from Allen Human Brain Atlas and imaging genetics data from Alzheimer’s Disease Neuroimaging Initiative as test beds, we present an IGEA framework and conduct a proof-of-concept study. This empirical study identifies 25 significant high-level two-dimensional imaging genetics modules. Many of these modules are relevant to a variety of neurobiological pathways or neurodegenerative diseases, showing the promise of the proposed framework for providing insight into the mechanism of complex diseases.

Keywords Imaging genetics · Enrichment analysis · Genome-wide association study · Quantitative trait

For the Alzheimer’s Disease Neuroimaging Initiative: Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/adni_acknowledgement_list.

X. Yao · J. Yan · S. Kim · K. Nho · S. L. Risacher · M. Inlow · A. J. Saykin · L. Shen (✉)
Radiology and Imaging Sciences, Indiana University School of Medicine, 355 West 16th Street Suite 4100, Indianapolis, IN, USA
e-mail: [email protected]

X. Yao
School of Informatics and Computing, Indiana University Indianapolis, Indianapolis, IN, USA

J. H. Moore
Biomedical Informatics, School of Medicine, University of Pennsylvania, Philadelphia, PA, USA

1 Introduction

Brain imaging genetics is an emerging field that studies how genetic variation influences brain structure and function. Genome-wide association studies (GWAS) have been performed to identify genetic markers such as single nucleotide polymorphisms (SNPs) that are associated with brain imaging quantitative traits (QTs) [20, 21]. Using biological pathways and networks as prior knowledge, enrichment analysis has also been performed to discover pathways or network modules enriched by GWAS findings to enhance statistical power and help biological interpretation [6]. For example, numerous studies on complex diseases have demonstrated that genes functioning in the same pathway can influence imaging QTs collectively even when constituent SNPs do not show significant association individually [18]. Enrichment analysis can also help identify relevant pathways and improve mechanistic understanding of underlying neurobiology [7, 11, 15, 19].
In the genetic domain, enrichment analysis has been widely studied in gene expression data analysis and has recently been modified to analyze GWAS data. GWAS-based enrichment analysis first maps SNP-level scores to gene-based scores, and then tests whether a pre-defined gene set S (e.g., a pathway) is enriched in a set of significant genes L (e.g., GWAS findings). Two strategies are often used to compute enrichment significance: threshold-based [4, 5, 9, 24] and rank-based [23]. Threshold-based approaches aim to solve an independence test problem (e.g., chi-square test, hypergeometric test, or binomial z-test) by treating genes as significant if their scores exceed a threshold. Rank-based methods take into account the score of each gene to determine if the members of S are randomly distributed throughout L.

In brain imaging genetics, the above enrichment analysis methods are applicable only to genetic findings associated with each single imaging QT. Our ultimate goal is to discover high-level associations between meaningful gene sets (GS) and brain circuits (BC), which typically include multiple genes and multiple QTs. To achieve this goal, we propose to study Imaging Genetic Enrichment Analysis (IGEA), a new enrichment analysis paradigm that jointly considers sets of interest (i.e., GS and BC) in both genetic and imaging domains and examines whether any given GS–BC pair is enriched in a list of gene–QT findings.

Using whole brain whole genome gene expression data from Allen Human Brain Atlas (AHBA) and imaging genetics data from Alzheimer’s Disease Neuroimaging Initiative (ADNI) as test beds, we present a novel IGEA framework and conduct a proof-of-concept study to explore high-level imaging genetic associations based on brain wide genome-wide association study (BWGWAS) results. For consistency, in this paper, we use GS to indicate a set of genes and BC to indicate a set of regions of interest (ROIs) in the brain. The proposed framework consists of the following steps (see also Fig. 1): (1) conduct BWGWAS on ADNI amyloid imaging genetics data to identify SNP–QT and gene–QT associations, (2) use AHBA to identify meaningful GS–BC modules, (3) perform IGEA to identify GS–BC modules significantly enriched by gene–QT associations using a threshold-based strategy, and (4) visualize and interpret the identified GS–BC modules.

2 Methods and materials

We write matrices and vectors as bold uppercase and lowercase letters, respectively. Given a matrix $\mathbf{M} = [m_{ij}]$, we denote its $i$th row as $\mathbf{m}^i$ and $j$th column as $\mathbf{m}_j$. Given two column vectors $\mathbf{a}$ and $\mathbf{b}$, we use $\mathrm{corr}(\mathbf{a}, \mathbf{b})$ to denote their Pearson’s correlation coefficient.

2.1 Brain Wide Genome-Wide Association Study (BWGWAS)

The imaging and genotyping data used for BWGWAS were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see http://www.adni-info.org.

Preprocessed [18F]Florbetapir PET scans (i.e., amyloid imaging data) were downloaded from adni.loni.usc.edu, then aligned to the corresponding MRI scans and normalized to the Montreal Neurological Institute (MNI) space as 2 × 2 × 2 mm voxels. ROI-level amyloid measurements were further extracted based on the MarsBaR AAL atlas. Genotype data of both ADNI-1 and ADNI-GO/2 phases were also downloaded, and then quality controlled, imputed, and combined as described in [10]. A total of 980 non-Hispanic Caucasian participants with both complete amyloid measurements and genome-wide data were studied. Associations between 105 (out of a total of 116) baseline amyloid measures and 5,574,300 SNPs were examined by performing SNP-based GWAS using PLINK [17] with sex, age, and education as covariates. To facilitate the subsequent enrichment analysis, a gene-based p value was determined as the smallest p value of all SNPs located within ±20 kb of the gene [14].

2.2 Constructing GS–BC modules using AHBA

There are many types of prior knowledge that can be used to define meaningful GS and BC entities. In the genomic domain, the prior knowledge could be based on Gene Ontology or functional annotation databases; in the imaging domain, the prior knowledge could be neuroanatomic ontology or brain databases. In this work, to demonstrate the proposed IGEA framework, we use gene expression data from the Allen Human Brain Atlas (AHBA, Allen Institute for Brain Science, Seattle, WA, USA; available from http://www.brain-map.org/) to extract GS and BC modules such that genes within a GS share similar expression profiles and so do ROIs within a BC. We hypothesize that, given these similar co-expression patterns across genes and ROIs, each GS–BC pair forms an interesting high-level imaging genetic entity that may be related to certain biological functions and can serve as a valuable candidate for two-dimensional IGEA.

[Fig. 1 panels: (A) SNP-level GWAS; (B) Gene-level GWAS; (C) Microarray Expression Matrix and Gene-ROI Expression Matrix; (D) Gene Set-Brain Circuit Modules; (E) Imaging-Genetic Enrichment Analysis; (F) Annotation (functional annotation: GO terms, KEGG pathways, OMIM diseases; imaging annotation: brain mapping)]

Fig. 1 Overview of the proposed Imaging Genetic Enrichment Analysis (IGEA) framework. A Perform SNP-level GWAS of brain […] clustering, and then filter out 2D clusters with an average correlation below a user-given threshold.
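
As an illustration of two ideas described above, the following minimal Python sketch computes a gene-based p value as the smallest SNP p value within a flanking window of a gene (Sect. 2.1), and a one-dimensional threshold-based enrichment p value using the hypergeometric test, one of the independence tests named in the Introduction. It is a sketch under stated assumptions, not the authors' implementation: the function names `gene_based_p` and `hypergeom_enrichment_p`, their parameters, and the single-chromosome coordinate convention are hypothetical and chosen only for this example.

```python
from math import comb


def gene_based_p(snp_p, snp_pos, gene_start, gene_end, flank=20_000):
    """Smallest SNP p value within +/- `flank` bp of a gene.

    Assumes all positions lie on the same chromosome (an illustrative
    simplification). Returns None if no SNP falls inside the window.
    """
    lo, hi = gene_start - flank, gene_end + flank
    in_window = [p for p, pos in zip(snp_p, snp_pos) if lo <= pos <= hi]
    return min(in_window) if in_window else None


def hypergeom_enrichment_p(n_total, n_set, n_sig, n_overlap):
    """Upper-tail hypergeometric p value P(X >= n_overlap).

    n_total   : number of genes in the background
    n_set     : number of background genes in the gene set S
    n_sig     : number of genes called significant (the list L)
    n_overlap : number of significant genes that belong to S
    """
    upper = min(n_set, n_sig)
    # Sum the probability mass of every overlap at least as large as observed.
    tail = sum(
        comb(n_set, k) * comb(n_total - n_set, n_sig - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / comb(n_total, n_sig)
```

For example, calling `gene_based_p` on a gene spanning positions 40,000 to 50,000 considers only SNPs located between 20,000 and 70,000, so a strongly associated SNP far outside the window does not affect the gene score. The hypergeometric tail is computed with exact integer arithmetic via `math.comb` (Python 3.8 or later), which avoids a dependency on a statistics library for small illustrative inputs.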