Alzheimer Disease Pathology-Associated Polymorphism in a Complex Variable Number of Tandem Repeat Region Within the MUC6 Gene, Near the AP2A2 Gene
Total Page:16
File Type:pdf, Size:1020Kb
J Neuropathol Exp Neurol Vol. 79, No. 1, January 2020, pp. 3–21 doi: 10.1093/jnen/nlz116 ORIGINAL ARTICLE Alzheimer Disease Pathology-Associated Polymorphism in a Complex Variable Number of Tandem Repeat Region Within the MUC6 Gene, Near the AP2A2 Gene Yuriko Katsumata, PhD, David W. Fardo, PhD, Adam D. Bachstetter, PhD, Sergey C. Artiushin, PhD, Wang-Xia Wang, PhD, Angela Wei, Lena J. Brzezinski, Bela G. Nelson, Qingwei Huang, Erin L. Abner, PhD, Sonya Anderson, Indumati Patel, Benjamin C. Shaw, Douglas A. Price, Dana M. Niedowicz, PhD, Donna W. Wilcock, PhD, Gregory A. Jicha, MD, PhD, Janna H. Neltner, MD, Linda J. Van Eldik, PhD, Steven Estus, PhD, and Peter T. Nelson, MD, PhD ies were probably unable to detect this association), and AP2A2 was Abstract often colocalized with neurofibrillary tangles in LOAD. We found evidence of late-onset Alzheimer disease (LOAD)-as- sociated genetic polymorphism within an exon of Mucin 6 (MUC6) Key Words: ADSP, AP-2, Digital pathology, GWAS, ScanScope, and immediately downstream from another gene: Adaptor Related Whole-exome sequencing. Protein Complex 2 Subunit Alpha 2 (AP2A2). PCR analyses on ge- nomic DNA samples confirmed that the size of the MUC6 variable number tandem repeat (VNTR) region was highly polymorphic. In a cohort of autopsied subjects with quantitative digital pathology data INTRODUCTION (n ¼ 119), the size of the polymorphic region was associated with Late-onset Alzheimer disease (LOAD) is a devastating the severity of pTau pathology in neocortex. In a separate replication neurological condition with aspects of heritable risk that are cohort of autopsied subjects (n ¼ 173), more pTau pathology was incompletely understood (1–3). The gold standard endopheno- again observed in subjects with longer VNTR regions (p ¼ 0.031). types for indicating the presence and severity of LOAD are the Unlike MUC6, AP2A2 is highly expressed in human brain. AP2A2 neuropathologic hallmarks: Amyloid-b (Ab) plaques and neu- expression was lower in a subset analysis of brain samples from per- rofibrillary tangles (NFTs) composed of Tau proteins (4). In sons with longer versus shorter VNTR regions (p ¼ 0.014 normaliz- clinical-pathologic correlation and neuroimaging biomarker ing with AP2B1 expression). Double-label immunofluorescence studies, the distribution of NFT/Tau proteinopathy in neocorti- studies showed that AP2A2 protein often colocalized with neurofi- cal regions was strongly associated with cognitive impairment brillary tangles in LOAD but was not colocalized with pTau protein- (5–7). opathy in progressive supranuclear palsy, or with TDP-43 Single nucleotide polymorphisms (SNPs) in or near proteinopathy. In summary, polymorphism in a repeat-rich region >20 different genes have been linked to LOAD risk (8– near AP2A2 was associated with neocortical pTau proteinopathy 12). The APOE e4 allele, a highly impactful genetic risk (because of the unique repeats, prior genome-wide association stud- factor, was demonstrated to be associated with the LOAD phenotype by testing APOE alleles in relatively small groups: 95 LOAD cases were initially compared with 139 From the Sanders-Brown Center on Aging (YK, DWF, ADB, SCA, W-XW, “unaffected” controls (13). Genome-wide association stud- AW, LJB, BGN, QH, ELA, SA, IP, DAP, DMN, DWW, GAJ, LJVE, ies (GWAS) have been performed subsequently, with geno- PTN); Department of Biostatistics (YK, DWF); Spinal Cord & Brain In- typing based on preselected SNP probes and imputing jury Research Center (ADB); Department of Neuroscience (ADB, DWW, LJVE); Department of Epidemiology (ELA); Department of Neu- additional variants. Recent LOAD GWAS sample sizes rology (DWW, GAJ); Department of Physiology (BCS, SE); and Depart- were large, and imputation approaches have improved (10, ment of Pathology (W-XW, JHN, PTN), University of Kentucky, 14, 15). These advances enabled the identification of many Lexington, Kentucky. individual gene variants that have relatively subtle associa- Send correspondence to: Peter T. Nelson, MD, PhD, Sanders-Brown Center on Aging, University of Kentucky, 311, 800 S. Limestone Ave., Lexing- tions with LOAD risk, although collectively these SNPs ton, KY 40536; E-mail: [email protected] may have large impact (16). Funding included grants P30 AG028383, R01 AG057187, R01 AG042475, Unfortunately, the complexity of the human genome and U01 AG016976 from the National Institute on Aging (NIA)/National and shortcomings of extant sequence characterization methods Institutes of Health (NIH). Please see the additional Acknowledgment section at the end of the Discussion section. are limiting factors, so that some genomic phenomena were The authors have no duality or conflicts of interest to declare. not surveyed completely in prior studies. For example, rare Supplementary Data can be found at academic.oup.com/jnen. gene variants were often ignored or removed from GWAS VC 2019 American Association of Neuropathologists, Inc. 3 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License(http://creativecom- mons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please [email protected] Katsumata et al J Neuropathol Exp Neurol • Volume 79, Number 1, January 2020 analyses because they were not imputed accurately (17–19). ble S1). ADSP generated WES data for 10 913 unrelated Many incompletely annotated genomic regions, and areas subjects with the following features: (1) paired-end reads with repetitive sequences, were also trimmed off by software were mapped to the GRCh37 human reference genome in algorithms used to operationalize the GWAS data (20–22). Binary Alignment Map (BAM) files at 3 Large Scale Se- As may be expected because of the abovementioned quencing/Analysis Centers—the Human Genome Sequenc- limitations, a relatively large amount of genetic influence on ing Center at Baylor College of Medicine (BAYLOR), LOAD risk is not explained by prior GWAS; hence, the so- Broad Institute Genome Center (BROAD), and Genome In- called “missing heritability” problem (23–25). For example, stitute at Washington University; (2) genotypes were called heritability explained 79% of LOAD risk in a Swedish twin by BAYLOR and BROAD, generating project-level Variant study (24), whereas common risk variants identified by Call Format (VCF) files; and, (3) quality control and concor- GWAS explained only 20%–50% of total phenotypic variance dance checks were combined and made available in a single of LOAD (23, 24). These and other data (26–28) indicated that VCF file. Subjects with diagnostic data who were 65 years a substantial proportion of LOAD-associated genetic risk fac- of age at the last visit or at death (n ¼ 10 468) were included tor(s) remain unidentified. in a principal component analysis (PCA) to identify ethnic out- Recent advances in sequencing technologies have en- liers. PCA was performed in PLINK v1.90a (35, 36)usinga abled more comprehensive genotype-phenotype association linkage disequilibrium (LD) pruned subset of markers (pairwise studies. For example, whole-exome sequencing (WES) data r2 < 0.2) from these data and 1000 Genomes Project Phase 3 derive from polymerase chain reaction (PCR)-based reads in (1000 Genomes) (37) data after removing symmetric SNPs and annotated exons. These data provide more accurate genotyp- flipping SNPs discordant for DNA strands between ADSP WES ing of rare variants and insertions/deletions (indels) in com- and 1000 Genomes data. First and second PCs were plotted for parison to the preselected gene variant probes in most each individual (n ¼ 10 468 from ADSP WES and n ¼ 2504 GWAS studies. For analyzing sequencing data, tailored from 1000 Genomes). Based on the PCA, 437 subjects were re- computational methods such as the Sequence Kernel Associ- moved as ethnic outliers, yielding n ¼ 10 031 (5142 AD cases ation Test (SKAT) can be used to test for association be- and 4889 controls) of European ancestry (Supplementary Data tween rare variants and a phenotype of interest (29). Figs. S1 and S2). The disease phenotyping was based on clinical However, even in the high-quality Alzheimer’s Disease Se- criteria: AD cases were determined by National Institute of Neu- quencing Project (ADSP) WES data set (30, 31), many rological and Communicative Disorders and Stroke, and the regions of the human genome were filtered out due to in- Alzheimer’s Disease and Related Disorders Association complete mapping, ambiguous reads, or imperfectly aligned (NINCDS-ADRDA) clinical criteria for possible, probable, or data pipelines (22, 32). definite AD (38), and controls were free of dementia by direct, In the present study, we analyzed ADSP WES data de- documented cognitive assessment (30, 39). rived from over 10 000 individuals with the goal of identify- WES data contained biallelic indels (n ¼ 49 244) and ing novel genetic variants associated with the LOAD single nucleotide variants (SNVs) (n ¼ 1 586 687) genotypes, phenotype. Preliminary results found evidence of LOAD- generated by applying a quality control protocol designed by linked genetic variation within an exon of the Mucin 6 the ADSP Quality Control Working Group (https://www.niag- (MUC6) gene, and 4000 base pairs downstream from an- ads.org/adsp/content/sequencing-pipelines). All variants were other gene that encodes Adaptor Related Protein Complex 2 in Hardy-Weinberg equilibrium (cut-off