Washington University School of Medicine Digital Commons@Becker

Open Access Publications

2019 FeatSNP: An interactive database for brain-specific epigenetic annotation of SNPs Chun-yu Ma

Pamela Madden

Paul Gontarz

Ting Wang

Bo Zhang

Follow this and additional works at: https://digitalcommons.wustl.edu/open_access_pubs fgene-10-00262 March 29, 2019 Time: 18:50 # 1

METHODS published: 02 April 2019 doi: 10.3389/fgene.2019.00262

FeatSNP: An Interactive Database for Brain-Specific Epigenetic Annotation of Human SNPs

Chun-yu Ma1, Pamela Madden2, Paul Gontarz1, Ting Wang3* and Bo Zhang1*

1 Center of Regenerative Medicine, Department of Developmental , Washington University School of Medicine, Edited by: St. Louis, MO, United States, 2 Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, Shandar Ahmad, United States, 3 Department of , The Edison Family Center for Genome Sciences & , Washington Jawaharlal Nehru University, India University School of Medicine, St. Louis, MO, United States Reviewed by: Wei-Hua Chen, Huazhong University of Science FeatSNP is an online tool and a curated database for exploring 81 million common and Technology, China SNPs’ potential functional impact on the human brain. FeatSNP uses the brain Sandeep Kumar Dhanda, transcriptomes of the human population to improve functional annotation of human La Jolla Institute for (LJI), United States SNPs by integrating transcription factor binding prediction, public eQTL information, and *Correspondence: brain specific epigenetic landscape, as well as information of Topologically Associating Ting Wang Domains (TADs). FeatSNP supports both single and batched SNP searching, and [email protected] Bo Zhang its interactive user interface enables users to explore the functional annotations and [email protected] generate publication-quality visualization results. FeatSNP is freely available on the internet at FeatSNP.org with all major web browsers supported. Specialty section: This article was submitted to Keywords: SNP, database, , brain, transcription factor and , a section of the journal INTRODUCTION Frontiers in Genetics Received: 19 November 2018 Genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) analyses Accepted: 08 March 2019 have identified thousands of genetic variants that are associated with a wide range of Published: 02 April 2019 human phenotypes, shedding lights on the understanding of the genetic effect to human Citation: diseases. However, a key challenge for scientists in the human genetics community is to Ma C-y, Madden P, Gontarz P, understand the molecular mechanism connecting significant genetic variant and specific Wang T and Zhang B (2019) phenotype. More than 90% of SNPs associated with human phenotypes are located in non- FeatSNP: An Interactive Database for Brain-Specific Epigenetic -coding regions, and cannot be explained by alteration of amino acid sequence of Annotation of Human SNPs. (Welter et al., 2014). Recently, mounting evidence suggests that disease-associated non- Front. Genet. 10:262. coding SNPs are highly enriched in tissue-specific regulatory elements including enhancers, doi: 10.3389/fgene.2019.00262 which can be detected and defined by specific chromatin modifications (Carey et al., 2015;

Frontiers in Genetics| www.frontiersin.org 1 April 2019| Volume 10| Article 262 fgene-10-00262 March 29, 2019 Time: 18:50 # 2

Ma et al. FeatSNP: Database for Epigenetic Annotation of SNPs

Zhou et al., 2015; Agrawal et al., 2018). Moreover, some non- dbSNP (V144), with SNP accession number as unique identifier coding SNPs are found to be located within transcription factor in the database. Human dbSNP build 144 was downloaded (TF) binding motifs, which affect the TF binding affinity and from ftp.ncbi.nih.gov/snp, which includes 84,435,229 SNPs result in switching and/or allele-specific regulation of target records, 1,591,294 insertions records, 2,595,517 deletions (Andersson et al., 2014; Roadmap et al., 2015; records, 33,234 records, and 110 Multiple Nelson et al., 2016). These evidences underscore the potential Polymorphisms (MNPs) records. After filtering redundant causal role of non-coding genetic variants in affecting human records, 81,144,876 of 84,435,229 biallelic SNPs were used diseases and phenotypes through regulation of expression to generate functional annotations and were curated by the (Claussnitzer et al., 2015). FeatSNP database. The genome coordinates (hg19) of 81,144,876 Here we introduce FeatSNP,an online tool and database which SNPs were used to associate the SNPs with their nearest genes provides an interactive user interface (UI) for inquiring brain- based on 56,642 records of GENOCDE gene annotation Release specific functional and epigenetic annotation of human SNPs. 19 (GRCh37.p13). Unlike traditional SNP functional annotation databases, such To predict impact of allele-specific TF binding affinity by as RegulomeDB (Boyle et al., 2012) and HaploReg (Ward and SNPs, the Position Weight Matrix (PWM) of 519 vertebrate TFs Kellis, 2012), FeatSNP focuses on the collection and curation of were collected from JASPAR (Core Vertebrate 2016) (Mathelier brain-specific functional data, including epigenomes, et al., 2016). After evaluating the motif weight PWM of 519 TFs transcriptomes, and eQTL data, to better annotate the regulatory at base-pair resolution (Supplementary Figure S2), the reference potential of single SNP. Specifically, FeatSNP supplies a series of and alternate for every SNP with flanking 10 bp of genomic new features to facilitate research understanding the functional sequences both upstream and downstream were obtained from annotation of SNP on human brain (Supplementary Table S1). the UCSC Genome Browser. FIMO (Grant et al., 2011) was FeatSNP uses human brain transcriptomes to improve and refine used to scan the 21 bp sequence to identify binding motifs the prediction of allele-specific TF binding motifs. The expression matching any of the 519 TF PWMs, and calculate the TFBS correlation between SNP-associated gene and predicted SNP- motif scores. Only instances where a motif in the sequence associated TFs was used to determine the best allele-associated (i) passed the threshold of P < 1e-2 in either the reference TF candidate. The interactive UI allows the users easily to browse or the alternate allele, and (ii) contained the SNP location functional annotation and generate analysis results and high and (iii) the difference of motif scores between the reference quality figures. and the alternate allele was greater than 2, were recorded in the database. 1,259 transcriptome datasets of 13 brain tissues generated METHODS by the GTEx consortium (Gibson, 2015) were used to calculate the Pearson correlation between each SNP associated gene and FeatSNP consists of a front end UI implemented with predicted binding TFs. The lowly expressed gene and TFs HTML/PHP/JavaScript, and a backend NoSQL database (expression of all samples in one tissue less than 0.2RPKM) were implemented with MongoDB (v3.2.7) as shown in Figure 1. The removed. The correlation and gene expression in 13 brain tissues current SNP dataset contains 81,144,876 bi-allelic SNPs from were visualized by using JavaScript package Highcharts (v5.0.2).

FIGURE 1 | Analysis flowchart and infrastructure of FeatSNP.

Frontiers in Genetics| www.frontiersin.org 2 April 2019| Volume 10| Article 262 fgene-10-00262 March 29, 2019 Time: 18:50 # 3

Ma et al. FeatSNP: Database for Epigenetic Annotation of SNPs

FIGURE 2 | Information of SNP rs8070723 retrieved from FeatSNP. (A) Genomic information of rs8070723, with external links to NCBI dbSNPs and GWAS Catalog. (B) DNA sequence around rs8070723. (C) Predicted TFs binding motifs containing rs8070723 with A/G alleles.

eQTL data of 10 brain tissues generated by GTEx consortium brain tissues generated by Roadmap Epigenomics Project, were negative-log10 transformed and further visualized by using enhanced epilogos visualization1 of all 127 epigenomes, and Highcharts (v5.0.2). topologically associating domains (TAD) data of GM12878, Histone modification ChIP-seq data of 10 brain tissues were IMR90, and Hap1 cell lines (Rao et al., 2014; Sanborn et al., downloaded from NIH Roadmap Epigenomics data portal. 2015). eQTL data of 10 brain tissues generated by GTEx Bedtools was used to identify SNPs residing in peaks of 7 consortium were also visualized on the embedded WashU histone modification marks (H3K4me3, H3K36me3, H3K27me3, epigenome browser. H3K4me1, H3K27ac, H3K9me3, and H3K9ac) that were The association records of SNP and human disease/traits (V1.0.2) were downloaded from GWAS Catalog. 33,894 identified by macs2 (Zhang et al., 2008) with default parameters. associations with p-value smaller then 5E-8 were kept and To enhance the user experience, the WashU epigenome browser classified based on 1,374 human disease/traits categories. (Zhou et al., 2015) was embedded in the UI to display The functional annotations of these 33,894 SNPs were epigenetic landscape in a 200 bp region surrounding each SNP. The browser also displays DNA methylation data (Whole Genome Bisulfite Sequencing) of 4 neuronal progenitor and 1epilogos.altiusinstitute.org

Frontiers in Genetics| www.frontiersin.org 3 April 2019| Volume 10| Article 262 fgene-10-00262 March 29, 2019 Time: 18:50 # 4

Ma et al. FeatSNP: Database for Epigenetic Annotation of SNPs

FIGURE 3 | Expression information of SNP rs8070723 tagged gene retrieved from FeatSNP. (A). Pearson Correlation Matrix between rs8070723 tagged MAPT and potential bound TFs. (B). Scatter plot of gene expression of MAPT and PBX1 in anterior cingulate cortex, frontal cortex, and nucleus accumbens.

reported on FeatSNP.org/html_file/disease_classification.html To better understand the regulatory potential of this human (Supplementary Figure S3). disease-associated SNP, we inquired the epigenetic annotation of rs8070723 in FeatSNP through Single SNP ID Searching function on SNP Query Page (Supplementary Figure S4). The RESULTS database first reported the basic information of SNP rs8070723, including genomic location, allelic frequency, surrounding DNA To illustrate the use of FeatSNP, we performed the analysis sequence, and associated gene (Figures 2A–C). Users can further using rs8070723 as an example. rs8070723 is an intronic A/G access the genetic information and associated human disease SNP (major allele A frequency 0.881, minor allele G frequency or traits of inquired SNPs on dbSNP and GWAS Catalog 0.119) in MAPT, the gene that encodes the microtubule- through external links. associated protein tau, and is associated with Progressive FeatSNP found four potential TF binding motifs harboring Supranuclear Palsy (Hoglinger et al., 2011) and with Parkinson’s rs8070723 with A allele, including PBX1, Hoxa9, Dux, and EN2. Disease (UK Parkinson’s Disease Consortium et al., 2011). All four TF binding motifs had high TFBS scores in A allele,

Frontiers in Genetics| www.frontiersin.org 4 April 2019| Volume 10| Article 262 fgene-10-00262 March 29, 2019 Time: 18:50 # 5

Ma et al. FeatSNP: Database for Epigenetic Annotation of SNPs

FIGURE 4 | Epigenetic annotation and eQTL information of SNP rs8070723 tagged genomic loci retrieved from FeatSNP. (A) Clustering visualization of epigenetic annotation to the genomic loci tagged by SNP rs8070723. (B) WashU EpiGenome Brower view of genomic loci tagged by rs8070723. Top: DNA methylation level of CpG sites in four neuronal cells. Middle track: Epilogos visualization of chromHMM predicted chromatin status. Followed eQTL tracks: log10 transformed p-value of eQTL in 10 brain regions. TAD track: Topological Associated Domain tracks of GM12878, IMR90, and Hap1. Bottom: RefSeq gene annotation track. (C) Complete eQTL information of SNP rs8070723 in 10 brain regions.

and the TFBS motifs were destroyed with G allele with low TFBS brain (Supplementary Figure S1A) and did not correlate with scores (Figure 2C). PBX1 encodes a nuclear protein that belongs expression level of MAPT (Figure 3A). We found that PBX1 to the PBX homeobox family of transcriptional factors, and highly expressed in different brain regions (Supplementary studies suggested PBX1 regulates the patterning of the cerebral Figure S1B), and we also found the expression of MAPT cortex (Golonzhka et al., 2015) and its transcriptional network had strong and specific correlation with PBX1 in multiple controls dopaminergic neuron development in Parkinson’s brain regions (Figure 3A), especially in anterior cingulate disease (Villaescusa et al., 2016). EN2 encodes homeodomain- cortex (r = 0.808), nucleus accumbens (r = 0.768), and containing proteins and has been implicated in the control of frontal cortex (r = 0.768) (Figure 3B), which were considered pattern formation during development of the central nervous as major affected regions of Progressive Supranuclear Palsy system (Genestine et al., 2015). Hoxa9 is an important homeobox (Salmon et al., 1997). transcription factor and plays important roles in myeloid We further explored the epigenetic annotation of the leukemogenesis (Siriboonpiputtana et al., 2017). Dux-family genomic regions tagged by rs8070723 in 10 brain regions by transcription factors were recently identified to regulate zygotic using epigenome data generated from Roadmap Consortium, genome activation in placental mammals (De Iaco et al., 2017). which were also curated in FeatSNP database. We found the Thus, PBX1 and EN2 could be the potential master TFs affected regions tagged by SNP rs8070723 enriched for strong active by the SNP rs8070723. histone modification signals including H3K4me1, H3K9ac, and Since FeatSNP curated 1,259 transcriptome data of 13 H3K27ac in 8 brain tissues (Figure 4A). Such active histone brain tissues generated by the GTEx consortium (Gibson, modifications were generally associated with active enhancer 2015), we were able to further check the expression level and promoter functions. Chromatin epigenetic status prediction of PBX1 and EN2 in multiple brain regions in FeatSNP based on chromHMM (Ernst and Kellis, 2012) suggested that database. EN2 was only expressed in the cerebellum of the the regions tagged by SNP rs8070723 could be considered

Frontiers in Genetics| www.frontiersin.org 5 April 2019| Volume 10| Article 262 fgene-10-00262 March 29, 2019 Time: 18:50 # 6

Ma et al. FeatSNP: Database for Epigenetic Annotation of SNPs

as strong enhancers (Figure 4B). Finally, we explored the DATA AVAILABILITY eQTL data in 13 brain tissues, and found rs8070723 was associated with several genes’ expression, including MAPT Publicly available datasets were analyzed in this study. This data (Figures 4B,C). MAPT gene mutations have been associated can be found here: http://www.roadmapepigenomics.org/. with several neurodegenerative disorders such as Alzheimer’s disease and Parkinson’s disease. Our result suggests that rs8070723 G allele might influence MAPT expression level by AUTHOR CONTRIBUTIONS reducing the binding affinity of upstream regulatory protein PBX1, therefore providing a mechanistic association with C-yM and BZ performed the data analysis, C-yM and PG neurodegenerative diseases including Progressive Supranuclear developed the database and website. PM, TW, and BZ designed Palsy and Parkinson’s Disease. and supervised the study.

FUNDING CONCLUSION This work was supported by National Institutes of Health In summary, FeatSNP is an interactive database providing grant DA027995, HG007175, HG007354, and Goldman Sachs brain-specific resources to investigate the Philanthropy Fund. regulatory potential of human SNPs. This database provides a multitude types of functional annotations, including TF binding motif prediction, epigenetic landscape, expression correlation SUPPLEMENTARY MATERIAL and eQTL information. We anticipate that this database will facilitate scientists to investigate the functional impact of their The Supplementary Material for this article can be found candidate genetic variants in a more streamlined, rapid, and online at: https://www.frontiersin.org/articles/10.3389/fgene. efficient fashion. 2019.00262/full#supplementary-material

REFERENCES Golonzhka, O., Nord, A., Tang, P. L. F., Lindtner, S., Ypsilanti, A. R., Ferretti, E., et al. (2015). Pbx regulates patterning of the cerebral cortex in progenitors Agrawal, A., Chou, Y. L., Carey, C. E., Baranger, D. A. A., Zhang, B., Sherva, R., and postmitotic neurons. Neuron 88, 1192–1207. doi: 10.1016/j.neuron.2015. et al. (2018). Nabis dependence. Mol. Psychiatry 23, 1293–1302. doi: 10.1111/ 10.045 den.13366 Grant, C. E., Bailey, T. L., and Noble, W. S. (2011). FIMO: scanning for occurrences Andersson, R., Gebhard, C., Miguel-Escalada, I., Hoof, I., Bornholdt, J., Boyd, M., of a given motif. Bioinformatics 27, 1017–1018. doi: 10.1093/bioinformatics/ et al. (2014). An atlas of active enhancers across human cell types and tissues. btr064 Nature 507, 455–461. doi: 10.1038/nature12787 Hoglinger, G. U., Melhem, N. M., Dickson, D. W., Sleiman, P. M., Wang, L. S., Boyle, A. P., Hong, E. L., Hariharan, M., Cheng, Y., Schaub, M. A., Klei, L., et al. (2011). Identification of common variants influencing risk of Kasowski, M., et al. (2012). Annotation of functional variation in personal the tauopathy progressive supranuclear palsy. Nat. Genet. 43, 699–705. doi: genomes using RegulomeDB. Genome Res. 22, 1790–1797. doi: 10.1101/gr.137 10.1038/ng.859 323.112 Mathelier, A., Fornes, O., Arenillas, D. J., Chen, C. Y., Denay, G., Lee, J., et al. Carey, C. E., Agrawal, A., Zhang, B., Conley, E. D., Degenhardt, L., Heath, (2016). JASPAR 2016: a major expansion and update of the open-access A. C., et al. (2015). Monoacylglycerol lipase (MGLL) rs604300 database of transcription factor binding profiles. Nucleic Acids Res. 44, D110– interacts with childhood adversity to predict cannabis dependence symptoms D115. doi: 10.1093/nar/gkv1176 and amygdala habituation: evidence from an endocannabinoid system- Nelson, E. C., Agrawal, A., Heath, A. C., Bogdan, R., Sherva, R., Zhang, B., et al. level analysis. J. Abnorm. Psychol. 124, 860–877. doi: 10.1037/abn000 (2016). Evidence of CNIH3 involvement in opioid dependence. Mol. Psychiatry 0079 21, 608–614. doi: 10.1038/mp.2015.102 Claussnitzer, M., Dankel, S. N., Kim, K. H., Quon, G., Meuleman, W., Rao, S. S., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., and Haugen, C., et al. (2015). FTO obesity variant circuitry and adipocyte Robinson, J. T. (2014). A 3D map of the human genome at kilobase resolution browning in . N. Engl. J. Med. 373, 895–907. doi: 10.1056/NEJMoa150 reveals principles of chromatin looping. Cell 159, 1665–1680. doi: 10.1016/j.cell. 2214 2014.11.021 De Iaco, A., Planet, E., Coluccio, A., Verp, S., Duc, J., and Trono, D. (2017). Roadmap Epigenomics, C., Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., DUX-family transcription factors regulate zygotic genome activation Yen, A., et al. (2015). Integrative analysis of 111 reference human epigenomes. in placental mammals. Nat. Genet. 49, 941–945. doi: 10.1038/ng. Nature 518, 317–330. doi: 10.1038/nature14248 3858 Salmon, E., Van der Linden, M. V., and Franck, G. (1997). Anterior cingulate Ernst, J., and Kellis, M. (2012). ChromHMM: automating chromatin-state and motor network metabolic impairment in progressive supranuclear palsy. discovery and characterization. Nat. Methods 9, 215–216. doi: 10.1038/nmeth. Neuroimage 5, 173–178. doi: 10.1006/nimg.1997.0262 1906 Sanborn, A. L., Rao, S. S., Huang, S. C., Durand, N. C., Huntley, M. H., Jewett, A. I., Genestine, M., Lin, L., Durens, M., Yan, Y., Jiang, Y., Prem, S., et al. (2015). et al. (2015). Chromatin extrusion explains key features of loop and domain Engrailed-2 (En2) deletion produces multiple neurodevelopmental formation in wild-type and engineered genomes. Proc. Natl. Acad. Sci. U.S.A. defects in monoamine systems, forebrain structures and neurogenesis 112, E6456–E6465. doi: 10.1073/pnas.1518552112 and behavior. Hum. Mol. Genet. 24, 5805–5827. doi: 10.1093/hmg/ Siriboonpiputtana, T., Zeisig, B. B., Zarowiecki, M., Fung, T. K., Mallardo, M., ddv301 Tsai, C. T., et al. (2017). Transcriptional memory of cells of origin overrides Gibson, G. (2015). Human genetics. GTEx detects genetic effects. Science 348, beta-catenin requirement of MLL cancer stem cells. EMBO J. 36, 3139–3155. 640–641. doi: 10.1126/science.aab3002 doi: 10.15252/embj.201797994

Frontiers in Genetics| www.frontiersin.org 6 April 2019| Volume 10| Article 262 fgene-10-00262 March 29, 2019 Time: 18:50 # 7

Ma et al. FeatSNP: Database for Epigenetic Annotation of SNPs

UK Parkinson’s Disease Consortium, Wellcome Trust Case Control Consortium, Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., Spencer, C. C., Plagnol, V., Strange, A., Gardner, M., et al. (2011). Dissection et al. (2008). Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9:R137. of the genetics of Parkinson’s disease identifies an additional association 5’ doi: 10.1186/gb-2008-9-9-r137 of SNCA and multiple associated at 17q21. Hum. Mol. Genet. 20, Zhou, X., Li, D., Zhang, B., Lowdon, R. F., Rockweiler, N. B., Sears, R. L., 345–353. doi: 10.1093/hmg/ddq469 et al. (2015). Epigenomic annotation of genetic variants using the Roadmap Villaescusa, J. C., Li, B., Toledo, E. M., Rivetti di Val Cervo, P., Yang, S., Stott, S. R., Epigenome Browser. Nat. Biotechnol. 33, 345–346. doi: 10.1038/nbt.3158 et al. (2016). A PBX1 transcriptional network controls dopaminergic neuron development and is impaired in Parkinson’s disease. EMBO J. 35, 1963–1978. Conflict of Interest Statement: The authors declare that the research was doi: 10.15252/embj.201593725 conducted in the absence of any commercial or financial relationships that could Ward, L. D., and Kellis, M. (2012). HaploReg: a resource for exploring be construed as a potential conflict of interest. chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants. Nucleic Acids Res. 40, D930–D934. doi: 10.1093/nar/ Copyright © 2019 Ma, Madden, Gontarz, Wang and Zhang. This is an open-access gkr917 article distributed under the terms of the Creative Commons Attribution License Welter, D., MacArthur, J., Morales, J., Burdett, T., Hall, P., Junkins, H., (CC BY). The use, distribution or reproduction in other forums is permitted, provided et al. (2014). The NHGRI GWAS Catalog, a curated resource of SNP- the original author(s) and the copyright owner(s) are credited and that the original trait associations. Nucleic Acids Res. 42, D1001–D1006. doi: 10.1093/nar/ publication in this journal is cited, in accordance with accepted academic practice. No gkt1229 use, distribution or reproduction is permitted which does not comply with these terms.

Frontiers in Genetics| www.frontiersin.org 7 April 2019| Volume 10| Article 262