FeatSNP: An Interactive Database for Brain-Specific Epigenetic Annotation of Human SNPs
Chun-yu Ma et al.
Washington University School of Medicine, Digital Commons@Becker
Open Access Publications, 2019
Follow this and additional works at: https://digitalcommons.wustl.edu/open_access_pubs

METHODS
published: 02 April 2019
doi: 10.3389/fgene.2019.00262

FeatSNP: An Interactive Database for Brain-Specific Epigenetic Annotation of Human SNPs

Chun-yu Ma1, Pamela Madden2, Paul Gontarz1, Ting Wang3* and Bo Zhang1*

1 Center of Regenerative Medicine, Department of Developmental Biology, Washington University School of Medicine, St. Louis, MO, United States
2 Department of Psychiatry, Washington University School of Medicine, St. Louis, MO, United States
3 Department of Genetics, The Edison Family Center for Genome Sciences & Systems Biology, Washington University School of Medicine, St. Louis, MO, United States

Edited by: Shandar Ahmad, Jawaharlal Nehru University, India
Reviewed by: Wei-Hua Chen, Huazhong University of Science and Technology, China; Sandeep Kumar Dhanda, La Jolla Institute for Immunology (LJI), United States
*Correspondence: Ting Wang [email protected]; Bo Zhang [email protected]

FeatSNP is an online tool and a curated database for exploring the potential functional impact of 81 million common SNPs on the human brain. FeatSNP uses the brain transcriptomes of the human population to improve functional annotation of human SNPs by integrating transcription factor binding prediction, public eQTL information, and the brain-specific epigenetic landscape, as well as information on Topologically Associating Domains (TADs). FeatSNP supports both single and batched SNP searching, and its interactive user interface enables users to explore the functional annotations and generate publication-quality visualization results.
FeatSNP is freely available on the internet at FeatSNP.org with all major web browsers supported.

Keywords: SNP, database, epigenetics, brain, transcription factor

Specialty section: This article was submitted to Bioinformatics and Computational Biology, a section of the journal Frontiers in Genetics
Received: 19 November 2018
Accepted: 08 March 2019
Published: 02 April 2019
Citation: Ma C-y, Madden P, Gontarz P, Wang T and Zhang B (2019) FeatSNP: An Interactive Database for Brain-Specific Epigenetic Annotation of Human SNPs. Front. Genet. 10:262. doi: 10.3389/fgene.2019.00262

Frontiers in Genetics | www.frontiersin.org | April 2019 | Volume 10 | Article 262

INTRODUCTION

Genome-wide association studies (GWAS) and expression quantitative trait loci (eQTL) analyses have identified thousands of genetic variants that are associated with a wide range of human phenotypes, shedding light on the understanding of genetic effects on human diseases. However, a key challenge for scientists in the human genetics community is to understand the molecular mechanism connecting a significant genetic variant to a specific phenotype. More than 90% of SNPs associated with human phenotypes are located in non-protein-coding regions and cannot be explained by alteration of the amino acid sequence of proteins (Welter et al., 2014). Recently, mounting evidence suggests that disease-associated non-coding SNPs are highly enriched in tissue-specific regulatory elements, including enhancers, which can be detected and defined by specific chromatin modifications (Carey et al., 2015; Zhou et al., 2015; Agrawal et al., 2018). Moreover, some non-coding SNPs are located within transcription factor (TF) binding motifs, where they affect TF binding affinity and result in allele switching and/or allele-specific regulation of target genes (Andersson et al., 2014; Roadmap Epigenomics et al., 2015; Nelson et al., 2016). This evidence underscores the potential causal role of non-coding genetic variants in affecting human diseases and phenotypes through regulation of gene expression (Claussnitzer et al., 2015).

Here we introduce FeatSNP, an online tool and database that provides an interactive user interface (UI) for querying brain-specific functional and epigenetic annotation of human SNPs. Unlike traditional SNP functional annotation databases, such as RegulomeDB (Boyle et al., 2012) and HaploReg (Ward and Kellis, 2012), FeatSNP focuses on the collection and curation of brain-specific functional genomics data, including epigenomes, transcriptomes, and eQTL data, to better annotate the regulatory potential of a single SNP. Specifically, FeatSNP supplies a series of new features to facilitate research on the functional annotation of SNPs in the human brain (Supplementary Table S1). FeatSNP uses human brain transcriptomes to improve and refine the prediction of allele-specific TF binding motifs. The expression correlation between the SNP-associated gene and the predicted SNP-associated TFs is used to determine the best allele-associated TF candidate. The interactive UI allows users to easily browse functional annotations and generate analysis results and high-quality figures.

METHODS

FeatSNP consists of a front-end UI implemented with HTML/PHP/JavaScript and a back-end NoSQL database implemented with MongoDB (v3.2.7), as shown in Figure 1. The current SNP dataset contains 81,144,876 bi-allelic SNPs from dbSNP (V144), with the SNP accession number as the unique identifier in the database.

Human dbSNP build 144 was downloaded from ftp.ncbi.nih.gov/snp; it includes 84,435,229 SNP records, 1,591,294 insertion records, 2,595,517 deletion records, 33,234 indel records, and 110 Multiple Nucleotide Polymorphism (MNP) records. After filtering redundant records, 81,144,876 of the 84,435,229 bi-allelic SNPs were used to generate functional annotations and were curated in the FeatSNP database. The genome coordinates (hg19) of the 81,144,876 SNPs were used to associate the SNPs with their nearest genes based on 56,642 records of GENCODE gene annotation Release 19 (GRCh37.p13).

To predict the impact of SNPs on allele-specific TF binding affinity, the Position Weight Matrices (PWMs) of 519 vertebrate TFs were collected from JASPAR (Core Vertebrate 2016) (Mathelier et al., 2016). After evaluating the motif weights of the 519 TF PWMs at base-pair resolution (Supplementary Figure S2), the reference and alternate alleles of every SNP, together with 10 bp of flanking genomic sequence both upstream and downstream, were obtained from the UCSC Genome Browser. FIMO (Grant et al., 2011) was used to scan each 21 bp sequence to identify binding motifs matching any of the 519 TF PWMs and to calculate the TFBS motif scores. A motif instance was recorded in the database only if (i) it passed the threshold of P < 1e-2 in either the reference or the alternate allele, (ii) it contained the SNP location, and (iii) the difference in motif scores between the reference and the alternate allele was greater than 2.

A total of 1,259 transcriptome datasets of 13 brain tissues generated by the GTEx consortium (Gibson, 2015) were used to calculate the Pearson correlation between each SNP-associated gene and its predicted binding TFs. Lowly expressed genes and TFs (expression in all samples of one tissue less than 0.2 RPKM) were removed. The correlation and gene expression in the 13 brain tissues were visualized using the JavaScript package Highcharts (v5.0.2).

FIGURE 1 | Analysis flowchart and infrastructure of FeatSNP.
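The three recording criteria for FIMO motif instances given in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the `AlleleHit` record, its field names, and the `keep_hit` helper are hypothetical, and the score difference in criterion (iii) is assumed to be an absolute difference, which the text does not state explicitly. The only values taken from the article are the 10 bp flank, the P < 1e-2 threshold, and the score difference of 2.

```python
from dataclasses import dataclass

FLANK = 10            # bases on each side of the SNP (from the Methods)
SNP_INDEX = FLANK     # 0-based position of the SNP in the 21 bp window
P_THRESHOLD = 1e-2    # criterion (i)
MIN_SCORE_DIFF = 2.0  # criterion (iii)


@dataclass
class AlleleHit:
    """Hypothetical record: one motif placement scored on both alleles."""
    tf_name: str
    start: int         # 0-based start of the motif within the 21 bp window
    width: int         # motif length in bp
    ref_score: float   # motif score on the reference-allele sequence
    alt_score: float   # motif score on the alternate-allele sequence
    ref_pvalue: float
    alt_pvalue: float


def keep_hit(hit: AlleleHit) -> bool:
    """Return True if the motif instance meets all three criteria."""
    # (i) significant in either the reference or the alternate allele
    passes_p = min(hit.ref_pvalue, hit.alt_pvalue) < P_THRESHOLD
    # (ii) the motif placement overlaps the SNP position
    covers_snp = hit.start <= SNP_INDEX < hit.start + hit.width
    # (iii) assumption: "difference" is read as an absolute difference
    changes_score = abs(hit.ref_score - hit.alt_score) > MIN_SCORE_DIFF
    return passes_p and covers_snp and changes_score
```

For example, `keep_hit(AlleleHit("TF_A", 5, 8, 9.1, 4.0, 5e-4, 0.2))` passes all three checks, whereas the same record with `start=0` is rejected because an 8 bp motif starting at position 0 does not reach the SNP at index 10.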
FIGURE 2 | Information of SNP rs8070723 retrieved from FeatSNP. (A) Genomic information of rs8070723, with external links to NCBI dbSNP and GWAS Catalog. (B) DNA sequence around rs8070723. (C) Predicted TF binding motifs containing rs8070723 with A/G alleles.

eQTL data of 10 brain tissues generated by the GTEx consortium were negative-log10 transformed and visualized using Highcharts (v5.0.2).

Histone modification ChIP-seq data of 10 brain tissues were downloaded from the NIH Roadmap Epigenomics data portal. Bedtools was used to identify SNPs residing in peaks of 7 histone modification marks (H3K4me3, H3K36me3, H3K27me3, H3K4me1, H3K27ac, H3K9me3, and H3K9ac) that were identified by macs2 (Zhang et al., 2008) with default parameters. To enhance the user experience, the WashU epigenome browser (Zhou et al., 2015) was embedded in the UI to display the epigenetic landscape in a 200 bp region surrounding each SNP. The browser also displays DNA methylation data (Whole Genome Bisulfite Sequencing) of 4 neuronal progenitor and brain tissues generated by the Roadmap Epigenomics Project, enhanced epilogos visualization (epilogos.altiusinstitute.org) of all 127 epigenomes, and topologically associating domain (TAD) data of the GM12878, IMR90, and Hap1 cell lines (Rao et al., 2014; Sanborn et al., 2015). eQTL data of 10 brain tissues generated by the GTEx consortium were also visualized on the embedded WashU epigenome browser.

The association records of SNPs and human diseases/traits (V1.0.2) were downloaded from the GWAS Catalog. 33,894 associations with a p-value smaller than 5E-8 were kept and classified based on 1,374 human disease/trait categories. The functional annotations of these 33,894 SNPs were

FIGURE