An Integrated Analysis of Public Genomic Data Unveils a Possible
Total Page:16
File Type:pdf, Size:1020Kb
Kubota and Suyama BMC Medical Genomics (2020) 13:8 https://doi.org/10.1186/s12920-020-0662-9 RESEARCH ARTICLE Open Access An integrated analysis of public genomic data unveils a possible functional mechanism of psoriasis risk via a long- range ERRFI1 enhancer Naoto Kubota1,2 and Mikita Suyama1* Abstract Background: Psoriasis is a chronic inflammatory skin disease, for which genome-wide association studies (GWAS) have identified many genetic variants as risk markers. However, the details of underlying molecular mechanisms, especially which variants are functional, are poorly understood. Methods: We utilized a computational approach to survey psoriasis-associated functional variants that might affect protein functions or gene expression levels. We developed a pipeline by integrating publicly available datasets provided by GWAS Catalog, FANTOM5, GTEx, SNP2TFBS, and DeepBlue. To identify functional variants on exons or splice sites, we used a web-based annotation tool in the Ensembl database. To search for noncoding functional variants within promoters or enhancers, we used eQTL data calculated by GTEx. The data of variants lying on transcription factor binding sites provided by SNP2TFBS were used to predict detailed functions of the variants. Results: We discovered 22 functional variant candidates, of which 8 were in noncoding regions. We focused on the enhancer variant rs72635708 (T > C) in the 1p36.23 region; this variant is within the enhancer region of the ERRFI1 gene, which regulates lipid metabolism in the liver and skin morphogenesis via EGF signaling. Further analysis showed that the ERRFI1 promoter spatially contacts with the enhancer, despite the 170 kb distance between them. We found that this variant lies on the AP-1 complex binding motif and may modulate binding levels. Conclusions: The minor allele rs72635708 (rs72635708-C) might affect the ERRFI1 promoter activity, which results in unstable expression of ERRFI1, enhancing the risk of psoriasis via disruption of lipid metabolism and skin cell proliferation. Our study represents a successful example of predicting molecular pathogenesis by integration and reanalysis of public data. Keywords: Psoriasis, GWAS, Public data integration, Functional variant, Regulatory element, ERRFI1 Background they usually lie within noncoding regions. Such variants are Genome-wide association studies (GWAS) are useful means predicted to affect gene expression via cis-regulatory ele- for identifying trait- or disease-associated single nucleotide ments, such as enhancers or repressors. If the responsible el- polymorphisms (SNPs) [1]. Many GWAS have been con- ements and target genes can be identified, noncoding ducted for various phenotypes, with a large number of trait- variantsmaybeusefulnotonlyasdiagnosticmarkersbut associated SNPs accumulated in public databases, such as also to further shed light on the molecular pathogenesis of the GWAS Catalog [2]. However, understanding why these diseases. Thus, elucidating the mechanisms of how noncod- SNPs are associated with diseases has been difficult because ing variants are associated with diseases is critical. In recent years, various genomic and epigenetic data have accumulated due to next-generation sequencing * Correspondence: [email protected] 1Division of Bioinformatics, Medical Institute of Bioregulation, Kyushu (NGS). Epigenetics is the study of heritable changes University, Fukuoka 812-8582, Japan without changes in the DNA sequence, including DNA Full list of author information is available at the end of the article © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. Kubota and Suyama BMC Medical Genomics (2020) 13:8 Page 2 of 14 methylation, histone modifications, and chromatin re- curated database of published GWAS. The data of vari- modeling [3]. Advanced epigenetic studies have made it ants reported to be associated with “Psoriasis”, “Cutane- possible to predict the location of cis-regulatory elements, ous psoriasis”, and “Psoriasis vulgaris” from 2008 to such as promoters, enhancers, and repressors in noncoding 2018 were used for the following analysis. LDlink [32] regions. Furthermore, over the past decade, many experi- was programmatically used to obtain data about variants mental methods to discern the higher-order genomic struc- strongly linked (r2 > 0.8) with psoriasis risk SNPs in the ture have been developed [4]. Hi-C, which is the NGS-based European population, of which genotype data originated method of capturing genome-wide DNA–DNA interactions, from Phase 3 (Version 5) of the 1000 Genomes Project has revealed that the genome is partitioned into megabase- (Utah Residents from North and West Europe, Toscani scale structural domains called topologically associating do- in Italia, Finnish in Finland, British in England and mains (TADs) [5–7].Thesedatahavealsobeenusedto Scotland, Iberian population in Spain) [33]. detect locality and contact between gene promoters and dis- tant cis-regulatory elements in various living tissues and cells. Exonic and splice site variant annotation Recent studies have shown that noncoding variants associ- We performed annotation of exonic or splice site vari- ated with diseases affect the regulation of distant genes, even ants linked with psoriasis risk SNPs using the Ensembl at megabase distances [8–17]. Therefore, integrated analysis Variant Effect Predictor [34]. For exonic variants, we ex- of various epigenetic data, including Hi-C data, can help in tracted deleterious ones, such as missense variants anno- the functional annotation of noncoding sequences involved tated as “probably damaging” or “possibly damaging” per in the long-range regulation of gene expression, disruptions the PolyPhen-2 score [35], frameshift, start lost, and/or of which lead to various diseases. stop gained variants. For variants in splice acceptor or In this study, we focused on psoriasis, an autoimmune donor sites, we defined those, of which differences of the and chronic inflammatory skin disease characterized by MaxEntScan scores [36] between protective and risk al- overproliferation of keratinocytes. This is one of the most leles are more than 5, as deleterious splice site variants. common diseases in the European population, and the in- The effect on gene expression was assessed using eQTL volvement of genetic factors has been strongly suggested, data provided by GTEx Analysis V7 [37, 38]. The statis- with many susceptibility loci identified by GWAS [18–31]. tical significance was ascertained by GTEx project Several risk markers have been reported to have potential (https://storage.googleapis.com/gtex-public-data/Portal_ regulatory functions, especially in immune cells [30]. How- Analysis_Methods_v7_09052017.pdf). ever, the detailed mechanisms governing the onset of psor- iasis via cis-regulatory elements, in particular which are Noncoding variant annotation true causal variants, have not been thoroughly investigated. For variants in the promoter regions, those located 2 kb To understand the molecular pathology of psoriasis and de- upstream of the transcription start site as defined by velop therapeutic strategies, we used public data regarding NCBI RefSeq were used for further analysis. The BED psoriasis risk variants and constructed a computational ana- format file of promoter regions was downloaded using lysis pipeline to further discover functional variants. We the UCSC Table Browser and intersected with the BED collected variants in high linkage disequilibrium with risk format file of variants using BEDTools [39]. For identify- variants and searched for those with changed amino acids, ing enhancer variants and their targets, we used enhan- modulated splicing, or variants located in transcriptional cer–promoter correlation data across a broad panel of regulatory sequences. In particular, for variants of regula- cells from the FANTOM5 project [40]. We integrated tory sequences, we identified target genes using expression the promoter or enhancer variant data with eQTL data quantitative trait loci (eQTL) data and enhancer–promoter to obtain variants that affected expression levels of their correlations. Physical interactions were then validated using target genes. Next, we used data regarding variants on Hi-C and ChIA-PET data. Additionally, we searched for TFBSs as analyzed by SNP2TFBS [41] to discover those transcription factors (TFs) affected by the variants. Through that affect binding levels of TFs. To verify that the this analysis, we reported several previously unknown func- TFBSs were indeed functional, peak files of the corre- tional variants of psoriasis and have shown that long dis- sponding TF ChIP-seq were obtained using DeepBlue tance transcriptional regulation may be affected by one API access [42]. noncoding variant. This study revealed a new aspect of the molecular pathogenesis of psoriasis. Enrichment analysis Using ChIP-Atlas [43], we performed the enrichment Methods analysis of histone marks for 104 promoter regions in- Psoriasis