Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci

Atsushi Takata1,2, Naomichi Matsumoto2 & Tadafumi Kato1

ARTICLE | OPEN
Received 25 Aug 2016 | Accepted 6 Jan 2017 | Published 27 Feb 2017 | DOI: 10.1038/ncomms14519

1 Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan. 2 Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan. Correspondence and requests for materials should be addressed to A.T. (email: [email protected]) or to T.K. (email: [email protected]).

Detailed analyses of the transcriptome have revealed complexity in the regulation of alternative splicing (AS). These AS events often undergo modulation by genetic variants. Here we analyse RNA-sequencing data of prefrontal cortex from 206 individuals in combination with their genotypes and identify cis-acting splicing quantitative trait loci (sQTLs) throughout the genome. These sQTLs are enriched among exonic and H3K4me3-marked regions. Moreover, we observe significant enrichment of sQTLs among disease-associated loci identified by GWAS, especially in schizophrenia risk loci. Closer examination of each schizophrenia-associated locus revealed four regions (each encompassing NEK4, FXR1, SNAP91 or APOPT1) where the index SNP in GWAS is in strong linkage disequilibrium with sQTL SNP(s), suggesting dysregulation of AS as the underlying mechanism of the association signal. Our study provides an informative resource of sQTL SNPs in the human brain, which can facilitate understanding of the genetic architecture of complex brain disorders such as schizophrenia.
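The notion of an sQTL, a variant whose genotype is associated with the level of a splicing event, can be illustrated with a toy calculation. The following is a minimal sketch and not the authors' pipeline: it assumes an additive genotype coding (0, 1 or 2 copies of the minor allele), a splicing level expressed as percent spliced in (PSI), and a plain least-squares fit without covariates. The function name and the data are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: synthetic data and a hypothetical function name.
# Assumptions (not taken from the article): additive genotype coding
# (0, 1 or 2 minor alleles) and splicing measured as percent spliced in (PSI).

def additive_association(dosages, psi):
    """Return the least-squares slope and r^2 of PSI regressed on allele dosage."""
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(psi) / n
    s_gg = sum((g - mean_g) ** 2 for g in dosages)
    s_yy = sum((y - mean_y) ** 2 for y in psi)
    s_gy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, psi))
    slope = s_gy / s_gg                     # change in PSI per extra minor allele
    r_squared = s_gy ** 2 / (s_gg * s_yy)   # share of PSI variance explained
    return slope, r_squared


# Six synthetic individuals: genotype dosage and PSI of one AS event.
dosages = [0, 0, 1, 1, 2, 2]
psi = [20.0, 24.0, 30.0, 34.0, 40.0, 44.0]

slope, r_squared = additive_association(dosages, psi)
print(f"slope = {slope:.1f} PSI points per allele, r^2 = {r_squared:.3f}")
```

In a real analysis this fit would be repeated for every SNP and AS event pair within the cis window, with covariates and multiple-testing correction, which this sketch deliberately leaves out.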
NATURE COMMUNICATIONS | 8:14519 | DOI: 10.1038/ncomms14519 | www.nature.com/naturecommunications

Alternative splicing (AS) is the process by which different splice sites in precursor messenger RNA are selected to generate multiple mRNA isoforms. AS events are often regulated in a cell type-, condition- or species-specific manner. Notably, recent studies have demonstrated that complexity of AS regulation is highest in primates [1], and that there is a distinct and more complex pattern of AS in brain tissues [2,3]. Such highly intricate regulation of AS in the human brain can play an important role in normal function and development of the central nervous system. For example, a number of genetic mutations that affect global regulation of AS or alter AS of a specific gene are known to be associated with various brain disorders [4,5]. More recently, it was reported that a subset of de novo germline mutations, whose important roles in the genetic aetiology of neuropsychiatric disorders such as autism spectrum disorders (ASDs) and schizophrenia have been established [6–10], probably contribute to the risk of ASD and schizophrenia by affecting AS [11]. In addition, dysregulation of AS is reported in multiple postmortem brain studies of ASD [12,13] and schizophrenia [14,15].

As represented by canonical splice site variants disrupting exon–intron boundaries, regulation of AS can be controlled by genetic variants. Beyond variants that directly change splice site sequences, genetic variants controlling AS events, referred to as splicing quantitative trait loci (sQTLs), have been shown to be spread throughout the genome. In particular, recent large-scale studies utilizing RNA sequencing (RNA-seq) data have successfully identified sQTLs in a genome-wide manner [3,16]. However, these studies have primarily focused on non-neuronal tissues, and sQTLs in the human brain have therefore not yet been well characterized. Although a previous microarray-based study has identified exon-specific QTLs in brain tissues, detectable AS events depend on array design and are also restricted to exon skipping. Therefore, a study utilizing RNA-seq data has a particular advantage in identifying more AS events [17].

To comprehensively detect sQTLs in the human brain, here we analyse RNA-seq data of dorsolateral prefrontal cortex (DLPFC) tissues from >200 individuals in combination with their microarray-based genotype data. After applying stringent filtering criteria, we identify a total of ~1,500 sQTL single-nucleotide polymorphisms (SNPs) that are likely to be independent of each other. By analysing characteristics of these brain sQTL SNPs, we describe functional properties of these variants and their potential roles in the genetic aetiology of human diseases, particularly in brain disorders such as schizophrenia. We also show an example of how the information of sQTLs can be utilized to better understand the complex genetic architecture of human diseases and to specify promising candidates for culprit genes using the data of a large-scale genome-wide association study (GWAS) for schizophrenia [18].

Results

Identification of cis-acting splicing QTLs in human brain. We first analysed RNA-seq data of DLPFC samples (all from Brodmann area 9) from 206 genetically homogeneous individuals (Supplementary Fig. 1, extracted by using the result of multidimensional scaling) without neuropsychiatric diseases or neurological insults immediately prior to death (downloaded from the CommonMind Consortium Knowledge Portal; summary statistics are available in Supplementary Table 1, see also Methods) to comprehensively identify AS events in the human brain. For this purpose we used vast-tools [13], a software package designed to identify various types of AS events, including alternative exon skipping (Alt EX), alternative usage of splice sites (Alt SS) and intron retentions (IRs). After applying quality control filters (see Methods for details), we identified a total of 102,469 AS events in autosomes, consisting of 29,271 Alt EX, 3,310 Alt SS (of which 1,265 were at the 5′-donor site and 2,045 were at the 3′-acceptor site) and 69,888 IRs. We next analysed this list of AS events in combination with quality-controlled SNP genotyping data of the same individuals using Matrix eQTL [19] to identify cis-acting (within ±100 kb of the AS event) sQTLs in a genome-wide manner (see Methods for details). To conservatively define sQTL SNPs, we first applied a standard correction for multiple testing implemented in Matrix eQTL (Benjamini–Hochberg procedure) to the P-values for all SNP–AS pairs, and the corrected P-values were then further subjected to Bonferroni correction with the number of AS events within the ±100 kb window for each SNP. This is because a SNP with many AS events in the surrounding region should have a higher chance to show significant association (see Methods for details). After performing these procedures, we identified a total of 8,966 sQTL SNPs with the ‘double-corrected’ P-value < 0.05. The full list of sQTL SNPs along with information of the associated AS events is available in Supplementary Data 1. Consistent with previous studies of non-neuronal tissues [3,16], when we plotted the double-corrected P-value and the distance to the nearest AS event for each SNP, we observed that variants in the proximity of AS events are enriched for sQTL SNPs (Fig. 1a).

The identified 8,966 sQTL SNPs are involved in 1,595 AS events of 1,341 unique genes. When we performed a gene-set enrichment analysis of these 1,341 genes using the Database for Annotation, Visualization and Integrated Discovery [20], we found highly significant enrichment of ‘SP_PIR_KEYWORDS: alternative splicing’ (Benjamini-corrected P = 8.6 × 10⁻²⁹) and ‘UP_SEQ_FEATURE: splice variants’ (Benjamini-corrected P = 1.1 × 10⁻²⁸), which denote genes with known splicing isoforms (Supplementary Data 2). Therefore, on the one hand, our result is compatible with the existing knowledge of genes that undergo AS and, on the other hand, the list of genes regulated by sQTL SNPs identified here provides new candidates for genes with splicing isoforms, because >40% of the input genes were not included in ‘SP_PIR_KEYWORDS: alternative splicing’ or ‘UP_SEQ_FEATURE: splice variants’ (Supplementary Data 2) but in fact have detectable alternatively spliced regions.

Functional characterization of sQTL SNPs. We then attempted to functionally characterize sQTL SNPs. For this purpose, we first extracted the best sQTL SNP for each AS event (N = 1,595) and then performed linkage disequilibrium (LD)-based pruning (see Methods). After performing this procedure, there was a set of 1,539 sQTL SNPs that are likely to be independent of each other. Next, we performed LD-based pruning on SNPs with an uncorrected P-value > 0.05 (N = 170,241) with the same parameters applied to sQTL SNPs, leaving 89,367 SNPs that are unlikely to be associated with AS (we considered these as non-sQTL SNPs). From this list of non-sQTL SNPs, we generated a set of 48,068 SNPs with the distribution of minor allele frequency (MAF) matched to the set of 1,539 sQTL SNPs and used them for comparison (see Supplementary Fig. 2 and Methods).

By functionally classifying the SNPs according to the definition in SnpEff [21], we found that sQTL SNPs are significantly enriched among variants in exonic regions (that is, nonsense, readthrough, start-loss, frameshift, canonical splice site, missense, synonymous, splice region, 5′-untranslated region (UTR), 3′-UTR and non-coding exon variants; shown in warm colours in Fig. 1b) when compared with non-sQTL SNPs (P = 8.6 × 10⁻⁸⁷, odds ratio (OR) = 3.84, two-tailed Fisher’s exact test). By analysing enrichment of sQTL SNPs among each functional type of variants, as expected, we found significant enrichment with the