ARTICLE

Received 25 Aug 2016 | Accepted 6 Jan 2017 | Published 27 Feb 2017
DOI: 10.1038/ncomms14519
OPEN

Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci

Atsushi Takata1,2, Naomichi Matsumoto2 & Tadafumi Kato1

Detailed analyses of the transcriptome have revealed complexity in the regulation of alternative splicing (AS). These AS events often undergo modulation by genetic variants. Here we analyse RNA-sequencing data of prefrontal cortex from 206 individuals in combination with their genotypes and identify cis-acting splicing quantitative trait loci (sQTLs) throughout the genome. These sQTLs are enriched among exonic and H3K4me3-marked regions. Moreover, we observe significant enrichment of sQTLs among disease-associated loci identified by GWAS, especially in schizophrenia risk loci. Closer examination of each schizophrenia-associated locus revealed four regions (encompassing NEK4, FXR1, SNAP91 or APOPT1, respectively) where the index SNP in GWAS is in strong linkage disequilibrium with sQTL SNP(s), suggesting dysregulation of AS as the underlying mechanism of the association signal. Our study provides an informative resource of sQTL SNPs in the human brain, which can facilitate understanding of the genetic architecture of complex brain disorders such as schizophrenia.

1 Laboratory for Molecular Dynamics of Mental Disorders, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan. 2 Department of Human Genetics, Yokohama City University Graduate School of Medicine, Yokohama 236-0004, Japan. Correspondence and requests for materials should be addressed to A.T. (email: [email protected]) or to T.K. (email: [email protected]).

NATURE COMMUNICATIONS | 8:14519 | DOI: 10.1038/ncomms14519 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | DOI: 10.1038/ncomms14519

Alternative splicing (AS) is the process by which different splice sites in precursor messenger RNA are selected to generate multiple mRNA isoforms. AS events are often regulated in a cell type-, condition- or species-specific manner. Notably, recent studies have demonstrated that complexity of AS regulation is highest in primates1, and that there is a distinct and more complex pattern of AS in brain tissues2,3. Such highly intricate regulation of AS in the human brain can play an important role in normal function and development of the central nervous system. For example, a number of genetic mutations that affect global regulation of AS or alter AS of a specific gene are known to be associated with various brain disorders4,5. More recently, it was reported that a subset of de novo germline mutations, whose important roles in the genetic aetiology of neuropsychiatric disorders such as autism spectrum disorders (ASDs) and schizophrenia have been established6–10, probably contribute to the risk of ASD and schizophrenia by affecting AS11. In addition, dysregulation of AS is reported in multiple postmortem brain studies of ASD12,13 and schizophrenia14,15.

As represented by canonical splice site variants disrupting exon–intron boundaries, regulation of AS can be controlled by genetic variants. Beyond variants that directly change splice site sequences, it has been demonstrated that genetic variants controlling AS events, referred to as splicing quantitative trait loci (sQTLs), are spread throughout the genome. In particular, recent large-scale studies utilizing RNA sequencing (RNA-seq) data have successfully identified sQTLs in a genome-wide manner3,16. However, these studies primarily focus on non-neuronal tissues, and sQTLs in the human brain have therefore not yet been well characterized. Although a previous microarray-based study has identified exon-specific QTLs in brain tissues, detectable AS events depend on array design and are restricted to exon skipping. Therefore, a study utilizing RNA-seq data has a particular advantage in identifying more AS events17.

To comprehensively detect sQTLs in the human brain, here we analyse RNA-seq data of dorsolateral prefrontal cortex (DLPFC) tissues from >200 individuals in combination with their microarray-based genotype data. After applying stringent filtering criteria, we identify a total of ~1,500 sQTL single-nucleotide polymorphisms (SNPs) that are likely to be independent of each other. By analysing characteristics of these brain sQTL SNPs, we describe functional properties of these variants and their potential roles in the genetic aetiology of human diseases, particularly in brain disorders such as schizophrenia. We also show an example of how the information of sQTLs can be utilized to better understand the complex genetic architecture of human diseases and to specify promising candidates for culprit genes using the data of a large-scale genome-wide association study (GWAS) for schizophrenia18.

Results

Identification of cis-acting splicing QTLs in human brain. We first analysed RNA-seq data of DLPFC samples (all from Brodmann area 9) from 206 genetically homogeneous individuals (Supplementary Fig. 1, extracted by using the result of multidimensional scaling) without neuropsychiatric diseases or neurological insults immediately prior to death (downloaded from the CommonMind Consortium Knowledge Portal, summary statistics are available in Supplementary Table 1, see also Methods) to comprehensively identify AS events in the human brain. For this purpose we used vast-tools13, a software package designed to identify various types of AS events, including alternative exon skipping (Alt EX), alternative usage of splice sites (Alt SS) and intron retentions (IRs). After applying quality control filters (see Methods for details), we identified a total of 102,469 AS events in autosomes, consisting of 29,271 Alt EX, 3,310 Alt SS (of which 1,265 were at the 5′-donor site and 2,045 were at the 3′-acceptor site) and 69,888 IRs. We next analysed this list of AS events in combination with quality-controlled SNP genotyping data of the same individuals using Matrix eQTL19 to identify cis-acting (within ±100 kb of the AS event) sQTLs in a genome-wide manner (see Methods for details). To conservatively define sQTL SNPs, we first applied a standard correction for multiple testing implemented in Matrix eQTL (Benjamini–Hochberg procedure) to the P-values for all SNP–AS pairs, and the corrected P-values were then further subjected to Bonferroni correction with the number of AS events within the ±100 kb window for each SNP. This is because a SNP with many AS events in the surrounding region should have a higher chance to show significant association (see Methods for details). After performing these procedures, we identified a total of 8,966 sQTL SNPs with a 'double-corrected' P-value <0.05. The full list of sQTL SNPs along with information on the associated AS events is available in Supplementary Data 1. Consistent with previous studies of non-neuronal tissues3,16, when we plotted the double-corrected P-value and the distance to the nearest AS event for each SNP, we observed that variants in the proximity of AS events are enriched for sQTL SNPs (Fig. 1a).

The identified 8,966 sQTL SNPs are involved in 1,595 AS events of 1,341 unique genes. When we performed a gene-set enrichment analysis of these 1,341 genes using the Database for Annotation, Visualization and Integrated Discovery20, we found highly significant enrichment of 'SP_PIR_KEYWORDS: alternative splicing' (Benjamini-corrected P = 8.6 × 10⁻²⁹) and 'UP_SEQ_FEATURE: splice variants' (Benjamini-corrected P = 1.1 × 10⁻²⁸), which denote genes with known splicing isoforms (Supplementary Data 2). Therefore, on the one hand, our result is compatible with the existing knowledge of genes undergoing AS and, on the other hand, the list of genes regulated by sQTL SNPs identified here provides new candidates for genes with splicing isoforms, because >40% of the input genes were not included in 'SP_PIR_KEYWORDS: alternative splicing' or 'UP_SEQ_FEATURE: splice variants' (Supplementary Data 2) but in fact have detectable alternatively spliced regions.

Functional characterization of sQTL SNPs. We consequently attempted to functionally characterize sQTL SNPs. For this purpose, we first extracted the best sQTL SNP for each AS event (N = 1,595) and then performed linkage disequilibrium (LD)-based pruning (see Methods). After performing this procedure, there was a set of 1,539 sQTL SNPs that are likely to be independent of each other. Next, we performed LD-based pruning on SNPs with an uncorrected P-value >0.05 (N = 170,241) with the same parameters applied to sQTL SNPs, leaving 89,367 SNPs that are unlikely to be associated with AS (we considered these as non-sQTL SNPs). From this list of non-sQTL SNPs, we generated a set of 48,068 SNPs with the distribution of minor allele frequency (MAF) matched to the set of 1,539 sQTL SNPs and used them for comparison (see Supplementary Fig. 2 and Methods).

By functionally classifying the SNPs according to the definition in SnpEff21, we found that sQTL SNPs are significantly enriched among variants in exonic regions (that is, nonsense, readthrough, start-loss, frameshift, canonical splice site, missense, synonymous, splice region, 5′-untranslated region (UTR), 3′-UTR and non-coding exon variants; shown in warm colours in Fig. 1b) when compared with non-sQTL SNPs (P = 8.6 × 10⁻⁸⁷, odds ratio (OR) = 3.84, two-tailed Fisher's exact test). By analysing enrichment of sQTL SNPs among each functional type of variants, as expected, we found significant enrichment with the


highest OR among canonical splice site variants (P = 0.012, OR = 10.4, two-tailed Fisher's exact test with Bonferroni correction), followed by 5′-UTR and synonymous variants (Fig. 1c). In contrast, there was significant underrepresentation of sQTL SNPs among intergenic variants.

By manually inspecting all individual sQTL SNPs at the canonical splice sites (N = 9, from the full list of 8,966 SNPs before pruning), we found that 8 out of the 9 SNPs are associated with AS of the adjacent exon. The remaining sQTL SNP (rs8873 at chr11: 58,378,424 in ZFP91) is at a splice site that is found in the RefSeq Genes track but not in the Ensembl Gene Predictions track of the UCSC Genome Browser (https://genome.ucsc.edu/) (Supplementary Fig. 3), and transcripts spliced at this position (chr11: 58,378,426) were not detected in our analysis. Among the eight sQTL SNPs associated with AS of the adjacent exon, three variants contribute to known (annotated by Ensembl Gene Predictions) AS events (Fig. 2). In the case of rs2276611 at chr2: 170,441,001 in PPIG, alternative splice sites are almost exclusively used depending on the alleles (Fig. 2a). Around rs3803354 at chr15: 40,856,989 in C15orf57, there are three different splice sites (Fig. 2b). Although the major isoform is spliced at chr15: 40,857,175 (blue arrow head in Fig. 2b), the proportion of the isoform spliced at chr15: 40,856,990 (red arrow head) increases in C allele carriers in an additive

Figure 1 | Characterization of identified sQTL SNPs. (a) Each blue dot indicates a SNP plotted according to its distance to the nearest AS event and statistical significance for association with AS (–log10 P-value). Red line indicates proportion of SNPs (%) that were classified as sQTL SNPs. Proportions in each 1,000 bp window were plotted. (b) Pie charts indicating proportions of SNPs annotated with each functional category (nonsense, readthrough, start-loss, frameshift, canonical splice site, missense, synonymous, splice region, 5′-UTR, 3′-UTR, non-coding exon, intron and intergenic). SNPs in exonic regions (nonsense, readthrough, start-loss, frameshift, canonical splice site, missense, synonymous, splice region, 5′-UTR, 3′-UTR and non-coding exon) and SNPs in non-exonic regions (intron and intergenic) are indicated by warm and cold colours, respectively. (c) Enrichment analyses of sQTL SNPs in each different functional type of variants. Exonic variants are shown in red and non-exonic variants are shown in blue. P-values were calculated by two-tailed Fisher's exact test with Bonferroni correction according to the number of functional types analysed (that is, ten types). Bars indicate 95% confidence intervals.

[Figure 1 panels: (a) x axis, distance to the nearest AS (kb); y axes, –log10 P-value and proportion of sQTL SNPs (%). (b) Pie charts for sQTL SNPs and non-sQTL SNPs. (c) y axis, OR (compared with non-sQTL) for each functional category.]
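The 'double-corrected' P-value described in the Results can be sketched in a few lines of Python: a Benjamini–Hochberg adjustment across all cis SNP–AS pairs, followed by a Bonferroni multiplication by the number of AS events tested for each SNP. This is only an illustration under stated assumptions. The function names, the tuple layout and the toy P-values are hypothetical; the study itself used Matrix eQTL, and the sketch assumes that every AS event within a SNP's ±100 kb window appears as a pair for that SNP.

```python
from collections import defaultdict


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def double_correct(pairs):
    """pairs: list of (snp_id, as_event_id, raw_p) for all cis SNP-AS pairs.

    Returns {(snp_id, as_event_id): double-corrected P}.
    """
    bh = benjamini_hochberg([p for _, _, p in pairs])
    # Assumption: the events paired with a SNP are exactly those in its window.
    events_per_snp = defaultdict(set)
    for snp, event, _ in pairs:
        events_per_snp[snp].add(event)
    corrected = {}
    for (snp, event, _), q in zip(pairs, bh):
        corrected[(snp, event)] = min(1.0, q * len(events_per_snp[snp]))
    return corrected


# Toy input (hypothetical identifiers and P-values).
toy_pairs = [("rsA", "ev1", 0.001), ("rsA", "ev2", 0.04), ("rsB", "ev1", 0.03)]
toy_result = double_correct(toy_pairs)
# SNP-AS pairs passing the 0.05 threshold used to define sQTL SNPs.
toy_hits = sorted(key for key, p in toy_result.items() if p < 0.05)
```

With this scheme a SNP surrounded by many AS events is penalized more heavily than a SNP with a single nearby event, which is the rationale given in the text for the second correction step.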


manner and there is also a minor isoform (average percent-spliced-in (PSI) <1) spliced at chr15: 40,856,965 (green arrow head). In the case of rs80113248 at chr20: 18,142,462 in CSRP2BP, both of the two splice sites, which are 3 bp apart (chr20: 18,142,464 and 18,142,467), are used in A allele carriers, whereas in G/G carriers the transcripts are exclusively spliced at chr20: 18,142,467 (Fig. 2c). For the other five canonical splice site sQTL SNPs in the proximity of associated AS events, we also found that disruption of the canonical splice site by the variant allele causes an increased proportion of exon skipping or IR (Supplementary Fig. 4). Although the number of canonical splice site variants analysed in this study is small, identification of these 'positive control' variants regulating AS in an expected way could support the validity of our analyses.

sQTL SNPs and genetic regulatory elements. In a recent study of non-neuronal tissues, enrichment of sQTL SNPs among various regulatory elements was reported3. By using the data of the ENCODE project22, we analysed whether the brain sQTL SNPs identified in this study are enriched among variants within genomic regions with the following regulatory annotations: DNase I hypersensitive sites, monomethylated histone H3 lysine 4 (H3K4me1), trimethylated histone H3 lysine 4 (H3K4me3), acetylated histone H3 lysine 9 (H3K9ac), acetylated histone H3 lysine 27 (H3K27ac) and transcription factor (TF) binding sites. We found significant enrichment of sQTL SNPs among variants within H3K4me3 marks (P = 1.7 × 10⁻¹¹, OR = 2.10, two-tailed Fisher's exact test with Bonferroni correction) and significant depletion of these SNPs among H3K4me1 (P = 9.0 × 10⁻⁶,

Figure 2 | sQTL SNPs at canonical splice sites of genes with known transcript isoforms. sQTL SNPs at the canonical splice sites of PPIG (a), C15orf57 (b) and CSRP2BP (c) controlling alternative usage of splice sites. Schematics of transcript isoforms at each gene (RefSeq Genes and Ensembl Gene Predictions tracks from the UCSC Genome Browser (https://genome.ucsc.edu/) with the genomic sequences and coordinates) are shown in the left panels. Orange arrows indicate the positions of sQTL SNPs. Arrowheads indicate alternative splice sites. In b, detailed sequences around three differently used splice sites (chr15: 40,856,965, 40,856,990 and 40,857,175) are shown in magnified view. Proportions of alternative splice sites used are shown in the right panels. The averages among the carriers of each genotype are shown as stacked bars. The colours of stacked bars (blue, red and green) correspond to the alternative splice sites (arrowheads) in the left panels. Double-corrected P-values (see Methods) are indicated above the bars.

[Figure 2 panels: (a) Alt SS in PPIG, rs2276611 G>A, P = 2.7 × 10⁻¹⁰¹; (b) Alt SS in C15orf57, rs3803354 C>T, P = 2.8 × 10⁻⁴⁹; (c) Alt SS in CSRP2BP, rs80113248 A>G, P = 1.5 × 10⁻²⁹. Right panels: y axis, proportion of splice site used (0–100%), by genotype.]

OR = 0.67) and H3K27ac (P = 0.015, OR = 0.76) variants (Fig. 3a). We next looked at the data of binding sites for individual TF. After performing Bonferroni correction with the number of TF subjected to our analyses (65 TF in total), significant enrichment of sQTL SNPs was observed for 14 TF (Fig. 3b and Supplementary Data 3). The most significant enrichment was observed for POLR2A-binding sites (P = 7.1 × 10⁻¹⁷, OR = 1.85, two-tailed Fisher's exact test with Bonferroni correction), followed by PHF8 with the highest OR (P = 9.0 × 10⁻⁷, OR = 4.07) and SIN3A (P = 6.6 × 10⁻⁵, OR = 2.41), CHD1 (P = 0.00039, OR = 3.21) and ELF1 (P = 0.0048, OR = 2.22).

Enrichment analysis of sQTLs among disease-associated loci. To analyse the property of sQTL SNPs in the context of their potential contribution to disease risks, we performed enrichment analyses using the data of the GWAS Catalog23, a collection of data from GWAS for various human diseases and traits (see Methods for definition of the associated loci). When we tested whether sQTL SNPs are globally enriched among loci associated with various human diseases (defined by the Experimental Factor Ontology (EFO)24 term 'EFO_0000408: disease'), we found significant enrichment when compared with non-sQTL SNPs (P = 1.7 × 10⁻⁸, OR = 1.33, one-tailed Fisher's exact test). We next analysed enrichment of sQTL SNPs using the data of nine individual diseases with the largest numbers of genome-wide significantly associated SNPs in the GWAS Catalog (breast cancer, colorectal cancer, inflammatory bowel disease, multiple sclerosis, prostate cancer, psoriasis, rheumatoid arthritis, schizophrenia and type 2 diabetes), as well as four additional brain disorders (autism, Alzheimer's disease, bipolar disorder and Parkinson's disease) and the two most intensively investigated non-disease traits (height and body mass index). We observed significant enrichment of sQTL SNPs among the loci associated with inflammatory bowel disease (P = 0.0065, OR = 1.41, one-tailed Fisher's exact test with Bonferroni correction), schizophrenia (P = 0.0092, OR = 2.53) and psoriasis (P = 0.011, OR = 2.57) after performing correction for multiple testing (Fig. 4a). As we found, when checking individual SNPs in the associated loci, that in some cases (for example, psoriasis) the enrichment was mostly driven by variants in the major histocompatibility complex (MHC) locus, we also performed enrichment analyses excluding the data of SNPs in the MHC locus. In these analyses, there was significant enrichment of sQTL SNPs among the loci associated with schizophrenia (P = 9.9 × 10⁻⁵, OR = 3.72), inflammatory bowel disease (P = 0.0014, OR = 1.43) and multiple sclerosis (P = 0.036, OR = 3.71) (Fig. 4a).

In line with the fact that the data set used in this study derives from brain tissues, diseases whose associated loci are enriched for sQTL SNPs with the highest ORs include autism, schizophrenia and multiple sclerosis, although enrichment among autism-associated loci was not statistically significant (Fig. 4a, analyses excluding MHC variants). Among these diseases, the most statistically significant enrichment was observed for schizophrenia-associated loci (P = 9.9 × 10⁻⁵ after performing Bonferroni correction). We next focused on this observation and performed several confirmatory analyses to test the credibility of this result. First, we repeated the analysis using the data of 108 well-defined schizophrenia-associated loci described in the largest GWAS to date, conducted by the Psychiatric Genomics Consortium18 (PGC GWAS). This was because some of the SNPs identified by the PGC GWAS were not included in the GWAS Catalog and the associated loci were defined in a more sophisticated way in the PGC GWAS. With this data set, we confirmed that there was significant enrichment of sQTL SNPs among the risk loci (Fig. 4b, P = 1.1 × 10⁻⁷, OR = 4.01, one-tailed Fisher's exact test). Second, to test whether the enrichment is driven by a higher proportion of exonic variants among sQTL SNPs (these variants would be more likely to be functional and thereby associated with schizophrenia regardless of their impacts on AS), we performed an analysis using the data of SNPs in non-exonic (that is, intronic and intergenic) regions (N of SNPs = 1,139). We found that non-exonic sQTL SNPs are significantly enriched among schizophrenia-associated loci when compared with non-sQTL SNPs in non-exonic regions (Fig. 4c, P = 0.0030, OR = 2.66, one-tailed Fisher's exact test). On the other hand, there was no statistically significant enrichment of exonic sQTL SNPs among schizophrenia risk loci when compared with exonic non-sQTL SNPs (Fig. 4c, P = 0.36, OR = 1.26), suggesting that non-exonic sQTL SNPs in particular contribute to schizophrenia risk through their impacts on splicing regulation. Third, we performed an analysis excluding sQTL SNPs associated with IR (N of excluded SNPs = 398). This was because detection of IR is often more challenging than detection of Alt EX and Alt SS, and the RNA-seq data set used in this study derives from libraries prepared by ribosomal RNA depletion (not poly-A selection; thus, premature RNA

Figure 3 | Enrichment analyses of sQTL SNPs among variants within genetic regulatory elements. (a) Enrichment analysis of sQTL SNPs among variants within six types of regulatory elements (DNase I hypersensitive sites (DHS), H3K4 monomethylation marks (H3K4me1), H3K4 trimethylation marks (H3K4me3), H3K9 acetylation marks (H3K9ac), H3K27 acetylation marks (H3K27ac) and TF binding sites). P-values were calculated by two-tailed Fisher's exact test with Bonferroni correction according to the number of regulatory elements analysed (six elements). Bars indicate 95% confidence intervals. (b) Plots of –log10 P-values (x axis) and OR (y axis) obtained from enrichment analysis of sQTL SNPs among variants within binding sites for each TF. The dashed blue line indicates P = 0.05 and the solid blue line indicates P = 0.05/65 = 7.7 × 10⁻⁴ (Bonferroni-corrected P-value threshold, binding sites for a total of 65 TF were tested).

[Figure 3 panels: (a) y axis, OR (compared with non-sQTL) for DHS, H3K4me1, H3K4me3, H3K9ac, H3K27ac and TF. (b) x axis, –log10 P-value; y axis, OR (compared with non-sQTL); labelled TF: PHF8, CHD1, E2F1, RFX5, GABPB1, HDAC2, MXI1, SIN3A, ELF1, ZNF263, POLR2A, REST, NFIC and MYC.]
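The enrichment statistics quoted in the Results and figure legends (an odds ratio, a Fisher's exact test P-value and a Bonferroni correction by the number of categories tested) can be illustrated with a small pure-Python sketch. The helper names and the counts in the usage lines are hypothetical and are not values from the study. The two-sided rule used here (summing all tables that are no more probable than the observed one) and the sample odds ratio ad/bc are common conventions assumed for illustration; the software the authors used for these tests is not specified in this section.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d, alternative="two-sided"):
    """Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Returns (sample odds ratio, P-value). alternative is "two-sided"
    or "greater" (one-tailed test for enrichment in the top-left cell).
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Hypergeometric probability of each possible top-left cell count.
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom
             for x in range(lo, hi + 1)}
    p_obs = probs[a]
    if alternative == "greater":
        p = sum(px for x, px in probs.items() if x >= a)
    else:
        # Small relative tolerance guards against floating-point ties.
        p = sum(px for px in probs.values() if px <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p)


def bonferroni(p, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p * n_tests)


# Hypothetical counts: rows are sQTL / non-sQTL SNPs,
# columns are SNPs inside / outside a given annotation.
odds_ratio, p_raw = fisher_exact_2x2(30, 70, 10, 90)
p_adj = bonferroni(p_raw, 6)  # e.g. six regulatory element types tested
print(odds_ratio, p_raw, p_adj)
```

Passing `alternative="greater"` gives the one-tailed variant used in the text for the disease-locus analyses, where only enrichment (not depletion) is of interest.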


containing intronic regions can be to some extent included in the libraries). We found that sQTL SNPs associated with Alt EX or Alt SS are significantly enriched among schizophrenia risk loci when compared with non-sQTLs (Fig. 4d, P = 0.00052, OR = 2.85, one-tailed Fisher's exact test). Taken together, these results support the credibility of the enrichment of sQTL SNPs among schizophrenia-associated loci.

sQTLs that can be causally associated with schizophrenia. Significant enrichment of sQTL SNPs among schizophrenia-associated loci observed above indicates that some of these SNPs could causally contribute to the risk of schizophrenia by affecting AS. We next sought to identify plausible candidates for such sQTL SNPs. For this purpose, we utilized the data of the PGC GWAS18 and selected candidate sQTL SNPs with the following criteria: (1) in LD with an index schizophrenia-associated SNP identified in the PGC GWAS at r² > 0.8 (it is noteworthy that we considered the most significantly associated SNP with available information of LD in the 1000 Genomes March 2012 data set at each locus as the index SNP, see Methods for more details), (2) by

Figure 4 | Enrichment analyses of sQTL SNPs among disease-associated loci. (a) Results of enrichment analyses of sQTL SNPs among loci associated with 15 diseases/traits (nine diseases with the largest numbers of genome-wide significantly associated SNPs in the GWAS Catalog23: breast cancer, colorectal cancer, inflammatory bowel disease, multiple sclerosis, prostate cancer, psoriasis, rheumatoid arthritis, schizophrenia and type 2 diabetes; four additional brain disorder groups: autism, Alzheimer's disease, bipolar disorder, Parkinson's disease; and two most intensively investigated non-disease traits: height and body mass index). Red and blue bars indicate the results from analyses including and excluding variants in the MHC locus, respectively. Results are shown in the order of OR from the analyses excluding MHC variants. Uncorrected P-values calculated by one-tailed Fisher's exact test are shown. *P < 0.05 and **P < 0.05/15 = 0.0033 (corresponding to the significance threshold considering the number of diseases/traits tested). (b) An enrichment analysis using the data of PGC GWAS instead of the data based on the GWAS Catalog. (c) Enrichment analyses dividing SNPs into exonic and non-exonic variants. (d) An enrichment analysis excluding sQTL SNPs associated with IRs. P-values were calculated by one-tailed Fisher's exact tests. Bars indicate 95% confidence intervals.

[Figure 4 panels: (a) x axis, OR (compared with non-sQTL), with MHC included and MHC excluded, for each disease/trait. (b) 108 loci in PGC GWAS. (c) Exonic and non-exonic variants. (d) sQTL SNPs associated with Alt EX or Alt SS. In b–d the y axis is OR (compared with non-sQTL).]

themselves associated with schizophrenia at the level of genome-wide significance (P < 5 × 10⁻⁸) and (3) included in the list of 'credible SNPs' (the sets of SNPs 99% likely to contain the causal variants; see Methods and ref. 18 for more details). We found that four schizophrenia-associated loci harbour sQTL SNPs fulfilling the selection criteria (Fig. 5). One was found on chromosome 3p21, where the index schizophrenia-associated SNP (rs2535627, P for schizophrenia association in the PGC GWAS = 4.0 × 10⁻¹¹) itself was identified as an sQTL SNP significantly associated with an Alt EX of NEK4 (double-corrected P for sQTL = 7.8 × 10⁻⁵) (Fig. 5a). On chromosome 3, there was another locus (3q26) with an sQTL SNP that is in strong LD with the index SNP. At this locus, rs1805564, associated with an Alt EX of FXR1 (double-corrected P for sQTL = 0.019), is in LD with the index SNP rs34796896 (P for schizophrenia association = 6.2 × 10⁻¹¹) at r² = 0.94 (Fig. 5b). On chromosome 6q14, an sQTL SNP rs217323 was associated with an IR of SNAP91 (double-corrected P for sQTL = 2.1 × 10⁻²¹) and this SNP is in LD with the index SNP rs3798869 (P for schizophrenia association = 1.2 × 10⁻⁹) at r² = 0.97 (Fig. 5c). The last one was found on chromosome 14q32, where an sQTL SNP rs7148456, associated with an Alt EX of APOPT1 (also known as C14orf153, double-corrected P for sQTL = 3.2 × 10⁻¹⁰), is in LD with the index SNP rs12887734 (P for schizophrenia association = 2.3 × 10⁻¹²) at r² = 0.86 (Fig. 5d). Identification of these SNPs suggests dysregulation of AS at these loci as a plausible biological basis explaining the association signals and points to the genes whose AS is regulated by sQTL SNPs (that is, NEK4, FXR1, SNAP91 and APOPT1) as promising candidates for causally associated genes among the multiple genes included in each risk locus.

Discussion

In this study, we analysed a large-scale data set of human brain transcriptome in combination with the genotyping data and identified variants controlling AS events, sQTL SNPs, in a genome-wide manner. To our knowledge, this is the first study comprehensively identifying sQTLs using RNA-seq data derived from human brain samples.

By characterizing properties of the detected sQTL SNPs, we found that these SNPs are enriched among exonic variants, including coding SNPs (Fig. 1b,c). This observation is consistent with a recently introduced notion that many of the coding variants not only define the sequence of the encoded protein but also have an impact on various regulatory functions25,26. We also observed that sQTL SNPs are enriched among variants within H3K4me3 marks (Fig. 3a). There is accumulating evidence that this histone mark is not only associated with transcriptional activation, but also plays a role in AS27,28. This process can be mediated by physical binding of the spliceosome to H3K4me3 via a chromo-helicase protein CHD1 (ref. 27), whose binding sites were enriched for sQTLs (Fig. 3b). It is also known that various epigenetic marks including H3K4me3 can be locally influenced by genetic variants29. Therefore, some of the SNPs in H3K4me3 marks would alter epigenetic status and thereby act as sQTL SNPs. This possible scenario can be related to enrichment of sQTL SNPs among 5′-UTR variants, which showed the second highest OR in our analysis of various functional types of SNPs (Fig. 1c). This is because H3K4me3 marks are enriched in the 5′ end of gene bodies, often including 5′-UTRs30, besides the well-known enrichment at promoter regions. It is also of note that AS of histone-modifying genes such as KDM1A and EHMT2 is itself known to play a role in global epigenetic regulation and neuronal differentiation31. Thus, it would be worthwhile to take this AS-chromatin feedback loop into account. In the analysis of binding sites for individual TF, we found the most significant enrichment of sQTL SNPs among variants within binding sites for POLR2A (this gene, encoding the largest subunit of RNA polymerase II, is included in the list of TF in ENCODE) (Fig. 3b), which is known to be involved in AS regulation32,33. Strong enrichment was also observed for binding sites for various chromatin regulators such as PHF8, SIN3A and CHD1 (Fig. 3b). As partly discussed above, this observation is in concordance with their roles in regulation of AS27,28,34,35. Gene-set enrichment analysis of genes regulated by sQTL SNPs found enrichment of genes with known splicing isoforms, although this does not mean that all tested genes are involved in regulation by AS nor that all genes in the 'splicing' term could be determined by our analysis.

In the enrichment analysis of sQTL SNPs using the data of GWAS, we observed significant overrepresentation of these variants among loci associated with various human diseases, indicating roles of SNPs regulating AS in genetic disease aetiologies. This observation is in agreement with the growing evidence that the majority of SNPs identified in GWAS contribute to disease risks through their impact on gene regulatory functions36,37. Specifically, we found that sQTL SNPs identified in this study using the data of human brains are strongly enriched among schizophrenia-associated loci. Besides SNPs controlling gene-level expression (eQTL) or DNA methylation (mQTL or meQTL), whose contribution to the schizophrenia risk has been demonstrated in recent studies38–40, our results indicate that sQTL SNPs, which in most cases do not overlap with gene-level eQTLs41,42, can explain an additional part of the genetic architecture of schizophrenia.

By utilizing the list of sQTL SNPs, we could specify four promising candidate disease susceptibility genes for schizophrenia (that is, NEK4, FXR1, SNAP91 and APOPT1), whose AS is regulated by sQTL SNPs in strong LD with the index SNPs identified in the PGC GWAS. NEK4 encodes a member of the never-in-mitosis A kinase family that regulates cell cycle and response to double-stranded DNA damage43. It is of note that this gene is most highly expressed in the brain among multiple adult tissues44 and plays a key role in stabilization of neuronal cilia44, whose contribution to various neural functions including nervous system development and adult neurogenesis45, as well as possible involvement in the pathophysiology of schizophrenia46,47, have been reported. FXR1 encodes a homologue of fragile-X mental retardation protein (FMRP), which is responsible for fragile X syndrome, and the encoded protein (fragile X mental retardation syndrome-related protein 1) is known to interact with FMRP48,49. Recent large-scale genetic studies have consistently indicated involvement of FMRP targets in the genetic architectures of schizophrenia9,50 and ASD10,51, indicating this gene as a particularly good candidate disease-associated gene. SNAP91 encodes the clathrin-associated protein AP180. AP180 is enriched in the presynaptic terminal of neurons52 and plays an essential role in synaptic neurotransmission53,54. AP180 KO mice show excitatory/inhibitory imbalance53, which has been reported in patients and animal models of neuropsychiatric disorders including schizophrenia55. APOPT1 encodes a mitochondrial protein that induces apoptotic cell death56. The causal contribution of this gene to cavitating leukoencephalopathy57, a rare brain disorder, together with accumulating evidence suggesting involvement of mitochondrial dysfunction in neuropsychiatric disorders58,59, implies a potential role of APOPT1 in the pathogenesis of schizophrenia.

This study has several limitations. First, although the sample size in this study is substantial (N = 206), it would not be sufficient to confidently identify all brain sQTL SNPs. Second, in this study we could only analyse the data of adult brain tissues from a single brain region (DLPFC). Analyses of sQTLs using large-scale data sets with higher spatial and temporal resolutions

Figure 5 | Utilization of sQTLs to localize candidate susceptibility genes for schizophrenia. Local plots of the results of the PGC GWAS18 (left panels) and violin plots of PSI of AS in each genotype (right panels) for four loci encompassing AS of NEK4 (a), FXR1 (b), SNAP91 (c) and APOPT1 (d), which are controlled by sQTL SNPs in strong LD (r² > 0.8) with the index SNPs in the GWAS. Local plot figures in the left panels were generated by LocusZoom65. Each circle indicates a SNP, colour-coded according to its LD (r²) with the sQTL SNP (indicated by purple arrows). The statistical strength of the association (−log10 P-values) and the recombination rate are double-plotted on the y axis. Blue horizontal lines indicate the genome-wide significance threshold (P = 5 × 10⁻⁸). Genes in the UCSC Genome Browser (https://genome.ucsc.edu/) are shown in the panels below the local plots. Red lines indicate the positions of the associated AS events. Violin plots in the right panels show distributions of PSI in each genotype. The overlaid boxplots indicate the median (horizontal black lines) and interquartile range (IQR; white boxes). Outliers are shown as black dots.

[Figure 5 panels: (a) NEK4, AS exon at chr3:52,799,903-52,800,010, sQTL SNP rs2535627; (b) FXR1, AS exon at chr3:180,688,863-180,688,943, sQTL SNP rs1805564; (c) SNAP91, IR at chr6:84,315,523-84,317,417, sQTL SNP rs217323; (d) APOPT1, AS exon at chr14:104,040,444-104,040,507, sQTL SNP rs7148456. Left panels: position on the chromosome (Mb) against −log10(P-value) and recombination rate (cM/Mb). Right panels: genotype against PSI.]
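The strong-LD criterion used for Figure 5 (r² > 0.8 between an sQTL SNP and the GWAS index SNP) can be illustrated with a minimal Python sketch. It assumes that r² is taken as the squared Pearson correlation of allele counts (0, 1 or 2 copies per individual) on unphased genotypes. The function names and the toy genotype vectors are hypothetical and are not data from this study, in which LD was computed with PLINK on the 1000 Genomes EUR reference.

```python
def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two allele-count vectors (values 0, 1 or 2)."""
    if len(dosage_a) != len(dosage_b) or len(dosage_a) == 0:
        raise ValueError("both SNPs need genotypes for the same, non-empty set of samples")
    n = len(dosage_a)
    mean_a = sum(dosage_a) / n
    mean_b = sum(dosage_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)


def in_strong_ld(dosage_a, dosage_b, threshold=0.8):
    """True when r2 exceeds the strong-LD threshold (0.8 in the figure caption)."""
    return ld_r2(dosage_a, dosage_b) > threshold


# Hypothetical allele counts for eight individuals (illustration only).
toy_sqtl_snp = [0, 1, 2, 1, 0, 2, 1, 0]
toy_index_snp = [0, 1, 2, 1, 0, 2, 1, 1]

if __name__ == "__main__":
    print(round(ld_r2(toy_sqtl_snp, toy_index_snp), 3))
    print(in_strong_ld(toy_sqtl_snp, toy_index_snp))
```

A haplotype-based estimate from phased data can differ slightly from this genotype-correlation estimate, so the sketch should be read as an approximation of the quantity plotted, not as a reimplementation of the reported values.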

will provide a further informative data resource, especially in the context of identification of genes and variants associated with a disease attributable to deficits in specific brain region(s) and/or at particular time period(s). Third, in this study we only show statistically significant associations between SNPs and AS, and have not experimentally validated the impact of sQTL SNPs on AS. It is often very difficult to determine whether an sQTL SNP associated with AS directly regulates splicing or just tags a functional variant that is not investigated here (such as a rare splice region variant).

In summary, we comprehensively identified SNPs regulating AS events in the human brain, described the characteristics of these sQTL SNPs and demonstrated that the list of brain sQTL SNPs can be used to identify plausible candidate genes/variants causally associated with schizophrenia and will also be useful to generate animal models. Our results provide a new insight into the genetic architecture of schizophrenia. By integrating various data resources (for example, sQTLs, eQTLs, mQTLs and more), we will obtain a more detailed picture of the genomic landscape of complex brain disorders.

Methods

RNA-seq data of DLPFC. RNA-seq data (BAM files) of DLPFC from individuals without neuropsychiatric diseases or neurological insults immediately before death (N of individuals = 285) were downloaded from the ‘Raw’ directory of the CommonMind Consortium Knowledge Portal (https://www.synapse.org/#!Synapse:syn4923029) using Synapse Python Client (http://python-docs.synapse.org/index.html). The data set was generated as a part of the CommonMind Consortium supported by funding from Takeda Pharmaceuticals Company Limited, F. Hoffman-La Roche Ltd and NIH grants R01MH085542, R01MH093725, P50MH066392, P50MH080405, R01MH097276, RO1-MH-075916, P50M096891, P50MH084053S1, R37MH057881 and R37MH057881S1, HHSN271201300031C, AG02219, AG05138 and MH06692. Brain tissue for the study was obtained from the following brain bank collections: the Mount Sinai NIH Brain and Tissue Repository, the University of Pennsylvania Alzheimer’s Disease Core Center, the University of Pittsburgh NeuroBioBank and Brain and Tissue Repositories, and the NIMH Human Brain Collection Core. CMC Leadership: Pamela Sklar, Joseph Buxbaum (Icahn School of Medicine at Mount Sinai), Bernie Devlin, David Lewis (University of Pittsburgh), Raquel Gur, Chang-Gyu Hahn (University of Pennsylvania), Keisuke Hirai, Hiroyoshi Toyoshiba (Takeda Pharmaceuticals Company Limited), Enrico Domenici, Laurent Essioux (F. Hoffman-La Roche Ltd), Lara Mangravite, Mette Peters (Sage Bionetworks), Thomas Lehner and Barbara Lipska (NIMH). Detailed procedures for tissue collection, sample preparation, RNA-seq and data processing are available in the Consortium’s wiki page (https://www.synapse.org/#!Synapse:syn2759792/wiki/69613). Briefly, ribosomal RNA was depleted from about 1 μg of total RNA using the Ribozero Magnetic Gold kit (Illumina, San Diego, CA). The sequencing library was prepared using the TruSeq RNA Sample Preparation Kit v2 (Illumina). Sequencing was performed by using HiSeq2500 (Illumina). As the sequencing libraries were prepared by using rRNA depletion procedures, the RNA-seq data should contain the information from total RNA including non-coding RNA and precursor mRNA. Downloaded BAM files for mapped and unmapped reads from each individual were merged by using SAMtools60. Merged BAM files were converted into the fastq format using bam2fastq (https://gsl.hudsonalpha.org/information/software/bam2fastq).

SNP genotyping data. Quality-controlled genotyping data (SNPs with zero alternate alleles, genotyping call rate < 0.98 or Hardy–Weinberg P-value < 5 × 10⁻⁵ and individuals with genotyping call rate < 0.90 were removed) were downloaded from the ‘QCd’ directory of the CommonMind Consortium Knowledge Portal (https://www.synapse.org/#!Synapse:syn4551740). Genotyping was performed by using the Infinium HumanOmniExpressExome v1.1 DNA Analysis Kit (Illumina). With these genotype data, we performed multidimensional scaling using PLINK61. As expected, the first dimension (the x axis of Supplementary Fig. 1) represents ethnicities of the participants. We extracted the data of Caucasians included in the single largest cluster indicated by the red box in Supplementary Fig. 1 (N of individuals = 206, summary statistics for these individuals are available in Supplementary Table 1). After excluding SNPs with MAF < 1% among these 206 individuals with a homogeneous genetic background, there were 607,993 autosomal SNPs. Of these SNPs, we extracted 313,906 SNPs that are within ±100 kb of any of the identified AS events and used them in the analysis of sQTL SNPs.

Comprehensive detection of AS events. Comprehensive detection of AS events was performed by using vast-tools (version 0.2.1)13. We first mapped the reads in the fastq files generated above onto the reference human genome (hg19) using the ‘align’ module of vast-tools with default parameters. Next, the results were merged into a single file containing PSI of each AS event in each individual using the ‘combine’ module of vast-tools. By using the quality scores in the combined file (Column 8), we first excluded AS events whose Score 1 (read coverage based on actual reads) and Score 2 (read coverage based on corrected reads) in Column 8 did not meet the minimum threshold (mapped reads ≥ 10, in principle) in > 20% of the individuals. We next excluded AS events whose PSI was 0 or 100% in > 90% of the individuals. After performing these procedures, there were a total of 102,469 AS events. According to the predefined types of AS in vast-tools13, these were classified into Alt EX, Alt SS and IRs.

Identification of sQTL SNPs. Correlation between genotypes and PSI of AS was analysed by using Matrix eQTL19 with the additive linear model. To control potential confounding factors, the following parameters were included in the analysis as covariates: gender, age at death, research institute where the samples were collected (Mount Sinai, Pennsylvania or Pittsburgh), post-mortem interval, brain pH, RNA integrity number and sequencing library batch. We considered all AS-SNP pairs when the distance between AS and SNP is less than 100 kb. This ±100 kb window was determined by referring to previous studies reporting that sQTL SNPs are particularly enriched among the proximal regions16,41,62. When there are multiple AS events within the ±100 kb window around a SNP, we used the smallest P-value to define sQTL SNPs. The smallest P-value for each SNP was then subjected to Bonferroni correction with the number of AS events within the ±100 kb window, counted by the window function of BEDtools63. This was because a SNP with a large number of AS events in the window should have a higher chance to show significant association.

Gene-set enrichment analysis of genes regulated by sQTL SNPs. A gene-set enrichment analysis of genes with AS regulated by sQTL SNPs was performed by using the Database for Annotation, Visualization and Integrated Discovery20 with default parameters. In total, there were 1,341 unique genes with AS regulated by sQTL SNPs. The input genes can be found in Supplementary Data 1.

sQTL and non-sQTL data sets for comparison. To generate a set of sQTL SNPs probably contributing to AS regulation independently of each other, we first extracted the best sQTL SNP for each AS event (N of SNPs = 1,595). We next performed LD-based pruning of these 1,595 SNPs using the --indep-pairwise function of PLINK61 with the following parameters: window size in SNPs = 50, the number of SNPs to shift the window at each step = 5 and the r² threshold = 0.5. For this analysis, the 1000 Genomes Project March 2012 EUR (Europeans) data set64 downloaded as a part of the LocusZoom package65 was used as the reference. After performing LD-based pruning, there were a total of 1,539 sQTL SNPs. To generate a control data set of non-sQTL SNPs, we first extracted SNPs for which the smallest uncorrected P-value was larger than 0.05 (N of SNPs = 170,241). We then performed LD-based pruning with the same parameters and the reference 1000 Genomes data set used for sQTL SNPs and generated a set of 89,367 SNPs, which are unlikely to be associated with AS and not strongly dependent on each other (non-sQTL SNPs). We then stratified these non-sQTL SNPs into 2% MAF bins and extracted 48,068 SNPs with the distribution of MAF matched to the set of 1,539 sQTL SNPs (Supplementary Fig. 2). We used these sets of 1,539 sQTL SNPs and 48,068 non-sQTL SNPs in the downstream analyses to characterize the properties of sQTL SNPs.

Functional annotation of sQTL and non-sQTL SNPs. We functionally annotated 1,539 sQTL and 48,068 non-sQTL SNPs by using SnpEff21. Information of SnpEff annotation was collected by using MyVariant.info (http://myvariant.info/) and Variant Effect Predictor (VEP)66. According to these annotations, SNPs were classified into the following categories: nonsense, readthrough, start-loss, frameshift, canonical splice site, missense, synonymous, splice region, 5′-UTR, 3′-UTR, non-coding exon, intron and intergenic variants. Splice region variants were defined as variants either within 1–3 bases of the exon or 3–8 bases of the intron from the splice site21. When a SNP was annotated with multiple functional types, we assigned the SNP to the functional class probably having the highest impact (that is, the leftmost one among the functional categories described above). We considered nonsense, readthrough, start-loss, frameshift, canonical splice site, missense, synonymous, splice region, 5′-UTR, 3′-UTR and non-coding exon variants as exonic SNPs, and intron and intergenic variants as non-exonic SNPs. Enrichment analyses of sQTL SNPs according to their functionalities were performed by two-tailed Fisher’s exact test with the following 2 × 2 table: columns, sQTL SNPs and non-sQTL SNPs; rows, SNPs ‘assigned’ and ‘not assigned’ to the particular functional class. For enrichment analysis of each functional class of variants, we performed Bonferroni correction according to the number of functional types subjected to the analysis (ten types: canonical splice site, the other loss-of-function, missense, synonymous, splice region, 5′-UTR, 3′-UTR, non-coding exon, intron and intergenic variants).

Enrichment analyses of sQTL SNPs among regulatory elements. Annotation files for DNase I hypersensitive sites, H3K4me1, H3K4me3, acetylated histone H3

lysine 9, H3K27ac and TF-binding sites were downloaded from the ENCODE portal (https://www.encodeproject.org/data/annotations/, accessed January 2016; the data from Roadmap Epigenomics Consortium67 were also integrated into these data sets). We analysed whether a SNP is included in each regulatory element using BEDtools63. Enrichment analyses of sQTL SNPs among regulatory elements were performed by two-tailed Fisher’s exact test with the following 2 × 2 table: columns, sQTL SNPs and non-sQTL SNPs; rows, SNPs within and not within the regulatory element. In the enrichment analyses of binding sites for individual TF, we excluded TF for which the number of records (each record is a ~150 bp genomic region) in the annotation file was smaller than 50,000. There were a total of 65 TF with 50,000 or more records of binding sites in the bed file downloaded from the ENCODE portal. According to this number we applied Bonferroni correction.

Enrichment analyses of sQTLs among disease-associated loci. The list of SNPs associated with various human traits was downloaded from the GWAS Catalog23 (http://www.ebi.ac.uk/gwas, the gwas_catalog_v1.0.1 file, accessed June 2016). We included SNPs with genome-wide significant association (P < 5 × 10⁻⁸) in our analyses. An associated genomic locus for each SNP was defined as the genomic region containing SNPs in LD with the index SNP at r² > 0.6. SNPs in LD with the index SNP were identified by using PLINK61 with the 1000 Genomes Project March 2012 EUR data set. Next, we analysed whether each SNP falls within the disease-associated loci using BEDtools63. SNPs associated with human diseases were extracted by using the EFO24 ID tags (MAPPED_TRAIT_URI column of the GWAS catalog file). We considered SNPs associated with any of the child terms of ‘EFO_0000408: disease’ as disease-associated SNPs. Information of child terms of ‘EFO_0000408: disease’ was collected by using the ontoCAT package68 of R. We evaluated whether there is enrichment of sQTL SNPs among associated loci by one-tailed Fisher’s exact test with the following 2 × 2 table: columns, sQTL SNPs and non-sQTL SNPs; rows, SNPs within and not within the disease-associated loci. We performed these analyses for the following diseases/traits: (1) all human diseases (EFO_0000408: disease); (2) nine individual diseases with the largest numbers of genome-wide significantly associated SNPs in the GWAS Catalog23 (N of SNPs ≥ 80): breast cancer, colorectal cancer, inflammatory bowel disease (including Crohn’s disease and ulcerative colitis), multiple sclerosis, prostate cancer, psoriasis, rheumatoid arthritis, schizophrenia and type 2 diabetes; (3) four additional brain disorders: autism, Alzheimer’s disease, bipolar disorder and Parkinson’s disease; and (4) the two most intensively investigated non-disease traits: height and body mass index. For analyses excluding SNPs in the MHC locus, we did not use the information of SNPs in chr6:28,477,797–33,448,354 (hg19; based on the definition by The Genome Reference Consortium, http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/region.cgi?name=MHC&asm=GRCh37).

Confirmatory analyses for sQTLs in schizophrenia risk loci. A confirmatory enrichment analysis using the data of 108 loci defined in PGC GWAS18 was performed using the data downloaded from the PGC portal (https://www.med.unc.edu/pgc/files/resultfiles/scz2.regions.zip). Enrichment analyses excluding exonic variants were performed by extracting the data of intronic and intergenic variants according to the SnpEff21 annotations described above (see ‘Functional annotation of sQTL and non-sQTL SNPs’ section). An analysis excluding sQTL SNPs associated with IR was performed by excluding 398 sQTL SNPs whose most significantly associated AS was IR. Enrichment analyses of sQTL SNPs among schizophrenia-associated loci using the data of PGC GWAS, excluding the data of exonic variants or sQTL SNPs associated with IR, were performed by one-tailed Fisher’s exact test.

Identification of sQTL SNPs in strong LD with the index SNP. The full result of the PGC schizophrenia GWAS18 (https://www.med.unc.edu/pgc/files/resultfiles/scz2.snp.results.txt.gz) and the data of credible causal sets of SNPs (sets of SNPs that were 99% likely to contain the causal variants18; these sets were defined for each schizophrenia-associated locus, https://www.med.unc.edu/pgc/files/resultfiles/pgc.scz2.credible.SNPs.zip) were downloaded from the PGC portal (http://www.med.unc.edu/pgc/downloads). By using these data sets, we extracted sQTL SNPs that are: (1) in LD with an index schizophrenia-associated SNP identified in the PGC GWAS at r² > 0.8, (2) by themselves associated with schizophrenia at the level of genome-wide significance (P < 5 × 10⁻⁸) and (3) included in the list of ‘credible SNPs’ described above. In total, we found sQTL SNPs satisfying these criteria in four independent loci. In two instances, information of LD in the 1000 Genomes March 2012 EUR data set was not available for the index SNPs described in Supplementary Table 2 of ref. 18 (chr3_180594593_I and chr6_84280274_D). In these cases, we considered the most significantly associated SNP with available information of LD in each locus as the index SNP (rs34796896 for chr3_180594593_I and rs3798869 for chr6_84280274_D). Regional visualization of the PGC GWAS result (the scz2.snp.results.txt.gz file) with information of sQTL SNPs and associated AS was performed by using LocusZoom65 based on the 1000 Genomes March 2012 EUR data set. SNPs not included in this reference data set were not displayed in the figure. LD (r²) between the index SNP and sQTL SNP was computed by using PLINK61 with the same 1000 Genomes March 2012 EUR data set.

Data availability. The mapped RNA-seq data (BAM files) that support the findings of this study are available in the CommonMind Consortium Knowledge Portal (https://www.synapse.org/#!Synapse:syn4923029) upon authentication by the Consortium.

References

1. Barbosa-Morais, N. L. et al. The evolutionary landscape of alternative splicing in vertebrate species. Science 338, 1587–1593 (2012).
2. Merkin, J., Russell, C., Chen, P. & Burge, C. B. Evolutionary dynamics of gene and isoform regulation in mammalian tissues. Science 338, 1593–1599 (2012).
3. GTEx Consortium, Human genomics. The Genotype-Tissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348, 648–660 (2015).
4. Licatalosi, D. D. & Darnell, R. B. Splicing regulation in neurologic disease. Neuron 52, 93–101 (2006).
5. Raj, B. & Blencowe, B. J. Alternative splicing in the mammalian nervous system: recent insights into mechanisms and functional roles. Neuron 87, 14–27 (2015).
6. Xu, B. et al. De novo gene mutations highlight patterns of genetic and neural complexity in schizophrenia. Nat. Genet. 44, 1365–1369 (2012).
7. Xu, B. et al. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nat. Genet. 43, 864–868 (2011).
8. De Rubeis, S. et al. Synaptic, transcriptional and chromatin genes disrupted in autism. Nature 515, 209–215 (2014).
9. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature 506, 179–184 (2014).
10. Iossifov, I. et al. The contribution of de novo coding mutations to autism spectrum disorder. Nature 515, 216–221 (2014).
11. Takata, A., Ionita-Laza, I., Gogos, J. A., Xu, B. & Karayiorgou, M. De Novo synonymous mutations in regulatory elements contribute to the genetic etiology of autism and schizophrenia. Neuron 89, 940–947 (2016).
12. Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380–384 (2011).
13. Irimia, M. et al. A highly conserved program of neuronal microexons is misregulated in autistic brains. Cell 159, 1511–1523 (2014).
14. Chung, D. W. et al. Dysregulated ErbB4 splicing in schizophrenia: selective effects on parvalbumin expression. Am. J. Psychiatry 173, 60–68 (2016).
15. Clinton, S. M., Haroutunian, V., Davis, K. L. & Meador-Woodruff, J. H. Altered transcript expression of NMDA receptor-associated postsynaptic proteins in the thalamus of subjects with schizophrenia. Am. J. Psychiatry 160, 1100–1109 (2003).
16. Battle, A. et al. Characterizing the genetic basis of transcriptome diversity through RNA-sequencing of 922 individuals. Genome Res. 24, 14–24 (2014).
17. Marioni, J. C., Mason, C. E., Mane, S. M., Stephens, M. & Gilad, Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).
18. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature 511, 421–427 (2014).
19. Shabalin, A. A. Matrix eQTL: ultra fast eQTL analysis via large matrix operations. Bioinformatics 28, 1353–1358 (2012).
20. Huang, da, W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4, 44–57 (2009).
21. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).
22. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74 (2012).
23. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).
24. Malone, J. et al. Modeling sample variables with an Experimental Factor Ontology. Bioinformatics 26, 1112–1118 (2010).
25. Birnbaum, R. Y. et al. Coding exons function as tissue-specific enhancers of nearby genes. Genome Res. 22, 1059–1068 (2012).
26. Stergachis, A. B. et al. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 342, 1367–1372 (2013).
27. Sims, 3rd R. J. et al. Recognition of trimethylated histone H3 lysine 4 facilitates the recruitment of transcription postinitiation factors and pre-mRNA splicing. Mol. Cell 28, 665–676 (2007).
28. Luco, R. F. et al. Regulation of alternative splicing by histone modifications. Science 327, 996–1000 (2010).
29. Grubert, F. et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell 162, 1051–1065 (2015).
30. Davie, J. R., Xu, W. & Delcuve, G. P. Histone H3K4 trimethylation: dynamic interplay with pre-mRNA splicing. Biochem. Cell Biol. 94, 1–11 (2016).
31. Fiszbein, A. & Kornblihtt, A. R. Histone methylation, alternative splicing and neuronal differentiation. Neurogenesis (Austin) 3, e1204844 (2016).


32. Ip, J. Y. et al. Global impact of RNA polymerase II elongation inhibition on alternative splicing regulation. Genome Res. 21, 390–401 (2011).
33. de la Mata, M. & Kornblihtt, A. R. RNA polymerase II C-terminal domain mediates regulation of alternative splicing by SRp20. Nat. Struct. Mol. Biol. 13, 973–980 (2006).
34. Kornblihtt, A. R. et al. Alternative splicing: a pivotal step between eukaryotic transcription and translation. Nat. Rev. Mol. Cell Biol. 14, 153–165 (2013).
35. Luco, R. F., Allo, M., Schor, I. E., Kornblihtt, A. R. & Misteli, T. Epigenetics in alternative pre-mRNA splicing. Cell 144, 16–26 (2011).
36. Nicolae, D. L. et al. Trait-associated SNPs are more likely to be eQTLs: annotation to enhance discovery from GWAS. PLoS Genet. 6, e1000888 (2010).
37. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
38. Richards, A. L. et al. Schizophrenia susceptibility alleles are enriched for alleles that affect gene expression in adult human brain. Mol. Psychiatry 17, 193–201 (2012).
39. Hannon, E. et al. Methylation QTLs in the developing brain and their enrichment in schizophrenia risk loci. Nat. Neurosci. 19, 48–54 (2016).
40. Jaffe, A. E. et al. Mapping DNA methylation across development, genotype and schizophrenia in the human frontal cortex. Nat. Neurosci. 19, 40–47 (2016).
41. Zhang, X. et al. Identification of common genetic variants controlling transcript isoform variation in human whole blood. Nat. Genet. 47, 345–352 (2015).
42. Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
43. Nguyen, C. L. et al. Nek4 regulates entry into replicative senescence and the response to DNA damage in human fibroblasts. Mol. Cell Biol. 32, 3963–3977 (2012).
44. Coene, K. L. et al. The ciliopathy-associated protein homologs RPGRIP1 and RPGRIP1L are linked to cilium integrity through interaction with Nek4 serine/threonine kinase. Hum. Mol. Genet. 20, 3592–3605 (2011).
45. Guemez-Gamboa, A., Coufal, N. G. & Gleeson, J. G. Primary cilia in the developing and mature brain. Neuron 82, 511–521 (2014).
46. Marley, A. & von Zastrow, M. DISC1 regulates primary cilia that display specific dopamine receptors. PLoS ONE 5, e10902 (2010).
47. Marley, A. & von Zastrow, M. A simple cell-based assay reveals that diverse neuropsychiatric risk genes converge on primary cilia. PLoS ONE 7, e46647 (2012).
48. Siomi, M. C., Zhang, Y., Siomi, H. & Dreyfuss, G. Specific sequences in the fragile X syndrome protein FMR1 and the FXR proteins mediate their binding to 60S ribosomal subunits and the interactions among them. Mol. Cell Biol. 16, 3825–3832 (1996).
49. Zhang, Y. et al. The fragile X mental retardation syndrome protein interacts with novel homologs FXR1 and FXR2. EMBO J. 14, 5358–5366 (1995).
50. Purcell, S. M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–190 (2014).
51. Iossifov, I. et al. De novo gene disruptions in children on the autistic spectrum. Neuron 74, 285–299 (2012).
52. Yao, P. J., Coleman, P. D. & Calkins, D. J. High-resolution localization of clathrin assembly protein AP180 in the presynaptic terminals of mammalian neurons. J. Comp. Neurol. 447, 152–162 (2002).
53. Koo, S. J. et al. Vesicular Synaptobrevin/VAMP2 levels guarded by AP180 control efficient neurotransmission. Neuron 88, 330–344 (2015).
54. Zhang, B. et al. Synaptic vesicle size and number are regulated by a clathrin adaptor protein required for endocytosis. Neuron 21, 1465–1475 (1998).
55. Marin, O. Interneuron dysfunction in psychiatric disorders. Nat. Rev. Neurosci. 13, 107–120 (2012).
56. Yasuda, O. et al. Apop-1, a novel protein inducing cyclophilin D-dependent but Bax/Bak-related channel-independent apoptosis. J. Biol. Chem. 281, 23899–23907 (2006).
57. Melchionda, L. et al. Mutations in APOPT1, encoding a mitochondrial protein, cause cavitating leukoencephalopathy with cytochrome c oxidase deficiency. Am. J. Hum. Genet. 95, 315–325 (2014).
58. Kato, T. & Kato, N. Mitochondrial dysfunction in bipolar disorder. Bipolar Disord. 2, 180–190 (2000).
59. Rajasekaran, A., Venkatasubramanian, G., Berk, M. & Debnath, M. Mitochondrial dysfunction in schizophrenia: pathways, mechanisms and implications. Neurosci. Biobehav. Rev. 48, 10–21 (2015).
60. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
61. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
62. Pan, Q., Shai, O., Lee, L. J., Frey, B. J. & Blencowe, B. J. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat. Genet. 40, 1413–1415 (2008).
63. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
64. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
65. Pruim, R. J. et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 26, 2336–2337 (2010).
66. McLaren, W. et al. Deriving the consequences of genomic variants with the Ensembl API and SNP effect predictor. Bioinformatics 26, 2069–2070 (2010).
67. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
68. Kurbatova, N., Adamusiak, T., Kurnosov, P., Swertz, M. A. & Kapushesky, M. ontoCAT: an R package for ontology traversal and search. Bioinformatics 27, 2468–2470 (2011).

Acknowledgements

This work was supported by JSPS KAKENHI Grant Number JP 16H06254, the Strategic Research Program for Brain Sciences from Japan Agency for Medical Research and development (AMED) and grants to the Laboratory for Molecular Dynamics of Mental Disorders, RIKEN BSI.

Author contributions

A.T. designed the study, performed the analyses and wrote the paper. N.M. and T.K. supervised the study and contributed to the interpretation of the results.

Additional information

Supplementary Information accompanies this paper at http://www.nature.com/naturecommunications

Competing financial interests: T.K. received a research grant from Takeda Pharmaceuticals Company Limited outside of this work. The remaining authors declare no competing financial interests.

Reprints and permission information is available online at http://npg.nature.com/reprintsandpermissions/

How to cite this article: Takata, A. et al. Genome-wide identification of splicing QTLs in the human brain and their enrichment among schizophrenia-associated loci. Nat. Commun. 8, 14519 doi: 10.1038/ncomms14519 (2017).

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/

© The Author(s) 2017
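As an illustration of the double correction described in Methods (‘Identification of sQTL SNPs’), the following Python sketch applies a Benjamini–Hochberg adjustment to the P-values of all SNP–AS pairs, takes the smallest adjusted value per SNP and multiplies it by the number of AS events tested for that SNP. It is a simplified sketch under stated assumptions, not the pipeline used in the study: the function names and the toy input are hypothetical, and the number of AS events in the ±100 kb window is assumed to equal the number of pairs listed for the SNP (the study counted them with BEDtools).

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def double_corrected_p(pairs):
    """pairs: list of (snp_id, as_event_id, raw_p), one entry per cis SNP-AS pair.

    Returns {snp_id: double-corrected P}, i.e. the smallest BH-adjusted P of the
    SNP multiplied by the number of AS events tested for it (capped at 1).
    """
    adjusted = benjamini_hochberg([p for _, _, p in pairs])
    best = {}
    n_events = {}
    for (snp, _event, _raw), adj in zip(pairs, adjusted):
        best[snp] = min(best.get(snp, 1.0), adj)
        n_events[snp] = n_events.get(snp, 0) + 1
    return {snp: min(1.0, best[snp] * n_events[snp]) for snp in best}


# Hypothetical toy input (not data from this study).
toy_pairs = [
    ("rsA", "event1", 0.001),
    ("rsA", "event2", 0.5),
    ("rsA", "event3", 0.8),
    ("rsB", "event4", 0.01),
]

if __name__ == "__main__":
    for snp, p in double_corrected_p(toy_pairs).items():
        print(snp, p)
```

In the toy input, the SNP with three nearby AS events pays a threefold Bonferroni penalty, which reflects the stated rationale that a SNP with many AS events in its window has more chances to show a significant association.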

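The enrichment analyses in Methods all rest on Fisher’s exact test for a 2 × 2 table (columns: sQTL SNPs and non-sQTL SNPs; rows: SNPs inside and outside an annotation or locus set). The sketch below is a plain-Python illustration of that test, assuming Python 3.8 or later for math.comb. The function name and argument order are hypothetical, the small counts in the example are a textbook toy table and not counts from this study, and the two-sided P-value follows the common convention of summing all tables that are no more probable than the observed one.

```python
from math import comb


def fisher_exact(a, b, c, d, alternative="two-sided"):
    """Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    a: sQTL SNPs inside the annotation      b: non-sQTL SNPs inside
    c: sQTL SNPs outside the annotation     d: non-sQTL SNPs outside
    Returns (odds_ratio, p_value).
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    denom = comb(n, row1)
    # Hypergeometric probability of every table sharing the observed margins.
    probs = {
        x: comb(col1, x) * comb(n - col1, row1 - x) / denom
        for x in range(lo, hi + 1)
    }
    if alternative == "greater":        # one-tailed test for enrichment
        p = sum(pr for x, pr in probs.items() if x >= a)
    elif alternative == "two-sided":
        cutoff = probs[a] * (1 + 1e-7)  # tolerance for floating-point ties
        p = sum(pr for pr in probs.values() if pr <= cutoff)
    else:
        raise ValueError("alternative must be 'two-sided' or 'greater'")
    if b * c == 0:
        odds_ratio = float("inf") if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, min(1.0, p)


if __name__ == "__main__":
    # Toy table for illustration only.
    print(fisher_exact(3, 1, 1, 3))
    print(fisher_exact(3, 1, 1, 3, alternative="greater"))
```

With the SNP sets of this study, the column totals would be 1,539 (sQTL SNPs) and 48,068 (non-sQTL SNPs), and the resulting P-value would then be multiplied by the number of tests in the family (ten functional types, or 65 TF) for Bonferroni correction.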