Molecular Psychiatry (2014) 19, 1179–1185 © 2014 Macmillan Publishers Limited All rights reserved 1359-4184/14 www.nature.com/mp

ORIGINAL ARTICLE
RNA-sequencing of the brain transcriptome implicates dysregulation of neuroplasticity, circadian rhythms and GTPase binding in bipolar disorder

N Akula1, J Barb2, X Jiang1, JR Wendland1, KH Choi3, SK Sen4, L Hou1, DTW Chen1, G Laje1, K Johnson5, BK Lipska6, JE Kleinman7, H Corrada-Bravo8, S Detera-Wadleigh1, PJ Munson2 and FJ McMahon1

RNA-sequencing (RNA-seq) is a powerful technique to investigate the complexity of expression in the human brain. We used RNA-seq to survey the brain transcriptome in high-quality postmortem dorsolateral prefrontal cortex from 11 individuals diagnosed with bipolar disorder (BD) and from 11 age- and gender-matched controls. Deep sequencing was performed, with over 350 million reads per specimen. At a false discovery rate of <5%, we detected five differentially expressed (DE) genes and 12 DE transcripts, most of which have not been previously implicated in BD. Among these, Prominin 1/CD133 and ATP-binding cassette-sub-family G-member2 (ABCG2) have important roles in neuroplasticity. We also show for the first time differential expression of long noncoding RNAs (lncRNAs) in BD. DE transcripts include those of serine/arginine-rich splicing factor 5 (SRSF5) and regulatory factor X4 (RFX4), which along with lncRNAs have a role in mammalian circadian rhythms. The DE genes were significantly enriched for several Gene Ontology categories. Of these, genes involved with GTPase binding were also enriched for BD-associated SNPs from previous genome-wide association studies, suggesting that differential expression of these genes is not simply a consequence of BD or its treatment. Many of these findings were replicated by microarray in an independent sample of 60 cases and controls. These results highlight common pathways for inherited and non-inherited influences on disease risk that may constitute good targets for novel therapies.

Molecular Psychiatry (2014) 19, 1179–1185; doi:10.1038/mp.2013.170; published online 7 January 2014
Keywords: gene expression; Gene Ontology; genome-wide association studies; long noncoding RNAs; microarrays

1Human Genetics Branch, National Institute of Mental Health Intramural Research Program, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD, USA; 2Mathematical and Statistical Computing Laboratory, Center for Information Technology, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD, USA; 3Department of Psychiatry, Uniformed Services University of the Health Sciences, Bethesda, MD, USA; 4Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD, USA; 5Bioinformatics Section, Information Technology & Bioinformatics Program, Division of Intramural Research, National Institute of Neurological Disorders & Stroke, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD, USA; 6Human Brain Collection Core, Division of Intramural Research Programs, National Institute of Mental Health Intramural Research Program, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD, USA; 7Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD, USA and 8Department of Computer Science, Institute for Advanced Computer Studies and Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA.
Correspondence: Dr N Akula, Human Genetics Branch, National Institute of Mental Health Intramural Research Program, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD 20892, USA. E-mail: [email protected]
Received 28 March 2013; revised 24 October 2013; accepted 29 October 2013; published online 7 January 2014

INTRODUCTION
Gene expression profiling studies using microarray technology on postmortem brain tissues have identified a number of differentially expressed (DE) genes in bipolar disorder (BD), but with limited overlap of findings across studies.1–5 The available microarray technology lacks sensitivity for low-abundance mRNA and may miss differentially spliced genes and isoforms that might have relevance to disease risk and progression. RNA-sequencing (RNA-seq) involves massively parallel sequencing of the transcribed portion of the genome, providing a direct estimate of transcript abundance, with counts proportional to the absolute abundance of target. Differential expression can be measured at various levels (gene, transcript, exon and so on) that may reflect alternative splicing and other important regulatory phenomena. For these reasons, RNA-seq is sensitive to low-abundance genes and transcripts not well represented on microarrays.6 As sequence data is not limited by known gene annotations, novel genes can also be detected.7

Here we report findings from the first RNA-seq-based survey of the brain transcriptome in high-quality postmortem brain tissue derived from two independent sets of individuals diagnosed with BD along with controls who were carefully matched for age and gender. In this initial analysis, we have focused on known genes and transcripts. The results, which were largely replicated in a microarray study in an independent sample, demonstrate that genes involved in neuroplasticity, circadian rhythms and GTPase binding are DE in BD and may contribute to disease risk.

MATERIALS AND METHODS
Samples
Total RNA extracted from the dorsolateral prefrontal cortex (DLPFC) (Brodmann area 46) of postmortem brain tissue of 11 BD cases and 8 psychiatrically healthy controls was obtained from the Stanley Medical Research Institute (www.stanleyresearch.org). Additional total RNA specimens prepared from the same region of postmortem brain from three psychiatrically healthy controls were provided by the Clinical Brain Disorders Branch, National Institute of Mental Health (NIMH) with informed consent from the legal next of kin under NIMH protocol 90-M-0142. Clinical characterization, neuropathological screening, toxicological analyses and dissections of the DLPFC for Clinical Brain Disorders Branch postmortem brains were previously described.8 Cases and controls were matched on age and gender (Supplementary Table 1). All specimens had an RNA integrity number (RIN) ⩾6.0, except one case with a RIN of 4.9 that was dropped from analysis (see below).

RNA-seq
RNA-seq, including library preparation, fragmentation and PCR enrichment of target RNA, was performed at the National Institutes of Health (NIH) Intramural Sequencing Center (NISC) according to standard protocols. The first batch of specimens (five BD cases and five controls), referred to herein as ‘NISC1’, underwent paired-end sequencing on five lanes (one lane of 51 bp reads and four lanes of 76 bp reads) of the Illumina GA-IIx system (Illumina, San Diego, CA, USA). The second batch of six cases and six controls, referred to herein as ‘NISC2’, was sequenced on one lane (100 bp reads) of the Illumina HiSeq system (Illumina).

Gene-level analysis
FASTQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and FASTX-toolkit (http://hannonlab.cshl.edu/fastx_toolkit/) were used to examine the quality and trim reads. Paired-end reads were aligned to human genome build 19 using TopHat (TopHat version 2.0.4; http://tophat.cbcb.umd.edu/).9 All NISC1 and NISC2 samples passed quality control (QC) except one case (‘BP5’) that had RIN <7; it was excluded from further analysis. For the gene-level analysis, a gene was considered the union of all its exons in any known isoforms, based on ENSEMBL annotation. Using HTSeq (http://www-huber.embl.de/users/anders/HTSeq/doc/overview.html), read counts were generated for genes on the Ensemble.gtf file downloaded from UCSC genome browser (http://genome.ucsc.edu/). Any reads that fell in multiple genes or in genes that mapped to the sex chromosomes were excluded from the analysis. We included all genes with >5 reads in at least four BD cases and four controls in NISC1 and NISC2, which is a minimum of 20 reads per group. To minimize outlier effects, we excluded any genes with variance ≥1 in the log2 reads per million values within cases or controls.

Transcript-level analysis
The reads from NISC1 and NISC2 were mapped to the human genome build 19 Ensembl transcriptome using Bowtie (Bowtie version 0.12.8; http://bowtie-bio.sourceforge.net/index.shtml).10 Read counts for each of the Ensembl transcripts were estimated by eXpress11 (http://bio.math.berkeley.edu/eXpress/overview.html) using default parameters. eXpress differentiates between exons assigned to a particular isoform, which it counts uniquely, and exons that are shared. Shared exons are partitioned among all isoforms, based on an EM algorithm, thus avoiding double counting of reads for a shared exon. Only those transcripts that map to autosomes were included in the analysis. All transcripts with read counts >5 in at least four BD cases and four controls in NISC1 and NISC2 were included.

Differential expression
Count data for genes and transcripts were analyzed in R using the DESeq (http://www-huber.embl.de/users/anders/DESeq/) software package12 with default settings. Normalization was done within DESeq. NISC1 and NISC2 samples were analyzed separately because of technical differences in sequencing methods (see above).

Meta-analysis
There was a highly significant correlation in log2 fold change values of genes that passed quality control in both NISC1 and NISC2 (r = 0.3, P<0.0001; Figure 1). Thus, to improve the statistical power to detect disease-related differences in these samples, meta-analysis was performed. P-values obtained from DESeq were converted to Z-scores and combined across NISC1 and NISC2 using the Liptak–Stouffer method,13 taking into account the direction of the fold change and weighting by sample size. The total weighted Z-score was then corrected to a mean of 0 and converted to a two-tailed P-value according to the standard normal distribution.

Figure 1. Significant correlation of differential gene expression in two independent sets of dorsolateral prefrontal cortex (DLPFC) brain samples. Fold change values are shown on the log2 scale for both National Institutes of Health (NIH) Intramural Sequencing Center (NISC)-1 (x axis) and NISC2 (y axis). Each point represents one gene. Colored lines represent 5% density contours. The blue diagonal line represents the line of identity.

Technical validation
We selected 41 genes that were significant in the NISC1 gene-level analysis for validation by digital molecular barcoding using the Nanostring nCounter system (http://www.nanostring.com/) following the manufacturer-recommended protocol. Any probes that were not unique to our genes of interest were excluded before the experimental validation. Six housekeeping genes were included for data normalization.

Gene expression microarrays
Microarray gene expression profiling was performed using Illumina HumanHT-12_V4 chips (Illumina) on independent postmortem brain samples composed of 30 BD cases and 30 psychiatrically healthy controls (RIN ⩾7) from the "array collection" of the Stanley Medical Research Institute, Rockville, MD, USA.

Microarray data analysis
Microarray data were analyzed using ‘R’ (www.cran.r-project.org). The data analysis flowchart and QC plots are shown in Supplementary Figure 1. Any gene probes that had >80% failure (that is, mean expression value ⩽7.75; Supplementary Figure 1E) in cases or controls were removed, leaving 11 943 probes quantifying 8969 genes for analysis. Only autosomal genes were included in the analysis. After QC, quantile normalized data from 22 cases and 26 controls were analyzed for differential expression by analysis of covariance correcting for differences in age, gender, race, postmortem interval, brain pH and refrigerator index.

Quantitative PCR validation of microarray data
Fifteen DE genes from the microarray analysis were validated using quantitative PCR on a Roche Lightcycler480 (Roche Applied Science, Indianapolis, IN, USA) using a standard protocol. The primer sequences used in this experiment are listed in Supplementary Table 2. The raw data were normalized to the geometric mean of the two housekeeping genes that met the criteria of no difference between the groups under examination. All measurements were performed in triplicate and the resulting data were analyzed using the ΔCt method.

Functional analysis
DE genes and transcripts with meta P-value <0.05 were subjected to functional annotation in DAVID (http://david.abcc.ncifcrf.gov/).14,15 All genes and transcripts in the RNA-seq data that passed QC were used as background in DAVID. We used high stringency criteria to select the significantly enriched Gene Ontology (GO) categories. Functional

categories with a Benjamini q-value <5% were considered to be significantly enriched. To confirm the results in a larger sample, the functional analysis was repeated after addition of DE genes (with fold change >1.25) identified in the microarray experiment.

Genome-wide association study enrichment testing
To identify genes that may have a causal role in BD, we performed genome-wide association study (GWAS) enrichment analysis, similar in principle to some published approaches.16,17 SNPs from a published GWAS meta-analysis study18 were pruned for linkage disequilibrium (r2 <0.5) by PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/).19 For each gene within an enriched GO term category, we assigned SNPs that mapped within 110 kb upstream of the transcription start site and 40 kb downstream of the 3′ untranslated region.20 We then determined the proportion of SNPs with a nominal association P-value <0.05 in the GWAS analysis.18 We used 10 000 random draws from the GWAS data (with replacement) to determine the number of draws that included at least as many nominally associated SNPs as in the nonrandom set, providing an empirical GWAS enrichment P-value.

To manage multiple comparisons, we excluded from this analysis enriched GO term categories that overlapped by >80% at the gene level with other enriched GO term categories containing more genes. For example, genes assigned to the GO term categories ‘phospholipase C activity (12 genes)’, ‘homophilic cell adhesion (35 genes)’, ‘potassium ion binding (36 genes)’, ‘cation binding (652 genes)’ and ‘metal ion binding (647 genes)’ overlap >80% at the gene level with those assigned to ‘ion binding (669 genes)’. So instead of testing all six categories, we performed GWAS enrichment solely on ‘ion binding’. The final reported enriched terms may not cover all important functional information, but this strategy strikes a balance between more comprehensive results and loss of power from multiple testing.

RESULTS
Read depth and mapping
A total of ~7.5 billion reads were generated in NISC1 and NISC2 (an average of 272 million reads/specimen in NISC1 and 419 million reads/specimen in NISC2). Approximately 96% and 87% of the reads in NISC1 and NISC2, respectively, mapped to the UCSC human genome build 19. For downstream analysis, we included a total of ~1.25 billion properly paired reads in NISC1 and ~3.3 billion properly paired reads in NISC2 (Supplementary Table 3).

Basic statistics and principal component analysis
The first three principal components represent 64% and 67% of the variability in NISC1 and NISC2, respectively. When principal component 2 was plotted against principal component 3, there was a significant separation between the BD and healthy control groups in NISC1; similar, but nonsignificant, separation was observed in NISC2 (Supplementary Figure 2). This analysis indicates that overall gene expression patterns differ between cases and controls but does not point to specific genes. Raw count data for all genes (N = 25 017) in each specimen are shown in Supplementary Figures 3 and 4.

Differential expression in BD
The high sequencing depth in RNA-seq permitted the analysis of expression at two biological levels, genes and transcripts. At the gene level, all the exons assigned to multiple isoforms of a gene were included in the analysis. At the transcript level, exons belonging to a particular spliced isoform of a gene were included. In total, 25 017 genes and 182 884 transcripts were analyzed (Supplementary Table 4). Figure 2 summarizes the overall results in a circos plot,21 and our analysis pipeline is presented in Supplementary Figures 5a and 5b.

Figure 2. A circos plot showing differentially expressed (DE) genes and transcripts in RNA-sequencing (RNA-seq) and microarray data, direction of expression, Gene Ontology (GO) term categories and chromosomal location. Each circle from the periphery to the core represents the following: chromosomal location; in yellow background: 1225 DE genes in RNA-seq; in gray background: 332 DE genes in microarray that overlap with RNA-seq (gene and/or transcript level); in yellow background: 2776 DE transcripts from RNA-seq, with red indicating upregulation and blue indicating downregulation at nominal P<0.05, and the height of the bars represents relative expression levels; in gray background: genome-wide association studies (GWAS) single-nucleotide polymorphisms (SNPs)18 linked to DE genes and transcripts; at the core, green radii connect genes involved in ion binding and black radii connect genes involved in regulation of cell development. Labels in the plot: Chromosome 3; RNA-seq Gene-level; Microarray; RNA-seq Transcript-level; GWAS; Ion binding; Cell development.

Gene-level analysis: DE genes
Meta-analysis identified five DE genes at false discovery rate (FDR) <5%: LINC00173, Prominin 1 (PROM1), ATP-binding cassette-sub-family G-member2 (ABCG2), Alpha-2-macroglobulin (A2M) and Friend leukemia virus integration 1 (FLI1) (Table 1). Except for LINC00173, a long noncoding RNA (lncRNA), all of these genes

were downregulated in BD cases relative to healthy controls (Supplementary Figure 6). A total of 1225 genes were shown to have a nominal meta-analysis P-value <0.05 (Supplementary Table 5). A volcano plot of log2 fold change versus the meta-analysis P-value for all genes that passed quality control is presented in Figure 3a. Heatmaps and parallel plots for each sample are presented in Supplementary Figures 7 and 8, and the corresponding genes are shown in Supplementary Tables 6 and 7.

Transcript-level analysis: DE transcripts
Overall, 12 transcripts were DE at FDR <5% in the meta-analysis (Table 1 and Supplementary Figure 9), of which eight transcripts were downregulated. Supplementary Figures 10 and 11 represent the heatmaps and parallel plots of all transcripts with P-value <0.05 in NISC1 and NISC2, respectively. The DE transcripts with P<0.05 in each sample are listed in Supplementary Tables 8 and 9. Overall, 2776 DE transcripts encoded by 2280 unique genes had a meta-analysis P-value <0.05 (Supplementary Table 5). Figure 3b depicts all transcripts that passed QC along with log2 fold changes and meta-analysis P-values.

Table 1. Differentially expressed genes and transcripts identified by RNA-seq

| Gene symbol | Ensembl accession(a) | NISC1 base mean controls(b) | NISC1 base mean cases(b) | NISC1 log2 fold change | NISC1 P-value | NISC2 base mean controls(b) | NISC2 base mean cases(b) | NISC2 log2 fold change | NISC2 P-value | Meta P-value | FDR |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LINC00173 | ENSG00000196668 | 210 | 306 | 0.5 | 2.88E-02 | 259 | 564 | 1.1 | 8.33E-06 | 1.23E-06 | 0.02 |
| PROM1 | ENSG00000007062 | 546 | 348 | −0.6 | 2.18E-03 | 811 | 482 | −0.8 | 6.08E-04 | 4.07E-06 | 0.03 |
| ABCG2 | ENSG00000118777 | 2687 | 2046 | −0.4 | 4.73E-02 | 3979 | 2137 | −0.9 | 3.24E-05 | 5.59E-06 | 0.03 |
| A2M | ENSG00000175899 | 17 572 | 13 187 | −0.4 | 1.31E-01 | 15 104 | 8451 | −0.8 | 1.74E-05 | 1.25E-05 | 0.04 |
| FLI1 | ENSG00000151702 | 576 | 453 | −0.3 | 1.16E-01 | 915 | 488 | −0.9 | 2.22E-05 | 1.29E-05 | 0.04 |
| OSBPL3-203 | ENST00000431825 | 27 | 85 | 1.6 | 3.46E-03 | 80 | 402 | 2.3 | 2.14E-06 | 3.02E-08 | 0.001 |
| RQCD1-202 | ENST00000542068 | 710 | 638 | −0.2 | 9.28E-01 | 1308 | 204 | −2.7 | 1.62E-10 | 2.28E-07 | 0.006 |
| KLHDC2-004 | ENST00000554589 | 766 | 428 | −0.8 | 2.79E-02 | 817 | 81 | −3.3 | 3.08E-06 | 4.25E-07 | 0.007 |
| TC2N-004 | ENST00000340892 | 144 | 74 | −1.0 | 3.61E-02 | 479 | 107 | −2.2 | 3.50E-06 | 6.52E-07 | 0.008 |
| COX7A2-008 | ENST00000509698 | 27 374 | 21 207 | −0.4 | 2.80E-01 | 32 519 | 346 | −6.6 | 2.79E-07 | 1.90E-06 | 0.019 |
| GANAB-016 | ENST00000529737 | 195 | 46 | −2.1 | 2.55E-03 | 1067 | 196 | −2.4 | 2.89E-04 | 2.40E-06 | 0.020 |
| SRSF5-016 | ENST00000553548 | 5695 | 3073 | −0.9 | 4.12E-01 | 5582 | 1361 | −2.0 | 1.70E-07 | 2.85E-06 | 0.020 |
| RFX4-001 | ENST00000229387 | 84 | 414 | 2.3 | 1.11E-04 | 20 | 237 | 3.6 | 3.51E-03 | 3.34E-06 | 0.021 |
| IMPDH1-002 | ENST00000348127 | 1232 | 865 | −0.5 | 1.87E-01 | 253 | 56 | −2.2 | 2.93E-06 | 5.66E-06 | 0.028 |
| MMS19-006 | ENST00000370782 | 1917 | 811 | −1.2 | 9.48E-04 | 1895 | 757 | −1.3 | 1.36E-03 | 5.31E-06 | 0.028 |
| RUFY1-001 | ENST00000319449 | 60 | 64 | 0.1 | 9.86E-01 | 24 | 210 | 3.1 | 1.65E-08 | 6.16E-06 | 0.028 |
| BCAR1-006 | ENST00000535626 | 39 | 110 | 1.5 | 8.49E-03 | 13 | 70 | 2.4 | 3.60E-04 | 9.55E-06 | 0.039 |

Abbreviations: ABCG2, ATP-binding cassette-sub-family G-member2; FDR, false discovery rate; FLI1, Friend leukemia virus integration 1; GANAB, glucosidase alpha neutral AB; NISC, National Institutes of Health Intramural Sequencing Center; OSBPL3, oxysterol binding protein-like 3; PROM1, Prominin 1; RFX4, regulatory factor X4; RNA-seq, RNA-sequencing. (a) ENSG/ENST = Ensembl gene accession/Ensembl transcript accession. (b) Base mean = number of reads/normalization constant of the sample.

Figure 3. (a, b) Volcano plots generated by meta-analysis of differential gene expression at the level of gene and transcript. The x axis shows the weighted average meta log2 fold change across both National Institutes of Health (NIH) Intramural Sequencing Center (NISC)-1 and NISC2 samples. The meta-analysis P-value (−log base 10) for differential expression is plotted on the y axis. At a nominal meta-analysis P-value <0.05, downregulated genes (blue dots) and upregulated genes (red dots) are represented. The larger dots indicate genes DE at false discovery rate (FDR) <5% (corresponding to a P-value <0.000012). All other genes that passed quality control are shown as gray dots.

Gene- and transcript analysis overlap
Transcript-level analysis of 1225 DE genes revealed that 358 genes have DE isoforms (Supplementary Table 5). This overlap between gene- and transcript-level analyses was highly significant (hypergeometric P-value <10^−16) with an odds ratio of 3.85. A range of 1–5 DE isoforms were observed per gene. This illustrates the degree of complexity in regulation of the brain transcriptome as revealed through RNA-seq.

Technical validation of RNA-seq data
The correlation coefficient for log2 fold change between RNA-seq and Nanostring was 0.72 (P<0.0001), indicating a good

agreement. A scatterplot of normalized counts from RNA-seq versus Nanostring is shown in Supplementary Figure 12.

Microarray results and comparison with RNA-seq
We compared the RNA-seq (gene- and transcript level) results with those obtained in a gene expression microarray study of an independent sample of 22 BD cases and 26 controls. Out of 8969 genes that passed QC in the microarray analysis, 1357 genes were DE in the microarray data (Supplementary Table 10). Of the 7579 genes that passed QC in both RNA-seq (gene- and transcript level) and microarray studies, 1608 genes were DE (P<0.05) by RNA-seq and 1342 genes were DE (P<0.05) by microarray analysis. A total of 332 genes were DE in both studies (Supplementary Table 10), a modest but highly significant overlap across samples and methods (odds ratio = 1.3; hypergeometric P-value = 2.61 × 10^−4; correlation in log2 fold change, r = 0.42, P-value <6.66 × 10^−16). Of the overlapping DE genes, 171 were consistently downregulated and 92 were consistently upregulated; 69 genes were inconsistent between RNA-seq and microarray results. Filtering microarray results at a fold change of >1.25 (one s.d. of the fold change distribution in cases and controls) resulted in 70 genes (Supplementary Table 10), which gave even more significant gene overlap (odds ratio = 2.8; hypergeometric P = 1.53 × 10^−10) with RNA-seq data. These results provide additional validation for the RNA-seq data and independent replication for a significant subset of the findings.

Quantitative PCR results
Of the 15 DE genes in microarray data validated by quantitative PCR, 13 genes were concordant in direction of fold change with the microarray data and 2 genes were discordant (Pearson correlation = 0.83, P-value = 0.00012; Supplementary Table 11).
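The overlap statistics quoted for the RNA-seq and microarray comparison (an odds ratio and a hypergeometric P-value) can be illustrated with a short sketch. It assumes a one-sided hypergeometric upper-tail test on the 2 × 2 overlap table, which the text does not spell out, so the P-value it returns need not match the reported figure exactly. The function names are illustrative; the counts are those given in the comparison above.

```python
from math import comb


def hypergeom_upper_tail(n_total, n_de_a, n_de_b, n_both):
    """P(overlap >= n_both) when n_de_b genes are drawn without replacement
    from n_total genes, n_de_a of which are DE in the other study."""
    denominator = comb(n_total, n_de_b)
    numerator = sum(
        comb(n_de_a, i) * comb(n_total - n_de_a, n_de_b - i)
        for i in range(n_both, min(n_de_a, n_de_b) + 1)
    )
    return numerator / denominator


def overlap_odds_ratio(n_total, n_de_a, n_de_b, n_both):
    """Odds ratio of the 2 x 2 table (DE in A or not) x (DE in B or not)."""
    both = n_both
    only_a = n_de_a - n_both
    only_b = n_de_b - n_both
    neither = n_total - n_de_a - n_de_b + n_both
    return (both * neither) / (only_a * only_b)


# Counts from the text: 7579 genes passed QC in both studies,
# 1608 DE by RNA-seq, 1342 DE by microarray, 332 DE in both.
or_overlap = overlap_odds_ratio(7579, 1608, 1342, 332)
p_overlap = hypergeom_upper_tail(7579, 1608, 1342, 332)
print(f"odds ratio = {or_overlap:.2f}, hypergeometric P = {p_overlap:.2e}")
```

Exact integer arithmetic is used here only to avoid an external statistics library; with SciPy available, the same tail probability could be obtained from its hypergeometric distribution.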

GO analysis
Functional enrichment analysis was performed on 1225 DE genes and 2776 DE transcripts. This analysis revealed significant enrichment for 16 GO categories at a Benjamini q-value <5% (Table 2). We found >5-fold enrichment in ‘transmembrane receptor protein phosphatase activity’; >2-fold enrichment in ‘regulation of transmission of nerve impulse’, ‘GTPase binding’, ‘regulation of cyclic nucleotide metabolic process’ and ‘cell part morphogenesis’. These results were successfully replicated when we added the DE genes from the microarray experiment (Table 2).
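The Benjamini q-value threshold used above can be made concrete with a small sketch. It assumes that the ‘Benjamini’ column reported by DAVID corresponds to the standard Benjamini–Hochberg step-up adjustment; the function name and the P-values in the usage lines are invented for illustration and are not taken from Table 2.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical enrichment P-values for four GO categories.
example_p = [0.005, 0.01, 0.03, 0.04]
example_q = benjamini_hochberg(example_p)
significant = [q < 0.05 for q in example_q]
```

A category would then be called enriched when its q-value falls below 0.05, which is the criterion the analysis applies.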

GWAS enrichment analysis
In order to distinguish between potentially causal genes and those whose differential expression may be a consequence of BD or its treatment, we tested genes assigned to enriched GO categories for enrichment of nominally associated SNPs in a recent GWAS of BD.18 We detected significant enrichment of small association P-values for SNPs near genes assigned to five GO categories: ‘cell fraction’, ‘GTPase binding’, ‘nucleoside-triphosphatase regulator activity’, ‘ion binding’ and ‘passive transmembrane transporter activity’ (Table 2). The remaining categories showed no GWAS enrichment. The GWAS enrichment results were strengthened by addition of genes identified in the microarray experiment. All the GO categories that showed GWAS enrichment in the RNA-seq data were also significantly enriched after addition of the microarray data (Table 2).
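The re-sampling test behind these GWAS enrichment P-values, as described in the Methods, can be sketched as follows. This is a minimal illustration under stated assumptions: SNP-to-gene assignment and linkage disequilibrium pruning are taken as already done, the empirical P-value is the plain fraction of random draws with at least as many nominally associated SNPs as the category, and all names and the toy P-values are hypothetical.

```python
import random


def gwas_enrichment_p(all_pvalues, category_pvalues,
                      n_draws=10_000, alpha=0.05, seed=0):
    """Empirical P-value for enrichment of nominally associated SNPs.

    all_pvalues: association P-values of all pruned GWAS SNPs.
    category_pvalues: P-values of the SNPs assigned to one GO category.
    """
    rng = random.Random(seed)
    size = len(category_pvalues)
    observed = sum(p < alpha for p in category_pvalues)
    hits = 0
    for _ in range(n_draws):
        # Draw a random SNP set of the same size, with replacement.
        draw = rng.choices(all_pvalues, k=size)
        if sum(p < alpha for p in draw) >= observed:
            hits += 1
    return hits / n_draws


# Toy GWAS background: 1000 SNPs with evenly spaced P-values.
background = [(i + 1) / 1000 for i in range(1000)]
# Toy category in which half of the 40 SNPs are nominally associated.
enriched_set = [0.01] * 20 + [0.5] * 20
# Toy category with no nominally associated SNPs.
null_set = [0.5] * 40

p_enriched = gwas_enrichment_p(background, enriched_set)
p_null = gwas_enrichment_p(background, null_set)
```

Because the number of draws bounds the resolution, an empirical P-value from 10 000 draws cannot be distinguished from zero below 0.0001.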

Table 2. Functional and GWAS enrichment of differentially expressed genes and transcripts

Column headings: GO number; GO term; then, under each of ‘RNA-seq gene- and transcript level’ and ‘Microarray’: Gene count; Fold enrichment; Enrichment P-value; Benjamini q-value; GWAS enrichment re-sampling P-value(a).

GO:0000267 Cell fraction: 188 1.6 4.09E-10 2.80E-08 0.002

GO:0032990 Cell part morphogenesis; GO:0051969 Regulation of transmission of nerve impulse; GO:0006163 Purine nucleotide metabolic process; GO:0060284 Regulation of cell development; GO:0051674 Localization of cell: 33 2.5 58 1.67E-06 62 41 2.50E-04 1.7 5.47E-05 2.2 0.838 1.9 3.94E-03 5.28E-09 1.53E-04 54 1.78E-06 0.907 7.78E-03 35 0.742 1.6 0.922 59 2.6 5.06E-04 1.86E-02 65 42 4.29E-07 1.7 0.994 7.12E-05 2.2 1.8 8.25E-05 0.714 4.87E-03 55 1.35E-09 1.65E-04 4.67E-07 0.907 7.91E-03 1.6 0.719 0.924 6.85E-04 2.28E-02 0.995

GO:0051020 GTPase binding; GO:0060589 Nucleoside-triphosphatase regulator activity; GO:0043167 Ion binding; GO:0022803 Passive transmembrane transporter activity; GO:0019725 Cellular homeostasis; GO:0043067/GO:0010941 Regulation of programmed cell death/cell death; GO:0031982 Vesicle; GO:0019198 Transmembrane receptor protein phosphatase activity; GO:0031328 Positive regulation of cellular biosynthetic process; GO:0030799/GO:0006140 Regulation of cyclic nucleotide metabolic process/nucleotide: 133 93 1.4 11 80 7.71E-05 1.8 5.02E-03 103 1.30E-08 8.9 1.6 4.64E-06 0.086 2.92E-08 1.4 4.71E-05 5.93E-06 0.010 3.53E-03 27 1.86E-04 142 9.01E-03 0.532 0.037 86 2.4 669 0.643 1.4 1.5 3.32E-05 11 83 1.2 2.78E-03 2.13E-04 1.36E-05 107 2.65E-08 1.01E-02 1.13E-03 126 0.003 6.28E-06 8.6 1.6 0.065 0.043 1.4 1.3 0.03 4.04E-08 3.06E-05 9.87E-06 1.53E-03 2.98E-03 1.37E-04 2.96E-02 89 7.01E-03 687 0.533 0.035 0.34 0.678 1.5 1.2 133 1.87E-04 6.42E-08 8.39E-03 1.34E-05 1.3 0.07 0.04 3.56E-04 7.67E-03 0.218

Further table values: 2794 2.4 1.8 5.90E-05 3.34E-08 4.10E-03 9.78E-06 0.0029 0.012; 199 1.6 6.56E-12 5.05E-10 0.0015; 23 2.4 1.34E-04 7.03E-03 0.644; 25 2.4 6.99E-05 4.32E-03 0.692

Abbreviations: GWAS, genome-wide association studies; RNA-seq, RNA-sequencing; GO, Gene Ontology. (a) Significant at Sidak multiple-test corrected P<0.05.

DISCUSSION
To our knowledge, this is the first RNA-seq based profiling of the brain transcriptome in BD. The results demonstrate that a number of genes and transcripts are DE in brain tissue of people with BD. Some of these genes were previously implicated in genetic

association or gene expression studies of BD.22–30 The DE genes are significantly enriched for several GO categories. Genes involved in 5 of these functional categories are also enriched for GWAS signals, suggesting a causal role in risk for BD. These results were also supported by microarray data in a larger and independent sample.

A few individual gene- and transcript findings merit discussion. PROM1/CD133 and ABCG2 were both significantly underexpressed in BD. Both genes are strongly expressed in neural stem cells and other highly plastic tissues.31,32 This finding is consistent with the idea of reduced neuroplasticity in BD. Another top finding is underexpression in BD of the FLI1/ERGB gene. Defects in FLI1 cause significant defects in neural crest development.33

We found several lncRNAs to be DE in BD. This finding highlights a unique attribute of RNA-seq for transcriptome analysis, as lncRNAs are not detected by conventional microarray chips. Coon et al.34 showed that lncRNAs have a role in the vertebrate circadian cycle that is known to be disrupted during BD episodes. Previous studies have also shown that altered expression of lncRNAs may have a role in several neurological and neuropsychiatric diseases.35–39

The high depth of sequencing in RNA-seq uncovered multiple DE transcripts. Although several of these are novel in BD, some have been previously implicated in related phenotypes. Oxysterol binding protein-like 3 (OSBPL3) is disrupted by sleep deprivation in model organisms.40,41 Glucosidase alpha neutral AB (GANAB) is DE in DLPFC in schizophrenia,42 and is one of several genes implicated in synaptic dysfunction in that disorder.43 Serine/arginine-rich splicing factor 5 (SRSF5) has a role in circadian regulation.44 Proteins encoded by regulatory factor X4 (RFX4) have also been shown to be involved in circadian rhythms in mammals,45 and the gene was implicated in BD by Glaser et al.46

Differential expression of transcripts may reflect differential exon usage related to alternative splicing, variable transcription start sites or other mechanisms. We do not present an exon-level analysis here, as the available sample size was limited and the analysis tools are still evolving rapidly. We did observe good agreement in differential expression at the gene- and transcript levels, but in some instances the gene- and transcript-level expression appeared to vary in opposite directions. Most of these discordant expression patterns did not replicate between NISC1 and NISC2 and may thus reflect false positives.

Differential splicing in brain is strongly influenced by RNA binding proteins such as Fox1 and Fox2,47 encoded by RBFOX1 (A2BP1) and RBFOX2, respectively. Both proteins are preferentially expressed in brain and muscle. Deletions of RBFOX1 have been found in autism and schizophrenia,48,49 abnormal brain expression of A2BP1 has been reported in autism17 and common variation in RBFOX1 has been implicated in BD.50 We examined the overlap between DE transcripts in this study and predicted targets of Fox1/Fox2 in humans.47,51 A significant minority of the DE transcripts we detected overlap with identified Fox1/Fox2 targets, an overlap not seen with the list of DE genes. This suggests that transcriptome dysregulation in BD differs at the gene- and transcript levels, and that the latter is related to Fox1/Fox2.

At the pathway level, our results not surprisingly point to several functions important in brain development and function. On the other hand, the GWAS enrichment analysis suggested that genes involved in second-messenger signaling (GTPase binding and nucleoside-triphosphatase regulator activity) have a causal role in BD. These results are consistent with the large body of evidence implicating second-messenger systems in BD.52 GTPases are often highly ‘druggable’ targets, so may be particularly promising leads for the development of novel therapeutics.

In contrast to some previous studies, we found a strong dynamic range. We captured some genes and transcripts expressed at ⩽100 reads and some with ⩾30 000 reads, a ~300-fold dynamic range.

The main limitations of this study derive from our necessary reliance on postmortem brain tissue that is subject to agonal and postmortem variables that are difficult to control, along with individual differences in experience, health and medication treatment.54 However, it is very unlikely that the majority of the expression changes we observed can be attributed to such confounding. Many of the gene-level findings were supported in three independent samples and by two different methods, RNA-seq and microarray. Further, we carefully matched BD cases and controls on age and sex, and included only those tissue samples with the highest quality RNA. We also detected enrichment of GWAS signals in several groups of DE genes and transcripts; this is inconsistent with a secondary effect of confounding variables.

We focused only on one brain region, chosen based on prior evidence of involvement of DLPFC in BD.55 Studies of additional brain regions are needed and may point to different genes and additional pathways. We also did not attempt to differentiate among RNA derived from the distinct cell types that comprise DLPFC. As single-cell sequencing technology advances, this will be the next step for RNA-seq studies, enabling even greater precision than was possible here.

Our findings implicate novel genes in BD and may point to novel therapeutic targets. These RNA-seq data may also support additional inferences about novel genes, RNA-editing and allele-specific expression that we did not pursue in this study. Methods for such analyses remain under development, and larger samples than we had available may be needed. We have probably uncovered only a fraction of what these data may reveal about BD. We have submitted our raw data to the Stanley Medical Research Institute and Gene Expression Omnibus (GEO: GSE53239) where they are available upon request for further studies.

ACKNOWLEDGMENTS
This study was supported by the Intramural Research Program of the NIMH (grant no. ZIAMH002810 and K99MH085098). RNA from postmortem brain tissue was donated by The Stanley Medical Research Institute’s brain collection courtesy of Drs Michael B Knable, E Fuller Torrey, Maree J Webster, Serge Weis and Robert H Yolken. Data analysis was done on the high-performance computational Biowulf cluster at NIH.

REFERENCES
1 Kim S, Webster MJ. The Stanley Neuropathology Consortium integrative database: a novel, web-based tool for exploring neuropathological markers in psychiatric disorders and the biological processes associated with abnormalities of those markers. Neuropsychopharmacology 2010; 35: 473–482.
2 Ryan MM, Lockstone HE, Huffaker SJ, Wayland MT, Webster MJ, Bahn S. Gene expression analysis of bipolar disorder reveals downregulation of the ubiquitin cycle and alterations in synaptic genes. Mol Psychiatry 2006; 11: 965–978.
3 Konradi C, Eaton M, MacDonald ML, Walsh J, Benes FM, Heckers S. Molecular evidence for mitochondrial dysfunction in bipolar disorder. Arch Gen Psychiatry 2004; 61: 300–308.
4 Nakatani N, Hattori E, Ohnishi T, Dean B, Iwayama Y, Matsumoto I et al. Genome-wide expression analysis detects eight genes with robust alterations specific to bipolar I disorder: relevance to neuronal network perturbation. Hum Mol Genet 2006; 15: 1949–1962.
5 Elashoff M, Higgs BW, Yolken RH, Knable MB, Weis S, Webster MJ et al. Meta-analysis of 12 genomic studies in bipolar disorder. J Mol Neurosci 2007; 31: 221–243.
6 Wang Z, Gerstein M, Snyder M.
RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009; 10:57–63. correlation between the RNA-seq results and those obtained from fl 7 Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ et al. microarray analysis. This may re ect the higher precision and Transcript assembly and quantification by RNA-Seq reveals unannotated tran- 53 lower technical variability of RNA-seq data. Another advantage scripts and isoform switching during cell differentiation. Nat Biotechnol 2010; 28: of RNA-seq is that it can capture genes expressed over a wide 511–515.

8 Lipska BK, Deep-Soboslay A, Weickert CS, Hyde TM, Martin CE, Herman MM et al. Critical factors in gene expression in postmortem human brain: focus on studies in schizophrenia. Biol Psychiatry 2006; 60: 650–658.
9 Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 2009; 25: 1105–1111.
10 Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 2009; 10: R25.
11 Roberts A, Pachter L. Streaming fragment assignment for real-time analysis of sequencing experiments. Nat Methods 2013; 10: 71–73.
12 Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol 2010; 11: R106.
13 Liptak T. On the combination of independent tests. Magyar Tud. Akad. Mat. Kutato Int. Kozl 1958; 3: 171–197.
14 Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009; 4: 44–57.
15 Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009; 37: 1–13.
16 Wang K, Li M, Bucan M. Pathway-based approaches for analysis of genomewide association studies. Am J Hum Genet 2007; 81: 1278–1283.
17 Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 2011; 474: 380–384.
18 Chen DT, Jiang X, Akula N, Shugart YY, Wendland JR, Steele CJ et al. Genome-wide association study meta-analysis of European and Asian-ancestry samples identifies three novel loci associated with bipolar disorder. Mol Psychiatry 2013; 18: 195–205.
19 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007; 81: 559–575.
20 Veyrieras JB, Kudaravalli S, Kim SY, Dermitzakis ET, Gilad Y, Stephens M et al. High-resolution mapping of expression-QTLs yields insight into human gene regulation. PLoS Genet 2008; 4: e1000214.
21 Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D et al. Circos: an information aesthetic for comparative genomics. Genome Res 2009; 19: 1639–1645.
22 Goldstein I, Lerer E, Laiba E, Mallet J, Mujaheed M, Laurent C et al. Association between sodium- and potassium-activated adenosine triphosphatase alpha isoforms and bipolar disorders. Biol Psychiatry 2009; 65: 985–991.
23 Chetcuti A, Adams LJ, Mitchell PB, Schofield PR. Microarray gene expression profiling of mouse brain mRNA in a model of lithium treatment. Psychiatr Genet 2008; 18: 64–72.
24 Grimes CA, Jope RS. Cholinergic stimulation of early growth response-1 DNA binding activity requires protein kinase C and mitogen-activated protein kinase kinase activation and is inhibited by sodium valproate in SH-SY5Y cells. J Neurochem 1999; 73: 1384–1392.
25 Kim S, Choi KH, Baykiz AF, Gershenfeld HK. Suicide candidate genes associated with bipolar disorder and schizophrenia: an exploratory gene expression profiling analysis of post-mortem prefrontal cortex. BMC Genomics 2007; 8: 413.
26 Matigian N, Windus L, Smith H, Filippich C, Pantelis C, McGrath J et al. Expression profiling in monozygotic twins discordant for bipolar disorder reveals dysregulation of the WNT signalling pathway. Mol Psychiatry 2007; 12: 815–825.
27 Perez-Santiago J, Diez-Alarcia R, Callado LF, Zhang JX, Chana G, White CH et al. A combined analysis of microarray gene expression studies of the human prefrontal cortex identifies genes implicated in schizophrenia. J Psychiatr Res 2012; 46: 1464–1474.
28 LeBlanc M, Kulle B, Sundet K, Agartz I, Melle I, Djurovic S et al. Genome-wide study identifies PTPRO and WDR72 and FOXQ1-SUMO1P1 interaction associated with neurocognitive function. J Psychiatr Res 2012; 46: 271–278.
29 Soria V, Martinez-Amoros E, Escaramis G, Valero J, Perez-Egea R, Garcia C et al. Differential association of circadian genes with mood disorders: CRY1 and NPAS2 are associated with unipolar major depression and CLOCK and VIP with bipolar disorder. Neuropsychopharmacology 2010; 35: 1279–1289.
30 Vacic V, McCarthy S, Malhotra D, Murray F, Chou HH, Peoples A et al. Duplications of the neuropeptide receptor gene VIPR2 confer significant risk for schizophrenia. Nature 2011; 471: 499–503.
31 Florek M, Haase M, Marzesco AM, Freund D, Ehninger G, Huttner WB et al. Prominin-1/CD133, a neural and hematopoietic stem cell marker, is expressed in adult human differentiated cells and certain types of kidney cancer. Cell Tissue Res 2005; 319: 15–26.
32 Islam MO, Kanemura Y, Tajria J, Mori H, Kobayashi S, Hara M et al. Functional expression of ABCG2 transporter in human neural stem/progenitor cells. Neurosci Res 2005; 52: 75–82.
33 Coles EG, Lawlor ER, Bronner-Fraser M. EWS-FLI1 causes neuroepithelial defects and abrogates emigration of neural crest stem cells. Stem Cells 2008; 26: 2237–2244.
34 Coon SL, Munson PJ, Cherukuri PF, Sugden D, Rath MF, Moller M et al. Circadian changes in long noncoding RNAs in the pineal gland. Proc Natl Acad Sci USA 2012; 109: 13319–13324.
35 Millar JK, James R, Brandon NJ, Thomson PA. DISC1 and DISC2: discovering and dissecting molecular mechanisms underlying psychiatric illness. Ann Med 2004; 36: 367–378.
36 Williams JM, Beck TF, Pearson DM, Proud MB, Cheung SW, Scott DA. A 1q42 deletion involving DISC1, DISC2, and TSNAX in an autism spectrum disorder. Am J Med Genet A 2009; 149A: 1758–1762.
37 Qureshi IA, Mattick JS, Mehler MF. Long non-coding RNAs in nervous system function and disease. Brain Res 2010; 1338: 20–35.
38 Lin M, Pedrosa E, Shah A, Hrabovsky A, Maqbool S, Zheng D et al. RNA-Seq of human neurons derived from iPS cells reveals candidate long non-coding RNAs involved in neurogenesis and neuropsychiatric disorders. PLoS ONE 2011; 6: e23356.
39 Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol 2011; 21: 354–361.
40 Wang H, Liu Y, Briesemann M, Yan J. Computational analysis of gene regulation in animal sleep deprivation. Physiol Genomics 2010; 42: 427–436.
41 Yang S, Wang K, Valladares O, Hannenhalli S, Bucan M. Genome-wide expression profiling and bioinformatics analysis of diurnally regulated genes in the mouse prefrontal cortex. Genome Biol 2007; 8: R247.
42 Glatt SJ, Everall IP, Kremen WS, Corbeil J, Sasik R, Khanlou N et al. Comparative gene expression analysis of blood and brain provides concurrent validation of SELENBP1 up-regulation in schizophrenia. Proc Natl Acad Sci USA 2005; 102: 15533–15538.
43 Lips ES, Cornelisse LN, Toonen RF, Min JL, Hultman CM, Holmans PA et al. Functional gene group analysis identifies synaptic gene groups as risk factor for schizophrenia. Mol Psychiatry 2012; 17: 996–1006.
44 McGlincy NJ, Valomon A, Chesham JE, Maywood ES, Hastings MH, Ule J. Regulation of alternative splicing by the circadian clock and food related cues. Genome Biol 2012; 13: R54.
45 Araki R, Takahashi H, Fukumura R, Sun F, Umeda N, Sujino M et al. Restricted expression and photic induction of a novel mouse regulatory factor X4 transcript in the suprachiasmatic nucleus. J Biol Chem 2004; 279: 10237–10242.
46 Glaser B, Kirov G, Bray NJ, Green E, O'Donovan MC, Craddock N et al. Identification of a potential bipolar risk haplotype in the gene encoding the winged-helix transcription factor RFX4. Mol Psychiatry 2005; 10: 920–927.
47 Fogel BL, Wexler E, Wahnich A, Friedrich T, Vijayendran C, Gao F et al. RBFOX1 regulates both splicing and transcriptional networks in human neuronal development. Hum Mol Genet 2012; 21: 4171–4186.
48 Szatmari P, Paterson AD, Zwaigenbaum L, Roberts W, Brian J, Liu XQ et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nat Genet 2007; 39: 319–328.
49 Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 2008; 320: 539–543.
50 Baum AE, Akula N, Cabanero M, Cardona I, Corona W, Klemens B et al. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Mol Psychiatry 2008; 13: 197–207.
51 Zhang C, Zhang Z, Castle J, Sun S, Johnson J, Krainer AR et al. Defining the regulatory network of the tissue-specific splicing factors Fox-1 and Fox-2. Genes Dev 2008; 22: 2550–2563.
52 Niciu MJ, Ionescu DF, Mathews DC, Richards EM, Zarate CA. Second messenger/signal transduction pathways in major mood disorders: moving from membrane to mechanism of action, part II: bipolar disorder. CNS Spectr 2013; 11: 1–10.
53 Oshlack A, Robinson MD, Young MD. From RNA-seq reads to differential expression results. Genome Biol 2010; 11: 220.
54 McCullumsmith RE, Meador-Woodruff JH. Novel approaches to the study of postmortem brain in psychiatric illness: old limitations and new challenges. Biol Psychiatry 2011; 69: 127–133.
55 Price JL, Drevets WC. Neurocircuitry of mood disorders. Neuropsychopharmacology 2010; 35: 192–216.

Supplementary Information accompanies the paper on the Molecular Psychiatry website (http://www.nature.com/mp)
