Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

1 TITLE: An Atlas of Chromatin Accessibility in the Adult Brain 2 Alternative: Multiregional ATAC-Seq Map of the Human Brain 3 4 John F. Fullard1,2,3,12, Mads E. Hauberg1,2,7,8,9,12, Jaroslav Bendl1,2,3, Gabor Egervari1,2,4, Maria- 5 Daniela Cirnaru5, Sarah M. Reach3, Jan Motl10, Michelle E. Ehrlich3,5,6, Yasmin L. Hurd1,2,4, 6 Panos Roussos1,2,3,11 7 8 9 1Department of Psychiatry, 2Friedman Brain Institute, 3Department of Genetics and Genomic 10 Science and Institute for Multiscale Biology, 4Department of Neuroscience, 5Department of 11 Neurology, 6Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 12 10029, USA 13 7iPSYCH, The Lundbeck Foundation Initiative for Integrative Psychiatric Research, Denmark 14 8Department of Biomedicine, 9Centre for Integrative Sequencing (iSEQ), Aarhus University, 15 Aarhus, Denmark 16 10Department of Theoretical Computer Science, Faculty of Information Technology, Czech 17 Technical University in Prague, Prague, Czech Republic 18 11Mental Illness Research, Education, and Clinical Center (VISN 2 South), James J. Peters VA 19 Medical Center, Bronx, NY, USA 20 21 12These authors contributed equally to this study 22 23 Corresponding author: 24 Dr. Panos Roussos 25 Icahn School of Medicine at Mount Sinai 26 Department of Psychiatry, Friedman Brain Institute, Department of Genetics and Genomic 27 Science and Institute for Multiscale Biology 28 1470 Madison Ave, Fl. 9, Rm 107 29 New York, NY, 10029, USA 30 [email protected] 31 32 33 Keywords: open chromatin, ATAC-seq, human postmortem brain, , 34 PPP1R1B, lncRNA 35

1 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

36 ABSTRACT 37 38 The majority of common genetic risk variants associated with neuropsychiatric disease are non- 39 coding and are thought to exert their effects by disrupting the function of cis regulatory elements 40 (CREs), including promoters and enhancers. Within each cell, chromatin is arranged in specific 41 patterns to expose the repertoire of CREs required for optimal spatiotemporal regulation of gene 42 expression. To further our understanding of the complex mechanisms that modulate 43 transcription in the brain, we utilized frozen postmortem samples to generate the largest human 44 brain and cell type-specific open chromatin dataset to date. Using the Assay for Transposase 45 Accessible Chromatin followed by sequencing (ATAC-seq), we created maps of chromatin 46 accessibility in 2 cell types (neurons and non-neurons) across 14 distinct brain regions of 5 47 individuals. Chromatin structure varies markedly by cell type, with neuronal chromatin displaying 48 higher regional variability than that of non-neurons. Among our findings is an open chromatin 49 region (OCR) specific to neurons of the striatum. When placed in the mouse, a human 50 sequence derived from this OCR recapitulates the cell-type and regional expression pattern 51 predicted by our ATAC-seq experiments. Furthermore, differentially accessible chromatin 52 overlaps with the genetic architecture of neuropsychiatric traits and identifies differences in 53 molecular pathways and biological functions. By leveraging transcription factor binding analysis, 54 we identify protein coding and long noncoding RNAs (lncRNAs) with cell-type and brain region 55 specificity. Our data provides a valuable resource to the research community and we provide 56 this human brain chromatin accessibility atlas as an online database “Brain Open Chromatin 57 Atlas (BOCA)” to facilitate interpretation. 58 59 60 61 62 63 64 65 66 67 68

2 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

69 INTRODUCTION 70 71 Within the human brain, combinational binding of transcription factors at chromatin accessible 72 cis regulatory elements (CREs), such as promoters and enhancers, orchestrates gene 73 expression in different cell types and brain regions. Understanding the role of CREs in the 74 human brain is of great interest, as the majority of common genetic risk variants associated with 75 neuropsychiatric disease affects transcriptional regulatory mechanisms as opposed to protein 76 structure and function (Maurano et al. 2012; Gusev et al. 2014; Roussos et al. 2014; Roadmap 77 Epigenomics Consortium 2015; Fullard et al. 2017). Previous efforts to map CREs in human 78 brain were limited either by their use of homogenate tissue, consisting of a mixture of markedly 79 different cell types (Maurano et al. 2012; Andersson et al. 2014; Roadmap Epigenomics 80 Consortium 2015) or focused on a single cortical region (Fullard et al. 2017). 81 82 In order to further our understanding of the role of CREs in human brain function, we sought to 83 generate a comprehensive map of open chromatin regions (OCRs). We applied ATAC-seq to 84 post-mortem nuclei extracted from two broad cell types (neuronal (NeuN+) and non-neuronal 85 (NeuN-)), isolated from 14 discrete brain regions of 5 adult individuals. Chromatin accessibility 86 varies enormously between neurons and non-neurons and both show enrichment with known 87 cell type markers. While the pattern of open chromatin in non-neurons is largely invariable, 88 significant variability in neuronal chromatin structure is observed across different brain regions 89 with the most extensive differences seen between neurons of the cortical regions, hippocampus, 90 thalamus, and striatum. We identify numerous cell-type and region-specific OCRs. Transcription 91 factor (TF) footprinting analysis infers cell-type differences in the regulation of 92 and identifies protein coding and long noncoding RNAs (lncRNA) with cell type and brain region 93 specificity. Moreover, cell- and region-specific differentially accessible OCRs are enriched for 94 genetic variants associated with neuropsychiatric traits. 95 96 Overall, our findings emphasize the importance of conducting cell-type and region-specific 97 epigenetic studies to elucidate regulatory and disease-associated mechanisms in the human 98 brain. Our data provides a valuable resource to the research community and we provide our raw 99 data, as well as genome browser tracks, to facilitate further studies of gene regulation in the 100 human brain. 101 102 103 104

3 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

105 RESULTS 106 107 Maps of chromatin accessibility in neuronal and non-neuronal nuclei across 14 brain 108 regions 109 110 To map chromatin accessibility in neuronal and non-neuronal nuclei across 14 brain regions 111 (Fig. 1 and Supplemental Fig. S1), we combined fluorescence activated nuclear sorting 112 (FANS) followed by ATAC-seq on 122 nuclear preparations obtained from 14 brain regions of 5 113 control subjects (Supplemental Table S1). We processed these data bioinformatically 114 (Supplemental Fig. S2) and multiple metrics, including genotype concordance, gender, and 115 evaluation of cell types did not indicate sample mislabeling or contamination (Supplemental 116 Fig. S3). Quality control (QC) metrics (confirmed by visual inspection of the mapped reads) led 117 to the exclusion of 7 libraries, leaving a final total of 115 libraries (Supplemental Table S2; 118 Supplemental Fig. S3A). Overall, we obtained 4.3 billion (average of 37.8 million) uniquely 119 mapped reads after removing duplicate reads (mean 24%) and those aligning to the 120 mitochondrial genome (mean 1%) (Supplemental Table S3). Samples within the same brain 121 region and cell type were very strongly correlated (Pearson correlation, r = 0.913), indicating 122 high reproducibility among the samples (Supplemental Fig. S4). In order to assess the quality 123 of our data, we compared it to 5 publically available data sets generated using more optimal 124 starting material, such as fresh tissue and cell lines (Qu et al. 2015; Corces et al. 2016; 125 Novakovic et al. 2016; Corces et al. 2017; Banovich et al. 2018) (Supplemental Fig. S5). Our 126 data compared favorably and showed the lowest fraction of mitochondrial reads and the highest 127 amount of uniquely mapped, non-duplicated, paired-end reads. 128 129 We detected an average of 73,350 and 42,942 OCRs for neuronal and non-neuronal libraries, 130 accounting for 1.05 % and 0.709% of the genome, respectively (Supplemental Fig. S6A). 131 Analysis of known neuronal and non-neuronal specific genes indicate that our data identifies 132 OCRs in a cell type specific manner (Fig. 2A-B). The neuronal OCRs were more distal to 133 transcription start sites (TSSs) compared to non-neuronal OCRs (Fig. 2C, Supplemental Fig. 134 S6B). Further, there was a high overlap of OCRs within the neuronal and non-neuronal samples 135 across the different brain regions, with more than 56.6% and 67.7% of OCRs found in two or 136 more neuronal and non-neuronal samples, respectively. In general, promoter OCRs and non- 137 neuronal OCRs were more frequently identified in multiple samples (Fig. 2D, Supplemental 138 Fig. S7). Jointly, these findings suggest higher regional variability of OCRs and more distal 139 regulation of genes in neurons compared to non-neurons. 140

4 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

141 Cell type and regional differences in chromatin accessibility 142 143 To quantitatively analyze differences among cell types and brain regions, we generated a 144 consensus set of 300,444 OCRs by taking the union of peaks called in the individual cells/brain 145 regions (Methods). We next quantified how many reads overlapped each. Covariate analyses 146 (Methods) reveal that, besides cell type and brain region, fraction of reads within peaks (FRiP) 147 explains a large proportion of variation in our data (Supplemental Fig. S8). After covariate 148 correction, all variables besides cell type and brain region, explained < 1% of variance. t-SNE 149 based clustering using the adjusted read counts clearly separated neuronal from non-neuronal 150 samples (Fig. 2E). In addition, we also observed a more modest separation among neuronal 151 samples into neo- and sub-cortical regions (hippocampus, striatum and thalamus), indicating 152 that regional differences are more prominent in neurons. 153 154 To further assess differences in cell-type and/or brain-region, we performed pair-wise 155 comparisons among all samples and quantified the level of statistical significance based on the 156 proportion of true tests, pi1. The pi1 (which equals to 1-pi0), is an estimate of the fraction of 157 OCRs that are differentially accessible between two groups; “1” corresponds to all OCRs 158 estimated to have differential accessibility whereas “p0” corresponds to none of the OCRs 159 having differential accessibility. This yielded results comparable to the t-SNE based clustering: 160 among the pairwise comparisons, those between neuronal and non-neuronal cells (inter-cell 161 type comparison) showed a large pi1 (median = 0.59, standard deviation [SD] = 0.10). For the 162 intra-cell type comparisons, there was, on average, a higher pi1 among pairs of neuronal 163 samples than pairs of non-neuronal samples (median = 0.27, SD = 0.19 vs. median = 0.064, SD 164 = 0.10) (Fig. 2F). Furthermore, multidimensional scaling of the samples, based on the pi1 165 estimates as the distance metric, showed a clear distinction between neurons and non-neurons 166 in the first dimension (Fig. 2G). Here, the neuronal samples displayed distinct clustering among 167 different regions of the brain in the second dimension. Within the neocortex, the primary visual 168 cortex has the most unique profile. The hippocampus and amygdala clusters showed a more 169 similar profile with the neocortical regions when compared to the mediodorsal thalamus and 170 striatum (putamen and nucleus accumbens). Convincingly, these findings are in agreement with 171 those identified by gene expression analysis of homogenate tissue (Kang et al. 2011) and 172 suggest a significant neuronal contribution to the regional variability described in the previous 173 study (Kang et al. 2011). 174 175 To define cell-type and brain region specific OCRs, we next performed differential chromatin 176 accessibility analysis. For the brain region analysis, we only considered neuronal samples, due

5 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

177 to the comparably minor variance seen in non-neuronal samples. The cell type analysis 178 identified 221,957 neuronal and 46,299 non-neuronal differential OCRs at false discovery rate 179 (FDR) of 5% (Supplemental Figs. S9 and S10A; Supplemental Table S4). Regional analysis 180 identified neuronal OCRs specific to neocortex (61,410), primary visual cortex (22,248), 181 hippocampus (11,535), mediodorsal thalamus (42,560), and striatum (97,707) at FDR 5% 182 (Supplemental Figs. S9 and S10B; Supplemental Table S4). Due to the complementary 183 nature of the two approaches, in the following sections we used these cell- (neuronal and non- 184 neuronal) and region- (neocortex, primary visual cortex, hippocampus, striatum, and 185 mediodorsal thalamus) specific OCRs in parallel with all OCRs identified in each brain region 186 and cell type. 187 188 189 Overlap with existing epigenomic annotations, cell/region specific genes, and biological 190 processes 191 192 We compared the OCRs with existing epigenetic data from the Roadmap Epigenomics Mapping 193 Consortium (REMC) (Ernst and Kellis 2015; Roadmap Epigenomics Consortium 2015), 194 considering both DNase-seq (Supplemental Fig. S11) and chromatin states (Supplemental 195 Fig. S12) from homogenate brain tissue, brain derived cells and non-brain tissues (referred as 196 “Other”). Overall, we identified a higher overlap between our OCRs and REMC brain related 197 DNase-seq and active chromatin states. Notably, we saw a comparatively higher overlap with 198 non-neuronal specific OCRs than those of neurons (Fig. 3A). This may be an indication that 199 many neuron-specific regulatory elements are not captured when studying homogenate tissue 200 due to an abundance of non-neurons relative to neurons. 201 202 To examine the overlap with cell- and region-specific genes, as well as genes involved in 203 various biological processes, we next employed the approach from GREAT (McLean et al. 204 2010) (Methods). Using cell-type specific genes (Zhang et al. 2014; Zeisel et al. 2015), we 205 identified an overlap between neuronal OCRs and genes of pyramidal cells and interneurons, 206 whereas non-neuronal OCRs overlapped with oligodendrocyte and astrocyte specific genes 207 (Supplemental Fig. S13). 208 209 Similarly, we explored the overlap with genes displaying region-specific expression profiles 210 (Hawrylycz et al. 2012) (Supplemental Fig. S14) and show that region-specific OCRs 211 overlapped predominantly with genes expressed in the same brain region. While this analysis 212 describes high order enrichment of OCRs with region specific genes, regional and cell type

6 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

213 specificity of chromatin accessibility is readily visualized at the gene level. As representative 214 examples, we consider genes with preferential expression in cortical regions (SATB2, GJD4, 215 STX1A and CALHM1), mediodorsal thalamus (CHRNA2 and PLCD4) and striatum (DRD2, 216 ADORA2A and RGS9) (Supplemental Fig. S15). 217 218 Finally, we agnostically examined the overlap with biological processes and pathways (Fig. 3B, 219 Supplemental Fig. S16). In this analysis, neuron specific OCRs overlapped ion channels and a 220 range of brain related functions, whereas non-neuronal OCRs overlapped with terms relating, 221 among others, to the NOTCH pathway, gliogenesis, and ensheathment of neurons. 222 223 224 Overlap of open chromatin with neuropsychiatric traits 225 226 We used an LD-score partitioned heritability approach (Finucane et al. 2015) to assess the 227 overlap of OCRs with genetic variants associated with 15 neuropsychiatric and unrelated traits. 228 We found significant enrichment only for neuropsychiatric traits (Fig. 4A, Supplemental Fig. 229 S17; Supplemental Table S5). For the cell and region-specific OCRs, for example, neuronal- 230 and striatal-specific OCRs were enriched for schizophrenia associated variants, whereas 231 neocortical- and striatal-specific OCRs were enriched for variants correlated with educational 232 attainment. Further exploration of OCRs identified in each brain region and cell type showed 233 that neuronal OCRs in hippocampus, nucleus accumbens and superior temporal cortex provide 234 the most significant enrichment with schizophrenia risk variants (Fig. 4B). These finding are in 235 agreement with a recent study highlighting striatal medium spiny neurons and hippocampal C1A 236 pyramidal neurons in schizophrenia (Skene et al. 2018) and DRD2, an antipsychotic drug target, 237 being highly expressed in medium spiny neurons (Schizophrenia Working Group of the 238 Psychiatric Genomics Consortium 2014; Skene et al. 2018). By applying the LD-score 239 partitioned heritability approach to OCRs from homogenate brain or other tissues, we observed 240 the strongest enrichment of schizophrenia risk variants with neuronal ATAC-seq and 241 homogenate fetal brain OCRs (Supplemental Fig. S18; Supplemental Table S5), which is 242 consistent with the neurodevelopmental hypothesis of schizophrenia (Rapoport et al. 2012). 243 244 245 Classifying brain sample epigenomes using machine learning 246 247 We applied a support vector machine approach to identify OCR signatures that predict cell type 248 and brain region in ATAC-seq samples of unknown origin (Supplemental Table S6). For 249 accurate classification of cell type (neuron vs. non-neuron) alone, cell type and

7 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

250 cortical/subcortical regions, and cell type and five different brain regions defined based on the 251 differential chromatin accessibility analysis, signatures of 3, 115 and 252 OCRs were needed, 252 respectively (Supplemental Figs. S19A-D and S20; Supplemental Tables S7 and S8). To 253 corroborate our finding, we tested our models on independent ATAC-seq data sets (Egervari et 254 al. 2017; Fullard et al. 2017). Here, the cell type classifier (3 OCR signature) achieved perfect 255 accuracy in distinguishing neuronal and non-neuronal samples (Supplemental Fig. S19E; 256 Supplemental Table S7). The cell type and cortical/subcortical classifier (115 OCR signature) 257 attained an overall accuracy of 90% (Supplemental Fig. S19F; Supplemental Table S7). 258 However, the classification of non-neuronal samples into cortical and subcortical groups 259 seemed more challenging, yielding an accuracy of 86% versus an accuracy of 96% obtained for 260 neurons. This difference was also evident using the cell type and five brain region model (252 261 OCR signature). While overall accuracy using the validation data set attained 85% 262 (Supplemental Fig. S19F; Supplemental Table S7), neuronal and non-neuronal subgroups 263 were classified with accuracies of 92% and 79%, respectively. The difficulties in classifying the 264 brain region of non-neuronal samples mirror the previously observed small inter-region 265 differences in MDS clustering and Pi1 estimates, and provide evidence for lesser regional 266 variability among non-neuronal cells. 267 268 Regulatory effects of transcription factor binding analysis on gene expression 269 270 To explore gene regulation in the brain, we performed footprinting analysis using PIQ 271 (Sherwood et al. 2014) to infer transcription factor (TF) binding within the OCRs for cell-type and 272 region, independently. This approach utilized 431 TF motifs representing 807 TFs aggregated 273 from a meta-database (Weirauch et al. 2014) (Methods). We estimated a regulatory score for 274 the impact of each TF on gene expression by weighing each TF binding site by the probability of 275 that site being bound and the distance to the TSS. We found the overall regulatory score of a 276 gene (sum of regulatory scores across all TFs for a given gene) to correlate markedly with gene 277 expression (range of Sperman’s rho: 0.318 - 0.523) (Supplemental Fig. S21), which is greater 278 than the null (estimated based on permutation analysis: mean = -1.1 Ɛ 10-4, 95% CI range = - 279 2 Ɛ10-4; -1.5 Ɛ10-5). The correlation is higher for brain-derived expression compared to whole 280 blood and this difference is more prominent in neuronal ATAC-seq libraries (Supplemental Fig. 281 S21A). 282 283 We next explored cell type (neuronal and non-neuronal) and brain region (cortical and 284 subcortical) regulatory differences among samples at the gene level (protein coding, lncRNA 285 and miRNA) by using a regulatory divergence score. This score takes into account both the

8 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

286 difference in regulatory burden between samples and the regulatory divergence (Qu et al. 2015) 287 (defined as one minus the correlation of the gene regulation) (Methods). 288 289 For protein coding genes, this approach highlighted, among others, HOOK1 and KCNB2 for 290 neuronal cells and SOX8 and HES1 for non-neuronal cells (Fig. 5A). The top 500 most 291 neuronal and top 500 most non-neuronal genes were enriched in biologically relevant pathways. 292 Similar analysis identified multiple protein coding genes, including PPP1R1B, DRD2, and 293 CACNG4 for subcortical neurons, and NRN1 and SERTM1 for cortical neurons (Fig. 5A). 294 PPP1R1B had the twelfth highest subcortical regulatory divergence score. OCRs in the 295 promoter and upstream enhancer region of PPP1R1B are only present in neurons of the 296 striatum (Fig. 5B). PPP1R1B (also known as DARPP-32; dopamine and cyclic AMP-regulated 297 phosphoprotein) shows high expression in the dopaminoceptive medium spiny projection 298 neurons (MSNs) of the striatum. Although frequently used as a marker of MSNs, PPP1R1B is 299 actually widely active throughout the forebrain, and in the Purkinje cells of the 300 (Ouimet et al. 1984; Brene et al. 1994). To validate the function of the PPP1R1B regulatory 301 elements identified by ATAC-seq, we constructed a vector extending from 4.5kb upstream of the 302 TSS through the 5’ end of Exon 2 (Supplemental Fig. S22) engineered to express EGFP 303 downstream of an Internal Ribosomal Entry Site (IRES). Pronuclear injection of this transgenic 304 vector into mice yielded 9 transgene-positive animals. Histological examination of the brain at 305 two months of age showed that 7/9 expressed EGFP in the majority of dorsal and ventral MSNs, 306 and in the piriform cortex, a site of endogenous PPP1R1B (DARPP-32) expression (Fig. 5C-G). 307 None showed expression outside of these regions, including in other regions with endogenous 308 PPP1R1B expression. 309 310 We performed similar regulatory divergence analysis using a recent lncRNA gene assembly 311 (Hon et al. 2017) and identified potential differentially regulated lncRNAs with cell type (neuronal 312 and non-neuronal) and brain region (cortical and subcortical) specificity (Fig. 6A). We applied 313 the same data used to create the aforementioned assembly and confirmed the cell type and 314 brain region specificity of identified lncRNAs in closely related tissues. For example, non- 315 neuronal and subcortical lncRNAs are more abundant in expression profiles derived from white 316 matter (Neuron Projection Bundle in Fig. 6A) and striatum, respectively. Furthermore, cell type 317 and regional specificity was validated by qPCR gene expression studies for 2 lncRNAs from 318 each group (neuronal, non-neuronal, cortical and subcortical) (Fig. 6B; Supplemental Table 319 S9). Finally, we examined the regulation of microRNA genes in a similar manner 320 (Supplemental Fig. S23). This analysis identified a number of differential miRNAs, including

9 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

321 mir-124-1 for neurons, let-7a-3 for non-neurons, mir-148a for subcortical neurons and mir-3139 322 for cortical neurons. 323 324 325 326 Transcription factors underlying cell and regional differences 327 328 To infer TFs that underlie the regulatory differences between cell types and brain regions, we 329 calculated the fold-change enrichment in the corresponding peaks compared to the background 330 of all peaks (Fig. 7, Supplemental Fig. S24). As TFs within a given TF family share binding 331 motifs (Weirauch et al. 2014), it is difficult to determine those family members that are 332 biologically relevant in a given context. We note, however, that a number of studies support our 333 findings: Basic Helix-loop helix (bHLH) TFs in neurons (Lee 1997); RFX1 in neurons and the 334 hippocampus (Ma et al. 2006) and the RORA/RORB nuclear receptor TFs in the dorsal 335 thalamus (Ino 2004). In addition, a recent study has shown that neocortical expression of the 336 bHLH TFs, TWIST1 and TWIST2 may be unique to primates and that both genes have human- 337 specific expressions in the neocortex compared to macaque and chimpanzee (Sousa et al. 338 2017). Together with our finding of enriched exposure of TWIST1 and TWIST2 binding sites in 339 the human neocortex, this implicates these sites in the regulation of primate and human-specific 340 neocortical genes. 341 342 343 DISCUSSION 344 345 The generation of a cell type- and brain region-specific atlas of open chromatin enabled 346 exploration of gene regulation in the adult human brain with previously unattained detail. 347 Differential accessibility analyses and machine learning inferred cell type- and brain region- 348 specific signatures of open chromatin. Compared to non-neuronal populations, open chromatin 349 regions in neurons were found to be more extensive, to be more distal to TSS, to show a 350 smaller overlap with previously reported OCRs from bulk brain tissue, to show greater regional 351 variability and to show significant enrichment in generic risk variants of various neuropsychiatric 352 traits. Enrichment analysis highlighted an overlap of open chromatin with previously reported 353 genes showing cell type and region specific expression and further implicated cell and region- 354 specific molecular pathways. 355 356 We utilized the open chromatin patterns to infer transcription factor binding and to impute 357 downstream gene regulation and expression. Despite limitations in predicting transcription factor

10 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

358 binding, and ambiguity in subsequently linking its OCR to the gene(s) it regulates (Sherwood et 359 al. 2014; Dixon et al. 2015; Maurano et al. 2015), we found a convincing correlation with gene 360 expression studies. Using this regulatory analysis, we predicted cell and region specific protein 361 coding genes, lncRNAs, and microRNAs. As an example, we identified, and functionally 362 validated, regulatory elements of the striatal, neuronal gene PPP1R1B. In addition, we predicted 363 and experimentally validated cell type- and regional patterns of lncRNA expression. Finally, we 364 identified cell and region specific TFs based on the enrichment of their cognate binding motifs, 365 which overlap with previous literature. We acknowledge, however, that footprinting analysis 366 based on ATAC-seq data is limited due to the widespread sharing of recognition motifs between 367 TFs (Weirauch et al. 2014) and future studies using other approaches such as ChIP-seq for 368 specific TFs can complement our observations. 369 370 The most distinct brain region based on the neuronal OCRs was the striatum (putamen and 371 nucleus accumbens). An explanation for this could be that, in contrast to the other assayed 372 brain regions, the majority of neurons here are GABAergic medium spiny neurons (Kemp and 373 Powell 1971). All experiments were performed on nuclei extracted from frozen postmortem brain 374 specimens. Following thawing of the samples, the cell membrane is lost and, with it, many cell- 375 type specific antigens that would facilitate separation of different cell types by FANS. Future 376 studies targeting additional neuronal subtypes using single cell approaches or by cytometric 377 separation into secondary cell subtypes could further elucidate gene regulation across the brain. 378 379 In conclusion, our findings indicate the utility of our open chromatin atlas in studying the 380 regulation of gene expression in the brain and the impact of neuropsychiatric disease risk 381 variants. We provide to the research community an atlas of chromatin accessibility in human 382 brain as an online database “Brain Open Chromatin Atlas (BOCA)” to facilitate interpretation 383 and future studies. 384 385 386 METHODS 387 388 Brain tissue specimens from 14 brain regions of 5 controls with no history of psychiatric disorder 389 and drug use were processed using a FACSAria flow cytometer to neuronal (NeuN+) and non- 390 neuronal (NeuN-) nuclei. The Assay for Transposase Accessible Chromatin followed by 391 sequencing (ATAC-seq) was performed using an established protocol (Buenrostro et al. 2013) 392 and sequenced on HiSeq 2500 (Illumina) obtaining 2Ɛ50 paired-end reads. Reads from each 393 sample were aligned on hg19 (GRCh37) reference genome using the STAR aligner (Dobin et al.

11 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

394 2013) (v2.5.0). We excluded reads that: [1] mapped to more than one locus using SAMtools (Li 395 et al. 2009); [2] were duplicated using PICARD (v2.2.4); and [3] mapped to the mitochondrial 396 genome. We merged the BAM-files of samples from the same brain region and cell type and 397 subsampled to a uniform depth. We subsequently called peaks using the model-based Analysis 398 of ChIP-seq (MACS) (Zhang et al. 2008) v2.1, and created a joint set of peaks requiring each 399 peak to be called in at least one of the merged BAM-files. After removing peaks overlapping the 400 blacklisted genomic regions, 300,444 peaks remained. We subsequently quantified read counts 401 of all the individual non-merged samples within these peaks using the featureCounts function in 402 RSubread (Liao et al. 2014) (v.1.15.0). 403 404 We used the voomWithQualityWeights function from the limma package (Liu et al. 2015) to 405 model the normalized read counts including fraction of reads within peaks as covariates. We 406 performed differential chromatin accessibility analysis by fitting weighted least-squares linear 407 regression models for the effect of cell type (neuronal and non-neuronal) and/or brain region. P- 408 values were adjusted for multiple hypothesis testing using false discovery rate (FDR) ≤ 5%. The 409 protein interaction quantitation PIQ framework (Sherwood et al. 2014) was used to predict 410 transcription factor binding sites from the genome sequence. To integrate functional annotations 411 and GWAS results, we used the LD-score partitioned heritability (Finucane et al. 2015) 412 approach. More details are described in the Supplemental Material. 413 414 415 DATA ACCESS 416 417 The data discussed in this publication have been submitted to the NCBI Gene Expression 418 Omnibus (GEO; https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE96949) under 419 accession number GSE96949. We further provide the online database “Brain Open Chromatin 420 Atlas (BOCA)” as UCSC tracks and download links at our webpage 421 (http://icahn.mssm.edu/boca). 422 423 424 ACKNOWLEDGEMENTS 425 426 Data on coronary artery disease/myocardial infarction have been contributed by 427 CARDIoGRAMplusC4D investigators. We additionally thank the International Genomics of 428 Alzheimer's Project (IGAP) for providing GWAS summary results. The investigators within IGAP 429 contributed to the design and implementation of IGAP and/or provided data but did not

12 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

430 participate in analysis or writing of this report. IGAP was made possible by the generous 431 participation of the control subjects, the patients, and their families. The i–Select chips was 432 funded by the French National Foundation on Alzheimer's disease and related disorders. EADI 433 was supported by the LABEX (laboratory of excellence program investment for the future) 434 DISTALZ grant, Inserm, Institut Pasteur de Lille, Université de Lille 2 and the Lille University 435 Hospital. GERAD was supported by the Medical Research Council (Grant n° 503480), 436 Alzheimer's Research UK (Grant n° 503176), the Wellcome Trust (Grant n° 082604/2/07/Z) and 437 German Federal Ministry of Education and Research (BMBF): Competence Network Dementia 438 (CND) grant n° 01GI0102, 01GI0711, 01GI0420. CHARGE was partly supported by the NIH/NIA 439 grant R01 AG033193 and the NIA AG081220 and AGES contract N01–AG–12100, the NHLBI 440 grant R01 HL105756, the Icelandic Heart Association, and the Erasmus Medical Center and 441 Erasmus University. ADGC was supported by the NIH/NIA grants: U01 AG032984, U24 442 AG021886, U01 AG016976, and the Alzheimer's Association grant ADGC–10–196728. 443 Next generation sequencing was performed at the New York Genome Center. FANS was 444 performed at the Mount Sinai Flow Cytometry CoRE and mouse transgenics performed at the 445 Mount Sinai Mouse Genetics CoRE facility. 446 447 This work was supported by the National Institutes of Health (R01AG050986 Roussos, 448 R01MH109677 Roussos and R01NS100529 Ehrlich), Brain Behavior Research Foundation 449 (20540 Roussos), Alzheimer's Association (NIRG-340998 Roussos) New York State Stem Cell 450 Science (N13G-169 Ehrlich) and the Veterans Affairs (Merit grant BX002395 Roussos). This 451 study was additionally funded by The Lundbeck Foundation, Denmark (Grant number R102- 452 A9118). Further, this work was supported in part through the computational resources and staff 453 expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. 454 The funders had no role in the design and conduct of the study; collection, management, 455 analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and 456 decision to submit the manuscript for publication. 457 458 Author contributions 459 460 J.F.F., M.E.H., M.E.E., Y.H. and P.R. contributed to experimental and study design and 461 planning analytical strategies. G.E. and Y.H. dissected and provided brain specimens. J.F.F. 462 conducted FANS, ATAC-seq and generated sequencing libraries. J.F.F. and S.M.R. performed 463 the qPCR experiments. M.D.C. and M.E.E. designed and conducted the PPP1R1B 464 experiments. M.E.H. carried out primary data analyses. J.B. and J.M. carried out machine 465 learning analysis. M.E.H. and P.R. performed the transcription factor, clustering, and differential

13 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

466 accessibility analyses. P.R. conducted the qPCR analysis. J.F.F., M.E.H., J.B., M.E.E., Y.H. and 467 P.R. wrote the manuscript. All authors contributed to data interpretation and approved the final 468 version of the text. 469 470 471 DISCLOSURE DECLARATION 472 The authors declare no competing financial interests. 473 474 475 476 REFERENCES 477 478 Andersson R, Gebhard C, Miguel-Escalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, 479 Schmidl C, Suzuki T. 2014. An atlas of active enhancers across human cell types and 480 tissues. Nature 507: 455-461. 481 Banovich NE, Li YI, Raj A, Ward MC, Greenside P, Calderon D, Tung PY, Burnett JE, Myrthil M, 482 Thomas SM et al. 2018. Impact of regulatory variation across human iPSCs and 483 differentiated cells. Genome Res 28: 122-131. 484 Brene S, Lindefors N, Ehrlich M, Taubes T, Horiuchi A, Kopp J, Hall H, kan, Sedvall G, an et al. 485 1994. Expression of mRNAs encoding ARPP-16/19, ARPP-21, and DARPP-32 in human 486 brain tissue. Journal of Neuroscience 14: 985-998. 487 Buenrostro JD, Giresi PG, Zaba LC, Chang HY, Greenleaf WJ. 2013. Transposition of native 488 chromatin for fast and sensitive epigenomic profiling of open chromatin, DNA-binding 489 proteins and nucleosome position. Nature methods 10: 1213-1218. 490 Corces MR, Buenrostro JD, Wu B, Greenside PG, Chan SM, Koenig JL, Snyder MP, Pritchard 491 JK, Kundaje A, Greenleaf WJ et al. 2016. Lineage-specific and single-cell chromatin 492 accessibility charts human hematopoiesis and leukemia evolution. Nat Genet 48: 1193- 493 1203. 494 Corces MR, Trevino AE, Hamilton EG, Greenside PG, Sinnott-Armstrong NA, Vesuna S, 495 Satpathy AT, Rubin AJ, Montine KS, Wu B et al. 2017. An improved ATAC-seq protocol 496 reduces background and enables interrogation of frozen tissues. Nature methods 14: 497 959-962. 498 Dixon JR, Jung I, Selvaraj S, Shen Y, Antosiewicz-Bourget JE, Lee AY, Ye Z, Kim A, Rajagopal 499 N, Xie W. 2015. Chromatin architecture reorganization during stem cell differentiation. 500 Nature 518: 331. 501 Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras 502 TR. 2013. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29: 15-21. 503 Egervari G, Landry J, Callens J, Fullard JF, Roussos P, Keller E, Hurd YL. 2017. Striatal H3K27 504 Acetylation Linked to Glutamatergic Gene Dysregulation in Human Heroin Abusers 505 Holds Promise as Therapeutic Target. Biol Psychiatry 81: 585-594. 506 Ernst J, Kellis M. 2015. Large-scale imputation of epigenomic datasets for systematic 507 annotation of diverse human tissues. Nature biotechnology 33: 364-376. 508 Finucane HK, Bulik-Sullivan B, Gusev A, Trynka G, Reshef Y, Loh P-R, Anttila V, Xu H, Zang C, 509 Farh K. 2015. Partitioning heritability by functional annotation using genome-wide 510 association summary statistics. Nature genetics 47: 1228-1235. 511 Fullard JF, Giambartolomei C, Hauberg ME, Xu K, Voloudakis G, Shao Z, Bare C, Dudley JT, 512 Mattheisen M, Robakis NK et al. 2017. Open chromatin profiling of human postmortem 513 brain infers functional roles for non-coding schizophrenia loci. Hum Mol Genet 26: 1942- 514 1951.

14 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

515 Gusev A, Lee SH, Trynka G, Finucane H, Vilhjalmsson BJ, Xu H, Zang C, Ripke S, Bulik- 516 Sullivan B, Stahl E et al. 2014. Partitioning heritability of regulatory and cell-type-specific 517 variants across 11 common diseases. American journal of human genetics 95: 535-552. 518 Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, Miller JA, Van De Lagemaat 519 LN, Smith KA, Ebbert A, Riley ZL. 2012. An anatomically comprehensive atlas of the 520 adult human brain transcriptome. Nature 489: 391-399. 521 Hon C-C, Ramilowski JA, Harshbarger J, Bertin N, Rackham OJ, Gough J, Denisenko E, 522 Schmeier S, Poulsen TM, Severin J. 2017. An atlas of human long non-coding RNAs 523 with accurate 5′ ends. Nature 543: 199-204. 524 Ino H. 2004. Immunohistochemical Characterization of the Orphan Nuclear Receptor RORα in 525 the Mouse Nervous System. Journal of Histochemistry & Cytochemistry 52: 311-323. 526 Kang HJ, Kawasawa YI, Cheng F, Zhu Y, Xu X, Li M, Sousa AM, Pletikos M, Meyer KA, 527 Sedmak G et al. 2011. Spatio-temporal transcriptome of the human brain. Nature 478: 528 483-489. 529 Kemp JM, Powell T. 1971. The structure of the caudate nucleus of the cat: light and electron 530 microscopy. Philosophical Transactions of the Royal Society of London B: Biological 531 Sciences 262: 383-401. 532 Lee JE. 1997. Basic helix-loop-helix genes in neural development. Current opinion in 533 neurobiology 7: 13-20. 534 Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 535 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078- 536 2079. 537 Liao Y, Smyth GK, Shi W. 2014. featureCounts: an efficient general purpose program for 538 assigning sequence reads to genomic features. Bioinformatics 30: 923-930. 539 Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, Blewitt ME, Asselin-Labat ML, Smyth GK, 540 Ritchie ME. 2015. Why weight? Modelling sample and observational level variability 541 improves power in RNA-seq analyses. Nucleic Acids Res 43: e97. 542 Ma K, Zheng S, Zuo Z. 2006. The transcription factor regulatory factor X1 increases the 543 expression of neuronal glutamate transporter type 3. Journal of Biological Chemistry 544 281: 21250-21255. 545 Maurano MT, Haugen E, Sandstrom R, Vierstra J, Shafer A, Kaul R, Stamatoyannopoulos JA. 546 2015. Large-scale identification of sequence variants influencing human transcription 547 factor occupancy in vivo. Nature genetics. 548 Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H, Reynolds AP, Sandstrom 549 R, Qu H, Brody J et al. 2012. Systematic localization of common disease-associated 550 variation in regulatory DNA. Science 337: 1190-1195. 551 McLean CY, Bristor D, Hiller M, Clarke SL, Schaar BT, Lowe CB, Wenger AM, Bejerano G. 552 2010. GREAT improves functional interpretation of cis-regulatory regions. Nat Biotech 553 28: 495-501. 554 Novakovic B, Habibi E, Wang SY, Arts RJW, Davar R, Megchelenbrink W, Kim B, Kuznetsova 555 T, Kox M, Zwaag J et al. 2016. beta-Glucan Reverses the Epigenetic State of LPS- 556 Induced Immunological Tolerance. Cell 167: 1354-1368 e1314. 557 Ouimet C, Miller P, Hemmings H, Walaas SI, Greengard P. 1984. DARPP-32, a dopamine-and 558 adenosine 3': 5'-monophosphate-regulated phosphoprotein enriched in dopamine- 559 innervated brain regions. III. Immunocytochemical localization. Journal of Neuroscience 560 4: 111-124. 561 Qu K, Zaba LC, Giresi PG, Li R, Longmire M, Kim YH, Greenleaf WJ, Chang HY. 2015. 562 Individuality and variation of personal regulomes in primary human T cells. Cell Systems 563 1: 51-61. 564 Rapoport JL, Giedd JN, Gogtay N. 2012. Neurodevelopmental model of schizophrenia: update 565 2012. Mol Psychiatry 17: 1228-1238. 566 Roadmap Epigenomics Consortium, Kundaje A, Meuleman W, Ernst J, Bilenky M, Yen A, 567 Heravi-Moussavi A, Kheradpour P, Zhang Z, Wang J et al. 2015. Integrative analysis of 568 111 reference human epigenomes. Nature 518: 317-330.

15 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

569 Roussos P, Mitchell Amanda C, Voloudakis G, Fullard John F, Pothula Venu M, Tsang J, Stahl 570 Eli A, Georgakopoulos A, Ruderfer Douglas M, Charney A et al. 2014. A Role for 571 Noncoding Variation in Schizophrenia. Cell Reports 9: 1417-1429. 572 Schizophrenia Working Group of the Psychiatric Genomics Consortium 2014. Biological insights 573 from 108 schizophrenia-associated genetic loci. Nature 511: 421-427.Sherwood RI, 574 Hashimoto T, O'Donnell CW, Lewis S, Barkal AA, van Hoff JP, Karun V, Jaakkola T, 575 Gifford DK. 2014. Discovery of directional and nondirectional pioneer transcription 576 factors by modeling DNase profile magnitude and shape. Nat Biotechnol 32: 171-178. 577 Skene NG, Bryois J, Bakken TE, Breen G, Crowley JJ, Gaspar HA, Giusti-Rodriguez P, Hodge 578 RD, Miller JA, Munoz-Manchado AB et al. 2018. Genetic identification of brain cell types 579 underlying schizophrenia. Nat Genet 50: 825-833. 580 Sousa AMM, Zhu Y, Raghanti MA, Kitchen RR, Onorati M, Tebbenkamp ATN, Stutz B, Meyer 581 KA, Li M, Kawasawa YI et al. 2017. Molecular and cellular reorganization of neural 582 circuits in the human lineage. Science 358: 1027-1032. 583 Weirauch MT, Yang A, Albu M, Cote AG, Montenegro-Montero A, Drewe P, Najafabadi HS, 584 Lambert SA, Mann I, Cook K. 2014. Determination and inference of eukaryotic 585 transcription factor sequence specificity. Cell 158: 1431-1443. 586 Zeisel A, Manchado AB, Codeluppi S, Lonnerberg P, La Manno G, Jureus A, Marques S, 587 Munguba H, He L, Betsholtz C et al. 2015. Cell types in the mouse cortex and 588 hippocampus revealed by single-cell RNA-seq. Science doi:10.1126/science.aaa1934. 589 Zhang Y, Chen K, Sloan SA, Bennett ML, Scholze AR, O'Keeffe S, Phatnani HP, Guarnieri P, 590 Caneda C, Ruderisch N. 2014. An RNA-sequencing transcriptome and splicing database 591 of glia, neurons, and vascular cells of the . The Journal of Neuroscience 592 34: 11929-11947. 593 Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum C, Myers RM, 594 Brown M, Li W et al. 2008. Model-based analysis of ChIP-Seq (MACS). Genome biology 595 9: R137. 596 597 598

16 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

599 FIGURE LEGENDS 600 601 Figure 1. Schematic outline of the study design. Dissections from 14 brain regions of 5 control 602 subjects were obtained from frozen human postmortem tissue. We combined fluorescence 603 activated nuclear sorting (FANS) with ATAC-seq, followed by downstream analyses, to identify 604 cell type-specific open chromatin regions. The brain regions and abbreviations are described in 605 Supplemental Table S2. 606 607 Figure 2. Comparisons between neuronal and non-neuronal OCRs of various brain regions. 608 (A) Representative cell type specific open chromatin tracks in the DLPFC and hippocampus at 609 known neuron specific (CAMK2A) and (B) non-neuron specific (OLIG1 and OLIG2) genes. 610 (C) Neuronal and non-neuronal OCRs show distinct distribution of genomic contexts. OCRs 611 within 3kb of a TSS were considered as promoter OCRs. 612 (D) The distribution of the number of brain regions in which a consensus OCR was found, 613 stratified by cell type and promoter/non-promoter OCRs. OCRs within 3kb of a TSS were 614 considered as promoter OCRs. 615 (E) Clustering of the individual samples (n=115) using t-SNE. Brain regions are grouped in 6 616 broad areas: amygdala (AMY), hippocampus (HIP), mediodorsal thalamus (MDT), neocortex 617 (NCX), primary visual cortex (PVC), and striatum (STR). 618 (F) Distribution of statistical dissimilarity (quantified based on the proportion of true tests, pi1) for 619 inter- and intra-cell type pairwise comparisons. Larger pi1 indicates a larger fraction of OCRs 620 estimated to be different between samples based on pairwise comparisons. 621 (G) Multidimensional scaling of brain regions and cell types (n=28) using the pi1 estimates of 622 statistical dissimilarity as distance. Same abbreviations as in E. The MDT non-neuronal group is 623 immediately adjacent to, and partly obscured by, the leftmost non-neuronal striatum group. 624 625 Figure 3. Overlap with other epigenomes and biological functions. 626 (A) Overlap between DNase-seq OCRs and promoter/primary enhancer states of 127 627 epigenomes from REMC and neuronal and non-neuronal OCRs identified by ATAC-seq. 628 Samples from REMC are split into 3 groups: brain tissue, brain derived cells and non-brain 629 tissues (referred to as “Other”). The full results for the individual REMC samples are shown in 630 Supplemental Figs. S11 and S12. 631 (B) Overlap between cell and region specific open chromatin (ATAC-seq) and gene sets 632 representing biological processes and pathways. Only those that were within the top 5 most 633 significant gene sets in one or more ATAC-seq categories are shown. Pathways were clustered 634 by the Jaccard index using the WardD method based on the overlap between the genes in the

17 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

635 different gene sets and not the enrichments. This was done to show how enrichments varied by 636 cell type and region in terms of related pathways. “#”: FDR < 0.001; “ · ”: FDR < 0.05; “Bi”: 637 BIOCARTA; “GO”: gene ontology; “KG”: KEGG; “Re”: REACTOME. In this analysis, the region- 638 specific OCRs were derived from neuronal samples only. 639 640 Figure 4. Overlap between genetic variants associated with various complex traits and 641 identified OCRs assayed using LD-score partitioned heritability. 642 (A) Overlap between cell type and region-specific OCRs and genetic risk variants of various 643 traits. The region-specific OCRs are based only on neuronal samples. 644 (B) Overlap between all OCRs identified in 14 brain regions by two cell types and schizophrenia 645 genetic risk variants. OCRs were in all cases padded with 1,000 bp to also capture adjacent 646 genetic variants. “Chronotype”: whether one is a morning or an evening person; ” · ”: Nominally 647 significant; ”#”: Significant after FDR correction of multiple testing across all traits and OCRs 648 sets. 649 650 Figure 5. Identification of cell- and region-specific regulation of protein coding genes. 651 (A) Ranking of protein coding genes based on their regulatory divergence score averaged 652 across all neuronal versus all non-neuronal samples (left), and cortical neuronal samples versus 653 subcortical neuronal samples (right). The regulatory divergence score is a combined measure 654 for the difference in the regulatory burden for each gene, multiplied by how different the 655 regulatory landscape is surrounding the gene (Methods). A gene set enrichment analysis using 656 general gene sets and the top 500 most specific genes for either cell-type/region using a one- 657 sided Fisher’s exact test was performed - the top 3 gene-sets with P-values corrected for 658 multiple testing using FDR are indicated. SOX8, AC009041.2, and LMF1 are all located in the 659 same genetic locus. 660 (B) Regional plot in the PPP1R1B locus showing OCRs. The promoter OCR and putative 661 proximal enhancer OCRs are highlighted (dashed box). 662 (C) The identified human PPP1R1B upstream OCR along with Exon1, Intron1, and the 5' end of 663 Exon2 were used to direct expression of EGFP in transgenic mice. Expression identified with 664 anti-PPP1R1B and DAB is restricted to the dorsal (dStr) and ventral striatum (vStr) 665 (dorsal>ventral) and their projections [globus pallidus (gp) and substantia nigra (sn)], and the 666 piriform cortex (pc). The black box indicates the region shown at higher magnification using 667 immunofluorescence in D-G. 668 (D) anti-EGFP (green), 669 (E) anti-PPP1R1B (DARPP-32) (red), 670 (F) DAPI (blue) and

18 Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

671 (G) a merged image. EGFP is expressed exclusively in PPP1R1B positive neurons. 672 673 Figure 6. Identification of cell- and region-specific regulation of lncRNA. 674 (A) Top ranking of lncRNA genes based on their regulatory divergence score averaged across 675 all neuronal versus all non-neuronal samples (left) and cortical neuronal samples versus 676 subcortical neuronal samples (right). The regulatory divergence score is a combined measure of 677 the difference in the regulatory burden for each gene multiplied by how different the regulatory 678 landscape is surrounding that gene (Methods). lncRNA genes were obtained from the FANTOM 679 CAT Robust category, from which only genes from the category “far from protein coding genes” 680 were retained. Genes with coding status “uncertain” were excluded. Bottom: heatmaps of 681 whether gene expression (CAGE) identified genes associated with the given anatomical 682 structure. “Neuron Projection Bundle” includes samples from the corpus callosum and the optic 683 nerve, which are depleted in neuronal nuclei. Red indicates a high gene density and blue a low 684 gene density. Numbers in parenthesis indicate the number of lncRNAs associated with the 685 ontology. 686 (B) qPCR validation of cell-type specific (left) and brain-region specific (right) lncRNA identified 687 by a regulatory divergence analysis based on ATAC-seq data. Shown are fold differences in 688 expression for neuronal (positive values) to non-neuronal (negative values) gene expression 689 (left), and cortical (positive values) to subcortical (negative values) (right). Error bars indicate 690 standard deviation. “PFC”: prefrontal cortex; “STR”: striatum; *P < 0.05; **P < 0.01; ***P < 691 0.005. 692 693 Figure 7. Top 10 Transcription factors motifs showing the highest fold enrichment of footprinted 694 binding sites within peaks specific to a given cell type or brain region, compared to all peaks. 695 The region specific TFs are based only on neuronal samples. TF motifs are grouped by TF

696 family and line width indicates the log2 transformed fold enrichment. All shown enrichments 697 were statistically significant after correcting for multiple testing in a one-sided binomial test. 698 Similar plots of TF motif enrichments stratified by genomic context are shown in Supplemental 699 Fig. S24. 700

19 Frozen postmortem tissue PMC PMC STC DLPFC ACC PUT VLPFC MDT PVC

INS

PVC OFC OFC ITC Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press ITC AMY NAc 5x HIP Cell sorting with ATAC-seq Anti-NeuN FANS non-Neurons

Alignment Neurons and QC

1. Cell and region specific open chromatin 2. Annotation and biological functions 3. Overlap with neuropsychiatric risk variants 4. Regulation of genes by transcription factors A B Scale 50 kb hg19 Scale 20 kb hg19 chr5: 149,600,000 149,650,000 chr21: 34,400,000 34,410,000 34,420,000 34,430,000 34,440,000 34,450,000 34,460,000 SLC6A7 CAMK2A ARSI OLIG2 OLIG1 CAMK2A OLIG2 CAMK2A CAMK2A 127 _ 127 _ Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press DLPFC neuronal

0 _ 0 _ 127 _ 127 _

HIPP neuronal

0 _ 0 _ 127 _ 127 _

DLPFC non-Neuronal

0 _ 0 _ 127 _ 127 _

HIPP non-Neuronal

0 _ 0 _ C D Number of Neuronal Neuronal non-Neuronal overlaps Exon & UTRs Exon & UTRs Promoter (3.81%) (3.27%) 10+ Promoter Other (24%) Distal Intergenic Distal Intergenic (25.8%) Promoter non-Neuronal (34.4%) (42.4%) Promoter

Other 1 Intron Intron (37.8%) (28.6%) 0.00 0.25 0.50 0.75 1.00

E F 1.00 G 0.4

10 pi1 ) 0.75 SNE 2 − 0.2 0 0.50 based MDS 2 − 0.25 −10 Pi1 0.0 Sample level t Sample level different accessibility ( Estimated fraction of OCRs with 0.00

−20 −10 0 10 20 Within non−Neuronal cells −0.2 0.0 0.2 0.4 Sample level t−SNE 1 (regional differences) Pi1−based MDS 1 Within Neuronal cells Cell Type Brain Region (regional differences) Cell Type Brain Region Count non−Neuronal NCX HIP STR Between different cell types non−Neuronal NCX HIP STR 1 Neuronal PVC MDT AMY (Neuronal vs non-Neuronal) Neuronal PVC MDT AMY 5 A DownloadedDNase from genome.cshlp.org on September 30, 2021Promoter - Published by Cold Spring Harbor Laboratory PressPrimary Enhancer 0.20 0.100 0.15 Brain tissue

0.15 Brain derived cells 0.075 Other 0.10 0.10 0.050

Jaccard index Jaccard 0.05 0.05 0.025

0.00 0.00 0.000 non− non− non− Neuronal Neuronal Neuronal Neuronal Neuronal Neuronal

B

Striatum Minus log P 50 Mediodorsal Thalamus 40 30 20 Hippocampus 10 Primary Visual Cortex 0 Neocortex

Neuronal

non−Neuronal

Striatum Log2 fold # # # # # # # # # · # # # # # · # # # # · # · · change Mediodorsal Thalamus · # # · # # # # · # # # · # # # · # # # # # 1

Hippocampus # # # · · # · · · · # # # · # · # 0

Primary Visual Cortex · # # # # · # # · · # # · # −1 Neocortex # # # · # · # # # · # · · # # · # # · Neuronal # # # # # # # # # # # # # # # # # # # # · # · # # # Cell and region specifc open chromatin (ATAC-seq) chromatin open specifc and region Cell non−Neuronal # # # # · # # # # # # #

Ovulation (GO) Gliogenesis (GO) Phagocytosis (GO) Action Potential (GO) Cacam Pathway (Bi)

Potassium Channels (Re) Metal Ion Transport (GO) Signaling By NOTCH (Re) Neuron Recognition (GO) Reg. of Ion Transport (GO) Reg. of Axonogenesis (GO) Potassium Ion Import (GO) Potassium Ion Transport (GO) Toll Receptor Cascades (Re) NOTCH Signaling Pathway (KG) Reg. of Cell Development (GO) Reg. of Membrane Potential (GO) Reg. of Homeostatic Process (GO) Actin Filament Based Process (GO) Reg. of Cation Channel Activity (GO)Reg. of Neurotransmitter Levels (GO) Voltage Gated Potassium Channels (Re) Multicellular Organismal Signaling (GO) Reg. of NeuronalReg. of SynapticSynapseModulation Plasticity Structure of Synaptic(GO) or Activity Transmission (GO) (GO)

NOTCH1 Intracellular Domain Regulates Transcription (Re) RAS Activation Uopn Ca2 Infux Through NMDA Receptor (Re)

Neg. Reg. of Transcription From RNA Polymerase II Promoter (GO) Reg. of Ryanodine Sensitive Calcium Release Channel Activity (GO) Biological processes and pathways Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

A B Schizophrenia Neuronal Coronary Artery Disease PMC PMC ACC Ulcerative Colitis −logP STC DLPFC 5 PUT Crohn's Disease VLPFC MDT 4 PVC Height 3 Other traits BMI 2 · INS Sleep Duration 1 · PVC Chronotype OFC OFC ITC Depressive Symptoms ·· ITC AMY NAc HIP Neuroticism ··· # · Subjective Well Being · ·· Schizophrenia non-Neuronal Education Years ·· · PMC # # PMC ACC Autism STC DLPFC PUT ·· MDT ADHD · VLPFC PVC Alzheimer's Disease · Schizophrenia

Neuropsychiatric traits # ··· # INS

PVC OFC Neuronal Striatum OFC Neocortex ITC non−Neuronal Hippocampus AMY NAc ITC HIP - logP Primary VisualMediodorsal Cortex Thalamus

0 8.4 Neuronal System (Re) 1.5E-16 Reg. of Ion Transport (GO) 1.6E-3 Cell Cell Signaling (GO) 9.4E-13 Integration of Energy Metabolism (Re) 3.2E-3 Potassium Channels (Re) 2.0E-11 Dev. Growth Involved in Morphogenesis (GO) 3.4E-3 A { {

20260 protein HOOK1 20260 protein NRN1 KCNB2 NETO1 1e+05 coding genes NRN1 coding genes NR4A2 KCNV1 MAN1A2 RALYL GAB1 PTPRR 25000 CNN3

PTCHD4 SERTM1 Mostly cortical 5e+04

C8orf34 Mostly Neuronal DDAH1 RP11−664D7.4 CYR61 GRIN2B FOXQ1

0e+00 0

HES1 ZFHX3 NKX6−2 DRD2 −5e+04 NKX2−2 AC005544.1 ZIC2 FBXL20 ZIC5 −25000 CACNG1 ARHGEF10 FHDC1 SALL1 PARD3 −1e+05 SOX8 ZIC1 AC009041.2 CACNG4 Regulatory divergence score (ATAC-seq) score divergence Regulatory Mostly non-Neuronal LMF1 ZIC4 Mostly sub-cortical { { Gliogenesis (GO) 2.3E-11 Neg. Reg. of Trans. from RNA PolII Prom (GO) 0.0054 Oligodendrocyte Diff. (GO) 3.5E-08 Generic Transc. Pathway (Re) 0.011 Neg. Reg. of Trans. from RNA PolII Prom (GO) 9.3E-07 Morphogenesis of a Branching Structure (GO) 0.018

5 kb hg19 chr17: 37,781,000 37,785,000 37,790,000

B Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press Caudate Nucleus accumbens Putamen Cell type: Neuronal PPP1R1B non-Neuronal

DLPFC OFC VLPFC - ACC STC NCX ITC PMC INS PVC AMY HIP MDT NAC

STR PUT DLPFC OFC VLPFC ACC STC NCX ITC PMC INS PVC AMY HIP MDT NAC

STR PUT

C D E

GFP PPP1R1B F G dStr

vStr sn gp pc DAPI merge Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

10717 lncRNA genes 10717 lncRNA genes

Neuronal non-Neuronal Cortical Sub-cortical lncRNA lncRNA Neuronal Neuronal lncRNA lncRNA Downloaded from genome.cshlp.org on September 30, 2021Cortex Visual Primary - Published by Cold Spring Harbor Laboratory Press Mediodorsal Thalamus

Hippocampus

Neuronal − non Neocortex Unknown

Striatum Neuronal bHLH

USF1/2 Sox

TWIST2

TWIST1 NRF1 PURA TCFL5

NEUROD2,NEUROG1/2/3TCF15 LEF1,TCF7/7L2

TAL2 SCXA/B TCF7L1

BHLHE22/23,OLIG1/2/3NEUROD1/4/6 SMAD1/5

BHLHA15

family RFX −

NFIL3

JUN−family FOS−family ESR1/2

DBP,HLF,TEF ESRRA/B/G Nuclear receptor bZIP BATF BACH1/2 NR1D1 NR1D2 NR1H4

NR4A1/2/3

ZNF32 NR5A1/2

ZNF263 RORA/B/C

ZNF148,ZNF281

ZFX,ZFY

ZBTB14 MBD2

family SP −

SNAI1/3

PLAGL1 POU1F1

PLAG1 POU4F1/2/3 C2H2 ZF

PATZ1 MAZ POU Homeodomain,

INSM1 ALX1/3,OTP,PHOX2A/B,PRRX1/2,RAX HINFP DMBX1,ISX DRGX

PBX1/2/3/4

DNMT1 ZFHX3

TFDP1/2/3 FOXM1 E2F4/5 SPI1,SPIC,CTD−2545M3.6 domain Homeo- Downloaded from genome.cshlp.org on September 30, 2021 - Published by Cold Spring Harbor Laboratory Press

An atlas of chromatin accessibility in the adult human brain

John F Fullard, Mads E Hauberg, Jaroslav Bendl, et al.

Genome Res. published online June 26, 2018 Access the most recent version at doi:10.1101/gr.232488.117

Supplemental http://genome.cshlp.org/content/suppl/2018/07/13/gr.232488.117.DC1 Material

P

Accepted Peer-reviewed and accepted for publication but not copyedited or typeset; accepted Manuscript manuscript is likely to differ from the final, published version.

Creative This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the Commons first six months after the full-issue publication date (see License http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.

Email Alerting Receive free email alerts when new articles cite this article - sign up in the box at the Service top right corner of the article or click here.

Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance online articles are citable and establish publication priority; they are indexed by PubMed from initial publication. Citations to Advance online articles must include the digital object identifier (DOIs) and date of initial publication.

To subscribe to Genome Research go to: https://genome.cshlp.org/subscriptions

Published by Cold Spring Harbor Laboratory Press