ARTICLE

https://doi.org/10.1038/s41467-020-18526-1 OPEN Integrative genomics identifies a convergent molecular subtype that links epigenomic with transcriptomic differences in autism

Gokul Ramaswami1, Hyejung Won 1,8, Michael J. Gandal 1,2,3,4, Jillian Haney1,2, Jerry C. Wang 1, ✉ Chloe C. Y. Wong5, Wenjie Sun6, Shyam Prabhakar 6, Jonathan Mill 7 & Daniel H. Geschwind 1,3,4

1234567890():,; Autism spectrum disorder (ASD) is a phenotypically and genetically heterogeneous neuro- developmental disorder. Despite this heterogeneity, previous studies have shown patterns of molecular convergence in post-mortem brain tissue from autistic subjects. Here, we integrate genome-wide measures of mRNA expression, miRNA expression, DNA methylation, and histone acetylation from ASD and control brains to identify a convergent molecular subtype of ASD with shared dysregulation across both the epigenome and transcriptome. Focusing on this convergent subtype, we substantially expand the repertoire of differentially expressed in ASD and identify a component of upregulated immune processes that are associated with hypomethylation. We utilize eQTL and conformation datasets to link differentially acetylated regions with their cognate genes and identify an enrichment of ASD genetic risk variants in hyperacetylated noncoding regulatory regions linked to neuronal genes. These findings help elucidate how diverse genetic risk factors converge onto specific molecular processes in ASD.

1 Program in Neurogenetics, Department of Neurology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA. 2 Department of Psychiatry, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA. 3 Center for Autism Research and Treatment, Semel Institute, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA. 4 Department of Human Genetics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA. 5 Institute of Psychiatry, Psychology and Neuroscience, King’s College London, De Crespigny Park, London SE5 8AB, UK. 6 Computational and Systems Biology, Genome Institute of Singapore, Singapore 138672, Singapore. 7 University of Exeter Medical School, University of Exeter, Exeter EX2 5DW, UK. 8Present address: ✉ Department of Genetics and UNC Neuroscience Center, University of North Carolina, Chapel Hill, NC 27599, USA. email: [email protected]

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 1 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1

utism spectrum disorder (ASD) is a prevalent neurode- previous studies found ASD dysregulated features were pre- velopmental disorder characterized by impaired social dominantly localized to the cerebral cortex and substantially A 1 interactions with repetitive and restrictive behaviors . attenuated in the cerebellum. For mRNA and miRNA expression, Although ASD is highly heritable, its genetic etiology is complex, we used normalized quantifications and differential with ~1000 risk genes implicated2. Assessment of ASD risk is expression summary statistics from the previous studies9,12 challenging due to its genetic architecture which encompasses (“Methods”). alleles of varying frequencies (common, rare, very rare) and For DNA methylation, we used the normalized probe inheritance patterns (Mendelian autosomal and X-linked, addi- quantifications from the previous study11 and collapsed probe- tive, de novo)3–5 that likely interact together within individuals level measurements onto 21880 gene promoters and 24458 gene and families6,7. bodies to facilitate comparisons between methylation and Surprisingly, despite this genetic complexity, molecular studies expression (“Methods”). An initial differential methylation have identified consistent patterns of changes in post-mortem analysis identified 2578 and 1262 differentially methylated brain tissue from ASD subjects8–12. At the transcriptomic level, promoters and gene bodies, respectively, at an FDR < 10% ASD brains exhibit downregulation of genes involved in neuronal (“Methods”, Supplementary Fig. 1a, b, and Supplementary activity with a concomitant upregulation of genes involved in Data 4). The genes with differential promoter methylation were microglial and astrocyte-mediated inflammation9,13. Addition- largely distinct from genes with differential gene body methyla- ally, there is a shared pattern of microRNA (miRNA) dysregu- tion (Supplementary Fig. 1e, f). However, the loadings for each lation directly targeting downregulated neuronal genes as well as sample along the first principal component of differential upregulated astrocyte genes12. At the epigenomic level, ASD promoter and gene body methylation were almost identical brains exhibit DNA methylation differences in genomic regions (Supplementary Fig. 1d) and not correlated with any potential related to immunity and neuronal regulation11,14. Additionally, confounders (Supplementary Fig. 1c), suggesting a coherent there are differences in histone acetylation (H3K27ac) associated regulatory mechanism. with genes involved in synaptic transmission and morphogen- For histone acetylation, we quantified the consensus H3K27ac esis10. To date, these molecular datasets have not been compre- peaks identified in the previous study10 in all ASD and control hensively integrated and analyzed together, which could provide a samples including the samples marked as atypical in the previous better understanding of how epigenetic changes directly regulate study10 (“Methods”). An initial differential acetylation analysis the expression of their cognate genes and how these processes are identified 2156 differentially acetylated regions at an FDR < 20% related. Additionally, despite evidence for shared patterns of (“Methods”, Supplementary Fig. 2a, and Supplementary Data 5). molecular dysregulation, only approximately two-thirds of ASD Although this was fewer differentially acetylated regions than brain samples exhibit this major shared molecular pattern, indi- previously identified (Supplementary Fig. 2b), we show that this cating the potential for distinct molecular subtypes. Such het- depletion is an artifact of subtype heterogeneity in the ASD erogeneity among ASD cases would also be expected to reduce samples (see below “Subtype-specific histone acetylation differ- power to identify disease-related signals, providing another ences in ASD” section). rationale for the identification of subtypes. Systems-level integration of multi-omic datasets has been a successful strategy to identify molecular subtypes and elucidate Identification of two ASD molecular subtypes. In the previous causal mechanisms in cancer15,16. However, it has not yet been molecular studies9–12, we noticed that approximately two-thirds applied to neurodevelopmental disorders, including ASD. In this of ASD brain samples clustered together based on the differential study, we utilize similarity network fusion (SNF), an integrative signal for each dataset. To formally assess ASD molecular het- method that has identified molecular subtypes when integrating erogeneity across the four different datasets, we used SNF17 to transcriptomic with epigenomic datasets in cancer17, to integrate integrate differential mRNA expression, miRNA expression, mRNA expression, miRNA expression, DNA methylation, and DNA methylation, and histone acetylation for 30 ASD and 17 histone acetylation datasets from ASD brain (Fig. 1a). This control samples that were present in all 4 molecular datasets unbiased data-driven analysis identifies two distinct molecular (“Methods”). SNF creates an integrative sample-sample similarity subtypes of ASD, one, which represents the majority of cases, network by quantifying sample-sample relationships within each showing a cohesive molecular pattern, and the other without individual dataset and then integrating these sample-sample consistent changes in molecular measures. By analyzing ASD relationships across all of the datasets17. The clustering of sample brains according to subtype, which significantly reduces hetero- relationships is a major advantage of SNF, as compared to geneity, we are able to identify substantially more differentially alternative data integration methods that cluster gene relation- expressed mRNA genes compared to previous analyses. We ships which can be sensitive to differing normalization methods identify differentially expressed miRNAs, differentially methy- between data types18. lated promoters and gene bodies, as well as differentially acety- The sample loadings along the first principal component of lated genomic regions and assess the extent to which these each differential molecular level recapitulated known regulatory regulatory mechanisms influence gene expression in ASD. Finally, relationships, with differential acetylation (R = 0.73) and differ- we find an enrichment of ASD genetic risk in regulatory regions ential miRNA expression (R = 0.51) being highly correlated to linked to neuronal genes that are hyperacetylated in ASD brains, differential mRNA expression, whereas differential methylation suggesting a causal role for these elements. was less correlated with expression (R = 0.13)19,20 (Fig. 1b). Using this similarity network, samples divided into two distinct clusters (Fig. 1c), one of which consisted entirely of ASD samples Results that loaded strongly onto the differential transcriptomic and Integration of multi-omic data from ASD and control brains. epigenomic signatures (SNF Group 2). Therefore, we grouped We integrated previously published datasets on mRNA expres- these samples together as the ASD Convergent Subtype. The sion9, miRNA expression12, DNA methylation11, and histone other cluster consisted of ASD samples that did not load onto the acetylation10 from a cohort of 48 ASD and 45 control brains initial differential signatures and were indistinguishable from (Supplementary Data 1 and Fig. 1a). We only analyzed the controls (SNF Group 1). Therefore, we grouped these samples samples originating from the frontal and temporal cortex, because together as the ASD Disparate Subtype. We built a logistic

2 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 ARTICLE

a Similarity network Data types Fusion subtyping Molecular differences

miRNA expression changes

miRNA-seq Convergent Disparate ASD ASD Control Targetscan Autism 1 brain miRNA RNA-seq 0.5 mRNA expression changes mRNA Z–score Promoter Illumina 450k 0 Gene body Control methylation Differential DNAm –0.5 DNA methylation changes brain H3K27ac H3K27ac eQTL –1 ChIP-seq Hi-C Histone acetylation changes

bc−10 −5 0 5 −60 −20 20 60 −50 0 50 Control ASD Differential 50 50 Differential SNF Group 1 / control SNF Group 1 / ASD (disparate subtype) mRNA 0.51 0.13 0.73 0 0 mRNA expression expression SNF Group 2 / ASD (convergent subtype) −50 −50

5 5 Differential Differential 0 0 miRNA 0.54 0.55 miRNA −5 expression −5 expression

−10 −10

80 80

Differential 40 40 Differential DNA 0.17 DNA methylation 0 0 methylation −40 −40

60 20 20

10 10 20 Differential Differential histone 0 histone 0 −20 acetylation −10 acetylation −10

−60 −20 −20

−50 0 50 −40 0 40 80 −50 0 50 −10 −5 0 5 −40 0 40 80 −20 −10 0 10 20

d e Sample numbers SNF 1 SNF + LR classifier Control ASD 0.9 Group 1 17 7 SNF Group 2 023 0.8

Group 1 50 22 0.7 LR classifier Group 2 11 39 Classification concordance 0.6

0.5 mRNA miRNA DNA Histone Expression Expression Methylation Acetylation Held Out Data Set

Fig. 1 SNF to identify ASD molecular subtypes. a Overview of data integration and molecular subtyping to characterize the cascade of molecular changes in ASD. b Relationship between sample loadings on the first principal component of differential mRNA expression, miRNA expression, DNA methylation, and histone acetylation. c Identification of two sample clusters using SNF: SNF Group 1 and SNF Group 2. ASD samples in SNF Group 1 constitute the Disparate Subtype, whereas ASD samples in SNF Group 2 constitute the Convergent Subtype. d Number of samples classified into the two cluster groups using SNF and logistic regression (LR) classifiers. e Comparison of SNF clustering and logistic regression (LR) classification assignments when utilizing three out of four datasets (see Supplementary Fig. 5). The concordance of sample assignments to those when using the complete dataset are plotted.

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 3 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 regression classifier to assign the 61 ASD and 61 control samples M3, a module representing mitochondrial function in neurons, that were not used in at least one of the four classification datasets which has been previously implicated in ASD21, mRNA.M7, a into either the Convergent or Disparate subtypes (“Methods”, module with no functional enrichments, and mRNA.M17, a Fig. 1d and Supplementary Fig. 4). Interestingly, 11 of the 43 ASD module representing synaptic signaling and vesicle transport in individuals with samples from both frontal and temporal cortex neurons (Fig. 2d, e). To gain insight into neuronal down- were classified into different molecular subtypes in the two regulation in ASD at a finer resolution, we compared ASD cortical regions (Supplementary Fig. 4d), a significant difference downregulated modules to neuronal cell type-specific markers in comparison to only 1 of the 33 control individuals (p = 0.0096, identified from single-nuclei RNA sequencing of post-mortem Fisher’s Exact Test). This finding is consistent with potential human cortex22. ASD downregulated modules are significantly molecular heterogeneity across different cortical regions in ASD. enriched with markers of both inhibitory and excitatory neurons SNF subtype assignments were robust to clustering methodol- (Fig. 2f). The strongest enrichments are inhibitory neuron ogy and comprehensive leave one out cross validation (Supple- subtypes expressing SST or PVALB, derived from the medial mentary Fig. 3c, d). We further tested clustering robustness by ganglionic eminence, as well as deep layer excitatory neurons leaving out each dataset and performing SNF clustering and expressing RORB or FEZF2 (Fig. 2f), suggesting that the number logistic regression classification using the remaining three and/or activity of these cells is decreased in ASD, consistent with datasets. We found the resultant sample subtype assignments a recent single-cell analysis of post mortem ASD brain23. were highly concordant with the subtype assignments identified Three of the additionally identified ASD-associated modules using the entire four dataset collection (range = 0.86–0.93; Fig. 1e were upregulated in ASD: mRNA.M15, a module representing and Supplementary Fig. 5). Additionally, the ASD subtype metabolic processes and transcriptional regulation in glia, mRNA. assignments were not correlated with, or driven by, biological M21, a module representing ribosomal translational, and mRNA. or technical covariates, including age, sex, RNA quality, cell M23, a module enriched with astrocyte markers. Module mRNA. fraction, and post mortem interval (Supplementary Fig. 6). M15 (Fig. 2g, h) was particularly interesting, because one of its Finally, to compare with the sample classifications above, which top hub genes is REST, a transcriptional repressor with critical were generated using only differential features in each dataset, we roles in repressing neural genes in non-neural cells24. Although attempted to cluster the samples with SNF using all features in the module mRNA.M15 is enriched with microglial markers transcriptomic and epigenomic datasets. We find no clear (Supplementary Fig. 7g), it exhibits a markedly different separation between ASD and control samples (Supplementary transcriptional profile25 than the previously identified ASD Fig. 3a, b), demonstrating that molecular differences between upregulated microglial module mRNA.M19 (Fig. 2i). Module ASD and control brains are restricted to specific differentiating mRNA.M19 is specifically enriched with genes marking micro- features and are not a general genome-wide phenomenon. glial activation26, suggesting that it is directly related to neural- immune response. In contrast, module mRNA.M15 is enriched with markers of juvenile or aging glia, suggesting it may be related Subtype-specific mRNA expression differences in ASD.To to glial growth and maturation. In general, cellular processes leverage the increased power of analyzing a more homogenous set underlying broad categories of immuno-glial cell types are of cases, we performed differential mRNA expression analyses upregulated in ASD. separately for ASD Convergent and Disparate subtypes against control samples for each gene (“Methods” and Supplementary Fig. 7a, b). For the ASD Convergent subtype, we observed 5439 Subtype-specific miRNA expression differences in ASD.We differentially expressed genes at an FDR < 5%, 2283 of which were conducted differential miRNA expression analyses for ASD upregulated and 3156 downregulated in ASD (Supplementary Convergent and Disparate subtypes against control samples for Data 2). We reproduce 94.5% of the differentially expressed genes each mature miRNA transcript (Methods and Supplementary from the previous study9 and identify an additional 4356 genes at Fig. 8a, b). For the ASD Convergent subtype, we identified 43 the same statistical threshold (Fig. 2a, b and Supplementary differentially expressed miRNAs at an FDR < 5%, 28 upregulated Fig. 7f), demonstrating the utility of this subtype-specific and 15 downregulated in ASD that highly overlapped with previous approach. The top enrichments are very similar work (Supplementary Fig. 8c, d and Supplementary Data 3)12. to those previously identified, showing an upregulation of genes We analyzed differentially associated miRNA co-expression involved in immune response and a downregulation of genes modules in the miRNA co-expression network defined pre- involved in synaptic transmission and neuronal ion transport viously12. For the ASD Convergent subtype, we found the same (Supplementary Fig. 7c, d). In contrast, for the ASD Disparate three miRNA co-expression modules differentially associated at subtype, we found no differentially expressed genes. an FDR < 5% as the previous study: miRNA.brown, which is Next, we identified mRNA co-expression modules that were downregulated in ASD, as well as miRNA.magenta and miRNA. differentially associated between ASD and control individuals in yellow, which are upregulated (Supplementary Fig. 8e). We the cortical co-expression network defined previously9. For each used TargetScan27 to predict mRNA targets for the top hubs of gene module, we tested whether the two ASD subtypes and each differentially associated miRNA co-expression module control samples had differences in their association with the (Supplementary Data 3). We found an enrichment of genes in module eigengene, a summary measure of module expression the upregulated mRNA.M19 module within the predicted tar- level (Supplementary Fig. 7e). For the ASD Convergent subtype, gets of the miRNA.brown module (Supplementary Fig. 8f), we found 13 differentially associated co-expression modules at an suggesting the downregulation of these miRNAs may contribute FDR < 5%, including the 6 ASD-associated modules identified to the upregulation of immune processes in ASD. We also found previously9 and 7 ASD-associated modules additionally identified an enrichment of genes in the downregulated mRNA.M16 and in this study (Fig. 2c). In contrast, for the ASD Disparate subtype, mRNA.M3 modules within the predicted targets of miRNA. we did not find any differentially associated co-expression magenta and miRNA.yellow (Supplementary Fig. 8f), suggesting modules. the upregulation of these miRNAs may contribute to the down- Of the seven additionally identified ASD-associated co- regulation of neuronal processes in ASD. In contrast, for the ASD expression modules in this study, four were downregulated in Disparate subtype, we did not find any differentially expressed ASD: mRNA.M1, a module representing neurogenesis, mRNA. miRNAs or differentially associated miRNA co-expression

4 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 ARTICLE

a b c mRNA.M1 (−) This study This study mRNA.M3 (N) mRNA.M4 (N)

mRNA.M7 (−) Parikshak2016 Parikshak2016 38 25 mRNA.M10 (N) mRNA.M9 (A)

mRNA.M15 (M) mRNA.M16 (N) 2618 538 1738 545 mRNA.M17 (N) mRNA.M19 (M) mRNA.M20 (−) mRNA.M21 (−)

mRNA.M23 (A)

−6 −4 −2 0 2 4 6

log10(signed Bonferroni corrected p−value)

d SYT3 egAGTRAP h RP11−58B17.2 RAB6A mRNA.M17 SP1 NFATC1 mRNA.M15 C4orf50 SLC17A7 ANXA5 STAG2 Positive regulation of KIF3C NRSN2 Synaptic signaling SLC39A1 CHST3 sulfur metabolic RGS7BP TMEM59L NEDD1 TLN1 process Vesicle−mediated IQSEC2 CACNG3 TANC2 FOXO1 REST NFE2L2 CBFB Cell activation MEF2D transport in synapse

BCL11A NDFIP1 PPP3CB LPHN1 Synapse organization ROM1 HMG20B CSF1 HIST2H2BE Hemopoiesis

Calcium ion regulated PITPNB PACSIN1 MAST3 PPP2R2C RNF114 NECAP2 ITGB1 TIPARP Steroid metabolic exocytosis process RAB3A ATL3 LBR TCP11L1 CDK5R2 Regulation of MVP STAT5A Regulation of lipid ANKRD34A vesicle−mediated metabolic process WASF1 BSN transport HNRNPF IRAK4 LARP1 TBC1D25 TRIP6 SYT7 012345 AFF1 HPS5 0.0 1.0 2.0 3.0 −log (FDR corrected p−value) −log (FDR corrected p−value) 10 10

f mRNA.M1 4 mRNA.M3

3 Odds ratio mRNA.M4 mRNA.M7 2 mRNA.M10 1 mRNA.M16 mRNA.M17 0 Inh L5−6 SST TH Inh L1−2 VIP LBH Inh L2−5 VIP TYR Inh L1 SST NMBR Inh L1−3 VIP GGH Inh L3−6 SST NPY Exc L2 LAMP5 LTK Inh L1−4 VIP PENK Inh L2−6 VIP QPCT Inh L2−4 SST FRZB Inh L3−6 SST HPGD Inh L2−4 VIP CBLN1 Inh L1 SST CHRNA4 Inh L2−3 VIP CASC6 Inh L1−3 PAX6 SYT6 Inh L1−3 SST CALB1 Exc L4−6 RORB C1R Inh L1−3 VIP CHRM2 Inh L1−4 VIP OPRM1 Inh L2−6 LAMP5 CA1 Exc L4−6 FEZF2 IL26 Inh L1−2 SST BAGE2 Inh L1−2 LAMP5 DBP Exc L6 FEZF2 OR2T8 Exc L5−6 FEZF2 ABO Inh L2−4 VIP SPAG17 Inh L1−2 GAD1 MC4R Inh L1−2 VIP PCDH20 Inh L4−5 SST STK32A Inh L1−4 VIP CHRNA6 Exc L3−5 RORB ESR1 Inh L4−6 SST B3GAT2 Inh L4−6 SST GXYLT2 Inh L1−4 LAMP5 LCP2 Inh L1−2 PAX6 CDH12 Inh L5−6 PVALB LGR5 Inh L5−6 GAD1 GLP1R Inh L4−5 PVALB MEPE Inh L1−2 VIP TSPAN12 Inh L3−5 SST ADGRG6 Inh L1−3 VIP CCDC184 Exc L6 FEZF2 SCUBE1 Exc L5−6 RORB TTC12 Inh L4−6 PVALB SULF1 Exc L4−5 RORB DAPK2 Inh L5−6 SST NPM1P10 Inh L2−5 VIP SERPINF1 Exc L5−6 SLC17A7 IL15 Inh L5−6 SST KLHDC8A Inh L5−6 SST MIR548F2 Inh L3−6 VIP HS3ST3A1 Exc L4−5 FEZF2 SCN4B Exc L3−5 RORB FILIP1L Inh L2−4 PVALB WFDC2 Exc L3−5 RORB TWIST2 Inh L1−3 VIP ADAMTSL1 Exc L4−5 RORB FOLH1B Exc L5−6 THEMIS FGF10 Exc L5−6 THEMIS C1QL3 Exc L4−6 RORB SEMA3E Inh L2−5 PVALB SCUBE3 Exc L3−5 RORB COL22A1 Inh L1−2 PAX6 TNFAIP8L3 Exc L3−4 RORB CARM1P1 Exc L5−6 THEMIS CRABP1 Exc L5−6 FEZF2 EFTUD1P1 Exc L2−4 LINC00507 GLP2R Exc L2−3 LINC00507 FREM3 Exc L5−6 THEMIS DCSTAMP

CGE−derived MGE−derived L2/L3 L4 L5 L6

LAMP5/PAX6 VIP SST PVALB RORB FEZF2

Inhibitory neurons Excitatory neurons

i mRNA.M9 10

4.4 7.9 mRNA.M15 8 (0.029) (0.033)

12 3.5 25 10 Odds ratio mRNA.M19 6 (7.6e−10) (0.029) (8.6e−16) (3.4e−09) 10 mRNA.M20 4 (0.028) 7.7 6.2 50 mRNA.M21 (7.2e−07) (0.00097) (3.1e−34) 2

mRNA.M23 0

ated ry A) Adult Adult Adult Clustervenile 1 P Cluster 4 ClusterAging 8 ClusterInju 9 Activ Ju Cluster 2a Cluster 2b Cluster 2c Cluster 7a Cluster 7b Cluster 7c Microglia (ARG1) (RRM2) Axon Tract(SPP1) (CCL4) (UBE2C) (CEN (IFI27L2A) Cycling S−PhaseCycling G−PhaseCycling M−Phase (Hirbec et al.) Distinct microglia subtypes (Hammond et al.) modules. Overall, subtype-specific analyses of miRNA expression promoters and gene bodies (“Methods”). For the ASD Con- largely recapitulated findings from the previous work12. vergent subtype, we identified 3013 differentially methylated gene promoters at an FDR < 5%, 2298 hypermethylated and 715 Subtype-specific DNA methylation differences in ASD.We hypomethylated in ASD (Supplementary Fig. 9b and Supple- conducted differential DNA methylation analyses for ASD Con- mentary Data 4). ASD hypermethylated gene promoters are vergent and Disparate subtypes against control samples for gene enriched in RNA processing genes, while ASD hypomethylated

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 5 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1

Fig. 2 mRNA expression differences in ASD. a Overlap in ASD downregulated genes identified in this study with Parikshak et al.9. b Overlap in ASD upregulated genes identified in this study with Parikshak et al.9. c Signed association of mRNA co-expression module eigengenes with diagnosis (Bonferroni-corrected p-value from a linear mixed effects model, see Supplementary Fig. 7e). Positive values indicate modules with an increased expression in ASD samples. Gray and black bars with labels signify ASD-associated modules identified in Parikshak et al., and those additionally identified in this study, respectively. Cell type enrichment for each module is shown in parenthesis: neuron (N), astrocyte (A), microglia (M), and no enrichment (−) (see Supplementary Fig. 7g). d Top 30 hub genes and 300 connections for co-expression module mRNA.M17. e Top gene ontology enrichments for co- expression module mRNA.M17. Ontology enrichments were calculated by g:Profiler with FDR corrected p-values. f Enrichment of ASD downregulated neuronal co-expression modules with neuronal cell-type markers identified from single-nuclei RNA sequencing22. Enrichments were calculated using a logistic regression model and p-values were adjusted for multiple testing using FDR correction. Only those enrichments with odds ratio >1 and FDR corrected p-value < 0.05 are shown. g Top 30 hub genes and 300 connections for co-expression module mRNA.M15. h Top gene ontology enrichments for co-expression module mRNA.M15. Ontology enrichments were calculated by g:Profiler with FDR corrected p-values. i Enrichment of ASD upregulated glial co-expression modules with microglial activated genes26 and microglial cell-type markers25. Enrichments were calculated using a logistic regression model and p-values, which are shown in parentheses, were adjusted for multiple testing using FDR correction. Only those enrichments with odds ratio >1 and FDR corrected p-value < 0.05 are shown.

gene promoters are enriched in chemical sensory receptor genes markers and representing mitochondrial activity, and GB. (Supplementary Fig. 9d, e). We identified 1460 differentially royalblue, a module with no functional enrichments. The methylated gene bodies at an FDR < 5%, 678 hypermethylated hypomethylated modules were: GB.black, GB.darkgreen, GB. and 782 hypomethylated in ASD (Supplementary Fig. 10b and lightcyan, and GB.salmon, 4 modules representing immune Supplementary Data 4). ASD hypermethylated gene bodies are processes, as well as GB.green, a module representing glucur- enriched in RNA processing genes, while ASD hypomethylated onidation, and GB.yellow, representing bile acid ion transport. In gene bodies are enriched in keratinization and bile acid transport contrast, for the ASD Disparate subtype, we did not find any genes (Supplementary Fig. 10d, e). We assigned the previously ASD-associated promoter or gene body co-methylation modules. identified differentially methylated probes11 to their corre- Overall, the ASD-associated co-methylation modules did not sponding promoter or gene body annotation and now identify show significant global overlap with the ASD-associated co- fifty and eleven-fold more differentially methylated promoters expression modules at the gene level (Supplementary Fig. 11c, d), (Supplementary Fig. 9h, i) and gene bodies (Supplementary suggesting that differential methylation is not a prominent driver Fig. 10h, i), respectively. In contrast, for the ASD Disparate of differential gene expression. The largest overlaps are between subtype, we did not find any differentially methylated gene pro- hypomethylated co-methylation modules and upregulated co- moters or gene bodies (Supplementary Figs. 9c and 10c). Genes expression modules involved in immune processes. In particular, with differentially methylated gene promoters were largely dis- Prom.lightgreen overlaps significantly with mRNA.M19 and GB. tinct from those with differentially methylated gene bodies darkgreen overlaps significantly with mRNA.M15, suggesting that although there was a greater overlap in hypermethylated genes the ASD-associated upregulation in immune activation is, in part, due to their shared biological enrichments (Fig. 3a, b). There were regulated by a decrease in DNA methylation at promoters and 596 and 305 genes that were both differentially expressed as well gene bodies (Fig. 3e–j). as differentially methylated at gene promoters and gene bodies, respectively. As expected, there was a negative correlation Subtype-specific histone acetylation differences in ASD.We between differential expression and differential methylation for next performed differential histone acetylation analyses for ASD gene promoters (Fig. 3c). Surprisingly, there was also a negative Convergent and Disparate subtypes against control samples for correlation with gene body methylation (Fig. 3d), suggesting that H3K27ac peaks in the genome (“Methods”). For the ASD Con- in the context of ASD, gene body methylation is associated with vergent subtype, we identified 15967 differentially acetylated negative regulation of gene expression, or reflects a secondary peaks at an FDR < 10%, 8707 hyperacetylated and 7260 hypoa- response. cetylated in ASD (Supplementary Fig. 12a and Supplementary We generated co-methylation networks for both promoters Data 5). There was a strong overlap with differentially acetylated and gene bodies (Supplementary Figs. 9a and 10a), which peaks identified in a previous study (Supplementary Fig. 12c, d)10. recapitulated many aspects of the probe-level co-methylation In contrast, for the ASD Disparate subtype, we did not find any network that was previously built11 (Supplementary Fig. 11a, b). differentially acetylated peaks (Supplementary Fig. 12b). Using We identified eight ASD-associated promoter co-methylation GREAT to assess ontology enrichments for genes closest to each modules, four hypermethylated and four hypomethylated in ASD differentially acetylated peak28,wefind that genes proximal to (FDR < 0.05; Supplementary Fig. 9f). The hypermethylated ASD hyperacetylated peaks are enriched in synaptic potential, promotor modules were: Prom.midnightblue, representing coen- while genes proximal to ASD hypoacetylated peaks are enriched in zyme A biosynthesis, Prom.pink, a module with no functional neurogenesis and organ development (Fig. 4a, b). H3K27ac is enrichments, Prom.tan, enriched with oligodendrocyte cell known to mark active promoters29, and as expected, differentially markers, and Prom.turquoise, enriched with astrocyte cell acetylated peaks in gene promoters were strongly positively cor- markers and representing RNA processing. The hypomethylated related (R = 0.25) with differential expression (Fig. 4d). Surpris- modules were: Prom.brown and Prom.greenyellow, two modules ingly, we found that the relationship between differential promoter representing sensory perception, as well as Prom.lightcyan and acetylation and differential expression was cell type-specific, with Prom.lightgreen, two modules representing immune processes. hyperacetylated promoters associated with upregulated microglial At the gene body level, we identified 10 ASD-associated co- genes and downregulated neuronal genes, whereas hypoacetylated methylation modules, 4 hypermethylated and 6 hypomethylated promoters were associated with upregulated astrocyte genes and in ASD (FDR < 0.05; Supplementary Fig. 10f). The hypermethy- downregulated oligodendrocyte genes (Fig. 4e). lated modules were: GB.blue, representing RNA processing, GB. In addition to marking promoters, H3K27ac also marks distal cyan, enriched in neuron cell markers and representing the enhancers up to 1 MB away from gene transcription start sites unfolded response, GB.darkred, enriched in neuron cell (TSS)30. We utilized expression quantitative trait loci (eQTL) and

6 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 ARTICLE

ab c d 2 R = −0.1 2 R = −0.21 P = 0.0084 P = 0.00015 Promoter Slope = −4.3 Slope = −9 93 39 1 1 Gene body Promoter Gene body

0 0 2205 585 676 743 (FC) expression (FC) expression 2 2 log log −1 −1

−2 −2 −0.04 −0.02 0.00 0.02 0.04 −0.04 −0.02 0.00 0.02 0.04 Methylation change Methylation change

e AC005391.2 f Prom.lightgreen g p = 0.013 IRF7 GP1BA Immune 0.2 KRT85 AVP response−activating signal transduction CEACAM8 SOAT2 Regulation of 0.1 TREML2 TMC6 MyD88−dependent CTC−338M12.7 LAT FGF23 LSP1 toll−like receptor signaling pathway 0.0 AL118508.1 PIK3R6 IPCEF1 CD300LF Leukocyte mediated immunity −0.1

REG1P CAMP MARCO PADI4 Leukocyte activation −0.2 GSTA3 TRPV2

EYA4 MIR145 Adaptive immune ME Prom.lightgreen response RP11−536I6.2 ACT RT2 −0.3 SLC5A4 CRYBB1CD27 0.0 1.0 2.0 3.0 −log (FDR corrected p−value) −0.4 10 ASD Control

i j p = 0.0024 h PLEKHG7 MUC5B RP11−99J16__A.2 GB.darkgreen Regulation of muscle EVPL COL6A2 filament sliding Positive regulation of 0.2 UROC1 HNF1A chronic inflammatory EPPK1 MRPL23 response to antigenic TTC40 MYBPC3 TNF RBP3 stimulus Positive regulation of 0.0 OR9Q2 TNNC1 PADI1 IL36B humoral immune response mediated by circulating immunoglobulin RP11−1E3.1 PCK1 GPR182 KCNK16 Cytokine−mediated −0.2 CXCR1 signaling pathway ME GB.darkgreen CARD14 CALML6 MVP Regulation of heterotypic cell−cell −0.4 IL1RN BCAM adhesion ACE MYH16 KCNH6 0.0 1.0 2.0 −log (FDR corrected p−value) 10 ASD Control

Fig. 3 DNA methylation differences in ASD. a Overlap in ASD hypermethylated gene promoters and gene bodies. b Overlap in ASD hypomethylated gene promoters and gene bodies. c Correlation between expression and methylation changes for genes that have differential promoter methylation and are differentially expressed. A linear model was used to correlate differential expression with differential methylation. P-value is from fit of linear model. d Correlation between expression and methylation changes for genes that have differential gene body methylation and are differentially expressed. A linear model was used to correlate differential expression with differential methylation. P-value is from fit of linear model. e Top 30 hub genes and 300 connections for promoter co-methylation module Prom.lightgreen. f Top gene ontology enrichments for promoter co-methylation module Prom.lightgreen. Ontology enrichments were calculated by g:Profiler with FDR corrected p-values. g Promoter co-methylation module Prom.lightgreen eigengene values for ASD and control samples. P-value is from fit of a linear mixed effects model (see Supplementary Fig. 9f). h Top 30 hub genes and 300 connections for gene body co-methylation module GB.darkgreen. i Top gene ontology enrichments for gene body co-methylation module GB.darkgreen. Ontology enrichments were calculated by g:Profiler with FDR corrected p-values. J Gene body co-methylation module GB.darkgreen eigengene values for ASD and control samples. P-value is from fit of a linear mixed effects model (see Supplementary Fig. 10f). For boxplots in g, j, the center of the box is the median value, the bounds of the box are the 75th and 25th percentile values, the whiskers extend out from the box to 1.5 times the interquartile range of the box, and outlier values are presented as individual points. chromatin conformation capture (Hi-C) datasets from bulk adult 1.4e-5) (Supplementary Fig. 12g), neuronal Hi-C (R = 0.061, p = brain tissue31, as well as Hi-C data using sorted neuronal and glial 7.6e-5) (Supplementary Fig. 12h), and glial Hi-C (R = 0.11, p = cells from adult brain tissue (“Methods”) to link distal differential 1.4e-11) (Supplementary Fig. 12i) datasets. The correlations were H3K27ac peaks with their cognate genes (Fig. 4c; “Methods”). As a further improved when H3K27ac peaks were linked to genes using baseline, we analyzed all differentially acetylated peaks within 1 MB a combination of eQTL and bulk Hi-C (R = 0.18, p = 9.2e-4) of a differentially expressed gene TSS and found that the overall (Supplementary Fig. 12j) or glial Hi-C (R = 0.23, p = 2.9e-5) correlation between differential acetylation and differential expres- (Supplementary Fig. 12l) datasets. However, the combination of sion was minute (R = 0.018, p = 2e-8) (Supplementary Fig. 12e). eQTL and neuronal Hi-C datasets was uncorrelated (R = 0, p = The correlations improved when linking differentially acetylated 0.36) (Supplementary Fig. 12k), possibly reflecting the bias of bulk regions to differentially expressed genes using eQTL (R = 0.084, eQTL linkages to glial cells due to the greater number of glia as p = 2.3e-9) (Supplementary Fig. 12f), bulk Hi-C (R = 0.075, p = compared with neurons in the cerebral cortex32.

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 7 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1

a b c ASD up ASD down H3K27ac peak

Neuronal stem cell eQTL linkage Syntaxin−1 binding population maintenance Hi-C linkage Glucosylceramide Diencephalon metabolic process development

Protein phosphatase Embryonic digestive activator activity tract development

Embryonic digestive Respiratory burst tract morphogenesis Promoter Inhibitory postsynaptic Forebrain ventricular potential zone progenitor cell division 012345 04812 Distal − −log (FDR corrected p−value) log10(FDR corrected p−value) 10

d f

mRNA.M1

2 8.1 2.4 3.3 3.4 2.1 1.7 2.4 2.3 R = 0.25 mRNA.M2 5 (1e−118) (3.3e−06) (1.3e−27) (1.3e−20) (3.9e−100) (9.7e−16) (3.7e−98) (8e−61) P = 4.3e−18 1.3 mRNA.M3 Slope = 0.39 (0.017) Q1 1.3 1.3 1.3 1.4 1 mRNA.M4 Q2 (0.014) (0.0059) (0.029) (0.0024) mRNA.M5

0 mRNA.M6 4

mRNA.M7 (FC) expression 2 −1 Q3 Q4 mRNA.M8 1.3 log mRNA.M9 (0.039)

mRNA.M10 −2 3

mRNA.M12 Odds ratio

−2 −1 0 1 2 1.3 1.5 mRNA.M13 (0.039) (0.0034) log (FC) acetylation 2 mRNA.M14

mRNA.M15 2 e mRNA.M16 10 6.9 2.5 mRNA.M17 Q1 (1e−10) (0.0081) 8 mRNA.M18

5.7 mRNA.M19 Odds ratio Q2 (0.00013) 6 mRNA.M20 1

4 mRNA.M21 Q3 9.6 (2.9e−17) mRNA.M22

1.3 2 mRNA.M23 (0.044) 5.1 1.5 Q4 (7.6e−08) mRNA.M24 0 0 (0.03) 1.3 1.2 mRNA.M25 (0.022) (0.0043)

Bulk Bulk eQTL eQTL Neuron NeuN+ NeuN− NeuN+ NeuN− Oligoden− Microglia Astrocyte Promoter Promoter drocytes Endothelial eQTL + Bulk eQTL + Bulk eQTL + NeuN+eQTL + NeuN− eQTL + NeuN+eQTL + NeuN−

ASD up ASD down g 2 mRNA.M9: ASD upregulated 2 mRNA.M2: No change 2 mRNA.M4: ASD downregulated Astrocyte Compensatory Oligodendrocyte Contributory Synaptic vesicle Compensatory metabolism myelination transport 1 1 1

0 0 0 (FC) expression (FC) expression (FC) expression 2 −1 2 −1 2 −1 log log log

−2 −2 −2 −2 −1 0 1 2 −2 −1 0 1 2 −2 −1 0 1 2

log2(FC) acetylation log2(FC) acetylation log2(FC) acetylation

Fig. 4 Histone acetylation differences in ASD. a Top gene ontology enrichments when linking ASD hyperacetylated regions to proximal genes using GREAT28. P-values were adjusted for multiple testing by FDR correction. b Top gene ontology enrichments when linking ASD hypoacetylated regions to proximal genes using GREAT28. P-values were adjusted for multiple testing by FDR correction. c Schema to link H3K27ac regions with their cognate genes. H3K27ac peaks within promoters were directly assigned to the proximal gene. Distal H3K27ac peaks were assigned to genes using eQTL and Hi-C datasets. d Correlation between expression and acetylation changes for genes that have a differentially acetylated region within their promoter and are differentially expressed. P-value is from a linear model used to correlate differential expression with differential acetylation. The four separate quadrants are marked. e Cell type enrichments for the four quadrants in d. Enrichments were calculated using a logistic regression model and p-values, which are shown in parentheses, were adjusted for multiple testing using FDR correction. Only those enrichments with odds ratio >1 and FDR corrected p-value < 0.05 are shown. f Enrichment of cognate genes linked to differentially acetylated regions within mRNA co-expression modules. Modules with a significant relationship to diagnosis are marked along the y axes (red: increased expression in ASD; blue: decreased expression in ASD). Enrichments were calculated using a logistic regression model and p-values, which are shown in parentheses, were adjusted for multiple testing using FDR correction. Only those enrichments with odds ratio >1 and FDR corrected p-value < 0.05 are shown. g Relationship between expression and acetylation changes for differentially acetylated peaks linked to gene co-expression modules. The functional annotation for each module is represented in the top left corner. The association of each module to ASD diagnosis is represented in the top right corner as well as whether acetylation changes are contributory or compensatory to changes in expression.

8 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 ARTICLE

We provide a listing of potential cognate genes linked to compensate these transcriptional changes, there is a decrease of differentially acetylated peaks using the eQTL, neuronal Hi-C, histone acetylation at the promoters of upregulated astrocyte glial Hi-C, and bulk Hi-C linkages (Supplementary Data 5). Using genes, and an increase of histone acetylation linked to down- this list of cognate genes, we identified gene co-expression regulated neuronal genes at the same regulatory elements initially modules potentially regulated by ASD-associated acetylation impacted by the casual genetic variants. changes (Fig. 4f). We find an enrichment of ASD hypoacetylated elements in mRNA.M2, an oligodendrocyte module harboring many ASD downregulated genes, suggesting that hypoacetylation Discussion is contributory to oligodendrocyte downregulation in ASD. We integrate mRNA expression, miRNA expression, DNA Surprisingly, we find an enrichment of ASD hypoacetylated methylation, and histone acetylation datasets to identify a subtype elements in mRNA.M9, an astrocyte module that is upregulated of ASD brain samples with convergent dysregulation across the in ASD, and hyperacetylated elements in mRNA.M4, a neuron epigenome and transcriptome. By focusing on this convergent module that is downregulated in ASD. In both of these cases, ASD subtype, we identify a four-fold expansion in differentially acetylation changes are negatively correlated with expression expressed mRNAs and co-expression modules encompassing the changes. This suggests, for genes in these modules, that ASD- major processes of neuronal downregulation and immune upre- associated acetylation changes are not causal, but compensatory gulation. We identify thousands of differentially methylated gene to gene expression changes (Fig. 4g). promoters and gene bodies, but only a small proportion of the methylation changes, specifically hypomethylation of immune genes, seem to influence gene expression regulation. In contrast, Enrichments of ASD heritability in dysregulated genomic histone acetylation is a strong positive regulator of mRNA features. Differential gene expression or epigenetic changes may expression19 and we identify thousands of differentially acetylated either be contributory to, or a consequence of disease. To provide regions across the genome and furthermore assign them to cog- a causal anchor, we used stratified LD score regression33 to nate genes using eQTL and chromosome conformation datasets. partition heritability of ASD risk variants from genome-wide We find differentially acetylated regions enriched within dysre- association studies4,34 into regions of the genome that are dif- gulated astrocyte and neuronal co-expression modules. Surpris- ferentially expressed, methylated, or acetylated. We included ingly the acetylation changes in these regions are negatively inflammatory bowel disease (IBD) as comparator GWAS data- correlated with expression changes, implying that these changes set35, because IBD is a disorder of immune dysregulation, but it are most likely compensatory. does not directly affect the brain. We also included Alzheimer’s Over 50% of ASD genetic liability is carried by small effect size Disease (AD)36, because AD is a neurological disorder, but like common variants that are mostly noncoding3. One of the major IBD, has a distinct genetic profile to ASD. We found a significant challenges in characterizing these common variants is the ability enrichment of ASD and IBD heritability in genomic regions that to link noncoding regulatory regions with their cognate genes, are hyperacetylated in ASD brains (Fig. 5a). Specifically, these which are often dynamically regulated across different cell types enrichments are found at distal enhancer regions and not at gene and developmental stages. In this study, we find an enrichment of promoters (Fig. 5b), highlighting the importance of noncoding ASD genetic risk within hyperacetylated regions of the genome, regulatory elements. The IBD enrichment is likely marking specifically those linked to downregulated neuronal genes. This is hyperacetylated elements linked to microglial genes (Fig. 4d, e). consistent with a previous study, which found enrichment of ASD Cognate genes linked to hyperacetylated enhancers are enriched risk variants in a down-regulated, neuronal-associated module in within module mRNA.M4 (Fig. 4f), an ASD-downregulated ASD post-mortem brain13. Moreover, whereas enrichment of neuronal module representing genes involved in synaptic vesicle ASD risk variants in brain regulatory elements, including those transport (Fig. 5c, d). This finding supports previous observations marked by H3K27ac, has been observed previously4,37,we that common genetic risk variants for ASD are enriched in reg- observe that ASD risk variants are specifically enriched in dif- ulatory regions of neuronal genes4,13. Among the genes within ferentially regulated enhancers, providing a potential mechanistic module mRNA.M4, six of its top hub genes are linked to elements understanding of how non-coding ASD risk variants likely impact hyperacetylated in ASD, including DMTN and STX1B (Fig. 5e, f). gene regulation. The enrichment of causal risk variants and subsequent down- A major unanswered question is what molecular processes are regulation of these hub genes within the module suggests that an involved in the ASD Disparate subtype samples. We were unable increase in acetylation is likely an attempt to compensate for to find any consensus molecular dysregulation in these samples downregulation of key driver genes of this module. across all the datasets (Supplementary Figs. 7b, 8b, 9c, 10c, and Next, we analyzed whether there were ASD heritability 12b). We also assessed whether they were misdiagnosed. The enrichments in the co-expression and co-methylation network available clinical records for these individuals are sparse, but we modules (Supplementary Fig. 13a–c). We found a significant found no evidence to support misdiagnosis, nor did we find enrichment of heritability within the promoter co-methylation samples exhibiting differential expression signatures for four module Prom.green, which represents genes involved in neuro- other neuropsychiatric disorders (Supplementary Fig. 14). We genesis (Supplementary Fig. 13d, e), further strengthening the note that in 11 of the ASD subjects, the two different cortical observation that ASD risk variants reside within regulatory regions were classified into different molecular subtypes (Sup- regions of genes involved in neuronal function and neurogenesis. plementary Fig. 4d), suggesting that the extent of molecular Taken together (Fig. 6), our findings support a model whereby dysregulation may vary across regions of the cortex in these ASD risk variants perturb regulatory elements controlling the individuals. As an initial assessment of regional heterogeneity, we expression of neuronal genes, leading to the overall down- analyzed RNA-seq data derived from four additional cortical regulation of synaptic signaling and neuronal ion transport. This areas in a subset of individuals in this study21, including portions in turn leads to the transcriptional upregulation of astrocyte and of the parietal and occipital lobes, and find some evidence of microglial mediated neural immune processes through a quantitative differences in gene expression dysregulation across concomitant decrease in DNA methylation, decrease in expres- the cortex (Supplementary Fig. 15). Future studies of ASD post- sion of associated miRNAs, and increase of histone acetylation at mortem brain samples with larger sample sizes, assessing more regulatory elements of microglial genes. Finally, as a response to brain regions, and single-cell resolution, coupled to the

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 9 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1

a b FDR < 0.1 FDR < 0.1 * * 5 Acetylation ASD 5 Acetylation 4 Down ASD up 0.021 0.047 0.033 − Distal log 10

Acetylation 3 ( 0.0066 0.0018 P ASD 0.037 −value) Up (*) (*) 4 2 Acetylation Expression ASD up ASD 0.033 1 Down Promoter 0

Expression − 2.5e−06 3 log ASD 0.02 0.027 (*) 10 IBD

Up ( P +PGC −value) Methylation ASD PGC Alzheimer ASD iPSYCH Gene body ASD iPSYCH ASD 2 Down c SNURF d mRNA.M4 Methylation C12orf43 EFNA5 Gene body PARG NCOA7 Regulation of synaptic ASD vesicle priming Up SPRYD7 TBR1 MAP1A C9orf91 Methylation 1 Vesicle fusion to Promoter KIAA1279 RP13 514E23.1 EPB41L1 NOVA1 plasma membrane ASD Synaptic vesicle Down ZNF280B MAGED1 STXBP1 OPTN fusion to presynaptic Methylation active zone membrane Promoter 0.0063 0.029 0 SPTAN1 STX1B DMTN ZER1 Cytosolic transport ASD (*) Up KLC2 Positive regulation of CACNG8 SRRM4 neurotransmitter TMEM130 IBD secretion AGK PRKACA +PGC ASD PGC Alzheimer FSTL4 SYN1 0.0 0.5 1.0 1.5 2.0 2.5 ASD iPSYCH ASD iPSYCH − log10(FDR corrected p−value) HMG20A

e Chromosome 8 f Chromosome 16

21.8 mb 22 mb 30.85 mb 30.95 mb

21.9 mb 30.9 mb 31 mb NPM2 FAM160B2 REEP4 BMP1 AC106782.20 FBXL19

FGF17 LGI3 BCL7C HSD3B7

NUDT18 FBXL19-AS1 ORAI3 STX1B

DMTN HR ENSEMBL MIR4519 SETD1A

SFTPC CTF1

CTF2P eQTL

Hi-C NeuN+ Hi-C NeuN+

Hi-C NeuN- Hi-C NeuN- Hi-C Bulk Hi-C Bulk DAR DAR ASD up D ASD up

Fig. 5 ASD genetic risk variant enrichments. a Partitioned heritability enrichments for ASD, Alzheimer’s, and IBD GWAS in differentially expressed, methylated, or acetylated regions of the genome. Uncorrected p-values < 0.05 are shown. P-values were also adjusted for multiple testing by FDR correction and those enrichments with adjusted p-value < 0.1 are marked with asterisks. b Partitioned heritability enrichments for ASD, Alzheimer’s, and IBD GWAS in differentially acetylated regions of the genome within, or distal to, gene promoters. Uncorrected p-values < 0.05 are shown. P-values were also adjusted for multiple testing by FDR correction and those enrichments with adjusted p-value < 0.1 are marked with asterisks. c Top 30 hub genes and 300 connections for co-expression module mRNA.M4. d Top gene ontology enrichments for co-expression module mRNA.M4. Ontology enrichments were calculated by g:Profiler with FDR corrected p-values. e, f Genomic region around DMTN e and STX1B f. ASD-associated hyperacetylated regions are shown along with eQTL and Hi-C linkages to the gene TSS. availability of comprehensive phenotypic data will be needed to For mRNA expression, we used the quantification of log2 RPKM values for further characterize the regional and cellular specificity of mole- 16310 genes in 82 ASD samples and 74 control samples from 47 ASD and 44 9 cular changes in ASD. control brains from Parikshak et al. . These RPKM values were normalized for gene length and GC content using CQN38, but not adjusted for technical or biological covariates. We downloaded the differential mRNA gene expression summary statistics and mRNA co-expression network module definitions. We Methods calculated principal components (SeqStatPCs) to summarize the following Initial processing of datasets. All molecular datasets came from a cohort of 48 sequencing statistics: log10(TotalReads.picard), log10(Aligned.Reads.picard), log10 ASD and 45 control brains from the Harvard Autism Tissue Program [https:// (HQ.Aligned.Reads.picard), log10(PF.All.Bases.picard), log10(Coding.Bases. hbtrc.mclean.harvard.edu/] and NIH Neuro Brain Bank [http://www.medschool. picard), log10(UTR.Bases.picard), log10(Intronic.Bases.picard), log10(Intergenic. umaryland.edu/btbank/]. Initial analyses from the transcriptomic and epigenomic bases.picard), Median.CV.Coverage.picard, Median.5prime.Bias.picard, datasets have been previously published9–12. We restricted this study to only those Median.3prime.Bias.picard, Median.5to3prime.Bias.picard, AT.Dropout.picard, samples originating from the frontal or temporal cortex. In the epigenomic GC.Dropout.picard, and PropExonicReads.HTSC. fi studies10,11, samples from the Oxford and MRC London Brain Banks were also For miRNA expression, we used the quanti cation of log2 read counts mapping present, however, we removed these samples because we did not have tran- to 699 mature miRNAs in 60 ASD samples and 42 control samples from 39 ASD 12 scriptomic data for them. and 28 control brains from Wu et al. . To balance the case/control cohorts with

10 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 ARTICLE

miRNA dysregulation DNA hypomethylation

Prom.lightgreen ( Yellow ) ( Brown ) ( GB.darkgreen )

M16 M19 M15 / M19 ASD risk variants Downregulation of Upregulation of neuronal gene expression immuno-glial gene expression

M1 : Neurogenesis M9 : Metabolic processes M3 : Mitochondrial energetics M23 : Cilium signaling M4 : Synaptic vesicle transport M19 : Activated microglia M10 : Glycolytic energetics M15 : Glial maturation M16 : Synaptic signaling / learning / memory M20 : Developmental patterning M17 : Synaptic signaling / cell adhesion M21 : Ribosomal translation

M4 M9 Key: Microglia Downregulates expression

Upregulates expression

Downregulated module

Upregulated module Histone acetylation dysregulation

Fig. 6 Schematic model of molecular dysregulation in ASD. Integration across genetic, transcriptomic, and epigenomic levels in ASD. ASD risk variants primarily act with respect to neuronal genes, which are broadly down-regulated, likely indirectly, including via micro-RNA (e.g. yellow module). Other features, including another microRNA module (brown), and DNA hypomethylation are predicted to be compensatory or secondary and are associated with up-regulation of glial-immune genes. Histone acetylation patterns show a complex relationship, with some predicted to reflect an attempt to compensate for changes in gene expression (e.g. M4), while others are predicted to be likely driving the changes (microglia). Blue bar-headed and red arrows correspond to regulatory mechanisms predicted to decrease and increase gene expression, respectively. respect to age, we removed all samples from younger individuals (Age ≤10) leaving SNF, we adjusted the datasets to remove the influence of technical and biological us with 49 ASD samples and 42 control samples from 31 ASD and 28 control covariates. For mRNA expression, we fit a linear model for each gene: Expression ~ brains. These read counts were normalized for mature miRNA GC content using Diagnosis + Age + Sex + Region + RIN + Brain bank + Sequencing batch + seq- CQN38 and batch effects using ComBat39, but not adjusted for technical or StatPC1 + seqStatPC2 + seqStatPC3 + seqStatPC4 + seqStatPC5 and regressed out biological covariates. We downloaded the differential miRNA expression summary the effect of all covariates except Diagnosis. For miRNA expression, we fit a linear statistics and miRNA co-expression network module definitions. model for each miRNA transcript: Expression ~ Diagnosis + Age + Sex + Region For DNA methylation, we used the quantification of methylation beta values for + RIN + Brain bank + Proportion of reads mapping to exons + log10(Sequencing 417460 CpG probes across the genome in 74 ASD samples and 42 control samples depth) + PMI and regressed out the effect of all covariates except Diagnosis. For from 42 ASD and 27 control brains from Wong et al.11. To balance the case/control DNA methylation, we fit a linear model for each gene promoter and gene body: cohorts with respect to age, we removed all samples from younger individuals (Age Methylation ~ Diagnosis + Age + Sex + Region + Brain bank + Batch + CET and ≤10) leaving us with 56 ASD samples and 41 control samples from 33 ASD and 26 regressed out the effect of all covariates except Diagnosis. For histone acetylation, control brains. Probe quantifications were normalized using wateRmelon40. For we fit a linear model for each H3K27ac peak: Acetylation ~ Diagnosis + Age + Sex each sample, the CET score was calculated41, which is the proportion of neuronal + Region + Brain bank + CET + Fraction of reads in peaks + Duplicate read vs glial cells. For samples where we had expression and/or acetylation data but did fraction + Aligned read fraction and regressed out the effect of all covariates except not have methylation data, we assigned their CET score as the average CET score. Diagnosis. We collapsed the probe-level measurements to gene promoters and gene bodies by To identify ASD molecular subtypes, we restricted each dataset to its differential taking the average methylation level of all probes mapping onto gene promoters features between ASD and control samples. For mRNA expression, we restricted (2KB upstream of TSS to TSS) and gene bodies (TSS to transcription end site the genes to 2591 differentially expressed genes at an FDR < 10% from the (TES)) using gencode v19 annotations42. We only kept gene promoters or gene idiopathic ASD vs control analysis of Parikshak et al.9. For miRNA expression, we bodies that contained 2 or more CpG probes. We downloaded the list of restricted the miRNA transcripts to 92 differentially expressed miRNAs at an FDR differentially methylated probes and co-methylation network identified from the < 10% from Wu et al.12. For DNA methylation, we ran an initial differential cross-cortex analysis of Wong et al.11. These probes were assigned to gene methylation analysis looking at all ASD vs control samples (for details, see below promoters and gene bodies as described above. ASD vs control differential molecular analyses) and restricted the genes to 2578 For histone acetylation, we downloaded fastq files for 257 H3K27ac ChIP-seq differentially methylated promoters at an FDR < 10%. For histone acetylation, we samples as well as input samples from frontal cortex, temporal cortex, and ran an initial differential acetylation analysis looking at all ASD vs control samples cerebellum from Sun et al.10 (https://www.synapse.org/#!Synapse:syn8104916). We (for details, see below ASD vs control differential molecular analyses) and restricted mapped reads from each sample onto the hg19 reference genome using BWA- the peaks to 2156 differentially acetylated peaks at an FDR < 20%. MEM43 with default parameters. We removed duplicate reads using Picard tools We ran SNF on the four adjusted and restricted datasets using the SNFtool [http://broadinstitute.github.io/picard/]. We utilized 56503 consensus H3K27ac package in R [https://cran.r-project.org/web/packages/SNFtool/index.html]. For peaks in the genome from Sun et al.10 and quantified the levels of each consensus each dataset, we normalized the values of each feature using the peak in each sample by counting the number of overlapping reads and dividing by standardNormalization function in SNF. For each dataset, we calculated a sample- the library size (in millions of reads). The log2 normalized peak quantifications sample Euclidean distance using the dist2 function and used this distance to were used in further analyses. As a Q/C check, we used phantompeakqualtools44 to calculate a sample-sample affinity matrix using the affinityMatrix function with calculate ChIP-seq cross-correlation statistics. We removed samples failing Q/C: parameters: K = 20 and alpha = 0.5. We generated a fused affinity matrix those with total reads <10,000,000, read alignment fraction <75%, read duplication combining all 4 affinity matrices using the SNF function with parameters: K = 20 fraction >30%, normalized strand coefficient (NSC) <1.03, relative strand and T = 15 and then used the spectralClustering function to demarcate the fused correlation (RSC) <0.5, or fraction of reads in peaks <11%. We also removed all affinity matrix into 2 clusters of samples. Alternatively, we also used the cerebellum samples and samples originating from the Oxford or MRC London symnmf_newton function in Matlab from the symNMF package45 to cluster the Brain Banks. This left us with a dataset of 56 ASD samples and 48 control samples fused affinity matrix. from 35 ASD and 33 control brains. Classification of samples into the two SNF clusters. For 61 ASD and 61 control Similarity network fusion to identify molecular subtypes. For 30 ASD and 17 samples that were missing from at least one of the four datasets, we built logistic control samples present in all four datasets, we used SNF17 to cluster samples regression models to classify them into one of the two clusters identified by SNF. together based on their relationships across the four data types. Before running For each of the four datasets, we calculated the sample loadings on the first

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 11 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 principal component (PC1) of its differential features used as input to SNF. These Prediction of miRNA target genes. We predicted mRNA target genes for each PC1 loadings were transformed into Z-scores and used as predictors in the models. miRNA using TargetScan v7.227. We downloaded 3′ UTR sequences of human We built a total of twelve different logistic regression models. Three models genes and miRNA family information from the TargetScan database. For miRNAs were for samples present in three datasets: mRNA/miRNA/DNA methylation, that were identified in the previous publication12 and not present in the TargetScan mRNA/miRNA/histone acetylation, and mRNA/DNA methylation/histone default predictions, we manually curated their family conservation by visually acetylation. Six models were for samples present in two datasets: mRNA/miRNA, inspecting the multiz 46-way vertebrate alignment at their genomic locus in the mRNA/DNA methylation, mRNA/histone acetylation, miRNA/DNA methylation, hg19 human assembly of the UCSC genome browser49. For each putative miRNA- miRNA/histone acetylation, and DNA methylation/histone acetylation. Finally, UTR target site, TargetScan calculates a context++ score which takes into account three models were for samples present in one dataset: mRNA, DNA methylation, both evolutionary conservation and targeting efficiency. These context++ scores and histone acetylation. There were no samples that were present in miRNA/DNA were weighted based on affected isoform ratios. We took the top weighted context methylation/histone acetylation or miRNA only datasets. ++ score for each unique miRNA-UTR target pair. For each model, the training set was the 47 sample assignments identified by To assess enrichment of targets within miRNA co-expression modules, we first SNF. In training the model, the response variable was either 0 (for SNF group 1) or filtered for the top 25% miRNAs by connectivity (module hubs) within each 1 (for SNF group 2) and the predictors were the sample differential Z-scores. We module (kME ≥0.84, 0.82, and 0.57 for the brown, magenta, and yellow modules, performed exhaustive leave one out validation on the training set by leaving each respectively) and identified the strongest targets with a context++ score ≤−0.05 sample out, training the model with the remaining samples and predicting the for these hub genes. response of the held-out sample. For each model, we chose a cutoff to distinguish between the two groups that maximized the cross-validation accuracy. We classified the test samples by running them through the model and applying the Assignment of H3K27ac regions to cognate gene. We assigned H3K27ac chosen cutoff. regions within promoter regions (2KB upstream of TSS to TSS) to their proximal gene. For H3K27ac regions that did not lie within a gene promoter, we assigned them to their cognate gene using adult brain eQTL data and Hi-C data from bulk 31 + Outlier removal. Before running differential analyses and Co-expression/co- adult brain tissue as well as Hi-C data from sorted NeuN and NeuN- cells from methylation network analysis (WGCNA), we removed outlier samples from each of adult brain tissue (Synapse accession number: syn10248174 for NeuN-, + the four datasets. For each dataset, we calculated sample-sample correlations and syn10248215 for NeuN ). For eQTL data, we assigned a H3K27ac region to a gene removed samples with a signed Z-score >346. if the eSNP resided within the H3K27ac peak. For Hi-C data, we assigned a H3K27ac region to a gene if the promoter of that gene physically interacted with a region containing the H3K27ac peak at an FDR < 1%. We assessed all possible ASD vs control differential molecular analyses. For mRNA expression, we ran H3K27ac region to gene linkages even if they were not consistent between eQTL differential expression analysis by fitting a linear mixed effect model for each gene: and Hi-C datasets. Expression ~ Diagnosis + Age + Sex + Region + RIN + Brain bank + Sequencing batch + seqStatPC1 + seqStatPC2 + seqStatPC3 + seqStatPC4 + seqStatPC5 as Enrichment analyses. We downloaded post-mortem brain single nucleus gene fixed effects, and brainID as a random effect. For miRNA expression, we ran expression data from Hodge et al.22. The count data were normalized using log2 differential expression analysis by fitting a linear mixed effect model for each (CPM + 1). To identify markers of neuronal cell types, we ran differential miRNA transcript: Expression ~ Diagnosis + Age + Sex + Region + RIN + Brain expression analyses for a particular cell cluster against all other clusters when bank + Proportion of reads mapping to exons + log10(Sequencing depth) + PMI restricting the dataset to inhibitory neurons or excitatory neurons separately. as fixed effects, and brainID as a random effect. For DNA methylation, we ran Differential expression analyses were run in R using a linear model: expression ~ differential methylation analysis by fitting a linear mixed effect model for each cluster membership. For each cluster, we identified markers as those genes with an promoter and each gene body: Methylation ~ Diagnosis + Age + Sex + Region + FDR corrected P-value < 0.05 and a log2(fold change) >0.75. Brain bank + Batch + CET as fixed effects, and brainID as a random effect. For We downloaded cell type markers for neurons, astrocytes, oligodendrocytes, histone acetylation, we ran differential acetylation analysis by fitting a linear mixed endothelial cells, and microglia from Zhang et al.50. We downloaded microglial cell effect model for each H3K27ac peak: Acetylation ~ Diagnosis + Age + Sex + type-specific markers (fold change ≥1) from Hammond et al.25. We downloaded Region + Brain bank + CET + Fraction of reads in peaks + Duplicate read frac- markers of microglial activation from Hirbec et al.26. For module cell type tion + Aligned read fraction as fixed effects, and brainID as a random effect. For all enrichments, enrichments of co-expression vs co-methylation modules, and differential analyses, we ran them separately for ASD Convergent subtype vs enrichment with orthogonal gene lists we used logistic regression to test whether control samples and ASD Disparate subtype vs control samples. For DNA gene set 1 ~ gene set 2 using a background set of genes shared between study 1 and methylation and histone acetylation, we also ran an initial analysis looking at all study 2. ASD vs control samples before running SNF and sample classification. We used the For expression and methylation gene ontology enrichments, we used the g: R package nlme to fit linear mixed models [https://cran.r-project.org/web/ Profiler51 package in R with parameters: correction_method = fdr, max_set_size = packages/nlme/index.html]. 1000, and hier_filtering = moderate. We performed ordered queries with genes ordered by fold change for differential expression and methylation or by connectivity to the module eigengene (kME) for co-expression and co-methylation Co-expression/co-methylation network analysis. For mRNA and miRNA modules. For acetylation gene ontology enrichments, we used GREAT28 [http:// expression, we used the co-expression networks defined in the previous studies9,12. fi great.stanford.edu/public/html/index.php] with the default basal plus extension To identify ASD-associated modules, we t a linear mixed effect model for each co- association rule setting. expression module. For mRNA modules: module eigengene ~ Diagnosis + Age + Sex + Region + RIN + Brain bank + Sequencing batch + seqStatPC1 + seq- StatPC2 + seqStatPC3 + seqStatPC4 + seqStatPC5 as fixed effects, and brainID as Partitioned heritability. We ran stratified LD-score regression33 to test for a random effect. For miRNA modules: module eigengene ~ Diagnosis + Age + Sex enrichment of common variant heritability from GWAS studies of ASD4,34, Alz- + Region + RIN + Brain bank + Proportion of reads mapping to exons + log10 heimer’s disease36, and Inflammatory bowel disease35 in genomic regions of (Sequencing depth) + PMI as fixed effects, and brainID as a random effect. We ran interest. We downloaded the full baseline model of 53 functional categories these analyses separately for ASD Convergent subtype vs control samples and ASD [https://github.com/bulik/ldsc/wiki/Partitioned-Heritability] and included them Disparate subtype vs control samples. with each calculation of partitioned heritability. For differentially expressed, dif- For DNA methylation, we generated co-methylation networks separately for ferentially methylated, and co-expression/co-methylation modules, we defined both gene promoters and gene bodies. First, we set a linear model for each gene their genomic regions as each gene body ±10 KB. For differentially acetylated promoter and gene body as: Methylation ~ Diagnosis + Age + Sex + Region + regions, we defined the genomic region as each H3K27ac peak ±1 KB. Brain bank + Batch + CET and regressed out the effect of Brain bank. We generated networks with robust consensus WGCNA (rWGCNA)47 using the WGCNA package in R48. We used a soft threshold power of 9 for gene promoters Comparison to transcriptomic signatures of other disorders. We corrected and 8 for gene bodies. We created 100 topological overlap matrices (TOMs) using mRNA expression data from ASD Disparate subtype and control samples for 100 independent bootstraps of the samples with parameters: type = signed and technical and biological covariates by fitting a linear model for each gene: corFnc = bicor. The 100 TOMs were combined edge-wise by taking the median of Expression ~ Diagnosis + Age + Sex + Region + RIN + Brain bank + Sequencing each edge across all bootstraps. The consensus TOM was clustered hierarchically batch + seqStatPC1 + seqStatPC2 + seqStatPC3 + seqStatPC4 + seqStatPC5 and using average linkage hierarchical clustering (using 1 – TOM as a dissimilarity regressing out the effect of all covariates except Diagnosis. We downloaded dif- measure). The topological overlap dendrogram was used to define modules using ferentially expressed genes at an FDR < 5% for ASD, Schizophrenia, Bipolar dis- the cutreeHybrid() function with parameters: mms = 100, ds = 4, merge threshold order, Major depressive disorder, and Alcoholism from a cross-disorder analysis of of 0.1, and negative pamStage. To identify ASD-associated modules, we fit a linear neuropsychiatric disorders21. For each disorder, we calculated the first principal mixed effect model for each co-methylation module: module eigengene ~ component on the corrected expression when restricting to genes differentially Diagnosis + Age + Sex + Region + Brain bank + Batch + CET as fixed effects, and expressed in that disorder. We checked for differential loading between ASD brainID as a random effect. We ran these analyses separately for ASD Convergent Disparate subtype samples and control samples using a two-sided Mann–Whitney subtype vs control samples and ASD Disparate subtype vs control samples. U test.

12 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1 ARTICLE

Assessment of transcriptome in other cortical regions. We previously 7. Weiner, D. J. et al. Polygenic transmission disequilibrium confirms that sequenced 87 samples in 4 additional cortical regions (BA4-6, BA7, BA17, BA38) common and rare variation act additively to create risk for autism spectrum from the individuals in this study21 (Synapse accession number syn11242290). We disorders. Nat. Genet 49, 978–985 (2017). mapped sequencing reads onto the hg19 genome using STAR52 and calculated 8. Gupta, S. et al. Transcriptome analysis reveals dysregulation of innate immune RNA-seq quality control metrics using PicardTools [http://broadinstitute.github.io/ response genes and neuronal activity-dependent genes in autism. Nat. picard/]. We quantified gene expression using RSEM53 with gencode v25 anno- Commun. 5, 5748 (2014). tations42. We corrected the expression data for technical and biological covariates 9. Parikshak, N. N. et al. Genome-wide changes in lncRNA, splicing, and by fitting a linear mixed model for each gene: Expression ~ Region + Batch + Age regional gene expression patterns in autism. Nature 540, 423–427 (2016). + + + + + + Sex Diagnosis Ancestry_Genotype PMI RIN picard_rnaseq. 10. Sun, W. et al. Histone acetylome-wide association study of autism spectrum + + PCT_CORRECT_STRAND_READS picard_rnaseq.PCT_MRNA_BASES disorder. Cell 167, 1385–1397 e11 (2016). + + picard_gcbias.AT_DROPOUT star.multimapped_percent picard_alignment. 11. Wong, C. C. Y. et al. Genome-wide DNA methylation profiling identifies + + PCT_CHIMERAS star.multimapped_toomany_percent picard_insert.MED- convergent molecular signatures associated with idiopathic and syndromic + + IAN_INSERT_SIZE picard_rnaseq.PCT_INTERGENIC_BASES picard_rna- autism in post-mortem human brain tissue. Hum. Mol. Genet. 28, 2201–2211 seq.MEDIAN_5PRIME_BIAS + picard_rnaseq.PCT_UTR_BASES + star.num_A- + + + (2019). TAC_splices star.num_GCAG_splices star.num_splices star.avg_mappe- 12. Wu, Y. E., Parikshak, N. N., Belgard, T. G. & Geschwind, D. H. Genome-wide, d_read_length + Age_sqd + picard_alignment.PCT_PF_READS_ALIGNED_sqd integrative analysis implicates microRNA dysregulation in autism spectrum + picard_rnaseq.PCT_CORRECT_STRAND_READS_sqd + star.avg_mappe- disorder. Nat. Neurosci. 19, 1463–1476 (2016). d_read_length + star.num_ATAC_splices_sqd + star.num_annotated_splices_sqd 13. Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent + star.num_GCAG_splices_sqd as fixed effects and subject as a random effect. We molecular pathology. Nature 474, 380–384 (2011). regressed out the effect of all technical covariates, which created an expression dataset containing the effects of only biological covariates (subject, diagnosis, 14. Nardone, S., Sams, D. S., Zito, A., Reuveni, E. & Elliott, E. Dysregulation of cortical neuron DNA methylation profile in autism spectrum disorder. Cereb. region, age, age squared, sex, and ethnicity). – For each region, we calculated the sample loadings across the first principal Cortex 27, 5739 5754 (2017). component (PC1) of gene expression for 2591 differentially expressed genes at an 15. Zhang, W. et al. Integrating genomic, epigenomic, and transcriptomic features FDR < 10% from the idiopathic ASD vs control analysis of Parikshak et al.9. These reveals modular signatures underlying poor prognosis in ovarian cancer. Cell – PC1 loadings were then transformed into Z-scores. Rep. 4, 542 553 (2013). 16. Woo, H. G. et al. Integrative analysis of genomic and epigenomic regulation of the transcriptome in liver cancer. Nat. Commun. 8, 839 (2017). Reporting summary. Further information on research design is available in the Nature 17. Wang, B. et al. Similarity network fusion for aggregating data types on a Research Reporting Summary linked to this article. genomic scale. Nat. Methods 11, 333–337 (2014). 18. Giorgi, F. M., Del Fabbro, C. & Licausi, F. Comparative study of RNA-seq- Data availability and microarray-derived coexpression networks in Arabidopsis thaliana. – Full range of values underlying heatmaps in Figs. 2f, i, 4e, f, 5a, b and Supplementary Bioinformatics 29, 717 724 (2013). Figs. 7e, g, 8e, f, 9f, g, 10f, g, 11a–d, 13a–c are provided as a Source Data File. Harvard 19. Chen, L. et al. Genetic drivers of epigenetic and transcriptional variation in – Autism Tissue Program [https://hbtrc.mclean.harvard.edu/], NIH Neuro Brain Bank human immune cells. Cell 167, 1398 1414 e24 (2016). [http://www.medschool.umaryland.edu/btbank/], Oxford Brain Bank [https://www.hra. 20. Friedman, R. C., Farh, K. K., Burge, C. B. & Bartel, D. P. Most mammalian – nhs.uk/planning-and-improving-research/application-summaries/research-summaries/ mRNAs are conserved targets of microRNAs. Genome Res. 19,92 105 (2009). the-oxford-brain-bank/], MRC London Brain Bank [https://brainbanknetwork.cse.bris. 21. Gandal, M. J. et al. Shared molecular neuropathology across major psychiatric – ac.uk/], Raw data for mRNA, miRNA, DNA methylation, and H3K27ac from ASD and disorders parallels polygenic overlap. Science 359, 693 697 (2018). control brains (Synapse.org accession number syn4587609), Gencode [https://www. 22. Hodge, R. D. et al. Conserved cell types with divergent features in human – gencodegenes.org/], hg19 genome [http://genome.ucsc.edu/], TargetScan DB [http:// versus mouse cortex. Nature 573,61 68 (2019). fi fi www.targetscan.org/cgi-bin/targetscan/data_download.vert72.cgi], Psychencode eQTL 23. Velmeshev, D. et al. Single-cell genomics identi es cell type-speci c molecular – and Hi-C datasets [http://resource.psychencode.org/] (Synapse.org accession number: changes in autism. Science 364, 685 689 (2019). syn10248174 for NeuN-, syn10248215 for NeuN+), LD-score regression model [https:// 24. Ballas, N., Grunseich, C., Lu, D. D., Speh, J. C. & Mandel, G. REST and its github.com/bulik/ldsc/wiki/Partitioned-Heritability], Gene expression of ASD and corepressors mediate plasticity of neuronal gene chromatin throughout – control brains from other cortical regions (Synapse.org accession number syn11242290). neurogenesis. Cell 121, 645 657 (2005). All other relevant data supporting the key findings of this study are available within the 25. Hammond, T. R. et al. Single-cell RNA sequencing of microglia throughout article and its Supplementary Information files or from the corresponding author upon the mouse lifespan and in the injured brain reveals complex cell-state changes. – reasonable request. A reporting summary for this article is available as a Supplementary Immunity 50, 253 271 e6 (2019). Information file. Source data are provided with this paper. 26. Hirbec, H. et al. The microglial reaction signature revealed by RNAseq from individual mice. Glia 66, 971–986 (2018). 27. Agarwal, V., Bell, G. W., Nam, J. W. & Bartel, D. P. Predicting effective Code availability microRNA target sites in mammalian mRNAs. Elife. 4, e05005 (2015). Underlying R code to run SNF clustering and ASD/Control differential analyses is 28. McLean, C. Y. et al. GREAT improves functional interpretation of cis- available at [https://github.com/dhglab/ASD-Integration-Subtypes-Manuscript]. regulatory regions. Nat. Biotechnol. 28, 495–501 (2010). 29. Karlic, R., Chung, H. R., Lasserre, J., Vlahovicek, K. & Vingron, M. Histone fi Received: 6 February 2020; Accepted: 27 August 2020; modi cation levels are predictive for gene expression. Proc. Natl Acad. Sci. USA 107, 2926–2931 (2010). 30. Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013). 31. Wang, D. et al. Comprehensive functional genomic resource and integrative model for the human brain. Science. 362, eaat8464 (2018). 32. Azevedo, F. A. et al. Equal numbers of neuronal and nonneuronal cells make References the human brain an isometrically scaled-up primate brain. J. Comp. Neurol. 1. Chen, J. A., Penagarikano, O., Belgard, T. G., Swarup, V. & Geschwind, D. H. 513, 532–541 (2009). The emerging picture of autism spectrum disorder: genetics and pathology. 33. Finucane, H. K. et al. Partitioning heritability by functional annotation using – Annu. Rev. Pathol. 10, 111 144 (2015). genome-wide association summary statistics. Nat. Genet 47, 1228–1235 2. Ramaswami, G. & Geschwind, D. H. Genetics of autism spectrum disorder. (2015). – Handb. Clin. Neurol. 147, 321 329 (2018). 34. Autism Spectrum Disorders Working Group of The Psychiatric Genomics, C. 3. de la Torre-Ubieta, L., Won, H., Stein, J. L. & Geschwind, D. H. Advancing the Meta-analysis of GWAS of over 16,000 individuals with autism spectrum understanding of autism disease mechanisms through genetics. Nat. Med. 22, disorder highlights a novel locus at 10q24.32 and a significant overlap with – 345 361 (2016). schizophrenia. Mol. Autism 8, 21 (2017). fi 4. Grove, J. et al. Identi cation of common genetic risk variants for autism 35. Liu, J. Z. et al. Association analyses identify 38 susceptibility loci for – spectrum disorder. Nat. Genet. 51, 431 444 (2019). inflammatory bowel disease and highlight shared genetic risk across 5. Iossifov, I. et al. The contribution of de novo coding mutations to autism populations. Nat. Genet. 47, 979–986 (2015). – spectrum disorder. Nature 515, 216 221 (2014). 36. Lambert, J. C. et al. Meta-analysis of 74,046 individuals identifies 11 new 6. Leppa, V. M. et al. Rare inherited and de novo cnvs reveal complex susceptibility loci for Alzheimer’s disease. Nat. Genet. 45, 1452–1458 (2013). contributions to ASD risk in multiplex families. Am. J. Hum. Genet. 99, 37. Hannon, E., Marzi, S. J., Schalkwyk, L. S. & Mill, J. Genetic risk variants for brain – 540 554 (2016). disorders are enriched in cortical H3K27ac domains. Mol. Brain 12, 7 (2019).

NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications 13 ARTICLE NATURE COMMUNICATIONS | https://doi.org/10.1038/s41467-020-18526-1

38. Hansen, K. D., Irizarry, R. A. & Wu, Z. Removing technical variability in R01MH094714), G.R. (NIMH 1F32MH114620), H.W. (NIMH R00MH113823 and RNA-seq data using conditional quantile normalization. Biostatistics 13, NARSAD Young Investigator Award), M.J.G. (SFARI Bridge to Independence Award, 204–216 (2012). NIMH R01MH121521, and NIMH R01MH123922), S.P. (Core funds from A*STAR 39. Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray Singapore), and J.M. (Medical Research Council R005176 and K013807). expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007). Author contributions 40. Pidsley, R. et al. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics 14, 293 (2013). G.R. and D.H.G. planned the analyses and wrote the paper with assistance from all 41. Guintivano, J., Aryee, M. J. & Kaminsky, Z. A. A cell epigenotype specific authors. The analyses were performed by G.R. with help from H.W., M.J.G., J.H. and J.C. model for the correction of brain cellular heterogeneity bias and its W. H.W. processed all Hi-C datasets. C.C.Y.W. and J.M. generated, normalized, and application to age, brain region and major depression. Epigenetics 8,290–302 provided guidance on the DNA methylation dataset. W.S. and S.P. generated and pro- (2013). vided guidance on the histone acetylation dataset. 42. Harrow, J. et al. GENCODE: the reference annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012). Competing interests 43. Li, H. Aligning sequence reads, clone sequences and assembly contigs with The authors declare no competing interests. BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013). 44. Landt, S. G. et al. ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia. Genome Res. 22, 1813–1831 (2012). Additional information 45. Kuang, D., Yun, S. & Park, H. SymNMF: nonnegative low-rank Supplementary information is available for this paper at https://doi.org/10.1038/s41467- approximation of a similarity matrix for graph clustering. J. Glob. Optim. 62, 020-18526-1. 545–574 (2015). 46. Oldham, M. C., Langfelder, P. & Horvath, S. Network methods for describing Correspondence and requests for materials should be addressed to D.H.G. sample relationships in genomic datasets: application to Huntington’s disease. BMC Syst. Biol. 6, 63 (2012). Peer review information Nature Communications thanks Valerie Hu and the other, 47. Zhang, B. & Horvath, S. A general framework for weighted gene co-expression anonymous, reviewer(s) for their contribution to the peer review of this work. Peer network analysis. Stat. Appl. Genet. Mol. Biol. 4, Article17 (2005). reviewer reports are available. 48. Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinform. 9, 559 (2008). Reprints and permission information is available at http://www.nature.com/reprints 49. Haeussler, M. et al. The UCSC Genome Browser database: 2019 update. ’ Nucleic Acids Res. 47, D853–D858 (2018). Publisher s note Springer Nature remains neutral with regard to jurisdictional claims in fi 50. Zhang, Y. et al. Purification and characterization of progenitor and mature published maps and institutional af liations. human astrocytes reveals transcriptional and functional differences with mouse. Neuron 89,37–53 (2016). 51. Reimand, J., Kull, M., Peterson, H., Hansen, J. & Vilo, J. g:Profiler–a web- Open Access This article is licensed under a Creative Commons based toolset for functional profiling of gene lists from large-scale Attribution 4.0 International License, which permits use, sharing, experiments. Nucleic Acids Res. 35, W193–W200 (2007). adaptation, distribution and reproduction in any medium or format, as long as you give 52. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, appropriate credit to the original author(s) and the source, provide a link to the Creative – 15 21 (2013). Commons license, and indicate if changes were made. The images or other third party fi 53. Li, B. & Dewey, C. N. RSEM: accurate transcript quanti cation from RNA-Seq material in this article are included in the article’s Creative Commons license, unless data with or without a reference genome. BMC Bioinform. 12, 323 (2011). indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from Acknowledgements the copyright holder. To view a copy of this license, visit http://creativecommons.org/ We thank Neelroop Parikshak and Ye Emily Wu for assistance in obtaining datasets, licenses/by/4.0/. William Pembroke and Damon Polioudakis for assistance with figures, as well as members of the Geschwind lab for stimulating discussions. Funding for this work was provided by grants to D.H.G (NIMH R01MH110927, U01MH115746 and © The Author(s) 2020

14 NATURE COMMUNICATIONS | (2020) 11:4873 | https://doi.org/10.1038/s41467-020-18526-1 | www.nature.com/naturecommunications