<<

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Post-transcriptionally impaired de novo mutations

2 contribute to the genetic etiology of four neuropsychiatric

3 disorders

4 5 Fengbiao Mao1,2¶, Lu Wang3¶, Xiaolu Zhao2, Zhongshan Li4, Luoyuan Xiao5, 6 Rajesh C. Rao2, Jinchen Li4, Huajing Teng1*, Xin He6*, and Zhong Sheng Sun1,4* 7 8 1 Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, 9 China. 10 2 Department of Pathology, University of Michigan, Ann Arbor, MI 48109, USA. 11 3 Institute of Life Science, Southeast University, Nanjing 210096, China. 12 4 Institute of Genomic Medicine, Wenzhou Medical University, Wenzhou 325027, 13 China 14 5 Department of Computer Science and Technology, Tsinghua University, Beijing 15 100084, China. 16 6 Department of Genetics, University of Chicago, Chicago, IL, USA. 17 18 ¶These authors contributed equally to this work 19 * Corresponding authors 20 E-mail: 21 [email protected] (Z.S.S.) 22 [email protected] (X.H.) 23 [email protected] (H.T.) 24

25

1

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Abstract 2 While deleterious de novo mutations (DNMs) in coding region conferring risk in 3 neuropsychiatric disorders have been revealed by next-generation sequencing, the 4 role of DNMs involved in post-transcriptional regulation in pathogenesis of these 5 disorders remains to be elucidated. Here, we identified 1,736 post-transcriptionally 6 impaired DNMs (piDNMs), and prioritized 1,482 candidate in four 7 neuropsychiatric disorders from 7,748 families. Our results revealed higher 8 prevalence of piDNMs in the probands than in controls (P = 8.19×10-17), and 9 piDNM-harboring genes were enriched for epigenetic modifications and neuronal 10 or synaptic functions. Moreover, we identified 86 piDNM-containing genes 11 forming convergent co-expression modules and intensive -protein 12 interactions in at least two neuropsychiatric disorders. These cross-disorder genes 13 carrying piDNMs could form interaction network centered on RNA binding 14 , suggesting a shared post-transcriptional etiology underlying these 15 disorders. Our findings illustrate the significant contribution of piDNMs to four 16 neuropsychiatric disorders, and lay emphasis on combining functional and 17 network-based evidences to identify regulatory causes of genetic disorders.

18

19

20

2

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Introduction

2 Next-generation sequencing, which allows genome-wide detection of rare and de 3 novo mutations (DNMs), is transforming the pace of genetics of human disease by 4 identifying protein-coding mutations that confer risk 1. Various computational 5 methods have been developed to predict the effects of amino acid substitutions on 6 protein function, and to classify corresponding mutations as deleterious or benign, 7 based on evolutionary conservation or protein structural constraints 2, 3. Beside the 8 effect on protein structure and function, genetic mutations involve in 9 transcriptional processes 4, 5 via direct or indirect effects on histone modifications 6, 10 and enhancers 7 to affect pathogenesis of diseases. However, the majorities of 11 mutation are located in non-coding regions, and some of them have no relationship 12 with transcriptional regulation, but can lead to an observable phenotype or disease 13 8, suggesting the existence of another layer of regulatory effect of mutations. It has 14 been revealed that single nucleotide variants can alter RNA structure, known as 15 RiboSNitches, and depletion of RiboSNitches result in the alteration of specific 16 RNA shapes at thousands of sites, including 3'untranslated region, binding sites of 17 RBPs and microRNAs 9. Thus, the mutations can impair post-transcriptional 18 processes through disrupting the binding of micRNAs and RNA binding proteins 19 (RBPs) 10, 11, resulting in various human diseases. For example, a variant in the 3' 20 untranslated region of FMR1 decreases neuronal activity-dependent translation of 21 FMRP by disrupting the binding of HuR, leading to developmental delay in 22 patients 10. Some attempts have been undertaken to better understand the 23 interactions between mutations and binding of noncoding RNAs or RBPs. 24 Maticzka et al. developed a machine learning-based approach to predict protein 25 binding sites on RNA from crosslinking immunoprecipitation (CLIP) data using 26 both RNA structure and sequence features 12. Fukunaga et al. developed the CapR

3

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 algorithm based on the probability of secondary structure of an RNA for RBP 2 binding 13. However, identifying the network between single nucleotide mutations 3 and post-transcriptional regulation remains challenging because of the complexity 4 of the underlying interaction networks. Our and other’s methods named RBP-Var 14 5 and POSTAR15 represent initial efforts to systematically annotate post- 6 transcriptional regulatory maps, which hold great promise for exploring the effect 7 of single nucleotide mutations on post-transcriptional regulation in human diseases.

8 Increasing prevalence of neuropsychiatric disorders in children with unclear 9 etiology has been reported during the past three decades 16. Whole-exome 10 sequencing of pediatric neuropsychiatric disorders uncovered the critical role of 11 DNMs in the pathogenesis of these disorders 1. However, previous studies of these 12 disorders have focused on mutations in coding region 1, cis-regulation 17, 18, 13 epigenome 19, transcriptome 20, 21, and proteome 22, very few is known about the 14 effect of DNMs on post-transcriptional regulation. Recently, more attentions have 15 been paid on DNMs in regulatory elements and non-coding regions in 16 neurodevelopmental disorders as it is indispensable to combine functional and 17 evolutionary evidence to identify regulatory causes of genetic disorders 23, 24. Most 18 recently, a deep-learning-based framework illuminates involvement of noncoding 19 DNMs in synaptic transmission and neuronal development in autism spectrum 20 disorder 25. Therefore, it is imperative to identify the post-transcriptionally 21 regulation-disrupting DNMs related to pathology and clinical treatment of 22 neuropsychiatric disorders.

23 To test whether post-transcriptionally regulation-disrupting DNMs contribute 24 to the genetic architecture of psychiatric disorders, we collected whole exome 25 sequencing data from 7,748 core families (5,677 families were parent-probands 26 trios and 2,071 families were normal trios) and curated 9,519 de novo mutations

4

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 (6,996 DNMs in probands and 2,523 DNMs in controls) from four kinds of 2 neuropsychiatric disorders, including autism spectrum disorder (ASD), epileptic 3 encephalopathy (EE), intellectual disability (ID), schizophrenia (SCZ), as well as 4 unaffected control subjects (Supplementary Table 1). By employing our newly 5 updated workflow RBP-Var2 (Figure 1A, Supplementary Table 2) from our 6 previously developed RBP-Var 14, we investigated the potential impact of these de 7 novo mutations involved in post-transcriptional regulation in these four 8 neuropsychiatric disorders based on experimental data of genome-wide association 9 studies (GWAS), expression quantitative trait (eQTL), CLIP-seq derived 10 RBP binding sites, RNA editing and miRNA targets, and found that a subset of de 11 novo mutations could be classified as post-transcriptionally impaired DNMs 12 (piDNMs). These piDNMs showed significant enrichment in cases after correcting 13 for multiple testing, and genes hit by these piDNMs were further analyzed for their 14 properties and relative contribution to the etiology of neuropsychiatric disorders.

15 Results

16 The frequency of piDNMs is much higher in probands than that in 17 controls

18 To test whether specific subsets of regulatory DNMs contribute to the genetic 19 architecture of neuropsychiatric disorders, we devised and updated the method, 20 RBP-Var2 (http://www.rbp-var.biols.ac.cn/), based on experimental data of GWAS, 21 eQTL, CLIP-seq derived RBP binding sites, RNA editing and miRNA targets. 22 Subsequently, we used our updated workflow to identify functional piDNMs from 23 5,677 trios with 6,996 DNMs across four neuropsychiatric disorders as well as 24 2,071 unaffected controls with 2,523 DNMs (Supplementary Table 1). We 25 determined DNMs with 1/2 category score predicted by RBP-Var2 as piDNMs

5

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 when considering their impact on RNA secondary structure, the binding of 2 miRNAs and RBPs, and identified 1,736 piDNMs in probands (Supplementary 3 Table 3), of which 17,7,7,6 and 1,699 were located in 3' UTRs, 5' UTRs, ncRNA 4 exons, splicing sites and exons, respectively. In detail, RBP-Var2 identified 1,262 5 piDNMs in ASD, 281 piDNMs in SCZ, 101 piDNMs in EE, 92 piDNMs in ID and 6 354 piDNMs in healthy controls (Supplementary Table 3, 4). Interestingly, the 7 frequency of piDNMs in the four neuropsychiatric disorders were significantly 8 over-represented compared with those in controls (OR = 1.62, P = 8.19×10-17, 9 Table 1). We also observed that probands groups have much more abundant 10 piDNMs compared with controls in four kinds of neuropsychiatric disorders 11 (Figure 1B; Table 1). Dramatically, we found that synonymous piDNMs were 12 significantly enriched in probands in contrast to those in controls (P=9.73×10-4). 13 While in the data set of original DNMs before the evaluated by RBP-Var2, the 14 enrichment of the synonymous DNMs was not observed in cases, which is 15 consistent with previous study 1. To eliminate the effects of loss-of-function (LoF) 16 mutations, we filtered out those LoF mutations from all piDNMs and found the 17 non-LoF piDNMs also exhibited higher frequency in probands (P=2.36×10-14) 18 (Table 1, Figure 1C; Supplementary Figure 1). Our analysis found a subset of 19 DNMs, namely piDNMs, are enriched in probands and may contribute to the 20 pathogenesis of these disorders although the rate of all de novo synonymous 21 variants, which as a category, does not contribute significantly to risk for 22 neurodevelopmental disorders.

23 The piDNMs outperforms protein-disruptive DNMs in risk

24 prediction

6

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 To investigate the accuracy and specificity of DNMs in different regulatory 2 processes, we compared our tool with other three variant effect prediction tools, 3 including SIFT25, PolyPhen2 (PPH2)26 and RegulomeDB26. We found that the 4 frequency of the stop gain DNMs is higher in cases than in controls determined by 5 SIFT (P = 4.83×10-2), and higher frequency of nonsynonymous DNMs was 6 identified by PPH2 (P = 1.82×10-2) (Figure 2A, B). However, RegulomeDB 7 determined no significant higher frequency of functional DNMs in any functional 8 category (Figure 2C) in cases versus controls. In contrast, RBP-Var2 could 9 determine much more functional DNMs in the categories of frameshift (P = 10 1.38×10-3), nonsynonymous (P = 8.79×10-15), stopgain (P = 6.42×10-4) and 11 synonymous (P = 7.30×10-4) (Figure 2D). Then, we performed receiver operating 12 characteristic (ROC) analysis to systemically evaluate the sensitivity and 13 specificity of these four prediction methods. We found that area under curve (AUC) 14 value of SIFT, PPH2, RBP-Var2 and RegulomeDB are 78.27%, 76.57%, 82.89% 15 and 50.77%, respectively (Supplementary Figure 2), indicating that SIFT, PPH2, 16 and RBP-Var2 is more sensitive and specific than that of RegulomeDB with P 17 value 1.63×10-10, 2.40×10-8 and 2.51×10-60, respectively. In addition, the AUC 18 value of RBP-Var2 is higher than that of SIFT and PPH2 with P value 0.049 and 19 0.019, respectively. Intriguingly, RBP-Var2 could detect an additional 928 piDNMs 20 covering 665 genes that were regarded as benign DNMs by other three methods, 21 accounting for 25.27% of total 3,672 deleterious DNMs detected by all four tools 22 (Supplementary Figure 3A, B). Especially, the non-LoF piDNMs detected by RBP- 23 Var2 alone account for 52.8% of non-LoF piDNMs, while only 26.2% of non-LoF 24 piDNMs were classified to be deleterious predicted by both SIFT and Polyphen2 25 (Supplementary Figure 3C). The top three enriched ontology of these 665 26 genes were intracellular signal transduction (P = 7.41×10-6), organelle organization 27 (P = 8.90×10-6) and mitotic cell cycle (P = 2.06×10-5) (Supplementary Figure 3D;

7

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Supplementary Table 5), suggesting dysregulation involved in cell cycle and the 2 impaired signal transduction may contribute to diverse neural damage, thereby 3 trigger neurodevelopmental disorders27, 28. Therefore, the piDNMs detected by 4 RBP-Var2 are distinct, and may play significant roles in the post-transcriptional 5 processes of the development of neuropsychiatric disorders.

6 Genes hit by piDNMs are shared across four

7 neuropsychiatric disorders

8 Firslty, we identified 13 recurrent piDNMs, including seven piDNMs in ASD and 9 six piDNMs in ID (Figure 3A). Secondly, we identified 149 genes carrying at least 10 two piDNMs in all disorders, including 128 genes in ASD, three genes in EE, ten 11 genes in ID and eight in SCZ. Among these 149 genes, we identified 21 high risk 12 genes with P value < 1×10-2 derived from our previously published TADA program 13 (Transmission And De novo Association) 29 (Figure 3B). As our previous study 14 using the NPdenovo database demonstrated that DNMs predicted as deleterious in 15 the protein level are shared by four neuropsychiatric disorders30, we then wondered 16 whether there were common piDNMs among four neuropsychiatric disorders. By 17 comparing the genes harboring piDNMs across four disorders, we found 86 genes 18 significantly shared by at least two disorders rather than random overlaps 19 (permutation test, P <1.00×10-5 based on random resampling, Figure 3C, D). 20 Similar results have been observed for the overlap between the cross-disorder 21 genes of any two/three disorders and the genes in control, as well as for the 22 overlapping genes between each disorder and the control (Supplementary Figure 23 4B-O). In addition, the numbers of shared genes for any pairwise comparison or 24 any three disorders are all significantly higher than randomly expected except for 25 the comparison of EE versus SCZ (P=0.4964) (Supplementary Figure5). Our

8

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 observation revealed the existence of common genes harboring piDNMs among 2 these four neuropsychiatric disorders.

3 Genes harboring piDNMs are involved in epigenetic

4 modification and synaptic functions

5 The phenomenon of shared genes among the four neuropsychiatric disorders 6 suggest there may exist common molecular mechanisms underlying their 7 pathogenesis. Thus, we performed functional enrichment analysis for these shared 8 genes, and found they were remarkably enriched in biological processes in 9 chromatin modification like histone methylation, functional classifications of 10 neuromuscular control and protein localization to synapse (Supplementary Table 5, 11 Supplementary Figure 6). These epigenetic regulating genes are composed of 12 CHD5, DOT1L, JARID2, MECP2, PHF19, PRDM4 and TNRC18 (Supplementary 13 Table 6). Moreover, most of these epigenetic modification genes, have been 14 previously linked with neuropsychiatric disorders 31-35. Interestingly, shared genes 15 in enrichment analyses have intensive linkages among these significant pathways 16 as some of piDNM-containing genes could play roles in more than one of these 17 pathways (Supplementary Figure 7).

18 Next, to investigate the biological pathways involved in each group of 19 disorder-specific genes with piDNMs, we carried out functional enrichment 20 analysis with terms in biological process (Supplementary Table 7-9). The top three 21 enriched categories of ASD-specific genes were “macromolecule modification” (P 22 = 2.90×10-13), “organelle organization” (P = 1.22×10-11) and “cell cycle” (P = 23 1.17×10-9). With respect to genes specific to SCZ, it is actually no surprise that the 24 significantly enriched categories are related to protein localization and calcium 25 transport, which have been revealed to be involved in the pathophysiology of

9

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 schizophrenia 36. Because of the limited number of genes, only two GO terms were 2 enriched for EE-specific genes, which were “N-glycan processing” (P = 3.65×10-5) 3 and “protein deglycosylation” (P = 2.17×10-4), while no terms were statistically 4 enriched for ID-specific genes. Our observation that each group of disorder- 5 specific genes being overrepresented into different biological pathways, suggests 6 that piDNMs may also play a role in the distinct phenotypes of the four psychiatric 7 disorders although the explicit underlying mechanisms need to be further explored.

8 Co-expression modules are convergent for cross-disorder

9 genes hit by piDNMs

10 Co-expression of genes can be used to explore the common and distinct molecular 11 mechanisms in neuropsychiatric disorders 37. Thus, we performed weighted gene 12 co-expression network analysis (WGCNA) 38 for the 86 cross-disorder piDNMs- 13 containing genes based on in 16 human brain structures across 31 14 developmental stages from BrainSpan developmental transcriptome (n=524)39. The 15 results of WGCNA deciphered two gene modules with distinct spatiotemporal 16 expression patterns (Figure 4A, B; Supplementary Figure 8). The turquoise module 17 (n=55 genes) was characterized by high expression during early fetal development 18 (8-24 postconceptional weeks) in the majority of brain structures (Figure 4C). 19 Whereas, the blue module (n=22 genes) showed low expression in early fetal 20 development (8-38 postconceptional weeks) in the majority of brain structures 21 (Figure 4D). It is also crucial to clarify gene expression of these genes in early 22 development stages since altered epigenetic regulation in early development has 23 been shown to be associated with neurodevelopmental disorders 40. We found most 24 of these 86 genes are highly expressed and may be required for the normal 25 development of human embryo (Figure 4E, Supplementary Table 10). Our

10

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 observation indicates that these cross-disorder piDNMs-containing genes may play 2 important roles in not only early brain developmental but also early embryonic 3 development.

4 Protein-protein interactions are intensive for cross-disorder

5 piDNM-containing proteins

6 The co-expression results indicate that the proteins coded by the 86 cross-disorder 7 genes may have intensive protein-protein interactions (PPIs). To identify common 8 biological processes that potentially contribute to disease pathogenesis, we 9 investigated protein-protein interactions within these 86 cross-disorder piDNM- 10 containing genes. Our results revealed that 56 out of 86 (65.12%) cross-disorder 11 genes represent an interconnected network on the level of direct/indirect protein- 12 protein interaction relationships (Figure 5A; Supplementary Table 11). 13 Furthermore, we determined several crucial hub piDNM-containing genes in the 14 protein-protein interaction network, such as NOTCH1, MTOR, RYR2, and GNAS 15 (Figure 5A), which may control common biological processes among these four 16 neuropsychiatric disorders. Besides, these 86 cross-disorder proteins are indeed 17 enriched in nervous system phenotypes, including abnormal synaptic transmission, 18 abnormal nervous system development, abnormal neuron morphology and 19 abnormal brain morphology, and behavior/neurological phenotype such as 20 abnormal motor coordination/balance (P <0.05, Supplementary Figure 9A). 21 Similarly, these 56 genes in interaction network are enriched in nervous system 22 phenotype including abnormal nervous system development and abnormal brain 23 morphology (P <0.05, Supplementary Figure 9B). By investigating the expression 24 of these interacting genes in the human cortex of 12 ASD patients and 13 normal 25 donators from public datasets of GSE64018 41 and GSE76852 42, we identified 45

11

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 (80.35%) of these PPI genes were significantly differentially expressed between 2 ASD patients and normal controls (Student's t-test, q <0.05, Figure 5B). And 40 3 (71.42%) of these PPI genes were up-regulated in ASD patients while three (5.35%) 4 genes were down-regulated in ASD patients when compared with normal controls 5 (Supplementary Table 12), indicating that the majority of these PPI genes were 6 abnormally expressed in ASD patients compared with normal controls.

7 Regulatory networks between RBPs and targeting genes are

8 potentially disrupted by piDNMs

9 Dysregulation or mutations of RBPs can cause a range of developmental and 10 neurological diseases 43, 44. Meanwhile, mutations in RNA targets of RBPs, which 11 could disturb the interactions between RBPs and their mRNA targets, and affect 12 mRNA metabolism and protein homeostasis in neurons during the progression of 13 neuropathological disorders 45-47. Hence, we constructed a regulatory network 14 between piDNMs and RBPs based on predicted binding sites of RBPs to 15 investigate the genetic perturbations of mRNA-RBP interactions in the four 16 disorders (Figure 6). We identified several crucial RBP hubs that may contribute to 17 the pathogenesis of the four neuropsychiatric disorders, including EIF4A3, FMR1, 18 PTBP1, AGO1/2, ELAVL1, IGF2BP1/3 and WDR33. Genes with piDNMs in 19 different disorders could be regulated by the same RBP hub while one candidate 20 gene may be regulated by different RBP hubs (Figure 6). In addition, all of these 21 RBP hubs were highly expressed in early fetal development stages (8-37 22 postconceptional weeks) based on BrainSpan developmental transcriptome 23 (Supplementary Figure 10), suggesting their essential roles in the early stages of 24 brain development.

25 Discussion

12

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 In contrasting with the recognized role of LoF DNMs in conferring risk for 2 neuropsychiatric disorders, the effect of DNMs on post-transcriptional regulation 3 in pathogenesis of these disorders remains unknown. In this study, we 4 systematically analyzed the damaging effect of DNMs on post-transcriptional 5 regulation in four neuropsychiatric disorders, and observed higher prevalence of 6 piDNMs in probands than that in controls in four kinds of neuropsychiatric 7 disorders.

8 To date, it has been a challenge to estimate the functions of synonymous and 9 UTRs mutations though such mutations have been widely acknowledged to alter 10 protein expression, conformation and function 48. We applied RBP-Var2 algorithm 11 to annotate and interpret de novo variants in subjects with four neuropsychiatric 12 disorders based on their impact to RNA secondary structure, the binding of 13 miRNAs and RBPs. In comparison with accuracy of other prediction algorithms 14 such as SIFT, PPH2 or RegulomeDB, RBP-Var2 has highest accuracy (AUC: 15 82.89%) to differentiate affected from the control subjects. Our RBP-Var2 tool 16 identified 399 synonymous DNMs and 25 UTR’s DNMs, which were extremely 17 harmful in post-transcriptional regulation. Consistent with previous study 49, 18 synonymous damaging DNMs were significantly prominent in probands compared 19 with that in controls. Meanwhile, de novo insertions and deletions (InDels), 20 especially frameshift patterns are taken for granted to be deleterious. Indeed, de 21 novo frameshift InDels are more frequent in neuropsychiatric disorders compared 22 to non-frameshift InDels 50, which were demonstrated by predictions of RBP-Var2 23 but not SIFT or PPH2. Therefore, the updated version of RBP-Var2 will held great 24 promise for exploring the effect of mutations on post-transcriptional regulation, 25 and deciphering multiple biological layers of deleteriousness may improve the 26 accuracy to predict disease related genetic variations.

13

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Most interestingly, we found that some epigenetic pathways are enriched 2 among these piDNM-containing genes, such as those that regulation of gene 3 expression and histone modification. This finding is consistent with a previous 4 report in which more than 68% of ASD cases shared a common acetylome 5 aberrations at >5,000 cis-regulatory regions in prefrontal and temporal cortex 51. 6 Such common "epimutations" may be induced by either perturbations of epigenetic 7 regulations, including post-transcriptional regulations due to mutations of 8 substrates or the disruptions of epigenetic modifications resulting from the 9 mutation of epigenetic genes. Our observations revealed the association of 10 alterations of "epimutations" with dysregulation of post-transcription. This 11 hypothesis is consistent with the observation that several recurrent piDNM- 12 containing genes are non-epigenetic genes, including SYNGAP1, ADNP, POGZ 13 and ANK2. Moreover, we discovered several recurrent epigenetic genes which 14 contain piDNMs, including CHD8, EP300, KMT2A, KMT2C, KDM3B, JARID2 15 and MECP2, and they may play important roles in the genome-wide aberrations of 16 epigenetic landscapes through disruption of the post-transcriptional regulation. 17 Furthermore, WGCNA analysis revealed that major hubs of the co-expression 18 network for these 86 piDNM-containing genes were histone modifiers by using 19 BrainSpan developmental transcriptome. These data indicate that piDNM- 20 containing genes are co-expressed with genes frequently involved in epigenetic 21 regulation of common cellular and biological process in neuropsychiatric disorders. 22 Importantly, these 86 piDNM-containing genes harbor intensive protein-protein 23 interactions in physics, and shared regulatory networks between piDNMs and 24 RBPs in four neuropsychiatric disorders. We identified several RBP hubs of 25 regulatory networks between piDNM-containing genes and RBP proteins, 26 including EIF4A3, FMRP, PTBP1, AGO1/2, ELAVL1, IGF2BP1/3, WDR33 and 27 FXR2. Taking FMRP for example, it is a well-known pathogenic gene of Fragile X

14

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 syndrome which co-occurs with autism in many cases and its targets are highly 2 enriched for DNMs in ASD 52. Our results demonstrated that, like the mutations on 3 RBP hubs, mutations of RBP-targeting genes through disrupting their interactions 4 with multiple RBPs may synergistically result in pathogenesis of multiple 5 neuropsychiatric disorders.

6 Alterations in expression or mutations in either RBPs or their binding sites in 7 target transcripts have been reported to cause several human diseases such as 8 muscular atrophies, neurological disorders and cancer 53. Although we identified 9 1,736 piDNMs associated with neuropsychiatric disorders, the cause and explicit 10 effects of these piDNMs in these disorders need to be further validated and 11 explored. In this study, our method sheds light on evaluation of post-transcriptional 12 impact of genetic mutations especially for synonymous mutations. Additionally, as 13 small molecules can be rapidly designed to selectively target RNAs and affect 14 RNA-RBP interactions 54, our study provides new insights into RNA-based 15 therapeutic strategies for the treatment of neuropsychiatric disorders.

15

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Materials and methods

2 Data collection and filtration

3 For this study, 7,748 trios or quartets were recruited from previous whole exome 4 sequencing (WES) studies (ref), comprising 5,677 parent-probands trios associated 5 with four neuropsychiatric disorders and 2,071 control trios (Supplementary Table 6 1). After removing the overlap of DNMs between probands and controls, a total of 7 6,996 DNMs in probands and 2,523 DNMs in controls were identified for 8 subsequent analysis.

9 RBP-Var2 algorithm

10 To better interpret the catalog of DNMs, we developed a new heuristic scoring 11 system according to the functional confidence of variants based on experimental 12 data of GWAS, eQTL, CLIP-seq derived RBP binding sites, RNA editing and 13 miRNA targets, and machine learning algorithms. The scoring system represents 14 with increasing confidence if a variant lies in more functional elements 14. For 15 example, we consider variants that are known eQTLs as significant and label them 16 as category 1. Within category 1, subcategories indicate additional annotations 17 ranging from the most informational variants (1a, variant may change the motif for 18 RBP binding) to the least informational variants (1e, variant only has a motif for 19 RBP binding). In mathematical algorithms, we employed LS-GKM 55 (10-mer) and 20 deltaSVM 56 to predict the impact of DNMs on the binding of specific RBPs by 21 calculating the delta SVM scores. Moreover, for single-base mutations, we 22 employed the RNAsnp 57 with default parameters to estimate the mutation effects 23 on local RNA secondary structure and calculated the empirical P values based on 24 the probabilities of the wild-type and mutant RNA sequences. For 25 insertions and deletions, we evaluated the effects of DNMs on RNA secondary

16

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 structure using the minimal free energy generated by RNAfold 58 to calculate 2 empirical P values based on cumulative probabilities of the Poisson distribution. 3 Only the functional DNM produces >5 change in gkm-SVM scores for the effect of 4 RBP binding, and P-value < 0.1 or free energy change >1 for the effect of DNMs 5 on RNA secondary structure change were determined to be a piDNM. Only DNMs 6 occurred in exonic or UTR regions were included in our analysis.

7 Identification of piDNMs and comparison with variants predicted by 8 other methods

9 To determine the likelihood of a functional mutation in post-transcriptional 10 regulation for all SNVs and InDels, our newly updated program RBP-Var2 was 11 utilized to assign an exclusive rank for each mutation and only those mutations 12 categorized into rank 1 or 2 were considered as piDNMs. In comparison with those 13 mutations involved in the disruption of gene function or transcriptional regulation, 14 several programs such as SIFT, PolyPhen2 and RegulomeDB were used to analyze 15 the same dataset of DNMs as the input for RBP-Var2. We only kept the mutations 16 qualified as “damaging” from the result of SIFT and “possibly damaging” or 17 “probably damaging” from PolyPhen2. In the case of RegulomeDB, mutations 18 labeled as category 1 and 2 were retained. Next, we classified the type of mutation 19 (frameshift, nonframeshift, nonsynonymous, synonymous, splicing and stop) and 20 located regions (UTR3, UTR5, exonic, ncRNA exonic and splicing) to determine 21 the distribution of piDNMs, genetic variants and other regulatory variants. The 22 number of variants in cases versus controls was illustrated by bar chart (***: P< 23 0.001, **: 0.001

24 TADA analysis of DNMs in four disorders

17

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Our previously published TADA program 29, which predicts risk genes accurately 2 on the basis of allele frequencies, gene-specific penetrance, and mutation rate, was 3 used to calculate the P value for the likelihood of each gene contributing to the all 4 four disorders with default parameters.

5 ROC curves and specificity/sensitivity estimation 6 We screened a positive (non-neutral) test set of likely casual mutations in 7 Mendelian disease from the ClinVar database (v20170130). From a total of 8 237,308 mutations in ClinVar database, we picked up 145 exonic mutations 9 presented in our curated DNMs in probands. Our negative (neutral) set of likely 10 non-casual variants was built from DNMs of unaffected controls in four 11 neuropsychiatric disorders. To exclude rare deleterious DNMs, we selected only 12 DNMs in controls with a minor allele frequency of at least 0.01 in 1000 genome 13 (1000g2014oct), and obtained a set of 921 exonic variants. Then, we employed R 14 package pROC to analyze and compare ROC curves.

15 Permutation analysis for overlaps of genes with piDNMs 16 In order to evaluate the overlap of genes among any two set of genes with piDNMs, 17 we shuffled the intersections of genes and repeated this procedure 100,000 times. 18 During each permutation, we randomly selected the same number of genes as the 19 actual situation from the all RefSeq genes for each disorder taking account of gene- 20 level de novo mutation rate, then P values were calculated as the proportion of 21 permutations during which the simulated number of overlap was greater than or 22 equal to the actual observed number.

23 Functional enrichment analysis 24 A gene harboring piDNMs was selected into our candidate gene set to conduct 25 functional enrichment analysis if it occurred in at least two of the four disorders.

18

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 GO () and KEGG (Kyoto Encyclopedia of Genes and Genomes) 2 pathway enrichments analyses were implemented by Cytoscape (version 3.4.0) 3 plugin ClueGO (version 2.3.0) using genome-wide coding genes as background 4 and P values calculated by hypergeometric test were corrected to be q values by 5 Benjamini–Hochberg procedure for reducing the false discovery rate resulted from 6 multiple hypothesis testing.

7 Co-expression and spatiotemporal specificity 8 Normalized gene-expression of 16 human brain regions were determined by RNA 9 sequencing and obtained from database BrainSpan (http://www.brainspan.org). We 10 extracted expression for 77 out of 86 extreme damaging cross-disorder genes and 11 employed R-package WGCNA (weighted correlation network analysis) with a 12 power of five to cluster the spatiotemporal-expression patterns and prenatal 13 laminar-expression profiles. The expression level for each gene and development 14 stage (only stages with expression data for all 16 structures were selected, n = 14) 15 was presented across all brain regions.

16 Protein-protein interaction network of cross-disorder genes 17 Protein-protein interactions data of Homo sapiens was collected from the STRING 18 (v10.5) database with score over 0.8. For the PPI network of all cross-disorder 19 genes, we only retain the proteins with at least two links. Those nodes with degree 20 over 30 in the network were considered as hubs. Cytoscape (version3.4.0) was 21 used to analyze and visualize protein-protein interaction networks. 22 Overrepresentation of mouse-mutant phenotypes was evaluated by the web tool 23 MamPhea for the genes in the PPI network and for all cross-disorder genes 24 containing piDNMs. The rest of genome was used as background and multiple test 25 adjustment for P values was done by Benjamini-Hochberg method.

19

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Gene-RBP interaction network 2 Cytoscape (version 3.4.0) was utilized for visualization of the associations between 3 genes harboring piDNMs in the four neuropsychiatric disorders and the 4 corresponding regulatory RBPs.

5 The available data resources

6 To make our findings easily accessible to the research community, we have 7 developed RBP-Var2 platform (http://www.rbp-var.biols.ac.cn/) for storage and 8 retrieval of piDNMs, candidate genes, and for exploring the genetic etiology of 9 neuropsychiatric disorders in post-transcriptional regulation. The expression and 10 epigenetic profiles of genes related to regulatory de novo mutations and early 11 embryonic development have been deposited in our previously published database 12 EpiDenovo (http://www.epidenovo.biols.ac.cn/) 40.

13 URLs 14 RBP-Var2, http://www.rbp-var.biols.ac.cn/; NPdenovo, 15 http://www.wzgenomics.cn/NPdenovo/index.php; EpiDenovo: 16 http://www.epidenovo.biols.ac.cn/; BioGRID, https://thebiogrid.org/;

17 MamPhea , http://evol.nhri.org.tw/phenome/index.jsp?platform; BrainSpan, 18 http://www.brainspan.org; ClinVar, https://www.ncbi.nlm.nih.gov/clinvar/; 19 1000Genomes, http://www.internationalgenome.org/; WGCNA, 20 https://labs.genetics.ucla.edu/horvath/CoexpressionNetwork/Rpackages/WGCNA/;

21 esyN, http://www.esyn.org/; Cytoscape, http://www.cytoscape.org/; TADA, 22 http://wpicr.wpic.pitt.edu/WPICCompGen/TADA/TADA_homepage.htm; ClueGO, 23 http://apps.cytoscape.org/apps/cluego; pROC, http://web.expasy.org/pROC/; R, 24 https://www.r-project.org/; Perl, https://www.perl.org/.

20

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Contributions

2 F.M. and L.W. participated in the design and execution of analyses, produced the 3 figures, participated in the interpretation of results and edited the manuscript. F.M. 4 developed computational code employed in the analyses. L.W. and Z.L. developed 5 the statistical framework and drew the figures. X.Z. participated in the 6 interpretation of results, the oversight of analyses. L.X. developed and improved 7 the online platform of RBP-Var2. H.T. and R.C.R. provided professional guidance 8 in the writing and refining of the manuscript. J.L. and H. T. collected the DNMs 9 from literature and database. Z.S.S. and X.H. conceived the study, participated in 10 the design of analyses, oversaw the study and the interpretation of results, and 11 drafted and edited the manuscript.

12 Acknowledgments

13 The project was funded by National Key R&D Program of China (No. 14 2016YFC0900400). We thank Kun Zhang for his help in TADA analysis and thank 15 Leisheng Shi for his help in data analysis.

16 Competing interests

17 The authors declare no competing financial interests.

18 Figure legends

19 Figure 1. The abundance of piDNMs in different disease categories. (A) Schematic 20 overview of the RBP-Var2 algorithm. (B) The bar plot corresponds to the odds 21 ratios indicating the enrichment of piDNMs in patients from each of the four 22 neuropsychiatric disorders. (C) The relative amount of LoF and non-LoF piDNMs 23 in five neuropsychiatric disorders.

21

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Figure 2. Performance comparison of the ability to distinguish severe DNMs 2 between RBP-Var2 and three other tools. (A) Different kinds of DNMs affecting 3 protein function predicted by SIFT. The Y-axis corresponds to the proportion of 4 each kind of mutations within the total number of damaging DNMs predicted by 5 SIFT. (B) Different kinds of DNMs that affect protein function predicted by 6 PolyPhen2. The Y-axis corresponds to the proportion of each kind of mutations 7 within the total number of damaging DNMs predicted by PolyPhen2. (C) The 8 DNMs, predicted as functional elements involved in transcriptional regulation by 9 RegulomeDB, are categorized into different functional types. The Y-axis 10 corresponds to the proportion of each kind of mutations within the total number of 11 damaging DNMs predicted by RegulomeDB. (D) The DNMs classified as either 12 level 1 or 2 (piDNMs) are categorized into different functional types. The Y-axis 13 corresponds to the proportion of each kind of mutations within the total number of 14 damaging piDNMs. The P values were measured by two-sided binomial test. 15 DNMs predicted in both cases and controls are excluded in the comparison and the 16 DNMs labeled as “unknown” are not demonstrated in the bar plot.

17 Figure 3. Genes with piDNMs involved in four neuropsychiatric disorders. (A) 18 Scatter plot of eight genes harboring recurrent piDNMs among 1,736 piDNMs.

19 The Y-axis corresponds to the -log10(P value) calculated by TADA. The X-axis

20 stands for the TADA output of -log10(mutation rate). (B) Scatter plot of 21

21 recurrent genes with p < 0.01 by TADA. The Y-axis corresponds to the -log10(P 22 value) calculated by TADA. The X-axis stands for the TADA output of -

23 log10(mutation rate). (C) Venn diagram representing the distribution of candidate 24 genes shared among the four neuropsychiatric disorders. (D) Permutation test for 25 the randomness of the overlap between the 86 cross-disorder genes. We shuffled 26 the genes of each disorder and calculated the shared genes between the four

22

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 disorders, and repeated this procedure for 100,000 times to get the null distribution. 2 The vertical dash line indicates the observed value.

3 Figure 4. Weighted co-expression analysis of 86 shared genes. (A) Heatmap 4 visualization of the co-expression network of 86 shared genes. The more saturated 5 color corresponds to the more highly expressed genes. (B) Hierarchical clustering 6 dendrogram of the two color-coded gene modules displayed in (A). (C, D) The two 7 spatiotemporal expression patterns (Turquoise module and Blue module) for 8 network genes based on RNA-seq data from BrainSpan, and they correspond to 17 9 developmental stages across 16 subregions: A1C, primary auditory cortex; AMY, 10 amygdaloid complex; CBC, cerebellar cortex; DFC, dorsolateral prefrontal cortex; 11 HIP, hippocampus; IPC, posteroinferior parietal cortex; ITC, inferolateral temporal 12 cortex; M1C, primary motor cortex; MD, mediodorsal nucleus of thalamus; MFC, 13 anterior cingulate cortex; OFC, orbital frontal cortex; STC, posterior superior 14 temporal cortex; STR, striatum; S1C, primary somatosensory cortex; V1C, primary 15 visual cortex; VFC, ventrolateral prefrontal cortex.

16 Figure 5. Protein-protein interaction network of the cross-disorder genes. The 17 network of interactions between pairs of the proteins encoded by the 56 out of 86 18 cross-disorder genes.

19 Figure 6. Interaction network of RBPs and genes with piDNMs. Different roles of 20 the nodes are reflected by distinguishable geometric shapes and colors. The 21 magenta vertical arrow stands for the RNA binding proteins. Disks with different 22 colors represent the genes with piDNMs involved in different kinds of disorders.

23 Supplementary Figure 1. Excess of piDNMs in probands. The odds ratio of 24 synonymous DNMs and piDNMs were analyzed. The of filtered 25 piDNMs that not contained LoF mutations were also displayed.

23

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Supplementary Figure 2. ROC curve showing the performance of the predictions 2 of SIFT, PPH2, RBP-Var2 and RegulomeDB.

3 Supplementary Figure 3. Overlap of DNMs identified by different tools. (A) 4 Venn diagram depicting the overlap between the DNMs predicted by SIFT, PPH2, 5 RBP-Var2 and RegulomeDB. (B) Venn diagram depicting the overlap between the 6 genes predicted by SIFT, PPH2, RBP-Var2 and RegulomeDB. (C) The pie chart 7 shows the distribution of all non-LoF piDNMs. The non-LoF piDNMs detected by 8 RBP-Var2 alone account for 52.8% of all non-LoF piDNMs (pink), while the non- 9 LoF deleterious DNMs identified by both SIFT and Polyphen2 take up 26.2% of 10 all (light purple). (D) Pathway enrichment analysis of the 665 genes unique to the 11 prediction of RBP-Var2.

12 Supplementary Figure 4. Permutation test of the randomness of the overlap of 13 different set of disease genes with control. (A-K) Permutation test for the validity 14 of the gene overlap between the cross-disorder genes and the control. (L-O) 15 Permutation for the overlap of genes from each disorder with control. We shuffled 16 the genes of each disorder and calculated the shared genes between each pair, and 17 repeated this procedure for 100,000 times to get the null distribution. The vertical 18 dash line stands for the observed value.

19 Supplementary Figure 5. Test of the significance of the number of cross-disorder 20 genes involved in the four neuropsychiatric disorders. (A-J) Permutation test for 21 the validity of the gene overlap among/between every combination of three/two 22 disorders.

23 Supplementary Figure 6. Pie chart of the pathway enrichment analysis for the 86 24 cross-disorder genes.

24

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Supplementary Figure 7. Interaction network of the gene enrichment analysis for 2 the 86 cross-disorder genes.

3 Supplementary Figure 8. Relationship between Co-expression modules. (A) 4 MDS plot of genes in turquoise module and blue module. (B) Relationship 5 between module eigengenes. (C) Clustering tree based of the module eigengenes. 6 (D) heatmap of adjacency Eigengene.

7 Supplementary Figure 9. Mammalian phenotype enrichment analysis of selected 8 genes. (A) Mammalian phenotype enrichment of 86 cross-disorder piDNMs genes. 9 (B) Mammalian phenotype enrichment of 56 genes in interaction network.

10 Supplementary Figure 10. Heat map of the expression of the crucial RBP hub 11 genes during the early fetal development stages.

12

25

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Reference

2 1. Iossifov I, O'Roak BJ, Sanders SJ, Ronemus M, Krumm N, Levy D et al. The contribution of de 3 novo coding mutations to autism spectrum disorder. 2014; 515(7526): 216-221. 4 5 2. Pires DE, Ascher DB, Blundell TL. DUET: a server for predicting effects of mutations on protein 6 stability using an integrated computational approach. Nucleic Acids Res 2014; 42(Web Server 7 issue): W314-319. 8 9 3. Adzhubei IA, Schmidt S, Peshkin L, Ramensky VE, Gerasimova A, Bork P et al. A method and 10 server for predicting damaging missense mutations. Nature Methods 2010; 7(4): 248-249. 11 12 4. Chen XF, Zhang YW, Xu HX, Bu GJ. Transcriptional regulation and its misregulation in Alzheimer's 13 disease. Molecular Brain 2013; 6. 14 15 5. Lee TI, Young RA. Transcriptional Regulation and Its Misregulation in Disease. Cell 2013; 152(6): 16 1237-1251. 17 18 6. Portela A, Esteller M. Epigenetic modifications and human disease. Nature Biotechnology 2010; 19 28(10): 1057-1068. 20 21 7. Sakabe NJ, Savic D, Nobrega MA. Transcriptional enhancers in development and disease. 22 Genome Biology 2012; 13(1). 23 24 8. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M et al. Annotation of functional 25 variation in personal genomes using RegulomeDB. Genome Res 2012; 22(9): 1790-1797. 26 27 9. Wan Y, Qu K, Zhang QC, Flynn RA, Manor O, Ouyang Z et al. Landscape and variation of RNA 28 secondary structure across the human transcriptome. Nature 2014; 505(7485): 706-709. 29 30 10. Suhl JA, Muddashetty RS, Anderson BR, Ifrim MF, Visootsak J, Bassell GJ et al. A 3' untranslated 31 region variant in FMR1 eliminates neuronal activity-dependent translation of FMRP by disrupting 32 binding of the RNA-binding protein HuR. Proc Natl Acad Sci U S A 2015; 112(47): E6553-6561. 33 34 11. Chin LJ, Ratner E, Leng S, Zhai R, Nallur S, Babar I et al. A SNP in a let-7 microRNA 35 complementary site in the KRAS 3' untranslated region increases non-small cell lung cancer risk. 36 Cancer Res 2008; 68(20): 8535-8540. 37 38 12. Maticzka D, Lange SJ, Costa F, Backofen R. GraphProt: modeling binding preferences of RNA- 39 binding proteins. Genome Biol 2014; 15(1): R17.

26

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 13. Fukunaga T, Ozaki H, Terai G, Asai K, Iwasaki W, Kiryu H. CapR: revealing structural specificities of 3 RNA-binding protein target recognition using CLIP-seq data. Genome Biol 2014; 15(1): R16. 4 5 14. Mao FB, Xiao LY, Li XF, Liang JL, Teng HJ, Cai WS et al. RBP-Var: a database of functional variants 6 involved in regulation mediated by RNA-binding proteins. Nucleic Acids Research 2016; 44(D1): 7 D154-D163. 8 9 15. Hu B, Yang Y-CT, Huang Y, Zhu Y, Lu ZJ. POSTAR: a platform for exploring post-transcriptional 10 regulation coordinated by RNA-binding proteins. Nucleic Acids Research 2016: gkw888. 11 12 16. Elsabbagh M, Divan G, Koh YJ, Kim YS, Kauchali S, Marcin C et al. Global Prevalence of Autism 13 and Other Pervasive Developmental Disorders. Autism Research 2012; 5(3): 160-179. 14 15 17. Gamazon ER, Badner JA, Cheng L, Zhang C, Zhang D, Cox NJ et al. Enrichment of cis-regulatory 16 gene expression SNPs and methylation quantitative trait loci among bipolar disorder 17 susceptibility variants. Molecular Psychiatry 2013; 18(3): 340-346. 18 19 18. Bonder MJ, Luijk R, Zhernakova DV, Moed M, Deelen P, Vermaat M et al. Disease variants alter 20 levels and methylation of their binding sites. Nat Genet 2016. 21 22 19. Loke YJ, Hannan AJ, Craig JM. The Role of Epigenetic Change in Autism Spectrum Disorders. Front 23 Neurol 2015; 6: 107. 24 25 20. Gupta S, Ellis SE, Ashar FN, Moes A, Bader JS, Zhan J et al. Transcriptome analysis reveals 26 dysregulation of innate immune response genes and neuronal activity-dependent genes in 27 autism. Nat Commun 2014; 5: 5748. 28 29 21. Voineagu I, Wang X, Johnston P, Lowe JK, Tian Y, Horvath S et al. Transcriptomic analysis of 30 autistic brain reveals convergent molecular pathology. Nature 2011; 474(7351): 380-384. 31 32 22. Corbett BA, Kantor AB, Schulman H, Walker WL, Lit L, Ashwood P et al. A proteomic study of 33 serum from children with autism showing differential expression of apolipoproteins and 34 complement proteins. Mol Psychiatry 2007; 12(3): 292-306. 35 36 23. Short PJ, McRae JF, Gallone G, Sifrim A, Won H, Geschwind DH et al. De novo mutations in 37 regulatory elements in neurodevelopmental disorders. Nature 2018. 38 39 24. Liu Y, Liang Y, Cicek AE, Li Z, Li J, Muhle RA et al. A Statistical Framework for Mapping Risk Genes 40 from De Novo Mutations in Whole-Genome-Sequencing Studies. The American Journal of 41 Human Genetics 2018.

27

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 25. Zhou J, Park CY, Theesfeld CL, Wong AK, Yuan Y, Scheckel C et al. Whole-genome deep-learning 3 analysis identifies contribution of noncoding mutations to autism risk. Nat Genet 2019; 51(6): 4 973-980. 5 6 26. Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M et al. Annotation of functional 7 variation in personal genomes using RegulomeDB. Genome Res 2012; 22(9): 1790-1797. 8 9 27. Wang W, Bu B, Xie M, Zhang M, Yu Z, Tao D. Neural cell cycle dysregulation and central nervous 10 system diseases. Prog Neurobiol 2009; 89(1): 1-17. 11 12 28. Lau CG, Zukin RS. NMDA receptor trafficking in synaptic plasticity and neuropsychiatric disorders. 13 Nat Rev Neurosci 2007; 8(6): 413-426. 14 15 29. He X, Sanders SJ, Liu L, De Rubeis S, Lim ET, Sutcliffe JS et al. Integrated Model of De Novo and 16 Inherited Genetic Variants Yields Greater Power to Identify Risk Genes. Plos Genetics 2013; 9(8). 17 18 30. Li JC, Cai T, Jiang Y, Chen HQ, He X, Chen C et al. Genes with de novo mutations are shared by 19 four neuropsychiatric disorders discovered from NPdenovo database. Molecular Psychiatry 2016; 20 21(2): 290-297. 21 22 31. Brookes E, Shi Y. Diverse Epigenetic Mechanisms of Human Disease. Annual Review of Genetics, 23 Vol 48 2014; 48: 237-268. 24 25 32. Peter CJ, Akbarian S. Balancing histone methylation activities in psychiatric disorders. Trends in 26 Molecular Medicine 2011; 17(7): 372-379. 27 28 33. Egan CM, Nyman U, Skotte J, Streubel G, Turner S, O'Connell DJ et al. CHD5 Is Required for 29 Neurogenesis and Has a Dual Role in Facilitating Gene Expression and Polycomb Gene 30 Repression. Developmental Cell 2013; 26(3): 223-236. 31 32 34. Dubey N, Hoffman JF, Schuebel K, Yuan Q, Martinez PE, Nieman LK et al. The ESC/E(Z) complex, 33 an effector of response to ovarian steroids, manifests an intrinsic difference in cells from women 34 with premenstrual dysphoric disorder. Mol Psychiatry 2017; 22(8): 1172-1184. 35 36 35. Chittka A, Nitarska J, Grazini U, Richardson WD. Transcription factor positive regulatory domain 4 37 (PRDM4) recruits protein arginine methyltransferase 5 (PRMT5) to mediate histone arginine 38 methylation and control neural stem cell proliferation and differentiation. J Biol Chem 2012; 39 287(51): 42995-43006. 40

28

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 36. Wang H, Westin L, Nong Y, Birnbaum S, Bendor J, Brismar H et al. Norbin Is an Endogenous 2 Regulator of Metabotropic Glutamate Receptor 5 Signaling. Science 2009; 326(5959): 1554-1557. 3 4 37. Lotan A, Fenckova M, Bralten J, Alttoa A, Dixson L, Williams RW et al. Neuroinformatic analyses 5 of common and distinct genetic components associated with major neuropsychiatric disorders. 6 Frontiers in Neuroscience 2014; 8. 7 8 38. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. Bmc 9 Bioinformatics 2008; 9. 10 11 39. Kang HJ, Kawasawa YI, Cheng F, Zhu Y, Xu X, Li M et al. Spatio-temporal transcriptome of the 12 human brain. Nature 2011; 478(7370): 483-489. 13 14 40. Mao F, Liu Q, Zhao X, Yang H, Guo S, Xiao L et al. EpiDenovo: a platform for linking regulatory de 15 novo mutations to developmental epigenetics and diseases. Nucleic Acids Res 2017. 16 17 41. Irimia M, Weatheritt RJ, Ellis JD, Parikshak NN, Gonatopoulos-Pournatzis T, Babor M et al. A 18 Highly Conserved Program of Neuronal Microexons Is Misregulated in Autistic Brains. Cell 2014; 19 159(7): 1511-1523. 20 21 42. Werling DM, Parikshak NN, Geschwind DH. Gene expression in human brain implicates sexually 22 dimorphic pathways in autism spectrum disorders. Nature Communications 2016; 7. 23 24 43. Castello A, Fischer B, Hentze MW, Preiss T. RNA-binding proteins in Mendelian disease. Trends in 25 Genetics 2013; 29(5): 318-327. 26 27 44. Klein ME, Monday H, Jordan BA. Proteostasis and RNA Binding Proteins in Synaptic Plasticity and 28 in the Pathogenesis of Neuropsychiatric Disorders. Neural Plasticity 2016. 29 30 45. Lukong KE, Chang KW, Khandjian EW, Richard S. RNA-binding proteins in human genetic disease. 31 Trends in Genetics 2008; 24(8): 416-425. 32 33 46. Gerstberger S, Hafner M, Tuschl T. A census of human RNA-binding proteins. Nature Reviews 34 Genetics 2014; 15(12): 829-845. 35 36 47. Neelamraju Y, Hashemikhabir S, Janga SC. The human RBPome: From genes and proteins to 37 human disease. Journal of Proteomics 2015; 127: 61-70. 38 39 48. Sauna ZE, Kimchi-Sarfaty C. Understanding the contribution of synonymous mutations to human 40 disease. Nature Reviews Genetics 2011; 12(10): 683-691.

29

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 2 49. Takata A, Ionita-Laza I, Gogos JA, Xu B, Karayiorgou M. De Novo Synonymous Mutations in 3 Regulatory Elements Contribute to the Genetic Etiology of Autism and Schizophrenia. Neuron 4 2016; 89(5): 940-947. 5 6 50. Kosmicki JA, Samocha KE, Howrigan DP, Sanders SJ, Slowikowski K, Lek M et al. Refining the role 7 of de novo protein-truncating variants in neurodevelopmental disorders by using population 8 reference samples. Nat Genet 2017. 9 10 51. Sun W, Poschmann J, Cruz-Herrera Del Rosario R, Parikshak NN, Hajan HS, Kumar V et al. Histone 11 Acetylome-wide Association Study of Autism Spectrum Disorder. Cell 2016; 167(5): 1385-1397 12 e1311. 13 14 52. Parikshak NN, Gandal MJ, Geschwind DH. Systems biology and gene networks in 15 neurodevelopmental and neurodegenerative disorders. Nature Reviews Genetics 2015; 16(8): 16 441-458. 17 18 53. Kechavarzi B, Janga SC. Dissecting the expression landscape of RNA-binding proteins in human 19 cancers. Genome Biology 2014; 15(1). 20 21 54. Costales MG, Haga CL, Velagapudi SP, Childs-Disney JL, Phinney DG, Disney MD. Small Molecule 22 Inhibition of microRNA-210 Reprograms an Oncogenic Hypoxic Circuit. J Am Chem Soc 2017; 23 139(9): 3446-3455. 24 25 55. Lee D. LS-GKM: a new gkm-SVM for large-scale datasets. Bioinformatics 2016; 32(14): 2196-2198. 26 27 56. Lee D, Gorkin DU, Baker M, Strober BJ, Asoni AL, McCallion AS et al. A method to predict the 28 impact of regulatory variants from DNA sequence. Nature Genetics 2015; 47(8) : 955-+. 29 30 57. Sabarinathan R, Tafer H, Seemann SE, Hofacker IL, Stadler PF, Gorodkin J. RNAsnp: Efficient 31 Detection of Local RNA Secondary Structure Changes Induced by SNPs (vol 34, pg 546, 2013). 32 Human Mutation 2013; 34(6): 925-925. 33 34 58. Gruber AR, Lorenz R, Bernhart SH, Neuboock R, Hofacker IL. The Vienna RNA Websuite. Nucleic 35 Acids Research 2008; 36: W70-W74. 36

37

30

bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A

starBase miRNAs NPdenovo (miRbase)

RNA structure CLIPdb rbDNMs eQTLs RBP- RBP motif binding regions GEO SVM Splicing m6A

GWAS delta COSMIC SVM ClinVar DISEASES

Expression Annotation (mRNA/miRNA) Classifying piDNMs system HapMap 1000Geome LD (plink)

B 4.46e−05 8.19e−17 8.19e−17 8.87e−08 1.5

0.02

−log10(P value) 16 1.0 12 OR 8

4

0.5

0.0

EE ID SCZ ASD ALL Disorders C

1500

90.61%

1000 91.13% LoF

piDNMs nonLoF

500

91.81%

84.78% 86.14% 0

ID EE SCZ ASD ALL Disorders bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B * Case Case 1788 2041

0.30 SIFT 558 Control PPH2 Control 611 0.20 Frequency Frequency 0.10

* 0 0 0 0 7 5 2 1 43 7 0 0 0 0 0 0 0 0 2 1 0 0 0 0 0.00 0.00 0.10 0.20 0.30

frameshift nonframeshift synonymousnonsynonymous splicing stopgain stoploss frameshift nonframeshift synonymousnonsynonymous splicing stopgain stoploss

C D

0.03 *** RegulomeDB RBP-Var2 160 52 Case 1104 Case Control Control 0.02

211

77 24 *** 397 0.01 Frequency Frequency 91 *** * 14 3 114 11 9 45 17 3 0 1 1 0 0 7 29 10 4 0 0 1 0.00 0.00 0.05 0.10 0.15 0.20

frameshift nonframeshift synonymousnonsynonymous splicing stopgain stoploss frameshift nonframeshift synonymousnonsynonymous splicing stopgain stoploss bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

A B

SYNGAP1 TCF4 TNRC6B CHD8● ● ●POGZ STXBP1●● ● ● WDR45 SUV420H1 ●

4 ILF2 ● ● MSL2 TLK2 ● 6 factor(dn.LoF) 3 ●DYNC1H1 ● ● 0 EEF1A2 ● SETD5 ● 1 factor(dn.LoF) ● 2 ● 0 ● 3 ● 2 ● 5 ● 3 ● 6

2 −log10(pval.TADA)

−log10(pval.TADA) ● 8 TANC2 4 ● ● 16 ● ITPR1 DEAF1 FHDC1 ● ● RANBP17 ● GIGYF2 A ● ● HADHA ●DYNC1H1 ● EEF1A2 1 RALGAPB ● RYR2 ● ● NLGN2 TNRC18● NFASC ● ● 2 PTK7

3.6 3.9 4.2 4.5 4.0 4.5 −log10(mut.rate) −log10(mut.rate)

C D

EE ID 0.06

70 62 SCZ

ASD 4 20 3 0.04 p value < 1e-5 2 0 1049 226 Frequency

0 0.02 12 1 4 1 39 0.00

25 50 75

Number of connections E C A AMY MFC M1C OFC CBC DFC STC VFC STR V1C A1C S1C HIP IPC ITC MD bioRxiv preprint not certifiedbypeerreview)istheauthor/funder,whohasgrantedbioRxivalicensetodisplaypreprintinperpetuity.Itmadeavailable Turquoise Module(n=55) 8-9 pcw 10-12 pcw 13-15 pcw doi:

Primitive.endoderm 16-18 pcw

https://doi.org/10.1101/175844

Sperm 19-24 pcw Oocyte 25-38 pcw 0-5 mos

X4.cell 6-18 mos

Pronucleus 19mos-5 yrs MII.oocyte 6-11 yrs

12-19 yrs X2.cell under a 20-40 yrs

; Zygote this versionpostedNovember26,2019. CC-BY-NC-ND 4.0Internationallicense

Trophectoderm Normalized expression Middle.blastocyst −1.0 −0.5

0.0 0.5 1.0 B

ICM Late.blastocyst Height

D

X8.cell 0.6 0.7 0.8 0.9 1.0 Early.blastocyst

AMY MFC CBC OFC M1C DFC STR STC VFC V1C A1C S1C

HIP IPC ITC MD Pachytene.spermatocyte

The copyrightholderforthispreprint(whichwas

. Round.spermatids 8-9 pcw Blue Module(n=22)

Primordial.germ.cell 10-12 pcw

Dynamic Tree Cut Epiblast 13-15 pcw

hclust (*,"average")

as.dist(dissTOM) ESC 16-18 pcw 19-24 pcw

Trophoblast 25-38 pcw

Morulae 0-5 mos Syncytiotrophoblast 6-18 mos 19mos-5 yrs 6-11 yrs IRS1 GRAMD2 CTTNBP2 BCORL1 SMAP2 EEF1A2 PITX1 FHOD1 PPP6R2 RANBP17 ZC3H3 RYR2 ITPR1 KIF24 ALDH5A1 GNAO1 SPATA5 KIF18A MED13L ARMC9 PRDM4 MDM1 RALGAPB POGZ TNKS2 MECP2 DOT1L PHIP USP38 DSG2 RUSC1 KPNA1 TNPO2 MKI67 STXBP1 TANC2 NEB MYH10 HBS1L SETD5 MYO5A COL4A3BP NUMA1 INTS1 TNRC18 TCF4 DYNC1H1 OBSCN CELSR2 CHD5 A2M NOTCH1 PHF19 LLGL1 CIC NEDD9 ALS2CL DDX50 MTOR JARID2 INTS10 AP3D1 SHKBP1 LLGL2 CDC34 BAIAP2 GNPTG DEAF1 ATP2B4 CTSL OXA1L HNRNPU GNAS 12-19 yrs 20-40 yrs −3 −2 −1 0 1 2 bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

RHOQ

GCGR

RHOF

RHOH

HTR2C

A RHOT2 GTPBP2

RHOBTB1

RHOBTB2 TTN GNG3

F2 RHOU

IGF2

FZD3 RRAS RHOJ

GAB2

RHOT1 RHOV

ADCY7

MYH8

RAC3 KNG1

EDN1

AVPR1A

MYL3 JMJD7-PLA2G4B RHOD

PARD6G PLA2G4D ACTN2

AVPR1B CHRM3

PDGFRA

IL2RA PTPN11

MRAS

GNA11

P2RY2

MYL2 ADCY5

LLGL2 PLCD1

GNRHR RHOG

ADCY4 MECP2 EDNRA

PRKCI

PARD6B

NEB RHOC FGFR2 KIT AGTR1 FLT1 DNMT1

LPAR2 MYL1

GNG7 PIK3C2G

AGT FGFR1

OBSCN RRAS2 PRKAR2A DNMT3A

CRKL

JAK1

H2AFJ OPRM1 RHOB RALA ADCY8 SYNGAP1

RET GNAQ PTGER1 PARD6A A2M HIST2H2BE

HTR5A FLT4 VAV2 AEBP2 ADRA1B HIST1H2BD FCGR2A ESR1 GNG8 PRKAR1A

KRAS RAC2 PLA2G4E

GNG12

BLNK CCR5 NRAS MYH6 LLGL1 RALGAPB HIST1H2AC PHF1 SHC1

ACTC1 MYO1C TNNC1 ARPC1B

IGF1 MAPK8 HIST1H2BA

GNAT2

HIST1H2BN

FGFR4 NCKIPSD GNAS HIST1H2BL PLA2G4B IL2RG H2AFX FN1

F2R

MAPK10 INPP5D GNG11 GAB1

ACTG1 HIST1H2AJ WIPF3 MYO9B

JAK2

GNB4

EDNRB

GNG2 ADCY2 CRHR1 RHOA

HRAS H2AFZ ACTG2 FOXO1 HIST1H2BJ BTK MAPK11 CD3G

VAV3 TNNC2 CXCR4 OGT MET PTGFR FGFR3

PLA2G4A AKT3 MYH10 PLCD4 INSR NGF HIST1H2BK ARPC5 PRKAR1B

GNB5

FCGR1A RBBP7 IQGAP1 GDNF EZH2 CBL ULK1 ADCY3

LPAR3

PHF19 GNA15 WAS ACTA2 JARID2

PIK3CB CHRM1

GNB3 WASF3 BAIAP2 PF4 PRDM4 JAK3 BCAR1 H2AFV GRM5

PLCB2 GNG4 SYK ULK2

HIST2H2AC

RAC1 VTI1B AKT2 FCGR3A GNGT2 ARPC2 MAPK12 EGF GNAI1 DLG3 DLG2 HDAC1 PIK3CG WIPF2 GNG13 MAPK14 TNNI3 HIST2H3A CHD5 SUZ12

LPAR1

MYH9 ADCY9 HDAC2 GNG10 ABI1 CRK PIK3R3

WASF1 ERBB2

VAV1 GNG5 ADCY6 EED

RPS6KB1 OXTR HIST1H2AD GNB2 MYH2 PRKACG EEF1B2

GNAT1 GNAI3 GRM1 SOS1 HBS1L

GNGT1 NCK1 PIK3R5 WIPF1 MAPK1 GFRA1 ARPC1A ETF1 DNMT3B PRKCE MTF2 IRS2

PRKCQ PLCG2 PIK3CA MEF2A

BRK1 ADCY1 NCKAP1

PLCZ1 IL2

CDC42 CALM1 PRKCA

PRKACB PIK3CD PRKCB

ACTR2 CALM2

PLCB1

LARS

IRS4 NLGN2 IGF1R STXBP1 PRKCG MYO5AWASL ARRB1

MEF2C PRKG2 PPP1CB IRS1 EEF1A2 TCF4 CALML3

PLCG1 PXN IL2RB

ABI2 KDR MEF2D VAMP2 LCK GRB2 MAPK3

GNAT3 NGFR

ACTR3

CACNA1F

SNAPC3 GNAI2

PTEN PIK3R2 ELL GNAO1 GNA14 ITPR2 CABIN1

PLCB3

EXOC3 PTK2 PRKCD DOT1L INTS14 ARPC3

RBBP4 TBL1X ACTB

DVL2

ELL2 MYOD1 CALML5 ITPR3 NCKAP1L CACNA1C VEGFA

F8 CALM3 EGFR USP38 CCNT1

YWHAZ

RPS6KB2

CXCL12 ELL3 ASCL1

APP STX4 IL6 SNAPC5 ABL1 PDGFRB ARPC4 CACNA1B

HDAC3 CACNA1S LEP

FYN STX1A SRC

MLST8

INS

CACNA1D TAF5

GTF2A1 STAT3

TBL1XR1 RPTOR

ATOH1

CD79A

TAF8 INTS9 CTNNB1 IL4 ERBB4

PIK3C3 CALML6 F5

DVL3 AKT1 NRXN3 WNT5A ATP2B4 INTS3

ITPR1NOS1 TLE1

SNAPC4 PRKAR2B UBA52

ANK1 SNAP25 CD79B EFTUD2 PRKACA ICE2 MTOR

NRXN2 SYT5 SERPINA1 NEDD9 NABP1 CAMK2B

NRXN1 HIF1A SLC2A4 FIP1L1 CREBBP IKBKB

ITGA2 POLR2B

HERC6 RBCK1

RPRD1B

TP53 CBLB

RAB11A PDPK1

NCBP1 SOCS1 ANAPC11

FKBP1B CUL3

FBXO15 YWHAH CHUK FBXL13

POLR2K

PLOD1 YWHAQ UBE2V2 HES5 FBXO2 PPARG FBXO27 CDRT1

UBE3A UBE3B WSB1 HSP90AA1 LONRF1 UBAC1 ATG7 FBXL19 PHAX PLN

TRIP12 UBE2G1 SMURF1 SIAH1

TRIM63 HECTD2 CDC23 CCND1 INTS1 DVL1 NOTCH1

SP1 ASB18 FBXO21 EP300 SOCS3 GTF2F2 YWHAB UFL1 POLR2F

NFKBIA

POLR2G CCNT2 SPSB1 ATP2A2

FBXL16 FBXW9 ASB17 LNX1 NSF UBE2Z PLOD2 UBA6 CCNC FBXW7 ZNRF2 ZNRF1 HNRNPU NAPA RNF7 UBE2K BCORL1 TRIM4 POLR2D PJA1 UBE2B BTRC DCAF1 ITCH TAF11 UBE2FTRIM9 KBTBD8 UBA7

ANK2 RELA FBXL3 NCBP2 RAB8A RNF144B RYR2 NFKB1 KCTD6 POLR2I YY1 FBXW12

FBXW8 PLOD3 ATP2A1 CUL1 UBE2L6 SH3RF1 RCHY1 PRKN MED13L FBXL5 RBX1 UBB UBE2D1 HACE1 NEDD4L GTF2F1 TRIM50 RNF138 RNF130 ICE1 MIB2 DTX3L UBR1 ARIH2 CDC26 DZIP3 FBXO10

POLR2C KLHL22 TRIM32 SPSB2 KPNA1 NCOR2

UBE2O ANAPC4 POLR2L SFN FBXL14 KEAP1

ELOB INTS5 SKP2 RBBP6

GTF2E2

TRIM36 CUL2 RPRD1A POLR2J

PPARGC1A NCOR1 AXIN1 RNF34

POU2F1 RPS27A PPP2R5D ZC3H8

FZR1 POLR2E RNF14 PPP2CA ASB2

INTS12 CUL7

RPRD2 YWHAE STX5 LRR1 DNAH8 DET1 KLHL9 HERC3 RNF19B POU2F2 ASB12 CDK7 MGRN1 GTF2A2 LRRC41 FBXL18 TANC2

UBC INTS11 VHL UBE3D TAF13 GTF2E1 PPP1CC BCL2 SHH CCNK UBE2Q1 INTS2 INTS7 PPP2R5C

ZNF143 PPP2R5A YWHAG TNKS2 FBXL7 UBE2E2 UBE2C AURKA KLHL5 FBXO9 BCL2L1

UBA1 SMURF2 NEDD4 CDK8 ASB11

HLA-DRB1 ASF1A UBE2L3 CLIP1

YKT6 GTF2B RNF213 INTS4 PPP6R2 RNF111 MKRN1 SF3B1 MEX3C UBE2V1 PPP2R5B GOSR2 COL7A1 INTS13 RAB6A TBP NABP2 PPP2CB UBE2U PPP2R5E FBXO32 KPNA2 PPP1CA SNAPC1 HECTD1 TRAIP INTS6 UBE2S RNF126

RANBP2 INTS8

KPNB1

TMED2 SKP1 UBE2E3 UBE3C BET1 SNW1 COPG1 RAB1B TMED9 ZNF645 FBXO40 NUDC RNF123 TAF6 UBE2D3

DYNC2H1 DYNC1LI2 COL18A1 ASB15 WWP1

CENPE SNAPC2 FBXL15 HERC4 KIF3C RAB1A ARF1 ANAPC7 KIF4B GOLGA2 SEC13 TRIM21 CTSL UBE2H

CDK9 TMED7 UBE2D2 TAF9 TMED10 FBXW5 PLK1 GORASP1 HERC2 TUBGCP4 KLC3 ARFGAP1 DYNC1I1 UNKL UBE2M PCK1 LMO7 TRIM11 CD59

KLHL20 COL15A1 FBXW11 DCTN4 KIFAP3 KDELR3 HLA-DQA2 FOLR1 ANAPC2 KIF5B ASB5 CUL5

UBE2Q2

RLIM FBXO7 FBXO31

CSNK1D KLHL11 CCNF

TUBG1 HLA-DRB5 ASB16

UBE2G2

ARFGAP3 KBTBD13 HLA-DMA HLA-DPA1 HECTD3 PJA2 ANAPC5 HECW2 AREL1 UBR2 KLHL41 FBXO4 SIAH2 ASB7 CKAP5 LRSAM1 RNF4 HAUS6 CDC27 NEDD1 DCTN2 UBE2N

ASB9 INTS10 DYNLL1 DYNC1I2 SPTBN2 UBA3

DYNC1H1KIF26A KLC1 NUMA1 RNF217

CDC20 KIF2B FBXL12

FBXO41

ELAVL1 UBOX5 UBE2J1 KLHL21 RAB7A ASB14 ASB6 KDELR2 ELOC COL4A3 HLA-DPB1 RNF182

KIF15 KBTBD6 HLA-DRA ARF3 CDK1 UBR4 FBXL4 TUBGCP3 DCTN6 ARFGAP2

KIF20A TRIM41 RACGAP1 ACTR1A

KIF5A PAFAH1B1 FBXO6 MYLIP TRIM37 KLHL2

COPB1 MAPRE1 TRIM71

RNF6

ASB10

B9D2 ZBTB16

COL4A3BP DSG2 ASB13 TUBGCP6 KIF2A RNF114

ANAPC10 UBA5 DCTN1 BTBD6 TMED3 ANAPC13 HERC5 HERC1

FBXO17

COPE ARF5 ARCN1 PPP2R1A CDC34 ASB1

TLK2

COPA KLC2

KDELR1 RNF220

BTBD1

FBXL20 ASB4

TUBGCP2 CLASP1

ALDH5A1 UBE2D4 UBE2E1 RILP HLA-DOA KIF3B GBF1

KLHL42 ASB8 NDEL1 KLC4 TRAF7 ACTR1B

STUB1 KIF4A KLHL3 HUWE1 TUBG2

TRIM39 UBE2J2 LTN1

RNF19A

DCTN5 PSMB2 SPSB4 KIF3A UBE2R2

ARF4 KLHL25

COPB2 CDC16 CCNB1 OSBPL1A

TPX2

KIF23 CASP3

FBXO44 RNF41

UBE2A HLA-DMB KIF2C KIF11 KIF18A FBXO30

TRIM69

FBXO22

BIRC5

KBTBD7

XPO1 DYNC2LI1 RNF25 DCTN3

TNPO2

RANGAP1

UBE4A

HLA-DOB

ANAPC1 FBXL8

COPZ1 SPATA5 SEPT2

MKI67

KIF22

CCNB2

CCNA2

CDCA8 B

LLGL2 COL7A1 CHD5 3 GNAS MYO5A STXBP1 DYNC1H1 2 GNAO1 RYR2 ATP2B4 TNPO2 1 MYH10 ITPR1 NLGN2 HNRNPU 0 HECTD1 PPP6R2 INTS1 SYNGAP1 TANC2 BAIAP2 MED13L MTOR NUMA1 TCF4 CABIN1 MECP2 OBSCN BCORL1 DOT1L CTSL SPATA5 TNKS2 A2M NOTCH1 IRS1 COL4A3BP KPNA1 RALGAPB JARID2 INTS10 PRDM4 HBS1L PIK3R3 TLK2 CTL_UMB5168 CTL_AN15566 CTL_AN19760 CTL_AN01410 CTL_AN08161 CTL_AN01125 CTL_AN17425.1 CTL_AN08161.1 CTL_AN10679 CTL_UMB5079 CTL_AN17425 CTL_UMB5079.1 CTL_AN15088 ASD_AN01971 ASD_AN00493 ASD_AN12457 ASD_AN04682 ASD_AN01570 ASD_AN02987 ASD_AN09714 ASD_AN08043 ASD_UMB5278 ASD_AN03632 ASD_AN08166 ASD_AN08792 bioRxiv preprint doi: https://doi.org/10.1101/175844; this version posted November 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

RYR2 SHKBP1 RUSC1 RALGAPB STXBP1 PRDM4

SYNGAP1 PIK3R3

YTHDC1 PHIP SMAP2 PHF19

TNRC18 NLGN2

RANBP17 NUMA1 WDR45 FUS IGF2BP1 NEDD9 AGO ZC3H3

TNKS2 U2AF65 EWSR1 eIF4AIII PITX1 MYO5A

SPATA5 NOTCH1 WDR33 TLK2 MECP2 AGO3 ZC3H7B TNPO2 NEB HECTD1

IRS1 IGF2BP3 OXA1L IGF2BP2 TANC2 FXR2 MYH10

A2M DYNC1H1 MTOR USP38 INTS10 ZNF445 MED13L MKI67 MDM1 BAIAP2 FAM134A BCORL1 CstF64 TCF4 LIN28 LLGL2 AGO1 RBP ERV3-1 nSR100 ELAVL1 CTTNBP2 C22ORF28 LLGL1 DGCR8 SETD5 ASD & EE ATP2B4 GNAS KPNA1 POGZ PTBP1 PPP6R2 ASD & ID GRAMD2 COL4A3BP KIF24 TARBP2 LIN28A HNRNPU DDX21 JARID2 OBSCN AP3D1 ORF57 DOT1L ASD & SCZ

HBS1L ALDH5A1 ITPR1 AGO2 EE & ID CIC DSG2 FMR1 UPF1 ALS2CL INTS1 EE & SCZ KIF18A GNPTG DAGLB

MOV10 CABIN1 RBM47 ARMC9 ID & SCZ COL7A1 CTSL CDC34 GNAO1

FHOD1 ASD & EE & ID CELSR2

CHD5 EEF1A2 ASD & EE & SCZ Top3b SFRS1 DDX50 DISP1 DEAF1 ASD & ID & SCZ