
LETTER doi:10.1038/nature10989

Sporadic autism exomes reveal a highly interconnected network of de novo mutations

Brian J. O’Roak¹, Laura Vives¹, Santhosh Girirajan¹, Emre Karakoc¹, Niklas Krumm¹, Bradley P. Coe¹, Roie Levy¹, Arthur Ko¹, Choli Lee¹, Joshua D. Smith¹, Emily H. Turner¹, Ian B. Stanaway¹, Benjamin Vernot¹, Maika Malig¹, Carl Baker¹, Beau Reilly², Joshua M. Akey¹, Elhanan Borenstein¹,³,⁴, Mark J. Rieder¹, Deborah A. Nickerson¹, Raphael Bernier², Jay Shendure¹ & Evan E. Eichler¹,⁵

It is well established that autism spectrum disorders (ASD) have a strong genetic component; however, for at least 70% of cases, the underlying genetic cause is unknown¹. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes, so-called sporadic or simplex families²,³, we sequenced all coding regions of the genome (the exome) for parent–child trios exhibiting sporadic ASD, including 189 new trios and 20 that were previously reported⁴. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19)⁴, for a total of 677 individual exomes from 209 families. Here we show that de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modest increased risk for children of older fathers to develop ASD⁵. Moreover, 39% (49 of 126) of the most severe or disruptive de novo mutations map to a highly interconnected β-catenin/chromatin remodelling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes: CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3 and SCN1A. Combined with copy number variant (CNV) data, these results indicate extreme locus heterogeneity but also provide a target for future discovery, diagnostics and therapeutics.

We selected 189 autism trios from the Simons Simplex Collection (SSC)⁶, which included males significantly impaired with autism and intellectual disability (n = 47), a female sample set (n = 56) of which 26 were cognitively impaired, and samples chosen at random from the remaining males in the collection (n = 86) (Supplementary Table 1 and Supplementary Fig. 1). In general, we excluded samples known to carry large de novo CNVs². Exome sequencing was performed as described previously⁴, but with an expanded target definition (see Methods). We achieved sufficient coverage for both parents and child to call genotypes for, on average, 29.5 megabases (Mb) of haploid exome coding sequence (Supplementary Table 1). In addition, we performed copy number analysis on 122 of these families, using a combination of the exome data, array comparative genomic hybridization (CGH), and genotyping arrays, thereby providing a more comprehensive view of rare variation.

In the 189 new probands, we validated 248 de novo events: 225 single nucleotide variants (SNVs), 17 small insertions/deletions (indels), and six CNVs (Supplementary Table 2). These included 181 non-synonymous changes, of which 120 were classified as severe based on sequence conservation and/or biochemical properties (Methods and Supplementary Table 3). The observed point mutation rate in coding sequence was ~1.3 events per trio or 2.17 × 10⁻⁸ per base per generation, in close agreement with our previous observations⁴, yet in general, higher than previous studies, indicating increased sensitivity (Supplementary Table 2 and Supplementary Table 4)⁷. We also observed complex classes of de novo mutation including: five cases of multiple mutations in close proximity; two events consistent with paternal germline mosaicism (that is, where both siblings contained a de novo event observed in neither parent); and nine events showing a weak minor profile consistent with somatic mosaicism (Supplementary Table 3 and Supplementary Figs 2 and 3).

Of the severe de novo events, 28% (33 of 120) are predicted to truncate the protein. The distribution of synonymous, missense and nonsense changes corresponds well with a random mutation model⁷ (Supplementary Fig. 4 and Supplementary Table 2). However, the difference in nonsense rates between de novo and rare singleton events (not present in 1,779 other exomes) is striking (4:1) and suggests strong selection against new nonsense events (Fisher’s exact test, P < 0.0001). In contrast with a recent report⁸, we find no significant difference in mutation rate between affected and unaffected individuals; however, we do observe a trend towards increased non-synonymous rates in probands, consistent with the findings of ref. 9 (Supplementary Tables 1 and 2).

Given the association of ASD with increased paternal age⁵ and our previous observations⁴, we used molecular cloning, read-pair information, and obligate carrier status to identify informative markers linked to 51 de novo events and observed a marked paternal bias (41:10; binomial P < 1.4 × 10⁻⁵; Fig. 1a and Supplementary Tables 3 and 5). This provides strong direct evidence that the germline mutation rate in protein-coding regions is, on average, substantially higher in males. A similar finding was recently reported for de novo CNVs¹⁰. In addition, we observe that the number of de novo events is positively correlated with increasing paternal age (Spearman’s rank correlation = 0.19; P < 0.008; Fig. 1b). Together, these observations are consistent with the hypothesis that the modest increased risk for children of older fathers to develop ASD⁵ is the result of an increased mutation rate.

Using sequence read-depth methods in 122 of the 189 families, we scanned ASD probands for either de novo CNVs or rare (<1% of controls), inherited CNVs. Individual events were validated by either array CGH or genotyping array (see Methods). We identified 76 events in 53 individuals, including six de novo (median size 467 kilobases (kb)) and 70 inherited (median size 155 kb) CNVs (Supplementary Table 6). These include disruptions of EHMT1 (Kleefstra’s syndrome, Online Mendelian Inheritance in Man (OMIM) accession 610253), CNTNAP4 (reported in children with developmental delay and autism¹¹) and the 16p11.2 duplication (OMIM 611913) associated with developmental delay, bipolar disorder and schizophrenia.

We performed a multivariate analysis on non-verbal IQ (NVIQ), verbal IQ (VIQ) and the load of ‘extreme’ de novo mutations, where extreme is defined as point mutations that truncate proteins, intersect

¹Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA. ²Department of Psychiatry and Behavioral Sciences, University of Washington, Seattle, Washington 98195, USA. ³Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA. ⁴Santa Fe Institute, Santa Fe, New Mexico 87501, USA. ⁵Howard Hughes Medical Institute, Seattle, Washington 98195, USA.

00 MONTH 2012 | VOL 000 | NATURE | 1 ©2012 Macmillan Publishers Limited. All rights reserved RESEARCH LETTER

[Figure 1 graphic. a, Phased sequence reads: 41 paternal events, 10 maternal events. b, Paternal age (months) versus number of de novo coding mutations (0, 1, 2, 3+). c, Non-verbal IQ versus number of extreme de novo mutations (0, 1, 2+). d, Chr18: 40,000,000–41,500,000 (18q12.3), CNV tracks for cases and controls over SETBP1, SLC14A2, EPG5, SLC14A1 and SIGLEC15.]

Figure 1 | De novo mutation events in autism spectrum disorder. a, Haplotype phasing using informative markers shows a strong parent-of-origin bias with 41 of 51 de novo events occurring on the paternally inherited haplotype. Arrows represent sequence reads from paternal (blue) or maternal (red) haplotypes. b, c, Box and whisker plots for 189 SSC probands. b, The paternal estimated age at conception versus the number of observed de novo point mutations (0, n = 53; 1, n = 65; 2, n = 44; 3+, n = 27). c, Decreased non-verbal IQ is significantly associated with an increasing number of extreme mutation events (0, n = 138; 1, n = 41; 2+, n = 10), both with and without CNVs (Supplementary Discussion). d, Browser images showing CNVs identified in the del(18)(q12.2q21.1) syndrome region. The truncating point mutation in SETBP1 occurs within the critical region, identifying the likely causative locus. Each red (deletion) and green (duplication) line represents an identified CNV in cases (solid lines) versus controls (dashed lines), with arrowheads showing point mutation.
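The parent-of-origin result in panel a (41 of 51 phased events on the paternal haplotype) is reported with a binomial P value. The following is a minimal sketch of an exact binomial tail calculation under the assumption of a null in which paternal and maternal origin are equally likely (p = 0.5). The function name `binomial_tail` is illustrative, and the text does not state whether the reported P value is one- or two-sided, so both are printed.

```python
from math import comb


def binomial_tail(successes: int, trials: int, p: float = 0.5) -> float:
    """Exact upper-tail probability P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Counts taken from Fig. 1a: 41 paternal and 10 maternal de novo events.
paternal, maternal = 41, 10
total = paternal + maternal

# Assumed null hypothesis: each event is equally likely to be paternal or maternal.
one_sided = binomial_tail(paternal, total)
# With p = 0.5 the distribution is symmetric, so doubling the tail gives the two-sided value.
two_sided = min(1.0, 2 * one_sided)

print(f"paternal fraction = {paternal / total:.2f}")
print(f"one-sided P = {one_sided:.2e}, two-sided P = {two_sided:.2e}")
```

Both values come out on the order of 10⁻⁵ or smaller, in line with the strong paternal bias described in the text. An exact sum is used here rather than a normal approximation because 51 trials is small enough to enumerate directly.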

Mendelian or ASD loci (n = 57), or de novo CNVs that intersect genes (n = 5) (Fig. 1c and Supplementary Discussion). NVIQ, but not VIQ, decreased significantly (P < 0.01) with increased number of events. Covariant analysis of the samples with CNV data showed that this finding was strengthened, but not exclusively driven, by the presence of either de novo or rare CNVs (Supplementary Fig. 5).

Among the de novo events, we identified 62 top ASD risk contributing mutations based on the deleteriousness of the mutations, functional evidence, or previous studies (Table 1). Probands with these mutations spanned the range of IQ scores, with only a modest non-significant trend towards individuals co-morbid with intellectual disability (Supplementary Figs 1 and 6). We observed recurrent, protein-disruptive mutations in two genes: NTNG1 (netrin G1) and CHD8 (chromodomain helicase DNA binding protein 8). Given their locus-specific mutation rates, the probability of identifying two independent mutations in our sample set is low (uncorrected, NTNG1: P < 1.2 × 10⁻⁶; CHD8: P < 6.9 × 10⁻⁵) (Supplementary Fig. 7, Supplementary Table 8 and Methods). NTNG1 is a strong biological candidate given its role in laminar organization of dendrites and axonal guidance¹² and was also reported as being disrupted by a de novo translocation in a child with Rett’s syndrome, without MECP2 mutation¹³. Both de novo mutations identified here are missense (p.Tyr23Cys and p.Thr135Ile) at highly conserved positions predicted to disrupt protein function, although there is evidence of mosaicism for the former mutation (Supplementary Table 3).

CHD8 has not previously been associated with ASD and codes for an ATP-dependent chromatin-remodelling factor that has a significant role in the regulation of both β-catenin and p53 signalling¹⁴,¹⁵. We also identified de novo missense variants in CHD3 as well as CHD7 (CHARGE syndrome, OMIM 214800), a known binding partner of CHD8 (ref. 16). ASD has been found in as many as two-thirds of children with CHARGE, indicating that CHD7 may contribute to an ASD syndromic subtype¹⁷.

We identified 30 protein-altering de novo events intersecting with Mendelian disease loci (Supplementary Table 3) as well as inherited hemizygous mutations of clinical significance (Supplementary Table 9). The de novo mutations included truncating events in syndromic intellectual disability genes (MBD5 (mental retardation, autosomal dominant 1, OMIM 156200), RPS6KA3 (Coffin–Lowry syndrome, OMIM 303600) and DYRK1A (the Down’s syndrome candidate gene, OMIM 600855)), and missense variants in loci associated with syndromic ASD, including CHD7, PTEN (macrocephaly/autism syndrome, OMIM 605309) and TSC2 (tuberous sclerosis complex, OMIM 613254). Notably, DYRK1A is a highly conserved gene mapping to the Down’s syndrome critical region (Supplementary Fig. 8). The proband here (13890) is severely cognitively impaired and microcephalic, consistent with previous studies of DYRK1A haploinsufficiency in both patients and mouse models¹⁸.

Twenty-one of the non-synonymous de novo mutations map to CNV regions recurrently identified in children with developmental delay and ASD (Supplementary Table 10), such as MBD5 (2q23.1 deletion syndrome), SYNRG (17q12 deletion syndrome) and POLRMT (19p13.3 deletion)¹⁹. There is also considerable overlap with genes disrupted by single de novo CNVs in children with ASD (for example, NLGN1 and ARID1B; Supplementary Table 11). Given the prior probability that these loci underlie genomic disorders, the disruptive de novo SNVs and small indels may be pinpointing the possible major effect locus for ASD-related features. For example, we identified a complex de novo mutation resulting in truncation of SETBP1 (SET binding protein 1), one of five genes in the critical region for del(18)(q12.2q21.1) syndrome (Fig. 1d), which is characterized by hypotonia, expressive language delay, short stature and behavioural problems²⁰. Recurrent de novo missense mutations at SETBP1 were recently reported to be causative for a distinct phenotype, Schinzel–Giedion syndrome, probably through a gain-of-function mechanism²¹, indicating diverse phenotypic outcomes at this locus depending on mutation mechanism.

Several of the mutated genes encode proteins that directly interact, suggesting a common biological pathway. From our full list of genes carrying truncating or severe missense mutations (126 events from all 209 families), we generated a protein–protein interaction (PPI) network based on a database of physical interactions (Supplementary Table 12)²². We found 39% (49 of 126) of the genes mapped to a highly


interconnected network wherein 92% of gene pairs in the connected component are linked by paths of three or fewer edges (Fig. 2a). We tested this degree of interconnectivity by simulation (n = 10,000 replicates; Methods and Supplementary Fig. 9) and found that our experimental network had significantly more edges (P < 0.0001) and a greater clustering coefficient (P < 0.0001) than expected by chance.

Table 1 | Top de novo ASD risk contributing mutations

Proband    NVIQ  Candidate gene  Amino acid change
12225.p1   89    ABCA2           p.Val1845Met
11653.p1   44    ADCY5           p.Arg603Cys
12130.p1   55    ADNP            Frameshift indel
11224.p1   112   AP3B2           p.Arg435His
13447.p1   51    ARID1B          Frameshift indel
13415.p1   48    BRSK2           3n indel
14292.p1   49    BRWD1           Frameshift indel
11872.p1   65    CACNA1D         p.Ala769Gly
11773.p1   50    CACNA1E         p.Gly1209Ser
13606.p1   60    CDC42BPB        p.Arg764TERM
12086.p1   108   CDH5            p.Arg545Trp
12630.p1   115   CHD3            p.Arg1818Trp
13733.p1   68    CHD7            p.Gly996Ser
13844.p1   34    CHD8            p.Gln959TERM
12752.p1   93    CHD8            Frameshift indel
13415.p1   48    CNOT4           p.Asp48Asn
12703.p1   58    CTNNB1          p.Thr551Met
11452.p1   80    CUL3            p.Glu246TERM
11571.p1   94    CUL5            p.Val355Ile
13890.p1   42    DYRK1A          Splice site
12741.p1   87    EHD2            p.Arg167Cys
11629.p1   67    FBXO10          p.Glu54Lys
13629.p1   63    GPS1            p.Arg492Gln
13757.p1   91    GRINL1A         3n indel
11184.p1   94    HDGFRP2         p.Glu83Lys
11610.p1   138   HDLBP           p.Ala639Ser
11872.p1   65    KATNAL2         Splice site
12346.p1   77    MBD5            Frameshift indel
11947.p1   33    MDM2            p.Glu433Lys/p.Trp160TERM
11148.p1   82    MLL3            p.Tyr4691TERM
12157.p1   91    NLGN1           p.His795Tyr
11193.p1   138   NOTCH3          p.Gly1134Arg
11172.p1   60    NR4A2           p.Tyr275His
11660.p1   60    NTNG1           p.Thr135Ile
12532.p1   110   NTNG1           p.Tyr23Cys
11093.p1   91    OPRL1           p.Arg157Cys
13793.p1   56    PCDHB4          p.Asp555His
11707.p1   23    PDCD1           Frameshift indel
12304.p1   83    PSEN1           p.Thr421Ile
11390.p1   77    PTEN            p.Thr167Asn
13629.p1   63    PTPRK           p.Arg784His
13333.p1   69    RGMA            p.Val379Ile
13222.p1   86    RPS6KA3         p.Ser369TERM
11257.p1   128   RUVBL1          p.Leu365Gln
11843.p1   113   SESN2           p.Ala46Thr
12933.p1   41    SETBP1          Frameshift indel
12565.p1   79    SETD2           Frameshift indel
12335.p1   47    TBL1XR1         p.Leu282Pro
11480.p1   41    TBR1            Frameshift indel
11569.p1   67    TNKS            p.Arg568Thr
12621.p1   120   TSC2            p.Arg1580Trp
11291.p1   83    TSPAN17         p.Ser75TERM
11006.p1   125   UBE3C           p.Ser845Phe
12161.p1   95    UBR3            Frameshift indel
12521.p1   78    USP15           Frameshift indel
11526.p1   92    ZBTB41          p.Tyr886His
13335.p1   25    ZNF420          p.Leu76Pro

CNV
Proband    NVIQ  Candidate gene  Type
11928.p1   66    CHRNA7          Duplication
13815.p1   56    CNTNAP4         Deletion
13726.p1   59    CTNND1          Deletion
12581.p1   34    EHMT1           Deletion
13335.p1   25    TBX6            Duplication

Top candidate mutations based on severity and/or supporting evidence from the literature.

To investigate the relevance of this network to autism further, we applied degree-aware disease gene prioritization (DADA)²³, based on the same PPI database to rank all genes based on their relatedness to a set of 103 previously identified ASD genes¹⁷. We found that the genes with severe mutations ranked significantly higher than all other genes (Mann–Whitney U-test, P < 4.0 × 10⁻⁴), suggesting enrichment of ASD candidates. Furthermore, the 49 members of the connected component overwhelmingly drove this difference (Mann–Whitney U-test, P < 1.6 × 10⁻⁸), as the unconnected members were not significant on their own (Mann–Whitney U-test, P < 0.28), increasing our confidence that these connected gene products are probably related to ASD (Supplementary Fig. 10). Consistent with this finding, the rankings of unaffected sibling events are highly similar to the unconnected component, strengthening our confidence in the enrichment of the connected component of proband events for ASD-relevant genes.

Members of this network have known functions in β-catenin and p53 signalling, chromatin remodelling, ubiquitination and neuronal development (Fig. 2a). A fundamental developmental regulator observed in the network is CTNNB1 (catenin (cadherin-associated protein), β1, 88 kDa), also known as β-catenin. Interestingly, a parallel analysis using ingenuity pathway analysis (IPA) shows an enrichment of upstream interacting genes of the β-catenin pathway (8 of 358, P = 0.0030; see Methods, Supplementary Table 13 and Supplementary Fig. 11). A role for Wnt/β-catenin signalling in ASD was previously proposed²⁴, largely on the basis of the association of common variants in EN2 and WNT2, and the high rate of children with macrocephaly. It is striking that both individuals with CHD8 mutations in this study have multiple de novo disruptive missense mutations in this pathway or closely related pathways (Fig. 2b, c and Supplementary Fig. 12) and both have macrocephaly.

In addition, the pathway analysis shows several other disrupted genes not identified in the PPI that are involved in common pathways, which in some cases are linked to β-catenin (Supplementary Discussion and Supplementary Fig. 11). TBR1, for example, is a transcription factor that has a critical role in the development of the cerebral cortex²⁵. TBR1 binds with CASK and regulates several candidate genes for ASD and intellectual disability including GRIN2B, AUTS2 and RELN, genes of recurrent ASD mutation, some of which are described here and in other studies⁴,⁹,¹¹,¹⁷.

Our exome analysis of de novo coding mutations in 209 autism trios identified only two recurrently altered genes, consistent with extreme locus heterogeneity underlying ASD. This extreme heterogeneity necessitates the analysis of very large cohorts for validation. We implemented a cost-effective approach based on molecular inversion probe (MIP) technology²⁶ for the targeted resequencing of six candidate genes in ~2,500 individuals, including 1,703 simplex ASD probands and 744 controls. Four of these candidates (FOXP1, GRIN2B, LAMC3 and SCN1A) were identified previously⁴, whereas two (FOXP2, OMIM 602081 and GRIN2A, OMIM 613971) are related genes implicated in other neurodevelopmental phenotypes. We identified all previously observed de novo events (that is, in the same individuals), as well as additional de novo events in GRIN2B (two protein-truncating events), SCN1A (a missense) and LAMC3 (a missense) (Supplementary Table 8). The observed number of de novo events was compared with expectations based on the mutation rates estimated for each gene (Methods and Supplementary Table 8), with GRIN2B showing the highest significance (uncorrected P value < 0.0002). Notably, the three de novo events observed in GRIN2B are all predicted to be protein truncating, whereas no events truncating GRIN2B were found in more than 3,000 controls (Methods).

Our analysis predicts extreme locus heterogeneity underlying the genetic aetiology of autism. Under a strict sporadic disorder–de novo mutation model, if 20–30% of our de novo point mutations are considered to be pathogenic, we can estimate between 384 and 821 loci (Methods and Supplementary Fig. 13). We reach a similar estimate if we consider recurrences from ref. 9. It is clear from phenotype and genotype data that there are many ‘autisms’ represented under the current umbrella of ASD and other genetic models are more likely in different contexts (for example, families with multiple affected


[Figure 2 graphic. a, Connected component of the PPI network; mutation-type legend: nonsense, splice, missense, frameshift, deletion of amino acid. b, Proband 13844.p1: five gene disruptions (de novo truncating SNVs and deletions). c, Interaction network view around CHD8 and CTNNB1.]

Figure 2 | Mutations identified in protein–protein interaction (PPI) networks. a, The 49-gene connected component of the PPI network formed from 126 genes with severe de novo mutations among the 209 probands. b, Proband 13844 inherits three rare gene-disruptive CNVs and carries two de novo truncating mutations. c, GeneMANIA²² view of three of the affected genes (b) (red labels) which encode proteins that are part of a β-catenin-linked network. This proband is macrocephalic, impaired cognitively, and has deficits in social behaviour and language development (Supplementary Discussion).

individuals). There is marked convergence on genes previously implicated in intellectual disability and developmental delay. As has been noted for CNVs, this indicates that nosological divisions may not readily translate into differences at the molecular level. We believe that there is value in comparing mutation patterns in children with developmental delay (without features of autism) to those in children with ASD.

Although there is no one major genetic lesion responsible for ASD, it is still largely unknown whether there are subsets of individuals with a common or strongly related molecular aetiology and how large these subsets are likely to be. Using gene expression, protein–protein interactions, and CNV pathway analysis, recent reports have highlighted the role of synapse formation and maintenance²⁷⁻²⁹. We find it intriguing that 49 proteins found to be mutated here have critical roles in fundamental developmental pathways, including β-catenin and p53 signalling, and that patients have been identified with multiple disruptive de novo mutations in interconnected pathways. The latter observations are consistent with an oligogenic model of autism where both de novo and extremely rare inherited SNV and CNV mutations contribute in conjunction to the overall genetic risk. Recent work has supported a role for these interconnected pathways in neuronal stem-cell fate-determination, differentiation and synaptic formation in humans and animal models²⁴,³⁰,³¹. Given that fundamental developmental processes have previously been found to underlie syndromic forms of autism, a wider role of these pathways in idiopathic ASD would not be entirely surprising and would help explain the extreme genetic heterogeneity observed in this study.

METHODS SUMMARY

Exome capture, alignments and base-calling. Genomic DNA was derived directly from whole blood. Exomes were considered to be completed when ~90% of the capture target exceeded 8-fold coverage and ~80% exceeded 20-fold coverage. Exomes for the 189 trios (and 31 unaffected siblings) were captured with NimbleGen EZ Exome V2.0. Reads were mapped as in ref. 4 to a custom reference genome assembly (GRC build37). Genotypes were generated with GATK unified genotyper and parallel SAMtools pipeline⁴. Exomes for the unaffected siblings matching the pilot trios were captured and analysed as in ref. 4. Predicted de novo events were called as in ref. 4 and confirmed by capillary sequencing in all family members (for 176 of the 189 trios, this also included one unaffected sibling). Mutations were considered severe if they were truncating, missense with Grantham score ≥50 and GERP score ≥3 or only Grantham score ≥85, or deleted a highly conserved amino acid.

Exome read-depth CNV analysis. Reads were mapped using mrsFAST and normalized reads per kilobase of exon per million mapped reads (RPKM) values were calculated by exon. Population normalization was performed using a set of 366 non-ASD exomes. Calls were made if three or more exons passed a threshold value, and calls were cross-validated using two orthogonal platforms, custom array CGH and Illumina 1M array data². CNVs were filtered to identify de novo and rare inherited events by comparison with 2,090 controls and 1,651 parent profiles.

Network reconstruction and null model estimation. PPI networks were generated using physical interaction data from GeneMANIA²². Null models were estimated using gene-specific mutation rate estimates based on human–chimp divergence. To rank candidate genes we obtained the seed ASD list from ref. 17 and severe disruptive de novo events from all families (n = 209). Given the PPI network and seed gene product list, we used DADA²³ for ranking each gene.

Human subjects. All samples and phenotypic data were collected under the direction of the Simons Simplex Collection by its 12 research clinic sites (http://sfari.org/sfari-initiatives/simons-simplex-collection). Parents consented and children assented as required by each local institutional review board. Participants were de-identified before distribution. Research was approved by the University of Washington Human Subject Division under non-identifiable biological specimens/data.

Full Methods and any associated references are available in the online version of the paper at www.nature.com/nature.

Received 8 September 2011; accepted 23 February 2012.
Published online 4 April 2012.

1. Schaaf, C. P. & Zoghbi, H. Y. Solving the autism puzzle a few pieces at a time. Neuron 70, 806–808 (2011).
2. Sanders, S. J. et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863–885 (2011).
3. Levy, D. et al. Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron 70, 886–897 (2011).
4. O’Roak, B. J. et al. Exome sequencing in sporadic autism spectrum disorders identifies severe de novo mutations. Nature Genet. 43, 585–589 (2011).
5. Hultman, C. M., Sandin, S., Levine, S. Z., Lichtenstein, P. & Reichenberg, A. Advancing paternal age and risk of autism: new evidence from a population-based study and a meta-analysis of epidemiological studies. Mol. Psychiatry 16, 1203–1212 (2010).
6. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010).
7. Lynch, M. Rate, molecular spectrum, and consequences of human mutation. Proc. Natl Acad. Sci. USA 107, 961–968 (2010).
8. Xu, B. et al. Exome sequencing supports a de novo mutational paradigm for schizophrenia. Nature Genet. 43, 864–868 (2011).
9. Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly associated with autism. Nature http://dx.doi.org/10.1038/nature10945 (this issue).
10. Hehir-Kwa, J. Y. et al. De novo copy number variants associated with intellectual disability have a paternal origin and age bias. J. Med. Genet. 48, 776–778 (2011).
11. O’Roak, B. J. & State, M. W. Autism genetics: strategies, challenges, and opportunities. Autism Res. 1, 4–17 (2008).


12. Nishimura-Akiyoshi, S., Niimi, K., Nakashiba, T. & Itohara, S. Axonal netrin-Gs transneuronally determine lamina-specific subdendritic segments. Proc. Natl Acad. Sci. USA 104, 14801–14806 (2007).
13. Borg, I. et al. Disruption of Netrin G1 by a balanced translocation in a girl with Rett syndrome. Eur. J. Hum. Genet. 13, 921–927 (2005).
14. Nishiyama, M. et al. CHD8 suppresses p53-mediated apoptosis through H1 recruitment during early embryogenesis. Nature Cell Biol. 11, 172–182 (2009).
15. Thompson, B. A., Tremblay, V., Lin, G. & Bochar, D. A. CHD8 is an ATP-dependent chromatin remodeling factor that regulates β-catenin target genes. Mol. Cell. Biol. 28, 3894–3904 (2008).
16. Batsukh, T. et al. CHD8 interacts with CHD7, a protein which is mutated in CHARGE syndrome. Hum. Mol. Genet. 19, 2858–2866 (2010).
17. Betancur, C. Etiological heterogeneity in autism spectrum disorders: more than 100 genetic and genomic disorders and still counting. Brain Res. 1380, 42–77 (2011).
18. Moller, R. S. et al. Truncation of the Down syndrome candidate gene DYRK1A in two unrelated patients with microcephaly. Am. J. Hum. Genet. 82, 1165–1170 (2008).
19. Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nature Genet. 43, 838–846 (2011).
20. Buysse, K. et al. Delineation of a critical region on chromosome 18 for the del(18)(q12.2q21.1) syndrome. Am. J. Med. Genet. A. 146A, 1330–1334 (2008).
21. Hoischen, A. et al. De novo mutations of SETBP1 cause Schinzel-Giedion syndrome. Nature Genet. 42, 483–485 (2010).
22. Warde-Farley, D. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Res. 38, W214–W220 (2010).
23. Erten, S., Bebek, G., Ewing, R. & Koyutürk, M. DADA: Degree-aware algorithms for network-based disease gene prioritization. BioData Mining 4, 19 (2011).
24. De Ferrari, G. V. & Moon, R. T. The ups and downs of Wnt signaling in prevalent neurological disorders. Oncogene 25, 7545–7553 (2006).
25. Bedogni, F. et al. Tbr1 regulates regional and laminar identity of postmitotic neurons in developing neocortex. Proc. Natl Acad. Sci. USA 107, 13129–13134 (2010).
26. Turner, E. H., Lee, C., Ng, S. B., Nickerson, D. A. & Shendure, J. Massively parallel exon capture and library-free resequencing across 16 genomes. Nature Methods 6, 315–316 (2009).
27. Voineagu, I. et al. Transcriptomic analysis of autistic brain reveals convergent molecular pathology. Nature 474, 380–384 (2011).
28. Sakai, Y. et al. Protein interactome reveals converging molecular pathways among autism disorders. Sci. Transl. Med. 3, 86ra49 (2011).
29. Gilman, S. R. et al. Rare de novo variants associated with autism implicate a large functional network of genes involved in formation and function of synapses. Neuron 70, 898–907 (2011).
30. Ille, F. & Sommer, L. Wnt signaling: multiple functions in neural development. Cell. Mol. Life Sci. 62, 1100–1108 (2005).
31. Tedeschi, A. & Di Giovanni, S. The non-apoptotic role of p53 in neuronal biology: enlightening the dark side of the moon. EMBO Rep. 10, 576–583 (2009).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We would like to thank and recognize the following ongoing studies that produced and provided exome variant calls for comparison: NHLBI Lung Cohort Sequencing Project (HL 1029230), NHLBI WHI Sequencing Project (HL 102924), NIEHS SNPs (HHSN273200800010C), NHLBI/NHGRI SeattleSeq (HL 094976), and the Northwest Genomics Center (HL 102926). We are grateful to all of the families at the participating Simons Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, E. Hanson, D. Grice, A. Klin, R. Kochel, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, Z. Warren and E. Wijsman). We also acknowledge M. State and the Simons Simplex Collection Genetics Consortium for providing Illumina genotyping data, T. Lehner and the Autism Sequencing Consortium for providing an opportunity for pre-publication data exchange among the participating groups. We appreciate obtaining access to phenotypic data on SFARI Base. This work was supported by the Simons Foundation Autism Research Initiative (SFARI 137578 and 191889; E.E.E., J.S. and R.B.) and NIH HD065285 (E.E.E. and J.S.). E.B. is an Alfred P. Sloan Research Fellow. E.E.E. is an Investigator of the Howard Hughes Medical Institute.

Author Contributions E.E.E., J.S. and B.J.O. designed the study and drafted the manuscript. E.E.E. and J.S. supervised the study. R.B., B.R. and B.J.O. analysed the clinical information. R.B., L.V., S.G., E.K., N.K. and B.P.C. contributed to the manuscript. S.G., N.K., B.P.C., A.K., C.B., M.M. and L.V. generated and analysed CNV data. B.J.O. and L.V. performed MIP resequencing and mutation validations. I.B.S., E.H.T., B.J.O. and J.S. developed MIP protocol and analysis. B.V. and J.M.A. generated loci-specific mutation rate estimates. R.L. and E.B. performed PPI network analysis and simulations. E.K. performed DADA analysis. C.L. performed Illumina sequencing. J.D.S., I.B.S., E.H.T. and C.L. analysed sequence data. B.P.C. performed IPA analysis. B.J.O., E.K. and N.K. developed the de novo analysis pipelines and analysed sequence data. D.A.N., M.J.R., J.D.S. and E.H.T. supervised exome sequencing and primary analysis.

Author Information Access to the raw sequence reads can be found at the NCBI database of Genotypes and Phenotypes (dbGaP) and National Database for Autism Research under accession numbers phs000482.v1.p1 and NDARCOL0001878, respectively. Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature. Readers are welcome to comment on the online version of this article at www.nature.com/nature. Correspondence and requests for materials should be addressed to E.E.E. ([email protected]) or J.S. ([email protected]).

©2012 Macmillan Publishers Limited. All rights reserved

METHODS

Exome capture, alignments and base-calling. Exomes for the 189 trios (and 31 unaffected siblings) were captured with NimbleGen EZ Exome V2.0. Final libraries were then sequenced on either an Illumina GAIIx (paired- or single-end 76-bp reads) or HiSeq2000 (paired- or single-end 50-bp reads). Reads were mapped to a custom GRCh37/hg19 build using BWA 0.5.6 (ref. 32). Read qualities were recalibrated using GATK Table Recalibration 1.0.2905 (ref. 33). Picard-tools 1.14 was used to flag duplicate reads (http://picard.sourceforge.net/). GATK IndelRealigner 1.0.2905 was used to realign reads around insertion/deletion (indel) sites. Genotypes were generated with GATK Unified Genotyper³³ with FILTER = “QUAL ≤ 50.0 || AB ≥ 0.75 || HRun > 3 || QD < 5.0” and in parallel with the SAMtools pipeline as described previously⁴. Only positions with at least eightfold coverage were considered. All pilot sibling exomes were captured and analysed as described previously⁴. Predicted de novo events were called and compared against a set of 946 other exomes to remove recurrent artefacts and likely undercalled sites. Indels were also called with the GATK Unified Genotyper and SAMtools and filtered to those with at least 25% of reads showing a variant at a minimum depth of 8×. Mutations were phased using molecular cloning of PCR fragments, read-pair information, linked informative SNPs, and obligate carrier status. To identify rare private variants (singleton), the full variant list was compared against a larger set of 1,779 other exomes. Predicted de novo indels were also filtered against this larger set.

Sanger validations. All reported de novo events (exome or MIP capture) were validated by designing primers with BatchPrimer3 followed by PCR amplification and Sanger sequencing. We performed PCR reactions using 10 ng of DNA from father, mother, unaffected sibling (when available), and proband and performed Sanger capillary sequencing of the PCR product using forward and reverse primers. In some cases, one direction could not be assessed due to the presence of repeat elements or indels in close proximity to the mutation event.

Mutation candidate gene analysis. We examined whether each non-synonymous or CNV de novo event may be contributing to the aetiology of ASD by evaluating the likely deleteriousness of the change (GERP, Grantham score) and intersecting with known syndromic and non-syndromic candidate genes, CNV morbidity maps, and information in OMIM and PubMed. Mutations were considered severe if they were truncating, missense with Grantham score ≥50 and GERP score ≥3 or only Grantham score ≥85, or deleted a highly conserved amino acid. For genes that had not previously been implicated in ASD, we gave priority to those with structural similarities to known candidate or strong evidence of neural function or development.

Exome read-depth CNV discovery. To find CNVs using exome read-depth data, we first mapped sequenced reads to the hg19 exome using the mrsFAST aligner³⁴. Next, we applied a novel method (N.K. et al., manuscript in preparation), which uses normalized RPKM values³⁵ of the ~194,000 captured exons/sequences, subsequent population normalization using 366 exomes from the Exome Sequencing Project and singular value decomposition to remove systematic bias present within exome capture reactions. Rare CNVs were detected using a threshold cutoff of the normalized RPKM values, and we required at least three exons above our threshold in order to make a call. We made a total of 1,077 deletion or duplication calls in 366 individuals (range 0–14, median = 3, mean = 2.94).

CNV detection using array CGH. A custom-targeted 2 × 400K Agilent chip with median probe spacing of 500 bp in the genomic hotspots flanked by segmental duplications or Alu repeats and probe spacing of 14 kb in the genomic backbone was designed. All experiments were performed according to the manufacturer’s instructions using NA12878 as the female reference and NA18507 as the male reference (Coriell). Data analysis was performed following feature extraction using DNA analytics with ADM-2 setting. All CNV calls were visually inspected in the UCSC Genome Browser. CNV calls from probands were then intersected with those from parents and also with 377 controls recruited through NIMH Genetics Initiative³⁶,³⁷ and ClinSeq cohort³⁸ analysed on the same microarray platform. The NIMH set of controls were ascertained by the NIMH Genetics Initiative³⁶ through an online self-report based on the Composite International Diagnostic Instrument Short-Form (CIDI-SF)³⁷. Those who did not meet DSM-IV criteria for major depression, denied a history of bipolar disorder or psychosis, and reported exclusively European origins were included³⁹,⁴⁰. Samples from the ClinSeq cohort were selected from a population representing a spectrum of atherosclerotic heart disease³⁸. De novo and inherited potential pathogenic CNVs were selected only if they intersected with RefSeq coding sequence and allowing for a frequency of <1% in the controls and <50% segmental duplication content.

Illumina array CNV calling. CNV calling was performed in hg18 as described previously⁴¹, using an HMM that incorporates both allele frequencies (BAF) and total intensity values (logR). In total, we generated CNV calls for 841 probands, 1,651 parents and 793 siblings including the samples reported recently². Of the 122 families selected for CNV comparisons in this study, calls were generated for 107 probands. Of these, both parents were profiled for 101 families and one parent was profiled for the remaining six families. In addition, at least one sibling was profiled for 99 of these families.

Independent of array CGH detection, to identify putatively pathogenic CNVs, we first compared our data to 2,090 control samples derived from the Wellcome Trust Case Control Consortium (WTCCC) National Blood Services Cohort¹⁹,⁴² and filtered all CNVs present in 1% (20) of WTCCC2 controls or 1% (16) of parents by 50% reciprocal overlap with matching copy number status. In addition, similar to the filtering criteria used for array CGH detection, we selected only CNVs that contained less than 50% segmental duplication and intersected with RefSeq coding sequence. To select putative de novo CNVs, we further required the CNV not to be present in family-matched parents and siblings. Additionally, we filtered CNVs present in >0.1% (2) of the full 1,651 parent set. To select potential, rare inherited events, we required the CNV be detected in a matched parent or sibling. Finally, we filtered the genes inside each CNV under the same criteria (to account for smaller or larger CNPs) and removed CNVs with no remaining genes.

CNV cross validation. High-confidence, cross-validated de novo and inherited CNVs were selected by identifying events detected by at least two of three methodologies. To account for the variable breakpoint definitions in array CGH, SNP arrays, and exome copy number profiles, we aligned the CNVs by at least one overlapping gene ID and reported each CNV region by its maximal outer boundaries. This identified six de novo and 70 rare inherited events for further study (Supplementary Table 6).

Ingenuity pathway analysis. Ingenuity pathway analysis (IPA) was performed to identify potential functional enrichments within both our PPI (49 genes) and overall set of 126 genes. RefSeq reference gene list was used as a background list for all analysis. To confirm our results pertaining to CTNNB1 upstream enrichment, we simulated 10,000 random populations of 209 individuals using Poisson priors for each gene based on their estimated mutation rates (see below), with a global correction factor resulting in selecting a mean of 126 genes per population. We then used this simulation data to calculate the probability of observing eight direct upstream interactors of CTNNB1 and determined that our data set is enriched for these genes with P = 0.0030.

Estimating locus-specific mutation rates. Human–chimpanzee alignments were downloaded from the UCSC Genome Browser (reference versions GRCh37 and panTro2, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsPanTro2/syntenicNet/). The more conservative syntenicNet alignments were used (details in http://hgdownload.cse.ucsc.edu/goldenPath/hg19/vsPanTro2/README.txt). Gene definitions were downloaded from the UCSC Table Browser, from the RefSeq Genes track, and the refFlat table. Exons were extended by 2 bp, and overlapping exons were merged using BEDTools. Non-exonic sequence was not considered. For each gene, we extracted: (1) d = the number of differences between chimpanzee and human; and (2) n = the number of bases aligned. We assumed a divergence time between human and chimpanzee of 12 million years (Myr) and an average generation time of 25 years. We then calculated gene-specific mutation rates per site per generation: r = (d/n)/(12 Myr/25 years/generation). We calculated the probability of observing X1 events using the Poisson distribution defined by the number of screened and the size of the coding region, including actual splice bases.

Network simulation and null model estimation. To generate a null distribution of gene mutations, de novo mutation rates were estimated from human–chimp mutation rates. A pseudocount of 2.08333 × 10⁻⁶ (the smallest calculated in the gene set) was applied to any exon with a mutation rate of zero. To create null gene sets, genes were drawn uniformly from this background distribution. Human protein–protein interaction data were collected from GeneMANIA²² on 29 August 2011. Only direct physical interactions from the Homo sapiens database were considered. The list comprises approximately 1.5 million physical interactions, gathered from 150 studies. A protein interaction network was created from each experimental and null gene set by drawing edges between genes with physical interactions reported in the GeneMANIA database. Qualitatively similar results were achieved by including only interactions supported by multiple independent data sources. For each network, clustering coefficient, centralization, average shortest path length, density, and heterogeneity were determined using Cytoscape⁴³ and Network Analyzer⁴⁴. Duplicate- and self-interactions were not considered in calculating network statistics.

Disease gene prioritization based on PPI networks. We applied degree-aware algorithms to rank a set of candidate genes with respect to a set of products of genes associated with ASD using human PPI networks. We used the integrated human PPI network data collected from GeneMANIA²² on 29 August 2011. The PPI network contains 12,007 proteins with ~1.5 million direct physical interactions associated with a reliability score. We obtain the seed proteins for the ASD from the list of ref. 17. For the candidate set we used 126 gene products from the severe

disruptive de novo events from the pilot autism project⁴ and the current study. Given the GeneMANIA PPI network and Betancur seed gene product list, we used DADA²³ for ranking the candidate genes. We emphasize that this ranking is not implying causality but rather relatedness to genes previously and independently associated with ASD. For testing the significance of this ranking, we rank all the gene products except the seed set using the same algorithm. On the basis of the ranking result, we applied a Mann–Whitney U rank sum test (one-tailed) on the candidate set compared to all the other genes.

MIP protocol. Each of 1,703 autism probands from the SSC collection and 744 controls from the NIMH collection was subjected to MIP-based multiplex capture of the six genes: SCN1A, GRIN2B, GRIN2A, LAMC3, FOXP1 and FOXP2. For each library, 50 ng of DNA was used. Individually synthesized 70 mer MIPs (n = 355) were pooled and 5′ phosphorylated with T4 PNK (NEB). Hybridization with MIPs, gap filling and ligation were performed in one step for 45–48 h at 60 °C, followed by an exonuclease treatment of 30 min at 37 °C, similar to ref. 45, with modifications for reduced MIP number (B.J.O. et al., manuscript in preparation). Amplification of the library was performed by PCR using different barcoded primers for each library. Then barcoded libraries were pooled, purified using Agencourt AMPure XP and one lane of 101-bp paired-end reads was generated for each mega-pool (~384) on an Illumina HiSeq 2000 according to manufacturer’s instructions. Raw reads were mapped to the genome as in ref. 4. MIP targeting arms were then removed and variants called using SAMtools⁴. A 25-fold coverage, with AB allele ratio <0.7, and quality 30 threshold was used for high-confident variant calling. Private (possible de novo) variants were identified by filtering against 1,779 other exomes. The parents of children with disruptive rare variants were then captured. Variants not seen or with low coverage in the parents were validated by Sanger capillary-based fluorescent sequencing. No truncating variants of GRIN2B were observed in the MIP sequenced controls or the Exome Variant Server ESP2500 release (NHLBI Exome Sequencing Project (ESP), Seattle, Washington, http://evs.gs.washington.edu/EVS/).

Estimating the number of autism loci. The gene-level specificity of exome sequencing enables the estimation of the number of recurrently mutated genes implicated in the genetic aetiology of sporadic ASD. This question can be reformulated as the ‘unseen species problem’ (see ref. 46 for review and ref. 2 for application to de novo CNVs discovered in autism), where genes with severe de novo events in probands are considered ‘observed species’, and binned by their frequency of appearance (that is, singletons, doubletons, etc.). We estimated the total number of genes implicated in autism (the total number of species) using several different estimators (implemented in the R package SPECIES, http://www.jstatsoft.org/), as well as the formula provided in ref. 2. This estimate depends on the number of singletons and twin pairs of genes observed in probands, as well as the fraction of de novo events believed to be pathogenic for autism, that is, single, disruptive events that can cause autism on their own. We assumed that both of our recurrent severe de novo events (affecting CHD8 and NTNG1) were pathogenic; these compose the entire set of twin pairs. The number of singletons is based on the estimated a priori fraction of the observed events that are pathogenic for autism. Across this sliding scale, the estimated number of loci is plotted in Supplementary Fig. 13. For example, using the estimator from ref. 47, if 20–50% of our de novo severe events are considered pathogenic, exome sequencing of a large number of additional samples would reveal between 182 and 992 pathogenic genes harbouring coding de novo point mutations (Supplementary Fig. 13); if all the observed severe de novo events in our experiment are included as pathogenic singletons, the number of implicated loci increases to more than 3,000.

32. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
33. DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nature Genet. 43 (2011).
34. Hach, F. et al. mrsFAST: a cache-oblivious algorithm for short-read mapping. Nature Methods 7, 576–577 (2010).
35. Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods 5, 621–628 (2008).
36. Moldin, S. O. NIMH Human Genetics Initiative: 2003 update. Am. J. Psychiatry 160, 621–622 (2003).
37. Kessler, R. C. & Ustun, T. B. The World Mental Health (WMH) survey initiative version of the World Health Organization (WHO) Composite International Diagnostic Interview (CIDI). Int. J. Methods Psychiatr. Res. 13, 93–121 (2004).
38. Biesecker, L. G. et al. The ClinSeq Project: piloting large-scale genome sequencing for research in genomic medicine. Genome Res. 19, 1665–1674 (2009).
39. Talati, A., Fyer, A. J. & Weissman, M. M. A comparison between screened NIMH and clinically interviewed control samples on neuroticism and extraversion. Mol. Psychiatry 13, 122–130 (2008).
40. Baum, A. E. et al. A genome-wide association study implicates diacylglycerol kinase eta (DGKH) and several other genes in the etiology of bipolar disorder. Mol. Psychiatry 13, 197–207 (2008).
41. Itsara, A. et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am. J. Hum. Genet. 84, 148–161 (2009).
42. Craddock, N. et al. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464, 713–720 (2010).
43. Smoot, M. E., Ono, K., Ruscheinski, J., Wang, P. L. & Ideker, T. Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 27, 431–432 (2011).
44. Assenov, Y., Ramirez, F., Schelhorn, S. E., Lengauer, T. & Albrecht, M. Computing topological parameters of biological networks. Bioinformatics 24, 282–284 (2008).
45. Mamanova, L. et al. Target-enrichment strategies for next-generation sequencing. Nature Methods 7, 111–118 (2010).
46. Bunge, J. & Fitzpatrick, M. Estimating the number of species - a Review. J. Am. Stat. Assoc. 88, 364–373 (1993).
47. Chao, A. & Lee, S. M. Estimating the number of classes via sample coverage. J. Am. Stat. Assoc. 87, 210–217 (1992).
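
The locus-specific rate calculation described in the Methods (r = (d/n) divided by the number of generations since divergence, followed by a Poisson probability of observing events) can be illustrated with a minimal Python sketch. The 12 Myr divergence time and 25-year generation time come from the Methods; the function names, the `n_screened` argument (a stand-in for whatever screening unit the Methods intend) and the example numbers are hypothetical and are not taken from the authors' pipeline.

```python
import math

DIVERGENCE_YEARS = 12e6  # human-chimpanzee divergence time assumed in the Methods
GENERATION_YEARS = 25    # average generation time assumed in the Methods


def rate_per_site_per_generation(d, n):
    """r = (d / n) / (12 Myr / 25 years per generation).

    d: number of human-chimpanzee differences in the gene's aligned exons
    n: number of aligned bases
    """
    generations = DIVERGENCE_YEARS / GENERATION_YEARS  # 480,000 generations
    return (d / n) / generations


def prob_at_least(k, rate, coding_bases, n_screened):
    """P(X >= k) for X ~ Poisson(rate * coding_bases * n_screened).

    n_screened is a hypothetical placeholder for the number of units screened.
    """
    lam = rate * coding_bases * n_screened
    # Sum the Poisson probabilities for 0 .. k-1 events and take the complement.
    cdf_below_k = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf_below_k
```

For example, `prob_at_least(2, r, coding_bases, n_screened)` would give the chance of seeing a gene hit two or more times under this simple null, which is the kind of quantity summarized for recurrently mutated genes.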

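The 'unseen species' formulation in the Methods can be made concrete with a small sketch. The authors used estimators from the R package SPECIES and the estimator of ref. 47; the code below swaps in the simpler classic Chao1 lower bound purely to show how the singleton and doubleton counts drive the estimate. The function names and the sliding-scale helper are hypothetical, and the numbers it produces are not expected to reproduce the published values.

```python
def chao1(n_observed, singletons, doubletons):
    """Classic Chao1 lower-bound estimate of the total number of species.

    Falls back to the bias-corrected form when no doubletons are observed.
    """
    if doubletons > 0:
        return n_observed + singletons**2 / (2 * doubletons)
    return n_observed + singletons * (singletons - 1) / 2


def estimate_for_fraction(total_singleton_genes, doubleton_genes, fraction_pathogenic):
    """Hypothetical sliding-scale helper.

    Keeps every doubleton gene (assumed pathogenic, as in the Methods) and
    only the stated fraction of singleton genes, then applies Chao1.
    """
    f1 = round(total_singleton_genes * fraction_pathogenic)
    f2 = doubleton_genes
    return chao1(f1 + f2, f1, f2)
```

Because the estimate grows with the square of the singleton count, the assumed pathogenic fraction dominates the result, which is why the Methods report a range across that fraction rather than a single number.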
SUPPLEMENTARY INFORMATION
doi:10.1038/nature10989

Sporadic autism exomes reveal a highly interconnected protein network of de novo mutations

Brian J. O’Roak, Laura Vives, Santhosh Girirajan, Emre Karakoc, Nik Krumm, Bradley P. Coe, Roie Levy, Arthur Ko, Choli Lee, Joshua D. Smith, Emily H. Turner, Ian B. Stanaway, Benjamin Vernot, Maika Malig, Carl Baker, Beau Reilly, Joshua M. Akey, Elhanan Borenstein, Mark J. Rieder, Deborah A. Nickerson, Raphael Bernier, Jay Shendure, and Evan E. Eichler

Supplementary Table of Contents

Supplementary Discussion
Supplementary Figure 1. Distribution of nonverbal intelligence quotient (NVIQ) of the SSC189 sample based on different mutation groupings.
Supplementary Figure 2. Browser views showing complex de novo mutation events.
Supplementary Figure 3. Confirmed event showing weak allele signature.
Supplementary Figure 4. Observed number of mutation events fits the expected Poisson distribution.
Supplementary Figure 5. Multivariate analysis to examine effect of number of “extreme” de novo coding mutations and the presence of a CNV.
Supplementary Figure 6. Ratio of samples with various mutation types binned by NVIQ.
Supplementary Figure 7. Distribution of locus-specific mutation rates based on human-chimp comparisons.
Supplementary Figure 8. DYRK1A falls in a Down Syndrome critical region disrupted by CNVs.
Supplementary Figure 9. Histograms of network statistics for 10,000 simulated null networks.
Supplementary Figure 10. Protein-protein interaction network based prioritization of 126 gene products with severe mutations.
Supplementary Figure 11. Top interaction network from IPA analysis of 126 genes with severe mutations.
Supplementary Figure 12. De novo mutations in 12752.p1.
Supplementary Figure 13. Estimating the number of genes contributing to sporadic autism pathogenicity via recurrent de novo mutation.
Supplementary Table 1. Summary of sequenced families, including sex, parental age, NVIQ, CNV pre-screening, trio bases screened, and point mutations.
Supplementary Table 2. Summary of Exome Sequencing Results from 209 ASD Families
Supplementary Table 3. All 242 de novo point mutations found in 189 trios.
Supplementary Table 4. Comparison of mutation rates between O’Roak et al. and Sanders et al.


Supplementary Table 5. De novo events identified in 50 unaffected siblings and 20 pilot probands.
Supplementary Table 6. 70 rare inherited and 6 de novo CNVs identified in 122 trios.
Supplementary Table 7. Expanded top de novo ASD risk contributing mutations*
Supplementary Table 8. Mutation rates and probability of recurrence for genes with >1 mutation.
Supplementary Table 9. Selected inherited hemizygous and compound heterozygous sites.
Supplementary Table 10. List of the 21 severe de novo mutations that map to regions of recurrent CNV associated with Developmental Delay and ASD.
Supplementary Table 11. Other mutations intersecting previous CNV loci and animal models for ASD.
Supplementary Table 12. List of the 126 genes/proteins with severe mutations used for the PPI, along with summary statistics.
Supplementary Table 13. Top IPA function for the PPI connected component.
Supplementary References


Supplementary Discussion

Sample Overlap

Three of the previously reported families (12325, 12680, and 12647) are the only samples known to overlap with other studies (Sanders et al. 2012).

Rate of De Novo CNVs

We expected the de novo CNV rate for this cohort to be lower than for other ASD cohorts, as 77% (94/122) had previously been screened negative for large, disruptive de novo events. Nonetheless, our observed rate of de novo CNVs (6/122, ~5%) is in line with other recent estimates for ASD¹,², possibly owing to the increased resolution of detecting gene disruptions with exome sequencing.

Effect of Multiple Genetic Lesions on Intellectual Functioning

We performed a multivariate analysis to examine the effect of the number of “extreme” de novo coding mutations (0, 1, or 2 or more) and the presence of either de novo or rare inherited copy number variation (122/189 probands) on nonverbal IQ (NVIQ) and verbal IQ (VIQ) (Supplementary Fig. 5). Extreme mutations (n = 62) were defined as de novo protein-truncating events, events intersecting known OMIM and ASD candidate genes, and CNVs predicted to be gene-breaking and pathogenic. In the sample of 122 individuals for whom CNV analysis had been completed, we observed a significant decrease in NVIQ with increased numbers of events (F(2,116) = 5.45, p < .01, partial η² = 0.09), but not in VIQ (F(2,116) = 1.13, p = ns, partial η² = 0.02). This result in NVIQ was strengthened, but not exclusively driven, by the presence of CNVs (F(2,116) = 0.97, p = ns, partial η² = 0.02); there was no main effect of strictly having a CNV on cognitive ability (F(2,116) = 0.71, p = ns, partial η² = 0.006). Post hoc analyses indicated that individuals with one event and individuals with two or more events scored significantly lower in NVIQ than individuals with no events (mean difference = 18.0 points, p < 0.05, Cohen’s d = 0.63; 38.5 points, p < 0.01, d = 1.69; respectively). The significant difference in NVIQ between individuals with no de novo coding mutations and those with two or more mutations was also observed in the complete sample of 189 individuals (F(2,186) = 6.129, p < .01, partial η² = 0.06) (Fig. 1c).
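
As a reading aid, the partial eta squared effect sizes quoted above can be related to the reported F statistics through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below assumes the effect sizes were derived in this usual way; the function name is hypothetical and the calculation is not taken from the authors' analysis scripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # proportional to the effect sum of squares
    return ss_ratio / (ss_ratio + df_error)


# NVIQ by number of events in the 122-proband CNV sample: F(2,116) = 5.45
nviq_cnv_sample = partial_eta_squared(5.45, 2, 116)
# VIQ by number of events in the same sample: F(2,116) = 1.13
viq_cnv_sample = partial_eta_squared(1.13, 2, 116)
# NVIQ by number of mutations in the complete sample: F(2,186) = 6.129
nviq_full_sample = partial_eta_squared(6.129, 2, 186)
```

Rounded to two decimals, these three values agree with the effect sizes reported in the text for the corresponding tests.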

IPA Analysis

Within our 49 PPI network members, IPA detected the most significant functional enrichment in Gene Expression (B-H p-value 9.45E-03–8.57E-02), Behavior (B-H p-value 9.45E-03–8.57E-02), Organismal Development (B-H p-value 9.45E-03–7.74E-02), Embryonic Development (B-H p-value 9.45E-03–8.01E-02), and Nervous System Development and Function (B-H p-value 9.45E-03–8.91E-02) (Supplementary Table 13).

We then performed an additional IPA analysis on the 126 genes identified in 209 samples. The top interconnected network consists of 22 genes (15 of which are PPI members), of which CTNNB1 is a central node (Supplementary Fig. 11). To further investigate the potential role of CTNNB1 interactors in autism, we selected all direct upstream interacting genes from beta-catenin in IPA and noted that 8/358 (p = 0.0030) were present in our mutation list. Furthermore, we note that CTNNB1 is directly linked to multiple highly interconnected genes in the PPI network (MYBBP1A, PBRM1, RUVBL1, TBL1XR1, and CHD8), suggesting that additional mutated genes involved in CTNNB1 function are represented in autism. This enrichment for CTNNB1 interactors further supports the hypothesis that the WNT/beta-catenin pathway may play a role in the etiology of autism³.
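
The simulation behind the CTNNB1 enrichment P value (random populations drawn with per-gene Poisson priors, scaled so that a mean of 126 genes is selected) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: a gene counts as selected when its Poisson count is at least one, the global correction factor is found by bisection, and all function names and inputs are hypothetical.

```python
import math
import random


def solve_scale(rates, mean_genes, iterations=60):
    """Find the global factor s such that sum(1 - exp(-s * r)) equals mean_genes."""
    if mean_genes >= sum(1 for r in rates if r > 0):
        raise ValueError("mean_genes must be below the number of genes with a positive rate")

    def expected(s):
        return sum(1.0 - math.exp(-s * r) for r in rates)

    lo, hi = 0.0, 1.0
    while expected(hi) < mean_genes:  # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(iterations):       # plain bisection on a monotone function
        mid = (lo + hi) / 2.0
        if expected(mid) < mean_genes:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def empirical_p(gene_rates, target_genes, observed_hits,
                mean_genes=126, n_sim=10_000, seed=0):
    """Fraction of simulated populations with at least observed_hits target genes selected."""
    rng = random.Random(seed)
    scale = solve_scale(list(gene_rates.values()), mean_genes)
    # P(Poisson count >= 1) = 1 - exp(-lambda). Genes are independent under this
    # null, so only the target genes need to be drawn to count hits.
    p_hit = [1.0 - math.exp(-scale * gene_rates[g]) for g in target_genes]
    extreme = 0
    for _ in range(n_sim):
        hits = sum(rng.random() < p for p in p_hit)
        if hits >= observed_hits:
            extreme += 1
    return extreme / n_sim
```

In the setting described above, `target_genes` would hold the 358 direct upstream interactors and `observed_hits` would be 8, with `gene_rates` supplied by the locus-specific mutation rate estimates.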


Phenotyping Summaries for Selected Families

Family 13844. Proband is second of three children with an older sister (13844.s1) and younger brother (13844.s2).

Patient ID: 13844.fa

Summary: Father is an adult non-Hispanic white male. Age at conception of proband is 40. Normative range of social responsiveness, but elevated score for rigidity on broader autism phenotype. Some signs of alcoholism (use, attempting to cut down, annoyed by criticism about drinking, feeling bad about drinking, eye opening experience). No medication use endorsed for current or past. Some college education. Annual household income = $101–130K. Father has head circumference of 58.5 cm (z = 1.57) and normative BMI. No comorbid diagnoses endorsed.

Patient ID: 13844.mo

Summary: Mother is an adult non-Hispanic white female. Age at conception of proband is 35. Normative range of social responsiveness. No evidence of broader autism phenotype. Antibiotics taken during second trimester of pregnancy with proband. Currently taking thyroid medication and antidepressant (not taken during pregnancy). Endorsement of current tobacco use and past marijuana use. Some college education. Annual household income = $101–130K. Mother has head circumference of 54 cm (z = -.41) and normative BMI. No comorbid diagnoses endorsed.

Patient ID: 13844.s1

Summary: Sibling is a non-Hispanic white 10-year-old female. Normative adaptive scores and social responsiveness from parent and teacher noted. Behavioral elevations for somatic problems and complaints. Mother was prescribed an unspecified hormone treatment to aid with growth in past (not currently taking). No other endorsement of medication use. Head circumference of 54 cm (z = 0.96) and normative BMI. No comorbid diagnoses endorsed. Cognitive decline following Epstein-Barr virus reported by parents.

Patient ID: 13844.s2

Summary: Sibling is a non-Hispanic white 5-year-old male. Adaptive scores not available. Normative social responsiveness from parent and teacher. No behavioral elevations across any domain. No endorsement of medication use. Head circumference of 52 cm (z = 0.15) and BMI suggestive of being underweight. No comorbid diagnoses endorsed.

Patient ID: 13844.p1

Event: de novo CHD8 truncating, de novo CUBN truncating, 2X inherited CNV

Summary: Patient is a 99-month-old non-Hispanic white male diagnosed with autism. Extremely low VIQ (20), NVIQ (34), and adaptive (59) scores. Clinical range deficits in social responsiveness (120). Possible loss of language skills during development and elevated social withdrawal behaviors with no comorbid diagnoses. Large head (z = 2.62) and normal BMI. Food allergies (gluten and casein). Gastrointestinal constipation diagnosis with bloating and abdominal pain. Roseola diagnosed at 2.5 years and Epstein-Barr virus contracted at 8 years. Respiratory problems diagnosed at 11 months and kidney problems diagnosed at 9 months. No diagnosis of cardiac or metabolic syndromes noted. No report of congenital anomalies. Family history of Down syndrome (maternal cousin). NICU admission shortly after birth with oxygen treatment. Meconium aspiration at birth. Family history among several members for migraines. Currently on GFCF diet. Took asthma medication in the past but not currently.

Family 12752. Proband is an only child.

Patient ID: 12752.fa

Summary: Patient is an adult non-Hispanic white male. Age at conception of proband is 38. Normative range social responsiveness. Elevated score for aloofness and pragmatic social skills. Diagnosis of diabetes. Current tobacco and alcohol use endorsed. Current and past use of antihypertensive meds and medication for high cholesterol. Past use of sedatives and pain killers. Some college education. Annual household income = $36–50K. Father has a head circumference of 59.5 cm (z = 1.56). BMI information unavailable.

Patient ID: 12752.ma

Summary: Patient is an adult non-Hispanic white female. Age at conception of proband is 36.

Normative range of social responsiveness. No presence of broader autism phenotype. No endorsement of medications currently or during pregnancy with proband. Current tobacco and alcohol use endorsed. Some college education. Annual household income = $36–50K. Mother has head circumference of 54 cm (z = -.41). BMI information unavailable. Mother has been diagnosed with heart disease.

Patient ID: 12752.p1

Event: de novo CHD8 truncating, de novo ETFB truncating, de novo IQGAP2 truncating

Summary: Patient is a 55-month-old non-Hispanic white female diagnosed with autism.

Normative range VIQ (90) and NVIQ (93) with low adaptive behavior skills (59). Clinical range deficits in social responsiveness (90). Clinical elevations in attention problems, internalizing problems, and affective problems with no comorbid diagnoses. Large head (z = 2.40) and BMI indications of being underweight. No loss or regression of language skills. Diagnosis of chronic constipation, ongoing from 3.5 months with intermittent episodes of abnormal stool.

Coordination problems noted since 3.5 months. No cardiac or metabolic syndromes noted. No report of congenital anomalies. Hyperbilirubinemia diagnosis with phototherapy shortly after birth, no complications after treatment.

Patient ID: 11660.p1

Event: de novo NTNG1 missense, inherited CNV

Summary: Patient is a 60-month-old non-Hispanic white female diagnosed with autism. Low range VIQ (63) and NVIQ (60) with low adaptive skills (65). Clinical range deficits in social responsiveness (90). No language loss or regression noted. Clinical elevations in withdrawn behaviors, attention difficulties, and affective problems with no comorbid diagnoses. Large head (z = 2.5) with BMI indications of being underweight. Improvement in repetitive behaviors during fever symptoms noted by parents. No cardiac, metabolic, or autoimmune syndromes noted. Dysmorphology assessment indicating nondysmorphic features. No noted congenital anomalies.

Patient ID: 12532.p1

Event: de novo NTNG1 missense, de novo NAA40 missense

Summary: Patient is a 141-month-old non-Hispanic white male diagnosed with autism. Very high VIQ (135) and normative range NVIQ (110) with low adaptive behavior composite scores (71). Clinical range elevations in social responsiveness (74) with word loss regression occurring early in development. Borderline and clinical range problems with attention, internalizing, and affective problem behaviors with no comorbid diagnoses. Normal head circumference with BMI suggesting underweight. Penicillin allergy noted beginning at 5 years of age. Diagnosis of chronic otitis media at approximately age 6. No cardiac, metabolic, or autoimmune syndromes noted. No report of congenital anomalies.

Patient ID: 13733.p1

Event: de novo CHD7 missense, inherited CNV

Summary: Patient is a 160-month-old non-Hispanic white female diagnosed with autism.

Normative VIQ scores (90) with very low NVIQ scores (68) and adaptive scores (69). No regression or loss of language. Borderline range anxiety scores with no comorbid mental health diagnoses. Normal head circumference and BMI. Vision problems with correction. No hearing deficits. Diagnosis of Tourette’s Syndrome at six years. Myringotomy procedure for recurrent problems with otitis media. Diagnosis of respiratory problems but no diagnosis of cardiac or metabolic syndromes. No report of congenital anomalies.

Patient ID: 11390.p1

Event: de novo PTEN missense

Summary: Patient is a 99-month-old non-Hispanic white female diagnosed with autism. Very low VIQ (57) and low NVIQ (77) with low average adaptive behavior skill scores (79). Clinical range elevations in social responsiveness (90). Language regression and word loss noted in early development as well as occurrence of nonfebrile seizures. Borderline and clinical range problems with social withdrawal, attention, and affective problematic behaviors with no comorbid diagnoses. Large head (z = 2.84) with normal range BMI scores. Chronic unusual stools noted from 6 months of age. Chronic otitis media diagnoses at 2.5 years of age with noted improvements in repetitive behaviors during periods of fever. No cardiac, metabolic, or autoimmune syndromes noted. Dysmorphology assessment indicating nondysmorphic features.

No report of congenital anomalies. Sleep difficulties noted for falling asleep with night time incontinence. Normal menstrual cycle and pubertal changes taking place. Mood stabilizer medication used in the past but not current. Special education services in school 100% of time since age 3 and continuing to current. Occupational therapy services 1 hour per week year round beginning at age 3 continuing to current.

Patient ID: 12346.p1

Event: de novo MBD5 truncating, de novo MYBBP1A missense, de novo PBRM1 missense

Summary: Patient is a 833-month-old non-Hispanic white male diagnosed with autism.

Normative VIQ (106) with low NVIQ (77) and adaptive behavior skills (64). Clinical range elevations in social responsiveness (90). No language loss or regression noted in early development. Clinical elevations in withdrawn behaviors and no comorbid diagnoses. Normative head circumference and BMI. Chronic constipation diagnosed by PCP at 3 years of age. Coordination problems diagnosed by PCP at 16 months. Suspected cerebral palsy (unsure) noted at 1 year of age by orthopedist. Grand mal seizure occurrences beginning at 1 month and occurring approximately once per month in frequency. Chicken pox contraction at four years. Chronic otitis media and intermittent strep throat occurrences. No report of congenital anomalies.

Patient ID: 13890.p1

Event: de novo DYRK1A truncating

Summary: Patient is a 164-month-old non-Hispanic white female diagnosed with autism. Very low VIQ (26) and NVIQ (42) and adaptive behavior skills (41). Clinical range elevations in social responsiveness (82). No language loss or regression noted in development. No clinical elevations in behavior ratings from parents or teacher and no comorbid diagnoses. Small head (z = -1.64) and BMI suggestive of being overweight. Vision difficulties with correction. Pollen allergies. Intermittent constipation (undiagnosed). Chronic otitis media diagnosed by PCP at age 5. Surgery to release tendons and ligaments in right foot. Dysmorphology assessment indicates nonspecific dysmorphic features with microcephaly but no evidence of known syndrome.

Abnormal hair growth, ear structure, nose size, face size, philtrum, mouth, lips, fingers, fingernails, and feet noted upon exam.

Patient ID: 12933.p1

Event: de novo SETBP1 truncating, de novo MYO7B missense, de novo OR10Z1 missense

Summary: Patient is a 120-month-old non-Hispanic white male diagnosed with autism. Very low VIQ (44) and NVIQ (41) and low adaptive behavior skills (68). Clinical range elevations in social responsiveness (85). No language loss or regression noted in development. Borderline and clinical elevations in anxious/depressed, attention deficit, aggression, internalizing, affective problems, oppositional problems, and externalizing behavior with no comorbid diagnoses.

Normative head circumference and BMI suggestive of being underweight. Vision difficulties with correction. Food allergies diagnosed by PCP at 4 months. Intermittent problems with vomiting diagnosed at 5 years old. Chronic acid reflux diagnosed at 7 years of age. Excessive clumsiness and coordination problems suspected beginning at age 5. Surgery at 1.1 years for undescended testicle. Adenoids removed at 7 years. Dysmorphology assessment indicates nonspecific dysmorphic features without microcephaly. Abnormal ear structure, nose size, face size, teeth, hands, fingers, thumbs, and fingernails noted upon exam.

Patient ID: 11834.p1

Event: inherited 16p12 duplication

Summary: Patient is a 126-month-old non-Hispanic white male diagnosed with autism. Very low VIQ (43) and normative NVIQ (93) with very low adaptive behavior skills (57). Elevations in social responsiveness (75) with no word loss or regression in early development. Borderline elevations in anxiety problems with no comorbid diagnoses. Large head (z = 2.29) and normative BMI scores. Diagnosed with Tourette/tics at age 7. Diagnosed with roseola at age 1. No report of congenital anomalies.

[Figure: six histograms of NVIQ (x-axis: NVIQ, 20–140; y-axis: Frequency, 0–25). Panels: SSC189; samples w/ any non-syn; samples w/ severe non-syn; samples w/ truncating; samples w/ "top candidate mut"; samples w/ "extreme mut".]

Supplementary Figure 1. Distribution of nonverbal intelligence quotient (NVIQ) of the SSC189 sample based on different mutation groupings.

Histograms in each panel show the distribution of samples based on those having one or more event fitting each mutational category. Initial distribution was approximately bimodal. Summary statistics for each distribution are listed below. Red line indicates NVIQ of 70, the general threshold of intellectual disability.

Supplementary Figure 2. Browser views showing complex de novo mutation events.

a, Proband reads show deletion of a G base and a G/T substitution (top); neither event is present in the father (middle) or mother (bottom) tracks. b, Proband reads show an exonic G/C substitution and an intronic T/C substitution (top); neither event is present in the father (middle) or mother (bottom) tracks.

[Figure: panels a and b; Sanger trace labels in panel b: Fa-For, Fa-Rev, Mo-For, Mo-Rev, P1-For, P1-Rev]

Supplementary Figure 3. Confirmed event showing weak allele signature.

a, Proband reads show G/A substitution at 24% frequency (top); the event is not present in the father (middle) or mother (bottom) tracks. b, Sanger traces in both forward (For) and reverse (Rev) directions confirm the event and show reduced signal, suggesting somatic mosaicism.

Supplementary Figure 4. Observed number of mutation events fits the expected Poisson distribution.

We calculated the expected counts of mutations per person from the Poisson distribution based on the observed trio mutation rate of 1.28. The observed number of counts per proband closely matches this distribution.
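The expected counts described in this caption can be sketched as follows. This is a minimal illustration, assuming the rate of 1.28 is the Poisson mean per proband and that the counts are scaled to the 189 SSC probands; the function names are hypothetical and not taken from the authors' analysis.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def expected_counts(n_probands: int, lam: float, max_k: int) -> list:
    """Expected number of probands carrying 0..max_k de novo events."""
    return [n_probands * poisson_pmf(k, lam) for k in range(max_k + 1)]

# Assumption: 189 probands and a mean of 1.28 events per proband.
for k, expected in enumerate(expected_counts(189, 1.28, 6)):
    print(f"{k} events: {expected:.1f} probands expected")
```

Comparing these expected values against the observed number of probands with 0, 1, 2, ... events gives the kind of fit shown in the figure.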

Supplementary Figure 5. Multivariate analysis to examine effect of number of “extreme” de novo coding mutations and the presence of a CNV.

Extreme mutations (n = 62) were defined as de novo protein-truncating intersections with known OMIM and ASD candidate genes and CNVs predicted to be gene breaking and pathogenic. Boxplots show the samples with and without a CNV, either de novo or rare inherited (Supplementary Discussion). We observed a significant decrease in NVIQ with increased numbers of events (F(2,116) = 5.45, p < .01, partial η² = 0.09), but not in VIQ (F(2,116) = 1.13, p = ns, partial η² = 0.02). This result in NVIQ was strengthened, but not exclusively driven, by the presence of CNVs (F(2,116) = 0.97, p = ns, partial η² = 0.02); there was no main effect of strictly having a CNV on cognitive ability (F(2,116) = 0.71, p = ns, partial η² = 0.006).
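For readers who want to relate the F statistics to the partial eta-squared effect sizes quoted in this caption, the standard identity is partial eta squared = (F x df_effect) / (F x df_effect + df_error). The sketch below applies it to the first two tests; the function name is hypothetical, and this is a consistency check rather than the authors' own computation.

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Effect size recovered from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)

# NVIQ versus number of events: F(2,116) = 5.45
print(round(partial_eta_squared(5.45, 2, 116), 2))  # 0.09
# VIQ versus number of events: F(2,116) = 1.13
print(round(partial_eta_squared(1.13, 2, 116), 2))  # 0.02
```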

Supplementary Figure 6. Ratio of samples with various mutation types binned by NVIQ.

The proportion of individuals within various NVIQ bins with 1+ event across “disruptive” classification schemes: any event, any nonsynonymous event, any severe nonsynonymous event, and our “top candidate” list (Table 1). a, Grouped by IQ standard deviations. We observed a nonsignificant trend by Fisher’s exact test (lower IQ → higher probability of 1+ de novo event) for any event (p = 0.096), any nonsynonymous event (p = 0.064), and any severe nonsynonymous event (p = 0.052), but conflicting results for top candidates (p = 0.19). b, Grouped by high (>70) and low (<=70) NVIQ to increase power. The strongest p-value was 0.032 (severe nonsynonymous), again suggesting a nonsignificant trend. Error bars are 95% confidence intervals.
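A two-group comparison of this kind (low versus high NVIQ, with versus without an event) can be sketched with a two-sided Fisher's exact test on a 2x2 table. The implementation below uses only the standard library. The counts in the usage lines are placeholders chosen for illustration and are not taken from the study, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))

# Placeholder counts (NOT from the paper):
# rows = NVIQ <= 70, NVIQ > 70; columns = with event, without event.
print(fisher_exact_two_sided(20, 30, 30, 100))
```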

Supplementary Figure 7. Distribution of locus-specific mutation rates based on human-chimp comparisons.

Red triangles represent CHD8: 3.95E-09, NTNG1: 2.34E-09, LAMC3: 1.73E-08, SCN1A: 8.83E-09, GRIN2B: 6.94E-09, FOXP2: 7.01E-09, FOXP1: 5.51E-09, and GRIN2A: 9.38E-09.

[Figure: chr21 region 37,650,000–37,900,000 (21q22.13) containing DYRK1A, with CNV tracks for cases and controls]

Supplementary Figure 8. DYRK1A falls in a Down Syndrome critical region disrupted by CNVs.

Each red (deletion) and green (duplication) line represents an identified CNV in cases (solid lines) versus controls (dashed lines), with arrowheads showing point mutations.

[Figure: two histograms; y-axis: Percent of Null Networks (0–4); x-axes: Number of Edges (0–250) and Clustering Coefficient (0–0.25)]

Supplementary Figure 9. Histograms of network statistics for 10,000 simulated null networks.

Left: Number of edges in the network. Right: Clustering coefficient of the network. Red dotted line indicates the corresponding value in the experimentally determined network.

[Figure: x-axis: Ranking of Gene Products (0–12,000); legend: Proband Unconnected, Proband Connected, Sibling Events]

Supplementary Figure 10. Protein-protein interaction network based prioritization of 126 gene products with severe mutations.

X-axis represents the ranking of gene products from the GeneMANIA PPI database using ASD-associated gene products from Betancur et al. as a seed. The red lines represent the gene products (49) that are in the largest connected component of the PPI network generated using the proband severe disruptive de novo events, gray lines represent the gene products not in the connected component, and the blue lines represent the gene products with severe de novo mutations in the unaffected siblings. The gene products of the connected component rank significantly higher compared to all other gene products (Mann-Whitney U, one-sided, p < 1.6E-08), whereas the unconnected gene products do not (Mann-Whitney U, one-sided, p < 0.28). Similarly, the sibling events compared to all other gene products do not show significant rankings (Mann-Whitney U, one-sided, p < 0.26).
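The one-sided rank comparison named in this caption can be sketched as below. This is a minimal standard-library version using the normal approximation without continuity or tie correction, which is an assumption; the caption does not say which variant or software was used. The rank lists in the usage lines are hypothetical and only show the calling convention.

```python
import math

def mann_whitney_one_sided(x, y) -> float:
    """One-sided p-value for the alternative that values in x tend to be
    smaller (i.e. ranked nearer the top) than values in y."""
    n1, n2 = len(x), len(y)
    # U counts the (x, y) pairs in which the x value is smaller; ties count half.
    u = sum((xi < yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical ranks (NOT from the paper): a small set near the top of the
# list compared with a background set further down.
connected = [12, 40, 95, 150, 310]
background = list(range(1000, 1050))
print(mann_whitney_one_sided(connected, background))
```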

Supplementary Figure 11. Top interaction network from IPA analysis of 126 genes with severe mutations.

Displayed is the highest scoring network from IPA analysis. Solid lines represent direct interactions while dashed lines indicate indirect connections. Genes not found in the PPI connected component are marked in yellow, while those in the PPI connected component are marked in blue.

[Figure: network view for 12752.p1 showing 3 de novo gene disruptions: CHD8, IQGAP2 and ETFB]

Supplementary Figure 12. De novo mutations in 12752.p1.

GeneMANIA4 view of three de novo truncating mutations (red labels) in genes encoding proteins that are part of a beta-catenin linked network. The proband is macrocephalic (z = 2.4) with normal cognitive ability (VIQ = 90, NVIQ = 93) but has adaptive behavior deficits (Vineland Standard Score = 59) with significant social impairments (Supplementary Discussion).

[Figure: two panels titled "Number of genes implicated in autism", one for 120 de novo SNPs and one for 181 de novo SNPs (2 doublets observed in each). x-axis: Fraction of de novo SNPs considered pathogenic (0–1); y-axis: Estimated number of autism genes (0–5000); annotated values: 992 genes and 182 genes. Legend (Estimator Used): Sanders (2010); Chao and Lee (1992).]

Supplementary Figure 13. Estimating the number of genes contributing to sporadic autism pathogenicity via recurrent de novo mutation.

Left: Estimate based on 120 de novo mutations especially likely to be pathogenic (including both observed doubletons) based on GERP and Grantham score. Right: Estimate based on all de novo nonsynonymous events. Two different estimators for the “unseen species problem” were used.

Supplementary Table 1. Summary of sequenced families, including sex, parental age, NVIQ, CNV pre-screening, trio bases screened, and point mutations.

See attached Excel document

Supplementary Table 2. Summary of Exome Sequencing Results from 209 ASD Families

| SNV | Type | Total | Average | Silent | Missense | Nonsense or Read-Through | Splice | Ti/Tv | % CpG Ti |
|---|---|---|---|---|---|---|---|---|---|
| SSC189 Pro | all | 3,513,050 | 18,588 | 1,859,735 | 1,631,320 | 16,449 | 5,546 | 3.27 | 20.9 |
| SSC189 Pro | rare | 32,028 | 169 | 10,682 | 20,515 | 581 | 250 | 2.87 | 31.1 |
| SSC189 Pro | DN | 225 | 1.19 | 61 | 145 | 16 | 3 | 2.69 | 36.0 |
| SSC189 Sib (31)& | DN | 29 | 0.94 | 9 | 19 | 1 | 0 | 3.14 | 41.4 |
| Pilot Pro (20) | DN | 17 | 0.85 | 7 | 9 | 0 | 1 | 7.50 | 41.1 |
| Pilot Sib (19) | DN | 21 | 1.11 | 7 | 12 | 2 | 0 | 1.63 | 38.1 |

| Indels | Type | Total | Average | 3n Indels | Truncating Indels | Splice |
|---|---|---|---|---|---|---|
| SSC189 Pro | all | 42,929 | 227 | 18,817 | 23,270 | 842 |
| SSC189 Pro | rare | 806 | 4.26 | 222 | 561 | 23 |
| SSC189 Pro | DN | 17 | 0.09 | 2 | 15 | 0 |
| SSC189 Sib (31) | DN | 0 | 0 | 0 | 0 | 0 |
| Pilot Pro (20) | DN | 1 | 0.05 | 0 | 1 | 0 |
| Pilot Sib (19) | DN | 0 | 0 | 0 | 0 | 0 |

| Estimated Mutation Rates | All Events | Screened Bases* | All Rate† | Sub Rate‡ |
|---|---|---|---|---|
| SSC189 Pro | 242 | 11,144,644,963 | 2.17 (1.90-2.46) | 1.94 (1.68-2.21) |
| SSC189 Sib (31) | 29 | 1,838,730,129 | 1.57 (1.05-2.26) | 1.52 (1.01-2.20) |
| Pilot Pro (20) | 18 | 937,498,416 | 1.92 (1.14-3.04) | 1.81 (1.05-2.90) |
| Pilot Sib (19) | 21 | 891,825,463 | 2.35 (1.45-3.59) | 2.24 (1.36-3.46) |

& Number of children. * Count of all bases screened in child (minimum 8X), i.e. concordant in each member of the trio. † Mutations/base/generation x10^-8 with 95% confidence intervals. ‡ Substitutions only, excludes indels and possible somatic mosaics. Pro = proband, Sib = unaffected sibling, DN = de novo, SSC = Simons Simplex Collection

Supplementary Table 3. All 242 de novo point mutations found in 189 trios.

See attached Excel document

Supplementary Table 4. Comparison of mutation rates between O’Roak et al. and Sanders et al.

| Site/Type | # of Trios | Bases Screened | Total De novo | Substitutions | Indels | Mosaic events | Combined Rate and 95% CI (x10-8) | Strict Substitution Rate and 95% CI (x10-8) |
|---|---|---|---|---|---|---|---|---|
| UW-189 Probands | 189 | 11,144,644,963 | 242 | 216 | 17 | 9 | 2.17 (1.90-2.46) | 1.94 (1.68-2.21) |
| UW-31 Matched Probands | 31 | 1,830,218,983 | 42 | 39 | 1 | 2 | 2.29 (1.65-3.10) | 2.13 (1.51-2.91) |
| UW-31 Matched Siblings | 31 | 1,838,730,129 | 29 | 28 | 0 | 1 | 1.57 (1.05-2.26) | 1.52 (1.01-2.20) |
| UW-Pilot Probands | 20 | 937,498,416 | 18 | 17 | 1 | 0 | 1.92 (1.14-3.04) | 1.81 (1.05-2.90) |
| UW-Pilot Matched Probands | 19 | 890,665,738 | 18 | 17 | 1 | 0 | 2.02 (1.20-3.19) | 1.91 (1.11-3.05) |
| UW-Pilot Matched Sibs | 19 | 891,825,463 | 21 | 20 | 0 | 1 | 2.35 (1.45-3.59) | 2.24 (1.36-3.46) |
| UW-All Probands | 209 | 12,082,143,379 | 260 | 233 | 18 | 9 | 2.15 (1.90-2.43) | 1.93 (1.69-2.19) |
| Yale-Quartet Probands | 200 | 9,718,659,190 | 156 | 153 | 3 | 0 | 1.61 (1.34-1.87) | 1.57 (1.31-1.84) |
| Yale-Quartet Siblings | 200 | 9,718,659,190 | 124 | 124 | 0 | 0 | 1.28 (1.05-1.50) | 1.28 (1.05-1.50) |
| Yale-Trio Probands | 24 | 1,082,156,674 | 15 | 15 | 0 | 0 | 1.39 (0.67-2.10) | 1.39 (0.67-2.10) |
| Yale-All Probands | 225 | 10,800,815,864 | 171 | 168 | 3 | 0 | 1.58 (1.33-1.83) | 1.56 (1.31-1.80) |

UW: Variant calling threshold was a minimum of 8x in each member of a trio. Bases screened were calculated based on trio concordant positions at 8x (n) and converting to diploid bases (2n), adjusting for sex chromosomes. Probands and siblings were calculated separately as trio units. Observed mutation rate was calculated by dividing the total number of events by total number of bases. The exact 95% Poisson confidence intervals were generated from the observed counts and then divided by the total number of bases to get the rate confidence intervals.

Yale: The variant calling threshold was a minimum of 20x unique reads in each member of the quartet and a minimum of 8x unique non-reference reads in the child. Bases screened were estimated by assessing the number of bases in each family with a minimum of 20x unique reads in each member of the quartet and converting to diploid bases (2n). Observed mutation rates were calculated by dividing the total events per sample by the number of nucleotides analyzed per sample and averaging across individuals. The 95% confidence intervals were calculated from the variance of this measure.
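The UW calculation described above (events divided by bases screened, with an exact Poisson interval on the count scaled by the same denominator) can be sketched as follows. The interval is found here by bisection on the Poisson cumulative distribution, which is an assumption: the note does not say how the exact interval was computed. All function names are hypothetical.

```python
import math

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for X ~ Poisson(mu), with each term evaluated in log space."""
    if k < 0:
        return 0.0
    if mu == 0:
        return 1.0
    terms = (math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1)) for i in range(k + 1))
    return min(1.0, sum(terms))

def bisect_root(f, lo: float, hi: float, iters: int = 200) -> float:
    """Root of an increasing function f on [lo, hi], assuming f(lo) < 0 < f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_poisson_ci(k: int, alpha: float = 0.05):
    """Exact two-sided confidence interval for a Poisson mean given an observed count k."""
    # Lower limit: the mean at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda mu: (1 - poisson_cdf(k - 1, mu)) - alpha / 2, 0.0, float(k))
    # Upper limit: the mean at which P(X <= k) equals alpha / 2.
    upper = bisect_root(
        lambda mu: alpha / 2 - poisson_cdf(k, mu), float(k), k + 10 * math.sqrt(k + 1) + 10)
    return lower, upper

def mutation_rate_with_ci(events: int, bases: int):
    """Rate per base per generation with its exact Poisson interval."""
    lower, upper = exact_poisson_ci(events)
    return events / bases, lower / bases, upper / bases

# UW-189 probands from the table: 242 events over 11,144,644,963 bases.
rate, lo, hi = mutation_rate_with_ci(242, 11_144_644_963)
print(f"{rate * 1e8:.2f} ({lo * 1e8:.2f}-{hi * 1e8:.2f}) x10^-8")
```

Under these assumptions the result should land close to the tabulated 2.17 (1.90-2.46); small differences in the last digit can arise from rounding or from the exact-interval variant used.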

Supplementary Table 5. De novo events identified in 50 unaffected siblings and 20 pilot probands.

See attached Excel document

Supplementary Table 6. 70 rare inherited and 6 de novo CNVs identified in 122 trios.

| Chr (hg18) | Start | End | Family ID | CNV | Type | Inheritance | Genes |
|---|---|---|---|---|---|---|---|
| chr20 | 6706891 | 8271234 | 11013.p1 | Gain | Inherited | Mother | BMP2 (Gene Broken), HAO1, TMX4, PLCB1 (Gene Broken) |
| chr2 | 197997071 | 198254315 | 11023.p1 | Gain | Inherited | Mother | SF3B1 (Gene Broken), COQ10B, HSPD1, HSPE1, MOBKL3, RFTN2 |
| chr2 | 208742960 | 208763173 | 11023.p1 | Loss | Inherited | Mother | C2orf80 (Gene Broken) |
| chr2 | 33593707 | 36437270 | 11064.p1 | Gain | Inherited | Mother | RASGRP3 (Gene Broken), FAM98A, MYADML, CRIM1 (Gene Broken) |
| chr13 | 49022622 | 49083205 | 11083.p1 | Gain | Inherited | Father | RCBTB1 (Gene Broken) |
| chr4 | 47538 | 117452 | 11093.p1 | Gain | Inherited | Mother | ZNF595 (Gene Broken), ZNF718 (Gene Larger Than CNV) |
| chr8 | 13295432 | 15139503 | 11141.p1 | Gain | Inherited | Mother | DLC1 (Gene Broken), C8orf48, SGCZ (Gene Broken), MIR383 |
| chr19 | 48612452 | 48651954 | 11141.p1 | Loss | Inherited | Mother | TEX101 (Gene Broken) |
| chr16 | 79736150 | 79748195 | 11184.p1 | Loss | Inherited | Father | PKD1L2 (Gene Larger Than CNV) |
| chr4 | 108066404 | 109130669 | 11190.p1 | Gain | Inherited | Father | DKK2 (Gene Broken), PAPSS1, SGMS2, CYP2U1, HADH (Gene Broken) |
| chr4 | 5781934 | 5823956 | 11224.p1 | Gain | Inherited | Father | EVC (Gene Larger Than CNV) |
| chr7 | 33097353 | 33153804 | 11346.p1 | Gain | Inherited | Mother | RP9, BBS9 (Gene Broken) |
| chr5 | 157006525 | 157051157 | 11375.p1 | Loss | Inherited | Father | SOX30 (Gene Broken), C5orf52 |
| chr7 | 11117201 | 12440046 | 11398.p1 | Gain | Inherited | Father | PHF14 (Gene Broken), THSD7A, TMEM106B, VWDE |
| chr12 | 43879949 | 43967314 | 11452.p1 | Loss | Inherited | Mother | PLEKHA9 (Gene Broken), ANO6 (Gene Broken) |
| chr6 | 88374109 | 88424279 | 11459.p1 | Loss | Inherited | Father | ORC3L (Gene Larger Than CNV) |
| chr1 | 173697478 | 174026214 | 11469.p1 | Gain | Inherited | Father | TNR (Gene Broken) |
| chr5 | 112931249 | 113726857 | 11469.p1 | Gain | Inherited | Father | YTHDC2 (Gene Broken), KCNN2 (Gene Broken) |
| chr6 | 26077937 | 26393662 | 11480.p1 | Gain | Inherited | Father | TRIM38 (Gene Broken), HIST1H1A, HIST1H3A, HIST1H4A, HIST1H4B, HIST1H3B, HIST1H2AB, HIST1H2BB, HIST1H3C, HIST1H1C, HFE, HIST1H4C, HIST1H1T, HIST1H2BC, HIST1H2AC, HIST1H1E, HIST1H2BD, HIST1H2BE, HIST1H4D, HIST1H3D, HIST1H2AD, HIST1H2BF, HIST1H4E, HIST1H2BG, HIST1H2AE, HIST1H3E, HIST1H1D, HIST1H4F, HIST1H4G, HIST1H3F, HIST1H2BH, HIST1H3G, HIST1H2BI, HIST1H4H (Gene Broken) |
| chr8 | 15992606 | 16066256 | 11556.p1 | Loss | Inherited | Father | MSR1 (Gene Broken) |
| chr3 | 147531173 | 147649939 | 11653.p1 | Gain | Inherited | Father | PLSCR2 (Gene Broken) |
| chr2 | 110167952 | 110510396 | 11660.p1 | Loss | Inherited | Mother | MIR4267, MALL, NPHP1, NCRNA00116 |
| chr7 | 16805611 | 17800441 | 11696.p1 | Gain | Inherited | Father | AGR2 (Gene Broken), AGR3, AHR, SNX13 (Gene Broken) |
| chr3 | 37265222 | 37416340 | 11696.p1 | Loss | de novo | NA | GOLGA4, C3orf35 (Gene Broken) |
| chr1 | 205383785 | 205599637 | 11707.p1 | Gain | Inherited | Father | C4BPA (Gene Broken), CD55 (Gene Broken) |
| chr9 | 28181261 | 28338013 | 11707.p1 | Loss | Inherited | Mother | LINGO2 (Gene Larger Than CNV) |
| chr17 | 3946651 | 4368839 | 11707.p1 | Gain | Inherited | Father | ZZEF1 (Gene Broken), CYB5D2, ANKFY1, UBE2G1, SPNS3, SPNS2 (Gene Broken) |
| chr2 | 160248507 | 160313660 | 11711.p1 | Gain | Inherited | Mother | MARCH7 (Gene Broken) |
| chr10 | 67942427 | 68177427 | 11711.p1 | Loss | Inherited | Father | CTNNA3 (Gene Larger Than CNV) |
| chr4 | 135140571 | 135406410 | 11715.p1 | Loss | Inherited | Mother | PABPC4L |
| chr7 | 48259354 | 48398102 | 11722.p1 | Loss | Inherited | Mother | ABCA13 (Gene Larger Than CNV) |
| chr8 | 30067949 | 30170383 | 11722.p1 | Gain | Inherited | Father | LEPROTL1, MBOAT4, DCTN6 |
| chr19 | 62680865 | 62710992 | 11753.p1 | Gain | Inherited | Father | ZNF419, ZNF773 (Gene Broken) |
| chr16 | 21675903 | 22357384 | 11834.p1 | Gain | Inherited | Father | OTOA (Gene Broken), RRN3P1, LOC100190986, UQCRC2, PDZD9, C16orf52, VWA3A, EEF2K, POLR3E, CDR2, RRN3P3, LOC641298 (Gene Broken) |
| chr7 | 153588034 | 154254473 | 11843.p1 | Loss | Inherited | Mother | DPP6 (Gene Larger Than CNV) |
| chr12 | 7882951 | 8015652 | 11843.p1 | Loss | Inherited | Father | SLC2A14 (Gene Broken), SLC2A3 |
| chr2 | 86145907 | 86418710 | 11895.p1 | Gain | Inherited | Father | POLR1A (Gene Broken), PTCD3, SNORD94, IMMT, MRPL35, REEP1 (Gene Broken) |
| chr15 | 28712984 | 30303265 | 11928.p1 | Gain | de novo | NA | ARHGAP11B (Gene Broken), FAN1, MTMR10, TRPM1, MIR211, KLF13, LOC283711, OTUD7A, CHRNA7 |
| chr6 | 162692702 | 162951722 | 11947.p1 | Gain | Inherited | Father | PARK2 (Gene Larger Than CNV) |
| chr22 | 39048803 | 39223310 | 11947.p1 | Gain | Inherited | Father | TNRC6B (Gene Broken), ADSL, SGSM3, MKL1 (Gene Broken) |
| chr11 | 14831730 | 14864039 | 11964.p1 | Gain | Inherited | Father | PDE3B (Gene Broken), CYP2R1 (Gene Broken) |
| chr16 | 82990535 | 83030397 | 11964.p1 | Loss | Inherited | Mother | ATP2C2 (Gene Larger Than CNV) |
| chr10 | 133426828 | 134361413 | 12118.p1 | Gain | Inherited | Father | PPP2R2D, BNIP3, JAKMIP3, DPYSL4, STK32C, LRRC27, PWWP2B, C10orf91, INPP5A (Gene Broken) |
| chr5 | 112941833 | 112974949 | 12130.p1 | Gain | Inherited | Mother | YTHDC2 (Gene Broken) |
| chr8 | 15992606 | 16066256 | 12130.p1 | Loss | Inherited | Mother | MSR1 (Gene Broken) |
| chr6 | 107599923 | 107887182 | 12212.p1 | Loss | Inherited | Father | PDSS2 (Gene Larger Than CNV) |
| chr11 | 31133684 | 31384778 | 12430.p1 | Loss | Inherited | Mother | DCDC1, DNAJC24 (Gene Broken) |
| chr12 | 110659017 | 110799706 | 12581.p1 | Gain | Inherited | Mother | ACAD10 (Gene Broken), ALDH2, C12orf47, MAPKAPK5 (Gene Broken) |
| chr9 | 139800210 | 140131086 | 12581.p1 | Loss | de novo | NA | EHMT1 (Gene Broken), MIR602, CACNA1B |
| chr1 | 183364423 | 183402407 | 12667.p1 | Gain | Inherited | Mother | C1orf25 (Gene Broken), C1orf26 (Gene Broken) |
| chr18 | 37874638 | 37962267 | 12667.p1 | Gain | Inherited | Mother | PIK3C3 (Gene Broken) |
| chr11 | 50035201 | 50669978 | 12810.p1 | Gain | Inherited | Father | LOC441601, LOC646813 |
| chr22 | 30835976 | 31086819 | 12810.p1 | Gain | Inherited | Father | SLC5A1 (Gene Broken), C22orf42, RFPL2, SLC5A4, RFPL3 (Gene Broken), RFPL3S (Gene Broken) |
| chr3 | 28443079 | 28490738 | 13008.p1 | Loss | Inherited | Father | ZCWPW2 (Gene Larger Than CNV) |
| chr12 | 180014 | 645460 | 13008.p1 | Gain | Inherited | Mother | SLC6A12 (Gene Broken), SLC6A13, KDM5A, CCDC77, B4GALNT3, NINJ2 |
| chr3 | 143305311 | 143566711 | 13335.p1 | Gain | Inherited | Father | TFDP2 (Gene Broken), GK5, XRN1 (Gene Broken) |
| chr17 | 613719 | 644926 | 13335.p1 | Loss | Inherited | Father | GLOD4 (Gene Broken), RNMTL1 |
| chr16 | 29502984 | 30107398 | 13335.p1 | Gain | de novo | NA | SLC7A5P1, SPN, QPRT, C16orf54, MAZ, PRRT2, C16orf53, MVP, CDIPT, LOC440356, SEZ6L2, ASPHD1, KCTD13, TMEM219, TAOK2, HIRIP3, INO80E, DOC2A, C16orf92, FAM57B, ALDOA, PPP4C, TBX6, YPEL3, GDPD3, MAPK3, LOC100271831, CORO1A (Gene Broken) |
| chr17 | 69851956 | 70220158 | 13409.p1 | Gain | Inherited | Father | KIF19 (Gene Broken), BTBD17, GPR142, GPRC5C, CD300A, CD300LB, CD300C, CD300LD, C17orf77, CD300E, RAB37 (Gene Broken), CD300LF (Gene Broken) |
| chr6 | 57019780 | 57062509 | 13415.p1 | Loss | Inherited | Father | KIAA1586 (Gene Broken) |
| chr15 | 55430140 | 55596476 | 13415.p1 | Gain | Inherited | Mother | CGNL1 (Gene Broken) |
| chr10 | 67981151 | 68085796 | 13494.p1 | Loss | Inherited | Mother | CTNNA3 (Gene Larger Than CNV) |
| chr18 | 75004789 | 75209184 | 13494.p1 | Loss | Inherited | Father | ATP9B (Gene Larger Than CNV) |
| chr14 | 67088466 | 67343145 | 13530.p1 | Gain | Inherited | Father | PLEKHH1 (Gene Broken), PIGH, ARG2, VTI1B, RDH11, RDH12, ZFYVE26 (Gene Broken) |
| chr4 | 91076267 | 91171447 | 13533.p1 | Loss | Inherited | Father | MMRN1 (Gene Broken) |
| chr14 | 73586157 | 73611473 | 13533.p1 | Loss | Inherited | Father | C14orf45 (Gene Broken), ALDH6A1 (Gene Broken) |
| chr10 | 67724488 | 67879503 | 13557.p1 | Loss | Inherited | Mother | CTNNA3 (Gene Larger Than CNV) |
| chr11 | 56599848 | 59990367 | 13726.p1 | Loss | de novo | NA | LRRC55, APLNR, TNKS1BP1, SSRP1, P2RX3, PRG3, PRG2, SLC43A3, RTN4RL2, SLC43A1, TIMM10, SMTNL1, UBE2L6, SERPING1, MIR130A, YPEL4, CLP1, ZDHHC5, MED19, TMX2, C11orf31, BTBD18, CTNND1, OR9Q1, OR6Q1, OR9I1, OR9Q2, OR1S2, OR1S1, OR10Q1, OR10W1, OR5B17, OR5B3, OR5B2, OR5B12, OR5B21, LPXN, ZFP91, ZFP91-CNTF, CNTF, GLYAT, GLYATL2, LOC283194, GLYATL1, FAM111B, FAM111A, DTX4, MPEG1, OR5AN1, OR5A2, OR5A1, OR4D6, OR4D10, OR4D11, OR4D9, OSBP, MIR3162, PATL1, OR10V1, STX3, MRPL16, GIF, TCN1, PLAC1L, MS4A3, MS4A2, MS4A6A, MS4A4A, MS4A6E, MS4A7, MS4A14, MS4A5, MS4A1 (Gene Broken) |
| chr17 | 10287416 | 10297580 | 13733.p1 | Loss | Inherited | Mother | MYH4 (Gene Larger Than CNV) |
| chr8 | 102800655 | 103493638 | 13741.p1 | Gain | Inherited | Father | NCALD (Gene Broken), RRM2B, UBR5 (Gene Broken) |
| chr12 | 19361460 | 19461747 | 13793.p1 | Gain | Inherited | Mother | PLEKHA5 (Gene Broken) |
| chr15 | 55423402 | 55596476 | 13812.p1 | Gain | Inherited | Father | CGNL1 (Gene Broken) |
| chr19 | 62524033 | 62624661 | 13815.p1 | Loss | Inherited | Mother | ZNF543 (Gene Broken), ZNF304, TRAPPC2P1, ZNF547, ZNF548, ZNF17 (Gene Broken) |
| chr16 | 74808505 | 75067064 | 13815.p1 | Loss | de novo | NA | TERF2IP (Gene Broken), CNTNAP4 (Gene Broken) |
| chr10 | 3113861 | 3204971 | 13844.p1 | Loss | Inherited | Mother | PFKP (Gene Broken), PITRM1 (Gene Broken) |
| chr21 | 34648298 | 34909180 | 13844.p1 | Gain | Inherited | Mother | KCNE2, FAM165B, KCNE1, RCAN1 (Gene Broken) |

Supplementary Table 7. Expanded top de novo ASD risk contributing mutations*

Candidate Gran- Location Proband NVIQ AA change GERP Chrom Position Ref Alt Gene tham (hg19)

12225.p1 89 ABCA2 p.VAL1845MET 3.4 21 9q34 9 139906388 C T

11653.p1 44 ADCY5 p.ARG603CYS 3.74 180 3q13.2-q21 3 123046605 G A

12130.p1 55 ADNP frameshift indel 20q13.13 20 49510027 * -TT

11224.p1 112 AP3B2 p.ARG435HIS 4.81 29 15q 15 83346497 C T

13447.p1 51 ARID1B frameshift indel 6q25.3 6 157527664 * -TGTT

13415.p1 48 BRSK2 3n indel 11p15.5 11 1466811 * -AGA

14292.p1 49 BRWD1 frameshift indel 21q22.2 21 40568453 * -T

11872.p1 65 CACNA1D p.ALA769GLY 4.91 60 3p14.3 3 53764493 C G

11773.p1 50 CACNA1E p.GLY1209SER 4.96 56 1q25-q31 1 181708295 G A

13606.p1 60 CDC42BPB p.ARG764TERM 3.19 14q32.32 14 103434646 G A

12086.p1 108 CDH5 p.ARG545TRP 4.9 101 16q22.1 16 66434715 C T

12630.p1 115 CHD3 p.ARG1818TRP 2.97 101 17p13 17 7812028 C T

13733.p1 68 CHD7 p.GLY996SER 5.48 56 8q12.2 8 61735090 G A

13844.p1 34 CHD8 p.GLN959TERM 4.98 14q11.2 14 21871178 G A

12752.p1 93 CHD8 frameshift indel 14q11.2 14 21861376 * -CT

13415.p1 48 CNOT4 p.ASP48ASN 5.52 23 7q22-qter 7 135122938 C T

12703.p1 58 CTNNB1 p.THR551MET 5.43 81 3p21 3 41275757 C T

11452.p1 80 CUL3 p.GLU246TERM 5.51 2q36.2 2 225376218 C A

11571.p1 94 CUL5 p.VAL355ILE NA 29 11q22.3 11 107944174 G A

13890.p1 42 DYRK1A splice site 5.63 21q22.13 21 38865466 G A

12741.p1 87 EHD2 p.ARG167CYS 2.55 180 19q13.3 19 48221860 C T

11629.p1 67 FBXO10 p.GLU54LYS 4.25 56 9p13.1 9 37541606 C T

13629.p1 63 GPS1 p.ARG492GLN 3.66 43 17q25.3 17 80014816 G A

13757.p1 91 GRINL1A 3n indel 15q22.1 15 58001002 * -AGA

11184.p1 94 HDGFRP2 p.GLU83LYS 4.77 56 19p13.3 19 4475539 G A

11610.p1 138 HDLBP p.ALA639SER 5.81 99 2q37.3 2 242186202 C A

11872.p1 65 KATNAL2 splice site 5.48 18q21.1 18 44603833 G C

12346.p1 77 MBD5 frameshift indel 2q23.2 2 149225965 * -TC

11947.p1 33 MDM2 p.GLU433LYS/p.TRP160TERM 3.59 56 12q13-q14 12 69233432 G A

11148.p1 82 MLL3 p.TYR4691TERM -6.76 7q36 7 151842339 G T

12157.p1 91 NLGN1 p.HIS795TYR 5.55 83 3q26.32 3 173999004 C T

11193.p1 138 NOTCH3 p.GLY1134ARG 3.63 125 19p13.2-p13.1 19 15290235 C G

11172.p1 60 NR4A2 p.TYR275HIS 4.53 83 2q22-q23 2 157185876 A G

11660.p1 60 NTNG1 p.THR135ILE 5.36 89 1p13.2-p13.1 1 107867061 C T

12532.p1 110 NTNG1 p.TYR23CYS 3.97 194 1p13.2-p13.1 1 107691283 A G

11093.p1 91 OPRL1 p.ARG157CYS 3.42 180 20q13.33 20 62729390 C T

13793.p1 56 PCDHB4 p.ASP555HIS 3.76 81 5q31 5 140503243 G C

11707.p1 23 PDCD1 frameshift indel 2q37.3 2 242795103 * -G & K

12304.p1 83 PSEN1 p.THR421ILE 5.61 89 14q24.3 14 73685855 C T

11390.p1 77 PTEN p.THR167ASN 5.39 65 10q23 10 89711882 C A

13629.p1 63 PTPRK p.ARG784HIS 5.52 29 6q22.2-q22.3 6 128326372 C T

13333.p1 69 RGMA p.VAL379ILE 3.41 29 15q26.1 15 93588446 C T

13222.p1 86 RPS6KA3 p.SER369TERM 4.32 Xp22.2-p22.1 X 20193403 G C

11257.p1 128 RUVBL1 p.LEU365GLN 4.62 113 3q21 3 127806574 A T

11843.p1 113 SESN2 p.ALA46THR 4.88 58 1p35.2 1 28595739 G A

12933.p1 41 SETBP1 frameshift indel 18q21.1 18 42532021 * +GG, -C

12565.p1 79 SETD2 frameshift indel 3p21.31 3 47098932 * -T

12335.p1 47 TBL1XR1 p.LEU282PRO 5.43 98 3q26.33 3 176765107 A G

11480.p1 41 TBR1 frameshift indel 2q24.2 2 162273322 * -C

11569.p1 67 TNKS p.ARG568THR 4.94 71 8p23.1 8 9567684 G C

12621.p1 120 TSC2 p.ARG1580TRP 2.06 101 16p13.3 16 2136269 C T

11291.p1 83 TSPAN17 p.SER75TERM 1.09 5q35.3 5 176078840 C A

11006.p1 125 UBE3C p.SER845PHE 5.39 155 7q36.3 7 157041114 C T

12161.p1 95 UBR3 frameshift indel 2q31.1 2 170732427 * -AAA, +C

12521.p1 78 USP15 frameshift indel 12q14 12 62775296 * -TGAG

11526.p1 92 ZBTB41 p.TYR886HIS 5.28 83 1q31.3 1 197128563 A G

13335.p1 25 ZNF420 p.LEU76PRO 2.54 98 19q13.12 19 37618120 T C

CNV

Proband NVIQ Candidate Gene Type Location Chrom Start Stop

11928.p1 66 CHRNA7 DUPLICATION 15q13.3 15 30925692 32515973

13815.p1 56 CNTNAP4 DELETION 16q23.1 16 75690104 76523780

13726.p1 59 CTNND1 DELETION 11q12.1 11 56843272 60233791

12581.p1 34 EHMT1 DELETION 9 9 140680073 141023914

13335.p1 25 TBX6 DUPLICATION 16p11.2 16 29595483 30199897

*Top candidate de novo mutations based on severity and/or supporting evidence from the literature.

Supplementary Table 8. Mutation rates and probability of recurrence for genes with >1 mutation.

Gene | GRIN2B | LAMC3 | SCN1A | NTNG1 | CHD8
chimp diffs | 15 | 40 | 26 | 2 | 15
length mapped to chimp | 4503 | 4815 | 6134 | 1782 | 7904
total length of sequence | 4503 | 4840 | 6134 | 1782 | 7904
mut rate per site (diffs / mapped seq) | 3.33E-03 | 8.31E-03 | 4.24E-03 | 1.12E-03 | 1.90E-03
Mutation rate/base/generation | 6.94E-09 | 1.73E-08 | 8.83E-09 | 2.34E-09 | 3.95E-09
People_Screened | 1703 | 1703 | 1703 | 189 | 189
Size of Coding and Splice | 4499 | 4836 | 6130 | 1778 | 7900
# of bases screened | 15323594 | 16471416 | 20878780 | 672084 | 2986200
Avg# DN events | 1.06E-01 | 2.85E-01 | 1.84E-01 | 1.57E-03 | 1.18E-02
# of De Novo | 3 | 2 | 2 | 2 | 2
P(X+) | 1.85E-04 | 3.37E-02 | 1.50E-02 | 1.23E-06 | 6.92E-05

De Novo Events | Proband | Type | Chrom | Pos (hg19) | Genotype
MIP
GRIN2B | 12681 | splice | 12 | 13722953 | Y
GRIN2B | 12547 | nonsense | 12 | 13764762 | Y
GRIN2B | 11691 | frameshift indel | 12 | 14019043 | +G/*
LAMC3 | 11666 | missense | 9 | 133914290 | R
LAMC3 | 11704 | missense | 9 | 133952690 | R
SCN1A | 12499 | missense | 2 | 166848071 | R
SCN1A | 12340 | missense | 2 | 166848006 | Y
Exome
NTNG1 | 11660 | missense | 1 | 107867061 | Y
NTNG1 | 12532 | missense | 1 | 107691283 | R
CHD8 | 12752 | frameshift indel | 14 | 21861376 | -CT
CHD8 | 13844 | nonsense | 14 | 21871178 | R
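The rows of Supplementary Table 8 can be chained arithmetically. The sketch below is one reading that reproduces the tabulated figures: bases screened = people screened x coding-and-splice size x 2 (two alleles per person), average de novo events = bases screened x mutation rate per base per generation, and P(X+) taken as the Poisson probability of observing at least the reported number of de novo events. The Poisson form, the function names and the TABLE8 dictionary are assumptions made for illustration; the table itself does not name the distribution.

```python
import math

# Inputs copied from Supplementary Table 8:
# (mutation rate/base/generation, people screened, coding+splice size, observed de novo events)
TABLE8 = {
    "GRIN2B": (6.94e-09, 1703, 4499, 3),
    "LAMC3": (1.73e-08, 1703, 4836, 2),
    "SCN1A": (8.83e-09, 1703, 6130, 2),
    "NTNG1": (2.34e-09, 189, 1778, 2),
    "CHD8": (3.95e-09, 189, 7900, 2),
}


def bases_screened(people, coding_len):
    """Number of bases screened, counting two alleles per person."""
    return people * coding_len * 2


def expected_de_novo(rate, people, coding_len):
    """Expected number of de novo events across everyone screened."""
    return rate * bases_screened(people, coding_len)


def poisson_tail(lam, k):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    200 terms is ample for the small expected counts used here (assumption).
    """
    term = math.exp(-lam) * lam**k / math.factorial(k)
    total = term
    for i in range(k + 1, k + 200):
        term *= lam / i
        total += term
    return total


def recurrence_probability(gene):
    """Tail probability for one gene, using the TABLE8 inputs."""
    rate, people, coding_len, observed = TABLE8[gene]
    return poisson_tail(expected_de_novo(rate, people, coding_len), observed)
```

With the GRIN2B inputs this gives an expected count near 0.106 and a tail probability near 1.85E-04, in line with the table. Small differences in the last digit for other genes are expected, because the tabulated rates are rounded.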

Supplementary Table 9. Selected inherited hemizygous and compound heterozygous sites.

Hemizygous rare singletons
Gene | Proband | Type | Chrom | Pos (hg19) | Geno | Type
AFF2 | 11056 | missense | X | 147743569 | G | xmr
ATP7A | 11388 | missense | X | 77245127 | A | xmr
ATRX | 11096 | missense | X | 76938115 | G | xmr
ATRX | 11504 | missense | X | 76939319 | G | xmr
ATRX | 11827 | missense | X | 76938353 | C | xmr
AWAT1 | 12238 | nonsense | X | 69455937 | A | x_truncating
CASK | 11291 | missense | X | 41469222 | C | xmr
CNGA2 | 13793 | nonsense | X | 150912962 | T | x_truncating
DMD | 12157 | missense | X | 31284928 | T | xmr
FMR1 | 12390 | splice | X | 147019617 | A | xmr
FRMPD4 | 12036 | missense | X | 12712524 | C | xasd
FTSJ1 | 11425 | missense | X | 48340830 | C | xmr
GPR119 | 11172 | nonsense | X | 129518668 | A | x_truncating
IQSEC2 | 11526 | missense | X | 53265000 | A | xmr
NHS | 11989 | missense | X | 17745600 | T | xmr
OCRL | 11388 | missense | X | 128674731 | C | xmr
PRKY | 13822 | splice | Y | 7224175 | T | y_truncating
PTCHD1 | 11083 | missense | X | 23397843 | T | xmr
PTCHD1 | 11638 | missense | X | 23411984 | C | xmr
RPGR | 12157 | splice | X | 38178241 | T | x_truncating
RPGR | 11218 | missense | X | 38145259 | A | x_truncating
ZNF185 | 12373 | splice | X | 152101415 | C | x_truncating
NLGN4Y | 12373 | nonsense | Y | 16734300 | T | yasd_truncating

Compound Het rare singletons
Gene | Proband | Type | Chrom | Pos (hg19) | Geno | Type
CNTNAP4 | 11009 | missense | 16 | 76482674 | M | asd_rc
CNTNAP4 | 11009 | missense | 16 | 76482817 | R | asd_rc
VPS13B | 11659 | missense | 8 | 100568761 | R | asd_rc
VPS13B | 11659 | missense | 8 | 100729575 | M | asd_rc
MYH7B | 11390 | frameshift indel | 20 | 33574698 | -TCTG | rc
MYH7B | 11390 | missense | 20 | 33574761 | Y | rc
ITGAM | 11863 | missense | 16 | 31308874 | R | rc
ITGAM | 11863 | splice | 16 | 31340548 | R | rc
MATN2 | 11947 | frameshift indel | 8 | 99045849 | -C | rc
MATN2 | 11947 | missense | 8 | 99045883 | S | rc
QRFPR | 12667 | nonsense | 4 | 122254158 | Y | rc
QRFPR | 12667 | missense | 4 | 122301478 | Y | rc
SCN5A | 13169 | missense | 3 | 38622556 | Y | rc
SCN5A | 13169 | nonsense | 3 | 38648178 | Y | rc_truncating_ms

xmr = X-linked mental retardation loci, asd = ASD candidate loci, rc = possible recessive loci

Supplementary Table 10. List of the 21 severe de novo mutations that map to regions of recurrent CNV associated with Developmental Delay and ASD.

Gene | Type of mutation | Genomic disorder | Proband | Gains all cases | Gains CT | Gains ASD | Sig ASD gains* | Sig all gains# | Loss all cases | Loss CT | Loss ASD | Sig ASD loss* | Sig all loss#
MBD5 | fs | 2q23.1 del | 12346 | 0 | 0 | 0 | ~ | ~ | 11 | 1 | 2 | 5.48E-02 | 4.52E-02
HDLBP | ms | 2q37 del | 11610 | 1 | 0 | 0 | ~ | 6.54E-01 | 24 | 3 | 1 | 4.58E-01 | 5.77E-03
PDCD1 | ms | 2q37 del | 11707 | 3 | 0 | 0 | ~ | 2.80E-01 | 25 | 1 | 1 | 2.64E-01 | 2.38E-04
UIMC1 | ms | 5q35.2 del | 12285 | 0 | 0 | 0 | ~ | ~ | 9 | 0 | 0 | ~ | 2.20E-02
PSMG4 | ms | 6p25.3 del | 11064 | 2 | 0 | 0 | ~ | 4.28E-01 | 8 | 0 | 0 | ~ | 3.36E-02
GPR146 | ms | 7p22.1 del/dup | 11518 | 13 | 1 | 2 | 5.48E-02 | 2.21E-02 | 8 | 3 | 0 | ~ | 4.38E-01
TSNARE1 | ms | 8q24.3 del | 13557 | 3 | 0 | 2 | 2.02E-02 | 2.80E-01 | 14 | 10 | 2 | 5.25E-01 | ~
PTGES | ms | 9q34 del | 13822 | 2 | 0 | 0 | ~ | 4.28E-01 | 10 | 0 | 2 | 2.02E-02 | 1.44E-02
PAEP | ms | 9q34 del | 11109 | 4 | 2 | 0 | ~ | 6.56E-01 | 56 | 0 | 3 | 2.86E-03 | 4.68E-11
ABCA2 | ms | 9q34 del | 12225 | 5 | 0 | 0 | ~ | 1.20E-01 | 67 | 4 | 4 | 1.76E-02 | 6.82E-09
PNPLA7 | ms | 9q34 del | 12249 | 14 | 0 | 1 | 1.42E-01 | 2.60E-03 | 64 | 3 | 4 | 9.92E-03 | 3.39E-09
SLC6A13 | ms | 12p13.3 del/dup | 11388 | 14 | 1 | 0 | ~ | 1.54E-02 | 6 | 2 | 1 | 3.69E-01 | 4.38E-01
TSC2 | ms | 16p13.3 del/dup | 12621 | 16 | 0 | 3 | 2.86E-03 | 1.10E-03 | 48 | 4 | 1 | 5.35E-01 | 6.42E-06
KIAA0182 | ms | 16p13.3 del/dup | 11711 | 5 | 0 | 0 | ~ | 1.20E-01 | 14 | 0 | 1 | 1.42E-01 | 2.63E-03
SYNRG | ms | 17q12 del/dup | 11599 | 18 | 3 | 0 | ~ | 3.61E-02 | 15 | 2 | 1 | 3.69E-01 | 3.54E-02
DNAH17 | ms | 17q25 del | 11587 | 3 | 2 | 0 | ~ | ~ | 42 | 1 | 4 | 1.80E-03 | 2.80E-07
GPS1 | ms | 17q25 del | 13629 | 1 | 1 | 0 | ~ | ~ | 45 | 4 | 4 | 1.76E-02 | 1.82E-05
DUS1L | ms | 17q25 del | 13274 | 1 | 1 | 0 | ~ | ~ | 45 | 3 | 4 | 9.92E-03 | 4.10E-06
POLRMT | ns | 19p13.3 del/dup | 13333 | 28 | 1 | 3 | 1.02E-02 | 7.00E-05 | 14 | 1 | 1 | 2.64E-01 | 1.54E-02
HDGFRP2 | ms | 19p13.3 del/dup | 11184 | 31 | 1 | 3 | 1.02E-02 | 2.00E-05 | 7 | 0 | 1 | 1.42E-01 | 5.13E-02
SBF1 | ms | 22q13 del/dup | 13793 | 9 | 0 | 0 | ~ | 2.20E-02 | 53 | 0 | 3 | 2.86E-03 | 1.68E-10

ms = missense, ns = nonsense, fs = frameshifting indel. All Signature Cases (n = 15,767), ASD (n = 1,379), Controls (n = 8,329). *Fisher exact p-values, ASD cases versus controls; #all cases versus controls (ref. 5).
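The significance columns of Supplementary Table 10 are described as Fisher exact p-values. As a minimal sketch, assuming a one-sided test (enrichment in cases) and reading "CT" as the control count, the p-value is the hypergeometric probability that at least the observed number of CNV carriers falls in the case group. The function name and argument layout are hypothetical; the cohort sizes are those given in the table footnote.

```python
from math import comb

# Cohort sizes from the table footnote.
N_ALL_CASES = 15767
N_ASD = 1379
N_CONTROLS = 8329


def fisher_one_sided(case_hits, case_n, ctrl_hits, ctrl_n):
    """One-sided Fisher exact p-value for enrichment of carriers in cases.

    Sums the hypergeometric probabilities of seeing case_hits or more
    carriers among cases, with the total number of carriers held fixed.
    """
    total_n = case_n + ctrl_n
    total_hits = case_hits + ctrl_hits
    denom = comb(total_n, total_hits)
    p = 0.0
    for k in range(case_hits, min(total_hits, case_n) + 1):
        p += comb(case_n, k) * comb(ctrl_n, total_hits - k) / denom
    return p
```

For example, two ASD carriers against one control carrier, `fisher_one_sided(2, N_ASD, 1, N_CONTROLS)`, evaluates to roughly 5.48E-02, which agrees with the value tabulated for rows with those counts (such as the MBD5 losses).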

Supplementary Table 11. Other mutations intersecting previous CNV loci and animal models for ASD.

Gene | Type | Region | Proband | Notes
ADNP | fs | 20q13.13 | 12130 | Animal Model
MYH10 | ms | 17p13 | 13742 | Animal Model
CACNA1D | ms | 3p14.3 | 11872 | Animal Model
CACNA1E | ms | 1q25-q31 | 11773 | Animal Model
AP3B2 | ms | 15q | 11224 | CNV Region
ARID1B | fs | 6q25.3 | 13447 | CNV Region
BRSK2 | del_aa | 11p15.5 | 13415 | CNV Region
DYRK1A | sp | 21q22.13 | 13890 | CNV Region
NLGN1 | ms | 3q26.32 | 12157 | CNV Region
SETBP1 | fs | 18q21.1 | 12933 | CNV Region
TNKS | ms | 8p23.1 | 11569 | CNV Region
TSPAN17 | ns | 5q35.3 | 11291 | CNV Region

ms = missense, ns = nonsense, fs = frameshifting indel, del_aa = deletion of conserved amino acid

Supplementary Table 12. List of the 126 genes/proteins with severe mutations used for the PPI, along with summary statistics.

Protein | Degree | Clustering Coefficient | Connected component?
ADCY5 | 2 | 1 | Yes
ADNP | 15 | 0.949 | Yes
ARID1B | 2 | 1 | Yes
BRSK2 | 2 | 1 | Yes
BRWD1 | 1 | 0 | Yes
CDC42BPB | 15 | 0.94285714 | Yes
CDH5 | 1 | 0 | Yes
CHD3 | 13 | 1 | Yes
CHD7 | 13 | 1 | Yes
CHD8 | 18 | 0.68 | Yes
CNOT1 | 16 | 0.858 | Yes
CNOT3 | 12 | 0.894 | Yes
CTNNB1 | 10 | 0.333 | Yes
CUL3 | 16 | 0.292 | Yes
DDX20 | 9 | 0.694 | Yes
DEPDC7 | 15 | 0.943 | Yes
DYRK1A | 2 | 0 | Yes
EIF4G1 | 6 | 0.867 | Yes
FBXW9 | 1 | 0 | Yes
H2AFV | 7 | 1 | Yes
HDGFRP2 | 4 | 0.833 | Yes
HDLBP | 16 | 0.842 | Yes
HNRNPF | 30 | 0.37 | Yes
INCENP | 2 | 0 | Yes
IQGAP2 | 8 | 0.857 | Yes
KATNAL2 | 4 | 1 | Yes
KRT80 | 6 | 1 | Yes
MAP4 | 6 | 1 | Yes
MKI67 | 6 | 0.667 | Yes
MYBBP1A | 29 | 0.34 | Yes
MYH10 | 6 | 0.467 | Yes
NACA | 6 | 1 | Yes
NOTCH3 | 1 | 0 | Yes
NR4A2 | 1 | 0 | Yes
PBRM1 | 18 | 0.647 | Yes
PDIA6 | 13 | 0.628 | Yes
POLRMT | 2 | 1 | Yes
PSEN1 | 2 | 0 | Yes
RPS6KA3 | 1 | 0 | Yes
RUVBL1 | 24 | 0.489 | Yes
SCN1A | 5 | 0.7 | Yes
SFPQ | 29 | 0.374 | Yes
SRBD1 | 1 | 0 | Yes
SYNE1 | 2 | 1 | Yes
TBL1XR1 | 16 | 0.858 | Yes
TSR2 | 8 | 0.929 | Yes
UBE3C | 5 | 1 | Yes
UBR3 | 2 | 1 | Yes
YTHDC2 | 19 | 0.661 | Yes
AMY2B | 0 | 0 | No
APAF1 | 0 | 0 | No
ASAH2 | 0 | 0 | No
ASAH2C | 0 | 0 | No
BMP1 | 0 | 0 | No
CACNA1D | 0 | 0 | No
CACNA1E | 0 | 0 | No
COL25A1 | 0 | 0 | No
CUBN | 0 | 0 | No
DDR2 | 0 | 0 | No
DNAH17 | 0 | 0 | No
DNAH5 | 0 | 0 | No
DNAJB9 | 0 | 0 | No
DUS1L | 0 | 0 | No
EFCAB8 | 0 | 0 | No
EHD2 | 0 | 0 | No
ETFB | 0 | 0 | No
FAM45A | 0 | 0 | No
FBXO10 | 0 | 0 | No
FOXP1 | 0 | 0 | No
GPR146 | 0 | 0 | No
GRIN2B | 0 | 0 | No
KIAA0100 | 0 | 0 | No
KIAA0182 | 0 | 0 | No
KRBA1 | 0 | 0 | No
L1TD1 | 0 | 0 | No
LAMC3 | 0 | 0 | No

MBD5 | 0 | 0 | No
MCAM | 0 | 0 | No
MDM2 | 0 | 0 | No
MEGF11 | 0 | 0 | No
MLL3 | 0 | 0 | No
MUC16 | 0 | 0 | No
MYO7B | 0 | 0 | No
NAA40 | 0 | 0 | No
NLGN1 | 0 | 0 | No
NTNG1 | 0 | 0 | No
OPRL1 | 0 | 0 | No
OR10Z1 | 0 | 0 | No
PCDHB4 | 0 | 0 | No
PCNX | 0 | 0 | No
PDCD1 | 0 | 0 | No
PHF19 | 0 | 0 | No
PION | 0 | 0 | No
PITPNM3 | 0 | 0 | No
PLEKHA8 | 0 | 0 | No
PNPLA7 | 0 | 0 | No
POLQ | 0 | 0 | No
PTEN | 0 | 0 | No
PTGR1 | 0 | 0 | No
RBMS3 | 0 | 0 | No
RGS22 | 0 | 0 | No
RNF160 | 0 | 0 | No
SESN2 | 0 | 0 | No
SETBP1 | 0 | 0 | No
SETD2 | 0 | 0 | No
SGSM3 | 0 | 0 | No
SLC30A5 | 0 | 0 | No
SLC7A7 | 0 | 0 | No
SP7 | 0 | 0 | No
ST3GAL3 | 0 | 0 | No
STIL | 0 | 0 | No
STK36 | 0 | 0 | No
SYNRG | 0 | 0 | No
TBR1 | 0 | 0 | No
TLK2 | 0 | 0 | No
TNKS | 0 | 0 | No
TRIO | 0 | 0 | No
TRPM5 | 0 | 0 | No
TSC2 | 0 | 0 | No
TSNARE1 | 0 | 0 | No
TSPAN17 | 0 | 0 | No
USP15 | 0 | 0 | No
VPS39 | 0 | 0 | No
ZBTB41 | 0 | 0 | No
ZNF420 | 0 | 0 | No
ZNF644 | 0 | 0 | No
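The three summary statistics in Supplementary Table 12 are standard graph measures. As a minimal sketch, assuming the usual definitions (degree = number of interaction partners; clustering coefficient = fraction of a protein's partner pairs that also interact with each other, taken as 0 when there are fewer than two partners; connected component = the largest set of mutually reachable proteins), they can be computed from an undirected adjacency map. The toy network and all names below are hypothetical and are not the PPI data used in the paper.

```python
from itertools import combinations

# Hypothetical toy network: a triangle A-B-C, D attached to A, E isolated.
TOY = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": set(),
}


def degree_and_clustering(adj):
    """Map each node to (degree, local clustering coefficient)."""
    stats = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k >= 2:
            # Count edges that exist between pairs of this node's neighbours.
            links = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
            stats[node] = (k, 2 * links / (k * (k - 1)))
        else:
            stats[node] = (k, 0.0)
    return stats


def largest_component(adj):
    """Return the node set of the largest connected component."""
    seen, best = set(), set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            for nbr in adj[stack.pop()]:
                if nbr not in comp:
                    comp.add(nbr)
                    stack.append(nbr)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return best
```

In the toy network, node A has degree 3 and one linked neighbour pair out of three, so its coefficient is 1/3. Nodes with a single partner get 0, matching the convention visible in the table for degree-1 proteins.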

Supplementary Table 13. Top IPA function for the PPI connected component.

Category | Function | Function Annotation | B-H p-value | Molecules | #
Gene Expression | transcription | transcription | 9.45E-03 | ARID1B, BRWD1, CHD3, CHD8, CTNNB1, DDX20, DYRK1A, MYBBP1A, NOTCH3, NR4A2, POLRMT, PSEN1, RUVBL1, SFPQ, TBL1XR1 | 15
Gene Expression | transcription | transcription of DNA endogenous promoter | 3.06E-02 | BRWD1, CHD3, CHD8, CTNNB1, DDX20, NR4A2, RUVBL1, TBL1XR1 | 8
Gene Expression | transcription | transcription of AP1/CRE element | 3.42E-02 | CTNNB1 | 1
Gene Expression | transcription | transcription of simian virus 40 | 3.42E-02 | CHD3 | 1
Gene Expression | transcription | transcription of LEF1 binding site | 4.32E-02 | CTNNB1 | 1
Gene Expression | transcription | transcription of DNA | 4.57E-02 | ARID1B, CTNNB1, MYBBP1A, NOTCH3, NR4A2, TBL1XR1 | 6
Gene Expression | transcription | transcription of Ets1 binding site | 4.90E-02 | CTNNB1 | 1
Gene Expression | transcription | transcription of mitochondrial DNA | 5.42E-02 | POLRMT | 1
Gene Expression | transcription | transcription of T-cell factor recognition sequence | 6.82E-02 | CTNNB1 | 1
Gene Expression | transcription | transcription of TCF binding site | 6.82E-02 | CTNNB1 | 1
Gene Expression | transcription | transcription of Ets element | 7.06E-02 | CTNNB1 | 1
Gene Expression | transcription | transcription of TATA box | 8.01E-02 | CTNNB1 | 1
Gene Expression | activation | activation of promoter fragment | 3.06E-02 | CTNNB1, RPS6KA3 | 2
Gene Expression | activation | activation of LEF1 binding site | 3.06E-02 | CTNNB1, PSEN1 | 2
Gene Expression | activation | activation of Tbe3 response element | 4.10E-02 | CTNNB1 | 1
Gene Expression | activation | activation of Tcf-4 response element | 4.10E-02 | CTNNB1 | 1

Gene Expression | activation | activation of T-cell factor responsive element | 5.42E-02 | CTNNB1 | 1
Gene Expression | activation | activation of Smad3-Smad4 binding element | 8.25E-02 | CTNNB1 | 1
Gene Expression | activation | activation of T-cell factor recognition sequence | 8.25E-02 | CTNNB1 | 1
Gene Expression | transactivation | transactivation of LEF1 binding site | 4.10E-02 | CTNNB1 | 1
Gene Expression | transactivation | transactivation of TCF binding site | 4.10E-02 | CTNNB1 | 1
Gene Expression | transactivation | transactivation of Tcf4 binding site | 4.10E-02 | CTNNB1 | 1
Gene Expression | transactivation | transactivation of RBP-J/CBF binding site | 4.57E-02 | NOTCH3 | 1
Gene Expression | transactivation | transactivation of androgen receptor binding site | 4.57E-02 | CTNNB1 | 1
Gene Expression | transactivation | transactivation of thyroid hormone response element | 5.42E-02 | TBL1XR1 | 1
Gene Expression | transactivation | transactivation | 7.00E-02 | CTNNB1, DDX20, NOTCH3, NR4A2, TBL1XR1 | 5
Gene Expression | transactivation | transactivation of DNA endogenous promoter | 8.57E-02 | NOTCH3 | 1
Gene Expression | binding | binding of progesterone response element | 4.57E-02 | SFPQ | 1
Gene Expression | binding | binding of gene | 5.42E-02 | PSEN1 | 1
Gene Expression | binding | binding of TPA response element | 6.21E-02 | PSEN1 | 1
Gene Expression | expression | expression of Nfat binding site | 4.57E-02 | DYRK1A | 1
Gene Expression | expression | expression of synthetic promoter | 6.82E-02 | CTNNB1, DYRK1A, NOTCH3, PSEN1 | 4
Gene Expression | repression | repression of p53 consensus binding site | 4.57E-02 | CUL3 | 1
Gene Expression | translation | translation of reporter mRNA | 7.74E-02 | EIF4G1 | 1
Behavior | freezing behavior | freezing behavior of mice | 9.45E-03 | DYRK1A, PSEN1 | 2
Behavior | locomotion | locomotion | 3.06E-02 | ADCY5, CHD7, NR4A2, PSEN1, SCN1A | 5

Behavior | behavior | behavior | 3.42E-02 | ADCY5, DYRK1A, NR4A2, PSEN1, SCN1A, UBR3 | 6
Behavior | behavior | behavior of mice | 7.11E-02 | DYRK1A, PSEN1, SCN1A | 3
Behavior | walking | walking | 3.42E-02 | CHD7, SCN1A | 2
Behavior | psychological process | psychological process of mice | 4.35E-02 | ADCY5, DYRK1A, PSEN1, SCN1A | 4
Behavior | learning | learning by mice | 4.57E-02 | ADCY5, PSEN1 | 2
Behavior | cognition | cognition | 4.58E-02 | ADCY5, CHD7, PSEN1 | 3
Behavior | motor learning | motor learning by mice | 4.90E-02 | ADCY5 | 1
Behavior | long-term memory | long-term memory of mice | 6.82E-02 | PSEN1 | 1
Behavior | startle response | startle response of mice | 7.06E-02 | DYRK1A | 1
Behavior | mating behavior | mating behavior of mice | 8.25E-02 | SCN1A | 1
Organismal Development | development | development of organism | 9.45E-03 | ADNP, CDH5, CHD7, CHD8, CTNNB1, CUL3, DYRK1A, MYH10, PSEN1, RUVBL1, UBR3 | 11
Organismal Development | development | development of animal | 9.45E-03 | CDH5, CHD7, CHD8, CTNNB1, CUL3, DYRK1A, MYH10, PSEN1, UBR3 | 9
Organismal Development | development | development of embryo | 9.45E-03 | CHD7, CHD8, CTNNB1, CUL3, MYH10, PSEN1, UBR3 | 7
Organismal Development | development | development of mice | 3.06E-02 | CDH5, CTNNB1, DYRK1A, MYH10, PSEN1 | 5
Organismal Development | development | development of capillary plexus | 4.10E-02 | CDH5 | 1
Organismal Development | development | development of blood vessel | 4.55E-02 | CDH5, CHD7, CTNNB1, MYH10, PSEN1 | 5
Organismal Development | development | development of head | 6.21E-02 | CTNNB1 | 1
Organismal Development | development | delay in development of mice | 6.82E-02 | DYRK1A | 1
Organismal Development | developmental process | developmental process of animal | 9.45E-03 | ADNP, CDH5, CHD7, CHD8, CTNNB1, CUL3, DYRK1A, MYH10, PSEN1, UBR3 | 10
Organismal Development | developmental process | developmental process of mice | 3.06E-02 | ADNP, CDH5, CTNNB1, DYRK1A, MYH10, PSEN1 | 6
Organismal Development | formation | formation of body axis | 3.06E-02 | CHD8, CTNNB1 | 2
Organismal Development | formation | formation of dorsal-ventral axis | 3.42E-02 | CTNNB1 | 1
Organismal Development | morphology | morphology of aortic root | 3.06E-02 | MYH10 | 1

Organismal Development | neovascularization | neovascularization of corpus luteum | 3.06E-02 | CDH5 | 1
Organismal Development | patterning | patterning of umbilical vessels | 3.06E-02 | CTNNB1 | 1
Organismal Development | patterning | patterning of embryo | 4.54E-02 | CTNNB1, PSEN1 | 2
Organismal Development | patterning | patterning of vasculature | 7.06E-02 | CTNNB1 | 1
Organismal Development | morphogenesis | morphogenesis of limb | 3.28E-02 | CHD7, CTNNB1, PSEN1 | 3
Organismal Development | morphogenesis | morphogenesis of hindlimb | 3.32E-02 | CHD7, CTNNB1 | 2
Organismal Development | morphogenesis | morphogenesis of arm | 4.57E-02 | CTNNB1 | 1
Organismal Development | duplication | duplication of body axis | 4.32E-02 | CHD8 | 1
Organismal Development | angiogenesis | angiogenesis of mice | 4.90E-02 | CDH5, MYH10 | 2
Organismal Development | segmentation | segmentation of somites | 4.90E-02 | PSEN1 | 1
Organismal Development | segmentation | segmentation of embryo | 5.77E-02 | PSEN1 | 1
Organismal Development | growth | growth of mice | 5.19E-02 | ADNP, CTNNB1, DYRK1A | 3
Organismal Development | growth | delay in growth of mice | 7.74E-02 | DYRK1A | 1
Embryonic Development | development | development of embryo | 9.45E-03 | CHD7, CHD8, CTNNB1, CUL3, MYH10, PSEN1, UBR3 | 7
Embryonic Development | development | development of trophoblast | 4.32E-02 | PBRM1 | 1
Embryonic Development | development | development of second branchial arch | 6.82E-02 | PSEN1 | 1
Embryonic Development | patterning | patterning of embryonic tissue | 3.06E-02 | CTNNB1, PSEN1 | 2
Embryonic Development | patterning | patterning of vitelline vessel | 3.06E-02 | CTNNB1 | 1
Embryonic Development | patterning | patterning of embryo | 4.54E-02 | CTNNB1, PSEN1 | 2
Embryonic Development | maintenance | maintenance of apical ectodermal ridge | 3.06E-02 | CTNNB1 | 1
Embryonic Development | regression | onset of regression of apical ectodermal ridge | 3.06E-02 | CTNNB1 | 1
Embryonic Development | size | size of ventricular zone | 3.06E-02 | CTNNB1 | 1
Embryonic Development | morphogenesis | morphogenesis of limb | 3.28E-02 | CHD7, CTNNB1, PSEN1 | 3
Embryonic Development | morphogenesis | morphogenesis of hindlimb | 3.32E-02 | CHD7, CTNNB1 | 2
Embryonic Development | morphogenesis | morphogenesis of arm | 4.57E-02 | CTNNB1 | 1
Embryonic Development | morphogenesis | morphogenesis of metanephros | 4.57E-02 | CTNNB1 | 1

Embryonic Development | morphogenesis | morphogenesis of foregut | 6.82E-02 | CTNNB1 | 1
Embryonic Development | adhesion | adhesion of blastomeres | 3.42E-02 | CTNNB1 | 1
Embryonic Development | formation | formation of renal vesicle | 3.42E-02 | CTNNB1 | 1
Embryonic Development | formation | formation of apical ectodermal ridge | 4.32E-02 | CTNNB1 | 1
Embryonic Development | formation | formation of visceral endoderm | 4.57E-02 | MYH10 | 1
Embryonic Development | formation | formation of endoderm | 5.77E-02 | CTNNB1 | 1
Embryonic Development | maturation | maturation of embryonic cell lines | 3.42E-02 | NR4A2 | 1
Embryonic Development | thickness | thickness of ventricular zone | 3.42E-02 | PSEN1 | 1
Embryonic Development | morphology | morphology of outflow tract | 4.10E-02 | MYH10 | 1
Embryonic Development | quantity | quantity of somites | 4.32E-02 | ADNP | 1
Embryonic Development | quantity | quantity of mesenchymal cells | 4.57E-02 | PSEN1 | 1
Embryonic Development | quantity | quantity of embryonic cell lines | 8.01E-02 | ADNP | 1
Embryonic Development | apoptosis | apoptosis of neural crest cells | 4.90E-02 | CTNNB1 | 1
Embryonic Development | segmentation | segmentation of somites | 4.90E-02 | PSEN1 | 1
Embryonic Development | segmentation | segmentation of embryo | 5.77E-02 | PSEN1 | 1
Embryonic Development | specification | specification of dorsal-ventral axis | 6.62E-02 | CTNNB1 | 1
Embryonic Development | developmental process | developmental process of embryonic cell lines | 7.25E-02 | DYRK1A, NR4A2 | 2
Nervous System Development and Function | development | development of nervous system | 9.45E-03 | CHD7, CTNNB1, DYRK1A, MYH10, NOTCH3, NR4A2, PSEN1, RPS6KA3 | 8
Nervous System Development and Function | development | development of central nervous system | 1.65E-02 | CHD7, CTNNB1, MYH10, NOTCH3, PSEN1, RPS6KA3 | 6
Nervous System Development and Function | development | development of forebrain | 3.06E-02 | CTNNB1, NOTCH3, PSEN1 | 3
Nervous System Development and Function | development | development of fourth cerebral ventricle | 3.06E-02 | MYH10 | 1

Nervous System Development and Function | development | development of third cerebral ventricle | 3.06E-02 | MYH10 | 1
Nervous System Development and Function | development | development of brain | 3.06E-02 | CTNNB1, MYH10, NOTCH3, PSEN1 | 4
Nervous System Development and Function | development | development of lateral cerebral ventricle | 4.32E-02 | MYH10 | 1
Nervous System Development and Function | development | development of dopaminergic neurons | 5.42E-02 | NR4A2 | 1
Nervous System Development and Function | size | size of brain | 3.06E-02 | CTNNB1, DYRK1A | 2
Nervous System Development and Function | size | size of superior colliculus | 3.06E-02 | DYRK1A | 1
Nervous System Development and Function | neurological process | neurological process of brain cells | 3.06E-02 | ADCY5, ADNP, PSEN1 | 3
Nervous System Development and Function | neurological process | neurological process of neurons | 3.42E-02 | ADCY5, ADNP, CTNNB1, PSEN1, SCN1A | 5
Nervous System Development and Function | neurological process | neurological process of corticostriatal neurons | 4.32E-02 | ADCY5 | 1
Nervous System Development and Function | neurological process | neurological process of cerebral cortex cells | 4.42E-02 | ADNP, PSEN1 | 2
Nervous System Development and Function | neurological process | neurological process of mice | 4.57E-02 | ADCY5, DYRK1A, NR4A2, PSEN1 | 4
Nervous System Development and Function | differentiation | differentiation of neuronal progenitor cells | 3.06E-02 | NOTCH3, PSEN1 | 2
Nervous System Development and Function | differentiation | differentiation of neurons | 4.32E-02 | BRSK2, NOTCH3, NR4A2, PSEN1 | 4
Nervous System Development and Function | differentiation | differentiation of amacrine cells | 4.90E-02 | NR4A2 | 1

Nervous System Development and Function | differentiation | differentiation of neuroepithelial cells | 7.74E-02 | PSEN1 | 1
Nervous System Development and Function | differentiation | differentiation of dopaminergic neurons | 8.25E-02 | NR4A2 | 1
Nervous System Development and Function | differentiation | differentiation of neuroglia | 8.25E-02 | CTNNB1, NOTCH3 | 2
Nervous System Development and Function | cell cycle progression | re-entry into cell cycle progression of neuroepithelial cells | 3.06E-02 | CTNNB1 | 1
Nervous System Development and Function | cytostasis | cytostasis of neurites | 3.06E-02 | MYH10 | 1
Nervous System Development and Function | recovery | recovery of brain tissue | 3.06E-02 | ADNP | 1
Nervous System Development and Function | surface area | surface area of cerebral cortex | 3.06E-02 | CTNNB1 | 1
Nervous System Development and Function | cell-cell contact | cell-cell contact of nervous tissue cell lines | 3.42E-02 | CDH5 | 1
Nervous System Development and Function | shrinkage | shrinkage of cerebral cortex | 3.42E-02 | PSEN1 | 1
Nervous System Development and Function | maturation | maturation of dopaminergic neurons | 4.10E-02 | NR4A2 | 1
Nervous System Development and Function | morphology | morphology of cerebral cortex | 4.10E-02 | CTNNB1 | 1
Nervous System Development and Function | neurogenesis | neurogenesis of brain | 4.10E-02 | PSEN1 | 1
Nervous System Development and Function | neurogenesis | neurogenesis | 8.02E-02 | NOTCH3, PSEN1, SYNE1 | 3
Nervous System Development and Function | retraction | retraction of neurites | 4.20E-02 | MYH10, PSEN1 | 2

Nervous System Development and Function | retraction | retraction of dendrites | 6.21E-02 | PSEN1 | 1
Nervous System Development and Function | neurotransmission | neurotransmission | 4.32E-02 | ADNP, CTNNB1, PSEN1, SCN1A | 4
Nervous System Development and Function | neuroprotection | neuroprotection of cortical neurons | 4.32E-02 | ADNP | 1
Nervous System Development and Function | tubulation | tubulation of nervous tissue cell lines | 4.32E-02 | CDH5 | 1
Nervous System Development and Function | migration | migration of neurons | 4.55E-02 | MYH10, NR4A2, PSEN1 | 3
Nervous System Development and Function | quantity | quantity of neurons | 4.55E-02 | DYRK1A, NR4A2, PSEN1 | 3
Nervous System Development and Function | quantity | quantity of Cajal-Retzius neurons | 4.57E-02 | PSEN1 | 1
Nervous System Development and Function | quantity | quantity of amacrine cells | 5.77E-02 | NR4A2 | 1
Nervous System Development and Function | quantity | quantity of neuroepithelial cells | 6.21E-02 | CTNNB1 | 1
Nervous System Development and Function | quantity | quantity of dopaminergic neurons | 8.57E-02 | NR4A2 | 1
Nervous System Development and Function | learning | learning by mice | 4.57E-02 | ADCY5, PSEN1 | 2
Nervous System Development and Function | motor learning | motor learning by mice | 4.90E-02 | ADCY5 | 1
Nervous System Development and Function | morphogenesis | morphogenesis of brain | 5.77E-02 | PSEN1 | 1
Nervous System Development and Function | morphogenesis | morphogenesis of neurites | 8.65E-02 | MYH10, SYNE1 | 2

Nervous System Development and Function | survival | survival of nervous tissue | 5.77E-02 | ADNP | 1
Nervous System Development and Function | synaptic transmission | synaptic transmission of neurons | 5.77E-02 | ADNP, CTNNB1, PSEN1 | 3
Nervous System Development and Function | synaptic transmission | synaptic transmission of hippocampal neurons | 6.82E-02 | PSEN1 | 1
Nervous System Development and Function | long-term memory | long-term memory of mice | 6.82E-02 | PSEN1 | 1
Nervous System Development and Function | long-term potentiation | long-term potentiation of hippocampal neurons | 6.82E-02 | PSEN1 | 1
Nervous System Development and Function | proliferation | proliferation of cerebral cortex cells | 7.06E-02 | DYRK1A | 1
Nervous System Development and Function | proliferation | proliferation of neural precursor cells | 8.01E-02 | DYRK1A | 1
Nervous System Development and Function | startle response | startle response of mice | 7.06E-02 | DYRK1A | 1
Nervous System Development and Function | extension | extension of neurites | 8.01E-02 | ADNP, MYH10 | 2
Nervous System Development and Function | axonogenesis | axonogenesis of axons | 8.57E-02 | PSEN1 | 1
Nervous System Development and Function | long term depression | long term depression of brain cells | 8.57E-02 | ADCY5 | 1
Nervous System Development and Function | long term depression | long term depression of neurons | 8.91E-02 | ADCY5 | 1
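The "B-H p-value" column of Supplementary Table 13 refers to Benjamini-Hochberg multiple-testing adjustment. The sketch below shows the standard step-up adjustment on a list of raw p-values. It is an illustration of the general procedure, not a reconstruction of the IPA software's internals, and the function name and example inputs are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Each raw p-value is scaled by m / rank (rank 1 = smallest), then a
    running minimum from the largest rank downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four annotations.
EXAMPLE_RAW = [0.01, 0.04, 0.03, 0.005]
EXAMPLE_ADJUSTED = benjamini_hochberg(EXAMPLE_RAW)
```

The running-minimum step means several annotations can end up with an identical adjusted value, which is one plausible reason the same B-H p-value recurs across many rows of the table.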

Supplementary References

1 Sanders, S. J. et al. Multiple recurrent de novo CNVs, including duplications of the 7q11.23 Williams syndrome region, are strongly associated with autism. Neuron 70, 863-885, doi:10.1016/j.neuron.2011.05.002 (2011).

2 Levy, D. et al. Rare de novo and transmitted copy-number variation in autistic spectrum disorders. Neuron 70, 886-897, doi:10.1016/j.neuron.2011.05.015 (2011).

3 De Ferrari, G. V. & Moon, R. T. The ups and downs of Wnt signaling in prevalent neurological disorders. Oncogene 25, 7545-7553 (2006).

4 Warde-Farley, D. et al. The GeneMANIA prediction server: biological network integration for gene prioritization and predicting gene function. Nucleic Acids Research 38, W214-220, doi:10.1093/nar/gkq537 (2010).

5 Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nature Genetics 43, 838-846, doi:10.1038/ng.909 (2011).
