Estimating the effect size of gene dosage on cognitive ability across the coding genome

Abbreviations:

Samples:
● unselected cohorts/populations
● autism cohorts/populations
● G-Scot: Generation Scotland
● IMAGEN
● CaG-Omni2.5; CaG-GSA: CartaGene (by technology)
● LBC1936: Lothian Birth Cohort 1936
● SSC-Omni2.5; SSC-1Mv1; SSC-1Mv3: Simons Simplex Collection (by technology)
● SYS parents; SYS children: Saguenay Youth Study
● MSSNG
● Sainte-Justine family genetic cohort

Statistics:
● n: sample size
● P: p-value
● SE: standard error
● SD: standard deviation
● ρ: correlation

Genetic scores:
● pLI
● pLI (gnomAD)
● O/E: observed over expected
● O/E-upper CI
● RVIS
● DEL score; DUP score
● PPI: protein-protein interaction
● DS
● PSD genes
● FMRP genes
● eQTL
● No. of genes

Other variables:
● “type of test-cohort” variable
● g-factor: general factor
● NVIQ: non-verbal intelligence quotient
● Age in years
● z-scored measure of general intelligence

Supplementary Online Content

Supplementary Information

1. Cohorts

1.1. General population cohorts

1.2. Disease cohorts

2. Definition of phenotypes: IQ and g-factor

2.1. Evaluation of IQ

2.2. G-factor computation

3. Genetic information

3.1. CNV calling

3.2. CNV filtering

3.3. Genetic analysis of pairwise relatedness

3.4. Criteria to remove outlier individual with particular CNV

3.5. Definition of recurrent CNVs

3.6. Annotation of CNVs

4. Statistical analyses

4.1. Coverage

4.2. Assessing age and sex biases in the cohorts

4.3. Meta-analyses assessing the effect of pLI on general intelligence

4.4. Mega-analysis assessing the effect of pLI or 1/LOEUF on general intelligence

4.5. Concordance analysis between prediction of our models and literature observations

4.6. Estimation of the probability of a de novo CNV using haploinsufficiency scores

4.7. Estimating the effect size of genes categories by LOEUF

4.8. R packages used for statistical analyses

Extended Data Tables

Extended Data Figures

List of Extended Data Tables

Supplementary table 1: Correlation and concordance between NVIQ and g-factor in 3 unselected populations

Supplementary table 2: Estimates associated with pLI within each sample included in the meta-analysis

Supplementary table 3: Comparison of estimates associated with pLI when measuring the effect size of CNVs on IQ and g-factor

Supplementary table 4: Comparison of model fit for general intelligence according to annotation score

Supplementary table 5: Estimated effects of individual genic region on general intelligence according to pLI or 1/LOEUF

Supplementary table 6: Estimated effects of CNVs on general intelligence, for different thresholds of exonic proportion of gene overlapping a CNV that defines genes involved in sum of pLI or 1/LOEUF computation

Supplementary table 7: Estimated effects of CNVs on general intelligence, for different thresholds of intronic proportion of gene overlapping a CNV that defines genes involved in sum of pLI or 1/LOEUF computation

Supplementary table 8: Linear regression models including pLI x age interaction as predictor of general intelligence

Supplementary table 9: Linear regression models including 1/LOEUF x age interaction as predictor of general intelligence

Supplementary table 10: Linear regression models including pLI x sex interaction as predictor of general intelligence

Supplementary table 11: Linear regression models including 1/LOEUF x sex interaction as predictor of general intelligence

Supplementary table 12: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of pLI on general intelligence

Supplementary table 13: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of 1/LOEUF on general intelligence

Supplementary table 14: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of pLI or 1/LOEUF on general intelligence for 3 sensitivity analyses based on exclusion of individuals carrying a recurrent CNV or a CNV containing an ID-gene

Supplementary table 15: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of pLI or 1/LOEUF, by gene category (ID- and non ID-genes) on general intelligence

Supplementary table 16: Distribution of the effect associated with deletion or duplication of genes by gene categories

Supplementary table 17: Estimates obtained for models performed in Ste-Justine neurodevelopmental cohort

Supplementary table 18: Description of the effect of 47 recurrent CNVs on general intelligence and probability of being de novo, estimated using empirical data from the literature and/or UKBB and using our models

Supplementary table 19: Detailed table of contents for the de novo analysis using Decipher, Sainte-Justine UHC, MSSNG, SSC, Imagen, Generation Scotland, and SYS cohorts

Supplementary table 20: Results of the probability of being de novo as a function of pLI and 1/LOEUF

Supplementary table 21: Estimated effects of deleted or duplicated individual genes on general intelligence according to 4 categories of tolerance to pLOF defined by LOEUF

Supplementary table 22: Estimated effects of deleted or duplicated individual genes on general intelligence according to several moving categories defined by a sliding window on LOEUF

Supplementary table 23: Empirical data on recurrent CNVs

List of Extended Data Figures

Supplementary figure 1: Distribution of z-scored general intelligence measured either by NVIQ or g-factor according to the age of individuals and colored by cohort

Supplementary figure 2: Sensitivity analyses for model based on pLI score

Supplementary figure 3: Effect size associated with pLI divided by category of intellectual disability (ID) genes on general intelligence

Supplementary figure 4: Concordance between observation from literature and UKBB for CNV effects on general intelligence

Supplementary figure 5: Concordance between model predictions and published observations for CNV effects on general intelligence

Supplementary figure 6: Estimated probability of de novo, based on model including pLI for ID and non ID genes, and its concordance with de novo frequency observed in literature

Supplementary figure 7: Estimated effects of individual genes on general intelligence according to categories based on LOEUF for duplications

Supplementary figure 8: Prediction of gene coverage according to the sample size

Supplementary figure 9: Distribution of z-scored measure of general intelligence before (left panels) and after (right panels) adjustment for age (g-factor in general populations)

Supplementary figure 10: Distribution of z-scored measure of general intelligence before (left panels) and after (right panels) adjustment for sex (NVIQ in autism populations)

SUPPLEMENTARY INFORMATION

1. Cohorts

We included five cohorts from the general population, two autism cohorts and one familial cohort with at least one CNV-carrier child recruited for a neurodevelopmental disorder (Table 1). All cohorts were reviewed by local institutional review boards. Parents/guardians and adult participants gave written informed consent and minors gave assent.

1.1. General population cohorts

We included 1,744 unrelated adolescents from Imagen30 and 967 children from the Saguenay Youth Study (SYS)31, two cohorts with available IQ measures that were previously pooled and studied in Huguet et al. (2018)19. Here, both cohorts of children were analyzed separately. We also included 602 parents from SYS for whom a g-factor was derived. In this study, we added three other general population cohorts.

CartaGene (CaG)32 is a cohort of 6,184 unrelated French Canadians. After quality control, we included in our analysis 2,589 individuals with an available cognitive evaluation to compute the g-factor. When data were not pooled, the CaG cohort was divided into two samples according to genotyping technology: CaG-GSA (N=2,074) and CaG-Omni2.5 (N=515).

Generation Scotland (G-Scot)33 is a cohort of 16,884 individuals from the UK (6,020 families). After quality control, we kept for our analysis 13,745 individuals from 5,622 families for whom cognitive evaluation allowed us to compute the g-factor34.

The Lothian Birth Cohort 1936 (LBC1936)35 includes 1,091 individuals with IQ measured longitudinally in 1947 and in 2004-2007. After quality control, we included in our analysis 504 individuals with an IQ measure obtained at ~70 years old.

Of note, cognitive measurements other than IQ are available for Imagen, SYS children and LBC1936, such that a g-factor could be derived to assess the correlation between IQ and g-factor in these three samples.

1.2. Disease cohorts

The Simons Simplex Collection (SSC)36 cohort includes 2,590 simplex families with one child affected by autism (proband), the child's parents and, for some families, an unaffected sibling. Only probands assessed with an IQ test were kept in our study, leading to 2,562 individuals after quality control.

We analyzed three SSC samples separately, according to genotyping technology: SSC-1Mv1 (N=332), SSC-1Mv3 (N=1,182) and SSC-Omni2.5 (N=1,048).

MSSNG37 is an international consortium (N=3,441) with 2,129 simplex and 617 multiplex autism families including one to five affected children. After quality control, we included in our analysis 1,379 probands with an available IQ measure.

We also included the Sainte-Justine family genetic cohort. Cognitive and behavioral measures were collected in families who carry a CNV classified as pathogenic or as a variant of unknown significance (VUS). CNV carriers and their first-degree relatives were ascertained from the pediatric developmental disorder clinic. Data include 75 families with at least one child carrying a large or a recurrent CNV. All 282 participants (70 fathers, 100 mothers, 75 probands, 37 siblings) were assessed with an IQ test. The characteristics of this cohort reflect the criteria for chromosomal microarray (CMA) testing in the developmental pediatric clinic, which include: intellectual disabilities (ID), disabilities, autism spectrum disorder as well as children with several comorbidities including attention deficit hyperactivity disorder (ADHD), speech and language disorders and developmental coordination disorders (DCD). The same assessments were performed in first-degree relatives (carriers and non-carriers), making it possible to adjust for the effect of additional genetic and environmental background.

2. Definition of phenotypes: IQ and g-factor

In this study, we focus on the non-verbal IQ (NVIQ) when available and on the g-factor otherwise. Both intelligence measures were normalized using z-score transformations to render them comparable. NVIQ was converted to a z-score using the normative mean of 100 and standard deviation (SD) of 15.

Since the cognitive measures used in the computation of the g-factor are not the same between cohorts, the g-factor was computed and normalized separately within each cohort using the mean and SD computed on all available individuals. This was feasible since the g-factor was computed in general population cohorts only. Of note, the g-factor was computed before excluding individuals due to array quality control, leading to g-factors with means and SDs slightly different from 0 and 1 for the final subset of individuals included in our analyses.
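As a minimal sketch of the two normalizations described above (the function names are illustrative and do not come from the study), NVIQ can be standardized against a mean of 100 and an SD of 15, whereas a g-factor is standardized against the mean and SD of its own cohort. The use of the population SD is an assumption; the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def nviq_to_z(nviq, norm_mean=100.0, norm_sd=15.0):
    """Convert an NVIQ score to a z-score using a mean of 100 and an SD of 15."""
    return (nviq - norm_mean) / norm_sd


def zscore_within_cohort(values):
    """Standardize raw g-factor values using the cohort's own mean and SD."""
    m = mean(values)
    sd = pstdev(values)  # assumption: population SD (not stated in the text)
    return [(v - m) / sd for v in values]
```

If the z-scores are later restricted to a quality-controlled subset, their mean and SD will generally drift slightly away from 0 and 1, as noted above.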

2.1. Evaluation of IQ

IQ is an age-standardized general cognitive ability metric that provides an estimate of how one ranks relative to age-matched peers. In all analyses, IQ here refers to NVIQ, which is also called performance IQ. In the general populations examined here (Table 1), IQ has been measured using the Wechsler Intelligence Scale for Children (Imagen: WISC-IV38; SYS: WISC-III39) and the Moray House Test40,41 (LBC1936). Of note, in LBC1936, the Moray House Test is based on cognitive measures obtained at 70 years old and converted into an IQ-type scale with a mean of 100 and SD of 15, as previously done in Gow et al.42.

In disease cohorts, adapted tests have been used, such that IQ measures were obtained using the Mullen Scales of Early Learning43, the Leiter International Performance Scale (original and revised)44,45, the Raven Progressive Matrices46, the Stanford-Binet Intelligence Scale47, WISC-IV48, WISC-V49, the Wechsler Abbreviated Scale of Intelligence (WASI-I; WASI-II)50–52, the Wechsler Preschool and Primary Scale of Intelligence (WPPSI-IV)53, Leiter-Revised and the Differential Ability Scales (DAS-II)54 (Table 1).

In the Sainte-Justine family genetic cohort, one adult was removed due to a large difference (49 points) between the performance IQ and the verbal IQ, which invalidates the full-scale IQ. In addition, 4 probands were removed due to either a floor effect or a ceiling effect in the tests.

2.2. G-factor computation

The g-factor is an indirect measure of general intelligence, obtained by extracting the first unrotated principal component from principal component analysis (PCA) of different standardized cognitive measures. We computed a g-factor in samples for which IQ was not available (SYS-parents, CaG and G-Scot) as well as in samples for which IQ and cognitive measures allowing g-factor computation were available, only in order to compare both intelligence measures (Imagen, SYS children, LBC1936).
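The extraction of a first unrotated principal component can be sketched as follows. This is only an illustration under stated assumptions: it uses power iteration on the correlation matrix in plain Python rather than the PCA routine actually used in the study (which is not named here), and the function names are hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Z-score one cognitive measure across subjects."""
    m, sd = mean(column), pstdev(column)
    return [(x - m) / sd for x in column]


def first_principal_component(rows, n_iter=500):
    """Return (scores, explained_variance_ratio) for the first unrotated PC.

    rows: one list of cognitive test scores per subject.
    """
    n, p = len(rows), len(rows[0])
    # Standardize each cognitive measure (column) before the PCA.
    cols = [standardize([r[j] for r in rows]) for j in range(p)]
    z = [[cols[j][i] for j in range(p)] for i in range(n)]
    # Correlation matrix of the standardized measures.
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / n for b in range(p)]
            for a in range(p)]
    # Power iteration converges to the leading eigenvector (the loadings).
    v = [1.0] * p
    for _ in range(n_iter):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p))
                     for a in range(p))
    # Subject scores on the first component: the raw (unscaled) g-factor.
    scores = [sum(z[i][j] * v[j] for j in range(p)) for i in range(n)]
    return scores, eigenvalue / p
```

The second returned value corresponds to the share of observed variance carried by the first component, the kind of quantity reported for each cohort in this section.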

For the LBC1936 cohort, we used the g-factor55 that was already calculated by the LBC consortium. It is based on 6 non-verbal subtests of the Wechsler Adult Intelligence Scale-III56: matrix reasoning, letter number sequencing, block design, symbol search, digit symbol, and digit span backward. We transformed it to a z-score using the mean of 0.02 and the SD of 0.987 that were calculated for the available LBC subjects.

The cognitive measures available for parents and children in the SYS cohort are different, thus the g-factor was not based on the same measures for the two subgroups.

In the SYS parents sample, the g-factor was computed using 12 cognitive performance measures2 assessed using the Cambridge Brain Sciences platform57: color-word remapping, spatial planning, self-ordered search, paired associates learning, digit span, spatial span, visuospatial working memory, interlocking polygons, feature match, odd one out, grammatical reasoning and spatial rotation. The obtained g-factor represents 31.6% of the observed variance. The g-factor was then transformed to a z-score using the mean of -6.221×10⁻¹² and the SD of 1.948.

In SYS children, the g-factor was computed using 63 cognitive measures31: Dot Location (Visual/non-verbal memory), Newman’s Card Sorting Task (Perseveration), Self-ordered Pointing Task (Working memory), Grooved Pegboard Test (Fine motor skills), Children’s Memory Scale (CMS) Stories subtasks (Auditory/verbal memory), Wechsler Intelligence Scale for Children III (Intelligence), Woodcock-Johnson III (Academic achievement), Stroop Color-Word Test (Interference), Ruff 2-&-7 Selective Attention test (Selective attention), Verbal fluency (Cognitive flexibility), Tapping. The obtained g-factor represents 23.6% of the observed variance. The g-factor was then transformed to a z-score using the mean of 0.051 and the SD of 3.799.

In the CaG cohort, the g-factor was computed using three cognitive tests: verbal and numeric reasoning (fluid intelligence), paired associates learning (episodic memory) and reaction time based on two-choice items. The obtained g-factor represents 43.2% of the variance observed in the cohort. The g-factor was then transformed to a z-score using the mean of -8.681×10⁻¹⁶ and the SD of 1.084.

In the G-Scot cohort, the g-factor is based on four cognitive tests measuring processing speed, verbal declarative memory, executive functions and vocabulary. The g-factor represents 42.3% of the observed variance. The g-factor was then transformed to a z-score using the mean of -3.649×10⁻¹⁶ and the SD of 1.3.

In Imagen, the g-factor was computed using 4 cognitive measures: similarities score, vocabulary score, block design score, matrix reasoning score (WISC-IV). The obtained g-factor represents 57.3% of the observed variance. The g-factor was then transformed to a z-score using the mean of 9.626×10⁻¹⁷ and the SD of 1.514.

Note that the g-factor is a robust measure of general cognitive ability that is not very sensitive to the exact subtests used to calculate it, as long as these measure a wide range of cognitive abilities. Moreover, the mean cognitive ability levels of the different cohorts were likely different, but our results from the meta-analysis (Figure 1) suggest that the standard deviations obtained across cohorts were very similar, as we observed similar effect sizes of CNV impact.

3. Genetic information

3.1. CNV calling

No CNV calling was performed in Sainte-Justine CNV-familial cohort since this cohort focuses only on the CNV investigated in the clinic for which the family was recruited.

In the MSSNG database58, 10,032 individuals were sequenced at multiple sites using Illumina HiSeq, HiSeq 2500 or HiSeq X sequencers. Next-generation sequencing data were analysed using the Broad Institute Genome Analysis Toolkit (GATK) best practices59. For MSSNG, read alignment data were used to call CNVs following the workflow presented in Trost et al.60. We filtered CNVs by selecting only those overlapping at least 10 probes of each of the array technologies used by the different cohorts.

For all other datasets, we used the same methodology as in Huguet et al.19. For all cohorts, we used stringent quality-control criteria: call rate ≥95%; log R ratio-standard deviation <0.35; B allele frequency-standard deviation <0.08 and |wave factor|<0.05. Probe coordinates were updated from hg18 to hg19 using Illumina information and the liftover tool from the genome browser.

CNVs detected by the PennCNV61 and QuantiSNP62 algorithms were combined (CNVision)10 to minimize the number of potential false discoveries. Manual visualisation of some CNVs showed a high number of false positive CNVs in the G-Scot, LBC1936 and CaG cohorts; thus, only the CNVs detected by both algorithms were considered in these three cohorts. After this merging step, the CNV inheritance analysis algorithm (in-house algorithm) was applied to concatenate adjacent CNVs (same type) into one, according to the following criteria: a) gap between CNVs ≤150 kb; b) size of the CNVs ≥ 1000 bp; and c) number of probes ≥ 3. After these steps, we removed from the analyses all arrays for which a suspiciously high number of CNVs had been detected (≥ 50 for low resolution arrays [<1 million probes] and ≥ 200 for high resolution arrays [≥1 million probes]).
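The concatenation criteria a) to c) can be illustrated with the sketch below. It is not the in-house algorithm, whose details are not given here: it is one possible reading in which only CNVs meeting the size and probe-count criteria are candidates, and same-type CNVs on the same chromosome are joined when the gap between them is at most 150 kb. The dictionary keys and the function name are hypothetical.

```python
MAX_GAP_BP = 150_000   # criterion a) gap between CNVs <= 150 kb
MIN_SIZE_BP = 1_000    # criterion b) size of the CNVs >= 1000 bp
MIN_PROBES = 3         # criterion c) number of probes >= 3


def merge_adjacent_cnvs(cnvs):
    """Concatenate adjacent same-type CNVs called in one sample.

    cnvs: list of dicts with keys chrom, start, end, type, n_probes.
    Assumption: input CNVs do not overlap, so probe counts can be summed.
    """
    eligible = [c for c in cnvs
                if c["end"] - c["start"] >= MIN_SIZE_BP
                and c["n_probes"] >= MIN_PROBES]
    eligible.sort(key=lambda c: (c["chrom"], c["type"], c["start"]))
    merged = []
    for c in eligible:
        last = merged[-1] if merged else None
        if (last is not None
                and last["chrom"] == c["chrom"]
                and last["type"] == c["type"]
                and c["start"] - last["end"] <= MAX_GAP_BP):
            # Extend the previous CNV instead of starting a new one.
            last["end"] = max(last["end"], c["end"])
            last["n_probes"] += c["n_probes"]
        else:
            merged.append(dict(c))  # copy so the input list is left untouched
    return merged
```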

3.2. CNV filtering

After filtering arrays according to their quality, we filtered autosomal CNVs (X-linked CNVs were not investigated in this study). CNVs meeting the following criteria were selected for analysis: confidence score ≥ 30 (with at least one of the two detection algorithms), size ≥ 50 kb, unambiguous type (deletion or duplication) and overlap with segmental duplications, HLA regions or centromeric regions < 50%. Moreover, we applied an in-house algorithm based on the random forest method to detect additional artefactual CNVs. This algorithm relies on several strategies (bagging, boosting) and on 9 CNV characteristics (array criteria: log R ratio-standard deviation, B allele frequency-standard deviation, wave frequency; CNV localization criteria: % of CNV overlap with centromeric regions and with segmental duplications; CNV criteria: SNP density (size of CNV / number of SNPs), confidence score, % of algorithms overlapping, type of CNV). It was trained and tested on 70% and 30%, respectively, of a total of 22,154 CNVs (19,647 true CNVs and 2,507 artefacts from the SSC and G-Scot cohorts) that were manually visualized by at least two individuals, providing a high-confidence reference set for the random forest model. Its application to the test set showed a sensitivity > 0.82 and a specificity > 0.88 regardless of the strategy. The best strategy was then chosen and tested again on an additional naive dataset of 3,181 CNVs (2,808 true CNVs and 373 artefacts from the Imagen and SYS cohorts) and showed a sensitivity of 0.89 and a specificity of 0.8.
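The rule-based selection criteria listed at the start of this section (before the random forest step) can be sketched as a simple predicate. The field names are hypothetical, and the sketch assumes that the overlap with segmental duplications, HLA regions and centromeric regions has already been summarized as a single fraction per CNV, which the text does not specify.

```python
def passes_cnv_filters(cnv):
    """Return True if a CNV call meets the rule-based selection criteria.

    cnv: dict with keys chrom, start, end, type, conf_penncnv,
    conf_quantisnp and frac_problem_region (all hypothetical names).
    """
    # The confidence threshold must be reached by at least one algorithm.
    best_confidence = max(cnv["conf_penncnv"], cnv["conf_quantisnp"])
    return (
        cnv["chrom"] not in ("X", "Y")           # autosomal CNVs only
        and best_confidence >= 30                # confidence score >= 30
        and cnv["end"] - cnv["start"] >= 50_000  # size >= 50 kb
        and cnv["type"] in ("DEL", "DUP")        # unambiguous type
        and cnv["frac_problem_region"] < 0.5     # overlap < 50%
    )
```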

For the mega-analysis, we applied an additional filtering, by selecting only the CNVs overlapping at least 10 probes of each of the array technologies used by the different cohorts, before pooling the data.

3.3. Genetic analysis of pairwise relatedness

For each cohort, we checked genotypic sex and Mendelian errors to validate basic information such as sex and parenthood. Classical multidimensional scaling (MDS) was also used to identify relatedness based on identity-by-state (IBS) matrices of genetic distances. These analyses were done using PLINK63 (pngu.mgh.harvard.edu/purcell/plink/). These calculations are based on SNPs that were filtered to keep only autosomal SNPs with minor allele frequency (MAF) > 5% and with good quality, significance threshold for a test of Hardy-Weinberg equilibrium < 1×10⁻⁶ and missing genotype rates < 10%.

3.4. Criteria to remove outlier individual with particular CNV

We excluded from the analyses all individuals carrying a large structural variant ≥10 Mb, a mosaic CNV or an anomaly on the X or Y chromosomes. Thus, we removed 2 individuals from Imagen, 1 from SYS, 2 from CaG, 39 from G-Scot and 3 from MSSNG.

3.5. Definition of recurrent CNVs

We compiled a list of 121 CNVs proposed to be pathogenic in widely accepted sources1,9,13,64,65, presented with the region coordinates in Huguet et al.19 We defined a CNV as recurrent if it overlaps ≥ 40% with one of these 121 CNVs and includes the key genes of this region (if known).
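A minimal sketch of this rule is given below. Two points are assumptions, because the text does not state them: the 40% overlap is computed as the fraction of the reference region covered by the CNV, and "includes the key genes" is read as all key genes being among the genes encompassed by the CNV. Field and gene names are hypothetical.

```python
def overlap_fraction(cnv_start, cnv_end, ref_start, ref_end):
    """Fraction of the reference region that is covered by the CNV."""
    shared = min(cnv_end, ref_end) - max(cnv_start, ref_start)
    return max(0, shared) / (ref_end - ref_start)


def is_recurrent(cnv, region, min_overlap=0.40):
    """Check a CNV against one region from the list of known CNVs.

    cnv: dict with chrom, start, end and genes (set of encompassed genes).
    region: dict with chrom, start, end and key_genes (empty set if unknown).
    """
    if cnv["chrom"] != region["chrom"]:
        return False
    frac = overlap_fraction(cnv["start"], cnv["end"],
                            region["start"], region["end"])
    # An empty key_genes set is a subset of anything, so the gene
    # condition is only enforced when key genes are known.
    return frac >= min_overlap and region["key_genes"] <= cnv["genes"]
```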

3.6. Annotation of CNVs

We annotated the CNVs using Gencode V19 annotation (the reference release for hg19 Human genome release) with ENSEMBL gene name (https://grch37.ensembl.org/index.html). Briefly, we used bedtools suite (https://bedtools.readthedocs.io/en/latest/) to compare the different elements of the genes that overlap CNVs (UTRs, start and stop codons, exons and introns). Thus, for each CNV, we obtained its proportion of overlap with each gene as well as with each gene component (UTR, Coding exons, start and stop codons, introns).

Genes were annotated using different scores: the probability of being loss-of-function (LoF) intolerant (pLI), the residual variation intolerance score (RVIS), the CNV intolerance score (DEL/DUP score)20,66,67, the upper bound of the 95% CI of the ratio of observed over expected number of LoF mutations (LOEUF), a score developed by gnomAD21 (https://gnomad.broadinstitute.org/), the number of protein-protein interactions (PPI)68, the differential stability (DS) score29 as well as two lists defining genes of likely interest for IQ, including postsynaptic density (PSD) of the human cortex69 and genes regulated by FMRP70. We also used a list of 256 ID-genes2,23, previously identified with an excess of de novo mutations in NDD cohorts. Non-coding regions were annotated with the number of expression quantitative trait loci (eQTLs) regulating genes expressed in the brain71. CNV scores were derived by summing all scores within CNVs. More details about this methodology may be found in Huguet et al.19.
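The summation of gene scores within CNVs can be sketched as follows, for the two scores used later in the mega-analysis (pLI and 1/LOEUF), separately for deletions and duplications. The function name and the gene names in the usage example are hypothetical, and skipping genes without a score is an assumption rather than a documented choice.

```python
def individual_scores(genes_by_type, pli, loeuf):
    """Sum gene-level scores over the genes contained in one individual's CNVs.

    genes_by_type: e.g. {"DEL": [...], "DUP": [...]} with gene names.
    pli, loeuf: dicts mapping gene name to its pLI and LOEUF value.
    """
    scores = {}
    for cnv_type, genes in genes_by_type.items():
        scores[cnv_type] = {
            # Assumption: genes lacking a score contribute nothing.
            "pLI": sum(pli[g] for g in genes if g in pli),
            "1/LOEUF": sum(1.0 / loeuf[g] for g in genes if g in loeuf),
        }
    return scores
```

For instance, with made-up values `pli = {"GENE_A": 0.9, "GENE_B": 0.1}` and `loeuf = {"GENE_A": 0.25, "GENE_B": 2.0}`, a deletion containing both genes would receive a 1/LOEUF score of 4 + 0.5 = 4.5.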

4. Statistical analyses

4.1. Coverage

The number of different genes totally included in at least one CNV was first calculated in our total cohort of 24,092 subjects (Table 1) and then estimated in relationship with sample size. We randomly ordered the 24,092 subjects of our total cohort and counted, for each step of X additional individuals (X=4000 for the whole cohort or the general population and X=1000 for the ASD population), the number of different genes totally included in at least one CNV. The final curve was estimated by averaging the counts obtained from 500 iterations of this procedure, and 95% confidence intervals were estimated as the 2.5th and 97.5th quantiles of the values resulting from the 500 iterations. This procedure was applied to all CNVs as well as to deletions and duplications separately. We also split the information according to the pLI (genes with pLI >0.9 versus genes with pLI ≤0.9) or according to LOEUF (genes with LOEUF <0.35 versus genes with LOEUF ≥ 0.35).
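The resampling procedure described above can be sketched as follows. The function name is illustrative, and the quantiles are taken with a crude nearest-rank rule, which is an assumption; the study may have used a different quantile definition.

```python
import random


def coverage_curve(genes_per_subject, step, n_iter=500, seed=0):
    """Average number of distinct genes covered as subjects are added.

    genes_per_subject: one set of gene names per subject (genes totally
    included in at least one of the subject's CNVs).
    Returns a list of (n_subjects, mean_count, lower_2.5, upper_97.5).
    """
    rng = random.Random(seed)
    n = len(genes_per_subject)
    checkpoints = list(range(step, n + 1, step))
    counts = {k: [] for k in checkpoints}
    order = list(range(n))
    for _ in range(n_iter):
        rng.shuffle(order)              # random ordering of the subjects
        seen = set()
        for i, idx in enumerate(order, start=1):
            seen |= genes_per_subject[idx]
            if i in counts:             # record the count at each step
                counts[i].append(len(seen))
    curve = []
    for k in checkpoints:
        vals = sorted(counts[k])
        lower = vals[int(0.025 * (len(vals) - 1))]   # nearest-rank quantile
        upper = vals[int(0.975 * (len(vals) - 1))]
        curve.append((k, sum(vals) / len(vals), lower, upper))
    return curve
```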

Then, we predicted how the number of different genes totally included in at least one CNV should evolve when including new individuals using a regression model on the results obtained above. We modeled the number of different genes as a function of number of individuals included in the study as follows:

$$\text{Number of different genes} \sim \beta_0 + \beta_1 \log(N) + \beta_2 [\log(N)]^2$$

where $N$ is the number of individuals included in the study. The regression parameters $\beta_0$, $\beta_1$ and $\beta_2$ were obtained from the whole cohort or from subsets divided into general population and autism population (Supplementary figure 8).
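A least-squares fit of this model can be sketched in plain Python by solving the normal equations for the design matrix with columns 1, log(N) and [log(N)]². This is an illustration only: the natural logarithm is an assumption (the base is not stated), and the function names are hypothetical.

```python
import math


def fit_log_quadratic(sample_sizes, gene_counts):
    """Least-squares estimates (b0, b1, b2) for count ~ b0 + b1*log(N) + b2*log(N)^2."""
    xs = [math.log(n) for n in sample_sizes]  # assumption: natural logarithm
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented normal equations [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * y for r, y in zip(rows, gene_counts))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


def predict_gene_count(beta, n):
    """Predicted number of distinct genes for a study of n individuals."""
    x = math.log(n)
    return beta[0] + beta[1] * x + beta[2] * x * x
```

Once fitted, `predict_gene_count` can be evaluated at sample sizes larger than the observed ones, which is how a curve of this kind is extrapolated.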

4.2. Assessing age and sex biases in the cohorts

Since each cohort has specific biases based on their respective ascertainment methods, we first adjusted general intelligence for age and sex within each cohort before performing meta- and mega-analyses.

Adjusting for age within samples of unselected populations assessed with a g-factor.

We adjusted g-factor for age in CaG, G-Scot and SYS parents cohorts separately by following these steps:

● We fitted a linear regression model of the g-factor on age. Age was incorporated alone (linear effect) or with its square (quadratic effect), depending on the cohort.

● The final model could be written as $\text{g-factor} = a + f(\text{age}) + \varepsilon$, where $f(\text{age}) = b \times \text{age}$ (linear effect) or $f(\text{age}) = b_1 \times \text{age} + b_2 \times \text{age}^2$ (quadratic effect), with $a$, $b$, $b_1$, $b_2$ the regression coefficients and $\varepsilon$ the normally distributed error term.

● We computed the age-adjusted g-factor as the g-factor that each individual would have at the mean age of the cohort: $\text{age-adjusted g-factor} = a + f(\text{mean age of the cohort}) + \varepsilon$.

(In other words, this corresponds to the g-factor predicted by the model for the mean age of the cohort plus the residual of each individual.)

The retained models for adjusting the g-factor for age in the CaG, G-Scot and SYS parents cohorts were obtained using polynomial regression of order 2 (quadratic effect of age) in CaG and G-Scot and a simple linear regression in SYS parents. The non-adjusted g-factor and the adjusted g-factor are both displayed in supplementary figure 9.
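The age adjustment can be illustrated for the linear case (the form retained for SYS parents). This is a sketch under stated assumptions: ordinary least squares is computed by hand, the function name is hypothetical, and the quadratic case used for CaG and G-Scot would follow the same logic with an additional squared term.

```python
from statistics import mean


def age_adjust_linear(g, age):
    """Shift each g-factor value to what it would be at the cohort's mean age.

    Fits g = a + b*age + e by ordinary least squares and returns
    a + b*mean(age) + e for every individual.
    """
    mean_age, mean_g = mean(age), mean(g)
    sxx = sum((x - mean_age) ** 2 for x in age)
    b = sum((x - mean_age) * (y - mean_g) for x, y in zip(age, g)) / sxx
    a = mean_g - b * mean_age
    residuals = [y - (a + b * x) for x, y in zip(age, g)]
    return [a + b * mean_age + e for e in residuals]
```

Because the residuals average to zero, the adjusted values keep the cohort mean of the original g-factor while removing the fitted age trend.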

Adjusting for biological sex within autism cohorts

Autism cohorts are biased for sex with a higher number of males than females and a lower IQ for the latter. Thus, in SSC and MSSNG dataset, we adjusted IQ for sex as follows for each cohort:

● We performed an ANOVA to fit IQ according to sex: $\text{IQ} = a + b \times \mathbf{1}(\text{sex} = \text{female}) + \varepsilon$, with $a$ and $b$ the regression coefficients, $\varepsilon$ the normally distributed error term and $\mathbf{1}(\text{sex} = \text{female})$ the indicator variable equaling 1 if female and 0 if male.

● The sex-adjusted IQ corresponds to the IQ of each participant considered as a male: $\text{sex-adjusted IQ} = a + \varepsilon$.

Results of models are displayed in supplementary figure 10.
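The sex adjustment described above can be sketched as follows. With a single binary predictor, the least-squares intercept is the mean IQ of males and the coefficient is the female-minus-male difference in means, so adding the intercept to each residual leaves males unchanged and shifts females by that difference. The function name is hypothetical.

```python
from statistics import mean


def sex_adjust(iq, is_female):
    """Return IQ values expressed as if every participant were male.

    iq: list of IQ values; is_female: list of booleans of the same length.
    """
    a = mean([y for y, f in zip(iq, is_female) if not f])    # intercept: male mean
    b = mean([y for y, f in zip(iq, is_female) if f]) - a    # female offset
    # a + residual equals y for males and y - b for females.
    return [y - b if f else y for y, f in zip(iq, is_female)]
```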

All the models presented after are based on this adjusted z-score.

In autism cohorts, we did not adjust for age as we adjusted for the type of test used for assessing IQ that already takes into account the age (as well as language level) of the individual.

In other cohorts, z-scores of general intelligence were not adjusted for age or sex, as IQ scores are already adjusted for these two parameters. However, with the exceptions of the Imagen and LBC1936 cohorts (there is no variability of age in these two cohorts), in all other cohorts the z-scores have been adjusted for age.

Moreover, models were not adjusted for ancestry components since they did not impact the effect of genetic scores on general intelligence in our previous studies (Huguet et al.19, Douard et al.72).

However, other adjustments were made specifically within each sample in the meta-analysis or across all samples in the mega-analysis.

4.3. Meta-analyses assessing the effect of pLI on general intelligence

Firstly, models were computed independently for each sample (supplementary table 3). Outcome of interest was either the adjusted z-score for IQ or the adjusted g-factor (when IQ was not available). Predictor of interest was the pLI score measured at individual level for deletions and duplications separately.

Additional adjustment by sample:

● Imagen: no additional adjustment.

● SYS children: this cohort includes siblings; thus, we added a random effect to the model to take into account the correlation between individuals from the same family.

● SYS parents: no additional adjustment.

● CaG: no additional adjustment.

● G-Scot: this cohort includes individuals from the same family, and we added a random effect to the model to take into account the correlation between individuals from the same family.

● LBC1936: no additional adjustment.

● SSC: this cohort includes individuals from a large range of young ages and heterogeneous language levels, such that IQ was assessed with 5 different tests that are linked to age and to the IQ measure, leading to an artificial age effect on IQ. To minimize differences associated with age, we adjusted models for the IQ test that was used.

● MSSNG: as for SSC, we adjusted for the IQ test used (6 different tests). Moreover, this cohort includes individuals from the same family, and we added a random effect to the model to take into account the correlation between individuals from the same family.

Secondly, meta-analyses were performed, separately for deletions and duplications, on summary results from each model, specifically on the estimate of regression coefficient associated with the predictor of interest (pLI) (Figure 1).
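As an illustration of how per-sample regression coefficients can be combined, the sketch below uses inverse-variance weighting with a fixed-effect model. This is an assumption made for the example: the text does not state whether a fixed-effect or random-effects meta-analysis was used, and the function name is hypothetical.

```python
def inverse_variance_meta(estimates, standard_errors):
    """Fixed-effect, inverse-variance pooled estimate and its standard error.

    estimates: per-sample regression coefficients for the predictor.
    standard_errors: the corresponding standard errors.
    """
    weights = [1.0 / se ** 2 for se in standard_errors]
    pooled = sum(w * b for w, b in zip(weights, estimates)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5
    return pooled, pooled_se
```

Samples with smaller standard errors receive larger weights, so large cohorts dominate the pooled estimate. The function would be called once for deletions and once for duplications.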

4.4. Mega-analysis assessing the effect of pLI or 1/LOEUF on general intelligence

We performed linear regression analyses on the pooled dataset to measure the impact of CNVs (individual pLI or 1/LOEUF scores) on general intelligence. The model was adjusted for the combined variable “cohort - type of intelligence measure” (as fixed effect) and for the family identifier (as random effect). Estimates were obtained using restricted maximum likelihood (REML).

Different sensitivity analyses were performed, including a modification of the way the random effect was taken into account, the subset of populations (all, unrelated individuals, autism population, general population) and the effects of deletions and duplications taken into account separately or in the same model (Supplementary tables 12 and 13).

Comparing with other annotation scores as a predictor of the general intelligence in the pooled dataset.

We explored whether the pLI was the best predictor of the impact of deletions or duplications on general intelligence by comparing models based on the pLI with models based on other annotation scores. Models were compared using Akaike’s information criterion (AIC) obtained after fitting each model by likelihood maximization. A lower AIC indicates a better fit (supplementary table 4).

Sensitivity analysis to assess the effect of pLI (resp. 1/LOEUF) after removing individuals with higher measure of pLI (resp. 1/LOEUF)

We performed sensitivity analyses by incrementally removing individuals with total pLI >10, >5, >3, >2, >1.5 and >1 point. We did the same for 1/LOEUF with the following thresholds: total 1/LOEUF >60, >40, >20, >10, >4 and >2.85.

Linear versus non–linear effect of haploinsufficiency scores

We tested non-linearity of the effect of haploinsufficiency scores on general intelligence by fitting different models on the pooled dataset. First, we introduced the pLI score as well as its square in the same model. Then, we explored a smooth function of the effect of pLI (resp. 1/LOEUF) using a kernel regression method. We used the Gaussian kernel function as it is flexible enough to account for various types of effects. We defined the Gaussian kernel function such that the similarity between two subjects i and j is:

$$\mathrm{Gaussian}(Z_i, Z_j, \rho) = \exp\left(-\frac{\lVert Z_i - Z_j \rVert^{2}}{\rho}\right)$$

where Z_i and Z_j are the scores (total pLI or total 1/LOEUF) for subjects i and j, ||.|| represents the Euclidean distance and ρ is a tuning parameter defining the smoothness of the function. Several values of the ρ parameter were compared (see Supplementary tables 12 and 13).
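The kernel above can be sketched as follows for scalar scores, where the Euclidean distance reduces to an absolute difference. The function names and toy scores are hypothetical; the ρ values 0.1, 1 and 10 are those mentioned in the legend of Supplementary table 12. This is not the “KSPM” implementation used for the analyses.

```python
import math


def gaussian_kernel(z_i, z_j, rho):
    """Similarity exp(-||z_i - z_j||^2 / rho) between two scalar scores."""
    if rho <= 0:
        raise ValueError("rho must be strictly positive")
    return math.exp(-((z_i - z_j) ** 2) / rho)


def kernel_matrix(scores, rho):
    """Pairwise similarity matrix for a list of total scores."""
    return [[gaussian_kernel(a, b, rho) for b in scores] for a in scores]


# Toy total pLI scores (hypothetical values).
toy_scores = [0.0, 0.5, 2.0]
for rho in (0.1, 1, 10):
    K = kernel_matrix(toy_scores, rho)
    print(rho, [round(value, 3) for value in K[0]])
```

Larger values of ρ make distant subjects look more similar, which yields a smoother fitted function.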

Model including partially disrupted genes

The previous models include only genes fully contained in a CNV. Here, we also include information on genes that only partially overlap a CNV, using two approaches.

Firstly, we defined 5 new categories of genes based on gene annotation (UTRs, start and stop codons, exons and introns): (1) the start codon overlaps a CNV, (2) the stop codon overlaps a CNV, (3) neither the start nor the stop codon overlaps a CNV, (4) only the UTR5’ overlaps a CNV, and (5) only the UTR3’ overlaps a CNV. We recomputed the linear model including 6 predictors of general intelligence (the sum of pLI [resp. 1/LOEUF] within each category of genes: genes fully contained in a CNV and genes from the 5 categories defined above). All models were adjusted for the “type of test-cohort” variable as fixed effect and for familial relationship as random effect, and were fitted for deletions and duplications separately (Supplementary table 5).

Secondly, we included in the model all genes for which the proportion of exonic coding regions contained in a CNV is higher than a threshold. For each threshold (varying from 0% to 100% in steps of 5%), we recomputed the linear model including one predictor of general intelligence (the sum of pLI or 1/LOEUF associated with the selected genes). Models were adjusted for the “type of test-cohort” variable as fixed effect and for familial relationship as random effect, and were fitted for deletions and duplications separately. The same analysis was performed using the proportion of intronic regions contained in a CNV. Of note, AICs were computed by fitting each model by maximum likelihood in order to compare the different models. A lower AIC indicates a better fit (Supplementary tables 6 and 7).
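The threshold-based selection can be sketched as below. It assumes each gene hit by a CNV is described by the fraction of its exonic sequence inside the CNV and by its annotation score; the toy genes and function names are hypothetical. Following the row labels of Supplementary tables 6 and 7, the 100% threshold is treated as "≥ 100%" and all others as strict ">".

```python
def thresholds():
    """Thresholds 0%, 5%, ..., 100%, expressed as fractions."""
    return [step * 5 / 100 for step in range(21)]


def gene_selected(overlap_fraction, threshold):
    """Apply '>= 100%' for the last threshold and a strict '>' otherwise."""
    if threshold >= 1.0:
        return overlap_fraction >= 1.0
    return overlap_fraction > threshold


def score_sum(genes, threshold):
    """Sum the annotation score over genes passing the overlap threshold."""
    return sum(score for fraction, score in genes if gene_selected(fraction, threshold))


# Toy genes as (exonic fraction inside the CNV, pLI); hypothetical values.
toy_genes = [(1.0, 0.99), (0.6, 0.50), (0.04, 0.10), (0.0, 0.80)]
for t in thresholds():
    print(f"{t:.2f}", round(score_sum(toy_genes, t), 2))
```

Each resulting sum would then be used as the single predictor in a separate model, one model per threshold.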


Model including information on known ID-genes

CNV haploinsufficiency scores were divided into the sum of haploinsufficiency scores for ID-genes and the sum for non ID-genes; models were then recomputed with these two explanatory variables (sum of scores for ID-genes and sum of scores for non ID-genes included in the CNV) for deletions and duplications separately (Supplementary table 15).

4.5. Concordance analysis between prediction of our models and literature observations

We examined whether our prediction of the effect of haploinsufficiency on cognition was improved using the results of our meta-analysis. To do so, we calculated the concordance between model predictions and empirically measured loss of IQ for 47 known recurrent CNVs obtained from previous publications (Supplementary Methods, Figure 4, Supplementary Figure 5, Supplementary Tables 18 and 23) or from the UK Biobank study14. The concordance was computed using the intraclass correlation coefficient of type (3,1) (ICC(3,1))73.
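As an illustration of the concordance measure, ICC(3,1) can be obtained from a two-way layout (targets in rows, measurements in columns) as (BMS − EMS) / (BMS + (k − 1) EMS), where BMS is the between-target mean square, EMS the residual mean square and k the number of measurements per target. The sketch below is a plain-Python version with hypothetical names and toy values; it is not the “psych” implementation used for the analyses.

```python
def icc_3_1(ratings):
    """ICC(3,1) for a list of rows, one row per target with k measurements."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-target mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Toy (predicted, observed) IQ loss pairs, one per CNV; hypothetical values.
toy_pairs = [[-5.0, -6.0], [-10.0, -9.0], [-20.0, -22.0], [-2.0, -1.0]]
print(round(icc_3_1(toy_pairs), 3))
```

With two columns (model prediction and literature observation), a value close to 1 indicates that the two measurements rank and scale the CNVs consistently.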

4.6. Estimation of the probability of a de novo CNV using haploinsufficiency scores

We performed logistic regressions to estimate the probability of a CNV being de novo using the haploinsufficiency scores (pLI or 1/LOEUF scores, accounting for the effect of ID-genes) or estimated IQ loss as explanatory variables. We trained these de novo models on datasets for which the mode of transmission of the CNVs was available (inherited or de novo). For these analyses, we added two clinical populations (Decipher (https://decipher.sanger.ac.uk/) and the cytogenetic database of Sainte-Justine Hospital) and applied the same filtering as for the previous CNV selection. We included a total of 26,437 CNVs from two general populations: 810 individuals from G-Scot and 723 individuals from SYS; and four clinical populations: 3,919 individuals from the SSC, 956 individuals from MSSNG, 10,126 individuals from Decipher and 1,560 individuals from the cytogenetic database of Sainte-Justine Hospital (Supplementary table 19). The binary outcome variable was the type of transmission (1=de novo, 0=inherited). Models for deletions and duplications were tested independently.

To investigate whether the associations between de novo status and the haploinsufficiency scores were different for deletions and duplications, we also performed a logistic regression model combining the deletion and duplication scores in one explanatory variable (e.g. the IQ loss estimated with 1/LOEUF) and added an interaction between the type of CNV (deletion or duplication) and the haploinsufficiency score.
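One way to write the combined model is given below. This is a sketch under assumptions: the notation ($S_i$ for the score of CNV $i$, $T_i$ for its type) is ours, and both the 0/1 coding of $T_i$ and the inclusion of its main effect are assumptions not stated in the text.

```latex
\operatorname{logit} \Pr(\text{de novo}_i = 1)
  = \alpha_0 + \alpha_1 S_i + \alpha_2 T_i + \alpha_3 \, S_i T_i,
\qquad
T_i =
\begin{cases}
  0 & \text{deletion} \\
  1 & \text{duplication}
\end{cases}
```

Under this coding, $\alpha_1$ is the effect of the score for deletions and $\alpha_1 + \alpha_3$ the effect for duplications, so testing $\alpha_3 = 0$ assesses whether the association differs between the two types of CNV.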

Then, we validated these models by comparing the model estimates with the percentage of de novo CNVs computed from Decipher (https://decipher.sanger.ac.uk/) for 27 recurrent CNVs (Supplementary table 18). We used the estimates from models trained on a dataset that excluded these recurrent CNVs.

4.7. Estimating the effect size of genes categories by LOEUF

We coded the information in the LOEUF score using 4 categories of tolerance for the genes: highly intolerant genes (LOEUF<0.2; n=980), moderately intolerant genes (0.2≤LOEUF<0.35; n=1,762), tolerant genes (0.35≤LOEUF<1; n=7,442) and highly tolerant genes (LOEUF≥1; n=8,267). The numbers of genes within each category were used in a linear model as four explanatory variables of general intelligence.
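The four-category coding can be sketched as follows. The cut-offs are those given above; the function names and the toy list of LOEUF values are hypothetical.

```python
CATEGORIES = (
    "highly intolerant",
    "moderately intolerant",
    "tolerant",
    "highly tolerant",
)


def loeuf_category(loeuf):
    """Map a LOEUF value to one of the four tolerance categories."""
    if loeuf < 0.2:
        return "highly intolerant"
    if loeuf < 0.35:
        return "moderately intolerant"
    if loeuf < 1:
        return "tolerant"
    return "highly tolerant"


def count_by_category(loeuf_scores):
    """Number of genes per category for the genes carried by one individual's CNVs."""
    counts = {category: 0 for category in CATEGORIES}
    for loeuf in loeuf_scores:
        counts[loeuf_category(loeuf)] += 1
    return counts


# Toy LOEUF values for genes inside one individual's deletions (hypothetical).
print(count_by_category([0.1, 0.2, 0.34, 0.5, 1.3]))
```

The four counts per individual would then enter the linear model as four explanatory variables.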

Then, we explored whether particular score ranges within LOEUF were associated with unique effect sizes by using a sliding window across the range of scores spanned by LOEUF. We defined 2 variables: number of genes inside and outside the window, and considered them as predictors of the general intelligence in a linear model. We slid a window of size 0.15 LOEUF units, in increments of 0.05 units, thereby creating 37 different analyses.
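The sliding-window coding can be sketched as below. The window width (0.15), step (0.05) and number of windows (37) come from the text; starting the first window at LOEUF = 0 and treating windows as half-open intervals [lower, upper) are assumptions made for illustration, as are the function names and toy values.

```python
def sliding_windows(n_windows=37, width=0.15, step=0.05, start=0.0):
    """Return (lower, upper) bounds of each window along the LOEUF axis."""
    return [
        (round(start + i * step, 2), round(start + i * step + width, 2))
        for i in range(n_windows)
    ]


def inside_outside_counts(loeuf_scores, window):
    """Count genes inside and outside a half-open window [lower, upper)."""
    lower, upper = window
    inside = sum(1 for score in loeuf_scores if lower <= score < upper)
    return inside, len(loeuf_scores) - inside


windows = sliding_windows()
# Toy LOEUF values for genes inside one individual's CNVs (hypothetical).
toy_scores = [0.1, 0.16, 0.5, 1.2]
print(windows[0], windows[-1])
print(inside_outside_counts(toy_scores, (0.1, 0.25)))
```

For each window, the two counts form the pair of predictors used in one linear model.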

For both methods, the adjusted z-scored measure of general intelligence was the dependent variable and the annotation scores, with the new codings, were the independent variables. All models were adjusted for “type of test-cohort” as fixed effect and for familial relationship as random effect, and analyses were performed for deletions and duplications separately.

4.8. R packages used for statistical analyses

Statistical analyses were performed using R version 3.1.1 (R Development Core Team (2005). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL: http://www.R-project.org). We used the “nlme”74, “bootStepAIC”75, “KSPM” and “psych”76 packages for mixed effect models, bootstrap for the variable selection procedure, kernel semi-parametric models and ICC(3,1), respectively.


EXTENDED DATA TABLES

| Cohort | Correlation(a) (95%CI) | ICC(b) (95%CI) |
| --- | --- | --- |
| Imagen (n=1,720) | 0.77 (0.75;0.78) | 0.77 (0.74;0.78) |
| SYS-child (n=857) | 0.61 (0.56;0.65) | 0.60 (0.56;0.64) |
| LBC1936 (n=504) | 0.66 (0.61;0.71) | 0.66 (0.62;0.70) |

Supplementary table 1: Correlation and concordance between NVIQ and g-factor in 3 unselected populations. (a) Pearson’s correlation, (b) Intraclass correlation coefficient, Single fixed raters; CI: confidence interval.

| Cohorts | Deletions Est. | Deletions SE | Deletions P | Duplications Est. | Duplications SE | Duplications P |
| --- | --- | --- | --- | --- | --- | --- |
| Imagen | -0.234 | 0.091 | 1.02×10^-02 | 0.083 | 0.043 | 5.56×10^-02 |
| SYS-child | -0.169 | 0.054 | 1.94×10^-03 | -0.079 | 0.040 | 4.74×10^-02 |
| SYS-parent | -0.334 | 0.160 | 3.78×10^-02 | 0.019 | 0.057 | 7.33×10^-01 |
| LBC1936 | -0.277 | 0.127 | 2.99×10^-02 | 0.026 | 0.083 | 7.51×10^-01 |
| CaG-GSA | -0.024 | 0.135 | 8.61×10^-01 | -0.105 | 0.037 | 4.53×10^-03 |
| CaG-Omni2.5 | -0.321 | 0.250 | 1.99×10^-01 | -0.074 | 0.056 | 1.83×10^-01 |
| G-Scot | -0.179 | 0.031 | 6.03×10^-09 | -0.066 | 0.014 | 2.24×10^-06 |
| SSC-1MV1 | -0.155 | 0.068 | 2.30×10^-02 | -0.022 | 0.091 | 8.06×10^-01 |
| SSC-1MV3 | -0.176 | 0.044 | 6.64×10^-05 | -0.062 | 0.023 | 7.12×10^-03 |
| SSC-Omni2.5 | -0.160 | 0.043 | 1.94×10^-04 | -0.055 | 0.038 | 1.46×10^-01 |
| MSSNG | -0.200 | 0.065 | 2.15×10^-03 | -0.013 | 0.036 | 7.21×10^-01 |

Supplementary table 2: Estimates associated with pLI within each sample included in the meta-analysis. Models were computed separately for deletions and duplications. The adjusted z-scored measure of general intelligence was the dependent variable and pLI was the independent one. All models were adjusted differently as detailed in supplementary methods. SYS: Saguenay youth study; SSC: Simon’s simplex collection; CaG: Cartagene; LBC: Lothian birth cohort; pLI: probability of loss-of-function intolerance; Est.: estimate; SE: standard error; P: p-value.


| Cohort | Phenotype | Deletions Est. | Deletions SE | Deletions P | Duplications Est. | Duplications SE | Duplications P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Imagen (n=1,720) | z-scored NVIQ | -0.233 | 0.091 | 0.011 | 0.086 | 0.044 | 0.048 |
| Imagen (n=1,720) | z-scored g-factor | -0.222 | 0.092 | 0.017 | 0.004 | 0.044 | 0.930 |
| SYS-child (n=857) | z-scored NVIQ | -0.146 | 0.055 | 0.009 | -0.066 | 0.042 | 0.114 |
| SYS-child (n=857) | z-scored g-factor | -0.150 | 0.054 | 0.006 | -0.063 | 0.041 | 0.130 |
| LBC1936 (n=504) | z-scored IQ Moray | -0.277 | 0.127 | 0.030 | 0.026 | 0.083 | 0.751 |
| LBC1936 (n=504) | z-scored g-factor | -0.263 | 0.131 | 0.045 | -0.011 | 0.086 | 0.901 |

Supplementary table 3: Comparison of estimates associated with pLI when measuring the effect-size of CNVs on IQ and g-factor. Models were computed separately for deletions and duplications. The z-scored measure of general intelligence (unadjusted) was the dependent variable and the annotation score was the independent one. In SYS, the model was adjusted for age as fixed effect and for familial relationship as random effect. SYS: Saguenay youth study; LBC: Lothian birth cohort; pLI: probability of loss-of-function intolerance; Est.: estimate; SE: standard error; P: p-value.

| Variables | Deletions Est. (SE) | Deletions P | Deletions AIC | Duplications Est. (SE) | Duplications P | Duplications AIC |
| --- | --- | --- | --- | --- | --- | --- |
| No. of genes | -0.037 (0.004) | 4.43×10^-21 | 70,348.86 | -0.011 (0.002) | 9.10×10^-8 | 70,409.34 |
| pLI * | -0.176 (0.016) | 1.25×10^-28 | 70,314.15 | -0.054 (0.009) | 1.90×10^-9 | 70,401.81 |
| pLI 2019 † | -0.194 (0.018) | 3.04×10^-26 | 70,325.13 | -0.062 (0.010) | 3.09×10^-9 | 70,402.76‡ |
| 1/(O/E) † | -0.015 (0.001) | 1.48×10^-28 | 70,314.49‡ | -0.005 (0.001) | 3.45×10^-9 | 70,402.97‡ |
| 1/LOEUF † | -0.029 (0.003) | 7.14×10^-29 | 70,313.03‡ | -0.009 (0.002) | 8.65×10^-9 | 70,404.76‡ |
| RVIS | -0.001 (8.4×10^-5) | 1.70×10^-14 | 70,378.94 | -2.0×10^-4 (4.6×10^-5) | 9.12×10^-6 | 70,418.24 |
| DEL/DUP score | -0.023 (0.003) | 1.58×10^-16 | 70,369.69 | -0.008 (0.002) | 3.25×10^-6 | 70,416.25 |
| PPI | -1.3×10^-4 (5.0×10^-5) | 0.007 | 70,430.68 | -7.7×10^-5 (3.2×10^-5) | 0.015 | 70,432.07 |
| DS | -0.094 (0.009) | 1.62×10^-26 | 70,323.88 | -0.030 (0.005) | 4.5×10^-9 | 70,403.49 |
| PSD genes | -0.330 (0.040) | 4.40×10^-18 | 70,362.57 | -0.114 (0.022) | 1.46×10^-7 | 70,410.26 |
| FMRP genes | -0.376 (0.050) | 6.34×10^-14 | 70,381.54 | -0.162 (0.031) | 1.73×10^-7 | 70,410.59 |
| eQTL | -0.001 (1.1×10^-4) | 1.68×10^-17 | 70,365.23 | -2.6×10^-4 (5.6×10^-5) | 2.57×10^-6 | 70,415.80 |

Supplementary table 4: Comparison of model fit for general intelligence according to annotation score. The model was computed separately for each annotation score. The adjusted z-scored measure of general intelligence was the dependent variable and the annotation score was the independent one. All models were adjusted for the “type of test-cohort” variable as fixed effect and for familial relationship as random effect. Estimates, SE and P were computed using the REML approach whereas AIC was computed using the ML approach. A lower AIC means a better fit. ML: maximum likelihood; REML: restricted maximum likelihood; Est.: estimate; SE: standard error; P: p-value; AIC: Akaike’s information criterion; No.: number; pLI: probability of being loss-of-function intolerant; O/E: observed over expected; LOEUF: the upper 95th confidence interval boundary of the ratio of observed over expected number of loss-of-function mutation score; RVIS: residual variation intolerance score; DEL score: the CNV intolerance score for deletion; DUP score: the CNV intolerance score for duplication; PPI: protein-protein interaction; DS: differential stability; PSD: number of genes in postsynaptic density function category; FMRP: number of genes regulated by FMRP; eQTL: the number of SNPs regulating genes expressed in the brain (expression quantitative trait loci (eQTLs)). Details about annotation scores are available in section 3.6; * pLI variable used as reference; † other gnomAD scores measuring a concept similar to that of the pLI variable; ‡ indicates AICs lower than or close to the AIC obtained with the pLI variable.


| Type | Categories of genes within which the annotation score was summed | pLI Est. | pLI SE | pLI P | 1/LOEUF Est. | 1/LOEUF SE | 1/LOEUF P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Deletion | Only the UTR5’ overlaps a CNV | 0.015 | 0.081 | 0.854 | 0.002 | 0.017 | 0.885 |
| Deletion | The start codon overlaps a CNV | 0.009 | 0.060 | 0.883 | 0.001 | 0.016 | 0.965 |
| Deletion | None of start and stop codons overlap a CNV | 0.010 | 0.042 | 0.817 | 0.005 | 0.008 | 0.487 |
| Deletion | The stop codon overlaps a CNV | 0.064 | 0.105 | 0.542 | -0.008 | 0.021 | 0.708 |
| Deletion | Only the UTR3’ overlaps a CNV | 0.231 | 0.529 | 0.662 | -0.036 | 0.102 | 0.722 |
| Deletion | Gene fully contained in a CNV | -0.174 | 0.016 | 5.9×10^-28 | -0.029 | 0.003 | 1.4×10^-27 |
| Duplication | Only the UTR5’ overlaps a CNV | 0.002 | 0.052 | 0.971 | -0.002 | 0.010 | 0.809 |
| Duplication | The start codon overlaps a CNV | -0.021 | 0.027 | 0.429 | -0.006 | 0.006 | 0.300 |
| Duplication | None of start and stop codons overlap a CNV | -0.003 | 0.056 | 0.951 | 0.007 | 0.014 | 0.603 |
| Duplication | The stop codon overlaps a CNV | -0.010 | 0.025 | 0.669 | -0.004 | 0.004 | 0.378 |
| Duplication | Only the UTR3’ overlaps a CNV | -0.161 | 0.177 | 0.363 | -0.031 | 0.035 | 0.374 |
| Duplication | Gene fully contained in a CNV | -0.052 | 0.009 | 4.3×10^-09 | -0.008 | 0.001 | 3.5×10^-07 |

Supplementary table 5: Estimated effects of individual genic regions on general intelligence according to pLI or 1/LOEUF. Models were computed separately for each annotation score and for each type of CNV. The adjusted z-scored measure of general intelligence was the dependent variable and the annotation scores summed within each category of genes were the independent ones. All models were adjusted for the “type of test-cohort” variable as fixed effect and for familial relationship as a random effect. Est.: estimate; SE: standard error; P: p-value; pLI: probability of being loss-of-function intolerant; LOEUF: the upper 95th confidence interval boundary of the ratio of observed over expected number of loss-of-function mutation score.


| Type | Exonic % | pLI Est. | pLI SE | pLI P | pLI AIC | pLI Sum | 1/LOEUF Est. | 1/LOEUF SE | 1/LOEUF P | 1/LOEUF AIC | 1/LOEUF Sum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deletions | ≥ 100% | -0.176 | 0.016 | 1.25×10^-28 | 70314.15 | 1172.303 | -0.029 | 0.003 | 7.14×10^-29 | 70313.03 | 8160.857 |
| Deletions | > 95% | -0.174 | 0.016 | 1.60×10^-28 | 70314.64 | 1186.01 | -0.029 | 0.003 | 7.50×10^-29 | 70313.12 | 8283.342 |
| Deletions | > 90% | -0.176 | 0.016 | 4.16×10^-29 | 70311.94 | 1200.888 | -0.029 | 0.003 | 3.86×10^-29 | 70311.8 | 8411.423 |
| Deletions | > 85% | -0.175 | 0.016 | 5.17×10^-29 | 70312.38 | 1213.347 | -0.029 | 0.003 | 4.11×10^-29 | 70311.92 | 8506.836 |
| Deletions | > 80% | -0.171 | 0.015 | 2.03×10^-28 | 70315.12 | 1227.935 | -0.028 | 0.003 | 1.38×10^-28 | 70314.34 | 8666.396 |
| Deletions | > 75% | -0.171 | 0.015 | 1.38×10^-28 | 70314.34 | 1235.864 | -0.028 | 0.003 | 8.48×10^-29 | 70313.37 | 8774.278 |
| Deletions | > 70% | -0.171 | 0.015 | 9.85×10^-29 | 70313.67 | 1246.777 | -0.028 | 0.003 | 7.58×10^-29 | 70313.15 | 8912.486 |
| Deletions | > 65% | -0.171 | 0.015 | 1.09×10^-28 | 70313.87 | 1260.88 | -0.028 | 0.003 | 7.14×10^-29 | 70313.03 | 9080.091 |
| Deletions | > 60% | -0.170 | 0.015 | 1.97×10^-28 | 70315.05 | 1272.06 | -0.028 | 0.003 | 1.24×10^-28 | 70314.13 | 9193.02 |
| Deletions | > 55% | -0.168 | 0.015 | 4.12×10^-28 | 70316.53 | 1293.586 | -0.028 | 0.003 | 9.83×10^-29 | 70313.67 | 9314.191 |
| Deletions | > 50% | -0.166 | 0.015 | 1.54×10^-27 | 70319.16 | 1317.071 | -0.028 | 0.003 | 4.56×10^-29 | 70312.13 | 9520.785 |
| Deletions | > 45% | -0.165 | 0.015 | 2.14×10^-27 | 70319.82 | 1330.192 | -0.028 | 0.002 | 4.30×10^-29 | 70312.01 | 9691.772 |
| Deletions | > 40% | -0.164 | 0.015 | 1.83×10^-27 | 70319.51 | 1344.357 | -0.028 | 0.002 | 5.71×10^-29 | 70312.58 | 9838.025 |
| Deletions | > 35% | -0.162 | 0.015 | 2.63×10^-27 | 70320.24 | 1365.639 | -0.028 | 0.002 | 1.11×10^-28 | 70313.91 | 10041.698 |
| Deletions | > 30% | -0.156 | 0.015 | 1.90×10^-26 | 70324.19 | 1485.224 | -0.027 | 0.002 | 1.43×10^-28 | 70314.42 | 10382.95 |
| Deletions | > 25% | -0.154 | 0.015 | 4.27×10^-26 | 70325.81 | 1525.584 | -0.027 | 0.002 | 1.79×10^-28 | 70314.86 | 10645.228 |
| Deletions | > 20% | -0.154 | 0.015 | 4.80×10^-26 | 70326.04 | 1543.388 | -0.027 | 0.002 | 2.02×10^-28 | 70315.11 | 10802.737 |
| Deletions | > 15% | -0.152 | 0.014 | 1.02×10^-25 | 70327.55 | 1614.991 | -0.027 | 0.002 | 3.81×10^-28 | 70316.38 | 11127.241 |
| Deletions | > 10% | -0.152 | 0.014 | 3.59×10^-26 | 70325.47 | 1654.58 | -0.027 | 0.002 | 1.91×10^-28 | 70315.01 | 11496.768 |
| Deletions | > 5% | -0.147 | 0.014 | 2.96×10^-25 | 70329.68 | 1766.101 | -0.026 | 0.002 | 6.52×10^-28 | 70317.46 | 12160.681 |
| Deletions | > 0% | -0.145 | 0.014 | 1.37×10^-25 | 70328.15 | 2037.331 | -0.026 | 0.002 | 6.88×10^-28 | 70317.57 | 13682.882 |
| Duplications | ≥ 100% | -0.054 | 0.009 | 1.90×10^-09 | 70401.81 | 4136.823 | -0.009 | 0.002 | 8.65×10^-09 | 70404.76 | 27861.65 |
| Duplications | > 95% | -0.054 | 0.009 | 1.25×10^-09 | 70400.99 | 4303.01 | -0.009 | 0.001 | 7.29×10^-09 | 70404.43 | 28768.89 |
| Duplications | > 90% | -0.053 | 0.009 | 1.77×10^-09 | 70401.68 | 4452.587 | -0.008 | 0.001 | 9.35×10^-09 | 70404.92 | 30037.98 |
| Duplications | > 85% | -0.054 | 0.009 | 6.64×10^-10 | 70399.75 | 4627.836 | -0.009 | 0.001 | 4.82×10^-09 | 70403.63 | 31509.74 |
| Duplications | > 80% | -0.054 | 0.009 | 4.54×10^-10 | 70399.01 | 4765.355 | -0.009 | 0.001 | 3.34×10^-09 | 70402.91 | 33039.53 |
| Duplications | > 75% | -0.053 | 0.009 | 3.37×10^-10 | 70398.42 | 4937.299 | -0.009 | 0.001 | 2.25×10^-09 | 70402.14 | 34079.15 |
| Duplications | > 70% | -0.053 | 0.008 | 4.06×10^-10 | 70398.79 | 5079.168 | -0.008 | 0.001 | 3.74×10^-09 | 70403.13 | 34923.23 |
| Duplications | > 65% | -0.048 | 0.008 | 7.15×10^-09 | 70404.39 | 5908.768 | -0.008 | 0.001 | 3.01×10^-08 | 70407.19 | 39028.13 |
| Duplications | > 60% | -0.047 | 0.008 | 1.19×10^-08 | 70405.38 | 6057.202 | -0.007 | 0.001 | 4.45×10^-08 | 70407.95 | 39987.65 |
| Duplications | > 55% | -0.046 | 0.008 | 9.63×10^-09 | 70404.98 | 6274.067 | -0.007 | 0.001 | 3.14×10^-08 | 70407.27 | 41835.54 |
| Duplications | > 50% | -0.047 | 0.008 | 7.86×10^-09 | 70404.58 | 6371.265 | -0.007 | 0.001 | 1.93×10^-08 | 70406.33 | 42686.07 |
| Duplications | > 45% | -0.046 | 0.008 | 9.57×10^-09 | 70404.96 | 6475.598 | -0.007 | 0.001 | 4.59×10^-08 | 70408.01 | 43596.93 |
| Duplications | > 40% | -0.046 | 0.008 | 9.05×10^-09 | 70404.86 | 6571.645 | -0.007 | 0.001 | 4.23×10^-08 | 70407.86 | 44355.56 |
| Duplications | > 35% | -0.046 | 0.008 | 8.36×10^-09 | 70404.7 | 6650.54 | -0.007 | 0.001 | 4.11×10^-08 | 70407.8 | 44930.4 |
| Duplications | > 30% | -0.044 | 0.008 | 7.55×10^-09 | 70404.5 | 7489.882 | -0.007 | 0.001 | 3.24×10^-08 | 70407.34 | 46977.21 |
| Duplications | > 25% | -0.045 | 0.008 | 3.01×10^-09 | 70402.7 | 7641.286 | -0.007 | 0.001 | 2.32×10^-08 | 70406.69 | 47838.28 |
| Duplications | > 20% | -0.044 | 0.007 | 5.07×10^-09 | 70403.73 | 7777.317 | -0.007 | 0.001 | 4.30×10^-08 | 70407.89 | 48748.39 |
| Duplications | > 15% | -0.044 | 0.007 | 3.53×10^-09 | 70403.02 | 7946.874 | -0.007 | 0.001 | 3.59×10^-08 | 70407.54 | 49749.36 |
| Duplications | > 10% | -0.043 | 0.007 | 4.00×10^-09 | 70403.26 | 8194.957 | -0.007 | 0.001 | 2.07×10^-08 | 70406.46 | 51449.29 |
| Duplications | > 5% | -0.041 | 0.007 | 1.68×10^-08 | 70406.06 | 8579.347 | -0.007 | 0.001 | 5.46×10^-08 | 70408.35 | 53236.74 |
| Duplications | > 0% | -0.042 | 0.007 | 3.70×10^-09 | 70403.11 | 8975.889 | -0.007 | 0.001 | 1.68×10^-08 | 70406.06 | 55709.73 |

Supplementary table 6: Estimated effects of CNVs on general intelligence, for different thresholds of the exonic proportion of a gene overlapping a CNV that defines the genes involved in the computation of the sum of pLI or 1/LOEUF. The model was computed separately for each annotation score and for each type of CNV. The adjusted z-scored measure of general intelligence was the dependent variable and the annotation score was the independent one. All models were adjusted for the “type of test-cohort” variable as fixed effect and for familial relationship as a random effect. Est., SE and P were computed using the REML approach whereas AIC was computed using the ML approach. A lower AIC means a better fit. ML: maximum likelihood; REML: restricted maximum likelihood; Est.: estimate; SE: standard error; P: p-value; AIC: Akaike’s information criterion; Sum: total pLI (resp. 1/LOEUF) in the entire cohort; pLI: probability of being loss-of-function intolerant; LOEUF: the upper 95th confidence interval boundary of the ratio of observed over expected number of loss-of-function mutation score.


| Type | Intronic % | pLI Est. | pLI SE | pLI P | pLI AIC | pLI Sum | 1/LOEUF Est. | 1/LOEUF SE | 1/LOEUF P | 1/LOEUF AIC | 1/LOEUF Sum |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Deletions | ≥ 100% | - | - | - | - | 0 | - | - | - | - | 0 |
| Deletions | > 95% | - | - | - | - | 0 | - | - | - | - | 0 |
| Deletions | > 90% | 0.260 | 1.136 | 0.819 | 70316.100 | 0.96 | 0.073 | 0.321 | 0.819 | 70316.100 | 3.39 |
| Deletions | > 85% | 0.260 | 0.435 | 0.550 | 70315.790 | 2.96 | 0.067 | 0.112 | 0.551 | 70315.790 | 11.26 |
| Deletions | > 80% | 0.464 | 0.353 | 0.189 | 70314.420 | 5.58 | 0.124 | 0.087 | 0.156 | 70314.130 | 28.56 |
| Deletions | > 75% | 0.077 | 0.307 | 0.801 | 70316.080 | 8.52 | 0.020 | 0.061 | 0.746 | 70316.040 | 64.68 |
| Deletions | > 70% | -0.038 | 0.213 | 0.858 | 70316.120 | 21.56 | -0.005 | 0.032 | 0.877 | 70316.120 | 172.93 |
| Deletions | > 65% | 0.039 | 0.171 | 0.818 | 70316.100 | 36.33 | 0.006 | 0.027 | 0.823 | 70316.100 | 269.58 |
| Deletions | > 60% | -0.025 | 0.125 | 0.840 | 70316.110 | 55.25 | -0.005 | 0.017 | 0.764 | 70316.060 | 402.57 |
| Deletions | > 55% | 0.041 | 0.108 | 0.701 | 70316.000 | 75.99 | 0.004 | 0.016 | 0.795 | 70316.080 | 518.87 |
| Deletions | > 50% | 0.049 | 0.101 | 0.625 | 70315.910 | 90.01 | 0.004 | 0.016 | 0.787 | 70316.080 | 600.96 |
| Deletions | > 45% | 0.090 | 0.090 | 0.315 | 70315.140 | 112.14 | 0.011 | 0.015 | 0.462 | 70315.610 | 717.82 |
| Deletions | > 40% | 0.062 | 0.084 | 0.456 | 70315.590 | 127.64 | 0.009 | 0.014 | 0.517 | 70315.730 | 822.18 |
| Deletions | > 35% | 0.081 | 0.076 | 0.286 | 70315.010 | 163.61 | 0.013 | 0.013 | 0.315 | 70315.140 | 1016.74 |
| Deletions | > 30% | 0.064 | 0.069 | 0.358 | 70315.300 | 206.79 | 0.011 | 0.012 | 0.376 | 70315.360 | 1294.46 |
| Deletions | > 25% | 0.078 | 0.057 | 0.170 | 70314.270 | 314.83 | 0.013 | 0.010 | 0.196 | 70314.470 | 1960.86 |
| Deletions | > 20% | 0.061 | 0.053 | 0.248 | 70314.810 | 357.48 | 0.010 | 0.010 | 0.285 | 70315.000 | 2300.26 |
| Deletions | > 15% | 0.055 | 0.049 | 0.264 | 70314.900 | 392.96 | 0.011 | 0.009 | 0.226 | 70314.680 | 2695.44 |
| Deletions | > 10% | 0.033 | 0.045 | 0.462 | 70315.610 | 441.21 | 0.010 | 0.008 | 0.259 | 70314.870 | 2978.81 |
| Deletions | > 5% | 0.040 | 0.044 | 0.369 | 70315.340 | 459.30 | 0.011 | 0.008 | 0.193 | 70314.460 | 3102.95 |
| Deletions | > 0% | 0.040 | 0.044 | 0.369 | 70315.340 | 459.30 | 0.011 | 0.008 | 0.195 | 70314.470 | 3104.91 |
| Duplications | ≥ 100% | - | - | - | - | 0 | - | - | - | - | 0 |
| Duplications | > 95% | - | - | - | - | 0 | - | - | - | - | 0 |
| Duplications | > 90% | - | - | - | - | 0 | - | - | - | - | 0 |
| Duplications | > 85% | 0.443 | 0.887 | 0.617 | 70403.560 | 1.46 | 0.124 | 0.195 | 0.526 | 70403.400 | 8.20 |
| Duplications | > 80% | 0.128 | 0.575 | 0.824 | 70403.760 | 3.80 | 0.097 | 0.140 | 0.487 | 70403.320 | 18.38 |
| Duplications | > 75% | -0.298 | 0.461 | 0.518 | 70403.390 | 6.55 | 0.000 | 0.121 | 0.997 | 70403.810 | 28.83 |
| Duplications | > 70% | -0.125 | 0.377 | 0.740 | 70403.700 | 11.19 | 0.018 | 0.104 | 0.866 | 70403.780 | 44.50 |
| Duplications | > 65% | 0.192 | 0.198 | 0.332 | 70402.860 | 22.13 | 0.058 | 0.059 | 0.327 | 70402.840 | 87.27 |
| Duplications | > 60% | 0.283 | 0.175 | 0.106 | 70401.190 | 28.84 | 0.093 | 0.053 | 0.077 | 70400.690 | 110.21 |
| Duplications | > 55% | 0.177 | 0.161 | 0.270 | 70402.590 | 37.11 | 0.034 | 0.043 | 0.430 | 70403.180 | 160.72 |
| Duplications | > 50% | -0.015 | 0.131 | 0.906 | 70403.790 | 55.89 | -0.021 | 0.034 | 0.542 | 70403.430 | 239.56 |
| Duplications | > 45% | -0.024 | 0.125 | 0.850 | 70403.770 | 62.94 | -0.016 | 0.031 | 0.610 | 70403.540 | 292.54 |
| Duplications | > 40% | 0.005 | 0.117 | 0.966 | 70403.800 | 75.48 | -0.014 | 0.029 | 0.630 | 70403.570 | 378.45 |
| Duplications | > 35% | 0.041 | 0.112 | 0.712 | 70403.670 | 81.74 | -0.004 | 0.024 | 0.884 | 70403.780 | 556.17 |
| Duplications | > 30% | 0.027 | 0.103 | 0.791 | 70403.740 | 98.80 | 0.003 | 0.022 | 0.896 | 70403.790 | 645.89 |
| Duplications | > 25% | 0.010 | 0.101 | 0.923 | 70403.800 | 105.50 | -0.003 | 0.021 | 0.879 | 70403.780 | 676.98 |
| Duplications | > 20% | 0.053 | 0.085 | 0.530 | 70403.410 | 138.04 | 0.014 | 0.018 | 0.416 | 70403.140 | 834.43 |
| Duplications | > 15% | 0.066 | 0.079 | 0.409 | 70403.120 | 157.37 | 0.017 | 0.017 | 0.309 | 70402.770 | 929.11 |
| Duplications | > 10% | 0.007 | 0.074 | 0.929 | 70403.800 | 173.57 | 0.006 | 0.016 | 0.693 | 70403.650 | 986.64 |
| Duplications | > 5% | 0.008 | 0.061 | 0.895 | 70403.790 | 199.31 | 0.006 | 0.015 | 0.688 | 70403.640 | 1048.02 |
| Duplications | > 0% | 0.013 | 0.059 | 0.830 | 70403.760 | 204.15 | 0.007 | 0.015 | 0.654 | 70403.600 | 1059.48 |

Supplementary table 7: Estimated effects of CNVs on general intelligence, for different thresholds of the intronic proportion of a gene overlapping a CNV that defines the genes involved in the computation of the sum of pLI or 1/LOEUF. The model was computed separately for each annotation score and for each type of CNV. The adjusted z-scored measure of general intelligence was the dependent variable and the annotation score was the independent one. All models were adjusted for the “type of test-cohort” variable as fixed effect and for familial relationship as a random effect. Est., SE and P were computed using the REML approach whereas AIC was computed using the ML approach. A lower AIC means a better fit. ML: maximum likelihood; REML: restricted maximum likelihood; Est.: estimate; SE: standard error; P: p-value; AIC: Akaike’s information criterion; Sum: total pLI (resp. 1/LOEUF) in the entire cohort; pLI: probability of being loss-of-function intolerant; LOEUF: the upper 95th confidence interval boundary of the ratio of observed over expected number of loss-of-function mutation score.


| Model | Variables | Deletions Est. | Deletions SE | Deletions P | Duplications Est. | Duplications SE | Duplications P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All cohorts (n=24,092) | (Intercept) | -1.244 | 0.034 | 5.94×10^-273 | -1.256 | 0.035 | 6.28×10^-275 |
| All cohorts (n=24,092) | pLI | -0.169 | 0.024 | 4.89×10^-12 | -0.048 | 0.014 | 7.98×10^-4 |
| All cohorts (n=24,092) | pLI x Age | -6.76×10^-5 | 8.19×10^-5 | 4.09×10^-1 | -2.77×10^-5 | 3.33×10^-5 | 4.06×10^-1 |
| All cohorts (n=24,092) | Sex (female) | 0.100 | 0.014 | 4.05×10^-13 | 0.100 | 0.014 | 4.63×10^-13 |
| All cohorts (n=24,092) | Age | -7.0×10^-4 | 4.55×10^-5 | 1.01×10^-52 | -6.92×10^-4 | 4.58×10^-5 | 6.16×10^-51 |
| Unselected population (n=20,151) | (Intercept) | 0.266 | 0.035 | 1.67×10^-14 | 0.263 | 0.035 | 5.07×10^-14 |
| Unselected population (n=20,151) | pLI | -0.167 | 0.059 | 4.65×10^-3 | -0.034 | 0.030 | 2.60×10^-1 |
| Unselected population (n=20,151) | pLI x Age | -4.09×10^-5 | 1.22×10^-4 | 7.37×10^-1 | -4.64×10^-5 | 5.13×10^-5 | 3.66×10^-1 |
| Unselected population (n=20,151) | Sex (female) | 0.133 | 0.013 | 3.66×10^-23 | 0.134 | 0.013 | 1.85×10^-23 |
| Unselected population (n=20,151) | Age | -5.66×10^-4 | 4.29×10^-5 | 2.68×10^-39 | -5.53×10^-4 | 4.35×10^-5 | 8.21×10^-37 |
| ASD population (n=3,941) | (Intercept) | -0.177 | 0.063 | 5.19×10^-3 | -0.171 | 0.064 | 7.56×10^-3 |
| ASD population (n=3,941) | pLI | -0.138 | 0.072 | 5.53×10^-2 | -0.060 | 0.046 | 1.96×10^-1 |
| ASD population (n=3,941) | pLI x Age | -1.36×10^-4 | 5.54×10^-4 | 8.07×10^-1 | 1.59×10^-4 | 3.77×10^-4 | 6.74×10^-1 |
| ASD population (n=3,941) | Sex (female) | -0.242 | 0.061 | 8.76×10^-5 | -0.254 | 0.061 | 4.10×10^-5 |
| ASD population (n=3,941) | Age | -0.013 | 0.001 | 1.33×10^-62 | -0.013 | 0.001 | 1.24×10^-62 |

Supplementary table 8: Linear regression models including pLI x age interaction as predictor of general intelligence. The model was computed separately for deletions and duplications. The z-scored measure of general intelligence (unadjusted) was the dependent variable and pLI (total sum by individual), age (in months) and sex (male as reference level) as well as the pLI x age interaction were the independent ones. All models were adjusted for “type of test-cohort” as fixed effect and for familial relationship as random effect. Est.: estimate; SE: standard error; P: p-value; SYS: Saguenay youth study; LBC: Lothian birth cohort; CaG: Cartagene; G-Scot: generation Scotland; SSC: Simon simplex collection; pLI: probability of loss-of-function intolerance.

| Model | Variables | Deletions Est. | Deletions SE | Deletions P | Duplications Est. | Duplications SE | Duplications P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All cohorts (n=24,092) | (Intercept) | -1.240 | 0.035 | 6.04×10^-271 | -1.257 | 0.035 | 6.74×10^-275 |
| All cohorts (n=24,092) | 1/LOEUF | -0.031 | 0.004 | 4.16×10^-14 | -0.008 | 0.003 | 2.26×10^-3 |
| All cohorts (n=24,092) | 1/LOEUF x Age | 3.51×10^-7 | 1.36×10^-5 | 9.79×10^-1 | -4.40×10^-6 | 5.56×10^-6 | 4.29×10^-1 |
| All cohorts (n=24,092) | Sex (female) | 0.100 | 0.014 | 2.95×10^-13 | 0.100 | 0.014 | 5.07×10^-13 |
| All cohorts (n=24,092) | Age | -7.01×10^-4 | 4.55×10^-5 | 8.89×10^-53 | -6.92×10^-4 | 4.59×10^-5 | 1.03×10^-50 |
| Unselected population (n=20,151) | (Intercept) | 0.266 | 0.035 | 1.73×10^-14 | 0.265 | 0.035 | 3.58×10^-14 |
| Unselected population (n=20,151) | 1/LOEUF | -0.027 | 0.010 | 6.10×10^-3 | -0.005 | 0.005 | 2.95×10^-1 |
| Unselected population (n=20,151) | 1/LOEUF x Age | -9.07×10^-7 | 2.03×10^-5 | 9.64×10^-1 | -7.98×10^-6 | 8.25×10^-6 | 3.33×10^-1 |
| Unselected population (n=20,151) | Sex (female) | 0.133 | 0.013 | 2.96×10^-23 | 0.134 | 0.013 | 1.95×10^-23 |
| Unselected population (n=20,151) | Age | -5.65×10^-4 | 4.30×10^-5 | 3.77×10^-39 | -5.52×10^-4 | 4.36×10^-5 | 2.05×10^-36 |
| ASD population (n=3,941) | (Intercept) | -0.172 | 0.063 | 6.63×10^-3 | -0.170 | 0.065 | 8.41×10^-3 |
| ASD population (n=3,941) | 1/LOEUF | -0.029 | 0.012 | 1.60×10^-2 | -0.010 | 0.008 | 2.05×10^-1 |
| ASD population (n=3,941) | 1/LOEUF x Age | 1.72×10^-5 | 9.01×10^-5 | 8.49×10^-1 | 2.99×10^-5 | 6.66×10^-5 | 6.53×10^-1 |
| ASD population (n=3,941) | Sex (female) | -0.239 | 0.061 | 1.06×10^-4 | -0.255 | 0.061 | 3.93×10^-5 |
| ASD population (n=3,941) | Age | -0.013 | 0.001 | 1.21×10^-62 | -0.013 | 0.001 | 3.17×10^-62 |

Supplementary table 9: Linear regression models including 1/LOEUF x age interaction as predictor of general intelligence. The model was computed separately for deletions and duplications. The z-scored measure of general intelligence (unadjusted) was the dependent variable and 1/LOEUF (total sum by individual), age (in months) and sex (male as reference level) as well as the 1/LOEUF x age interaction were the independent ones. All models were adjusted for “type of test-cohort” as fixed effect and for familial relationship as random effect. Est.: estimate; SE: standard error; P: p-value; SYS: Saguenay youth study; LBC: Lothian birth cohort; CaG: Cartagene; G-Scot: generation Scotland; SSC: Simon simplex collection; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations.


| Model | Variables | Deletions Est. | Deletions SE | Deletions P | Duplications Est. | Duplications SE | Duplications P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All cohorts (n=24,092) | (Intercept) | -1.243 | 0.034 | 3.60×10^-273 | -1.253 | 0.035 | 4.32×10^-275 |
| All cohorts (n=24,092) | pLI | -0.165 | 0.020 | 5.43×10^-16 | -0.061 | 0.012 | 2.71×10^-7 |
| All cohorts (n=24,092) | pLI x Sex (female) | -0.049 | 0.032 | 1.30×10^-1 | 0.010 | 0.018 | 5.90×10^-1 |
| All cohorts (n=24,092) | Sex (female) | 0.102 | 0.014 | 1.55×10^-13 | 0.098 | 0.014 | 3.17×10^-12 |
| All cohorts (n=24,092) | Age | -7.02×10^-4 | 4.55×10^-5 | 4.87×10^-53 | -6.96×10^-4 | 4.56×10^-5 | 4.98×10^-52 |
| Unselected population (n=20,151) | (Intercept) | 0.265 | 0.035 | 1.83×10^-14 | 0.268 | 0.035 | 1.00×10^-14 |
| Unselected population (n=20,151) | pLI | -0.147 | 0.039 | 1.67×10^-4 | -0.074 | 0.018 | 6.31×10^-5 |
| Unselected population (n=20,151) | pLI x Sex (female) | -0.065 | 0.051 | 2.02×10^-1 | 0.024 | 0.023 | 3.06×10^-1 |
| Unselected population (n=20,151) | Sex (female) | 0.135 | 0.013 | 1.57×10^-23 | 0.130 | 0.014 | 4.17×10^-21 |
| Unselected population (n=20,151) | Age (months) | -5.67×10^-4 | 4.28×10^-5 | 1.06×10^-39 | -5.60×10^-4 | 4.28×10^-5 | 8.59×10^-39 |
| ASD population (n=3,941) | (Intercept) | -0.175 | 0.063 | 5.46×10^-3 | -0.176 | 0.063 | 5.39×10^-3 |
| ASD population (n=3,941) | pLI | -0.161 | 0.029 | 8.12×10^-8 | -0.044 | 0.019 | 1.95×10^-2 |
| ASD population (n=3,941) | pLI x Sex (female) | 0.023 | 0.053 | 6.68×10^-1 | 0.007 | 0.037 | 8.39×10^-1 |
| ASD population (n=3,941) | Sex (female) | -0.246 | 0.061 | 8.35×10^-5 | -0.258 | 0.062 | 4.63×10^-5 |
| ASD population (n=3,941) | Age | -0.013 | 0.001 | 2.91×10^-63 | -0.013 | 0.001 | 1.83×10^-63 |

Supplementary table 10: Linear regression models including pLI x sex interaction as predictor of general intelligence. The model was computed separately for deletions and duplications. The z-scored measure of general intelligence (unadjusted) was the dependent variable and pLI (total sum by individual), age (in months) and sex (male as reference level) as well as the pLI x sex interaction were the independent ones. All models were adjusted on “type of test-cohort” as fixed effect and on familial relationship as random effect. Est.: estimates; SE: standard error; P: p-value; SYS: Saguenay youth study; LBC: Lothian birth cohort; CaG: Cartagene; G-Scot: generation Scotland; SSC: Simon simplex collection; pLI: probability of loss-of-function intolerance.

| Model | Variables | Deletions Est. | Deletions SE | Deletions P | Duplications Est. | Duplications SE | Duplications P |
| --- | --- | --- | --- | --- | --- | --- | --- |
| All cohorts (n=24,092) | (Intercept) | -1.240 | 0.034 | 6.41×10^-272 | -1.253 | 0.035 | 3.12×10^-275 |
| All cohorts (n=24,092) | 1/LOEUF | -0.029 | 0.003 | 2.27×10^-17 | -0.010 | 0.002 | 5.97×10^-7 |
| All cohorts (n=24,092) | 1/LOEUF x Sex (female) | -0.003 | 0.005 | 5.85×10^-1 | 0.002 | 0.003 | 5.03×10^-1 |
| All cohorts (n=24,092) | Sex (female) | 0.101 | 0.014 | 2.67×10^-13 | 0.097 | 0.014 | 6.69×10^-12 |
| All cohorts (n=24,092) | Age | -7.01×10^-4 | 4.55×10^-5 | 6.78×10^-53 | -6.96×10^-4 | 4.56×10^-5 | 4.91×10^-52 |
| Unselected population (n=20,151) | (Intercept) | 0.266 | 0.035 | 1.68×10^-14 | 0.271 | 0.035 | 5.72×10^-15 |
| Unselected population (n=20,151) | 1/LOEUF | -0.027 | 0.007 | 1.12×10^-4 | -0.012 | 0.003 | 7.58×10^-5 |
| Unselected population (n=20,151) | 1/LOEUF x Sex (female) | -0.002 | 0.009 | 8.42×10^-1 | 0.004 | 0.004 | 3.22×10^-1 |
| Unselected population (n=20,151) | Sex (female) | 0.133 | 0.013 | 6.70×10^-23 | 0.130 | 0.014 | 9.82×10^-21 |
| Unselected population (n=20,151) | Age | -5.66×10^-4 | 4.28×10^-5 | 1.62×10^-39 | -5.60×10^-4 | 4.28×10^-5 | 8.75×10^-39 |
| ASD population (n=3,941) | (Intercept) | -0.173 | 0.063 | 5.95×10^-3 | -0.176 | 0.063 | 5.47×10^-3 |
| ASD population (n=3,941) | 1/LOEUF | -0.028 | 0.005 | 1.87×10^-8 | -0.007 | 0.003 | 2.64×10^-2 |
| ASD population (n=3,941) | 1/LOEUF x Sex (female) | 0.005 | 0.008 | 5.32×10^-1 | 0.002 | 0.007 | 7.50×10^-1 |
| ASD population (n=3,941) | Sex (female) | -0.246 | 0.061 | 8.67×10^-5 | -0.261 | 0.063 | 4.35×10^-5 |
| ASD population (n=3,941) | Age | -0.013 | 0.001 | 3.29×10^-63 | -0.013 | 0.001 | 1.80×10^-63 |

Supplementary table 11: Linear regression models including 1/LOEUF x sex interaction as predictor of general intelligence. The model was computed separately for deletions and duplications. The z-scored measure of general intelligence (unadjusted) was the dependent variable and 1/LOEUF (total sum by individual), age (in months) and sex (male as reference level) as well as the 1/LOEUF x sex interaction were the independent ones. All models were adjusted for “type of test-cohort” as fixed effect and for familial relationship as random effect. Est.: estimate; SE: standard error; P: p-value; SYS: Saguenay youth study; LBC: Lothian birth cohort; CaG: Cartagene; G-Scot: generation Scotland; SSC: Simon simplex collection; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations.


Models | Method | n= | Deletions: Est. (SE) | Deletions: P | Duplications: Est. (SE) | Duplications: P
Linear effect of pLI
Main models | Mixed effect models (random effect on family identifier) | 24,092 | -0.176 (0.016) | 1.25x10^-28 | -0.054 (0.009) | 1.90x10^-9
Deletions and duplications effect in a unique model* | Mixed effect models (random effect on family identifier) | 24,092 | -0.175 (0.016) | 1.46x10^-28 | -0.053 (0.009) | 2.22x10^-9
General population subset | Mixed effect models (random effect on family identifier) | 20,151 | -0.181 (0.025) | 4.77x10^-13 | -0.058 (0.011) | 5.17x10^-7
Autism population subset | Mixed effect models (random effect on family identifier) | 3,941 | -0.173 (0.026) | 2.61x10^-10 | -0.048 (0.017) | 5.60x10^-3
Familial effect was accounted for using kinship matrix | Mixed effect models (kinship matrix) | 24,092 | -0.176 (0.016) | 9.77x10^-29 | -0.054 (0.009) | 1.85x10^-9
Unrelated individuals subset | Linear regression model in unrelated individuals | 14,874 | -0.174 (0.019) | 2.54x10^-20 | -0.043 (0.011) | 7.68x10^-5
Linear kernel function of pLI | Kernel semi-parametric model in unrelated individuals | 14,874 | Derivative mean/SD (pLI Del.): -0.085/0.495 = -0.172 | 1.04x10^-12 | Derivative mean/SD (pLI Dup.): -0.035/0.861 = -0.041 | 8.41x10^-5
Non linear effect of pLI
Quadratic effect of pLI | Mixed effect models (random effect on family identifier) | 24,092 | Est. for quadratic effect (SE): -0.001 (0.004) | 0.85 | Est. for quadratic effect (SE): -0.001 (0.001) | 0.46
Gaussian kernel function of pLI (ρ=10) | Kernel semi-parametric model in unrelated individuals | 14,874 | Derivative mean/SD (pLI Del.): -0.039/0.495 = -0.079 | 5.93x10^-11 | Derivative mean/SD (pLI Dup.): 0.010/0.861 = 0.012 | 0.039

Supplementary table 12: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of pLI on general intelligence. Models were performed separately for deletions and duplications except for the one marked with *. The dependent variable was the adjusted z-scored measure of general intelligence and pLI was the independent one. All models were adjusted on “type of test-cohort” variable as fixed effect and on familial relationship as random effect when relatives were included in the models. Est.: estimate; SE: standard error; P: p-value; SD: standard deviation; pLI: probability of being loss-of-function intolerant. The Gaussian kernel was fitted using ρ=0.1, ρ=1 and ρ=10; only the Gaussian kernel model with the best fit according to AIC is displayed in the table. Of note, when AIC is used to compare the Gaussian and linear kernels, the linear kernel is the best choice for fitting these data. With the linear kernel, the mean derivative divided by the standard deviation of the covariate of interest may be interpreted as a beta regression coefficient. With the Gaussian kernel, this ratio is only indicative.
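The ratio described at the end of the caption above can be checked by hand. The following is a minimal sketch, assuming only the arithmetic stated in the caption (mean derivative divided by the standard deviation of the covariate); the function name is hypothetical and the inputs are the linear-kernel values reported for pLI.

```python
def standardized_derivative(mean_derivative: float, covariate_sd: float) -> float:
    """Mean kernel derivative divided by the SD of the covariate of interest."""
    if covariate_sd <= 0:
        raise ValueError("covariate_sd must be positive")
    return mean_derivative / covariate_sd


# Linear-kernel values reported for pLI (deletions, then duplications).
del_beta = standardized_derivative(-0.085, 0.495)
dup_beta = standardized_derivative(-0.035, 0.861)

print(round(del_beta, 3), round(dup_beta, 3))
```

Under the linear kernel these ratios are close to the mixed-model estimates obtained in unrelated individuals, which is why the caption treats them as beta-like coefficients.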


Models | Method | n= | Deletions: Est. (SE) | Deletions: P | Duplications: Est. (SE) | Duplications: P
Linear effect of 1/LOEUF
Main models | Mixed effect models (random effect on family identifier) | 24,092 | -2.92x10^-2 (2.61x10^-3) | 7.14x10^-29 | -8.73x10^-3 (1.52x10^-3) | 8.65x10^-9
Deletions and duplications effect in a unique model* | Mixed effect models (random effect on family identifier) | 24,092 | -2.90x10^-2 (2.60x10^-3) | 1.14x10^-28 | -8.58x10^-3 (1.51x10^-3) | 1.40x10^-8
General population subset | Mixed effect models (random effect on family identifier) | 20,151 | -2.73x10^-2 (4.21x10^-3) | 9.27x10^-11 | -9.13x10^-3 (1.83x10^-3) | 5.85x10^-7
Autism population subset | Mixed effect models (random effect on family identifier) | 3,941 | -3.01x10^-2 (4.29x10^-3) | 2.56x10^-11 | -4.85x10^-2 (1.73x10^-2) | 5.60x10^-3
Familial effect was accounted for using kinship matrix | Mixed effect models (kinship matrix) | 24,092 | -2.90x10^-2 (3.00x10^-3) | 5.54x10^-29 | -9.00x10^-3 (2.00x10^-3) | 8.48x10^-9
Unrelated individuals subset | Linear regression model in unrelated individuals | 14,874 | -2.95x10^-2 (3.08x10^-3) | 1.08x10^-21 | -7.21x10^-3 (1.86x10^-3) | 1.02x10^-4
Linear kernel function of 1/LOEUF | Kernel semi-parametric model in unrelated individuals | 14,874 | Derivative mean/SD (1/LOEUF Del.): -0.088/3.03 = -0.029 | 3.99x10^-13 | Derivative mean/SD (1/LOEUF Dup.): -0.034/5.02 = -0.007 | 1.17x10^-4
Non linear effect of 1/LOEUF
Quadratic effect of 1/LOEUF | Mixed effect models (random effect on family identifier) | 24,092 | Est. for quadratic effect (SE): -2.50x10^-5 (1.03x10^-4) | 0.81 | Est. for quadratic effect (SE): -1.58x10^-5 (3.60x10^-5) | 0.66
Gaussian kernel function of 1/LOEUF (DEL: ρ=10; DUP: ρ=1) | Kernel semi-parametric model in unrelated individuals | 14,874 | Derivative mean/SD (1/LOEUF Del.): -0.073/3.03 = -0.024 | 1.29x10^-11 | Derivative mean/SD (1/LOEUF Dup.): -0.032/5.02 = -0.006 | 0.13

Supplementary table 13: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of 1/LOEUF on general intelligence. Models were performed separately for deletions and duplications except for the one marked with *. The dependent variable was the adjusted z-scored measure of general intelligence and 1/LOEUF was the independent one. All models were adjusted on “type of test-cohort” variable as fixed effect and on familial relationship as random effect when relatives were included in the models. SE: standard error; P: p-value; SD: standard deviation; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations. The Gaussian kernel was fitted using ρ=0.1, ρ=1 and ρ=10; only the Gaussian kernel model with the best fit according to AIC is displayed in the table. Of note, when AIC is used to compare the Gaussian and linear kernels, the linear kernel is the best choice for fitting these data. With the linear kernel, the mean derivative divided by the standard deviation of the covariate of interest may be interpreted as a beta regression coefficient. With the Gaussian kernel, this ratio is only indicative.


Annotation score | Type of CNVs | Excluding carriers of recurrent CNVs (n=23,484): Est. | SE | P | Excluding carriers of ID-gene CNVs (n=23,967): Est. | SE | P | Excluding carriers of recurrent or ID-gene CNVs (n=23,416): Est. | SE | P
pLI | deletions | -0.173 | 0.023 | 1.98x10^-14 | -0.157 | 0.020 | 1.13x10^-14 | -0.113 | 0.027 | 3.48x10^-05
pLI | duplications | -0.061 | 0.012 | 1.11x10^-7 | -0.037 | 0.011 | 5.16x10^-4 | -0.047 | 0.013 | 3.32x10^-04
1/LOEUF | deletions | -0.028 | 0.004 | 2.18x10^-14 | -0.026 | 0.003 | 4.02x10^-14 | -0.018 | 0.005 | 6.70x10^-05
1/LOEUF | duplications | -0.010 | 0.002 | 1.13x10^-7 | -0.006 | 0.002 | 5.81x10^-4 | -0.008 | 0.002 | 2.43x10^-04

Supplementary table 14 : Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of pLI or 1/LOEUF on general intelligence for 3 sensitivity analyses based on exclusion of individuals carrying recurrent CNV or CNV containing ID-gene. For each sensitivity analysis, the z-scored measure of general intelligence (adjusted) was the dependent variable and annotation score (pLI or 1/LOEUF) for deletions and duplications were the independent ones. All models were adjusted on “type of test-cohort” variable as fixed effect and on familial relationship as random effect. Est.: estimates; SE: standard error; P: p-value; pLI: probability of loss-of-function intolerance; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations.

Type of CNVs | Gene category | Model based on pLI: Est. | SE | P | Model based on 1/LOEUF: Est. | SE | P
Deletions | Non ID-gene | -0.142 | 0.018 | 1.27x10^-14 | -0.023 | 0.003 | 4.33x10^-15
Deletions | ID-gene | -1.025 | 0.243 | 2.56x10^-5 | -0.174 | 0.035 | 9.24x10^-7
Duplications | Non ID-gene | -0.043 | 0.009 | 4.93x10^-6 | -0.007 | 0.002 | 2.00x10^-6
Duplications | ID-gene | -0.650 | 0.172 | 1.58x10^-4 | -0.076 | 0.026 | 3.69x10^-3

Supplementary table 15: Linear regression models performed in the mega-analysis to measure the effect of deleted or duplicated units of pLI or 1/LOEUF, by gene category (ID- and non ID-genes) on general intelligence. For both models, the z-scored measure of general intelligence (adjusted) was the dependent variable and annotation score (pLI or 1/LOEUF) computed by category of gene (ID-genes and non ID-genes) and by type of CNV (deletions and duplications) were the independent ones. Models were adjusted on “type of test-cohort” variable as fixed effect and on familial relationship as random effect. Est.: estimates, SE: standard error; P: p-value; pLI: probability of loss-of-function intolerance. LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations.


Effect sizes are expressed in points of IQ.

Score | Tolerance to pLoF | ID-gene | No. of genes (n=) | Score value: Mean | Min. | Max. | Effect size for Del.: Mean | Min. | Max. | Effect size for Dup.: Mean | Min. | Max.
1/LOEUF | Tolerance (LOEUF ≥0.35) | No | 15644 | 1.14 | 0.50 | 2.86 | -0.39 | -0.17 | -0.99 | -0.12 | -0.05 | -0.30
1/LOEUF | Tolerance (LOEUF ≥0.35) | Yes | 65 | 1.66 | 0.53 | 2.86 | -4.33 | -1.38 | -7.46 | -1.89 | -0.60 | -3.26
1/LOEUF | Intolerance (LOEUF <0.35) | No | 2575 | 5.07 | 2.86 | 33.33 | -1.75 | -0.99 | -11.50 | -0.53 | -0.30 | -3.50
1/LOEUF | Intolerance (LOEUF <0.35) | Yes | 167 | 7.73 | 2.93 | 27.03 | -20.18 | -7.65 | -70.54 | -8.81 | -3.34 | -30.81
pLI | Tolerance (pLI ≤0.9) | No | 14432 | 0.15 | 5.4x10^-91 | 0.90 | -0.32 | -1.1x10^-90 | -1.92 | -0.10 | -3.5x10^-91 | -0.58
pLI | Tolerance (pLI ≤0.9) | Yes | 51 | 0.30 | 7.7x10^-45 | 0.90 | -4.63 | -1.2x10^-43 | -13.83 | -2.93 | -7.5x10^-44 | -8.77
pLI | Intolerance (pLI >0.9) | No | 2833 | 0.98 | 0.90 | 1 | -2.09 | -1.92 | -2.13 | -0.63 | -0.58 | -0.65
pLI | Intolerance (pLI >0.9) | Yes | 167 | 0.99 | 0.90 | 1 | -15.24 | -13.90 | -15.38 | -9.66 | -8.82 | -9.75

Supplementary table 16: Distribution of the effect associated with deletion or duplication of genes by gene categories. Estimates are based on models including effect of annotation score (pLI or 1/LOEUF) by gene category (ID- or non ID-genes) and type of CNV (Supplementary table 15). pLI: probability of being loss-of-function intolerant; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations; Min.: minimum value; Max.: maximum value.
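The effect sizes in points of IQ appear to follow from the per-unit z-score estimates by gene category (the values listed in Supplementary table 15) multiplied by the score value and by 15. The following is a minimal sketch under that assumption; the constant `IQ_SD`, the dictionary and the function name are illustrative, not the authors' code.

```python
IQ_SD = 15  # assumption: one z-score unit corresponds to 15 IQ points

# Per-unit estimates (z-score) for 1/LOEUF, by CNV type and gene category,
# copied from the models by gene category (Supplementary table 15).
ESTIMATES_LOEUF = {
    ("del", "non-ID"): -0.023,
    ("del", "ID"): -0.174,
    ("dup", "non-ID"): -0.007,
    ("dup", "ID"): -0.076,
}


def iq_points(cnv_type: str, gene_category: str, score: float) -> float:
    """Effect in IQ points of one gene with the given 1/LOEUF score."""
    return ESTIMATES_LOEUF[(cnv_type, gene_category)] * score * IQ_SD


# Mean 1/LOEUF of tolerant genes: 1.14 (non-ID genes) and 1.66 (ID genes).
print(round(iq_points("del", "non-ID", 1.14), 2))
print(round(iq_points("del", "ID", 1.66), 2))
print(round(iq_points("dup", "ID", 1.66), 2))
```

The same product with the minimum and maximum score values gives the Min. and Max. columns.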


Population (n=) | Category of genes | Deletion: Est. (SE) | Deletion: P | Duplication: Est. (SE) | Duplication: P
Models based on pLI
Probands only (75) | All genes | -0.097 (0.031) | 2.91x10^-3 | -0.038 (0.024) | 0.13
All (282) | All genes | -0.138 (0.026) | 2.1x10^-7 | -0.061 (0.019) | 1.8x10^-3
Probands only (75) | ID genes | -0.879 (0.441) | 0.05 | -0.042 (0.457) | 0.93
Probands only (75) | Non-ID genes | -0.066 (0.036) | 0.07 | -0.037 (0.035) | 0.29
All (282) | ID genes | -0.478 (0.470) | 0.31 | 0.303 (0.444) | 0.49
All (282) | Non-ID genes | -0.124 (0.034) | 2.1x10^-4 | -0.076 (0.026) | 3.80x10^-3
Models based on 1/LOEUF
Probands only (75) | All genes | -0.016 (0.005) | 2.36x10^-3 | -0.006 (0.004) | 0.12
All (282) | All genes | -0.024 (0.004) | 5.3x10^-8 | -0.009 (0.003) | 1.8x10^-3
Probands only (75) | ID genes | -0.129 (0.073) | 0.08 | -0.048 (0.069) | 0.49
Probands only (75) | Non-ID genes | -0.010 (0.006) | 0.12 | -0.003 (0.005) | 0.48
All (282) | ID genes | -0.057 (0.082) | 0.48 | -0.025 (0.075) | 0.74
All (282) | Non-ID genes | -0.023 (0.006) | 8.3x10^-5 | -0.009 (0.004) | 0.01

Supplementary table 17: Estimates obtained for models performed in the Ste-Justine neurodevelopmental cohort. The model was computed simultaneously for deletions and duplications for each annotation score (pLI or 1/LOEUF). The z-scored measure of general intelligence (unadjusted) was the dependent variable and the annotation score was the independent one. All models were adjusted for type of test and sex as fixed effects and for familial kinship matrix as random effect. Est.: estimates; SE: standard error; P: p-value; pLI: probability of loss-of-function intolerance; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of loss-of-function mutations.


Effect size of z-score for general intelligence Prediction of effect size Prediction to be de novo n Freq. to be Name Name detail Chr START STOP TYPE ID gene 56 pLI and ID- 1/LOEUF and pLI and ID- LOEUF and Genes In UKBB In Lit. based on Mean de novo gene ID-gene gene ID-gene 1 TAR chr1 145390000 145810000 Del. 15 -0.11 -0.73 OR19 -0.42 -0.48 -0.5 0.219 0.39 0.39 2 1q21.1 chr1 146530000 147390000 Del. 7 -0.27 -1.01 NVIQ19 -0.64 -0.24 -0.21 0.23119 0.22 0.2 3 2q11.2 chr2 96740000 97680000 Del. 18 -0.13 -0.13 -0.83 -1.03 4 2q13 chr2 111390000 112010000 Del. 3 -0.16 -0.16 -0.13 -0.12 5 NRXN1 chr2 50140000 51260000 Del. 1 -0.19 -0.19 -0.14 -0.09 6 2q13 (NPHP1) chr2 110860000 110980000 Del. 1 -0.01 -0.01 0 -0.02 7 3q29 (DLG1) chr3 195734000 197340000 Del. 21 -2.1 FSIQ19 -2.1 -0.93 -0.97 0.8419 0.74 0.74 7q11.23 8 (William- chr7 72722981 74155278 Del. 23 -2.09 NVIQ19 -2.09 -1.31 -1.11 0.9719 0.91 0.82 Beuren) 9 8p23.1 chr8 8100055 11764629 Del. 26 -2.39 FSIQ19 -2.39 -0.46 -0.61 119 0.37 0.47 10 10q11.21q11.23 chr10 49390000 51060000 Del. 16 WDFY4 -0.18 -0.18 -0.23 -1.03 11 13q12.12 chr13 23560000 24880000 Del. 5 -0.16 -0.16 -0.01 -0.14 12 13q12 (CRYL1) chr13 20980000 21100000 Del. 0a 0.01 0.01 0 0

15q13.3 (BP4- 19 19 13 chr15 30918248 32515000 Del. 7 -1.46 OR -1.46 -0.24 -0.22 0.282 0.22 0.2 BP5) 14 15q11.2 chr15 22810000 23090000 Del. 4 -0.15 -0.6 NVIQ19 -0.38 -0.24 -0.2 0.09119 0.22 0.19 15 16p11.2-p12.2 chr16 21605180 29315879 Del. 67 -3.33 FSIQ -3.33 -2.9 -3.38 0.92 1 1 16p13.3 ATR-16 16 chr16 60001 834372 Del. 43 CAPN15 -2.25 FSIQ -2.25 -1.81 -1.59 0.94 0.98 0.96 syndrome 17 16p11.2 chr16 29650000 30200000 Del. 27 MAPK3 -0.34 -1.61 NVIQ19 -0.97 -1.37 -1.18 0.5719 0.92 0.85 18 16p11.2 distal chr16 28820000 29050000 Del. 9 -0.41 -0.81 OR19 -0.61 -0.54 -0.5 0.36419 0.43 0.38 19 16p13.11 chr16 15510000 16290000 Del. 6 -0.26 -0.73 OR19 -0.49 -0.32 -0.41 0.25619 0.27 0.32 20 16p12.1 chr16 21950000 22430000 Del. 7 -0.1 -0.5 OR19 -0.3 -0.08 -0.2 0.03919 0.14 0.19 17p11.2 (Smith- 21 chr17 17000000 21450000 Del. 58 RAI1 -2.93 NVIQ19 -2.93 -3.03 -3.26 0.95619 1 1 Magenis) 22 17q12 chr17 34815551 36249430 Del. 15 -0.77 FSIQ19 -0.77 -0.72 -0.87 0.69619 0.58 0.68 23 17q21.31 chr17 43700000 44340000 Del. 5 KANSL1 -3.4 FSIQ19 -3.4 -1.17 -0.83 0.97819 0.86 0.65 NF1- 24 microdeletion chr17 29107097 30263321 Del. 13 -1.54 FSIQ -1.54 -0.85 -0.6 0.882 0.68 0.47 syndrome 25 17p12 (HNPP) chr17 14140000 15430000 Del. 4 0.01 0 OR19 0.01 -0.16 -0.12 0.08319 0.18 0.15 26 22q11.2 chr22 18893541 21901736 Del. 49 -1.9 FSIQ -1.9 -1.73 -1.92 0.89 0.98 0.98 27 TAR chr1 145390000 145810000 Dup. 15 -0.12 -0.12 -0.15 -0.16 28 1q21.1 chr1 146530000 147390000 Dup. 7 -0.08 -0.08 -0.07 -0.07 0.227 0.08 0.08 29 2q21.1 chr2 131480000 131930000 Dup. 4 0.07 0.07 -0.01 -0.04 30 2q13 chr2 111390000 112010000 Dup. 3 -0.02 -0.02 -0.04 -0.04 31 2q13 (NPHP1) chr2 110860000 110980000 Dup. 1 0.01 0.01 0 -0.01 32 7q11.23 chr7 72700000 74100000 Dup. 24 -0.93 NVIQ -0.93 -0.4 -0.36 0.613 0.21 0.2 33 10q11.21q11.23 chr10 49390000 51060000 Dup. 16 WDFY4 0.03 0.03 -0.07 -0.41


34 13q12.12 chr13 23560000 24880000 Dup. 5 -0.01 -0.01 0 -0.04 15q11q13 (BP3- 35 chr15 29160000 30380000 Dup. 3 -0.18 -0.18 -0.05 -0.12 BP4) 36 15q11.2 chr15 22810000 23090000 Dup. 4 -0.03 -0.05 NVIQ -0.04 -0.07 -0.06 0.022 0.08 0.08 37 15q13.3 chr15 31080000 32460000 Dup. 5 -0.09 -0.09 -0.07 -0.06

15q13.3 b 38 chr15 32020000 32460000 Dup. 0 0.01 0.01 0 0 (CHRNA7) 39 16p11.2 chr16 29650000 30200000 Dup. 27 MAPK3 -0.41 -0.81 NVIQ -0.61 -0.57 -0.42 0.232 0.32 0.23 40 16p11.2 distal chr16 28820000 29050000 Dup. 9 -0.2 -0.2 -0.16 -0.16 41 16p13.11 chr16 15510000 16290000 Dup. 6 -0.06 -1.09 NVIQ -0.58 -0.1 -0.13 0.049 0.09 0.1 42 16p12.1 chr16 21950000 22430000 Dup. 7 -0.09 -0.09 -0.02 -0.06 43 17p11.2 chr17 17000000 21400000 Dup. 58 RAI1 -3.28 NVIQ -3.28 -1.29 -1.25 0.857 0.83 0.84 44 17q12 (HNF1B) chr17 34810000 36220000 Dup. 15 -0.18 -0.704 NVIQ -0.44 -0.22 -0.28 0.1 0.13 0.16 45 17p12 (CMT1A) chr17 14140000 15430000 Dup. 4 -0.04 -0.04 -0.05 -0.04 DYRK1A, KCNJ6, SON, 46 Trisomic 21 chr21 1 48100000 Dup. 222 -3.3 FSIQ -3.3 -4.62 -3.68 1 1 1 DSCAM 47 22q11.2 chr22 19040000 21470000 Dup. 42 -0.32 -1.51 FSIQ -0.91 -0.45 -0.55 0.923 0.24 0.32

Supplementary table 18: Description of the effect of 47 recurrent CNVs on general intelligence and probability of being de novo, estimated using empirical data from the literature and/or UKBB and using our models. Most of these CNVs are described in Kendall et al. 2019¹⁴ and in Huguet et al. 2018¹⁹. The others are described in supplementary table 23. IQ loss was estimated from OR according to the method presented in Huguet et al. 2018¹⁹. The de novo frequency is defined by Decipher or extracted from Huguet et al. 2018¹⁹. a) partially CRYL1; b) partially OTUD7A and CHRNA7. Chr: chromosome; START: start of CNV; STOP: stop of CNV; TYPE: deletion (Del.) or duplication (Dup.); Lit.: literature; Freq.: frequency; OR: odds-ratio; FSIQ: Full Scale Intelligence Quotient.


Each count is given as Del. / Dup.

Cohort | n ind. | N CNVs | N de novo CNVs | N non-exonic CNVs | N ID-gene CNVs | N CNVs with a 1/LOEUF ≥ 1/0.35 | N CNVs with a 1/LOEUF = 0 | N CNVs de novo with a 1/LOEUF = 0
DECIPHER | 10,126 | 6,500 / 5,670 | 3,451 / 1,418 | 1,580 / 1,273 | 1,341 / 784 | 4,246 / 3,258 | 1,592 / 1,300 | 284 / 106
G-Scot | 810 | 610 / 740 | 5 / 15 | 422 / 221 | 2 / 9 | 24 / 104 | 514 / 386 | 3 / 6
MSSNG | 956 | 797 / 1,074 | 17 / 19 | 533 / 360 | 7 / 22 | 26 / 89 | 665 / 629 | 5 / 5
SSC probands | 2,053 | 1,966 / 2,182 | 51 / 101 | 1,311 / 711 | 28 / 58 | 109 / 275 | 1,630 / 1,254 | 24 / 17
SSC siblings | 1,866 | 1,723 / 1,949 | 53 / 6 | 1,187 / 655 | 3 / 47 | 51 / 201 | 1,457 / 1,143 | 15 / 13
Ste-Justine UHC | 1,560 | 924 / 852 | 403 / 140 | 205 / 189 | 156 / 130 | 645 / 524 | 205 / 194 | 34 / 7
SYS | 723 | 594 / 856 | 6 / 8 | 385 / 307 | 4 / 2 | 32 / 121 | 513 / 451 | 3 / 4
Total | 18,094 | 13,114 / 13,323 | 3,986 / 1,707 | 5,623 / 3,716 | 1,541 / 1,052 | 5,133 / 4,572 | 6,576 / 5,357 | 368 / 158

Supplementary table 19: Detailed table of content for the de novo analysis using Decipher, Sainte-Justine UHC, MSSNG, SSC, Imagen, Generation Scotland, and SYS cohorts. N: Number; CNVs: Copy number variants; Del.: Deletions; Dup.: Duplications; G-Scot: generation Scotland; SSC: Simon simplex collection; Ste-Justine: Sainte-Justine UHC cytogenetic database; SYS: Saguenay youth study; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of LoF mutations.


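Type of Y value | Type of X value | Genes / adjustment | Del.: proba (%) | Del.: 95%CI | Del.: P | Dup.: proba (%) | Dup.: 95%CI | Dup.: P
pLI | Per point of intolerance score | for all genes | 18.24 | [17.46-19.05] | <1.00x10^-314 | 8.10 | [7.62-8.61] | 1.78x10^-270
pLI | Per point of intolerance score | for ID-genes | 49.14 | [43.22-55.09] | <1.00x10^-314 | 17.32 | [14.36-20.73] | 1.51x10^-25
pLI | Per point of intolerance score | for non-ID-genes | 17.22 | [16.46-18.01] | <1.00x10^-314 | 7.84 | [7.26-8.35] | 6.30x10^-169
pLI | Per point of NVIQ computed with intolerance score | for all genes | 13.86 | [13.18-14.56] | <1.00x10^-314 | 8.58 | [8.08-9.11] | 1.78x10^-270
pLI | Per point of NVIQ computed with intolerance score | adjusted for ID-genes | 14.12 | [13.44-14.83] | <1.00x10^-314 | 8.56 | [8.06-9.08] | 5.77x10^-263
pLI | Per point of NVIQ computed with intolerance score | adjusted for ID-genes without recurrent CNVs | 13.75 | [13.05-14.49] | <1.00x10^-314 | 8.33 | [7.82-8.86] | 2.015x10^-228
1/LOEUF | Per point of intolerance score | for all genes | 12.09 | [11.45-12.75] | <1.00x10^-314 | 6.62 | [6.19-7.09] | 9.02x10^-260
1/LOEUF | Per point of intolerance score | for ID-genes | 14.67 | [13.80-15.58] | 1.94x10^-65 | 7.21 | [6.71-7.74] | 4.63x10^-24
1/LOEUF | Per point of intolerance score | for non-ID-genes | 11.7 | [11.07-12.36] | <1.00x10^-314 | 6.59 | [6.15-7.05] | 2.59x10^-182
1/LOEUF | Per point of NVIQ computed with intolerance score | for all genes | 13.34 | [12.67-14.04] | <1.00x10^-314 | 8.57 | [8.07-9.10] | 9.02x10^-260
1/LOEUF | Per point of NVIQ computed with intolerance score | adjusted for ID-genes | 13.49 | [12.82-14.19] | <1.00x10^-314 | 8.56 | [8.07-9.09] | 1.66x10^-238
1/LOEUF | Per point of NVIQ computed with intolerance score | adjusted for ID-genes without recurrent CNVs | 13.12 | [12.43-13.84] | <1.00x10^-314 | 8.35 | [7.85-8.88] | 2.01x10^-204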

Supplementary table 20: Probability of being de novo as a function of pLI and 1/LOEUF. Models were computed independently for deletions and duplications with pLI or 1/LOEUF score as explanatory variables. The status of de novo or inherited variant was the dependent variable. Estimates of the probability of being de novo (proba), 95% confidence intervals (95%CI) and p-values were computed using logistic regression. Del.: Deletions; Dup.: Duplications; pLI: probability of loss-of-function intolerance; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of LoF mutations.


Variable | Model with deletion: Est. | SE | P | Model with duplication: Est. | SE | P
nb of highly intolerant genes (LOEUF <0.2) | -0.378 | 0.102 | 2.20x10^-04 | 0.015 | 0.054 | 0.779
nb of moderately intolerant genes (0.2≤LOEUF<0.35) | -0.25 | 0.066 | 1.48x10^-04 | -0.098 | 0.031 | 0.002
nb of tolerant genes (0.35≤LOEUF<1) | -0.039 | 0.013 | 0.002 | -0.009 | 0.008 | 0.261
nb of genes highly tolerant (LOEUF≥1) | 0.011 | 0.011 | 0.316 | -0.006 | 0.005 | 0.275

Supplementary table 21: Estimated effects of deleted or duplicated individual genes on general intelligence according to 4 categories of tolerance to pLoF defined by LOEUF. The model was computed separately for deletions and duplications. The z-scored measure of general intelligence (adjusted) was the dependent variable and the numbers of genes within each category were the 4 independent ones. All models were adjusted on “type of test-cohort” variable as fixed effect and on familial relationship as random effect. Est.: estimate; SE: standard error; P: p-value; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of LoF mutations.
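The four predictors of this model are per-individual gene counts by LOEUF category. The following is a minimal sketch of how such counts could be derived, assuming the category boundaries given in the table (0.2, 0.35 and 1); the function names and the example LOEUF values are hypothetical.

```python
from collections import Counter

CATEGORIES = (
    "highly intolerant",      # LOEUF < 0.2
    "moderately intolerant",  # 0.2 <= LOEUF < 0.35
    "tolerant",               # 0.35 <= LOEUF < 1
    "highly tolerant",        # LOEUF >= 1
)


def loeuf_category(loeuf: float) -> str:
    """Assign one gene to a tolerance category from its LOEUF value."""
    if loeuf < 0.2:
        return CATEGORIES[0]
    if loeuf < 0.35:
        return CATEGORIES[1]
    if loeuf < 1.0:
        return CATEGORIES[2]
    return CATEGORIES[3]


def count_by_category(loeuf_values):
    """Number of genes per category for one individual and one CNV type."""
    counts = Counter(loeuf_category(v) for v in loeuf_values)
    return {name: counts.get(name, 0) for name in CATEGORIES}


# Hypothetical LOEUF values of the genes deleted in one individual.
print(count_by_category([0.12, 0.30, 0.34, 0.80, 1.40]))
```

Each individual then contributes four counts for deletions and four for duplications, which enter the regression as separate predictors.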


Window interval | Selection | Deletions: Est. (SE) | P | P adj | Sum of genes | Duplications: Est. (SE) | P | P adj | Sum of genes
[0;0.15] | in the window | -0.739 (0.13) | 1.35x10^-08 | 3.78x10^-07 | 68 | -0.042 (0.074) | 5.71x10^-01 | 8.68x10^-01 | 233
[0;0.15] | out of the window | -0.027 (0.005) | 1.07x10^-08 | 1.70x10^-08 | 6317 | -0.012 (0.002) | 1.21x10^-06 | 5.10x10^-06 | 22010
[0.05;0.2] | in the window | -0.488 (0.097) | 4.96x10^-07 | 6.28x10^-06 | 100 | -0.022 (0.052) | 6.69x10^-01 | 8.70x10^-01 | 387
[0.05;0.2] | out of the window | -0.026 (0.005) | 9.91x10^-08 | 1.34x10^-07 | 6285 | -0.012 (0.003) | 2.79x10^-06 | 8.83x10^-06 | 21856
[0.1;0.25] | in the window | -0.38 (0.083) | 4.32x10^-06 | 2.73x10^-05 | 165 | -0.09 (0.036) | 1.32x10^-02 | 1.25x10^-01 | 670
[0.1;0.25] | out of the window | -0.022 (0.006) | 1.77x10^-04 | 1.82x10^-04 | 6220 | -0.008 (0.003) | 3.60x10^-03 | 3.90x10^-03 | 21573
[0.15;0.3] | in the window | -0.372 (0.078) | 2.16x10^-06 | 2.05x10^-05 | 158 | -0.112 (0.034) | 1.12x10^-03 | 2.13x10^-02 | 695
[0.15;0.3] | out of the window | -0.023 (0.006) | 4.01x10^-05 | 4.23x10^-05 | 6227 | -0.007 (0.003) | 8.80x10^-03 | 9.29x10^-03 | 21548
[0.2;0.35] | in the window | -0.347 (0.062) | 1.99x10^-08 | 3.78x10^-07 | 324 | -0.097 (0.029) | 9.25x10^-04 | 2.13x10^-02 | 1028
[0.2;0.35] | out of the window | -0.017 (0.006) | 5.27x10^-03 | 5.27x10^-03 | 6061 | -0.007 (0.003) | 2.46x10^-02 | 2.46x10^-02 | 21215
[0.25;0.4] | in the window | -0.204 (0.044) | 4.12x10^-06 | 2.73x10^-05 | 454 | -0.031 (0.027) | 2.55x10^-01 | 6.06x10^-01 | 1182
[0.25;0.4] | out of the window | -0.025 (0.005) | 4.29x10^-06 | 4.94x10^-06 | 5931 | -0.011 (0.003) | 1.60x10^-04 | 2.34x10^-04 | 21061
[0.3;0.45] | in the window | -0.153 (0.04) | 1.32x10^-04 | 7.15x10^-04 | 547 | -0.015 (0.025) | 5.51x10^-01 | 8.68x10^-01 | 1331
[0.3;0.45] | out of the window | -0.027 (0.006) | 3.15x10^-06 | 3.74x10^-06 | 5838 | -0.012 (0.003) | 1.07x10^-04 | 1.73x10^-04 | 20912
[0.35;0.5] | in the window | -0.128 (0.049) | 9.16x10^-03 | 2.49x10^-02 | 480 | 0.005 (0.027) | 8.63x10^-01 | 9.37x10^-01 | 1173
[0.35;0.5] | out of the window | -0.03 (0.007) | 6.20x10^-06 | 6.93x10^-06 | 5905 | -0.014 (0.003) | 2.02x10^-05 | 4.04x10^-05 | 21070
[0.4;0.55] | in the window | -0.081 (0.045) | 7.08x10^-02 | 1.12x10^-01 | 434 | -0.038 (0.028) | 1.73x10^-01 | 5.27x10^-01 | 1170
[0.4;0.55] | out of the window | -0.034 (0.006) | 3.73x10^-08 | 5.45x10^-08 | 5951 | -0.01 (0.003) | 1.60x10^-03 | 2.03x10^-03 | 21073
[0.45;0.6] | in the window | -0.02 (0.047) | 6.67x10^-01 | 7.04x10^-01 | 406 | -0.041 (0.028) | 1.45x10^-01 | 5.02x10^-01 | 1212
[0.45;0.6] | out of the window | -0.041 (0.006) | 9.36x10^-11 | 1.87x10^-10 | 5979 | -0.01 (0.003) | 2.09x10^-03 | 2.56x10^-03 | 21031
[0.5;0.65] | in the window | -0.044 (0.041) | 2.79x10^-01 | 3.31x10^-01 | 389 | -0.067 (0.025) | 8.75x10^-03 | 1.11x10^-01 | 1282
[0.5;0.65] | out of the window | -0.038 (0.006) | 3.40x10^-11 | 8.08x10^-11 | 5996 | -0.008 (0.003) | 1.63x10^-02 | 1.67x10^-02 | 20961
[0.55;0.7] | in the window | -0.067 (0.04) | 9.86x10^-02 | 1.42x10^-01 | 581 | -0.05 (0.023) | 2.81x10^-02 | 1.78x10^-01 | 2038
[0.55;0.7] | out of the window | -0.036 (0.006) | 8.20x10^-10 | 1.53x10^-09 | 5804 | -0.009 (0.003) | 3.03x10^-03 | 3.39x10^-03 | 20205
[0.6;0.75] | in the window | -0.104 (0.039) | 8.21x10^-03 | 2.40x10^-02 | 596 | -0.045 (0.023) | 4.62x10^-02 | 2.20x10^-01 | 2126
[0.6;0.75] | out of the window | -0.032 (0.006) | 2.75x10^-08 | 4.17x10^-08 | 5789 | -0.009 (0.003) | 2.39x10^-03 | 2.84x10^-03 | 20117
[0.65;0.8] | in the window | -0.115 (0.04) | 3.86x10^-03 | 1.47x10^-02 | 631 | -0.026 (0.02) | 1.95x10^-01 | 5.29x10^-01 | 2469
[0.65;0.8] | out of the window | -0.03 (0.006) | 4.25x10^-07 | 5.39x10^-07 | 5754 | -0.011 (0.003) | 3.74x10^-04 | 5.07x10^-04 | 19774
[0.7;0.85] | in the window | -0.169 (0.049) | 4.91x10^-04 | 2.33x10^-03 | 447 | 0.001 (0.024) | 9.65x10^-01 | 9.65x10^-01 | 1814
[0.7;0.85] | out of the window | -0.026 (0.006) | 3.80x10^-05 | 4.13x10^-05 | 5938 | -0.014 (0.003) | 3.08x10^-05 | 5.86x10^-05 | 20429
[0.75;0.9] | in the window | -0.117 (0.044) | 8.08x10^-03 | 2.40x10^-02 | 536 | -0.001 (0.023) | 9.60x10^-01 | 9.65x10^-01 | 1890
[0.75;0.9] | out of the window | -0.03 (0.006) | 2.65x10^-06 | 3.25x10^-06 | 5849 | -0.013 (0.003) | 1.09x10^-04 | 1.73x10^-04 | 20353
[0.8;0.95] | in the window | -0.077 (0.042) | 6.48x10^-02 | 1.12x10^-01 | 534 | 0.005 (0.024) | 8.34x10^-01 | 9.32x10^-01 | 1698
[0.8;0.95] | out of the window | -0.035 (0.006) | 6.39x10^-09 | 1.06x10^-08 | 5851 | -0.014 (0.003) | 3.63x10^-05 | 6.56x10^-05 | 20545
[0.85;1] | in the window | -0.052 (0.038) | 1.71x10^-01 | 2.24x10^-01 | 635 | -0.008 (0.019) | 6.87x10^-01 | 8.70x10^-01 | 2095
[0.85;1] | out of the window | -0.037 (0.006) | 8.47x10^-10 | 1.53x10^-09 | 5750 | -0.013 (0.003) | 9.99x10^-05 | 1.72x10^-04 | 20148
[0.9;1.05] | in the window | -0.089 (0.041) | 2.80x10^-02 | 5.99x10^-02 | 754 | 0.014 (0.022) | 5.34x10^-01 | 8.68x10^-01 | 2210
[0.9;1.05] | out of the window | -0.033 (0.006) | 1.14x10^-07 | 1.49x10^-07 | 5631 | -0.015 (0.003) | 6.26x10^-06 | 1.59x10^-05 | 20033
[0.95;1.1] | in the window | -0.081 (0.038) | 3.36x10^-02 | 6.72x10^-02 | 798 | -0.014 (0.021) | 5.12x10^-01 | 8.68x10^-01 | 2315
[0.95;1.1] | out of the window | -0.034 (0.006) | 3.91x10^-09 | 6.76x10^-09 | 5587 | -0.012 (0.003) | 2.69x10^-04 | 3.79x10^-04 | 19928
[1;1.15] | in the window | -0.136 (0.043) | 1.70x10^-03 | 7.20x10^-03 | 659 | -0.025 (0.022) | 2.53x10^-01 | 6.06x10^-01 | 1947
[1;1.15] | out of the window | -0.03 (0.006) | 8.84x10^-08 | 1.24x10^-07 | 5726 | -0.011 (0.003) | 4.39x10^-04 | 5.75x10^-04 | 20296
[1.05;1.2] | in the window | -0.085 (0.041) | 3.75x10^-02 | 7.12x10^-02 | 630 | -0.046 (0.02) | 2.40x10^-02 | 1.78x10^-01 | 2541
[1.05;1.2] | out of the window | -0.035 (0.005) | 8.81x10^-11 | 1.86x10^-10 | 5755 | -0.009 (0.003) | 2.94x10^-03 | 3.39x10^-03 | 19702
[1.1;1.25] | in the window | -0.087 (0.043) | 4.21x10^-02 | 7.62x10^-02 | 597 | 0.003 (0.019) | 8.91x10^-01 | 9.41x10^-01 | 3358
[1.1;1.25] | out of the window | -0.035 (0.005) | 6.13x10^-11 | 1.37x10^-10 | 5788 | -0.014 (0.003) | 4.12x10^-06 | 1.20x10^-05 | 18885
[1.15;1.3] | in the window | 0.031 (0.043) | 4.65x10^-01 | 5.19x10^-01 | 629 | 0.009 (0.014) | 5.05x10^-01 | 8.68x10^-01 | 4171
[1.15;1.3] | out of the window | -0.045 (0.005) | 9.75x10^-17 | 2.47x10^-16 | 5756 | -0.014 (0.003) | 4.47x10^-08 | 5.77x10^-07 | 18072
[1.2;1.35] | in the window | 0.051 (0.046) | 2.67x10^-01 | 3.27x10^-01 | 526 | 0.007 (0.014) | 6.26x10^-01 | 8.70x10^-01 | 3325
[1.2;1.35] | out of the window | -0.045 (0.005) | 1.53x10^-18 | 5.30x10^-18 | 5859 | -0.014 (0.003) | 6.07x10^-08 | 5.77x10^-07 | 18918
[1.25;1.4] | in the window | 0.053 (0.047) | 2.53x10^-01 | 3.21x10^-01 | 470 | 0.014 (0.021) | 5.07x10^-01 | 8.68x10^-01 | 2259
[1.25;1.4] | out of the window | -0.045 (0.005) | 3.41x10^-19 | 1.44x10^-18 | 5915 | -0.014 (0.003) | 1.57x10^-07 | 9.93x10^-07 | 19984
[1.3;1.45] | in the window | 0.035 (0.048) | 4.57x10^-01 | 5.19x10^-01 | 416 | 0.016 (0.026) | 5.38x10^-01 | 8.68x10^-01 | 1313
[1.3;1.45] | out of the window | -0.043 (0.005) | 5.89x10^-19 | 2.24x10^-18 | 5969 | -0.014 (0.003) | 6.28x10^-07 | 3.41x10^-06 | 20930
[1.35;1.5] | in the window | 0.09 (0.049) | 6.80x10^-02 | 1.12x10^-01 | 376 | 0.054 (0.027) | 4.41x10^-02 | 2.20x10^-01 | 1332
[1.35;1.5] | out of the window | -0.048 (0.005) | 1.01x10^-19 | 4.80x10^-19 | 6009 | -0.017 (0.003) | 8.56x10^-09 | 3.25x10^-07 | 20911
[1.4;1.55] | in the window | 0.128 (0.047) | 7.09x10^-03 | 2.40x10^-02 | 441 | 0.043 (0.029) | 1.35x10^-01 | 5.02x10^-01 | 1283
[1.4;1.55] | out of the window | -0.052 (0.006) | 5.03x10^-21 | 3.19x10^-20 | 5944 | -0.016 (0.003) | 1.08x10^-07 | 8.18x10^-07 | 20960
[1.45;1.6] | in the window | 0.121 (0.049) | 1.39x10^-02 | 3.51x10^-02 | 431 | 0.043 (0.027) | 1.15x10^-01 | 4.88x10^-01 | 1233
[1.45;1.6] | out of the window | -0.051 (0.005) | 1.41x10^-20 | 7.65x10^-20 | 5954 | -0.016 (0.003) | 5.03x10^-08 | 5.77x10^-07 | 21010
[1.5;1.65] | in the window | 0.11 (0.048) | 2.14x10^-02 | 5.08x10^-02 | 414 | 0.005 (0.024) | 8.25x10^-01 | 9.32x10^-01 | 1349
[1.5;1.65] | out of the window | -0.049 (0.005) | 1.50x10^-21 | 1.96x10^-20 | 5971 | -0.014 (0.003) | 2.57x10^-06 | 8.83x10^-06 | 20894
[1.55;1.7] | in the window | 0.065 (0.044) | 1.38x10^-01 | 1.88x10^-01 | 411 | -0.01 (0.024) | 6.79x10^-01 | 8.70x10^-01 | 1504
[1.55;1.7] | out of the window | -0.045 (0.005) | 3.47x10^-21 | 2.64x10^-20 | 5974 | -0.012 (0.003) | 1.36x10^-05 | 2.86x10^-05 | 20739
[1.6;1.75] | in the window | 0.012 (0.04) | 7.64x10^-01 | 7.85x10^-01 | 485 | -0.033 (0.024) | 1.80x10^-01 | 5.27x10^-01 | 1529
[1.6;1.75] | out of the window | -0.042 (0.005) | 4.18x10^-18 | 1.32x10^-17 | 5900 | -0.011 (0.003) | 1.36x10^-04 | 2.07x10^-04 | 20714
[1.65;1.8] | in the window | -0.002 (0.04) | 9.61x10^-01 | 9.61x10^-01 | 478 | -0.006 (0.022) | 7.99x10^-01 | 9.32x10^-01 | 1438
[1.65;1.8] | out of the window | -0.041 (0.005) | 1.74x10^-17 | 5.10x10^-17 | 5907 | -0.013 (0.003) | 2.03x10^-06 | 7.73x10^-06 | 20805
[1.7;1.85] | in the window | -0.018 (0.036) | 6.28x10^-01 | 6.82x10^-01 | 612 | -0.009 (0.021) | 6.44x10^-01 | 8.70x10^-01 | 1438
[1.7;1.85] | out of the window | -0.04 (0.005) | 3.11x10^-17 | 8.43x10^-17 | 5773 | -0.012 (0.003) | 5.29x10^-06 | 1.44x10^-05 | 20805
[1.75;1.9] | in the window | 0.057 (0.034) | 9.19x10^-02 | 1.40x10^-01 | 661 | 0.004 (0.018) | 8.14x10^-01 | 9.32x10^-01 | 1902
[1.75;1.9] | out of the window | -0.047 (0.005) | 2.16x10^-21 | 2.05x10^-20 | 5724 | -0.014 (0.003) | 7.65x10^-07 | 3.63x10^-06 | 20341
[1.8;1.95] | in the window | 0.047 (0.029) | 1.01x10^-01 | 1.42x10^-01 | 778 | -0.011 (0.015) | 4.44x10^-01 | 8.68x10^-01 | 2175
[1.8;1.95] | out of the window | -0.048 (0.005) | 1.55x10^-21 | 1.96x10^-20 | 5607 | -0.012 (0.003) | 7.45x10^-06 | 1.76x10^-05 | 20068
[1.85;2] | in the window | 0.071 (0.032) | 2.84x10^-02 | 5.99x10^-02 | 598 | -0.013 (0.015) | 3.75x10^-01 | 8.37x10^-01 | 2046
[1.85;2] | out of the window | -0.05 (0.005) | 2.63x10^-22 | 1.00x10^-20 | 5787 | -0.012 (0.003) | 7.89x10^-06 | 1.76x10^-05 | 20197

Supplementary table 22: Estimated effects of deleted or duplicated individual genes on general intelligence according to several moving categories defined by a sliding window on LOEUF. We used a sliding window to define 2 categories of genes by LOEUF, one inside the window and the other outside the window. We evaluated the effect sizes of genes within 37 windows, sliding by 0.05 at each step. The model was computed separately for deletions and duplications. The z-scored measure of general intelligence (adjusted) was the dependent variable and the numbers of genes in and out of the window were the 2 independent ones. All models were adjusted on “type of test-cohort” variable as fixed effect and on familial relationship as random effect. Est.: estimate; SE: standard error; P: p-value; LOEUF: the upper 95th CI boundary of the ratio of observed over expected number of LoF mutations.
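The window definition can be illustrated with a short sketch. It assumes windows of width 0.15 moved in steps of 0.05 over LOEUF values between 0 and 2, as listed in the table; the function names are hypothetical, and treating the upper bound as exclusive is an assumption, since the source does not state how boundary values are handled.

```python
def sliding_windows(start=0.0, stop=2.0, width=0.15, step=0.05):
    """Return (lower, upper) LOEUF bounds of each window.

    Bounds are computed in integer hundredths to avoid floating-point drift.
    """
    s, e, w, st = (round(v * 100) for v in (start, stop, width, step))
    return [(lo / 100, (lo + w) / 100) for lo in range(s, e - w + 1, st)]


def count_in_out(loeuf_values, window):
    """Count genes inside and outside one window for one individual.

    Assumption: lower bound inclusive, upper bound exclusive.
    """
    lower, upper = window
    inside = sum(lower <= v < upper for v in loeuf_values)
    return inside, len(loeuf_values) - inside


windows = sliding_windows()
print(windows[0], windows[-1])

# Hypothetical LOEUF values of the genes deleted in one individual.
print(count_in_out([0.10, 0.16, 0.50], (0.05, 0.20)))
```

For each window, the two counts (in and out) are the two predictors of the model described in the caption.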


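No. | Name detail | Loci | References | IQ test used | n= cases | FSIQ loss mean | VIQ loss mean | NVIQ loss mean | Final IQ z-score value
15 | 16p11.2-p12.2 | chr16:21605180-29315879 (Del.) | Hempel et al 2009⁷⁷ | - | 6 | 50 | - | - | -3.33 a
16 | 16p13.3 ATR-16 syndrome | chr16:60001-834372 (Del.) | Milone et al 2016⁷⁸ | Leiter-R | 1 | 32 | - | - |
16 | 16p13.3 ATR-16 syndrome | chr16:60001-834372 (Del.) | Wilkie et al 1990⁷⁹ | - | 8 | 34 | - | - |
16 | 16p13.3 ATR-16 syndrome | chr16:60001-834372 (Del.) | Tam et al 2013⁸⁰ | WISC IV | 1 | 16 | - | - |
16 | 16p13.3 ATR-16 syndrome | chr16:60001-834372 (Del.) | Gibson et al 2008⁸¹ | - | 4 | 38.25 | - | - |
16 | 16p13.3 ATR-16 syndrome | chr16:60001-834372 (Del.) | Weighted mean | | 14 | 33.79 | - | - | -2.25 a
24 | NF1-microdeletion syndrome | chr17:29107097-30263321 (Del.) | Mautner et al 2010⁸² | - | 21 | 23.1 | - | - | -1.54 a
32 | 7q11.23 | chr7:72700000-74100000 (Dup.) | Mervis et al 2015⁸³ | DAS-II | 63 | 17.95 | 15.86 | 13.36 |
32 | 7q11.23 | chr7:72700000-74100000 (Dup.) | Sanders et al 2011⁸⁴ | - | 4 | 16 | - | - |
32 | 7q11.23 | chr7:72700000-74100000 (Dup.) | Berg et al 2007⁸⁵ | WISC-III, WPPSI-III, BSID-III, DAS | 4 | - | 35.25 | 22.25 |
32 | 7q11.23 | chr7:72700000-74100000 (Dup.) | Castiglia et al 2018⁸⁶ | WISC-III, K-ABC, CFT-20 | 10 | 41.5 | - | - |
32 | 7q11.23 | chr7:72700000-74100000 (Dup.) | Weighted mean | | 81 | 20.91 | 17.02 | 13.89 | -0.93 b
36 | 15q11.2 | chr15:22810000-23090000 (Dup.) | Stefansson et al 2014¹³ | WASI-I | 136 | - | 0.9 | 0.75 | -0.05 b
39 | 16p11.2 | chr16:29650000-30200000 (Dup.) | D'Angelo et al 2016²⁵ | WASI (USA adult carrier cohort) | 39 to 40 | 6.8 | 5.1 | 8.2 | -0.55 b
41 | 16p13.11 | chr16:15510000-16290000 (Dup.) | Stefansson et al 2014¹ | WASI-I | 22 | - | 10.95 | 16.35 |
41 | 16p13.11 | chr16:15510000-16290000 (Dup.) | Siu et al 2016⁸⁷ | WAIS-III | 3 | 10.67 | - | - |
41 | 16p13.11 | chr16:15510000-16290000 (Dup.) | Ullman et al 2007⁸⁸ | - | 2 | 37 | - | - |
41 | 16p13.11 | chr16:15510000-16290000 (Dup.) | Weighted mean | | 27 | 21.2 | 10.95 | 16.35 | -1.09 b
43 | 17p11.2 | chr17:17000000-21400000 (Dup.) | Treadwell-Deering et al 2010⁸⁹ | MSEL, SB-IV, SB-V, WISC-III | 14 | 47.64 | - | - |
43 | 17p11.2 | chr17:17000000-21400000 (Dup.) | Potocki et al 2007⁹⁰ | SB-IV, MSEL | 7 | 51 | 48 | 49 |
43 | 17p11.2 | chr17:17000000-21400000 (Dup.) | Greco et al 2008⁹¹ | WISC-III, Leiter-R | 2 to 3 | 52 | 47 | 50 |
43 | 17p11.2 | chr17:17000000-21400000 (Dup.) | Weighted mean | | 10 to 14 | 49.17 | 47.78 | 49.22 | -3.28 b
44 | 17q12 (HNF1B) | chr17:34810000-36220000 (Dup.) | Stefansson et al 2014¹³ | WASI-I | 7 | 12 | 6 | 6.5 |
44 | 17q12 (HNF1B) | chr17:34810000-36220000 (Dup.) | Verhoeven et al 2017⁹² | WAIS-III | 1 | 49 | 52 | 39 |
44 | 17q12 (HNF1B) | chr17:34810000-36220000 (Dup.) | Weighted mean | | 8 | 16.63 | 11.75 | 10.56 | -0.70 b
46 | Trisomic 21 | chr21:1-48100000 (Dup.) | Devenny et al 2000⁹³ | WISC-R | 44 | 46.89 | - | - |
46 | Trisomic 21 | chr21:1-48100000 (Dup.) | Nicham et al 2003⁹⁴ | HAWIK-III, HAWIE-R | 20 | 53.15 | 45.33 | 51 |
46 | Trisomic 21 | chr21:1-48100000 (Dup.) | Capone et al 2005⁹⁵ | SB-IV | 33 | 59.4 | - | - |
46 | Trisomic 21 | chr21:1-48100000 (Dup.) | Breia et al 2014⁹⁶ | WAIS-III | 26 | 50.35 | 47.73 | 49.23 |
46 | Trisomic 21 | chr21:1-48100000 (Dup.) | Breslin et al 2014⁹⁷ | KBIT-II | 12 | 51.08 | 45.58 | 47.33 |
46 | Trisomic 21 | chr21:1-48100000 (Dup.) | Weighted mean | | 135 | 51.91 | 46.46 | 49.45 | -3.30 b
47 | 22q11.2 | chr22:19040000-21470000 (Dup.) | Courtens et al 2008⁹⁸ | - | 2 | 17.5 | 16.5 | 13.5 |
47 | 22q11.2 | chr22:19040000-21470000 (Dup.) | Rochebrochard et al 2006⁹⁹ | WAIS-III | 1 | 28 | 29 | 24 |
47 | 22q11.2 | chr22:19040000-21470000 (Dup.) | Van Copenhaut et al 2012¹⁰⁰ | BSID, WISC-III | 8 | 23.19 | - | - |
47 | 22q11.2 | chr22:19040000-21470000 (Dup.) | Weighted mean | | 11 | 22.6 | 20.67 | 17 | -1.51 a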

Supplementary table 23: Empirical data on recurrent CNVs. a) value based on FSIQ; b) value based on NVIQ. FSIQ: Full Scale Intelligence Quotient; VIQ: Verbal Intelligence Quotient; NVIQ: Non Verbal Intelligence Quotient.
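The weighted means and final z-score values in this table are consistent with a case-weighted average of the reported IQ losses, divided by 15 and given a negative sign. The following is a minimal sketch under that assumption, using the four FSIQ reports for the 16p13.3 ATR-16 deletion; the function names are illustrative.

```python
IQ_SD = 15  # assumption: IQ loss is converted to a z-score with an SD of 15


def weighted_mean_loss(studies):
    """Case-weighted mean IQ loss over (n_cases, mean_loss) pairs."""
    total_cases = sum(n for n, _ in studies)
    if total_cases == 0:
        raise ValueError("at least one case is required")
    return sum(n * loss for n, loss in studies) / total_cases


def loss_to_z(mean_loss):
    """Express a mean IQ loss as a negative z-score."""
    return -mean_loss / IQ_SD


# (n cases, FSIQ loss mean) reported for the 16p13.3 ATR-16 deletion.
atr16 = [(1, 32), (8, 34), (1, 16), (4, 38.25)]

mean_loss = weighted_mean_loss(atr16)
print(round(mean_loss, 2), round(loss_to_z(mean_loss), 2))
```

The same calculation on the NVIQ column reproduces the values flagged b), for example the 7q11.23 duplication.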


EXTENDED DATA FIGURES

Supplementary figure 1: Distribution of z-scored general intelligence, measured either by NVIQ or g-factor, according to the age of individuals and colored by cohort. Z-scored NVIQ was obtained using a mean of 100 and a standard deviation of 15; z-scored g-factor was obtained separately for each cohort using the mean and standard deviation of that cohort. NVIQ: non-verbal intelligence quotient; g-factor: general factor; SSC: Simon simplex collection; SYS: Saguenay youth study; G-Scot: generation Scotland; CaG: Cartagene; LBC: Lothian birth cohort.
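The two z-scoring rules in this caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, and the use of the sample standard deviation for the within-cohort case is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev

NVIQ_MEAN = 100
NVIQ_SD = 15


def zscore_nviq(nviq: float) -> float:
    """Z-score an NVIQ value against the normative mean and SD."""
    return (nviq - NVIQ_MEAN) / NVIQ_SD


def zscore_within_cohort(g_factors):
    """Z-score g-factor values using the mean and SD of their own cohort.

    Assumption: sample SD (n - 1 denominator).
    """
    cohort_mean = mean(g_factors)
    cohort_sd = stdev(g_factors)
    return [(g - cohort_mean) / cohort_sd for g in g_factors]


print(zscore_nviq(85))
print(zscore_within_cohort([1.0, 2.0, 3.0]))
```

Because the g-factor is standardized within each cohort, z-scores from different cohorts are only comparable under the assumption that the cohorts have similar underlying distributions.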


[Supplementary figure 2, panels a and b. Panel a: proportion, by pLI category (individual pLI > 0.9 or <= 0.9; deletions, duplications, or deletions and duplications together), of different genes totally included in a CNV (y-axis, 0.0 to 0.4) against the number of individuals included in the study (x-axis, 0 to 25,000). Panel b: estimate and its 95%CI associated with the pLI effect in the linear mixed model including all subjects (x-axis, −0.2 to 0.2), for each total pLI threshold (All, pLI < 10, pLI < 5, pLI < 3, pLI < 2, pLI < 1.5, pLI < 1), with n shown for each threshold and reference values marked at −0.177 and −0.054.]

Supplementary figure 2: Sensitivity analyses for the model based on pLI score. a. Estimated proportion of the coding genome, within each category defined by pLI, encompassed in CNVs present in the mega-analysis according to sample size (individuals randomly selected within the mega-analysis). b. Estimated effect of pLI on general intelligence after removing individuals with a sum of pLI larger than 10, 5, 3, 2, 1.5 and 1. n: number of individuals with total sum of pLI > 0.

Supplementary figure 3: Effect size on general intelligence associated with pLI, divided by category of intellectual disability (ID) genes. The z-scored measure of general intelligence (adjusted) was the dependent variable. In the first model, the 2 predictors are the sum of pLI for deletions and the sum of pLI for duplications. In the second model, the 4 predictors are the sum of pLI for ID genes and for non-ID genes, for deletions and for duplications. Both models were adjusted for the “type of test-cohort” variable as a fixed effect and for familial relationship as a random effect. pLI: probability of loss-of-function intolerance; CI: confidence interval.


[Figure. X-axis: z-scored loss of average effect size on cognitive functions reported in UKBB (0.0 to 1.0); Y-axis: z-scored loss of NVIQ reported in the literature (0.0 to 2.0). ICC = 0.36 [-0.12; 0.7], P = 0.10. Legend: recurrent CNV without ID gene; recurrent CNV with ID gene. Points: 1 - TAR (Del.); 2 - 1q21.1 (Del.); 14 - 15q11.2 (Del.); 17 - 16p11.2 (Del.); 18 - 16p11.2 distal (Del.); 19 - 16p13.11 (Del.); 20 - 16p12.1 (Del.); 25 - 17p12 (Del., HNPP); 36 - 15q11.2 (Dup.); 39 - 16p11.2 (Dup.); 41 - 16p13.11 (Dup.); 44 - 17q12 (Dup.); 47 - 22q11.2 (Dup.).]

Supplementary figure 4: Concordance between observations from the literature and UKBB for CNV effects on general intelligence. X and Y values: effect size of CNVs on z-scored general intelligence. Concordance between general intelligence loss reported in the literature for clinically ascertained carriers and in UKBB-ascertained carriers of 13 recurrent CNVs (Supplementary table 18). Each point represents a recurrent CNV; red and blue points are deletions and duplications, respectively. Empty circles are CNVs encompassing ID-genes. The model uses 2 explanatory variables (LOEUF of non-ID-genes and ID-genes). ICC indicates intraclass correlation coefficient (3, 1).
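For readers unfamiliar with the notation, ICC (3, 1) in the Shrout and Fleiss classification is the single-measure, two-way mixed, consistency form: ICC(3,1) = (BMS - EMS) / (BMS + (k - 1) EMS), where BMS is the between-target mean square, EMS the residual mean square and k the number of measurements per target. The sketch below computes it with NumPy; it is an illustration only (the function name is hypothetical and the example values are made up), not the software used for the analyses.

```python
import numpy as np


def icc_3_1(ratings):
    """ICC(3,1) from an n_targets x k_measures array (Shrout & Fleiss form).

    Here a "target" would be a recurrent CNV and the k = 2 "measures"
    the two sources of effect-size estimates being compared.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand_mean = x.mean()
    # Two-way ANOVA sums of squares without replication
    ss_targets = k * ((x.mean(axis=1) - grand_mean) ** 2).sum()
    ss_measures = n * ((x.mean(axis=0) - grand_mean) ** 2).sum()
    ss_total = ((x - grand_mean) ** 2).sum()
    ss_error = ss_total - ss_targets - ss_measures
    bms = ss_targets / (n - 1)             # between-target mean square
    ems = ss_error / ((n - 1) * (k - 1))   # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical effect sizes (column 0: source A, column 1: source B)
example = [[0.2, 0.3], [0.5, 0.9], [0.1, 0.1], [1.0, 1.4]]
print(icc_3_1(example))
```

Because ICC (3, 1) measures consistency rather than absolute agreement, a constant offset between the two sources does not lower the coefficient.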


[Figure panels a and b. X-axis: z-scored loss of general intelligence estimated in literature (a: 0 to 4; b: 0.0 to 1.2); Y-axis: z-scored loss of general intelligence predicted by the pLI and ID genes model. ICC = 0.81 [0.7; 0.88], P = 2.34×10^-11; ICC Del. = 0.77 [0.59; 0.87], P = 1.46×10^-6; ICC Dup. = 0.84 [0.69; 0.92], P = 4.76×10^-7. Points are labelled with recurrent CNV numbers.]

Supplementary figure 5: Concordance between model predictions and published observations for CNV effects on general intelligence. a. and b. Concordance between model estimates (with pLI and ID-genes) and general intelligence loss reported in the clinical literature and in UKBB for 27 and 33 recurrent CNVs, respectively, for a total of 47 recurrent CNVs (supplementary table 15). X- and Y-values: effect size of CNVs on z-scored general intelligence. b. Zoom of the rectangle drawn in the lower left section of panel a. Values from clinical data are represented by circles and those from UKBB data by squares. The cross represents the mean value of z-scored IQ loss for the 13 recurrent CNVs observed both in literature and in UKBB. Deletions are in red and duplications in blue. Empty circles or squares are CNVs encompassing ID-genes. The model uses 2 explanatory variables (pLI of non-ID-genes and ID-genes). ICC indicates intraclass correlation coefficient (3, 1).
Each point represents a recurrent CNV: (1) TAR Deletion; (2) 1q21.1 Deletion; (3) 2q11.2 Deletion; (4) 2q13 Deletion; (5) NRXN1 Deletion; (6) 2q13 (NPHP1) Deletion; (7) 3q29 (DLG1) Deletion; (8) 7q11.23 (Williams-Beuren) Deletion; (9) 8p23.1 Deletion; (10) 10q11.21q11.23 Deletion; (11) 13q12.12 Deletion; (12) 13q12 (CRYL1) Deletion; (13) 15q13.3 (BP4-BP5) Deletion; (14) 15q11.2 Deletion; (15) 16p11.2-p12.2 Deletion; (16) 16p13.3 ATR-16 syndrome Deletion; (17) 16p11.2 Deletion; (18) 16p11.2 distal Deletion; (19) 16p13.11 Deletion; (20) 16p12.1 Deletion; (21) 17p11.2 (Smith-Magenis) Deletion; (22) 17q12 Deletion; (23) 17q21.31 Deletion; (24) NF1-microdeletion syndrome Deletion; (25) 17p12 (HNPP) Deletion; (26) 22q11.2 Deletion; (27) TAR Duplication; (28) 1q21.1 Duplication; (29) 2q21.1 Duplication; (30) 2q13 Duplication; (31) 2q13 (NPHP1) Duplication; (32) 7q11.23 Duplication; (33) 10q11.21q11.23 Duplication; (34) 13q12.12 Duplication; (35) 15q11q13 (BP3-BP4) Duplication; (36) 15q11.2 Duplication; (37) 15q13.3 Duplication; (38) 15q13.3 (CHRNA7) Duplication; (39) 16p11.2 Duplication; (40) 16p11.2 distal Duplication; (41) 16p13.11 Duplication; (42) 16p12.1 Duplication; (43) 17p11.2 Duplication; (44) 17q12 (HNF1B) Duplication; (45) 17p12 (CMT1A) Duplication; (46) Trisomic 21 Duplication; (47) 22q11.2 Duplication.


Supplementary figure 6: Estimated probability of a CNV being de novo, based on a model including pLI for ID and non-ID genes, and its concordance with the de novo frequency observed in the literature. a. Probability of de novo estimated by our de novo model (Y-axis) according to the loss of IQ estimated by a model using pLI for ID and non-ID genes as two explanatory variables (X-axis). The de novo model was fitted on 13,114 deletions (red) and 13,323 duplications (blue) with available inheritance information observed in DECIPHER, CHU Sainte-Justine, SSC, MSSNG, SYS and G-Scot. b. Concordance, for 28 recurrent CNVs, between the de novo frequency observed in DECIPHER (X-axis) and the probability of being de novo estimated by models using pLI as an explanatory variable, after excluding recurrent CNVs from the training dataset (Y-axis). The first bisector represents perfect concordance. ICC indicates intraclass correlation coefficient (3, 1). Each point corresponds to a known recurrent CNV: (1) TAR Deletion; (2) 1q21.1 Deletion; (7) 3q29 (DLG1) Deletion; (8) 7q11.23 (Williams-Beuren) Deletion; (9) 8p23.1 Deletion; (13) 15q13.3 (BP4-BP5) Deletion; (14) 15q11.2 Deletion; (15) 16p11.2-p12.2 Deletion; (16) 16p13.3 ATR-16 syndrome Deletion; (17) 16p11.2 Deletion; (18) 16p11.2 distal Deletion; (19) 16p13.11 Deletion; (20) 16p12.1 Deletion; (21) 17p11.2 (Smith-Magenis) Deletion; (22) 17q12 Deletion; (23) 17q21.31 Deletion; (24) NF1-microdeletion syndrome Deletion; (25) 17p12 (HNPP) Deletion; (26) 22q11.2 Deletion; (32) 7q11.23 Duplication; (36) 15q11.2 Duplication; (39) 16p11.2 Duplication; (41) 16p13.11 Duplication; (43) 17p11.2 Duplication; (44) 17q12 (HNF1B) Duplication; (46) Trisomic 21 Duplication; (47) 22q11.2 Duplication.


[Figure. X-axis: LOEUF (0.0 to 2.0); left Y-axis: effect size of individual genes on z-scores of general intelligence with duplication (-0.15 to 0.15); right Y-axis: number of genes (0 to 250). Legend: P for model (1) ≥ 0.05 and < 0.05; P (FDR) for models (2) ≥ 0.05 and < 0.05.]

Supplementary figure 7: Estimated effects of individual genes on general intelligence according to categories based on LOEUF for duplications. The light grey histogram represents the distribution of LOEUF values for 18,451 autosomal genes. The blue line represents the estimates for a gene in each of the 4 categories of LOEUF included in the model (Supplementary methods): highly intolerant genes (LOEUF<0.2, n=980), moderately intolerant genes (0.2≤LOEUF<0.35, n=1,762), tolerant genes (0.35≤LOEUF<1, n=7,442) and genes highly tolerant to pLoF (LOEUF≥1, n=8,267). The orange line represents the estimated effect size of 37 categories of genes based on their LOEUF values (sliding windows=0.15) in the model (Supplementary methods). Genes with a LOEUF below 0.35 (vertical red line) are considered to be intolerant to pLoF by gnomAD. Left Y axis values: z-scored general intelligence (1 z-score is equivalent to 15 points of IQ) for duplication. Right Y axis values: number of genes represented in the histogram.
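The four LOEUF categories named in this caption can be expressed as a simple binning rule. The sketch below assigns a gene to a category from its LOEUF value and counts, for a hypothetical CNV, how many of its genes fall in each category; using such counts as predictors is an assumption made here for illustration (the model itself is described in the Supplementary methods), and the function names and LOEUF values are hypothetical.

```python
from collections import Counter


def loeuf_category(loeuf):
    """Assign a gene to one of the 4 LOEUF categories from the caption."""
    if loeuf < 0.2:
        return "highly intolerant"
    if loeuf < 0.35:
        return "moderately intolerant"
    if loeuf < 1.0:
        return "tolerant"
    return "highly tolerant"


def count_genes_by_category(loeuf_values):
    """Count genes per LOEUF category for the genes encompassed by one CNV."""
    return Counter(loeuf_category(v) for v in loeuf_values)


# Hypothetical LOEUF values for the genes inside one duplication
cnv_genes = [0.12, 0.28, 0.31, 0.77, 1.40]
print(count_genes_by_category(cnv_genes))
```

The boundaries are half-open on the right, so a gene with LOEUF exactly 0.35 is classed as tolerant, in line with the gnomAD cut-off cited in the caption.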


[Figure panels a, b and c. Y-axis: number of different genes totally included in a CNV (0 to 20,000). X-axis: number of individuals included in the study (0 to 500,000), drawn from a. ASD + general population, b. general population, c. ASD population. Legend: deletions, duplications, all CNVs; observed data, predictions.]

Supplementary figure 8: Prediction of gene coverage according to the sample size. a. Predictions based on coefficients from regression analysis obtained using gene coverage estimated in the entire cohort (n=24,092). b. Predictions based on coefficients from regression analysis obtained using gene coverage estimated in general population (n=20,151). c. Predictions based on coefficients from regression analysis obtained using gene coverage estimated in ASD population (n=3,941). CNV: copy number variant; ASD: autism spectrum disorder.

[Figure panels a-f. X-axis: age (years). Y-axis: z-scored measure of general intelligence (left panels) and z-scored measure of general intelligence adjusted for age (right panels).]

Supplementary figure 9: Distribution of z-scored measure of general intelligence before (left panels) and after (right panels) adjustment for age (g-factor in general populations). a,b: SYS-parents; c,d: CaG; e,f: G-Scot. In panels a-f, the X-axis represents the age of individuals, and the Y-axis represents the z-scored measure of general intelligence before (left panels) and after (right panels) age-adjustment. Points were fitted using different models as represented in the legend: “lowess” is the local weighted polynomial regression curve; “linear”: the effect of age was modeled as linear; “poly. d=2”: the effect of age was modeled as quadratic; “poly. d=3”: the effect of age was modeled as cubic. The absence of an age effect after adjustment shows that the phenotype was correctly adjusted for age. Of note, adjustment was made using a quadratic effect of age in CaG (intercept: Est.=-0.96 [SE=0.98, P=0.3260], age: Est.=5.94×10^-3 [SE=3.00×10^-3, P=0.0480], age²: Est.=-6.82×10^-6 [SE=2.27×10^-6, P=0.0027]) and G-Scot (intercept: Est.=-1.09 [SE=6.81×10^-2, P=9.85×10^-57], age: Est.=4.84×10^-3 [SE=2.56×10^-4, P=8.46×10^-79], age²: Est.=-4.71×10^-6 [SE=2.27×10^-7, P=7.18×10^-94]) and using a linear effect of age in SYS-parents (intercept: Est.=3.48 [SE=0.39, P=1.07×10^-17], age: Est.=-0.01 [SE=6.60×10^-4, P=8.71×10^-18]). Est.: estimate; SE: standard error; P: p-value.
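One common way to carry out the kind of age adjustment described in this caption is to regress the z-scored phenotype on a polynomial in age and keep the residuals. The sketch below illustrates that approach with NumPy; it is an assumption that residualization was the exact procedure used, and the function name and simulated data are hypothetical.

```python
import numpy as np


def adjust_for_age(age, score, degree=2):
    """Return the residuals of score after a polynomial regression on age.

    degree=2 gives the quadratic adjustment (as reported for CaG and G-Scot),
    degree=1 the linear adjustment (as reported for SYS-parents).
    """
    age = np.asarray(age, dtype=float)
    score = np.asarray(score, dtype=float)
    coefficients = np.polyfit(age, score, deg=degree)
    fitted = np.polyval(coefficients, age)
    return score - fitted


# Simulated cohort with a quadratic age trend plus noise (hypothetical values)
rng = np.random.default_rng(0)
age = rng.uniform(20, 80, size=500)
score = 0.5 + 0.02 * age - 0.0003 * age**2 + rng.normal(0, 1, size=500)
adjusted = adjust_for_age(age, score, degree=2)
# After adjustment, the linear association with age is removed
print(np.corrcoef(age, adjusted)[0, 1])
```

Plotting the adjusted values against age, as in the right-hand panels, is then a visual check that no residual trend remains.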


[Figure panels a-d. X-axis: sex (males, females). Y-axis: z-scored measure of general intelligence (left panels) and z-scored measure of general intelligence adjusted for sex (right panels).]

Supplementary figure 10: Distribution of z-scored measure of general intelligence before (left panels) and after (right panels) adjustment for sex (NVIQ in autism populations). a,b: SSC; c,d: MSSNG. In panels a-d, boxplots represent the distribution of the z-scored measure of general intelligence before (left panels) and after (right panels) sex-adjustment for males and females in SSC (intercept: Est.=-0.96 [SE=0.04, P=1.43×10^-132], sexF/M=-0.55 [SE=0.10, P=6.61×10^-8]) and MSSNG (intercept: Est.=-0.43 [SE=0.05, P=3.03×10^-19], sexF/M=-0.17 [SE=0.11, P=0.1155]). CaG: Cartagene; G-Scot: generation Scotland; SYS: Saguenay youth study; SSC: Simon simplex collection; NVIQ: non-verbal intelligence quotient; g-factor: general factor. Est.: estimate; SE: standard error; P: p-value.


References:

30. Schumann, G. et al. The IMAGEN study: reinforcement-related behaviour in normal brain function and psychopathology. Mol. Psychiatry 15, 1128–1139 (2010).
31. Pausova, Z. et al. Cohort Profile: The Saguenay Youth Study (SYS). Int. J. Epidemiol. doi:10.1093/ije/dyw023.
32. Awadalla, P. et al. Cohort profile of the CARTaGENE study: Quebec’s population-based biobank for public health and personalized genomics. Int. J. Epidemiol. 42, 1285–1299 (2013).
33. Cohort Profile: Generation Scotland: Scottish Family Health Study (GS:SFHS). The study, its participants and their potential for genetic research on health and illness. Int. J. Epidemiol. https://academic.oup.com/ije/article/42/3/689/909916.
34. Marioni, R. E. et al. Common Genetic Variants Explain the Majority of the Correlation Between Height and Intelligence: The Generation Scotland Study. Behav. Genet. 44, 91–96 (2014).
35. Deary, I. J., Gow, A. J., Pattie, A. & Starr, J. M. Cohort Profile: The Lothian Birth Cohorts of 1921 and 1936. Int. J. Epidemiol. 41, 1576–1584 (2012).
36. Fischbach, G. D. & Lord, C. The Simons Simplex Collection: A Resource for Identification of Autism Genetic Risk Factors. Neuron 68, 192–195 (2010).
37. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5501701/.
38. Kaufman, A. S., Flanagan, D. P., Alfonso, V. C. & Mascolo, J. T. Test Review: Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV). J. Psychoeduc. Assess. 24, 278–295 (2006).
39. Canivez, G. L. & Watkins, M. W. Long-Term Stability of the Wechsler Intelligence Scale for Children-Third Edition among Demographic Subgroups: Gender, Race/Ethnicity, and Age. J. Psychoeduc. Assess. 17, 300–313 (1999).
40. Ensor, R. C. K. The trend of Scottish intelligence: a comparison of the 1947 and 1932 surveys of the intelligence of eleven-year-old pupils. Eugen. Rev. 41, 196–197 (1950).
41. Deary, I. J. et al. The Lothian Birth Cohort 1936: a study to examine influences on cognitive ageing from age 11 to age 70 and beyond. BMC Geriatr. 7, 28 (2007).
42. Gow, A. J. Reverse causation in activity-cognitive ability associations: The Lothian Birth Cohort 1936. Psychol. Aging (2011) doi:10.1037/a0024144.
43. Akshoomoff, N. Use of the Mullen Scales of Early Learning for the Assessment of Young Children with Autism Spectrum Disorders. Child Neuropsychol. J. Norm. Abnorm. Dev. Child. Adolesc. 12, 269–277 (2006).
44. Leiter, R. G. Leiter international performance scale. (1979).
45. Roid, G. H. & Miller, L. J. Leiter international performance scale-revised. (1997).
46. Raven, J. C., Court, J. H. & Raven, J. Raven’s Progressive Matrices. (1998).
47. Coolican, J., Bryson, S. E. & Zwaigenbaum, L. Brief report: data on the Stanford-Binet Intelligence Scales (5th ed.) in children with autism spectrum disorder. J Autism Dev Disord 38, 190–7 (2008).
48. Baron, I. S. Test review: Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV). Child Neuropsychol. J. Norm. Abnorm. Dev. Child. Adolesc. 11, 471–475 (2005).
49. Intelligent Testing with the WISC-V. Wiley. https://www.wiley.com/en-us/Intelligent+Testing+with+the+WISC+V-p-9781118589236.
50. Wechsler, D. Wechsler Abbreviated Scale of Intelligence. (1999).

51. Wechsler, D. Wechsler Abbreviated Scale of Intelligence - Second Edition. (2011).
52. Ryan, J. J. et al. Exploratory Factor Analysis of the Wechsler Abbreviated Scale of Intelligence (WASI) in Adult Standardization and Clinical Samples. Appl. Neuropsychol. 10, 252–256 (2003).
53. Wechsler, D. Wechsler Preschool and Primary Scale of Intelligence - Fourth Edition. (2012).
54. Farmer, C., Golden, C. & Thurm, A. Concurrent Validity of the Differential Ability Scales, Second Edition with the Mullen Scales of Early Learning in Young Children with and without Neurodevelopmental Disorders. Child Neuropsychol. J. Norm. Abnorm. Dev. Child. Adolesc. 22, 556–569 (2016).
55. Harris, S. E. et al. Longitudinal telomere length shortening and cognitive and physical decline in later life: The Lothian Birth Cohorts 1936 and 1921. Mech. Ageing Dev. 154, 43–48 (2016).
56. Wechsler, D. WAIS-III UK administration and scoring manual. (1998).
57. Hampshire, A., Highfield, R. R., Parkin, B. L. & Owen, A. M. Fractionating Human Intelligence. Neuron 76, 1225–1237 (2012).
58. Yuen, R. K. C. et al. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder. Nat. Neurosci. 20, 602–611 (2017).
59. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinforma. 43, 11.10.1-33 (2013).
60. Trost, B. et al. A Comprehensive Workflow for Read Depth-Based Identification of Copy-Number Variation from Whole-Genome Sequence Data. Am. J. Hum. Genet. 102, 142–155 (2018).
61. Wang, K. et al. PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).
62. Colella, S. et al. QuantiSNP: an Objective Bayes Hidden-Markov Model to detect and accurately map copy number variation using SNP genotyping data. Nucleic Acids Res 35, 2013–25 (2007).
63. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–75 (2007).
64. Cooper, G. M. et al. A Copy Number Variation Morbidity Map of Developmental Delay. Nat. Genet. 43, 838–846 (2011).
65. Moreno-De-Luca, D. et al. Using large clinical data sets to infer pathogenicity for rare copy number variants in autism cohorts. Mol. Psychiatry 18, 1090–1095 (2013).
66. Petrovski, S. et al. The Intolerance of Regulatory Sequence to Genetic Variation Predicts Gene Dosage Sensitivity. PLOS Genet. 11, e1005492 (2015).
67. Ruderfer, D. M. et al. Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nat. Genet. 48, 1107–1111 (2016).
68. Szklarczyk, D. et al. STRING v10: protein–protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).
69. Bayés, A. et al. Characterization of the proteome, diseases and evolution of the human postsynaptic density. Nat. Neurosci. 14, 19–21 (2011).
70. Darnell, J. C. et al. FMRP stalls ribosomal translocation on mRNAs linked to synaptic function and autism. Cell 146, 247–261 (2011).
71. Ramasamy, A. et al. Genetic variability in the regulation of gene expression in ten regions of the human brain. Nat. Neurosci. 17, 1418–1428 (2014).
72. Douard, E. et al. Effects-sizes of deletions and duplications on autism risk across the genome. bioRxiv 2020.03.09.979815 (2020) doi:10.1101/2020.03.09.979815.

73. Shrout, P. E. & Fleiss, J. L. Intraclass correlations: uses in assessing rater reliability. Psychol. Bull. 86, 420–428 (1979).
74. Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D. & R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. (2016).
75. Rizopoulos, D. bootStepAIC: Bootstrap stepAIC. (2009).
76. Revelle, W. psych: Procedures for Psychological, Psychometric, and Personality Research. (Northwestern University, 2017).
77. Hempel, M. et al. Microdeletion syndrome 16p11.2-p12.2: Clinical and molecular characterization. Am. J. Med. Genet. A. 149A, 2106–2112 (2009).
78. Milone, R., Ferrari, A. R., Pasquariello, R. & Bargagna, S. Complex Phenotype of a Boy With De Novo 16p13.3-13.2 Interstitial Deletion. Child Neurol. Open 3, (2016).
79. Wilkie, A. O. M. et al. Clinical features and molecular analysis of the α thalassemia/mental retardation syndromes. 1. Cases due to deletions involving chromosome band 16p13.3. Am. J. Hum. Genet. 46, 1112–1126 (1990).
80. Tam, A. et al. Bilateral Radial Ulnar Synostosis and Vertebral Anomalies in a Child with a De Novo 16p13.3 Interstitial Deletion. Case Rep. Genet. 2013, (2013).
81. Gibson, W. T. et al. Phenotype–genotype characterization of alpha-thalassemia mental retardation syndrome due to isolated monosomy of 16p13.3. Am. J. Med. Genet. A. 146A, 225–232 (2008).
82. Mautner, V.-F. et al. Clinical characterisation of 29 neurofibromatosis type-1 patients with molecularly ascertained 1.4 Mb type-1 NF1 deletions. J. Med. Genet. 47, 623–630 (2010).
83. Mervis, C. B. et al. Children with 7q11.23 Duplication Syndrome: Psychological Characteristics. Am. J. Med. Genet. A. 167, 1436–1450 (2015).
84. Sanders, S. J. et al. Multiple recurrent de novo copy number variations (CNVs), including duplications of the 7q11.23 Williams-Beuren syndrome region, are strongly associated with autism. Neuron 70, 863–885 (2011).
85. Berg, J. S. et al. Speech delay and autism spectrum behaviors are frequently associated with duplication of the 7q11.23 Williams-Beuren syndrome region. Genet. Med. 9, 427–441 (2007).
86. Castiglia, L. et al. 7q11.23 microduplication syndrome: neurophysiological and neuroradiological insights into a rare chromosomal disorder. J. Intellect. Disabil. Res. 62, 359–370 (2018).
87. Siu, W.-K. et al. Diagnostic yield of array CGH in patients with autism spectrum disorder in Hong Kong. Clin. Transl. Med. 5, (2016).
88. Ullmann, R. et al. Array CGH identifies reciprocal 16p13.1 duplications and deletions that predispose to autism and/or mental retardation. Hum. Mutat. 28, 674–682 (2007).
89. Treadwell-Deering, D. E., Powell, M. P. & Potocki, L. Cognitive and Behavioral Characterization of the Potocki-Lupski Syndrome (Duplication 17p11.2). J. Dev. Behav. Pediatr. 31, 137–143 (2010).
90. Potocki, L. et al. Characterization of Potocki-Lupski Syndrome (dup(17)(p11.2p11.2)) and Delineation of a Dosage-Sensitive Critical Interval That Can Convey an Autism Phenotype. Am. J. Hum. Genet. 80, 633–649 (2007).
91. Greco, D. et al. Three new patients with dup(17)(p11.2p11.2) without autism. Clin. Genet. 73, 294–296 (2008).
92. Verhoeven, W. M. A., Egger, J. I. M., de Leeuw, N. & Kleefstra, T. Treatment contingent upon etiology: two adult female patients with mild intellectual disability and a causative copy number variation. Clin. Neuropsychiatry 14, 135–140 (2017).


93. Devenny, D. A., Krinsky-McHale, S. J., Sersen, G. & Silverman, W. P. Sequence of cognitive decline in dementia in adults with Down’s syndrome. J Intellect Disabil Res 44 (Pt 6), 654–65 (2000).
94. Nicham, R. et al. Spectrum of cognitive, behavioral and emotional problems in children and young adults with Down syndrome. J Neural Transm 67, 173–191 (2003).
95. Capone, G. T., Grados, M. A., Kaufmann, W. E., Bernad-Ripoll, S. & Jewell, A. Down syndrome and comorbid autism-spectrum disorder: characterization using the aberrant behavior checklist. Am J Med Genet A 134, 373–80 (2005).
96. Breia, P. et al. Adults with Down syndrome: characterization of a Portuguese sample. Acta Med Port 27, 357–63 (2014).
97. Breslin, J. et al. Obstructive sleep apnea syndrome and cognition in Down syndrome. Dev Med Child Neurol 56, 657–64 (2014).
98. Courtens, W., Schramme, I. & Laridon, A. Microduplication 22q11.2: a benign polymorphism or a syndrome with a very large clinical variability and reduced penetrance? Report of two families. Am J Med Genet A 146A, 758–63 (2008).
99. de la Rochebrochard, C. et al. The intrafamilial variability of the 22q11.2 microduplication encompasses a spectrum from minor cognitive deficits to severe congenital anomalies. Am J Med Genet Part A 140A, 1608–1613 (2006).
100. Van Campenhout, S. et al. Microduplication 22q11.2: a description of the clinical, developmental and behavioral characteristics during childhood. Genet Couns 23, 135–48 (2012).
