Copy Number Variations and Cognitive Phenotypes in Unselected Populations
Supplementary Online Content

Männik K, Mägi R, Macé A, et al. Copy Number Variations and Cognitive Phenotypes in Unselected Populations. JAMA. doi:

eMethods
eTable 1. Phenotypes of EGCUT individuals with DECIPHER-listed recurrent rearrangements
eTable 2. Prevalence and characteristic features of DECIPHER-listed genomic disorders
eTable 3. Sample demographics and characteristics
eTable 4. Summary scores of ALSPAC participants' Standard Assessment Tests (SATs)
eTable 5. Prevalence of NAHR-mediated recurrent CNVs in clinical and general population cohorts
eTable 6. Follow-up phenotyping of 16p11.2 600kb BP4-BP5 deletions and duplications identified in the EGCUT cohort
eTable 7. Individual CNVs in EGCUT discovery and replication cohorts
eTable 8. Education attainment in EGCUT replication cohorts separately and combined with discovery cohort
eTable 9. Mean Standard Assessment Tests (SATs) scores for English and Mathematics in ALSPAC CNV carriers
eTable 10. Education attainment in Italian HYPERGENES cohort
eTable 11. Education attainment in European American MCTFR cohort
eTable 12. MetaCore Enrichment by GO Processes analysis report
eFigure 1. Diagnoses reported in the EGCUT participants according to the WHO ICD-10 classification
eFigure 2. Multidimensional scaling analysis of EGCUT population structure
eFigure 3. Assessment of CNV deleteriousness

This supplementary material has been provided by the authors to give readers additional information about their work.

© 2015 American Medical Association. All rights reserved. Downloaded From: https://jamanetwork.com/ on 09/26/2021

eMethods

EGCUT

The Estonian population was influenced by trends encountered by most European populations. Before the Second World War, Estonia had a relatively homogeneous population (88% ethnic Estonians in the 1934 population census) with strong cultural influence from previously ruling countries such as Germany, Sweden and Denmark.
This makeup was altered during the 1941 to 1991 Soviet Union occupation (i.e., mass deportations and executions of local people; the flight to Western Europe and North America in the 1940s of many Estonians, as well as of most members of the local German and Swedish minorities; followed by the implementation of a "russification" ideology). The result is the current distribution of 69.7% Estonians, 25.2% Russians and 5.1% other minorities (2014 population census; Statistics Estonia, http://www.stat.ee/en). Religion plays a minor role in Estonia, largely because of the Soviet occupation from 1941 to 1991, when "elimination of religion" was an ideological objective. As a result, Estonia is one of the least religious countries in the world: less than a third of the population defines itself as "believers", the majority of whom are Lutheran or Eastern Orthodox (Statistics Estonia, http://www.stat.ee/en). It is important to stress that there are no restrictions on access to education in Estonia based on ethnicity or religious beliefs (see also below).

The Estonian Genome Centre of the University of Tartu (EGCUT) cohort is a longitudinal, prospective population biobank that contains close to 52,000 participants, representing 5% of the Estonian adult population. Long-term recruitment via general practitioners and a widespread network of dedicated recruitment offices (rather than self-initiated and/or web-based recruitment) has ensured that samples were collected throughout the country and across diverse social groups. The resulting representation of a wide range of phenotypes (eFigure 1), age groups and education levels makes the cohort ideally suited to population-based studies. The distribution of participants' geographical origin, age, sex and achieved education level closely reflects that of the Estonian population in general.
At baseline, general practitioners (GPs) performed a standardized objective examination of the participants, who also donated blood samples for DNA, white blood cell and plasma tests, and filled out a 16-module questionnaire encompassing more than 1000 health- and lifestyle-related questions, together with a uniform report of clinical diagnoses according to the World Health Organization International Classification of Diseases (WHO ICD-10, http://www.who.int/classifications/icd). The data are continuously updated through follow-up interviews, as well as through national electronic health databases and citizen registries (see reference 1 and www.biobank.ee for details). Analyses of participants' age, gender, diseases and education level show that this cohort is representative of the country's population. EGCUT is conducted according to the Estonian Human Genes Research Act and managed in conformity with the ISO 9001:2008 standard. The Ethics Review Committee on Human Research of the University of Tartu approved the project. Written informed consent was obtained from all voluntary participants for the baseline and follow-up investigations. All population carriers of 16p11.2 600kb BP4-BP5 (BP, breakpoint) recurrent copy number variants (CNVs) were invited back for follow-up investigations using the clinical and neuropsychological protocol previously used to study 16p11.2 syndrome patients ascertained through clinical cohorts2,3.

The EGCUT cohort (and the Estonian population in general) is an outbred population with no substantial regional differences. Single-nucleotide polymorphism (SNP) allele frequencies and linkage disequilibrium patterns are similar to those found in populations of European ancestry4. We did not find small series of non-recurrent CNVs and/or an inflation of recurrent rearrangements typical of founder effects5,6 (see also the CNV calling section below).
Accordingly, its samples have been successfully used to discover or replicate hundreds of SNP associations, which are sensitive to differences in population allele frequencies and stratification (e.g., genome-wide association studies of education attainment, adult height and age at menarche7-9). Of note, all quality control (QC) procedures suggested in reference 10 were applied to the results submitted by the EGCUT cohort to these meta-analyses, and no problems were uncovered.

Identity-by-descent among EGCUT participants was estimated from SNP genotypes using PLINK software. In the discovery and replication cohorts, 14.6% and 5.9% of individuals, respectively, show cryptic relatedness (pi_hat > 0.15), in line with what is expected when 5% of a population is enrolled. To exclude the possibility that the EGCUT cohort could be affected by hidden population stratification, multidimensional scaling (MDS) analysis was performed using PLINK 1.07 (http://pngu.mgh.harvard.edu/~purcell/plink/strat.shtml). SNPs that passed quality control in the full set of genotyped EGCUT samples were pruned so that all SNPs within a window size of 50 had pairwise r2 < 0.5. Pairwise identity-by-state (IBS) distance was calculated using all autosomal SNPs that remained after pruning. MDS dimensions were extracted using the "MDS-plot" option. R 3.0.2 was used for plotting and visualization of the results. This analysis demonstrated that genetic stratification could not account for the observed associations (eFigure 2). No differences were observed after excluding pairs with high relatedness (>0.1).

CNV calling

The genomic DNA of 8110 subjects (7020 for the discovery and 1090 for the replication cohort) was randomly selected from among the 52,000 EGCUT participants. A third cohort of 1066 individuals ("high-functioning replication cohort") was used to further assess the significance of the signal obtained for education attainment.
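The cryptic-relatedness percentages reported in the EGCUT section (pi_hat > 0.15) can be illustrated with a minimal Python sketch. It assumes the pairwise estimates have already been parsed into (id1, id2, pi_hat) tuples, for example from the PI_HAT column of a PLINK --genome report, and it assumes that an individual "shows cryptic relatedness" when it appears in at least one pair above the threshold. The function name is hypothetical; this is not the authors' code.

```python
from typing import Iterable, Set, Tuple


def cryptic_relatedness_fraction(
    pairs: Iterable[Tuple[str, str, float]],
    n_individuals: int,
    threshold: float = 0.15,
) -> float:
    """Share of the cohort found in at least one pair with pi_hat > threshold.

    `pairs` holds (id1, id2, pi_hat) tuples, assumed to be parsed beforehand
    (e.g. from the PI_HAT column of a PLINK --genome report; parsing not shown).
    """
    related: Set[str] = set()
    for id1, id2, pi_hat in pairs:
        if pi_hat > threshold:
            # Both members of a related pair are counted (assumed definition).
            related.add(id1)
            related.add(id2)
    return len(related) / n_individuals


if __name__ == "__main__":
    example = [("A", "B", 0.50), ("A", "C", 0.02), ("D", "E", 0.16)]
    print(cryptic_relatedness_fraction(example, n_individuals=10))  # 0.4
```

Counting individuals rather than pairs keeps the result comparable to a per-cohort percentage such as the 14.6% and 5.9% quoted above.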
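The LD-pruning rule used before the MDS analysis (pairwise r2 < 0.5 among SNPs within a window of 50) can be sketched as a greedy filter. This is a simplified stand-in, not PLINK's --indep-pairwise implementation: it assumes the window counts 50 consecutive SNPs, computes r2 as the squared Pearson correlation of 0/1/2 allele counts, and always keeps the earlier SNP of a correlated pair, whereas PLINK also uses a step size and its own rules for which SNP to drop. All names are hypothetical.

```python
from typing import List, Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return sxy * sxy / (sxx * syy)


def ld_prune(
    genotypes: List[Sequence[int]], window: int = 50, r2_max: float = 0.5
) -> List[int]:
    """Return indices of SNPs kept by greedy pairwise pruning.

    genotypes[i] is the allele-count vector (0/1/2 per sample) of SNP i,
    with SNPs ordered by genomic position. A SNP is kept only if its r2 with
    every already-kept SNP less than `window` positions earlier is below
    r2_max, so all kept pairs within the window satisfy r2 < r2_max.
    """
    kept: List[int] = []
    for i, g in enumerate(genotypes):
        if all(r_squared(g, genotypes[j]) < r2_max for j in kept if i - j < window):
            kept.append(i)
    return kept


if __name__ == "__main__":
    snps = [[0, 1, 2, 0, 1], [0, 1, 2, 0, 1], [0, 2, 1, 1, 0]]
    print(ld_prune(snps))  # the duplicate second SNP is dropped
```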
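The pairwise IBS distance that feeds the MDS step can be written as one minus the IBS similarity reported by PLINK 1.07, (IBS2 + 0.5 × IBS1) / N, where N counts the SNPs genotyped in both individuals. The sketch below assumes biallelic SNPs coded as 0/1/2 allele counts with None for missing calls; under that coding the number of alleles shared identical by state is 2 − |g1 − g2|. Function names are hypothetical, and the eigen-decomposition that turns the distance matrix into MDS dimensions is not shown.

```python
from typing import List, Optional, Sequence

Genotypes = Sequence[Optional[int]]


def ibs_distance(a: Genotypes, b: Genotypes) -> float:
    """1 - (IBS2 + 0.5*IBS1) / N over SNPs called in both individuals."""
    shared = 0.0
    n = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip SNPs missing in either individual
        # Alleles shared IBS = 2 - |ga - gb|; scaled so IBS2 -> 1, IBS1 -> 0.5, IBS0 -> 0.
        shared += (2 - abs(ga - gb)) / 2
        n += 1
    if n == 0:
        raise ValueError("no SNPs genotyped in both individuals")
    return 1 - shared / n


def ibs_distance_matrix(samples: List[Genotypes]) -> List[List[float]]:
    """Symmetric matrix of pairwise IBS distances (zero on the diagonal)."""
    k = len(samples)
    dist = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            dist[i][j] = dist[j][i] = ibs_distance(samples[i], samples[j])
    return dist


if __name__ == "__main__":
    # IBS2 at the first SNP, IBS1 at the second, IBS0 at the third,
    # and the fourth SNP is skipped because it is missing in one sample.
    print(ibs_distance([0, 1, 2, None], [0, 2, 0, 1]))  # 0.5
```

A matrix built this way is the kind of input a classical MDS would decompose to obtain population-structure dimensions such as those plotted in eFigure 2.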
The three cohorts were selected and SNP-genotyped at three different time points over a period of four years, using Illumina HumanOmniExpress (discovery cohort) and Illumina HumanCNV370 BeadChips (both replication cohorts) (Illumina Inc., San Diego, CA, USA). The HumanOmniExpress BeadChip covers the entire human genome with a median spacing of 2.1 kb, and the HumanCNV370 BeadChip has a genome-wide median spacing of 5 kb. All samples were processed and assayed according to the manufacturer's routine protocol. Genotypes were called with the GenomeStudio software GT module v3.1 (Illumina Inc). Log R ratio (LRR) and B allele frequency (BAF) values produced by GenomeStudio were formatted for CNV calling with the hidden Markov model-based software PennCNV (version June 2011)11, using the parameters suggested by the software authors together with the "GC model adjustments" and "Merging adjacent CNV calls" functions. Samples with a call rate greater than 98% and fewer than 50 CNV calls passed quality control; 6819 discovery, 1058 replication and 993 "high-functioning" replication samples were retained. To minimize false-positive findings, CNVs ≥250 kb in size were filtered for a PennCNV confidence score ≥30 (HumanOmniExpress) or ≥40 (HumanCNV370) and visually confirmed in the GenomeStudio GenomeViewer. As rearrangements >1 Mb have a high likelihood of being pathogenic, we used an initial size threshold of 1 Mb for both deletions and duplications and then successively halved it (500 kb, 250 kb and 125 kb) until the association was lost. The genotypes of 23 carriers of DECIPHER-listed CNVs were assessed by quantitative PCR, with no false-positive findings. The EGCUT genotyping facility is licensed to use the same procedure to provide SNP-array genotyping for diagnostic purposes to the Medical Genetics Center of Tartu University Hospital. This method identified CNVs