A Discovery Resource of Rare Copy Number Variations in Individuals with Autism Spectrum Disorder
Total Page:16
File Type:pdf, Size:1020Kb
INVESTIGATION A Discovery Resource of Rare Copy Number Variations in Individuals with Autism Spectrum Disorder Aparna Prasad,* Daniele Merico,* Bhooma Thiruvahindrapuram,* John Wei,* Anath C. Lionel,*,† Daisuke Sato,* Jessica Rickaby,* Chao Lu,* Peter Szatmari,‡ Wendy Roberts,§ Bridget A. Fernandez,** Christian R. Marshall,*,†† Eli Hatchwell,‡‡ Peggy S. Eis,‡‡ and Stephen W. Scherer*,†,††,1 *The Centre for Applied Genomics, Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto M5G 1L7, Canada, †Department of Molecular Genetics, University of Toronto, Toronto M5G 1L7, Canada, ‡Offord Centre for Child Studies, Department of Psychiatry and Behavioural Neurosciences, McMaster University, Hamilton L8P 3B6, § Canada, Autism Research Unit, The Hospital for Sick Children, Toronto M5G 1X8, Canada, **Disciplines of Genetics and Medicine, Memorial University of Newfoundland, St. John’s, Newfoundland A1B 3V6, Canada, ††McLaughlin Centre, University of Toronto, Toronto M5G 1L7, Canada, and ‡‡Population Diagnostics, Inc., Melville, New York 11747 ABSTRACT The identification of rare inherited and de novo copy number variations (CNVs) in human KEYWORDS subjects has proven a productive approach to highlight risk genes for autism spectrum disorder (ASD). A rare variants variety of microarrays are available to detect CNVs, including single-nucleotide polymorphism (SNP) arrays gene copy and comparative genomic hybridization (CGH) arrays. Here, we examine a cohort of 696 unrelated ASD number cases using a high-resolution one-million feature CGH microarray, the majority of which were previously chromosomal genotyped with SNP arrays. Our objective was to discover new CNVs in ASD cases that were not detected abnormalities by SNP microarray analysis and to delineate novel ASD risk loci via combined analysis of CGH and SNP array cytogenetics data sets on the ASD cohort and CGH data on an additional 1000 control samples. Of the 615 ASD cases molecular analyzed on both SNP and CGH arrays, we found that 13,572 of 21,346 (64%) of the CNVs were exclusively pathways detected by the CGH array. Several of the CGH-specific CNVs are rare in population frequency and impact previously reported ASD genes (e.g., NRXN1, GRM8, DPYD), as well as novel ASD candidate genes (e.g., CIB2, DAPP1, SAE1), and all were inherited except for a de novo CNV in the GPHN gene. A functional enrichment test of gene-sets in ASD cases over controls revealed nucleotide metabolism as a potential novel pathway involved in ASD, which includes several candidate genes for follow-up (e.g., DPYD, UPB1, UPP1, TYMP). Finally, this extensively phenotyped and genotyped ASD clinical cohort serves as an invalu- able resource for the next step of genome sequencing for complete genetic variation detection. Rare inherited and de novo copy number variations (CNVs) contribute Smaller intragenic CNVs (often called indels) can be involved, or CNVs to the genetic vulnerability in autism spectrum disorder (ASD) in as can encompass an entire gene, and in some cases they can affect several many as 5–10% of idiopathic cases examined (Devlin and Scherer 2012). genesaspartofagenomicdisorder(LeeandScherer2010).Screening for CNVs using microarrays has proven to be a rapid method to identify Copyright © 2012 Prasad et al. both large and small genomic imbalances associated with ASD suscep- doi: 10.1534/g3.112.004689 tibility (Jacquemont et al. 2006; Sebat et al. 2007; Autism Genome Manuscript received August 1, 2012; accepted for publication October 24, 2012 Project Consortium 2007; Christian et al. 2008; Marshall et al. 2008; This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/ Pinto et al. 2010; Shen et al. 2010; Sanders et al. 2011). by/3.0/), which permits unrestricted use, distribution, and reproduction in any A trend emerging from recent investigations in autism genetics is medium, provided the original work is properly cited. that significant heterogeneity and complexity exists. However, some Supporting information is available online at http://www.g3journal.org/lookup/ highly penetrant risk genes (e.g.,theSHANK, NRXN,andNLGN suppl/doi:10.1534/g3.112.004689/-/DC1. 1Corresponding author: The Centre for Applied Genomics, The Hospital for Sick family members) and CNV loci (e.g., 1q21.1, 15q13.3, and 16p11.2) Children, 14th Floor, Toronto Medical Discovery Tower/MaRS Discovery District, 101 are now known (Devlin and Scherer 2012). Moreover, there has been College St., Toronto, Ontario, M5G 1L7, Canada. E-mail: [email protected] some progress in identifying multiple mutations in single individuals Volume 2 | December 2012 | 1665 (Schaaf et al. 2011), suggesting possible multigenic threshold models Control cohort: A control cohort of 1000 DNA samples from for ASD (Cook and Scherer 2008). Scientific designs enabling the reportedly healthy donors (502 males and 498 females) were acquired discovery of a comprehensive set of genetic variants will be required from BioServe (Beltsville, MD) and are part of a collection of .12,000 to fully assess the genetic burden in ASD. control samples originally banked by Genomics Collaborative, Inc. CNV detection in case and control cohorts have been performed (acquired by BioServe in 2007) for the purpose of large-scale genomic using several different microarray platforms, including those utilizing studies (Ardlie et al. 2002). Donors were consented and deidentified via single-nucleotide polymorphisms (SNPs) and others using comparative a protocol approved by the institutional review board. The control genomic hybridization (CGH) arrays (Redon et al. 2006; Graubert et al. DNA samples were derived from apparently healthy white donors older 2007; Christian et al. 2008; Conrad et al. 2010; Pinto et al. 2011). There than 45 years of age. Health history information, documented at the are advantages to each microarray platform. SNP-based microarray time of consent, was used to select the samples based on the following platforms are typically cost-effective and have the potential to analyze attributes: body mass index between 15 and 35, blood glucose level the genomic DNA sample at both the SNP genotype and copy number ,125 mg/dL, total cholesterol level between 100 and 300, systolic blood level, whereas CGH arrays are often used in a clinical diagnostic setting pressure between 100 and 150, and no major diseases (e.g.,cancerand because of the better signal-to-noise-ratio achieved in comparison to neurodegenerative diseases) or psychiatric disorders (e.g., alcoholism, SNP arrays (Shinawi and Cheung 2008). Multiple algorithms, some mental illness, and depression). The control cohort DNA samples (also fi speci c to certain array types, are available to detect CNVs, and they called PDx controls) used in this study are proprietary and currently can vary considerably in the number of CNV calls made even for an biobanked at Population Diagnostics, Inc. (Melville, NY) for future identical array experiment. In recent comprehensive studies, authors microarray, sequencing and genotyping validation experiments. have provided performance assessments of CNV detection platforms and methods by examining control DNA samples (Lai et al. 2005; Array CGH Curtis et al. 2009; Hester et al. 2009; Winchester et al. 2009; Pinto The ASD test and reference samples were labeled with Cy5 and Cy3, et al. 2011). The consensus is that using multiple microarray technol- respectively using Invitrogen BioPrime CGH labeling kit (Invitrogen, ogies and prediction algorithms increases CNV detection rates. Carlsbad, CA). A pool of 50 sex-matched Caucasian control samples In this study, we used a high-resolution Agilent CGH array was used as a reference. The PDx control test and reference samples comprising 1 million (1M) probes to assay for CNVs in a Canadian were labeled with Cy3 and Cy5, respectively and one sex-matched cohort of 696 unrelated ASD cases, 615 of which were genotyped sample was used as a reference. All samples have been run on Agilent previously on SNP arrays, including Illumina 1M single/duo (Pinto 1M CGH array according to manufacturer’s protocol (Agilent Tech- et al. 2010), Affymetrix 500K (Marshall et al. 2008), Affymetrix 6.0 nologies, Santa Clara, CA). The Agilent 1M CGH array consists of (Lionel et al. 2011; and A. C. Lionel, unpublished data), and Illumina a total of 974,016 probes providing relatively uniform whole genome Omni 2.5M (A. C. Lionel, unpublished data) arrays. Our high-quality coverage. The arrays were scanned using the Agilent microarray scan- data, interpreted for the first time with CNV control data generated ner, and data were extracted using Feature extraction software version on 1000 individuals using the same 1M CGH array, enabled detection 10.5.1.1. The array experiments for ASD cases and PDx controls were of numerous rare CNVs in each ASD sample examined and identified run at The Centre for Applied Genomics (Toronto, Canada) and in new potential ASD risk genes. These data allowed us to significantly the service laboratory of Oxford Gene Technology (Oxford, UK), expand the profile of genetic variants that are potentially causative of respectively. ASD and to identify novel molecular pathways affecting ASD vulner- ability in this cohort. CNV detection and analysis MATERIALS AND METHODS All array CGH data (both ASD cases and PDx controls) were analyzed in precisely the same manner using two programs—DNA Analytics DNA samples and CNV analysis v.4.0.85 (Agilent