
ARTICLE © The Author(s) Prevalence and properties of intragenic copy-number variation in Mendelian disease genes Rebecca Truty, PhD1, Joshua Paul, PhD1, Michael Kennemer, MS1, Stephen E. Lincoln, BS1, Eric Olivares, PhD1, Robert L. Nussbaum, MD, FACMG1,2 and Swaroop Aradhya, PhD, FACMG1 Purpose: We investigated the frequencies and characteristics of the clinical phenotype yielded 4054 CNVs. Most of these CNVs intragenic copy-number variants (CNVs) in a deep sampling of were novel rare events, present as duplications, and enriched in disease genes associated with monogenic disorders. genes associated with recessive disorders or lacking loss-of-function Methods: Subsets of 1507 genes were tested using next-generation mutational mechanisms. sequencing to simultaneously detect sequence variants and CNVs Conclusion: Universal intragenic CNV analysis adds substantial in >143,000 individuals referred for genetic testing. We analyzed clinical sensitivity to genetic testing. Clinically relevant CNVs have CNVs in gene panels for hereditary cancer syndromes and distinct properties that distinguish them from CNVs contributing cardiovascular, neurological, or pediatric disorders. to normal variation in human disease genes. Results: Our analysis identified 2844 intragenic CNVs in 384 clinically tested genes. CNVs were observed in 1.9% of the entire Genetics in Medicine (2019) 21:114–123; https://doi.org/10.1038/s41436- cohort but in a disproportionately high fraction (9.8%) of 018-0033-5 individuals with a clinically significant result. CNVs accounted for 4.7–35% of pathogenic variants, depending on clinical specialty. Distinct patterns existed among CNVs in terms of copy number, Keywords: Intragenic deletion/duplication copy-number location, exons affected, clinical classification, and genes affected. variant; Next-generation sequencing panel; Structural variant; Separately, analysis of de-identified data for 599 genes unrelated to Pathogenic variation prevalence; Diagnostic genetic testing INTRODUCTION demonstrated by the 1000 Genomes Project, Exome Within the broad spectrum of variation in our genome, Aggregation Consortium (ExAC) projects, and other large- – relatively little is known about the prevalence of intragenic scale efforts (Maxwell E, 2017, unpublished data).10 12 The deletions and duplications, or copy-number variants (CNVs), data from these studies are being compiled in archives such as beyond those responsible for a small number of hereditary the Database of Genomic Variants (DGV) and ClinVar and disorders such as Duchenne muscular dystrophy and have contributed to our understanding of dosage-sensitive BRCA1-associated breast cancer. Exon-level copy-number regions of the genome.13,14 Investigating intragenic CNVs can detection methods are used in genetic testing for these further refine the resolution of this dosage map and disorders but are not universally applied to other genes, complement our understanding of larger chromosomal which greatly limits our knowledge of intragenic CNV CNVs. prevalence across the medically relevant portions of the We have developed and validated an NGS assay and genome. Although exome-based approaches are bioinformatics pipeline to simultaneously identify sequence beginning to shed light on these CNVs, the resolution and variants and intragenic CNVs with high sensitivity.9 Herein, confidence of CNV calling with these methods remains we describe CNVs identified in a cohort of >143,000 unrelated limited.1,2 Exon-focused microarray-based comparative individuals referred for diagnostic genetic testing for a genomic hybridization (array CGH) provides high resolution suspected Mendelian disorder. In addition, we studied CNVs – for CNV analysis but is not used universally.3 5 Array CGH is in disease genes unrelated to these patients’ phenotypes to now being replaced by highly sensitive and scalable targeted estimate a baseline prevalence. Our data show that a next-generation sequencing (NGS) methods, providing an substantial number of pathogenic variants are in fact opportunity to investigate CNVs broadly across disease intragenic CNVs that have not been routinely investigated – genes.6 9 across a wide range of disease genes in traditional genetic CNVs also explain a proportion of the benign variation in testing. We also show that clinically relevant CNVs are the human genome. Microarray and NGS methods are characteristically different from CNVs that are present as illuminating this variation in healthy individuals, as naturally existing variation in the genome. 1Invitae, San Francisco, CA, USA; 2Volunteer Clinical Faculty, University of California, San Francisco, CA, USA. Correspondence: Swaroop Aradhya ([email protected]) Submitted 10 January 2018; accepted: 22 March 2018 Published online: 12 June 2018 114 Volume 21 | Number 1 | January 2019 | GENETICS in MEDICINE TRUTY et al ARTICLE MATERIALS AND METHODS types of CNVs always confirmed as true events in clinically NGS assay and bioinformatics tested genes. However, we excluded single-exon baseline Each gene on Invitae’s NGS panels was targeted with CNVs flagged as low-quality calls. oligonucleotide baits (Agilent Technologies, Santa Clara, CA; Roche, Pleasanton, CA; IDT, Coralville, IA) to capture RESULTS exons, the 10–20 bases flanking intronic sequences, and Prevalence of intragenic CNVs in a large clinical cohort certain noncoding regions of clinical interest. Baits were We tested various subsets of 1507 genes in 143,515 unrelated iteratively balanced to obtain a minimum of 50× and an individuals referred for diagnostic NGS gene panel testing. A average of 350× depth-of-sequence read coverage. Invitae uses total of ~4.8 million single-gene analyses were completed. NGS assays (not exome–based) containing hundreds of genes Among nearly 8.1 million variants of all types, we identified that constitute many panels and designed to address 2844 intragenic CNVs (1237 distinct events). These CNVs differential diagnoses within clinical specialties. Our bioinfor- accounted for 0.03% of all variants, 3.1% of reported variants, matics pipeline incorporates both community standard and and notably, 9.1% of variants classified as LP/P (Supplemen- custom algorithms to identify single-nucleotide variants tary Table 1 and Supplementary Figure 1). These variants (SNVs) and small indels, large indels, structural variants with were found across 384 genes and included 1810 deletions and breakpoints in target sequences, and exon-level CNVs.9,15 1034 duplications, which together represented a prevalence of 1.9% in this cohort, 4.4% among individuals with at least one Patient samples and analysis reported variant, and more significantly, 9.8% among Blood, saliva, or genomic DNA from patients who were individuals who received a report with an LP/P variant of consented for diagnostic testing were processed by NGS as any type. described previously.9,15 For this study, clinically reported 1234567890():,; CNVs were collected from Invitae’s variant database. In Patterns of intragenic CNV occurrence addition, data from genes unrelated to the presenting clinical CNVs fell into one of three categories—single rare events, phenotype were de-identified for analysis of baseline CNVs common recurrent events, and low-frequency recurrent under institutional review board (IRB) approval (Western events (Fig. 1a). Each category represented approximately IRB, #20161796). A Pearson’s chi-squared test was used to one-third of all observed CNVs. The vast majority of the 384 compute p values for differences in observed CNV counts. genes with CNVs had just one CNV each, but these single Ninety-five percent confidence intervals (CIs) were calculated CNVs together accounted for less than 10% of all events using the Wilson method. (Fig. 1b). By contrast, 31 of the 384 genes had 15 or more CNVs, but these represented nearly 70% of all CNVs. Aside Clinical classification of CNVs from frequencies, the intragenic locations and sizes of CNVs The clinical interpretation of CNVs was based on their were examined, because these properties can determine predicted effects on the transcript and established mutational clinical impact. One-fourth of the CNVs included just one mechanism for each gene. CNVs were classified using the exon. A majority of intragenic CNVs were multi-exonic five-tier system for grading evidence for pathogenicity as partial-gene events, and most encompassed only internal recommended by the American College of Medical Genetics exons without involving the terminal (first or last) coding and Genomics (ACMG).16 To ensure consistent evidence- exons (Fig. 1c,d). Among partial-gene CNVs involving based CNV interpretation and maintain congruence with terminal exons, more deletions than duplications included ACMG’s sequence variant interpretation guidelines, we the first exons, whereas a similar number of deletions and developed a point-based scoring system that includes criteria duplications included the last exons. Finally, a greater for classifying CNVs.17 proportion of duplications than deletions included the full gene. Nearly one-fifth of all distinct (non-redundant) CNVs Orthogonal confirmation included a full gene, and in 40 instances, the CNVs For clinically tested genes, we used array CGH to confirm encompassed several neighboring genes and were present on CNVs if they had been classified as likely pathogenic or at least 10 chromosomes (Supplementary Tables 1, 2). pathogenic (LP/P) or were single-exon CNVs classified as variants of uncertain significance (VUS). Multiplex
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages10 Page
-
File Size-