PECONPI: a Novel Software for Uncovering Pathogenic Copy

RESEARCH ARTICLE PECONPI: A Novel Software for Uncovering Pathogenic Copy Number Variations in Non-Syndromic Sensorineural Hearing Loss and Other Genetically Heterogeneous Disorders Ellen A. Tsai,1,2 Micah A. Berman,3 Laura K. Conlin,2 Heidi L. Rehm,4 Lauren J. Francey,3 Matthew A. Deardorff,3 Jenelle Holst,3 Maninder Kaur,3 Emily Gallant,3 Dinah M. Clark,3 Joseph T. Glessner,1,5 Shane T. Jensen,6 Struan F.A. Grant,5 Peter J. Gruber,7 Hakon Hakonarson,5 Nancy B. Spinner,2 and Ian D. Krantz3* 1Genomics and Computational Biology Graduate Group, University of Pennsylvania, Philadelphia, Pennsylvania 2Department of Pathology and Laboratory Medicine, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 3Division of Human Genetics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 4Laboratory for Molecular Medicine, Partners Healthcare Center for Personalized Genetic Medicine, Cambridge, Massachusetts 5The Center for Applied Genomics, The Children’s Hospital of Philadelphia, Philadelphia, Pennsylvania 6The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 7Cardiovascular and Thoracic Surgery, Primary Children’s Medical Center, Salt Lake City, Utah Manuscript Received: 2 November 2012; Manuscript Accepted: 11 April 2013 This report describes an algorithm developed to predict the pathogenicity of copy number variants (CNVs) in large sample How to Cite this Article: cohorts. CNVs (genomic deletions and duplications) are found Tsai EA, Berman MA, Conlin LK, Rehm in healthy individuals and in individuals with genetic diagno- HL, Francey LJ, Deardorff MA, Holst J, ses, and differentiation of these two classes of CNVs can be Kaur M, Gallant E, Clark DM, Glessner JT, challenging and usually requires extensive manual curation. Jensen ST, Grant SFA, Gruber PJ, We have developed PECONPI, an algorithm to assess the Hakonarson H, Spinner NB, Krantz ID. pathogenicity of CNVs based on gene content and CNV fre- 2013. PECONPI: A novel software for quency. This software was applied to a large cohort of patients uncovering pathogenic copy number with genetically heterogeneous non-syndromic hearing loss to variations in non-syndromic sensorineural score and rank each CNV based on its relative pathogenicity. Of hearing loss and other genetically 636 individuals tested, we identified the likely underlying heterogeneous disorders. etiology of the hearing loss in 14 (2%) of the patients (1 with a homozygous deletion, 7 with a deletion of a known Am J Med Genet Part A 161A:2134–2147. hearing loss gene and a point mutation on the trans allele and 6 with a deletion larger than 1 Mb). We also identified two probands with smaller deletions encompassing genes that Conflict of interest: none. may be functionally related to their hearing loss. The ability Ellen A. Tsai, Micah A. Berman, Laura K. Conlin contributed equally to of PECONPI to determine the pathogenicity of CNVs was this work. tested on a second genetically heterogenous cohort with con- Grant sponsor: NIH/NIDCD; Grant numbers: R01DC005247; R33DC008630; U01HG006546; Grant sponsor: Ring Chromosome 20 genital heart defects (CHDs). It successfully identified a likely Foundation Grant; Grant numbers: T32HG000046; T32DC005363; etiology in 6 of 355 individuals (2%). We believe this tool is T32GM008638. Ã useful for researchers with large genetically heterogeneous Correspondence to: cohorts to help identify known pathogenic causes and novel Dr. Ian D. Krantz, Genomics and Computational Biology Graduate disease genes. Ó 2013 Wiley Periodicals, Inc. Group, University of California, 423 Guardian Drive, Philadelphia, PA 19104. E-mail: [email protected] Article first published online in Wiley Online Library (wileyonlinelibrary.com): 29 July 2013 Key words: copy number variation; genetic heterogeneity DOI 10.1002/ajmg.a.36038 Ó 2013 Wiley Periodicals, Inc. 2134 TSAI ET AL. 2135 INTRODUCTION tion). The ability to delineate pathogenic CNVs is important for gene discovery in heterogeneous disorders. The contribution of a The recent advent of high resolution, genome-wide copy number deleterious CNV containing a dominant disease gene (resulting analysis using chromosomal microarray platforms to detect small in haploinsufficiency) to a disorder is direct and clear, however a genomic deletions and duplications has revolutionized both the heterozygous deletion of a gene can also unmask a recessive diagnosis of genetic disorders and the discovery of disease genes mutation on the trans allele and lead to the discovery of a new [Vissers et al., 2004; Beaudet and Belmont, 2008]. Copy number genetic locus for a disorder. variations (CNVs) can be detected by both comparative genomic In this study, we demonstrate the utility of PECONPI to identify hybridization (CGH) and single nucleotide polymorphism (SNP) pathogenic gene deletions in NSSNHL. We determined a likely array platforms, which share the ability to scan across the genome at underlying molecular diagnosis in 14 cases of NSSNHL as well as high resolution. Multiple studies have shown that there is consid- some candidate genes that require further validation. Our work erable copy number variation among normal human subjects, with these common, genetically heterogeneous disorders may have comprising a significant percentage of the genome at the structural relevance for investigators using genome-wide CNV analysis to level [Iafrate et al., 2004; Sebat et al., 2004; Tuzun et al., 2005; study common, genetically complex human traits such as autism Conrad et al., 2006; McCarroll et al., 2006; Redon et al., 2006; and schizophrenia [Sebat et al., 2007; Szatmari et al., 2007; Marshall Sebat, 2007]. Therefore, it has become clear that CNVs can occur in et al., 2008; Walsh et al., 2008; Weiss et al., 2008]. genes but still be clinically benign, while the deletion or duplication of other genes may directly cause disease. CNV-calling algorithms have been developed to automate the process of detecting CNV SUBJECTS AND METHODS intervals from array hybridization [Komura et al., 2006; Colella et al., 2007; Marioni et al., 2007; Wang et al., 2007], however; these Sample Selection detection algorithms do not discern the benign from the pathogenic NSSNHL cohort. The bilateral sensorineural hearing loss co- CNVs. hort is comprised of 636 children with NSSNHL presenting at The CNV analysis in large cohorts of individuals with specific phe- Genetics for Hearing Loss Clinic at CHOP for a genetic evaluation notypes can be used to uncover deletions or duplications of specific of their hearing loss or through the Laboratory of Molecular genes that contribute to the phenotype. This is particularly useful Medicine (LMM) at the Harvard Medical School—Partners for phenotypes that are genetically heterogeneous, since CNV Healthcare Center for Genetics and Genomics (HPCGG). All detection provides an unbiased analysis across the entire genome. probands and families studied were either voluntarily enrolled in For Mendelian disorders, the deletion of a causative gene within a an IRB-approved protocol of informed consent (CHOP, LMM) or pathogenic CNV may be a rare event. However, the identification of analyzed through an anonymous discarded tissue IRB-approved rare deletion events can lead to the discovery of candidate genes to protocol (LMM). All probands studied had a confirmed non- be screened for point mutations, a more common form of gene syndromic bilateral sensorineural hearing loss as based upon disruption. It is laborious to peruse through CNV data for each clinical history, exam, review of records and audiometric testing. patient of a large cohort. Moreover, the discrimination of patho- Other pre-screening and selection factors are described in the genic CNVs from benign CNVs is not always clear. The large Supplementary Information. All individuals with confirmed ho- number of CNVs identified per individual (24 CNVs on average mozygous or compound heterozygous causative mutations were on a half-million marker SNP array) [Redon et al., 2006] and cohort excluded from the analysis, with the exception of patients with GJB6 sizes spanning many hundreds or thousands of patients turns the deletions, who were retained in the cohort as positive controls. process of unmasking pathogenic CNVs into a high-throughput CHD cohort. This cohort consists of 355 probands with ap- task. We have developed Perl Copy Numbers of Potential Interest parent isolated congenital heart defects (CHDs) ascertained (PECONPI), an algorithm to ranks CNVs to help researchers through the Department of Cardiothoracic Surgery at CHOP prioritize their search for the pathogenic CNVs. PECONPI utilizes and voluntarily enrolled in an IRB-approved protocol of informed the properties of each CNV to rank them based on their potential to consent. All probands had standard cytogenetic analysis performed be damaging. We developed a logistic regression based algorithm to through G-banding and a subset had targeted fluorescent in situ measure the performance of PECONPI’s heuristic algorithm. hybridization (FISH) or CGH studies performed when requested Both algorithms were benchmarked on a cohort of 100 clinically. This cohort was selected due to the genetic heterogeneity patients with known pathogenic CNVs as described in Supple- of CHDs, with some being caused by large CNVs (e.g., 22q11.21 mentary Information. For this study, we chose to focus on only deletion) and most being of unknown etiology, echoing the issues the copy number losses (deletions) and not duplications, as gene faced in the

PECONPI: a Novel Software for Uncovering Pathogenic Copy

Genetic Variation Across the Human Olfactory Receptor Repertoire Alters Odor Perception

A Computational Approach for Defining a Signature of Β-Cell Golgi Stress in Diabetes Mellitus

Identification of Potential Markers for Type 2 Diabetes Mellitus Via Bioinformatics Analysis

Genome-Wide Linkage and Admixture Mapping of Type 2 Diabetes In

The Adaptation and Changes of Titin System Following Exercise Training Hassan Farhadi 1 and Mir Hojjat Mosavineghad 2

Supplementary Figure S4

LETTER Doi:10.1038/Nature09515

A Population-Specific Major Allele Reference Genome from the United

Crosstalk in Competing Endogenous RNA Network Reveals the Complex Molecular Mechanism Underlying Lung Cancer

Rabbit Anti-OR13C9 Polyclonal Antibody - Middle Region

Supplementary Table 1

Misexpression of Cancer/Testis (Ct) Genes in Tumor Cells and the Potential Role of Dream Complex and the Retinoblastoma Protein Rb in Soma-To-Germline Transformation