Discovery of the First Genome-Wide Significant Risk Loci for ADHD

Total Page:16

File Type:pdf, Size:1020Kb

Discovery of the First Genome-Wide Significant Risk Loci for ADHD Supplementary Information Discovery of the first genome-wide significant risk loci for ADHD Table of Contents Supplementary Methods and Results .......................................................................................... 4 Detailed description of individual Samples ........................................................................................... 4 iPSYCH, Denmark ............................................................................................................................................. 4 Samples from the Psychiatric Genomics Consortium (PGC) ................................................................................ 6 Parent-offspring trio samples .............................................................................................................................. 6 CHOP, USA ....................................................................................................................................................... 6 IMAGE-I, Europe .............................................................................................................................................. 6 PUWMa, USA ................................................................................................................................................... 7 Toronto, Canada .............................................................................................................................................. 8 Case-control samples ........................................................................................................................................... 9 Barcelona, Spain .............................................................................................................................................. 9 Beijing, China ................................................................................................................................................... 9 Bergen, Norway ............................................................................................................................................. 10 Cardiff, UK ..................................................................................................................................................... 10 Germany ........................................................................................................................................................ 11 IMAGE-II, Europe & USA ................................................................................................................................ 12 Yale-Penn, USA .............................................................................................................................................. 13 Replication samples ........................................................................................................................................... 14 23andMe, self-reported ADHD diagnoses ..................................................................................................... 14 EAGLE, ADHD symptom scores ..................................................................................................................... 15 Bioinformatic pipeline for quality control and association analyses .................................................... 17 Pre-imputation quality control .......................................................................................................................... 17 Genotype imputation ......................................................................................................................................... 17 Relatedness and population stratification ......................................................................................................... 18 GWAS and meta-analysis .................................................................................................................... 19 Defining independent genome-wide significant loci ........................................................................... 21 Evaluating putative secondary signals ................................................................................................ 21 Correlation of secondary signals with their respective lead index variants ....................................................... 21 Conditional association analysis ........................................................................................................................ 22 Bayesian credible set analysis ............................................................................................................ 22 Credible set estimation method ........................................................................................................................ 23 Variants considered for credible set analysis ..................................................................................................... 24 Credible set results in PGC and iPSYCH data ...................................................................................................... 25 Functional annotation of variants in credible set .............................................................................................. 25 Gene-based association analysis ........................................................................................................ 27 Exome-wide association of single genes with ADHD ......................................................................................... 27 Gene-wise association of candidate genes for ADHD ........................................................................................ 28 Gene-set analyses .............................................................................................................................. 28 1 Hypothesis free gene set analyses ..................................................................................................................... 28 FOXP2 downstream target gene set analysis ..................................................................................................... 29 Highly constrained gene set analysis ................................................................................................................. 30 LD Score intercept evaluation ............................................................................................................. 30 Genetic correlations between PGC and iPSYCH ADHD samples ........................................................... 31 Genetic correlation between PGC case-control and trio samples ........................................................ 31 Polygenic risk scores for ADHD ........................................................................................................... 32 Polygenic risk score prediction in iPSYCH samples ............................................................................................ 33 Polygenic risk score prediction in PGC samples ................................................................................................. 34 Leave-one-out analysis across cohorts .............................................................................................................. 35 SNP heritability .................................................................................................................................. 35 Partitioning heritability by functional annotation and cell type .......................................................... 36 Genetic correlations of ADHD with other traits .................................................................................. 37 Replication analysis in 23andMe and EAGLE cohorts .......................................................................... 38 Sign test ............................................................................................................................................................. 38 Replication meta-analyses and heterogeneity tests .......................................................................................... 39 Results for replication in 23andMe .................................................................................................................... 40 Results for replication in EAGLE ......................................................................................................................... 41 Results for ADHD+23andMe+EAGLE meta-analysis ........................................................................................... 42 Method for meta-analysis of continuous and dichotomous ADHD measures ...................................... 43 Basic genetic model for latent-scale phenotypes .............................................................................................. 44 Defining independent genetic factors ................................................................................................................ 45 Effects of individual variants .............................................................................................................................. 46 GWAS test statistics for ßj .................................................................................................................................. 48 Test of ßij in GWAS of dichotomous Y1 ............................................................................................................... 50 Test of ß1j in GWAS of continuous Y2 ................................................................................................................
Recommended publications
  • Analysis of Trans Esnps Infers Regulatory Network Architecture
    Analysis of trans eSNPs infers regulatory network architecture Anat Kreimer Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2014 © 2014 Anat Kreimer All rights reserved ABSTRACT Analysis of trans eSNPs infers regulatory network architecture Anat Kreimer eSNPs are genetic variants associated with transcript expression levels. The characteristics of such variants highlight their importance and present a unique opportunity for studying gene regulation. eSNPs affect most genes and their cell type specificity can shed light on different processes that are activated in each cell. They can identify functional variants by connecting SNPs that are implicated in disease to a molecular mechanism. Examining eSNPs that are associated with distal genes can provide insights regarding the inference of regulatory networks but also presents challenges due to the high statistical burden of multiple testing. Such association studies allow: simultaneous investigation of many gene expression phenotypes without assuming any prior knowledge and identification of unknown regulators of gene expression while uncovering directionality. This thesis will focus on such distal eSNPs to map regulatory interactions between different loci and expose the architecture of the regulatory network defined by such interactions. We develop novel computational approaches and apply them to genetics-genomics data in human. We go beyond pairwise interactions to define network motifs, including regulatory modules and bi-fan structures, showing them to be prevalent in real data and exposing distinct attributes of such arrangements. We project eSNP associations onto a protein-protein interaction network to expose topological properties of eSNPs and their targets and highlight different modes of distal regulation.
    [Show full text]
  • Discovery of the First Genome-Wide Significant Risk Loci for Attention Deficit/Hyperactivity Disorder
    ARTICLES https://doi.org/10.1038/s41588-018-0269-7 Discovery of the first genome-wide significant risk loci for attention deficit/hyperactivity disorder Ditte Demontis 1,2,3,69, Raymond K. Walters 4,5,69, Joanna Martin5,6,7, Manuel Mattheisen1,2,3,8,9,10, Thomas D. Als 1,2,3, Esben Agerbo 1,11,12, Gísli Baldursson13, Rich Belliveau5, Jonas Bybjerg-Grauholm 1,14, Marie Bækvad-Hansen1,14, Felecia Cerrato5, Kimberly Chambert5, Claire Churchhouse4,5,15, Ashley Dumont5, Nicholas Eriksson16, Michael Gandal17,18,19,20, Jacqueline I. Goldstein4,5,15, Katrina L. Grasby21, Jakob Grove 1,2,3,22, Olafur O. Gudmundsson13,23,24, Christine S. Hansen 1,14,25, Mads Engel Hauberg1,2,3, Mads V. Hollegaard1,14, Daniel P. Howrigan4,5, Hailiang Huang4,5, Julian B. Maller5,26, Alicia R. Martin4,5,15, Nicholas G. Martin 21, Jennifer Moran5, Jonatan Pallesen1,2,3, Duncan S. Palmer4,5, Carsten Bøcker Pedersen1,11,12, Marianne Giørtz Pedersen1,11,12, Timothy Poterba4,5,15, Jesper Buchhave Poulsen1,14, Stephan Ripke4,5,27, Elise B. Robinson4,28, F. Kyle Satterstrom 4,5,15, Hreinn Stefansson 23, Christine Stevens5, Patrick Turley4,5, G. Bragi Walters 23,24, Hyejung Won 17,18, Margaret J. Wright 29, ADHD Working Group of the Psychiatric Genomics Consortium (PGC)30, Early Lifecourse & Genetic Epidemiology (EAGLE) Consortium30, 23andMe Research Team30, Ole A. Andreassen 31, Philip Asherson32, Christie L. Burton 33, Dorret I. Boomsma34,35, Bru Cormand36,37,38,39, Søren Dalsgaard 11, Barbara Franke40, Joel Gelernter 41,42, Daniel Geschwind 17,18,19, Hakon Hakonarson43, Jan Haavik44,45, Henry R.
    [Show full text]
  • Multiplexed Engineering Glycosyltransferase Genes in CHO Cells Via Targeted Integration for Producing Antibodies with Diverse Complex‑Type N‑Glycans Ngan T
    www.nature.com/scientificreports OPEN Multiplexed engineering glycosyltransferase genes in CHO cells via targeted integration for producing antibodies with diverse complex‑type N‑glycans Ngan T. B. Nguyen, Jianer Lin, Shi Jie Tay, Mariati, Jessna Yeo, Terry Nguyen‑Khuong & Yuansheng Yang* Therapeutic antibodies are decorated with complex‑type N‑glycans that signifcantly afect their biodistribution and bioactivity. The N‑glycan structures on antibodies are incompletely processed in wild‑type CHO cells due to their limited glycosylation capacity. To improve N‑glycan processing, glycosyltransferase genes have been traditionally overexpressed in CHO cells to engineer the cellular N‑glycosylation pathway by using random integration, which is often associated with large clonal variations in gene expression levels. In order to minimize the clonal variations, we used recombinase‑mediated‑cassette‑exchange (RMCE) technology to overexpress a panel of 42 human glycosyltransferase genes to screen their impact on antibody N‑linked glycosylation. The bottlenecks in the N‑glycosylation pathway were identifed and then released by overexpressing single or multiple critical genes. Overexpressing B4GalT1 gene alone in the CHO cells produced antibodies with more than 80% galactosylated bi‑antennary N‑glycans. Combinatorial overexpression of B4GalT1 and ST6Gal1 produced antibodies containing more than 70% sialylated bi‑antennary N‑glycans. In addition, antibodies with various tri‑antennary N‑glycans were obtained for the frst time by overexpressing MGAT5 alone or in combination with B4GalT1 and ST6Gal1. The various N‑glycan structures and the method for producing them in this work provide opportunities to study the glycan structure‑and‑function and develop novel recombinant antibodies for addressing diferent therapeutic applications.
    [Show full text]
  • Glycosyltransferase B4GALNT2 As a Predictor of Good Prognosis in Colon Cancer: Lessons from Databases
    International Journal of Molecular Sciences Article Glycosyltransferase B4GALNT2 as a Predictor of Good Prognosis in Colon Cancer: Lessons from Databases Michela Pucci, Nadia Malagolini and Fabio Dall’Olio * Department of Experimental, Diagnostic and Specialty Medicine (DIMES), General Pathology Building, University of Bologna, Via San Giacomo 14, 40126 Bologna, Italy; [email protected] (M.P.); [email protected] (N.M.) * Correspondence: [email protected]; Tel.: +39-051-2094704 Abstract: Background: glycosyltransferase B4GALNT2 and its cognate carbohydrate antigen Sda are highly expressed in normal colon but strongly downregulated in colorectal carcinoma (CRC). We previously showed that CRC patients expressing higher B4GALNT2 mRNA levels displayed longer survival. Forced B4GALNT2 expression reduced the malignancy and stemness of colon cancer cells. Methods: Kaplan–Meier survival curves were determined in “The Cancer Genome Atlas” (TCGA) COAD cohort for several glycosyltransferases, oncogenes, and tumor suppressor genes. Whole expression data of coding genes as well as miRNA and methylation data for B4GALNT2 were downloaded from TCGA. Results: the prognostic potential of B4GALNT2 was the best among the glycosyltransferases tested and better than that of many oncogenes and tumor suppressor genes; high B4GALNT2 expression was associated with a lower malignancy gene expression profile; differential methylation of an intronic B4GALNT2 gene position and miR-204-5p expression play major roles in B4GALNT2 regulation. Conclusions: high B4GALNT2 expression is a strong predictor of good prognosis in CRC as a part of a wider molecular signature that includes ZG16, ITLN1, BEST2, and Citation: Pucci, M.; Malagolini, N.; GUCA2B. Differential DNA methylation and miRNA expression contribute to regulating B4GALNT2 Dall’Olio, F.
    [Show full text]
  • Regulation and Function of Silayltransferases in Pancreatic Cancer
    REGULATION AND FUNCTION OF SILAYLTRANSFERASES IN PANCREATIC CANCER Sònia Bassagañas i Puigdemont Dipòsit legal: Gi. 240-2015 http://hdl.handle.net/10803/285783 ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs. ADVERTENCIA. El acceso a los contenidos de esta tesis doctoral y su utilización debe respetar los derechos de la persona autora. Puede ser utilizada para consulta o estudio personal, así como en actividades o materiales de investigación y docencia en los términos establecidos en el art. 32 del Texto Refundido de la Ley de Propiedad Intelectual (RDL 1/1996). Para otros usos se requiere la autorización previa y expresa de la persona autora.
    [Show full text]
  • TRAPPC9-Related Autosomal Recessive Intellectual Disability: Report of a New Mutation and Clinical Phenotype
    European Journal of Human Genetics (2013) 21, 229–232 & 2013 Macmillan Publishers Limited All rights reserved 1018-4813/13 www.nature.com/ejhg SHORT REPORT TRAPPC9-related autosomal recessive intellectual disability: report of a new mutation and clinical phenotype Giuseppe Marangi1, Vincenzo Leuzzi2, Filippo Manti2, Serena Lattante1, Daniela Orteschi1, Vanna Pecile3, Giovanni Neri1 and Marcella Zollino*,1 Intellectual disability (ID) with autosomal recessive (AR) inheritance is believed to be common; however, very little is known about causative genes and genotype–phenotype correlations. The broad genetic heterogeneity of AR-ID, and its usually nonsyndromic nature make it difficult to pool multiple pedigrees with the same underlying genetic defect to achieve consistent nosology. Nearly all autosomal genes responsible for recessive cognitive disorders have been identified in large consanguineous families from the Middle East, and nonsense mutations in TRAPPC9 have been reported in a total of 5. Although several recurrent phenotypic abnormalities are described in some of these patients, the associated phenotype is usually referred to as nonsyndromic. By means of single-nucleotide polymorphism-array first and then by exome sequencing, we identified a new pathogenic mutation in TRAPPC9 in two Italian sisters born to healthy and apparently nonconsanguineous parents. It consists of a homozygous splice site mutation causing exon skipping with frameshift and premature termination, as confirmed by mRNA sequencing. By detailed phenotypic analysis of our patients, and by critical literature review, we found that homozygous TRAPPC9 loss-of-function mutations cause a distinctive phenotype, characterized by peculiar facial appearance, obesity, hypotonia (all signs resembling a Prader–Willi-like phenotype), moderate-to-severe ID, and consistent brain abnormalities.
    [Show full text]
  • Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes
    Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes Slave´ Petrovski1,2*, Quanli Wang1, Erin L. Heinzen1,3, Andrew S. Allen1,4, David B. Goldstein1* 1 Center for Human Genome Variation, Duke University, School of Medicine, Durham, North Carolina, United States of America, 2 Departments of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia, 3 Department of Medicine, Section of Medical Genetics, Duke University, School of Medicine, Durham, North Carolina, United States of America, 4 Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America Abstract A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease.
    [Show full text]
  • Acetyl-Coa Flux from the Cytosol to the ER Regulates Engagement And
    www.nature.com/scientificreports OPEN Acetyl‑CoA fux from the cytosol to the ER regulates engagement and quality of the secretory pathway Inca A. Dieterich1,2,3,11, Yusi Cui4,11, Megan M. Braun1,2,3, Alexis J. Lawton5,6, Nicklaus H. Robinson1,2, Jennifer L. Peotter5, Qing Yu4,10, Jason C. Casler7, Benjamin S. Glick7, Anjon Audhya5, John M. Denu5,6, Lingjun Li4* & Luigi Puglielli1,2,8,9* Nε‑lysine acetylation in the ER is an essential component of the quality control machinery. ER acetylation is ensured by a membrane transporter, AT‑1/SLC33A1, which translocates cytosolic acetyl‑ CoA into the ER lumen, and two acetyltransferases, ATase1 and ATase2, which acetylate nascent polypeptides within the ER lumen. Dysfunctional AT‑1, as caused by gene mutation or duplication events, results in severe disease phenotypes. Here, we used two models of AT‑1 dysregulation to investigate dynamics of the secretory pathway: AT‑1 sTg, a model of systemic AT‑1 overexpression, and AT‑1S113R/+, a model of AT‑1 haploinsufciency. The animals displayed reorganization of the ER, ERGIC, and Golgi apparatus. In particular, AT‑1 sTg animals displayed a marked delay in Golgi‑to‑ plasma membrane protein trafcking, signifcant alterations in Golgi‑based N‑glycan modifcation, and a marked expansion of the lysosomal network. Collectively our results indicate that AT‑1 is essential to maintain proper organization and engagement of the secretory pathway. Nε-lysine acetylation in the Endoplasmic Reticulum (ER) has emerged as an essential component of the qual- ity control (QC) machinery that maintains protein homeostasis (proteostasis) within the ER1–6.
    [Show full text]
  • Altered Glycosylation in Pancreatic Cancer: Development of New Tumor Markers and Therapeutic Strategies
    ALTERED GLYCOSYLATION IN PANCREATIC CANCER: DEVELOPMENT OF NEW TUMOR MARKERS AND THERAPEUTIC STRATEGIES Pedro Enrique Guerrero Barrado Per citar o enllaçar aquest document: Para citar o enlazar este documento: Use this url to cite or link to this publication: http://hdl.handle.net/10803/671007 ADVERTIMENT. L'accés als continguts d'aquesta tesi doctoral i la seva utilització ha de respectar els drets de la persona autora. Pot ser utilitzada per a consulta o estudi personal, així com en activitats o materials d'investigació i docència en els termes establerts a l'art. 32 del Text Refós de la Llei de Propietat Intel·lectual (RDL 1/1996). Per altres utilitzacions es requereix l'autorització prèvia i expressa de la persona autora. En qualsevol cas, en la utilització dels seus continguts caldrà indicar de forma clara el nom i cognoms de la persona autora i el títol de la tesi doctoral. No s'autoritza la seva reproducció o altres formes d'explotació efectuades amb finalitats de lucre ni la seva comunicació pública des d'un lloc aliè al servei TDX. Tampoc s'autoritza la presentació del seu contingut en una finestra o marc aliè a TDX (framing). Aquesta reserva de drets afecta tant als continguts de la tesi com als seus resums i índexs. ADVERTENCIA. El acceso a los contenidos de esta tesis doctoral y su utilización debe respetar los derechos de la persona autora. Puede ser utilizada para consulta o estudio personal, así como en actividades o materiales de investigación y docencia en los términos establecidos en el art. 32 del Texto Refundido de la Ley de Propiedad Intelectual (RDL 1/1996).
    [Show full text]
  • Investigating Developmental and Epileptic Encephalopathy Using Drosophila Melanogaster
    International Journal of Molecular Sciences Review Investigating Developmental and Epileptic Encephalopathy Using Drosophila melanogaster Akari Takai 1 , Masamitsu Yamaguchi 2,3, Hideki Yoshida 2 and Tomohiro Chiyonobu 1,* 1 Department of Pediatrics, Graduate School of Medical Science, Kyoto Prefectural University of Medicine, Kyoto 602-8566, Japan; [email protected] 2 Department of Applied Biology, Kyoto Institute of Technology, Matsugasaki, Sakyo-ku, Kyoto 603-8585, Japan; [email protected] (M.Y.); [email protected] (H.Y.) 3 Kansai Gakken Laboratory, Kankyo Eisei Yakuhin Co. Ltd., Kyoto 619-0237, Japan * Correspondence: [email protected] Received: 15 August 2020; Accepted: 1 September 2020; Published: 3 September 2020 Abstract: Developmental and epileptic encephalopathies (DEEs) are the spectrum of severe epilepsies characterized by early-onset, refractory seizures occurring in the context of developmental regression or plateauing. Early infantile epileptic encephalopathy (EIEE) is one of the earliest forms of DEE, manifesting as frequent epileptic spasms and characteristic electroencephalogram findings in early infancy. In recent years, next-generation sequencing approaches have identified a number of monogenic determinants underlying DEE. In the case of EIEE, 85 genes have been registered in Online Mendelian Inheritance in Man as causative genes. Model organisms are indispensable tools for understanding the in vivo roles of the newly identified causative genes. In this review, we first present an overview of epilepsy and its genetic etiology, especially focusing on EIEE and then briefly summarize epilepsy research using animal and patient-derived induced pluripotent stem cell (iPSC) models. The Drosophila model, which is characterized by easy gene manipulation, a short generation time, low cost and fewer ethical restrictions when designing experiments, is optimal for understanding the genetics of DEE.
    [Show full text]
  • Mammalian Glycosylation in Immunity
    REVIEWS Mammalian glycosylation in immunity Jamey D. Marth and Prabhjit K. Grewal Abstract | Glycosylation produces a diverse and abundant repertoire of glycans, which are collectively known as the glycome. Glycans are one of the four fundamental macromolecular components of all cells, and are highly regulated in the immune system. Their diversity reflects their multiple biological functions that encompass ligands for proteinaceous receptors known as lectins. Since the discovery that selectins and their glycan ligands are important for the regulation of leukocyte trafficking, it has been shown that additional features of the vertebrate immune system are also controlled by endogenous cellular glycosylation. This Review focuses on the emerging immunological roles of the mammalian glycome. Glycome Glycosylation is the enzymatic process that produces that is formed by Ogt, secretory glycosylation in the The biological repertoire of glycosidic linkages of saccharides to other saccharides, ER and Golgi produces a large structural repertoire, glycan structures. proteins and lipids, and is probably as ancient as life including oligomeric glycan linkages that are presented itself. Unicellular and multicellular organisms depend at the cell surface and in extracellular compartments. on glycosylation to produce monomeric and multimeric Intracellular glycosylation has been reviewed elsewhere glycan linkages that are essential for cell viability and and is not discussed further in this Review12. normal function1–4. The resulting glycome encompasses a Glycosylation can substantially modify the structure diverse and abundant repertoire of glycans, which are one and function of proteins by steric influences involving of the four fundamental macromolecular com ponents of intermolecular and intramolecular interactions and all cells (together with nucleic acids, proteins and lipids) can mediate the production of glycan ligands for lectins (FIG.
    [Show full text]
  • Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes
    Genic Intolerance to Functional Variation and the Interpretation of Personal Genomes Slave´ Petrovski1,2*, Quanli Wang1, Erin L. Heinzen1,3, Andrew S. Allen1,4, David B. Goldstein1* 1 Center for Human Genome Variation, Duke University, School of Medicine, Durham, North Carolina, United States of America, 2 Departments of Medicine, The University of Melbourne, Austin Health and Royal Melbourne Hospital, Melbourne, Victoria, Australia, 3 Department of Medicine, Section of Medical Genetics, Duke University, School of Medicine, Durham, North Carolina, United States of America, 4 Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, United States of America Abstract A central challenge in interpreting personal genomes is determining which mutations most likely influence disease. Although progress has been made in scoring the functional impact of individual mutations, the characteristics of the genes in which those mutations are found remain largely unexplored. For example, genes known to carry few common functional variants in healthy individuals may be judged more likely to cause certain kinds of disease than genes known to carry many such variants. Until now, however, it has not been possible to develop a quantitative assessment of how well genes tolerate functional genetic variation on a genome-wide scale. Here we describe an effort that uses sequence data from 6503 whole exome sequences made available by the NHLBI Exome Sequencing Project (ESP). Specifically, we develop an intolerance scoring system that assesses whether genes have relatively more or less functional genetic variation than expected based on the apparently neutral variation found in the gene. To illustrate the utility of this intolerance score, we show that genes responsible for Mendelian diseases are significantly more intolerant to functional genetic variation than genes that do not cause any known disease, but with striking variation in intolerance among genes causing different classes of genetic disease.
    [Show full text]