Common Homozygosity for Predicted Loss-Of-Function Variants Reveals Both Redundant and Advantageous Effects of Dispensable Human Genes
Total Page:16
File Type:pdf, Size:1020Kb
Common homozygosity for predicted loss-of-function variants reveals both redundant and advantageous effects of dispensable human genes Antonio Rausella,b,1,2, Yufei Luoa,b,1, Marie Lopezc, Yoann Seeleuthnerb,d, Franck Rapaporte, Antoine Faviera,b, Peter D. Stensonf, David N. Cooperf, Etienne Patinc, Jean-Laurent Casanovab,d,e,g,h,2, Lluis Quintana-Murcic,i, and Laurent Abelb,d,e,2 aClinical Bioinformatics Laboratory, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France; bUniversity of Paris, Imagine Institute, 75015 Paris, France; cHuman Evolutionary Genetics Unit, Institut Pasteur, UMR2000, CNRS, Paris 75015, France; dLaboratory of Human Genetics of Infectious Diseases, Necker Branch, INSERM UMR1163, Necker Hospital for Sick Children, 75015 Paris, France; eSt. Giles Laboratory of Human Genetics of Infectious Diseases, Rockefeller Branch, The Rockefeller University, New York, NY 10065; fInstitute of Medical Genetics, School of Medicine, Cardiff University, CF14 4XN Cardiff, United Kingdom; gHoward Hughes Medical Institute, New York, NY 10065; hPediatric Hematology and Immunology Unit, Necker Hospital for Sick Children, 75015 Paris, France; and iHuman Genomics and Evolution, Collège de France, Paris 75005, France Contributed by Jean-Laurent Casanova, January 10, 2020 (sent for review October 24, 2019; reviewed by Philippe Froguel, John M. Greally, and Lennart Hammarström). Humans homozygous or hemizygous for variants predicted to in-frame insertions−deletions (indels), missense variants, splice cause a loss of function (LoF) of the corresponding protein do not region variants not affecting the essential splice regions, and even necessarily present with overt clinical phenotypes. We report here synonymous or deep intronic mutations, that may actually be LoF 190 autosomal genes with 207 predicted LoF variants, for which but cannot be systematically identified as such in silico. the frequency of homozygous individuals exceeds 1% in at least Many predicted LoF variants have nevertheless been confirmed one human population from five major ancestry groups. No such experimentally, typically when they are associated with a clinical genes were identified on the X and Y chromosomes. Manual phenotype. Of the 229,161 variants reported in the Human Gene curation revealed that 28 variants (15%) had been misannotated Mutation Database (HGMD) (7), as many as 99,027 predicted as LoF. Of the 179 remaining variants in 166 genes, only 11 alleles LoF alleles in 5,186 genes have been found to be disease causing GENETICS in 11 genes had previously been confirmed experimentally to be in association and/or functional studies. For example, for the LoF. The set of 166 dispensable genes was enriched in olfactory subset of 253 genes implicated in recessive forms of primary receptor genes (41 genes). The 41 dispensable olfactory receptor immunodeficiencies (8, 9), 12,951 LoF variants are reported in genes displayed a relaxation of selective constraints similar to that HGMD. Conversely, a substantial proportion of genes harboring observed for other olfactory receptor genes. The 125 dispensable nonolfactory receptor genes also displayed a relaxation of selec- biallelic null variants have no discernible associated pathological tive constraints consistent with greater redundancy. Sixty-two of phenotype, and several large-scale sequencing surveys in adults these 125 genes were found to be dispensable in at least three human populations, suggesting possible evolution toward pseudo- Significance genes. Of the 179 LoF variants, 68 could be tested for two neutrality statistics, and 8 displayed robust signals of positive selection. These Human genes homozygous for apparent loss of function (LoF) latter variants included a known FUT2 variant that confers resis- variants are increasingly reported in a sizeable proportion of tance to intestinal viruses, and an APOL3 variant involved in resis- individuals without overt clinical phenotypes. Here, we found tance to parasitic infections. Overall, the identification of 166 genes 166 genes with 179 predicted LoF variants for which the fre- for which a sizeable proportion of humans are homozygous for quency of homozygous individuals exceeds 1% in at least one predicted LoF alleles reveals both redundancies and advantages of of the populations present in databases ExAC and gnomAD. such deficiencies for human survival. These putatively dispensable genes showed relaxation of se- lective constraints, suggesting that a considerable proportion redundancy | pseudogenization | loss of function | positive selection | of these genes may be undergoing pseudogenization. Eight of negative selection these LoF variants displayed robust signals of positive selec- tion, including two variants in genes involved in resistance to he human genome displays considerable DNA sequence diversity infectious diseases. The identification of dispensable genes will Tat the population level. One of its most intriguing features is the facilitate the identification of functions that are now re- homozygosity or hemizygosity for variants of protein-coding genes dundant, or possibly even advantageous, for human survival. predicted to be loss-of-function (LoF) found at various frequencies in – Author contributions: A.R., E.P., J.-L.C., L.Q.-M., and L.A. designed research; A.R., Y.L., different human populations (1 3). An unknown proportion of these M.L., and L.A. performed research; Y.L., M.L., Y.S., F.R., A.F., P.D.S., and D.N.C. contributed reported variants are not actually LoF, instead being hypomorphic or new reagents/analytic tools; A.R., Y.L., M.L., Y.S., and F.R. analyzed data; and A.R., D.N.C., isomorphic, because of a reinitiation of translation, readthrough, or a E.P., J.-L.C., L.Q.-M., and L.A. wrote the paper. redundant tail, resulting in lower, normal, or even higher than normal Reviewers: P.F., Imperial College London; J.M.G., Albert Einstein College of Medicine; and levels of protein function. Indeed, a bona fide nonsense allele, pre- L.H., Karolinska Institutet. Competing interest statement: L.H. coauthored research papers with J.-L.C. in 2017 and dicted to be LoF, can actually be gain-of-function (hypermorphic), as with E.P., J.-L.C., L.Q.-M., and L.A. in 2018. κ α illustrated by I B mutations (4). Moreover, the LoF may apply Published under the PNAS license. selectively to one isoform or a subset of isoforms of a given gene, but 1A.R. and Y.L. contributed equally to this work. not others (e.g., if the exon carrying the premature stop is spliced out 2To whom correspondence may be addressed. Email: [email protected], for a specific set of alternative transcripts) (5). Finally, there are at [email protected], or [email protected]. least 400 discernible cell types in the human body (6), and the mutant This article contains supporting information online at https://www.pnas.org/lookup/suppl/ transcript may be expressed in only a limited number of tissues. doi:10.1073/pnas.1917993117/-/DCSupplemental. Conversely, there are also mutations not predicted to be LoF, such as www.pnas.org/cgi/doi/10.1073/pnas.1917993117 PNAS Latest Articles | 1of11 Downloaded by guest on September 28, 2021 from the general population have reported human genes ap- genes (Table 1 and Dataset S1). No genes on the X or Y chro- parently tolerant to homozygous LoF variants (10–15). Four mosomes fulfilled these criteria. Relaxing the thresholds on the studies in large bottlenecked or consanguineous populations single-nt polymorphism (SNP) and insertions and deletions (indel) detected between 781 and 1,317 genes homozygous for muta- call quality filters (variant quality score recalibration [VQSR] tions predicted to be LoF (11, 12, 14, 15). These studies focused score; Methods), the variant call rate, or the nonrestriction to the principally on low-frequency variants (minor allele frequency < principal isoform did not substantially increase the number of 1 to 5%), and associations with some traits, benign or disease- putatively dispensable genes (SI Appendix, Figs. S1 and S2). The related, were found for a few rare homozygous LoF variants. frequency of homozygous individuals at which a gene is defined as Two studies provided a more comprehensive perspective of the dispensable appeared to be the criterion with the largest impact on allele frequency spectrum of LoF variants. A first systematic the number of dispensable genes identified, thereby justifying the survey of LoF variants, mostly in the heterozygous state, was use of the stringent threshold (>1% homozygotes in at least one performed with whole-genome sequencing data from 185 in- specific population) described above (SI Appendix,Figs.S1andS2). dividuals of the 1000 Genomes Project; it identified 253 genes with homozygous LoF variants (10). In a larger study of more Manual Curation of the LoF Variants. We then investigated the po- than 60,000 whole exomes, the Exome Aggregation Consortium tential molecular consequences of the 207 putative LoF variants (ExAC) database project identified 1,775 genes with at least retained, collectively associated with the 190 genes, by manually one homozygous LoF variant, with a mean of 35 homozygous curating the annotation of these LoF variants. An examination of potential LoF variants per individual (13). the sequencing reads in gnomAD showed that 18 variants—10 These studies clearly confirmed the presence of genes with single-nt variants (SNVs) and 8 indels initially annotated as stop- homozygous LoF variants in apparently healthy humans, but no gains and frameshift variants, respectively—were components of specific study of such variants present in the homozygous state in haplotypes with consequences other than the initial LoF annota- a sizeable proportion (>1%) of large populations has yet been tion (Dataset S1). Thus, for each of the 10 SNVs, a haplotype performed. In principle, these variants may be neutral (indicating encompassing a contiguous second variant led to the creation of a gene redundancy) (16), or may even confer a selective advantage missense rather than a stop variant allele. For each of the eight “ ” (the so-called less is more hypothesis) (3, 17).