Rare Variant Contribution to Human Disease in 281,104 UK Biobank Exomes W 1,19 1,19 2,19 2 2 Quanli Wang , Ryan S

https://doi.org/10.1038/s41586-021-03855-y Accelerated Article Preview Rare variant contribution to human disease W in 281,104 UK Biobank exomes E VI Received: 3 November 2020 Quanli Wang, Ryan S. Dhindsa, Keren Carss, Andrew R. Harper, Abhishek N ag, I oa nn a Tachmazidou, Dimitrios Vitsios, Sri V. V. Deevi, Alex Mackay, EDaniel Muthas, Accepted: 28 July 2021 Michael Hühn, Sue Monkley, Henric O ls so n , S eb astian Wasilewski, Katherine R. Smith, Accelerated Article Preview Published Ruth March, Adam Platt, Carolina Haefliger & Slavé PetrovskiR online 10 August 2021 P Cite this article as: Wang, Q. et al. Rare variant This is a PDF fle of a peer-reviewed paper that has been accepted for publication. contribution to human disease in 281,104 UK Biobank exomes. Nature https:// Although unedited, the content has been subjectedE to preliminary formatting. Nature doi.org/10.1038/s41586-021-03855-y (2021). is providing this early version of the typeset paper as a service to our authors and Open access readers. The text and fgures will undergoL copyediting and a proof review before the paper is published in its fnal form. Please note that during the production process errors may be discovered which Ccould afect the content, and all legal disclaimers apply. TI R A D E T A R E L E C C A Nature | www.nature.com Article Rare variant contribution to human disease in 281,104 UK Biobank exomes W 1,19 1,19 2,19 2 2 https://doi.org/10.1038/s41586-021-03855-y Quanli Wang , Ryan S. Dhindsa , Keren Carss , Andrew R. Harper , Abhishek Nag , 2 2 2 3 E3 Ioanna Tachmazidou , Dimitrios Vitsios , Sri V. V. Deevi , Alex Mackay , Daniel Muthas , Received: 3 November 2020 Michael Hühn3, Sue Monkley3, Henric Olsson3, AstraZeneca Genomics Initiative*,I Accepted: 28 July 2021 Sebastian Wasilewski2, Katherine R. Smith2, Ruth March4, Adam Platt5, Carolina Haefliger2 & 2,6 ✉ V Slavé Petrovski Published online: 10 August 2021 Open access E Genome-wide association studies have uncovered thousands of common variants associated with human disease, but the contribution of rareR variation to common disease remains relatively unexplored. The UK BiobankP (UKB) contains detailed phenotypic data linked to medical records for approximately 500,000 participants, ofering an unprecedented opportunity to evaluate the impact of rare variation on a broad collection of traits1,2. Here, we studiedE the relationships between rare protein-coding variants and 17,361 binaryL and 1,419 quantitative phenotypes using exome sequencing data from 269,171 UKB participants of European ancestry. Gene-based collapsing analysesC revealed 1,703 statistically signifcant gene-phenotype associationsI for binary traits, with a median odds ratio of 12.4. Furthermore, 83% of these associations were undetectable via single variant association tests, emphasizingT the power of gene-based collapsing analysis in the setting of high allelicR heterogeneity. Gene-phenotype associations were also signifcantly enriched for loss-of-function-mediated traits and approved drug targets. Finally, we performedA ancestry-specifc and pan-ancestry collapsing analyses using exome sequencing data from 11,933 UKB participants of African, East Asian, or South Asian ancestry. Together, our results highlight a signifcant contribution of rare variantsD to common disease. Summary statistics are publicly available through an Einteractive portal (http://azphewas.com/). T 16–23 The identification of genetic variants that contribute to human dis- series and can provide a clear link between the causal gene and ease has facilitated the development of highly efficacious and safe phenotype. Applications of these methods to the first 50,000 UKB 3–5 A therapeutics . Drug candidates targeting genes with evidence of exome sequences have indicated an important role of rare variation human disease causality are in fact significantly more likely to be in complex disease but have also highlighted a need for larger sample approved6,7. Exome sequencingR has revolutionized our understand- sizes10,11. ing of rare diseases, uncovering causal rare variants for hundreds of In this study, we perform a phenome-wide association study these disorders. However,E most efforts for complex human diseases (PheWAS) using exome sequence data from 269,171 UKB participants of and traits have relied on genome-wide association studies (GWAS), European ancestry to evaluate the association between protein-coding which focus on commonL variants. Compared with rare variants, com- variants and 17,361 binary and 1,419 quantitative phenotypes. We first mon variants tend to confer smaller effect sizes and can be difficult present the diversity of phenotypes and sequence variation present in to map to causalE genes8 . this cohort. We then perform variant- and gene-level association tests The UKB offers an unprecedented opportunity to assess the con- to identify risk factors across the allele frequency spectrum. Finally, tributionC of both common and rare genetic variation to thousands of we perform additional collapsing analyses in 11,933 individuals of Afri- human traits and diseases1,2,9–13. Testing for the association between can, East Asian, or South Asian genetic ancestry. Using these results, Crare variants and phenotypes is typically performed at the level variant we implement a pan-ancestry analysis of 281,104 UKB participants. or gene level. Gene-level association tests include collapsing analy- Altogether, this study comprehensively examines the contribution of A ses, burden tests, and others14–17. Collapsing analyses are particularly rare protein-coding variation to the genetic architecture of complex well-suited to detect genetic risk for phenotypes driven by an allelic human diseases and quantitative traits. 1Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Waltham, MA, USA. 2Centre for Genomics Research, Discovery Sciences, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK. 3Translational Science and Experimental Medicine, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Gothenburg, Sweden. 4Precision Medicine & Biosamples, Oncology R&D, AstraZeneca, Cambridge, UK. 5Translational Science and Experimental Medicine, Research and Early Development, Respiratory and Immunology, BioPharmaceuticals R&D, AstraZeneca, Cambridge, UK. 6Departments of Medicine and Neurology, University of Melbourne, Royal Melbourne Hospital, Melbourne, Victoria, Australia. 19These authors contributed equally: Quanli Wang, Ryan S. Dhindsa, Keren Carss. *A list of consortium authors and their affiliations appears at the end of the paper. ✉e-mail: [email protected] Nature | www.nature.com | 1 Article model, rare variants accounted for 26% of statistically significant asso- Cohort characteristics ciations. Further, 21% (227/1,088) of binary trait associations and 12% We processed 998 terabytes of raw exome sequence data from 302,355 (1,330/10,770) of quantitative trait associations identified using the UKB participants through a cloud-based bioinformatics pipeline (meth- recessive model were not detected using the dominant model. Associa- ods). Through stringent quality control, we removed samples with low tions with more common variants have been previously published9,12. sequencing quality and from closely related individuals (methods). Effect sizes of significant rare variant associations were substantially To harmonize variable categorisation modes, scaling, and follow-up higher than those of common variants (Wilcox P=1.1 x 10-57) (Fig. 1c). responses inherent to the phenotype data, we developed PEACOK, a While some significant variants are likely in linkage with nearby causalW W modification of the PHESANT package24 (methods). variants, associated PTVs and missense variants often represent the 26 We considered 17,361 binary traits and 1,419 quantitative traits, which causal variant themselves . Notably, associations for 13%E (3/24) and E we categorised into 22 ICD-10-based chapters (Extended Data Fig. 1a, b; 29% (96/326) of the significant PTVs and missense variants, respectively, Supplementary Table 1). We also computed the union of cases across have not been reported in FinnGen release 5, OMIM, IClinVar, or the I similar phenotypes, resulting in 4,911 union phenotypes (methods, GWAS catalogue (Fig. 1d, Supplementary Tables 4,V 5)30–32. V Supplementary Table 1). The median number of European cases per We explored how often significant variant-level associations between binary union phenotype was 191 and the median number of individuals different variants in the same gene have opposing directions of effect tested for quantitative traits was 13,782 (Extended Data Fig. 1c, d). The on a phenotype. Among quantitative trait associationsE with at least five E median number of binary union traits was 25 (Extended Data Fig. 1e). significant non-synonymous variants (MAF<0.1%) in a particular gene, Approximately 95% of the sequenced UKB participants are of Euro- at least 80% of variants had the same directionR of effect (Extended Data R pean ancestry (Extended Data Fig. 1f). This impacts healthcare equity, as Fig. 2b). This is in contrast to disease-associated noncoding variants, the resolution to evaluate variants across the allele frequency spectrum which can variably affect the directionP of gene expression33. P is proportional to the number of sequenced individuals in a popula- We compared the results of our Fisher’s exact tests to regression-based tion. For example, individuals from

Rare Variant Contribution to Human Disease in 281,104 UK Biobank Exomes W 1,19 1,19 2,19 2 2 Quanli Wang , Ryan S

Ensembl Genomes: Extending Ensembl Across the Taxonomic Space P

The ELIXIR Core Data Resources: Fundamental Infrastructure for The

Open Targets Genetics

Select Committee on Science and Technology Corrected Oral Evidence: Ageing: Science, Technology and Healthy Living

(DDD) Project: What a Genomic Approach Can Achieve

Different Evolutionary Patterns of Snps Between Domains and Unassigned Regions in Human Protein‑Coding Sequences

Annual Scientific Report 2013 on the Cover Structure 3Fof in the Protein Data Bank, Determined by Laponogov, I

1 Constructing the Scientific Population in the Human Genome Diversity and 1000 Genome Projects Joseph Vitti I. Introduction: P

NIH-GDS: Genomic Data Sharing

Genome-Wide Rare Variant Analysis for Thousands of Phenotypes in Over 70,000 Exomes from Two Cohorts

Wellcome Trust Annual Report and Financial Statements 2019 Is © the Wellcome Trust and Is Licensed Under Creative Commons Attribution 2.0 UK

UK Biobank: a Model for Public Engagement?

Rare Variant Contribution to Human Disease in 281,104 UK Biobank Exomes W ­ 1,19 1,19 2,19 2 2 Quanli Wang , Ryan S

Rare Variant Contribution to Human Disease in 281,104 UK Biobank Exomes W 1,19 1,19 2,19 2 2 Quanli Wang , Ryan S