Professor David Roberts – Thousands of Genetic Variants Modulate Blood
Thousands of genetic variants modulate blood cell variation and function in humans

David J Roberts
University of Oxford and NHSBT

William J. Astle*, Heather Elding*, Tao Jiang*, Dave Allen, Dace Ruklisa, Heleen Bouman, Fernando Riveros-Mckay, Alice L. Mann, Daniel Mead, Myrto A. Kostadima, John J. Lambourne, Suthesh Sivapalaratnam, Kate Downes, Kousik Kundu, Lorenzo Bomba, Kim Berentsen, John R. Bradley, Louise C. Daugherty, Olivier Delaneau, Stephen F. Garner, Luigi Grassi, Matthias Haimel, Eva M. Janssen-Megens, Anita Kaan, Mihir Kamat, Bowon Kim, Amit Mandoli, Jonathan Marchini, Joost H.A. Martens, Stuart Meacham, Karyn Megy, Jared O’Connell, Romina Petersen, Nilofar Sharifi, Simon M. Sheard, James R. Staley, Salih Tuna, Martijn van der Ent, Shuang-Yin Wang, Eleanor Wheeler, Steven P. Wilder, Valentina Iotchkova, Carmel Moore, Jennifer Sambrook, Hendrik G. Stunnenberg, Emanuele Di Angelantonio, Stephen Kaptoge, Taco W. Kuijpers, Mattia Frontini, John Danesh§, David J. Roberts§, Willem H. Ouwehand§, Adam S.
Butterworth§, Nicole Soranzo§

GWAS studies
• Genome-wide association studies (GWAS) are potentially a powerful way to determine the association of a disease or phenotype with genetic variants
• Discovery of novel large effect sizes is now unlikely, but associations may define pathophysiological pathways and therapeutic possibilities
• Notable successes in SCD – association of BCL11A as a negative regulator of HbF

Haematological disorders
• Acquired and inherited haematological diseases are of global importance to public health
• Global anaemia prevalence in 2010 was 32.9%, causing ~70 million years lived with disability
  – Over one billion people suffer from iron deficiency anaemia
  – Haemoglobinopathies are the most common monogenic diseases and have many unknown genetic modifying factors
• Variation in platelet function and number causes abnormal clotting and may contribute to cardiovascular disease
• Variation in neutrophil and monocyte function changes susceptibility to infection and inflammatory conditions

Summary of findings from previous genetic mapping efforts

GWAS discoveries
• 145 loci discovered for red and white cell and platelet traits
[Figure labels: 75, 68]
References: Soranzo et al, Nat Genet (2009); Ganesh et al, Nat Genet (2009); Meisinger et al, AJHG (2009); Soranzo et al, Blood (2009); Nalls et al, PLoS Genet (2011); Gieger et al, Nature (2011); van der Harst et al, Nature (2012)

Gene functions
Nearby genes enriched for relevant GO biological process terms:
• haematopoiesis (FDR ≤ 1E-3; genes involved in the process are RUNX1, TAL1)
• immune system development (2E-3; IFI16, PTPRC)
• oxygen transport (8E-2; HBQ1, HBA1)
• HbF (BCL11A, HBB)
References: Gieger et al, Nature (2011); van der Harst et al, Nature (2012); Vasquez, Mann et al (in press)

Model organisms
Knock-down of 6 genes resulted in a hematologic phenotype
KO models for nearby genes display a hematological phenotype:
• Zebrafish (p-value=0.03)
• Fly (p-value=0.002)
References: Gieger et al, Nature (2011); Serbanovic et al, Blood (2011); Bielczyk-Maczyńska et al, PLoS Genet (2014)

Regulatory and functional
• Enrichment in open chromatin regions (p-value < 10^-3)
• Enrichment in hematopoietic functional maps (p-value < 10^-6)
References: Tijssen et al, Dev Cell (2011); Paul et al, PLoS Genet (2011); Nürnberg et al, Blood (2011); Paul et al, Genome Res (2013)

Disease associations
• Mendelian disorders: Enrichment with causative genes (OMIM)
• Complex: Association with incident and prevalent ischemic stroke (p ≤ 0.004)
References: Vasquez, Mann et al (in press); unpublished

• Increase statistical power for genetic discoveries (of rare variants)
• Characterise the extent to which variants/genes individually modulate the formation of individual cell types and risk of disease
• Annotate the putative functional consequences of variants and link regulatory elements to the genes they control

Exploiting powerful UK population resources
1. Large scale population resources
(N=500,000)
o Multivariate phenotypes
o Environmental exposures
o Linkage to eHRs
(N=50,000)
o Serial and multivariate molecular phenotypes
o Recall by genotype
[Diagram labels: DNA, Epigenetics, RNA, Protein, Metabolites, Disease risk factors, Disease, eHealth records, Serum lipids, Inflammatory panel, NMR, Hematology, DIHRMS, Iron/anemia, Metabolon, Infectious, Questionnaire, Clinic]
Willem Ouwehand, John Danesh, Dave Roberts

Exploiting powerful UK population resources
2. Enhanced imputation reference panels
UK10K + 1000GP (phase 3)
• >70M variants imputed
• Increases in imputation for low frequency and rare variants
• Substantial increases in power below 2.5% MAF
• Comparison with WES in INTERVAL samples gives >90% concordance for rare variants
Walter et al (in press) and Huang*, Howie* et al (in press)

Exploiting powerful UK population resources
3.
High quality, standardised phenotypes
N=100,000 (of 150,000)
Adjust phenotypes to remove environmental variation due to:
• identifiable technical differences between measurements
• biological variation

Accounting for environmental variation
[Figure panels; plotted data points not recoverable]
• Menopausal status: UK Biobank, mean NEUT (10^9 L^-1) against age at acquisition, for males, females, pre-menopausals and post-menopausals
• Acquisition Time Effects: UK Biobank, mean of WBC# (white blood cell count, 10^9 L^-1), Instrument ID AK30431, against delay between venepuncture and acquisition (hours)
• Block Time Series: UK Biobank, mean of PLT# (platelet count, 10^9 L^-1), Instrument ID AK26401, against acquisition time (days into study)
• Other panel titles: Time of Day Effects; Periodic Effects with Annual Period
William Astle, Heleen Bouman

Accounting for environmental variation
[Panel labels: Days into Study; Menopausal status]
• Technical and seasonal effects explain 16% of phenotypic variation
• Environmental and biological effects explain 40% of phenotypic variation
• Estimated power of the study doubled by correction for phenotypic variation

Study Design
Genetic analysis:
1. Genotyping with Affymetrix Axiom arrays
2. Sample and variant QC
3. Imputation to UK10K+1000Genomes Project (phase 3)
4. Sample and variant QC
5. Study-specific association analysis
6. Meta-analysis
7. Multiple Regression Analysis
8. LD-clumping
9. Annotation and integrative analyses
• 36 blood cell traits including platelets, mature and immature red blood cells, myeloid and lymphoid white blood cells
• 29 million imputed variants (>0.01% MAF, >0.4 INFO)
• Linear mixed model using BOLT-LMM, adjusted for age, sex, clinic, menopause and the first 15 PCs
• Meta-analysis using double Genomic Control in METAL

6,736 Associations Discovered at 2,706 Independent Loci
• 2,706 loci (p ≤ 8.31x10^-9). 210 are low frequency (1-5% MAF) and 130 are rare (<1% MAF)
• ~2,400 novel
[Figure legend: Previously reported; Novel independent loci; Non independent loci]
Cell, 2016, under revision

GWAS in UK Biobank reveals hundreds of putative novel hematopoietic regulators
• Analysis of 100,000 UK Biobank participants
• 24 traits, log transformed
• Linear regression, adjusted for gender, age and 15 principal components (PCs)
• 2,706 independent loci (distance-based) at P-value ≤ 9.3x10^-10
• ~2550 novel, replication pending (N~100,000)

Name                                           p-value
blood coagulation                              1.44E-06
coagulation                                    1.53E-06
hemostasis                                     1.60E-06
wound healing                                  4.41E-06

Name                                           p-value
transcription cofactor activity                6.61E-06
iron ion transmembrane transporter activity