Thousands of genetic variants modulate blood cell variation and function in humans

David J Roberts, University of Oxford and NHSBT

William J. Astle*, Heather Elding*, Tao Jiang*, Dave Allen, Dace Ruklisa, Heleen Bouman, Fernando Riveros-Mckay, Alice L. Mann, Daniel Mead, Myrto A. Kostadima, John J. Lambourne, Suthesh Sivapalaratnam, Kate Downes, Kousik Kundu, Lorenzo Bomba, Kim Berentsen, John R. Bradley, Louise C. Daugherty, Olivier Delaneau, Stephen F. Garner, Luigi Grassi, Matthias Haimel, Eva M. Janssen-Megens, Anita Kaan, Mihir Kamat, Bowon Kim, Amit Mandoli, Jonathan Marchini, Joost H.A. Martens, Stuart Meacham, Karyn Megy, Jared O’Connell, Romina Petersen, Nilofar Sharifi, Simon M. Sheard, James R. Staley, Salih Tuna, Martijn van der Ent, Shuang-Yin Wang, Eleanor Wheeler, Steven P. Wilder, Valentina Iotchkova, Carmel Moore, Jennifer Sambrook, Hendrik G. Stunnenberg, Emanuele Di Angelantonio, Stephen Kaptoge, Taco W. Kuijpers, Mattia Frontini, John Danesh§, David J. Roberts§, Willem H. Ouwehand§, Adam S. Butterworth§, Nicole Soranzo§

GWAS studies

• Genome wide association studies are potentially a powerful way to determine the association of disease or phenotype with genetic traits
• Discovery of novel large effect sizes is now unlikely, but associations may define pathophysiological pathways and therapeutic possibilities
• Notable successes in sickle cell disease (SCD): association of BCL11A as negative regulator of HbF

Haematological disorders

• Acquired and inherited haematological diseases are of global importance to public health
• Global anemia prevalence in 2010 was 32.9%, causing ~70 million years lived with disability
  – Over one billion people suffer from iron deficiency anaemia
  – Haemoglobinopathies are the most common monogenic diseases and have many unknown genetic modifying factors
• Variation in platelet function and number causes abnormal clotting and may contribute to cardiovascular disease
• Variation in neutrophil and monocyte function changes susceptibility to infection and inflammatory conditions

Summary of findings from previous genetic mapping efforts

GWAS discoveries
• 145 loci discovered for red and white cell and platelet traits
[Figure labels: 75, 68]
References: Soranzo et al, Nat Genet (2009); Ganesh et al, Nat Genet (2009); Meisinger et al, AJHG (2009); Soranzo et al, Blood (2009); Nalls et al, PLoS Genet (2011); Gieger et al, Nature (2011); van der Harst et al, Nature (2012)

Functions
Nearby genes enriched for relevant GO biological process terms
• haematopoiesis (FDR ≤ 1E-3; genes involved in the process are RUNX1, TAL1)
• immune system development (2E-3; IFI16, PTPRC)
• oxygen transport (8E-2; HBQ1, HBA1)
• HbF (BCL11A, HBB)
References: Gieger et al, Nature (2011); van der Harst et al, Nature (2012); Vasquez, Mann et al (in press)

Model organisms
Knock-down of 6 genes resulted in a hematologic phenotype
KO models for nearby genes display a hematological phenotype
• Zebrafish (p-value=0.03)
• Fly (p-value=0.002)
References: Gieger et al, Nature (2011); Serbanovic et al, Blood (2011); Bielczyk-Maczyńska et al, PLoS Genet (2014)

Regulatory and functional
• Enrichment in open chromatin regions (p-value<10^-3)
• Enrichment in hematopoietic functional maps (p-value<10^-6)
References: Tijssen et al, Dev Cell (2011); Paul et al, PLoS Genet (2011); Nürnberg et al, Blood (2011); Paul et al, Genome Res (2013)

Disease associations
• Mendelian disorders: Enrichment with causative genes (OMIM)
• Complex: Association with incident and prevalent ischemic stroke (p≤0.004)
References: Vasquez, Mann et al (in press); unpublished

Increase statistical power for genetic discoveries (of rare variants)

Characterise the extent to which variants/genes individually modulate the formation of individual cell types and risk of disease

Annotate the putative functional consequences of variants and link regulatory elements to the genes they control

Exploiting powerful UK population resources

1. Large scale population resources

o Multivariate phenotypes
o Environmental exposures
o Linkage to eHRs
(N=500,000)

[Figure: diagram with the labels Epigenetics, Disease risk factors, DNA, Disease, RNA, Metabolites, eHealth records, Serum lipids, Inflammatory, NMR panel, Hematology, DIHRMS, Iron/anemia, Metabolon, Infectious, Questionnaire, Clinic]

(N=50,000)
o Serial and multivariate molecular phenotypes
o Recall by genotype

Willem Ouwehand, John Danesh, Dave Roberts

2. Enhanced imputation reference panels

UK10K + 1000GP (phase 3)

• >70M variants imputed

• Increases in imputation for low frequency and rare variants

• Substantial increases in power for variants with MAF <2.5%

• Comparison with WES in INTERVAL samples gives >90% concordance for rare variants

Walter et al (in press) and Huang*, Howie* et al (in press)

3. High quality, standardised phenotypes

Adjust phenotypes to remove environmental variation due to:
• identifiable technical differences between measurements
• biological variation

N=100,000 (of 150,000)

Accounting for environmental variation

[Figure: UK Biobank mean neutrophil count (NEUT, 10^9 L^-1) and mean white blood cell count (WBC#, 10^9 L^-1; instrument ID AK30431), plotted against age at acquisition for males, females, pre-menopausals and post-menopausals, and against delay between venepuncture and acquisition (hours). Panel titles: Menopausal status; Time of Day Effects; Acquisition Time Effects; Periodic Effects with Annual Period; Block Time Series.]

[Figure: UK Biobank mean platelet count (PLT#, 10^9 L^-1; instrument ID AK26401), plotted against acquisition time (days into study).]

William Astle, Heleen Bouman
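The slides do not say how the technical adjustment was implemented. As a minimal sketch of the general idea, assuming the simplest possible model in which each block of measurements (for example, one instrument or one day) adds a constant offset, the offset can be removed by re-centring every block on the grand mean. The function name, the example values and the instrument labels "A" and "B" are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def adjust_for_blocks(values, blocks):
    """Remove block-level mean shifts (e.g. per instrument or per day).

    values: list of measurements (floats)
    blocks: list of hashable block labels, one per measurement
    Returns the measurements re-centred so every block shares the grand mean.
    """
    grand_mean = mean(values)
    by_block = defaultdict(list)
    for value, block in zip(values, blocks):
        by_block[block].append(value)
    block_mean = {block: mean(vals) for block, vals in by_block.items()}
    # Subtract each block's own mean, then restore the overall level.
    return [v - block_mean[b] + grand_mean for v, b in zip(values, blocks)]


# Hypothetical neutrophil counts measured on two instruments, "A" and "B".
counts = [4.0, 4.2, 5.0, 5.2]
instruments = ["A", "A", "B", "B"]
adjusted = adjust_for_blocks(counts, instruments)
```

A real adjustment would also model smooth effects such as time of day, delay before acquisition and season, which a constant offset per block cannot capture.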

• Technical and seasonal effects explain 16% of phenotypic variation
• Environmental and biological effects explain 40% of phenotypic variation
• Estimated power of study doubled by correcting for these effects

Study Design

Genetic analysis: Genotyping with Affymetrix Axiom arrays → Sample and variant QC → Imputation to UK10K+1000 Genomes Project (phase 3) → Sample and variant QC → Study-specific association analysis → Meta-analysis → Multiple Regression Analysis → LD-clumping → Annotation and integrative analyses

• 36 blood cell traits including platelets, mature and immature red blood cells, myeloid and lymphoid white blood cells
• 29 million imputed variants (>0.01% MAF, >0.4 INFO)
• Linear mixed model using BOLT-LMM, adjusted for age, sex, clinic, menopause and the first 15 PCs
• Meta-analysis using double Genomic Control in METAL

6,736 Associations Discovered at 2,706 Independent Loci

• 2,706 loci (p ≤ 8.31x10^-9). 210 are low frequency (1-5% MAF) and 130 are rare (<1% MAF)
• ~2,400 novel

[Figure legend: Previously reported; Novel independent loci; Non independent loci]
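The meta-analysis step names "double Genomic Control". A minimal sketch of that idea follows, assuming the usual textbook form: the inflation factor is the median chi-square statistic divided by the median of a 1-df chi-square distribution, standard errors are scaled by its square root within each study, the studies are combined with inverse-variance weights, and the correction is applied once more to the combined result. This is not METAL's code; all function names are hypothetical, and leaving inflation factors below 1 uncorrected is a choice made for this sketch.

```python
import math
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df


def genomic_inflation(betas, ses):
    """Inflation factor: median observed chi-square over its null median."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def gc_correct(ses, lam):
    """Scale standard errors by sqrt(lambda); values below 1 are not applied."""
    scale = math.sqrt(max(lam, 1.0))
    return [s * scale for s in ses]


def inverse_variance_meta(study_betas, study_ses):
    """Fixed-effects meta-analysis; study_betas[k][j] is variant j in study k."""
    betas, ses = [], []
    for j in range(len(study_betas[0])):
        weights = [1.0 / s[j] ** 2 for s in study_ses]
        total = sum(weights)
        betas.append(sum(w * b[j] for w, b in zip(weights, study_betas)) / total)
        ses.append(math.sqrt(1.0 / total))
    return betas, ses


def double_gc_meta(study_betas, study_ses):
    """Apply genomic control per study, combine, then apply it again."""
    corrected = [
        gc_correct(s, genomic_inflation(b, s))
        for b, s in zip(study_betas, study_ses)
    ]
    betas, ses = inverse_variance_meta(study_betas, corrected)
    return betas, gc_correct(ses, genomic_inflation(betas, ses))
```

With genome-wide input the inflation factor is estimated from millions of variants; on a handful of variants it is not meaningful, so the sketch is only useful for following the order of operations.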

Cell, 2016, under revision

GWAS in UK BioBank reveals hundreds of putative novel hematopoietic regulators

• Analysis of 100,000 UK Biobank participants
• 24 traits, log transformed
• Linear regression, adjusted for gender, age and 15 principal components (PCs)
• 2,706 independent loci (distance-based) at P-value ≤ 9.3x10^-10
• ~2550 novel, replication pending (N~100,000)
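The slide says loci were defined by distance but does not give the window or the exact procedure. One common way to do this, shown here as a sketch under assumptions, is to walk through the significant variants from most to least significant and start a new locus only when a variant is farther than a fixed window from every lead variant already chosen on the same chromosome. The function name and the 500 kb window in the usage lines are hypothetical; only the threshold 9.3x10^-10 comes from the slide.

```python
def distance_based_loci(variants, p_threshold, window_bp):
    """Pick one lead variant per locus using a fixed distance window.

    variants: iterable of (chrom, pos, p_value) tuples
    Returns the lead variants, ordered from most to least significant.
    """
    significant = sorted(
        (v for v in variants if v[2] <= p_threshold), key=lambda v: v[2]
    )
    leads = []
    for chrom, pos, p in significant:
        near_existing = any(
            chrom == lead_chrom and abs(pos - lead_pos) <= window_bp
            for lead_chrom, lead_pos, _ in leads
        )
        if not near_existing:
            leads.append((chrom, pos, p))
    return leads


# Hypothetical variants: (chromosome, position in bp, p-value).
example = [
    ("1", 1_000, 1e-12),
    ("1", 200_000, 1e-10),
    ("1", 2_000_000, 1e-11),
    ("2", 1_000, 1e-9),
]
loci = distance_based_loci(example, p_threshold=9.3e-10, window_bp=500_000)
```

In the usage lines the second variant is absorbed into the first locus and the chromosome 2 variant fails the threshold, leaving two loci. Distance-based grouping ignores linkage disequilibrium, which is why the study design slide lists LD-clumping as a separate step.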

Name                                                 p-value
blood coagulation                                    1.44E-06
coagulation                                          1.53E-06
hemostasis                                           1.60E-06
wound healing                                        4.41E-06

Name                                                 p-value
transcription cofactor activity                      6.61E-06
iron ion transmembrane transporter activity          9.19E-06

Name                                                 p-value
cytokine receptor activity                           4.83E-09
G-protein coupled chemoattractant receptor activity  9.83E-06
chemokine receptor activity                          9.83E-06

Name                                                 p-value
negative regulation of T cell mediated immunity      6.18E-05

William Astle, Heleen Bouman, Tao Jiang, Adam Butterworth, Heather Elding

Reticulocyte count GWAS

Highly significant associations (P-value <10^-35) for:
• structural proteins (alpha spectrin and ankyrin 1) that modulate cytoskeletal architecture and integrity
• hexokinase 1 and PEX12 (an integral membrane protein of peroxisomes) that influence red cell metabolism
• guanine nucleotide binding protein (G protein) and SH2B adaptor protein 3 in red cell signalling

Dave Roberts


Additional retic loci associated with actin

• APOE Apolipoprotein E
• CD2AP CD2 associated protein
• DYNLL1 Dynein, light chain,
• FLNB Filamin B
• MAST2 Microtubule associated serine/threonine kinase 2

Additional retic loci

• KIF1B Kinesin family member 1B
• MARK3 Microtubule affinity regulating kinase 3
• MARK4 Microtubule affinity regulating kinase 4
• MYH9 Myosin, heavy chain 9, non-muscle
• PACSIN2 Protein kinase C and casein kinase substrate in neurons 2

Dave Roberts

Highly significant associations …….

• AIMP2 part of the aminoacyl-tRNA synthetase complex
• AP2B1 Adaptor protein complex 1 is found at the cytoplasmic face of coated vesicles located at the Golgi complex, where it mediates both the recruitment of clathrin to the membrane and the recognition of sorting signals within the cytosolic tails of transmembrane receptors
• APBB1IP amyloid beta (A4) precursor protein-binding, family B, member 1 interacting protein
• ARHGAP42
• ARL15 ADP-ribosylation factor-like 15
• ATP11A the protein encoded by this gene is an integral membrane ATPase. The encoded protein is probably phosphorylated in its intermediate state and likely drives the transport of ions such as calcium across membranes.

Dave Roberts

Iron pathways in the RBC GWAS

Highly significant associations (P-value <10^-35) for at least 7 components of iron handling pathways

Dave Allen

Strong Relationship Between Allele Frequency and Effect Size, Indicative of Selection

Variant Function is Associated with Allele Frequency

[Figure: effect sizes shown on the slide: β=-0.45, β=0.36, β=-0.03, β=-0.02]

Variants Affecting Blood Cell Traits Co-localise with Various QTLs

• BLUEPRINT data for eQTL, hQTL (marking active or poised enhancers) and spliceQTL in neutrophils, monocytes and T-cells

• SMR/HEIDI test for co-localisation

• While some variants showed co-localisation, many others in the same region did not

• Suggests variants may identify additional regulatory regions

[Figure labels: E, P; Non-Erythroid; Erythroid]

Summary

• High-resolution imputation and large population-scale meta-analysis identifies 2,706 loci associated with blood-cell traits, increasing known associations nearly tenfold

• Many rare variants with strong effect sizes and consequences

• Integration of cell-type specific data points to regulatory mechanisms

• Identified traits and diseases with shared genetic predictors suggesting causal links

Acknowledgments

Tao Jiang, Dave Allen, Heather Elding, John Danesh, David Roberts, Heleen Bouman, Adam Butterworth, Daniel Mead, Dace Ruklisa, Nicole Soranzo, William Astle, Willem Ouwehand

Differential impact of genetic determinants of glycated hemoglobin on type 2 diabetes risk and diagnosis in ancestrally diverse populations

or

Why using HbA1c to diagnose T2D requires understanding of haemolysis

MAGIC STUDY ACKNOWLEDGEMENTS

Eleanor Wheeler*, Aaron Leong*, Ching-Ti Liu, Marie-France Hivert, Rona J. Strawbridge, Clara Podmore, Man Li, Jie Yao, Xueling Sim, Jaeyoung Hong, Audrey Y. Chu, Weihua Zhang, Xu Wang, Peng Chen, Nisa M. Maruthur, Bianca C. Porneala, Stephen J . Sharp, Yucheng Jia, Edmond K. Kabagambe, Li-Ching Chang, Wei-Min Chen, Daniel S. Evans, Qiao Fan, Franco Giulianini, Min Jin Go, Jouke-Jan Hottenga, Yao Hu, Anne U. Jackson, Stavroula Kanoni, Young Jin Kim, Marcus E. Kleber, Claes Ladenvall, Cecile Lecoeur, Sing-Hui Lim, Yingchang Lu, Anubha Mahajan, Carola Marzi, Mike A. Nalls, Ilja M. Nolte, Lynda M. Rose, Denis V. Rybin, Serena Sanna, Yuan Shi, Daniel O. Stram, Fumihiko Takeuchi, Shu Pei Tan, Peter J. van der Most, Jana V. Van Vliet-Ostaptchouk, Loic Yengo, Wanting Zhao, Anuj Goel, Maria Teresa Martinez Larrad , Dörte Radke, Perttu Salo, Erik P.A . van Iperen, Goncalo Abecasis, Saima Afaq, Behrooz Z. Alizadeh, Beverley Balkau, Alain G. Bertoni, Amelie Bonnefond, Yvonne Böttcher, Erwin P. Bottinger, Harry Campbell, Chien-Hsiun Chen, Yoon Shin Cho, Mary Cushman, Cathy E. Elks, Christian Gieger, Mark O. Goodarzi, Harald Grallert, Anders Hamsten, Catharina A. Hartman, Christian Herder, Chao Agnes. Hsiung, Jie Huang, Michiya Igase, Masato Isono, Tomohiro Katsuya, Chiea-Chuen Khor, Wieland Kiess, Katsuhiko Kohara, Peter Kovacs, Juyoung Lee, Wen-Jane Lee, Benjamin C. Lehne, Huaixing Li, Jianjun Liu, Stephane Lobbens, Jian'an Luan, Valeriya Lyssenko, Thomas Meitinger, Tetsuro Miki, Iva Miljkovic, Sanghoon Moon, Antonella Mulas, Gabriele Müller , Martina Müller-Nurasyid, Ramaiah Nagaraja, Matthias Nauck, James S. Pankow, Ozren Polasek, Inga Prokopenko, Laura Rasmusen-Torvik, Wolfgang Rathmann, Stephen S. Rich, Neil R. Robertson, Michael Roden, Igor Rudan, Robert A. Scott, William R. Scott, Bengt Sennblad, David S. Siscovik, Konstantin Strauch, Liang Sun, Morris Swertz, Salman M. Tajuddin, Kent D. Taylor, Yik-Ying Teo, Yih Chung Tham, Anke Tönjes, Nicholas J. 
Wareham, Gonneke Willemsen, Tom Wilsgaard, Lifelines Cohort Study, EPIC-CVD Consortium, EPIC-InterAct Consortium, G.Kees Hovingh, Antti Jula, Inger Njølstad, Colin N.A. Palmer, Manuel Serrano Ríos, Michael Stumvoll, Hugh Watkins, Tin Aung, Matthias Blüher, Michael Boehnke, Dorret I. Boomsma, Stefan R. Bornstein, John C. Chambers, Daniel I. Chasman, Yii-Der Ida Chen, Yuan-Tsong Chen, Ching-Yu Cheng, Francesco Cucca, Eco J.C. de Geus, Panos Deloukas, Michele K. Evans, Myriam Fornage, Yechiel Friedlander, Philippe Froguel, Leif Groop, Myron D. Gross, Tamara B. Harris, Caroline Hayward, Chew-Kiat Heng, Erik Ingelsson, Norihiro Kato, Bong-Jo Kim, Woon-Puay Koh, Jaspal S. Kooner, Antje Körner, Johanna Kuusisto, Markku Laakso, Xu Lin, Yongmei Liu, Ruth J.F. Loos, Patrik K.E. Magnusson, Winfried März, Mark I. McCarthy, Albertine J. Oldehinkel, Nancy L. Pedersen, Mark A. Pereira, Annette Peters, Paul M. Ridker, Charumathi Sabanayagam, Michele Sale, Danish Saleheen, Juha Saltevo, Peter EH. Schwarz, Wayne H.H. Sheu, Harold Snieder, Timothy D. Spector, Yasuharu Tabara, Jaakko Tuomilehto, Rob M. van Dam, James G. Wilson, James F. Wilson, Bruce HR. Wolffenbuttel, Tien Yin Wong, Jer-Yuarn Wu, Jian-Min Yuan, Alan B. Zonderman, Nicole Soranzo, Xiuqing Guo, David J. Roberts, Jose C. Florez, Robert Sladek, Josée Dupuis, Andrew P. Morris, E-Shyong Tai, Elizabeth Selvin, Jerome I. Rotter, Claudia Langenberg, Inês Barroso‡, James B. Meigs‡

HbA1c and haemolysis

• Identified common genetic variants associated with HbA1c using GWAS in 159,940 individuals from 82 cohorts of European, African and Asian ancestry
• Nineteen glycemic and 21 erythrocytic variants were associated with HbA1c at genome-wide significance
• Estimated the impact of erythrocytic variants on T2D classification using HbA1c≥6.5% (N=19,628) in individuals without known T2D
• Tested whether additive genetic scores of erythrocytic variants (GS-E) or glycemic variants (GS-G) were associated with higher T2D incidence in multi-ethnic longitudinal cohorts (N=33,241)
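An additive genetic score of the kind named above (GS-E, GS-G) is, in its generic form, a weighted sum of allele dosages over the variants in one class. The sketch below assumes that form; the slides do not say whether the scores were weighted or how weights were chosen. The variant identifiers, dosages and weights are hypothetical placeholders, not values from the study.

```python
def additive_genetic_score(dosages, weights, classes, wanted_class):
    """Weighted sum of allele dosages over variants of one class.

    dosages: variant id -> number of effect alleles carried (0, 1 or 2)
    weights: variant id -> per-allele weight
    classes: variant id -> class label, e.g. "erythrocytic" or "glycemic"
    """
    return sum(
        weights[variant] * dosages[variant]
        for variant in weights
        if classes[variant] == wanted_class
    )


# Hypothetical individual and hypothetical weights.
dosages = {"varE1": 2, "varE2": 1, "varG1": 1}
weights = {"varE1": 0.5, "varE2": 0.25, "varG1": 1.0}
classes = {"varE1": "erythrocytic", "varE2": "erythrocytic", "varG1": "glycemic"}

gs_e = additive_genetic_score(dosages, weights, classes, "erythrocytic")
gs_g = additive_genetic_score(dosages, weights, classes, "glycemic")
```

Setting every weight to 1 turns the same function into an unweighted allele count.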

Markername   Chr.  Position (bp)  Effect  Other   Gene      Status  Signals   Classification  European ancestry  Trans-ethnic
                                  allele  allele                                              METAL p-value      MANTRA log10BF
rs2375278    1     25401625       A       G       SYF2      Novel   Single    Unclassified    2.03E-07           6.93
rs267738     1     149207249      T       G       CERS2     Novel   Single    Unclassified    2.59E-09           6.41
rs12132919   1     154584765      A       C       TMEM79    Known   Single    Erythrocytic    0.0169             10.08
rs857691     1     156893002      T       C       SPTA1     Known   Single    Erythrocytic    3.97E-25           25.52
rs17509001   2     23874735       C       T       ATAD2B    Novel   Single    Unclassified    1.94E-15           13.30
rs12621844   2     48268239       T       C       FOXN2     Novel   Single    Unclassified    1.87E-08           5.32
rs13387347   2     169463092      T       C       G6PC2     Known   Multiple  Glycemic        0.308              5.77
rs560887     2     169471394      C       T       G6PC2     Known   Multiple  Glycemic        1.48E-58           55.77
rs17256082   2     175000610      C       T       SCRN3     Novel   Single    Unclassified    0.00112            6.27
rs7616006    3     12242648       A       G       SYN2      Novel   Single    Erythrocytic    5.07E-10           10.16
rs9818758    3     49357929       A       G       USP4      Novel   Single    Unclassified    7.74E-10           7.20
rs11708067   3     124548468      A       G       ADCY5     Novel   Single    Glycemic        1.42E-12           10.62
rs8192675    3     172207577      T       C       SLC2A2    Novel   Single    Glycemic        1.38E-11           10.33
rs4894799    3     173278234      A       G       FNDC3B    Novel   Single    Unclassified    1.80E-06           6.05
rs13134327   4     144879245      A       G       FREM3     Novel   Single    Glycemic        2.64E-15           12.66
rs11954649   5     156988069      G       C       SOX30     Novel   Single    Unclassified    NA                 6.20
rs7756992    6     20787688       G       A       CDKAL1    Known   Single    Glycemic        2.80E-12           16.53
rs1800562    6     26201120       G       A       HFE       Known   Multiple  Erythrocytic    4.67E-28           26.81
rs198846     6     26215442       G       A       HFE       Known   Multiple  Erythrocytic    1.18E-23           23.72
rs11964178   6     109668728      A       G       C6orf183  Novel   Single    Erythrocytic    6.38E-10           7.03
rs11154792   6     135473333      T       C       MYB       Known   Single    Erythrocytic    7.45E-18           17.89
rs592423     6     139882386      A       C       CITED2    Novel   Single    Erythrocytic    3.96E-08           4.50
rs2191349    7     15030834       T       G       DGKB      Novel   Single    Glycemic        2.09E-07           6.63
rs4607517    7     44202193       A       G       GCK       Known   Multiple  Glycemic        8.76E-38           51.28
rs3824065    7     44213783       C       T       GCK       Known   Multiple  Glycemic        4.22E-35           31.87
rs6474359    8     41668351       T       C       ANK1      Known   Multiple  Unclassified    1.50E-16           14.88
rs4737009    8     41749562       A       G       ANK1      Known   Multiple  Erythrocytic    4.48E-27           32.08
rs6980507    8     42502241       A       G       SLC20A2   Novel   Single    Erythrocytic    3.58E-08           8.73
rs11558471   8     118254914      A       G       SLC30A8   Known   Single    Glycemic        1.38E-19           23.26
rs2383208    9     22122076       A       G       MTAP      Novel   Single    Glycemic        7.04E-12           11.74
rs7040409    9     90693056       C       G       C9orf47   Novel   Single    Erythrocytic    2.56E-14           11.29
rs1467311    9     109576753      G       A       KLF4      Novel   Single    Unclassified    2.09E-07           8.72
rs579459     9     135143989      C       T       ABO       Novel   Single    Glycemic        9.42E-09           10.14
rs4745982    10    70759849       T       G       HK1       Known   Multiple  Erythrocytic    2.87E-65           63.05
rs10823343   10    70761019       A       G       HK1       Known   Multiple  Unclassified    1.68E-55           49.45
rs17747324   10    114742493      C       T       TCF7L2    Known   Single    Glycemic        6.12E-11           8.49
rs3782123    11    195198         C       A       BET1L     Novel   Single    Unclassified    1.51E-10           9.51
rs2237896    11    2815016        G       A       KCNQ1     Novel   Single    Glycemic        0.00246            6.07
rs174577     11    61361390       C       A       FADS2     Novel   Single    Glycemic        5.45E-07           8.45
rs11603334   11    72110633       G       A       ARAP1     Novel   Single    Glycemic        6.85E-09           6.53
rs10830963   11    92348358       G       C       MTNR1B    Known   Single    Glycemic        2.23E-23           26.64
rs11224302   11    99961814       C       T       CNTN5     Novel   Single    Erythrocytic    4.76E-07           6.40
rs2110073    12    6946143        T       C       PHB2      Novel   Single    Unclassified    4.44E-08           7.18
rs2408955    12    46785398       T       G       SENP1     Novel   Single    Erythrocytic    1.42E-15           11.65
rs10774625   12    110394602      G       A       ATXN2     Novel   Single    Erythrocytic    1.46E-08           6.38
rs11619319   13    27385599       G       A       PDX1      Novel   Single    Glycemic        4.58E-07           8.38
rs576674     13    32452302       G       A       KL        Novel   Single    Glycemic        1.39E-05           6.38
rs282587     13    112399663      G       A       ATP11A    Known   Single    Unclassified    1.70E-12           13.92
rs9604573    13    113571085      T       C       GAS6      Novel   Single    Unclassified    9.60E-09           6.72
rs11248914   16    233563         T       C       ITFG3     Novel   Single    Erythrocytic    2.56E-14           10.60
rs1558902    16    52361075       A       T       FTO       Novel   Single    Unclassified    3.27E-08           6.88
rs4783565    16    67307691       A       G       CDH3      Novel   Single    Erythrocytic    1.73E-07           6.73
rs837763     16    87381230       T       C       CDT1      Known   Single    Erythrocytic    1.68E-28           28.89
rs9914988    17    24207230       A       G       ERAL1     Novel   Single    Erythrocytic    2.77E-11           11.34
rs2073285    17    73628956       C       T       TMC6      Novel   Single    Unclassified    1.27E-04           6.47
rs1046896    17    78278822       T       C       FN3KRP    Known   Single    Unclassified    4.46E-64           71.79
rs11086054   19    17107737       A       T       MYO9B     Known   Multiple  Unclassified    8.16E-06           9.12
rs17533903   19    17117523       A       G       MYO9B     Known   Multiple  Erythrocytic    5.27E-12           9.912
rs4820268    22    35799537       G       A       TMPRSS6   Known   Single    Erythrocytic    1.40E-22           20.79
rs5987239    X     153186763      C       G       G6PD      Novel   Single    Erythrocytic    NA                 50.60

Mean glycated hemoglobin of individuals at the bottom 5% and top 5% of the distribution of ancestry-specific genetic scores and rs1050828 by genotype.

In African Americans, only the G6PD G202A variant (T allele frequency 11%) reached P < 2.5x10^-8; it was associated with a decrease in HbA1c of 0.81 %-units (95% CI 0.66-0.96) per allele in hemizygous men and 0.68 %-units (95% CI 0.38-0.97) in homozygous women.

Underdiagnosis of T2D in G6PD

[Figure: schematic of HbA1c distributions. Legend: G6PD, NORMAL. Labels: HbA1c reduced by 0.8% in G6PD; T2D cutoff 6.5%; non-diagnosed true T2D. Horizontal axis: HbA1c.]

In USA 800,000 underdiagnosed T2D in G6PD

[Figure: the same schematic, repeated on this slide.]

Genomics and G6PD

In European and Asian ancestral groups, HbA1c-associated variants in aggregate had modest effects on T2D classification

In African Americans, a single variant in G6PD would cause approximately 2%, or 860,000 individuals with T2D, to remain undiagnosed when screened with HbA1c if genetic information were ignored
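The arithmetic behind the underdiagnosis argument can be made concrete with a small sketch. It uses the two figures given in the slides: the 6.5% HbA1c cutoff and the 0.81 %-unit reduction per allele reported for hemizygous men. The function names and the example reading of 6.19% are hypothetical, and adding the shift back is only an illustration of why the genotype matters, not a procedure the slides propose.

```python
T2D_CUTOFF = 6.5          # HbA1c (%) diagnostic cutoff from the slides
G6PD_SHIFT_MALE = 0.81    # %-units lower per allele in hemizygous men (slides)


def meets_cutoff(hba1c, cutoff=T2D_CUTOFF):
    """True if an HbA1c value is at or above the diagnostic cutoff."""
    return hba1c >= cutoff


def genotype_aware_hba1c(measured, carries_variant, shift=G6PD_SHIFT_MALE):
    """Add back the variant-associated reduction for carriers (illustrative)."""
    return measured + shift if carries_variant else measured


# Hypothetical hemizygous male carrier with a measured HbA1c of 6.19%.
measured = 6.19
missed_without_genotype = not meets_cutoff(measured)
flagged_with_genotype = meets_cutoff(genotype_aware_hba1c(measured, True))
```

In this example the measured value falls below 6.5%, while the same reading with the 0.81 %-unit reduction added back lies at about 7.0%, which is the situation labelled "non-diagnosed true T2D" in the schematic.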

Summary

• High-resolution imputation and large population-scale meta-analysis identifies 2,706 loci associated with blood-cell traits, increasing known associations nearly tenfold

• Many rare variants with strong effect sizes and consequences

• Identified traits and diseases with shared genetic predictors suggesting causal links

• G6PD may substantially modify HbA1c level and cause misdiagnosis of type 2 diabetes

• Large scale genomics and large teams may now produce biologically and medically significant results