1 Uganda Genome Resource Enables Insights

Uganda Genome Resource enables insights into population history and genomic discovery in Africa Gurdasani D.*1, Carstensen T.2* , Fatumo S.* 3,4,5, Chen G.*6, Franklin CS.*2, Prado-Martinez J.* 2, Bouman H.* 2, Abascal F. 2 , Haber M. 2, Tachmazidou I. 2, Mathieson I.7, Ekoru K.8, DeGorter MK. 9, Nsubuga RN.8, Finan C. 2, Wheeler E. 2, Chen L.2, Cooper DN.10, Schiffels S.11, Chen Y. 2, Ritchie GRS.2, Pollard MO. 2, Fortune MD. 2, Mentzer AJ.12, Garrison E. 2, Bergström A.2, Hatzikotoulas K.2, Adebowale A. 6, Doumatey A. 4, Elding H. 2, Wain LV.13,14, Ehret G.15,16, Auer PL.17, Kooperberg CL.18 , Reiner AP.19,20, Franceschini N.21, Maher DP.6, Montgomery SB.7,22, Kadie C.23, Widmer C.24, Xue Y.2, Seeley J. 6,3 , Asiki G. 8 , Kamali A. 8, Young EH. 25,2, Pomilla C. 25, Soranzo N. 2,26,27, Zeggini E. 28, Pirie F.29, Morris AP.30,12, Heckerman D.24, Tyler-Smith C2‡, Motala A.29‡ , Rotimi C6‡, Kaleebu P. ‡3,4,8, Barroso I.‡31,2, Sandhu MS.23 ‡ *these authors contributed equally ‡ these authors contributed equally Lead Contact: Manjinder Sandhu [email protected] 1 William Harvey Research Institute, Queen Mary’s University of London, London, UK 2 Wellcome Sanger Institute, Hinxton, Cambridge, UK 3 London School of Hygiene and Tropical Medicine, London, UK 4 Uganda Medical Informatics Centre (UMIC), MRC/UVRI and LSHTM (Uganda Research Unit), Entebbe-Uganda 5 H3Africa Bioinformatics Network (H3ABioNet) Node, Center for Genomics Research and Innovation (CGRI)/ National Biotechnology Development Agency CGRI/NABDA, Abuja, Nigeria 6 Center for Research on Genomics and Global Health, National Institute of Health, USA 7 Perelman School of Medicine, University of Pennsylvania, Philadelphia, USA 8 Medical Research Council/Uganda Virus Research Institute (MRC/UVRI) Uganda Research Unit on AIDS, Uganda 9 Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA 10 Institute of Medical Genetics, School of Medicine, Cardiff University, Cardiff, UK 11 Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany 12 The Wellcome Centre for Human Genetics, University of Oxford, Oxford, UK 13 Department of Health Sciences, University of Leicester, Leicester, UK 14 National Institute for Health Research, Leicester Biomedical Research Centre, Glenfield Hospital, Leicester, UK 15 McKusick-Nathans Institute of Genetic Medicine Johns Hopkins University School of Medicine Baltimore MD, USA 16 Geneva University Hospitals, Rue Gabrielle-Perret-Gentil,41211 Genève 14, Switzerland 17 Zilber School of Public Health, University of Wisconsin-Milwaukee, Milwaukee WI, USA 18 Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle WA, USA 19 Department of Epidemiology, University of Washington, Seattle WA, USA 20 Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle WA, USA 21 Department of Epidemiology, University of North Carolina, Chapel Hill, NC, USA 22 Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA 23 Microsoft Research, Redmond, CA, USA 24 Microsoft Research, Los Angeles, CA, USA 25 Department of Medicine, University of Cambridge, Cambridge, UK 26 Department of Haematology, University of Cambridge, Cambridge, UK 27 The National Institute for Health Research Blood and Transplant Unit (NIHR BTRU) in Donor Health and Genomics, University of Cambridge, Cambridge, UK 28 Institute of Translational Genomics, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany 29 Department of Diabetes and Endocrinology, University of KwaZulu-Natal, Durban, South Africa 30 Department of Biostatistics, University of Liverpool, Liverpool, UK 31 MRC Epidemiology Unit, University of Cambridge, Cambridge, UK 1 Corresponding authors: Manjinder Sandhu [email protected] Inês Barroso [email protected] Charles Rotimi [email protected] Ayesha Motala [email protected] Chris Tyler-Smith [email protected] Pontiano Kaleebu [email protected] 2 Abstract Genomic studies in African populations provide unique opportunities to understand disease aetiology, human diversity and population history. In the largest study of its kind, comprising genome-wide data from 6,400 individuals, and whole-genome sequences from 1,978 individuals from rural Uganda, we find evidence of geographically-correlated fine-scale population substructure. Historically, the ancestry of modern Ugandans is best represented by a mixture of ancient East African pastoralists. We demonstrate the value of the largest sequence panel from Africa to date as an imputation resource. Examining 34 cardiometabolic traits, we show systematic differences in trait heritability between European and African populations, probably reflecting the differential impact of genes and environment. In a multi-trait pan-African GWAS of up to 14,126 individuals, we identify novel loci associated with anthropometric, haematological, lipid and glycemic traits. We find that several functionally important signals are driven by Africa-specific variants, highlighting the value of studying diverse populations across the region. Introduction Africa is central to our understanding of human origins, genetic diversity and disease susceptibility.(Tishkoff et al., 2009) The marked genomic diversity and allelic differentiation among populations in Africa, in combination with the substantially lower linkage disequilibrium (correlation) among genetic variants, has the potential to provide new opportunities to understand disease aetiology relevant to African populations but also globally.(Tishkoff et al., 2009, Gurdasani D., 2014) Consequently, there is a clear scientific and public health need to develop large-scale efforts that examine disease susceptibility across diverse populations within the African continent. Such efforts will need to be fully integrated with research-capacity-building initiatives across the region.(Consortium, 2014) Countries in Africa are undergoing epidemiological transitions—with a high burden of endemic infectious disease and growing prevalence of non-communicable diseases.(Organisation, 2015) Importantly, because of varying environments, population history, and adaptive evolution, the distribution of risk factors for a broad range of cardiometabolic and infectious diseases, and their individual contributions, may differ among populations globally.(Campbell and Tishkoff, 2010) Differences in allele frequencies among populations, due to either selection or genetic drift provide unique opportunities to identify novel disease susceptibility loci; highlighting the value of conducting such studies in African populations. However, while there has been a recent increase in genetic studies of cardiometabolic traits including African-Americans,(Peprah et al., 2015, Lanktree et al., 2015) there have been relatively few investigations of population diversity or the genetic determinants of cardiometabolic or infectious traits and diseases across the continent. To conduct genetic studies in diverse populations across Africa, appropriate study designs that account for population structure, admixture and genetic relatedness (overt and cryptic), as well as the development of genetic tools to capture variation in African genomes, are needed.(Gurdasani D., 2014) To leverage the relative benefits of different strategies, we undertook a combined approach of genotyping and low coverage whole-genome sequencing (WGS) in a population-based study of 6,400 individuals from a geographically defined rural community in South-West Uganda (Figure 1a, STAR Methods, Figure 1, Figure S1 and Table S1). We present data from 4,778 individuals with genotypes for ~2.2 million SNPs from the Uganda genome-wide association study (UGWAS) resource (STAR 3 Methods), and sequence data (STAR Methods, Table S1.1) on up to 1,978 individuals including 41.5M SNPs and 4.5M indels (UG2G) (Figure S1, Table S1.1, and STAR Methods). Collectively, these data represent the Uganda Genome Resource (UGR). To enhance trait-associated locus discovery, we also include collective data on up to 14,126 individuals from across the African continent for genome wide association analysis (STAR Methods). Using these resources, we conducted a series of analyses to: 1) understand the population structure, admixture and demographic history in a geographically-defined population from Uganda (STAR Methods); 2) describe the spectrum of disease-causing mutations in the UG2G cohort (STAR Methods); and 3) highlight the value of the UG2G sequence panel as an imputation resource (STAR Methods). 4) refine estimates of heritability of 34 complex traits, accounting for environmental correlation among individuals (STAR Methods); and 5) assess the spectrum of genetic variants associated with cardiometabolic and other complex traits in populations from sub-Saharan Africa (STAR Methods). Importantly, the UGR was designed to help develop local resources for public health and genomic research, including building research capacity, training and collaboration across the region. We envisage that data from these studies will provide a global resource for researchers, as well as facilitate genetic studies in African populations. Results A history of Ugandan ethnic diversity Uganda has a diverse and complex history of extensive historical migration from surrounding regions over several hundred years. Migration has included economic migration for labour, as well as migration due to conflict in surrounding regions. Uganda is home to several

1 Uganda Genome Resource Enables Insights

A New Genetic Method for Isolating Functionally Interacting Genes

HERV-K(HML7) Integrations in the Human Genome: Comprehensive Characterization and Comparative Analysis in Non-Human Primates

Integrating Single-Step GWAS and Bipartite Networks Reconstruction Provides Novel Insights Into Yearling Weight and Carcass Traits in Hanwoo Beef Cattle

Multiple Hits for the Association of Uterine Fibroids on Human Chromosome 1Q43

T-Scan: a Genome-Wide Method for the Systematic Discovery of T Cell Epitopes

Analysis of the Indacaterol-Regulated Transcriptome in Human Airway

Expression of a Mirna Targeting Mutated SOD1 in Astrocytes Induces Motoneuron Plasticity and Improves Neuromuscular Function in ALS Mice

Follow-Up Examination of Linkage and Association to Chromosome 1Q43 in Multiple Sclerosis

Confounding by Linkage Disequilibrium

Genetic Analyses of Human Fetal Retinal Pigment Epithelium Gene Expression Suggest Ocular Disease Mechanisms

Limma: Linear Models for Microarray and RNA-Seq Data User’S Guide

Egfr Activates a Taz-Driven Oncogenic Program in Glioblastoma