
News & views

Expanding diversity in genomics

Charles N. Rotimi & Adebowale A. Adeyemo

In the 20 years since the first drafts of the human genome were made public, an explosion in genome sequencing has revealed how our evolutionary history and health can be understood by analysing the diversity in our genomes.

The successful sequencing of the human genome¹,² in 2001 is considered by many to be one of the greatest achievements in biology. The published sequences were generated from the DNA of a few anonymous volunteers of differing ethnic backgrounds. However, a single genome (even one generated from many individuals) can provide only so much information. It was immediately clear that we would need to generate and compare more sequences from different people, if we were to harness information coded in genomes to better understand our health and heritage. So far, we have genomes for hundreds of thousands of individuals, more than was imaginable 20 years ago. Even so, we are just beginning to sequence diverse populations in the numbers needed to realize the promise of genomics.

Although human genomes are 99.9% similar, they also contain millions of single nucleotide polymorphisms (SNPs), single bases where there is variation between individuals. A map of about 1.42 million SNPs was published alongside the draft genome³, generated in part from differences found between the individuals who contributed their DNA for the draft. Thus, the Human Genome Project provided a framework for larger-scale projects to analyse human variation.

In 2003, a consortium of researchers set out to generate a genetic map of SNPs from diverse individuals, an endeavour known as the International HapMap Project⁴. The first iteration of the map, published in 2007, was a major milestone that documented more than 3 million SNPs discovered in 270 individuals from […], […], the […] and Nigeria⁵. The work shed light on how the genome is organized, revealing how segments of our DNA are inherited together as blocks, and highlighting how these blocks vary within and between populations. The HapMap was eventually expanded to include 11 population groups⁶, emphasizing differences in the way in which common human genetic variants (HGV) are distributed worldwide.

The HapMap project also aided the development of biotechnological and computational approaches such as genome-wide association studies (GWAS), which allow scientists to search thousands of individual genomes to discover genetic variants that are linked to specific traits. GWAS have successfully identified genomic regions that increase the risks of common conditions such as […], coronary artery disease and Crohn’s disease⁷. But GWAS have been performed mainly in people of European ancestry⁷, and as of December 2020, 78% of individuals in all GWAS were of such ancestry (go..com/3ocyhql). Several factors account for this bias, including a reliance on existing cohorts, preference for homogeneous population groups, limited funding for enrolling under-represented groups and early perceptions that findings from Europeans should be generalizable to other groups. The lingering lack of diversity in GWAS has been highlighted as one of the main roadblocks to the scientific and equitable realization of the promise of genomics⁸,⁹.

The 1000 Genomes Project was created in 2008 to generate a more comprehensive catalogue of HGV by systematically sequencing the genomes of thousands of individuals from diverse geographical locations, to identify both common and rare genetic variants¹⁰. Because of the ever-diminishing cost of sequencing, by its completion, the project had amassed 2,504 individuals from 26 population groups on 5 continents (including several groups with mixed ancestries), providing a detailed catalogue of genetic variants on a scale previously unimaginable.

The data generated led to an unprecedented range of discoveries about the global distribution of HGV. For instance, it emerged that most common variants are shared globally, but rarer variants are shared by closely related populations, with 86% of rare variants restricted to a single continental group. The project also confirmed that there is greater genetic diversity in African populations than in other groups. The small group of humans that left Africa about 100,000 years ago to populate the rest of the world carried only a subset of the variations that existed at the time; this means that the subset of HGV left behind can be studied only in Africans¹¹,¹².

Africa is historically under-represented in genomic studies owing to inadequate funding and investment by African governments. Until a few years ago, there was a limited number of African scientists with genomics expertise, and inadequate biomedical research and computational infrastructure. Last year, the Human Heredity and Health in Africa (H3Africa) consortium, of which we are both members, reported whole-genome sequences

Figure 1 | Increasing diversity in genomics. a, The Human Genome Project (HGP) was established in 1990 and completed in 2003, with the first draft of the human genome¹,² published in 2001. Since then, collaborative efforts have resulted in the analysis of large numbers of genomes from increasingly diverse populations. Milestones of note include the International HapMap Project⁴ and the 1000 Genomes Project⁸. b, Today, there are many ongoing projects to sequence populations around the world.
[Figure: timeline, 1990 to 2021. Panel a, Milestones: HGP (1 reference); HapMap (692 people, 11 populations); 1,000 Genomes (2,504 people, 26 populations). Panel b, Ongoing: Estonian Genome Project; deCODE (whole genomes); H3Africa; Genome Denmark; Genomics […]; TOPmed; All of Us; Qatar Genome; Australian Genomics; Genome Asia.]

220 | Nature | Vol 590 | 11 February 2021 © 2021 Springer Nature Limited. All rights reserved.

of 426 individuals from 50 ethnolinguistic groups in Africa. H3Africa discovered more than three million variants¹³, mainly in previously unrepresented ethnolinguistic groups. It also observed complex patterns in mixing of ancestries and identified 62 regions of the genome that have been evolutionarily maintained at high frequency, perhaps because of protective roles in viral immunity, DNA repair and metabolism.

These findings highlight, as we and others⁸ have argued for years, the need to increase diversity in genome science (Fig. 1). Clearly, Eurocentric studies will not be broadly applicable to all populations. Some disease risk variants are specific to certain populations, and polygenic risk scores (which quantify the risk that an individual will develop a given trait or disease, based on the aggregate or sum of variants they carry) might not generalize well across multiple populations⁹,¹⁴–¹⁶. Type 2 diabetes is a common disease that demonstrates this observation. Despite a set of well-known risk variants that are shared across populations, seemingly population-specific variants have been identified in East Asian, Mexican and African groups¹⁵,¹⁶.

Understanding how differences between our genomes cluster according to the ancestral backgrounds of individuals and groups is undoubtedly valuable. However, inferred clusters might not overlap with social descriptors such as ‘Black’, ‘Latino’, ‘Asian’ and ‘European’; the assumption that they do has been used by some to justify racial categorization¹⁷. The best evidence so far suggests that social categories and genetic clusters are inconsistent¹⁷,¹⁸. Indeed, one study¹⁹ that identified 21 global ancestries reported that its 6,000 individuals had, on average, DNA from 4 ancestries. This indicates the need for caution when using labels such as African/Black, Hispanic/Latino, Asian or European/white in genome science. Indeed, the use of these terms in genome science should be discouraged except as self-reported descriptors or to provide socio-demographic context. Using these terms risks distorting our understanding of the distribution of HGV in history and health.

In future, we will increasingly use genomics to understand our evolutionary history, predict individual disease risks, develop therapeutics such as vaccines, and cure diseases including sickle-cell anaemia using DNA-editing technologies. To fully realize these expectations, we must address several continuing challenges, including, but not limited to, three factors related to diversity. The first is increasing the participation of individuals from diverse ancestral backgrounds in genome research. This challenge is being met through support for genomics research and capacity building from both large consortia (such as the H3Africa consortium) and national genome projects in under-studied populations. The second is developing global collaborations to establish crucial inter-country biomedical infrastructure, ethical frameworks and equitable data sharing (common barriers to international collaboration). The third is equitable deployment of genomic advances to avoid exacerbating health disparities, especially in resource-challenged settings across the world.

Achieving these goals will greatly improve our knowledge of human genetic diversity, aid disease-gene discovery efforts and facilitate our understanding of human […]. The road from one genome reference to hundreds of thousands of genomes has provided unprecedented insight into human genetic variation and the complex tapestry of our ancestry, leading to many practical scientific and medical benefits. Making these benefits available to all humanity is the next frontier.

Charles N. Rotimi and Adebowale A. Adeyemo are at the Center for Research on Genomics and Global Health, National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.

1. International Human Genome Sequencing Consortium. Nature 409, 860–921 (2001).
2. Venter, J. C. et al. Science 291, 1304–1351 (2001).
3. The International SNP Map Working Group. Nature 409, 928–933 (2001).
4. The International HapMap Consortium. Nature 426, 789–796 (2003).
5. The International HapMap Consortium. Nature 449, 851–861 (2007).
6. The International HapMap 3 Consortium. Nature 467, 52–58 (2010).
7. Popejoy, A. B. & Fullerton, S. M. Nature 538, 161–164 (2016).
8. Bustamante, C. D., De la Vega, F. M. & Burchard, E. G. Nature 475, 163–165 (2011).
9. Martin, A. R. et al. Nature Genet. 51, 584–591 (2019).
10. The 1000 Genomes Project Consortium. Nature 526, 68–74 (2015).
11. McClellan, J. M., Lehner, T. & King, M.-C. Cell 171, 261–264 (2017).
12. Rotimi, C. N. et al. Hum. Mol. Genet. 26, R225–R236 (2017).
13. Choudhury, A. et al. Nature 586, 741–748 (2020).
14. Genovese, G. et al. Science 329, 841–845 (2010).
15. Adeyemo, A. A. et al. Nature Commun. 10, 3195 (2019).
16. The SIGMA Type 2 Diabetes Consortium. Nature 506, 97–101 (2014).
17. Keita, S. O. Y. et al. Nature Genet. 36, S17–S20 (2004).
18. Nature Biotechnol. 20, 637 (2002).
19. Baker, J. L., Rotimi, C. N. & Shriner, D. Sci. Rep. 7, 1572 (2017).

Metabolism

New-found brake calibrates insulin action in β-cells

Rohit N. Kulkarni

Insulin is produced by pancreatic β-cells. The identification of a regulator of insulin signalling in these cells cements the long-standing idea that this pathway has a key role in β-[…]. See p.326

It is almost a century since insulin was first used to treat diabetes¹. Since then, a great deal has been learnt about the complex metabolic pathways that are regulated by insulin and the related molecule insulin-like growth factor 1 (IGF1), acting through receptor proteins (reviewed in ref. 2). But it is less clear how the activity of these receptors is regulated in the cells that actually produce insulin, the pancreatic β-cells. Such knowledge is urgently needed because reduced β-cell […] is a key contributor to diabetes. Deciphering the molecular pathways that regulate β-cells might therefore help to better manage, or even prevent, this disease. On page 326, Ansarullah et al.³ identify a previously unknown regulator of β-cells, and outline the mechanism by which this protein can ‘tailor’ expression of the insulin receptor.

First, the authors analysed levels of messenger RNA in mouse cells, to identify […] that were highly expressed specifically in the embryonic pancreas. This revealed an abundantly expressed mRNA encoded by a gene on […] 3. The corresponding human gene is named oestrogen-induced gene (EIG121), and the mouse and human […] are highly evolutionarily conserved.

Ansarullah et al. renamed the […] insulin inhibitory receptor (inceptor), because of its similarities to the insulin and IGF1 receptors. All three receptors span the cell membrane and have similar extracellular domains. But, unlike the insulin and IGF1 receptors, the short cytoplasmic tail of inceptor carries an amino acid sequence known to bind to the assembly polypeptide 2 (AP2) protein complex. AP2 is involved in a process called clathrin-mediated endocytosis, through which molecules and receptors at the cell surface are transported into the cell.

Then, the authors examined the function of inceptor by generating mice completely lacking the inceptor gene, and mice in which the gene could be deleted specifically in β-cells. The two models show generally similar traits.
