Expanding Diversity in Genomics

News & views main roadblocks to the scientific and equita- 8,9 Human genome ble realization of the promise of genomics . The 1000 Genomes Project was created in 2008 to generate a more comprehensive catalogue of HGV by systematically sequenc- Expanding diversity ing the genomes of thousands of individuals from diverse geographical locations, to in genomics identify both common and rare genetic var- iants10. Because of the ever-diminishing cost Charles N. Rotimi & Adebowale A. Adeyemo of sequencing, by its completion, the project had amassed 2,504 individuals from 26 popu- In the 20 years since the first drafts of the human genome were lation groups on 5 continents (including sev- made public, an explosion in genome sequencing has revealed eral groups with mixed ancestries), providing how our evolutionary history and health can be understood by a detailed catalogue of genetic variants on a scale previously unimaginable. analysing the diversity in our genomes. The data generated led to an unprecedented range of discoveries about the global distribu- tion of HGV. For instance, it emerged that most The successful sequencing of the human The HapMap project also aided the develop- common variants are shared globally, but rarer genome1,2 in 2001 is considered by many to ment of biotechnological and computational variants are shared by closely related popula- be one of the greatest achievements in biol- approaches such as genome-wide associa- tions, with 86% of rare variants restricted to a ogy. The published sequences were generated tion studies (GWAS), which allow scientists single continental group. The project also con- from the DNA of a few anonymous volunteers to search thousands of individual genomes firmed that there is greater genetic diversity of differing ethnic backgrounds. However, a to discover genetic variants that are linked in African populations than in other groups. single genome (even one generated from to specific traits. GWAS have successfully The small group of humans that left Africa many individuals) can provide only so much identified genomic regions that increase the about 100,000 years ago to populate the information. It was immediately clear that we risks of common conditions such as diabetes, rest of the world carried only a subset of would need to generate and compare more coronary artery disease and Crohn’s disease7. the variations that existed at the time; this sequences from different people, if we were But GWAS have been performed mainly in peo- means that the subset of HGV left behind to harness information coded in genomes to ple of European ancestry7, and as of December can be studied only in Africans11,12. Africa is better understand our health and heritage. 2020, 78% of individuals in all GWAS were of historically under-represented in genomic So far, we have genomes for hundreds of such ancestry (go.nature.com/3ocyhql). Sev- studies owing to inadequate funding and thousands of individuals — more than was eral factors account for this bias, including a investment by African governments. Until a imaginable 20 years ago. Even so, we are just reliance on existing cohorts, preference for few years ago, there was a limited number of beginning to sequence diverse populations homo geneous population groups, limited African scientists with genomics expertise, in the numbers needed to realize the promise funding for enrolling under-represented and inadequate biomedical research and of genomics. groups and early perceptions that findings computational infrastructure. Last year, the Although human genomes are 99.9% similar, from Europeans should be generalizable to Human Heredity and Health in Africa (H3Af- they also contain millions of single nucleotide other groups. The lingering lack of diversity rica) consortium, of which we are both mem- polymorphisms (SNPs) — single bases where in GWAS has been highlighted as one of the bers, reported whole-genome sequences there is genetic variation between individuals. A map of about 1.42 million SNPs was published alongside the draft genome3, generated a Milestones in part from differences found between the HGP HapMap 1,000 Genomes individuals who contributed their DNA for 1 reference 692 people 2,504 people the draft. Thus, the Human Genome Project genome 11 populations 26 populations provided a framework for larger-scale projects to analyse human variation. b Ongoing In 2003, a consortium of researchers set out to generate a genetic map of SNPs from Estonian Genome Project deCODE genetics (whole genomes) diverse individuals — an endeavour known as H3Africa 4 the International HapMap Project . The first Genome Denmark iteration of the map, published in 2007, was Genomics England a major milestone that documented more TOPmed than 3 million SNPs discovered in 270 indi- All of US viduals from Japan, China, the United States Qatar Genome and Nigeria5. The work shed light on how the Australian Genomics Genome Asia genome is organized, revealing how segments of our DNA are inherited together as blocks, 1990 1995 2000 2005 2010 2015 2021 and highlighting how these blocks vary within and between populations. The HapMap was Figure 1 | Increasing diversity in genomics. a, The Human Genome Project (HGP) was established in eventually expanded to include 11 population 1990 and completed in 2003, with the first draft of the human genome1,2 published in 2001. Since then, 6 groups , emphasizing differences in the way in collaborative efforts have resulted in the analysis of large numbers of genomes from increasingly diverse which common human genetic variants (HGV) populations. Milestones of note include the International HapMap Project4 and the 1000 Genomes Project8. are distributed worldwide. b, Today, there are many ongoing projects to sequence populations around the world. 220 | Nature | Vol 590 | 11 February 2021 ©2021 Spri nger Nature Li mited. All rights reserved. ©2021 Spri nger Nature Li mited. All rights reserved. of 426 individuals from 50 ethnolinguistic second is developing global collaborations Health, Bethesda, Maryland 20892, USA. groups in Africa. H3Africa discovered more to establish crucial inter-country biomedical e-mails: [email protected]; [email protected] than three million variants13 — mainly in previ- infrastructure, ethical frameworks and equi- 1. International Human Genome Sequencing Consortium. ously unrepresented ethnolinguistic groups. table data sharing — common barriers to inter- Nature 409, 860–921 (2001). It also observed complex patterns in mixing national collaboration. The third is equitable 2. Venter, J. C. et al. Science 291, 1304–1351 (2001). of ancestries and identified 62 regions of the deployment of genomic advances to avoid 3. The International SNP Map Working Group. Nature 409, 928–933 (2001). genome that have been evolutionarily main- exacerbating health disparities, especially in 4. The International HapMap Consortium. Nature 426, tained at high frequency, perhaps because of resource-challenged settings across the world. 789–796 (2003). 5. The International HapMap Consortium. Nature 449, protective roles in viral immunity, DNA repair Achieving these goals will greatly improve 851–861 (2007). and metabolism. our knowledge of human genetic diversity, 6. The International HapMap 3 Consortium. Nature 467, These findings highlight — as we and others8 aid disease-gene discovery efforts and facil- 52–58 (2010). 7. Popejoy, A. B. & Fullerton, S. M. Nature 538, 161–164 have argued for years — the need to increase itate our understanding of human biology. (2016). diversity in genome science (Fig. 1). Clearly, The road from one genome reference to hun- 8. Bustamante, C. D., De la Vega, F. M. & Burchard, E. G. Eurocentric studies will not be broadly appli- dreds of thousands of genomes has provided Nature 475, 163–165 (2011). 9. Martin, A. R. et al. Nature Genet. 51, 584–591 (2019). cable to all populations. Some disease risk unprecedented insight into human genetic 10. The 1000 Genomes Project Consortium. Nature 526, variants are specific to certain populations, variation and the complex tapestry of our 68–74 (2015). and polygenic risk scores (which quantify the ancestry, leading to many practical scientific 11. McClellan, J. M., Lehner, T. & King, M.-C. Cell 171, 261–264 (2017). risk that an individual will develop a given trait and medical benefits. Making these benefits 12. Rotimi, C. N. et al. Hum. Mol. Genet. 26, R225–R236 (2017). or disease, based on the aggregate or sum of available to all humanity is the next frontier. 13. Choudhury, A. et al. Nature 586, 741–748 (2020). 14. Genovese, G. et al. Science 329, 841–845 (2010). variants they carry) might not generalize well 15. Adeyemo, A. A. et al. Nature Commun. 10, 3195 (2019). 9,14–16 across multiple populations . Type 2 dia- Charles N. Rotimi and Adebowale A. Adeyemo 16. The SIGMA Type 2 Diabetes Consortium. Nature 506, betes is a common disease that demonstrates are at the Center for Research on Genomics 97–101 (2014). 17. Keita, S. O. Y. et al. Nature Genet. 36, S17–S20 (2004). this observation. Despite a set of well-known and Global Health, National Human Genome 18. Nature Biotechnol. 20, 637 (2002). risk variants that are shared across popula- Research Institute, National Institutes of 19. Baker, J. L., Rotimi, C. N. & Shriner, D. Sci. Rep. 7, 1572 (2017). tions, seemingly population-specific variants have been identified in East Asian, Mexican and Metabolism African groups15,16. Understanding how differences between our genomes cluster according to the ances- tral backgrounds

Load more