VIewpoInt

against police brutality towards African American citizens4 have strengthened The road ahead in and the antiracism movement and amplified demands for racial equity. To be part of this movement and effect change will require humility. We must actively listen and learn from each other, Amy L. McGuire, Stacey Gabriel, Sarah A. Tishkoff , Ambroise Wonkam , especially when it is uncomfortable and our Aravinda Chakravarti , Eileen E. M. Furlong , Barbara Treutlein , own complicity may be implicated. It will Alexander Meissner , Howard Y. Chang , Núria López-Bigas​ , Eran Segal require solidarity and a recognition that and Jin-Soo Kim​ we are all connected through our common humanity. And it will require courage. It Abstract | In celebration of the 20th anniversary of Nature Reviews Genetics, we may seem like a platitude, but it is true that asked 12 leading researchers to reflect on the key challenges and opportunities nothing will change unless actual change is faced by the field of genetics and genomics. Keeping their particular research area made. If we continue to do things as they in mind, they take stock of the current state of play and emphasize the work that have always been done, we will end up where remains to be done over the next few years so that, ultimately, the benefits of we have always been. It is time to step into the discomfort and dare to do something genetic and genomic research can be felt by everyone. different. So what can we do differently to make Making genomics truly equitable These advancements raise many ethical genomics more equitable? I propose three Amy McGuire. For the field of genetics and policy issues, including concerns areas where we should focus attention to and genomics, the first decade of the about privacy and discrimination, address this important question. First, we twenty-​first century was a time of rapid the right of access to research findings must ensure equitable representation in discovery, transformative technological and direct-​to-consumer​ genetic testing, and genomic research. Examining 2,511 studies development and plummeting costs. We informed consent. Significant investment involving nearly 35 million samples from moved from mapping the human , has been made to better understand the the GWAS Catalog in 2016, Popejoy and an international endeavour that took more risks and benefits of clinical genomic testing, Fullerton found that the vast majority (81%) than a decade and cost billions of dollars, and there has been vigorous debate about come from individuals of European descent, to sequencing individual for a the ethics of human gene editing, with with only 5% coming from non-Asian​ mere fraction of the cost in a relatively many prominent scientists and bioethicists minority populations5. This has created short time. calling for a moratorium on human an ‘information disparity’ that has an During the subsequent decade, the field germline editing until it is proven to be safe impact on the reliability of clinical genomic turned towards making sense of the vast and effective and there is broad societal interpretation for under-represented​ amount of genomic information being consensus on its appropriate application2. minorities6. The US National Institutes generated and situating it in the context These are all important issues that we of Health (NIH) has invested in efforts of one’s environment, lifestyle and other need to continue to explore, but as the to increase diversity in genomic research, non-​genetic factors. Much of the hype technologies that have been developed but to be successful these efforts must be that characterized the previous decade and tested at warp speed over the past accompanied by serious attention to earning was tempered as we were reminded of the two decades begin to be integrated into the trust of disadvantaged and historically exquisite complexity of human biology. routine clinical care, it is imperative that mistreated populations. This will require, at A vision of medicine driven by genetically we also confront one of the most difficult a minimum, more meaningful engagement, determined risk predictions was replaced and fundamental challenges in genomics, improved transparency, robust systems with a vision of precision in which genetics, in medicine and in society — rectifying of accountability, and a commitment environment and lifestyle all converge to structural inequities and addressing factors to creating opportunities that promote deliver the right treatment to the right that privilege some while disadvantaging and support a genomics workforce that patient at the right time1. others. The genomics of the future must be includes scientists and clinicians from As we embark on the third decade of this a genomics for all, regardless of ethnicity, under-represented​ populations. century, we are now faced with the prospect geography or ability to pay. It is insufficient to achieve diverse of being able not only to more accurately This audacious goal of making genomics representation in genomic research; predict disease risk and tailor existing truly equitable requires multifaceted however, there must also be equitable access treatments on the basis of genetic and solutions. The disproportionate burden of to the fruits of that research. An analysis non-​genetic factors but also to potentially illness and death among racial and ethnic of the US Centers for Disease Control and cure or even eliminate some diseases entirely minorities associated with the global Prevention’s 2018 Behavioural Risk Factor with gene-​editing technologies. COVID-19 pandemic3 and recent protests Surveillance System found that non-elderly​

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 581 Viewpoint

The contributors Amy L. McGuire is the Leon Jaworski Professor of Biomedical Ethics and Director of the Center for Medical Ethics and Health Policy at Baylor College of Medicine. She has received numerous teaching awards at Baylor College of Medicine, was recognized by the Texas Executive Women as a Woman on the Move in 2016 and was invited to give a TedMed talk titled “There is No Genome for the Human Spirit” in 2014. In 2020, she was elected as a Hastings Center Fellow. Her research focuses on ethical and policy issues related to emerging technologies, with a particular focus on genomic research, personalized medicine and the clinical integration of novel neurotechnologies. Stacey Gabriel is the Senior Director of the Genomics Platform at the Broad Institute since 2012 and has led platform development, execution and operation since its founding. She is Chair of Institute Scientists and serves on the institute’s executive leadership team. She is widely recognized as a leader in genomic technology and project execution. She has led the Broad’s contributions to numerous flagship projects in , including the International HapMap Project, the 1000 Genomes Project, The Cancer Genome Atlas, the National Heart, Lung, and Blood Institute’s Exome Sequencing Project and the TOPMed programme. She is Principal Investigator of the Broad’s All of Us (AoU) Genomics Center and serves on the AoU Program Steering Committee. Sarah A. Tishkoff is the David and Lyn Silfen University Associate Professor in Genetics and Biology at the University of Pennsylvania, Philadelphia, USA, and holds appointments in the School of Medicine and the School of Arts and Sciences. She is a member of the US National Academy of Sciences and a recipient of an NIH Pioneer Award, a David and Lucile Packard Career Award, a Burroughs/Wellcome Fund Career Award and an American Society of Human Genetics Curt Stern Award. Her work focuses on genomic variation in Africa, human evolutionary history, the genetic basis of adaptation and phenotypic variation in Africa, and the genetic basis of susceptibility to infectious disease in Africa. Ambroise Wonkam is Professor of Medical Genetics, Director of GeneMAP (Genetic Medicine of African Populations Research Centre) and Deputy Dean Research in the Faculty of Health Sciences, University of Cape Town, South Africa. He has successfully led numerous NIH- and Wellcome Trust-​funded projects over the past decade to investigate clinical variability in sickle cell disease, hearing impairment genetics and the return of individual findings in genetic research in Africa. He won the competitive Clinical Genetics Society International Award for 2014 from the British Society of Genetic Medicine. He is president of the African Society of Human Genetics. Aravinda Chakravarti is Director of the Center for Human Genetics and Genomics, the Muriel G. and George W. Singer Professor of Neuroscience and Physiology, and Professor of Medicine at New York University School of Medicine. He is an elected member of the US National Academy of Sciences, the US National Academy of Medicine and the Indian National Science Academy. He has been a key participant in the Human Genome Project, the International HapMap Project and the 1000 Genomes Project. His research attempts to understand the molecular basis of multifactorial disease. He was awarded the 2013 William Allan Award by the American Society of Human Genetics and the 2018 Chen Award by the Human Genome Organization. Eileen E. M. Furlong is Head of the Genome Biology Department at the European Laboratory (EMBL) and a member of the EMBL Directorate. She is an elected member of the European Molecular Biology Organization (EMBO) and the Academia Europaea, and a European Research Council (ERC) advanced investigator. Her group dissects fundamental principles of how the genome is regulated and how it drives cell fate decisions during embryonic development, including how developmental enhancers are organized and function within the 3D nucleus. Her work combines genetics, (single-cell)​ genomics, imaging and computational approaches to understand these processes. Her research has advanced the development of genomic methods for use in complex multicellular organisms. Barbara Treutlein is Associate Professor of Quantitative Developmental Biology in the Department of Biosystems Science and Engineering of ETH Zurich in Basel, Switzerland. Her group uses and develops single-cell​ genomics approaches in combination with stem cell-based​ 2D and 3D culture systems to study how human organs develop and regenerate and how cell fate is regulated. For her work, Barbara has received multiple awards, including the Friedmund Neumann Prize of the Schering Foundation, the Dr. Susan Lim Award for Outstanding Young Investigator of the International Society of Stem Cell Research and the EMBO Young Investigator Award. Alexander Meissner is a scientific member of the Max Planck Society and currently Managing Director of the Max Planck Institute (MPI) for Molecular Genetics in Berlin, Germany. He heads the Department of Genome Regulation and is a visiting scientist in the Department of Stem Cell and Regenerative Biology at Harvard University. Before his move to the MPI, he was a tenured professor at Harvard University and a senior associate member of the Broad Institute, where he co-​directed the epigenomics programme. In 2018, he was elected as an EMBO member. His laboratory uses genomic tools to study developmental and disease biology with a particular focus on epigenetic regulation. Howard Y. Chang is the Virginia and D. K. Ludwig Professor of Cancer Genomics at Stanford University and an investigator at the Howard Hughes Medical Institute. He is a physician–scientist who has focused on deciphering the hidden information in the non-coding​ genome. His laboratory is best known for studies of long non-coding​ RNAs in gene regulation and development of new epigenomic technologies. He is an elected member of the US National Academy of Sciences, the US National Academy of Medicine, and the American Academy of Arts and Sciences. Núria López-​Bigas is ICREA research Professor at the Institute for Research in Biomedicine and Associate Professor at the University Pompeu Fabra. She obtained an ERC Consolidator Grant in 2015 and was elected as an EMBO member in 2016. Her work has been recognized with the prestigious Banc de Sabadell Award for Research in Biomedicine, the Catalan National Award for Young Research Talent and the Career Development Award from the Human Frontier Science Program. Her research focuses on the identification of cancer driver mutations, genes and pathways across tumour types and in understanding the mutational processes that lead to the accumulation of mutations in cancer cells. Eran Segal is Professor in the Department of Computer Science and Applied Mathematics at the Weizmann Institute of Science, heading a multidisciplinary laboratory with extensive experience in machine learning, computational biology and analysis of heterogeneous high-throughput​ genomic data. His research focuses on the microbiome, nutrition and genetics, and their effect on health and disease and aims to develop personalized medicine based on big data from human cohorts. He has published more than 150 publications and received several awards and honours for his work, including the Overton and the Michael Bruno awards. He was recently elected as an EMBO member and as a member of the Israel Young Academy. Jin-​Soo Kim is Director of the Center for Genome Engineering in the Institute for Basic Science in Daejon, South Korea. He has received numerous awards, including the 2017 Asan Award in Medicine, the 2017 Yumin Award in Science and the 2019 Research Excellence Award (Federation of Asian and Oceanian Biochemists and Molecular Biologists). He was featured as one of ten Science Stars of East Asia in Nature (558, 502–510 (2018)) and has been recognized as a highly cited researcher by Clarivate Analytics since 2018. His work focuses on developing tools for genome editing in biomedical research.

582 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

genome to catalogue sequence changes was Coupled with clinical data, building something to wish for in our wildest dreams. up population-scale​ databases of genomic we must strive to achieve Thanks to great strides in technology plus clinical information will fuel the more equitable outcomes from and the hard work of geneticists, engineers, application of better risk interpretation using epidemiologists and clinicians, much progress polygenic risk scores (PRSs)11. More routine genomic medicine has been made; huge numbers of genomes WGS will shorten the ‘diagnostic odyssey’, (and exomes) have been sequenced across in which patients suffer through rounds adults from self-identified​ racial or ethnic the world. Disease gene-finding​ projects of testing and parents are left uncertain minority groups are significantly less such as my graduate work are now done about future reproductive planning. likely to see a doctor because of cost than routinely, rather than one gene at a time, using More efficient clinical trials might be built non-​elderly white adults7. This finding whole-exome​ or whole-genome​ sequencing using genomic information. With existing reflects how the structure and financing of (WGS) in families and affected individuals, genomic information on all individuals in health care in the United States perpetuates enabling the identification of genes and a health system, trials could be designed inequities and contributes to the larger causative mutations in thousands of in a way that selects individuals most likely web of social injustice that is at the heart of Mendelian diseases and some complex to have an event. This enrichment could the problem. Even when socio-economic​ diseases. provide more promising, shorter, smaller factors are controlled for, racial disparities But the real promise of genome and cheaper trial design. in access to genetic services persist8. sequencing lies in true population-scale​ These databases must also rapidly be Large-scale,​ sustained research is needed to sequencing, ultimately at the scale of tens built in such a way that is representative better understand and actively address the of millions of individuals, whereby genome of the population, representing the actual multitude of factors that contribute to this, sequencing of unselected people enables racial and ethnic diversity, not just what including issues related to structural racism, the unbiased, comprehensive study of our was available as banked sample collections. mistrust, implicit and explicit bias, a lack of genome and the variation therein. It provides These are well known to be predominantly knowledge of genetic testing, and concerns a ‘lookup table’ to catalogue disease-causing​ European-descent​ samples and thus about misuse of genetic information. and benign variants (our ‘allelic series’). preclude application of risk prediction tools Finally, and perhaps most daunting, The genome sequence should become part in non-​white individuals and have limited we must strive to achieve more equitable of the electronic health record; it is a stable, the ability to find population-specific​ genetic outcomes from genomic medicine. persistent source of information about a associations, such as those that have been Many racial and ethnic minorities person akin to physical measurements such demonstrated in type 2 diabetes mellitus disproportionately experience chronic as weight or blood pressure, exposures (T2DM)12. disease and premature death compared such as smoking or alcohol use, and (in many We have to solve important issues — with white individuals. Disparities also ways better than) self-reported​ family history. data sharing, privacy and getting the data exist by gender, sexual orientation, age, to scale. Sharing genomic and clinical disability status, socio-economic​ status and data is of key importance to drive forward geographical location. Health outcomes discovery and our understanding of are heavily influenced by social, economic the real promise of genome how to use these data in the health-care​ and environmental factors. Thus, although sequencing lies in true setting. To do this well and responsibly, providing more equitable access to genomic population-scale​ sequencing, trust must be built and maintained services and ensuring more equitable through adherence to the rights of privacy, representation in genomic research are ultimately at the scale of tens protection and non-discrimination.​ necessary first steps, they are not enough9. of millions of individuals Progress is being made through Genomics can only be part of the solution the creation of data platforms and the if it is integrated with broader social, development of frameworks for data economic and political efforts aimed at What can we learn? What needs to protection and sharing; for example, by the addressing disparities in health outcomes. be solved? Even fairly small numbers of work of the Global Alliance for Genomics For genomics to be truly equitable, it must genomes aggregated in a consistent and and Health (GA4GH). operate within a just health-care​ system and searchable form have enabled a new way to Several large biobanks are already being a just society. use and interpret genomic data, just in the established to launch population-scale​ past couple of years providing a glimpse at efforts. The UK Biobank is a vanguard Genome sequencing at population scale the future. Efforts such as gnomAD10 are programme that contains genotype data, Stacey Gabriel. Twenty years ago, a start — this database contains data from questionnaire-based​ health and physical I finished a PhD project that involved more than 15,000 genomes and 125,0000 measurements on 500,000 individuals and laboriously sequencing one gene — a rather exomes. With this resource, the frequency some linkage to their medical records. complicated one, RET — in a couple of of genetic variants within populations is Other efforts such as the All of Us research hundred people to catalogue pathogenic readily available. A clinician interpreting programme have been launched with variants for Hirschsprung disease. This the genome of a patient can ask whether goals directed at true population-based​ work required designing primers on the a variant has been observed before. The representation, and biobanks that link basis of genome sequence data as they were data provide a starting point for assessing genomic data to comprehensive medical gradually released, amplifying the gene exon the functional impact of classes of genetic records in specific health-care​ systems (for by exon (all 20!), running sequencing gels variation and the ability to ask questions example, Geisinger) or in specific countries and manually scoring sequence changes. about ‘missing’ genetic variation where there or regions (for example, Estonia and Iceland) The notion of sequencing the whole is constraint. are also under way.

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 583 Viewpoint

A big piece of this puzzle is generating observations). While decreases in the A global view of human comprehensive genome sequence data in costs of sample preparation and data Sarah Tishkoff. The past 10 years saw these programmes and far beyond. For this processing are important, they represent an exponential increase in SNP array aim, large-​scale, affordable sequencing a small component of the total cost. and high-​coverage WGS data owing to is key. No problem, right? Is sequencing not Roughly 70% of the cost of sequencing innovations in genomic technologies. always getting cheaper? The problem is that a human genome is the sequencing It is now possible to generate WGS data this assumption is no longer true. We have reagent (flow cell) and the instrument. from tens of thousands of individuals got to where we are today because for a Appreciable cost decrease is made possible (for example, GenomeAsia 100K14 and long time, from 2008 to 2013, sequencing only by decreasing these marginal costs, NIH TOPMed15). An increase in medical costs dropped exponentially. However, in as was demonstrated in the period from biobanks with access to electronic health recent years, the sequencing cost curve has 2010 to 2014, when flow-cell​ densities records (for example, the UK Biobank16, flattened, as is apparent in publicly reported doubled and sequencing cost dropped the Million Veteran Project17 and BioBank cost estimates provided by the US National by an order of magnitude (US$100 per Japan18) is enabling the mapping of Human Genome Research Institute13. gigabase to US$10 per gigabase). hundreds of genetic associations with The cost per megabase of sequence data 2. Scale. One component of cost is the fixed complex traits and diseases, as well as has remained largely unchanged since cost borne by the sequencing centre or phenome-wide​ association studies19 to map around 2016, hovering around a list price the sequencing vendor. With high scale, pleiotropic associations of with of US$0.01 per megabase, which translates centres can become more efficient and genes. The genetic associations identified to a US$1,000 genome. Gone are the offset costs such as the costs of personnel, in these and other studies have been used days of our field touting the impressive equipment and facilities. Scale can also to calculate PRSs for predicting complex decrease of cost in comparison with Moore’s result in volume discounting of the phenotypes and risk of diseases. law, and this development is worrying. reagents, although this process is tightly Yet despite these advances, as of 2019, Some discounting does happen at controlled and approached cautiously nearly 80% of individuals in genome- ​ considerable volume, and whole genomes depending on overall market dynamics. wide association studies (GWAS) were of can be priced in the range of US$500 to 3. Competition. Innovation and scale can only European ancestries, ~10% were of East US$700. However, large projects (more than achieve so much. The cost of generating Asian ancestries, ~2% were of African 500,000 samples) sequenced at these prices the data (the cost per gigabase) dominates ancestries, ~1.5% were of Hispanic ancestries are few and far between, and are generally and thus must come down considerably. and less than 1% were of other ancestries20. dependent on pharmaceutical or biotech The current market requires alternative There is also a strong European bias in funding, which can bring with it restrictions options to drive this advance. Presently, genomic reference databases, such as on data sharing. It is my belief that a fivefold the market for short-read​ sequencing gnomAD and GTEx. These biases limit to sevenfold reduction in total costs is is lacking viable, proven competition our knowledge of genetic risk factors for needed to unlock more sequencing at the that would force flow-cell​ densities and disease in ethnically diverse populations population scale and, ultimately, for genome machine yield to be increased and put and could exacerbate health inequities20. sequencing to be more widely applied in pressures on volume discounting. While Furthermore, PRSs that were estimated the health-​care setting. At US$100 per options for long-read​ sequencing exist using European data do not accurately genome, the cost represents less than 1% of and play a role in particular applications, predict phenotypes and disease risk in the annual average health-care​ expenditure such as de novo sequencing and structural non-​European populations, performing per person in the United States, and a variant resolution, they are at present far worst in individuals with African ancestry21. genome sequence is a one-time​ investment from cost competitive and, therefore, do The lack of transportability of PRSs across that can be referenced again and again not apply pressure to bring down the cost ethnic groups is likely due to differences over the entire lifespan of a person. Getting of routine WGS. in patterns of linkage disequilibrium and that cost curve down will be important to haplotype structure (resulting in different inspire health-care​ systems to adopt genome We need innovation, great economies SNPs tagging causal variants), differences sequencing routinely. of scale and/or real competition to come to in allele frequencies, gene × gene effects and I see three main drivers that will get us to play in the marketplace. When it comes gene × environment effects. It is also possible US$100 per genome: innovation, scale and to sequencing technology, particularly at a that the genetic architecture of complex traits competition. large scale, we cannot be complacent and and diseases may differ across ethnic groups 1. Innovation. Generating sequence data work around the current barriers to realize owing to different demographic histories and requires multiple components, and there small gains and one-off​ wins. This might adaptation to diverse environments. are multiple areas ripe for innovation. involve specific types of investment beyond Although there have been initiatives Sample preparation can be improved just financial ones; adopting and vetting to increase inclusion of ethnically diverse through more efficient methods new technology requires time, creativity, populations in human genomics research that decrease the labour required, or commitment and patience. It is a challenge (for example, the NIH TOPMed15 and miniaturization can decrease the cost of for our community to take on now. In H3Africa consortia), Indigenous populations the reagents used in library preparation. 5 years’ time, I hope we can look back at remain under-represented.​ Great care must Developments to decrease data processing the era of the US$100 genome and progress be taken to ensure that genomic research costs are also ripe for innovation. towards real population-scale​ databases that of minority and Indigenous populations Recently, we showed that processing fuel discovery, enriching our knowledge is conducted in an ethical manner. This using optimized computing power of the human allelic series and, importantly, involves establishing partnerships with local lowered the time and cost of creating a the routine use of genomic data in the research scientists, being sensitive to local sequence file by ~50% (S.G., unpublished health-care​ setting. customs and cultural concerns, obtaining

584 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

both community and individual consent, in northern Europeans and for decreased and organoids from diverse non-human​ and returning results to communities that height in southern Europeans). However, it primate species will also be informative for participated when possible. In addition, was recently shown that these results were comparative genomic studies to identify the there should be training and capacity influenced by population structure that evolution of human-specific​ traits such as building so that genomic research can be could not be easily corrected using standard brain development and cognition. However, conducted locally, where feasible. approaches, particularly for SNPs below iPS cell-​derived cells may not accurately A particular area of focus in the future genome-wide​ levels of significance23. When reflect the impact of mutations acting on should be developing tools and resources this analysis was repeated with variants developmental phenotypes, which will that make genomic data and analyses identified in a more homogenous set of require development of more efficient accessible in low- and middle-income​ individuals of European ancestry from the in vivo approaches in model organisms. countries. We have to ensure that all people UK Biobank, these signatures of polygenic Perhaps the biggest revolution in benefit from the genomics revolution adaptation were erased23. Thus, methods the study of recent human evolutionary and advances in precision medicine and for detecting polygenic adaptation that are history has been the development of gene editing. Thus, several of the biggest less biased by population structure and by methods that make it feasible to sequence challenges in the next decade will be (1) population ascertainment bias will need to and/or obtain targeted genotypes from to increase inclusion of ethnically diverse be developed in the future. These studies ancient DNA samples. The generation of populations in human genomics research; will also benefit from inclusion of more high-coverage​ reference genomes for archaic (2) the generation of more diverse reference ethnically diverse populations in GWAS and hominid species such as Neanderthals genomes using methods that generate identification of better tag SNPs as described and Denisovans, located in Eurasia, long sequencing reads, and haplotype earlier. A challenge of inclusion of minority has made it feasible to identify archaic phasing, to account for the large amount of populations in GWAS is that sample introgressed segments within the genomes structural variation that likely exists within sizes are often small relative to majority of non-Africans.​ Some of these regions and between populations; (3) the training populations. However, the high levels of have been shown to play a role in adaptive of a more diverse community of genomic genetic diversity and extremes of phenotypic traits such as adaptation to high altitude research scientists; and (4) the development diversity observed in some populations, and immune response26. Furthermore, of better methods for accurately predicting particularly those from Africa, make there has been an explosion of studies of phenotypes and genetic risk across ethnically them particularly informative for GWAS. ancient genetic variation in Europeans diverse populations and for distinguishing For example, a GWAS of skin pigmentation within the past 30,000 years that has gene × environment effects. in fewer than 1,600 Africans was informative demonstrated a much more complex model The inclusion of ethnically diverse for identifying novel genetic variants that of the peopling of Europe, and the recent populations, including Indigenous affect skin colour, including a previously evolution of adaptive traits, than previously populations, is also critical for uncharacterized gene, MFSD12 (ref.24). known from the archaeological record or reconstructing human evolutionary history Thus, genomic studies in the future must from studies of modern populations27. The and understanding the genetic basis of make inclusion of minority populations biggest challenge has been the inability adaptation to diverse environments and a priority. to get high-quality​ ancient DNA from diets. While there have been a number of A challenge in both GWAS and selection regions with a tropical climate, such as success stories for identifying genes of large scans has been the identification of causal Africa and Asia. While there has been effect that play a role in local adaptation genetic variants that directly have an impact success in analysing DNA samples as old (for example, lactose tolerance and sickle on variable traits. Most of these variants as 15,000 years in Africa, which has been cell disease (SCD) associated with malaria are in non-coding​ regions of the genome. informative for tracing recent migration resistance), identifying signatures of The development of high-throughput​ and admixture events28, the lack of a polygenic selection has been considerably approaches, such as massively parallel more ancient African reference genome more challenging22. Genomic signatures of luciferase expression assays to identify gene makes it very challenging to detect archaic polygenic adaptation are based on the ability regulatory regions and high-throughput​ , which currently relies on to detect subtle shifts in allele frequencies at CRISPR screens in vitro and in vivo to statistical modelling approaches. Thus, the hundreds or thousands of loci with minor identify functional variants influencing biggest challenge in the next 10 years will effect on the of a complex trait the trait of interest, will be useful25. There be the successful sequencing of ancient and to determine whether that shift is a is also a need to better understand cell DNA more than 20,000 years old from all result of demography or . type-specific​ variation and gene regulation regions of the world, so that we may have a A more daunting challenge arises from the at the single-cell​ level, including response to better understanding of the complex web of same issues of portability of PRSs described stimuli such as immune, pharmacological population histories from across the globe. earlier — variants associated with a and nutrient challenges, in ethnically complex trait may not tag well across ethnic diverse populations. However, these African genomics — the next frontier groups and/or the genetic architecture of approaches are still limited by the need Ambroise Wonkam. To fully meet the a trait may differ in different populations. to have informative cell lines. This can potential of global genetic medicine, Furthermore, it has recently been shown be particularly challenging to obtain for research into African genomic variation is a that uncorrected population stratification Indigenous populations living in remote scientific imperative, with equitable access can result in a false signal of polygenic regions. Improvements in the differentiation being a major challenge to be addressed. selection23. For example, several studies of induced pluripotent stem cells (iPS cells) Studying African genomic variation have identified signatures of polygenic into assorted cell types and into organoids represents the next frontier of genetic adaptation for height across European will be important for facilitating functional medicine for three major reasons: ancestry, populations (selection for increased height genomic studies. Establishment of iPS cells ecology and equity.

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 585 Viewpoint

On the basis of a ‘pan-genome’​ generated of all associations35. Moreover, whole-exome​ from 910 individuals of African descent, at sequencing of nearly 1,000 African study least 300 million DNA variants (10%) are participants of Xhosa ancestry with Greater availability of African not found in the current human reference schizophrenia found very rare damaging genomes will improve our genome29, and 2–19% of the genome of mutations in multiple genes36, a finding that ancestral Africans derives from poorly could be replicated in a Swedish cohort of understanding of genomic investigated archaic populations that 5,000 individuals. In comparison, results variation and complex diverged before the split of Neanderthals for the Xhosa cohort yielded larger effect trait associations in all 30 and modern humans . Neanderthal genome sizes, which shows that for the same number populations contributions make up ~2% of the of cases and controls, the greater genetic genome in present-day​ Europeans and are variation in African populations provides enriched for variations in genes involved in more power to detect genotype–phenotype Eurasians but are nearly non-existent​ in dermatological phenotypes, neuropsychiatric relationships. Therefore, millions of Africans, and there is evidence that novel disorders and immunological functions31. African genomes must be sequenced, with variants in hearing impairment-associated​ Once technical challenges in sequencing genotyping and analysis tools optimized for genes are more likely to be found in poor-​quality DNA have been overcome their interrogation. Africans than in populations of European and approaches to investigate the Greater availability of African genomes or Asian ancestries40. Higher fertility rate, genomic contribution of African archaic will improve our understanding of genomic consanguinity practices and regional populations have been refined, it is likely variation and complex trait associations genetic bottlenecks will improve novel gene that associations between variants in ancient in all populations but will also support discovery for monogenic diseases in Africa, African DNA and human traits or diseases research into common monogenic diseases. as well as disease–gene pair curation, and will be found, providing insights that can The discovery of a single African origin will address existing challenges surrounding benefit modern-day​ humans. of the SCD mutation, about 5,000–7,000 database biases and inference of variant As a consequence of the 300,000–500,000 years ago, not only suggested recent deleteriousness, which have led to the years of genomic history of modern humans migration and admixture events between misclassification of variants. in Africa, ancestral African populations are Africans and Mediterranean and/or Middle Differential population genomic the most genetically diverse in the world. Eastern populations but also enhanced variant frequencies are shaped by natural By contrast, there is an extreme genetic our understanding of genetic variation evolutionary selection as an adaptation bottleneck, resulting in much less variation, in general as well as its potential impact to environmental pressures. The African in all non-African​ populations who evolved on haemoglobinopathies37. For example, continent follows a North–South axis, from the thousands of humans who migrated variants in the HBB-like​ gene cluster linked which is associated with variable climates out of Africa approximately 70,000 years with high levels of fetal haemoglobin have and biodiversity, both motors of natural ago. Current PRSs, which aim to predict been associated with less severe SCD; because selection. This specific African ecology has the risk for an individual of a specific the level of fetal haemoglobin is under shaped genetic variation accordingly, which disease on the basis of the genetic variants genetic control, it is amenable to therapeutic can have a detrimental or positive impact on that individual harbours, exhibit a bias manipulation by gene editing38. Moreover, health. Obvious examples are variants that regarding usability and transferability across knowledge of an individual’s genetic variants cause SCD but confer resistance to malaria37, populations, as most PRSs do not account can have an impact on secondary prevention APOL1 variants that are protective against for multiple alleles that are either limited of and treatment strategies for SCD. For trypanosomes (the parasites that cause or of high frequency among Africans. example, variants in APOL1 and HMOX1 sleeping sickness)41 and variants of OSBPL10 A GWAS on the genetic susceptibility to and co-inheritance​ of α-thalassaemia​ are and RXRA that protect against dengue T2DM identified a previously unreported associated with kidney dysfunctions39; stroke fever42. Unfortunately, APOL1 variants African-​specific significant locus, while in SCD is associated with targeted genetic also increase susceptibility to chronic showing transferability of 32 established variants used in a Bayesian model; and kidney disease in populations of African T2DM loci32. In addition, nonsense overall SCD mortality has been associated ancestry39,41. A better understanding of the mutations found commonly among with circulating transcriptomic profiles. functional impact of genetic variants specific Africans in PCSK9, which are rare in It is estimated that 75% of the 305,800 to African populations, particularly those Europeans33, are associated with a 40% babies with SCD born each year are born in that have been selected under environmental reduction in plasma levels of low-density​ Africa; SCD in Africa will serve as a model pressure, and the way they interact with lipoprotein, supporting PCSK9 as a target for understanding the impact of genetic each other is needed and will have a positive for dyslipidaemia therapeutics. In the largest variation on common monogenic traits impact on genetic medicine practice. GWAS meta-analysis​ for 34 complex traits, and help to illustrate the multiple layers Moreover, immunogenetic studies among conducted in 14,345 Africans, several loci of genomic medicine implementation. Africans will further our understanding of had limited transferability among cohorts34, Exploring African genomic diversity natural selection and responses to emerging further illustrating that genomic variation will also increase discovery of novel infectious diseases, such as COVID-19. is highest among Africans compared with variants and genes for rare monogenic The scientific imperative of genomic other populations. As a consequence, linkage conditions. Indeed, allelic and locus research of African populations is expected disequilibrium is lower in Africans, which heterogeneity display important differences to enhance genetic medicine knowledge and improves fine mapping and identification in African individuals compared with other practice in Africa but will face the challenges of causative variants. Indeed, while only populations; for example, mutations in GJB2 of overburdened and under-​resourced public 2.4% of participants in large GWAS are account for nearly 50% of cases of congenital health-​care systems, and often absent ethical, African individuals, they account for 7% non-​syndromic hearing impairment among legal and social implication frameworks43,

586 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

alleles that differ only slightly in their genetic of multiple genes52; and how GRN changes effects45; these genetic assumptions were lead to specific cellular responses53. Despite for a deeper understanding, quite contrary to what was then known44. many advances, the number of enhancers we need radically different Throughout the past century, this view regulating expression of a specific gene matured, as segregation analyses of human remains unknown. How many enhancers approaches to understand phenotypes taught us that — beyond the are cell type specific versus ubiquitous? complex trait biology effects of some major genes — most trait How many are constitutive rather than variation was polygenic, modulated by stage specific? And do they act additively family-​specific and random environmental or synergistically in gene expression? requiring international collaboration to be factors47. Today, we have empirical Additionally, which cognate transcription managed. Developing an African genomics evidence from GWAS, which use dense factors bind these enhancers, with what workforce will be necessary to meet the maps of genetic variants on hundreds of dynamics and how are they regulated54? major need for research across the lifespan thousands of individuals measured for These details of a gene’s ‘enhancer code’ are for cohorts of millions of individuals with many traits and diseases, that the genetic critical for assessing its relative effect on a complex or monogenic diseases. Such architecture of most multifactorial traits trait. Next, how does enhancer sequence endeavours can thrive on the foundation is from common sequence variants with variation affect a gene’s activity? Does of recently established initiatives such as small allelic differences at thousands of sites such variation affect transcription factor H3Africa. Indeed, equitable access for across the genome48. This replacement of a binding only or its interaction with the Africans is essential if African genomics is pan-Mendelian​ view with a pan-polygenic​ promoter? Is the enhancer variant’s effect to reach its full potential as the next frontier view of traits is one of the most important evident in all cellular states or only some? of global genetic medicine. contributions of genomics to genetics. Is variation in only one enhancer sufficient Unfortunately, this mapping success has to alter gene expression, or are multiple Decoding multifactorial phenotypes not clarified the number of genes involved, changes in multiple elements necessary? Aravinda Chakravarti. We live in a time the identity of those genes or how those Additional critical questions include of great technological progress in genomics genes specify the phenotype. Indeed, some which genes are involved in the core and computing. And we live in a time when have concluded that many of the mapped pathway underlying a trait, and how do we ‘genetics’ is a household word, with a public GWAS loci are unrelated to the core biology identify them49? Elegant work has shown increasingly adept at understanding its of each phenotype49. Thus, for a deeper how genes are regulated within integrated relevance to their own lives. Not surprisingly, understanding, we need radically different modular GRNs, whereby one gene’s the study of genetics is being reinvented, approaches to understand complex trait product is required in a subsequent step by rediscovered and reshaped, and we are biology in contrast to merely expanding another gene, with feedback interactions52. beginning to understand the science of GWAS in larger and larger samples. These GRNs comprise elements from the human heredity at a resolution that was Yet, the most significant biology to genome, and , with impossible before. emerge from GWAS is that most of the likely rate-limiting​ steps that require regulation. The most significant genetics puzzle trait-​causing variants fall outside coding As our work on Hirschsprung disease has today, in my view, is the dissection of ‘family sequences, in regulatory elements, most shown50,53, a GRN is composed of core genes, resemblance’ of complex phenotypes, both frequently enhancers50,51. This important is the logic diagram of regulation of a major for intellectual (raison d'être of genetics) and finding has uncovered four new genetic rate-limiting​ cellular step, is enriched in practical (disease diagnosis and therapy) puzzles. First, the non-coding​ regulatory coding and enhancer disease variants with reasons. We have long known that family machinery is vast; how much of this disease susceptibility scaling with increasing resemblance arises from shared alleles, regulation is compromised, and how does number of variants, and with disease declining as genetic relationship wanes, it affect phenotypes? Second, regulatory resulting from effects on its rate-limiting​ but the precise molecular components changes affect RNA expression at many genes gene product53. That is, the GRN integrates and composition of this resemblance are and protein expression at others; how does a the expression of multiple genes. Finally, we still poorly understood. At the turn of cell ‘read’ these numerous changes as specific need to understand how GRN changes alter the twentieth century, the components signals? Third, how is this coordinated cell properties and behaviour. I speculate were a matter of bitter and acrimonious expression response translated into cellular that rate-​limiting steps in GRNs are major debate44 between the ‘Mendelians’ and the responses affecting phenotypes? Fourth, regulators of broad cell properties, be they ‘Biometricians’, until the opposing views if specific environmental factors affect the differentiation, migration, proliferation or were reconciled by Ronald Fisher’s 1918 same phenotype, which components do apoptosis, the cellular integrator of GRN analysis45 that complex inheritance could they dysregulate? In my opinion, we need to variation. Thus, genetic variation across the be explained through segregation of many answer these questions for specific traits and genome affects enhancers dysregulating genes, each individually Mendelian. In 1920, diseases to truly understand their polygenic many genes, but only when they dysregulate its publication delayed by World War I, biology. Finally, these explanations must also GRNs through rate-limiting​ steps do they this notion was elegantly demonstrated by answer the question of why some traits are affect cell and tissue biology55. This offers the the experimental studies of Altenburg and decidedly Mendelian whereas others are not. promise of a mechanistic understanding of Muller using truncate wing, an “inconstant The questions of tomorrow will need human polygenic disease. and modifiable character”46 in Drosophila. to focus on four areas: the biology of The way forward for complex trait Fisher’s model assumed an infinite enhancers and the transcription factors biology, including disease, is to shift our number of genes additively contributing that bind them51; the effect of genetic approach from reverse to forward genetics, to a trait, with common genetic variation variation in enhancers50; gene regulatory using genome-wide​ approaches to cell at each component locus comprising two networks (GRNs) that regulate expression type-specific​ gene perturbation. I believe

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 587 Viewpoint

we can construct cell-type​ GRNs en masse, embryos will help, including CRISPR to some loci, mutation of a single transcription inclusive of their enhancers, transcription engineer genomes, optogenetics to perturb factor-​binding site in a single enhancer can factors and feedback or feedforward proteins, lattice light-sheet​ and selective have dramatic effects on gene expression interactions, to then assay functionally plane illumination microscopy to image and development. It is difficult to reconcile defined variation in phenotypes. But, processes in vivo, and low-input​ methods such cases with a shared condensate model, even this approach will be insufficient. to overcome issues with scarce material. as other proteins bound to the enhancers We need to test our success by solving Particularly exciting to me are recent and promoter should still phase separate. at least a few complex traits completely advances in single-cell​ genomics, which, By contrast, there are many examples and demonstrating their veracity using a although they are in their early days, will where mutation of a single transcription synthetic biology approach to recapitulate dramatically change the way we study factor-​binding site, or even an entire the phenotype in a model system; similarly embryogenesis. Many new insights have enhancer, has minimal impact on the to the field of chemistry, analysis has to be already emerged, including the discovery of expression of a gene. These observations followed by de novo synthesis. Our genomic unknown cell types and new developmental suggest that there may be different types of technologies are getting up to the task to trajectories for well-established​ cell types. loci, with requirements for different types enable this advance; as geneticists, are we? Even the concept of ‘cell identity’ has come of chromatin topologies and local nuclear into question. environments, which will be important to Enhancers and embryonic development Cell identities are largely driven by tease apart in the coming years. Eileen Furlong. The work of my group sits transcription factors, which act through The genetic dissection of model loci at the interface of genome regulation and cis-​regulatory elements called ‘enhancers.’ in the 1990s and the first decade of the animal development, and there have been One of the most exciting unsolved mysteries, twenty-​first century led to much of our many exciting advances in both during the in my opinion, is how enhancers relay understanding of how genes are regulated. past decade. Developmental biology studies information to their target genes. The The power of genomics in the past few fundamental processes such as tissue and textbook view of enhancers is of elements decades has captured regulatory information organ development and how complexity with exclusive function that regulate a for all genes genome-wide,​ providing more emerges through the combined action specific target gene through direct promoter unbiased views of regulatory signatures, of cell communication, movement and interactions, which occur sequentially if leading to new models of gene regulation. mechanical forces. After the discovery that multiple enhancers are involved. However, What is missing is empirical testing at a large differentiated cells could be reprogrammed emerging concepts in the past decade scale. A major challenge is to move to more to a naive embryonic stem cell-like​ state, question many of these ‘dogmas’. Some systematic in vivo functional dissection in the past decade has witnessed an explosion enhancers have dual functions, whereas organisms. CRISPR-based​ pooled screens in in vitro cellular reprogramming and others may even regulate two genes. have advanced the interrogation of genomic differentiation studies. Organoids are a Enhancer–promoter communication is regions in cell culture systems. However, very exciting extension of this. The extent now viewed in the light of spatial genome scaling functional assays in embryos remains to which these fairly simple systems can organization, including topologically a huge challenge. The task is enormous self-organize and generate complexity56 associating domains (TADs) and — even long-standing​ model organisms, is one of the unexpected surprises of the membraneless nuclear microcompartments such as Drosophila and mice, lack knockout past 5–10 years. The buzz around stem (that is, hubs or condensates)59. Being strains for all protein-coding​ genes, and the cells has also renewed interest in cellular present within the same TAD likely increases number of regulatory elements is at least plasticity in vivo and has uncovered an the frequency of enhancer–promoter an order of magnitude higher. There has unexpected degree of transdifferentiation interactions, but how a specific enhancer been little progress in developing scalable and dedifferentiation57. In the mouse heart, finds its correct promoter within a TAD, or methods to quantify the contribution of a for example, cardiomyocytes dedifferentiate when TADs are rearranged60,61, remains a transcription factor’s input to an enhancer’s and proliferate to regenerate heart tissue mystery. Hubs or condensates are dynamic activity, and gene expression, in embryos. when damaged within the first week microcompartments62 that contain high More systematic unbiased data will uncover after birth58. local concentrations of proteins, including more generalizable regulatory principles, Our understanding of the molecular transcription factors and the transcriptional increase our predictive abilities of gene changes that accompany differentiation machinery. One potential implication of regulation and developmental programmes, has hugely advanced owing to the jump condensates is that enhancers may not and enhance our understanding of the in scale, resolution and sensitivity of need to ‘directly’ touch a gene’s promoter impact of genetic variation. next-generation​ sequencing technologies to regulate transcription — rather, it may Perhaps the most promising and exciting over the past decade. This has led to a flood be sufficient to come in close proximity prospects in the coming years are to use of studies in embryonic stem cells, iPS cells within the same condensate. Presumably, single-cell​ genomics, imaging and the and embryos that revealed new concepts once proteins reach a critical concentration, integration of the two to dissect the amazing underlying genome regulation by measuring transcription will be initiated. While this transcript diversity, transcription factor model fits a lot of emerging data, there occupancy, chromatin accessibility and are still many open questions. What is the conformation, and chromatin, DNA and required distance between an enhancer and A major challenge is to move RNA modifications. The future challenge a promoter to trigger transcription? Does to more systematic in vivo will be to connect this information to the this distance differ for different enhancers63 physical characteristics of cells and how they depending on their transcription factor– functional dissection in form complex tissues. New technologies DNA affinities? Do different chromatin organisms that solve many challenges of working with environments64 influence the process? At

588 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

complexity of embryonic development. cells for downstream processing. Methods that measure several features from the same Single-​cell genomics can reveal information are improving to measure genomic features cell; for example, mRNA and chromatin about developmental transitions in a way in situ70, as well as to computationally map accessibility75 or mRNA and lineage76. that was unfeasible before. When combined features to spatial contexts71,72. The stage is To build virtual maps, independent with temporal information, such data can set for the next phase of single-cell​ genomics, measurements from different cells can reconstruct developmental trajectories65,66 where spatial registration of multimodal be integrated with use of data integration and identify the regulatory regions and genome activity across molecular, cellular tools77, although it can be difficult to align transcription factors likely responsible for and tissue or ecosystem scales will enable cell states across modalities in particular each transition67. The scale and unbiased virtual reconstructions with extraordinary in developing systems. Therefore, the nature of the data, profiling tens to hundreds resolution and predictive capacity. These ultimate goal is to directly measure as many of thousands of cells, provides much virtual maps will rely on multi-omic​ features as possible (for example, RNA, richer information than anyone envisaged profiling of healthy and perturbed tissues lineage, chromatin, proteins and DNA just 5 years ago, bringing a new level of and organisms, which presents major methylation) in the same cell78, ideally with inference and causal modelling. The ability challenges and opportunities for innovation. spatial resolution. Furthermore, combining to measure single-cell​ parameters in situ Cell throughput remains a challenge, and genetic and pharmacological perturbation (called ‘spatial omics’) will be transformative it is unclear what role dissociation-based​ screens with single-cell​ multi-omic​ measures in the context of developing embryos to single-cell​ sequencing protocols will play will be informative to understand cell state reveal the functional impact of spatial in the future. These protocols are fairly landscapes and underlying regulatory gradients, inductive signals and cell–cell easy to implement, and laboratories around networks for each cell type. The CRISPR– interactions, and to move to digital 4D the world can execute projects with tens of Cas field continues to develop creative tools embryos. Combining these approaches thousands of cells analysed per experiment. for precise single-locus​ editing and other with genetic perturbations holds promise However, there are scenarios in which manipulations79, and incorporation of these to decode developmental programmes as measuring millions of cells per experiment toolkits with single-cell​ sequencing readouts they unfold. Will this bring us to a predictive would be desired, such as in perturbation will certainly bring new mechanistic insight. understanding of the regulatory networks screens. Combinatorial barcoding methods Life forms are inherently dynamic, driving embryonic development during push cell-​throughput boundaries73; however, and each cell has a story to tell. Static the next decade? ‘Simple’ model organisms it is unclear how to scale full transcriptome measurements do not provide sufficient are a fantastic test case to determine the sequencing economically to millions of insight into the mechanisms that give types and scale of data required and to cells using current sequencing technologies. rise to each cell state observed in a tissue. develop the computational framework ‘Compressed sensing’ modalities — Computational approaches to stitch to build predictive networks. The systematic whereby a limited, selected and/or random together independent measurements across functional dissection of gene regulation number of features are measured per cell, time can be used to reconstruct potential and true integration of single-cell​ genomics and high-​dimensional feature levels are histories; however, these are indirect with single-cell​ imaging will bring many recovered through inference or similarity to inferences. Long-​term live imaging in 2D exciting advances in our understanding a known reference — provide an interesting cultures using confocal microscopy and of the programmes driving embryonic possibility to increasing cell throughput74. in 3D tissues using light-sheet​ microscopy development in the coming years. Most single-​cell transcriptome protocols provides morphology, behaviour, location are currently limited to priming the and, in some cases, molecular information Spatial multi-omics​ in single cells polyadenylation track present on all cellular on the history of a cell. Indeed, such Barbara Treutlein. Incredibly, the first mRNAs; however, this approach leads to long-term imaging​ experiments revealed that single-cell​ transcriptome was sequenced just biased sampling of highly expressed mRNAs. cell fates or states can be predicted from cell over a decade ago68! Since this milestone, Clever innovations for random or targeted behaviour across many generations80. of millions of cells have RNA enrichment could be a way to build Cell tracking combined with end point been sequenced and analysed from diverse up composite representations of cell states. single-cell​ genomics experiments can help organisms, tissues and other cellular Image-​based in situ sequencing methods to understand how cell states came to be; biosystems, and these maps of cell states provide a means for increasing the number of however, these experiments lack molecular are revolutionizing the life sciences. The cells measured per experiment, as millions resolution of the intermediates. There are technologies and associated computational of cells can be imaged without a substantial strategies using CRISPR–Cas systems to methods have matured and been increase in financial cost, although imaging capture highly prevalent RNAs inside cells democratized to such an extent that nearly time is a limiting factor. There remains a lot at given times and insert these RNAs into all laboratories can apply the approach to of room for experimental and computational DNA for storage and subsequent readout81. their particular system or question. optimizations to measure the transcriptome, Together with live tracking and end-point​ Of course, the transcriptome is not random barcodes, DNA conformations and single-cell​ genomics, such methods could enough, and protocols have already protein abundances from the micrometre provide unprecedented insight into cell been developed to measure chromatin scale to the centimetre scale spatially, and it histories. accessibility, histone modifications, will be interesting to see how methods for My vision is that the emerging protein abundances, cell lineages and spatial registration advance over the next technologies described above can be other features linked to genome activity in 5 years. applied to human 2D cell culture and 3D single cells69. Currently, many studies use Currently, most high-throughput​ organoid biosystems to understand human dissociation-based​ single-cell​ genomics measurements are performed on cell development and disease mechanisms. methods, where the spatial context is suspensions or on intact tissues using one My team and others are working to build disrupted to facilitate the capture of single modality. That said, studies are emerging virtual human organs that are based on

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 589 Viewpoint

gene bodies are actively targeted, and most a potent tool for generating hypotheses. dynamic changes occur at distal regulatory However, for the most part, epigenomic the next generation of single- sites. Similar insights exist for many core analyses have expanded a highly valuable, cell genomics methods and histone modifications, and, in general, but still largely descriptive, understanding we have an improved appreciation of the of numerous epigenetic layers. So one may human organoid technologies epigenetic writers, readers and erasers ask, what is different now and why should will provide unprecedented involved. Over the past decade, we have seen we expect to answer these questions in the opportunities substantially integrated and multilayered coming years? epigenomic analyses that provide a fairly Technological innovation has always comprehensive picture of epigenomic played a key role in biology, and some high-throughput,​ multimodal single-cell​ landscapes, including their dynamics across broadly applicable, recent breakthroughs genomics data. Organoid counterparts development and disease. will enable us to drive progress in the provide opportunities to perturb the system Additional innovation is now needed coming years. These include the transfer of and understand lineage histories. Together, around data access and sharing. As noted, the bacterial innate immunity CRISPR–Cas the next generation of single-cell​ genomics there is certainly no shortage of data, but to system as a universal genome-targeting​ methods and human organoid technologies enable individual researchers to generate tool86 as well as for base editing, epigenome will provide unprecedented opportunities to and verify hypotheses quickly improved editing and various genome manipulations. develop new therapies for human disease. tools are required to access and browse Similarly, new fast-acting​ endogenous these data. Over the past decade, large protein degradation systems have been Unravelling the layers of the epigenome coordinated projects such as ENCODE, developed that further enhance our ability to Alexander Meissner. Around 1975, the the Roadmap Epigenomics Project and probe for precise function87. The past decade idea that 5-methylcytosine​ could provide Blueprint Epigenome have initiated such also saw major improvements in imaging a mechanism to control gene expression efforts, but it remains a reality that data are technologies as well as cell and molecular gained traction, despite little knowledge of not at everyone’s fingertips quite yet. biology, moving from the 2D space into the its genomic distribution or the associated Moreover, despite decades of steady 3D space with both organoid cell culture enzymes82. With similarly limited genomic and recently accelerated progress, many models88 and chromosome conformation information or knowledge of the players important questions remain regarding the capture approaches for exploring nuclear involved, the histone code hypothesis molecular coordination and developmental organization89. was put forward in 2000 to explain how functions of these epigenetic modifications. Another major shift included the multiple different covalent modifications For instance, cytosine methylation at gene reappreciation that membraneless of chromatin may be coordinated to direct bodies has been preserved for more than a organelles are a widespread mechanism specific regulatory functions83. Tremendous billion years of evolution and yet its precise of cellular organization90. In particular, progress has been made since, and the list function is still under investigation. How and there have been many advances in our of core epigenetic regulators that have been why did genomic methylation switch to a understanding of how condensates form discovered and characterized seems largely global mechanism in vertebrates compared and function, including for transcriptional complete84. with the selected methylation observed in regulation. Together with known properties DNA sequencing has continued to invertebrates? What is the precise function of modified histones on DNA and the dominate the past decade and contributed of this modification in each of its regulatory fact that many epigenetic regulators also to an exponential growth of genome-wide​ contexts, and how are its ubiquitously acting contain intrinsically disordered regions, it maps of all layers of regulation. In the enzymes recruited to specific sites in the is reasonable to assume that these physical early days, individual CpG sites could be genome? The latter is particularly timely properties will have a major impact on our measured by restriction enzymes, whereas given recent observations that enhancers, but understanding of chromatin. Importantly, now we have generated probably well over a also some repetitive elements, show ongoing changes in topology have been linked to trillion cytosine methylation measurements. recruitment of both de novo methylation disease91, and similar connections have been An equally astonishing number of and demethylation activity. Moreover, reported recently for condensates92. This genome-wide​ data sets have been collected extraembryonic tissues show redirected will likely be an exciting area to follow in the for transcriptomes, histone modifications, activity that shares notable similarities with coming years. transcription factor occupancy and the long observed altered DNA methylation Lastly, our research continues to be more DNA accessibility. Furthermore, the landscape found across most cancer types85. and more reliant on multidisciplinary skills, number of single-cell​ transcriptome and Lastly, it is abundantly clear that DNA with mathematics, physics, chemistry and epigenome data sets continues to grow methylation is essential for mammalian computer science playing an ever-more​ at an unprecedented pace. development; but despite us knowing this central role in biology, which will require On the basis of this overabundance of for nearly three decades, it is not clear how data across many normal and diseased and why developing knockout embryos die. cell states, for instance, we now clearly The specific developmental requirements are understand the non-random​ distribution of also largely true for many histone-modifying​ there have been many cytosine methylation across many different enzymes; however, it remains incompletely advances in our understanding organisms. These maps have helped to refine understood how exactly these modifications of how condensates form our understanding of its relationship to interact to support gene regulation. gene expression, including the realization A decade ago it seemed likely that and function, including for that only a few promoters are normally we would answer questions such as these transcriptional regulation controlled via this modification, whereas using newly gained sequencing power as

590 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

some rethinking in training and institutional contained mostly ‘junk’, which sometimes in lncRNAs resembles that on a billboard organization to accomplish our goals. Going made RNA as transcriptional noise. (in which keywords and catchphrases are forward, we will need more functional The human genome is currently repeated) rather than a finely honed legal integration, which in part due to the estimated to encode nearly 60,000 lncRNAs, document (where every comma counts). aforementioned selected discoveries is ranging from several hundred to tens of Small units of RNA shapes are repeated now very tractable. In particular, more thousands of bases, that apparently do not within lncRNAs to build up the meaning refined perturbation of gene activity, which function by encoding proteins96. Studies in the lncRNA billboard, but these RNA for many chromatin regulators should be over the past decade discovered that many shapes can be rearranged in different orders separated into catalytic and regulatory lncRNAs act at the interface between or locations without affecting meaning. functions, together with readouts at multiple chromatin modification machinery and These insights have allowed scientists to levels of resolution will bring us closer to the the genome. Specific lncRNAs can act recognize lncRNA genes from different insights needed. We recently exemplified as guides, scaffolds or decoys to control species that perform the same function even this with a pipeline that explores epigenetic the recruitment of specific chromatin though the primary sequences bear little regulator mutant phenotypes at single-cell​ modification enzymes or transcription similarity100. Moreover, investigators were resolution93. From these studies, we may factors to DNA or their dismissal from able to strip down lncRNAs to their essential be able to understand how epigenetic DNA97. lncRNAs can activate as well as ‘words’, composed of these key repeating regulators interact with the environment silence genes, and these RNAs can target shapes and one-tenth​ the size of the original to influence or protect the organismal neighbouring genes as a function of lncRNA, which still functioned in vivo phenotype, connecting detailed molecular local chromosomal folding (in cis) or at a to control chromatin state over a whole genetics to classical theories of epigenetic distance throughout the genome (in trans). chromosome100,101. Finally, it is now possible phenomena. Detailed dissections of individual lncRNAs to successfully create synthetic lncRNAs. As we approach the 100-year anniversary have revealed that lncRNAs are composed By adding RNA shapes to carefully chosen of the detection of 5-methylcytosine in of modular RNA motifs that enable one RNA templates, investigators are starting to DNA94, it seems we can hope to declare at lncRNA to connect proteins that read, create designer lncRNAs that can regulate least for some layers of the epigenome that write or erase specific chromatin marks. chromatin in vivo100, suffice to partly rescue we fully understand the rules under which These findings have galvanized substantial the physiological lncRNA gene knockout102, they operate. This may enable the exploration excitement about lncRNAs; laboratories or target RNAs to specific cytotopic of more precise therapeutic interventions, for around the world are now investigating the locations within the cell103,104. instance by redirecting chromatin modifiers roles of lncRNAs in diverse systems, ranging The shift from reading to writing rather than blocking their universal catalytic from control of flowering time in plants to lncRNAs will challenge us on the technical activities, which are shared between normal mutations in human genetic disorders. front, leading to potential transformative and diseased states. Of course, looking back Nonetheless, the notable progress technologies. Current technologies for at predictions made just 10 years ago95, one to date can be viewed as anecdotal — massively parallel reporter gene assays are should expect many additional unforeseen each lncRNA is its own story. When a new built on short sequence inserts. A plan advances that are just as difficult to predict lncRNA sequence is recognized in a genome to build tens of thousands of synthetic now as they were back then. database or RNA profiling experiment, we lncRNAs will require accurate long DNA or are still in the dark about what may happen RNA synthesis. These designer sequences Long non-​coding RNAs: a time to build to the cell or organism (if anything) when will need to be placed into the appropriate Howard Chang. Long non-coding​ RNAs the lncRNA is removed. Indeed, efforts to locations in the genome and controlled to (lncRNAs) are the dominant transcriptional ‘read’ lncRNAs have been the dominant have proper developmental expression, output of many eukaryotic genomes. experimental strategy over the past two splicing pattern and RNA chemical Although studies over the past decade have decades. Systematic efforts in the ENCODE, modifications. Landmark studies using the revealed diverse mechanisms and disease FANTOM and emerging cell atlas consortia XIST lncRNA, which normally silences implications for many lncRNAs, the vast have mapped the transcriptional landscape, the second X chromosome in female cells, to majority of lncRNAs remain mysterious. transcript isoforms and, more recently, silence the ectopic chromosome 21 in Down The fundamental challenge is that we single-cell​ expression profiles of lncRNAs. syndrome cells highlight the biomedical lack the knowledge to systematically These powerful data are now combined with promise of such an approach105. transform lncRNA sequence into function. genome-scale​ CRISPR-based​ methods to As the field develops technologies for Progress in the next decade may come from inactivate tens of thousands of lncRNAs, one large-scale​ creation and testing of synthetic a paradigm shift from ‘reading’ to ‘writing’ at a time, to observe possible cell defects98,99. lncRNAs, we can rigorously test our lncRNAs. However, many challenges remain. Positive understanding of the information content Gene regulation was once thought to be hits require further exploratory studies to in the language of RNA sequences and the exclusive province of proteins. Intense define possible mechanisms of action, and shapes. The next decade promises to be efforts for disease diagnosis and treatment we lack a principled strategy to combine an exciting time for building non-coding​ focused almost entirely on protein-coding​ lncRNA knockouts to address genetic RNAs and to create entirely new tools to genes and their products, ignoring the vast redundancy and compensation. manipulate gene function for biology and majority of the genome. Even at the time A potentially fruitful and complementary medicine. of the completion of the Human Genome direction is the pivot from ‘reading’ to Project, only a handful of functional ‘writing’ long RNA scripts. On the basis of FAIR genomics to track tumorigenesis lncRNAs were known that silenced the the systematic dissection of RNA sequences Núria López-​Bigas. Cancer research is expression of neighbouring genes. Thus, and secondary structures in lncRNAs, one of the fields that has probably benefited it was widely believed that the genome we and others believe that the information the most from the technological and

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 591 Viewpoint

methodological advances of genomics. the path towards this final objective. As is In the span of less than two decades, the true of any Darwinian process, its two key field has witnessed an incredible boost features are variation and selection. Thanks we now have a much better in the generation of cancer genomic, to the past 15 years of cancer genomics, grasp of the origin of somatic epigenomic and transcriptomic data of we now have a much better grasp of the patients’ tumours, both in bulk and more origin of somatic genetic variation between genetic variation between cells recently at the single-cell​ level. My dream cells across different tissues. The study across different tissues as a cancer researcher is to have a full of the variability in the number, type and understanding of the path that cells follow genomic distribution of mutations across towards tumorigenesis. Which events in the tumours provides a window into the life Integrating genomics into medicine life of an individual, a tissue and a particular history of cells across the somatic tissues Eran Segal. The past 20 years in genomics cell lead to the malignant transformation of an individual108,109. In addition, recent have been extraordinary. We developed of some cells? Of course I do not expect studies sequencing the genome of healthy high-throughput​ sequencing and learned to have a deterministic answer, as this cells in different tissues110–112 have shown how to use it to efficiently sequence full is not a deterministic process. Instead that mutations accumulate in hundreds and genomes and measure gene expression and we should aim for a quantitative or thousands in our cells in normal conditions epigenetic marks at the genome-wide​ scale probabilistic understanding of the key over time. These studies have also detected and even at the single-cell​ level118. Using events that drive tumorigenesis. We have positive selection in some genes across these capabilities, we created unprecedented solid epidemiological evidence showing healthy tissues. Hence, positive selection catalogues of novel genomes, functional that smoking increases the probability of is a pervasive process that operates not DNA elements and non-coding​ RNAs lung cancer, exposure to the Sun raises only in tumorigenesis but also in healthy from all kingdoms of life119. But — perhaps the probability of developing melanoma tissues, where it is a hallmark of somatic with the exception of cancer120 and gene and some anticancer treatments increase development of skin, oesophagus, blood therapy for some monogenic diseases121 — the probability of secondary neoplasms. and other tissues. Take, for example, clonal genomics has yet to deliver on its promise But which specific mechanisms at the haematopoiesis: it results from a continuous to have an impact on our everyday life. molecular and cellular levels influence Darwinian evolutionary process in which For example, drugs and diagnostics are still these increases? over time (with age) some haematopoietic being developed in the traditional way, with One first clear goal of cancer genomics cells harbouring mutations in certain blood screening assays to find lead compounds is to catalogue all genes involved in development genes, such as DNMT3A for targets typically arising from animal tumorigenesis across different tissues. and TET2, outcompete other cells in the studies, without involving genomics in any Although this is a daunting task, it is actually compartment113,114. This process is part of the steps. Moreover, when the global feasible106. By analysing the mutational of normal haematopoietic development. COVID-19 pandemic hit, the genome patterns of genes across tumours, one can Problems arise only when this process gets of the spreading severe acute respiratory identify those with significant deviations out of control, leading to leukaemia in the syndrome coronavirus 2 (SARS-CoV-2)​ was from what is expected under neutrality, case of blood, or a malignant tumour in solid rapidly sequenced, but why some infected which indicates that these mutations provide tissues. Why is it only in rare cases that this individuals exhibit severe disease and others a selective advantage in tumorigenesis and ubiquitous interplay between variation and do not remains unknown. are thus driver mutations. We can imagine a selection becomes uncontrollable and results Indeed, our next challenge is to translate future in which through the systematic in full-​blown tumorigenesis? Which events, the incredible resources and technologies analysis of millions of sequenced tumour beside known tumorigenic mutations, drive developed in genomics into an improved genomes this catalogue or compendium this process? understanding of health and disease. This moves closer and closer to completion. If we have learnt something in recent improved understanding should transform For this to happen, not only do we need years, it is that virtually all tumours harbour the field of medicine to use genomics in genome sequencing to expand — this driver mutations115–117, implying that driver its transition to personalized medicine, process is already in motion in research, genomic events are necessary. However, they which promises individualized treatment clinical settings and the pharmaceutical are clearly not sufficient for tumorigenesis by targeting the right medication to the industry — but more importantly the to occur. So, what are these other triggers right person at the right time on the resulting data must be made FAIR (findable, of the tumorigenic process? What happens basis of that person’s unique profile. By accessible, interoperable and reusable)107. in the lung cells of a smoker or in the continuing to focus on more and more To this end, consortia and initiatives that haematopoietic cells of a patient treated with measurements and the creation of more promote, catalyse and facilitate the sharing chemotherapy that increases their chances to atlases and catalogues, we run the danger of of genomic data, such as the Beyond 1 become malignant? Epigenetic modifications drowning in ever-growing​ amounts of data Million Genomes consortium, the GA4GH and changes in selective constraints, such and correlative findings. Walking down this or the cBioPortal for Cancer Genomics, as evolutionary bottlenecks, for example, at path can lead to an endless endeavour, as are necessary. the time of chemotherapy, may be part of the bulk measurements can always be replaced Of note, cataloguing genes and mutations answer. with single-cell​ ones, or measures at higher involved in cancer development, albeit a For the near future, my dream is to see a temporal and spatial resolution, across more very important first step, is still far from the further increase in FAIR cancer genomics conditions and wider biological contexts. final goal of understanding how and under data to help us disentangle the step-by-​ ​step Instead, we should use genomics to which conditions they drive tumorigenesis. game of variation and selection in our tissues tackle big unanswered questions such Framing cancer development as a Darwinian that leads to tumorigenesis and likely other as what causes the variation that we see evolutionary process helps me to navigate ageing-related​ diseases. across people in phenotypes, disease

592 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

susceptibility and drug responses? What is which of the associations are causal. One undergone in the past two decades, the time the relative contribution of genetic, way to address this is to wisely select the may be right to tackle them. Success can epigenetic, microbiome and environmental nature and type of the associations studied. transform genomics from being applied factors? How are their effects mediated, For example, in working with microbiome mostly in research settings to having it and what would be the effect of different data, we can move from analyses at the become an integral and inseparable part interventions? Ultimately, we should strive level of species composition to analyses of medicine. to use genomics to generate actionable and at the level of SNPs in bacterial genes. personalized insights that lead to better Such associations are more specific and CRISPR genome editing enters the clinic health. We are now at an inflexion point in more likely to be causal, as in the case of Jin-​Soo Kim. In the past several years, genomics that allows us for the first time to a SNP in the dadH bacterial gene, which genome editing has come of age128, in apply it to study human biology and realize correlated with metabolism of the primary particular because of the repurposing of these ambitious aims122. medication to treat Parkinson disease CRISPR systems. Genomic DNA can be At the cellular level, we can use iPS cells and the gut from patients125. modified in a targeted manner in vivo or from patients to derive cellular models of Another approach is to use longitudinal in vitro with high efficiency and precision, multiple diseases and prioritize treatments measurements and separation of time to potentially enabling therapeutic genome based on measuring both their cellular emulate target trials from observational editing for the treatment of both genetic and molecular response (for example, gene data126. For example, we can select distinct and non-​genetic diseases. All three types expression and ) to existing drugs subsets from the cohort that match on of programmable nucleases developed and drug combinations. We can even use several known risk factors (for example, for genome editing, namely zinc-finger​ massively parallel assays to separately measure age or body mass index) but differ on a nucleases, transcription activator-like​ the effect of each of tens of thousands of marker of interest (for example, expression effector nucleases and CRISPR nucleases, rationally designed mutations, including of a gene or presence of an epigenetic are now under clinical investigation. In the patient-​specific mutations, as we have done, mark), and compare future disease onset next several years, we will be able to learn for example, in testing the effect of all clini- or progression in these two populations. whether these genome-editing​ tools will be cally identified mutations in TP53 on cellular Similarly, retrospective analysis of baseline effective and safe enough to treat patients function123. Measuring the molecular effects multi-omic​ measurements from participants with an array of diseases, including HIV of directed mutations in genes encoding tran- in randomized clinical trials may identify infection, leukaemia, blood disorders and scription factors and signalling molecules markers that distinguish responders from hereditary blindness, heralding a new era and in other genes can reveal the underlying non-​responders and be used for patient in medicine. pathways and regulatory networks of the dis- stratification or for identifying additional If the history of the development of novel ease studied and identify putative therapeutic putative targets. drugs or treatments such as gene therapy targets. The application of such approaches Ultimately, biomarkers identified from and monoclonal antibodies is any guide, to fields that are still poorly understood, observational cohorts need to be tested the road to therapeutic genome editing is such as neurodegenerative diseases, can be in randomized clinical trials to establish likely to be bumpy but ultimately worth particularly impactful. causality and assess efficacy. In the case of travelling. Key questions related to medical But we can be much more ambitious microbial strains extracted from humans, applications of programmable nucleases and directly profile large cohorts of we may be able to skip animal testing concern their mode of delivery, specificity, human individuals using diverse ‘omics’ and go directly to human trials. In other on-​target activity and immunogenicity. assays. As molecular changes typically cases, such as when human genes are First, in vivo delivery (or direct delivery precede clinical disease manifestations, being manipulated, we will need to start into patients) of genes or mRNAs encoding longitudinal measurements coupled with with cell culture assays and animal testing programmable nucleases or preassembled clinical phenotyping have the potential of before performing clinical trials in humans. Cas9 ribonucleoproteins can be a challenge, identifying novel disease diagnostics and However, in all cases, tested omic targets given the large size of these nucleases. therapeutic targets. Indeed, biobanks that should have already shown associations in Ex vivo (or indirect) delivery is, in general, track large samples of hundreds of thousands human individuals, thus making them more more efficient than in vivo delivery but is of individuals have recently emerged and are likely to be relevant and succeed in trials, limited to cells from blood or bone marrow, proving highly informative124. However, at as is the case with drug targets for which which can be collected with ease, edited the molecular level their focus has thus far genetic evidence links them to the disease127. in vitro and transfused back into patients. been on genetics. Technological advances Beyond these scientific challenges, there Ongoing developments of nanoparticles and and cost reductions now allow us to obtain is the challenge of engaging the public viral vectors are expected to enhance and much deeper person-specific​ multi-omic​ and diverse ethnic and socio-economic​ expand in vivo genome editing in tissues or profiles that include transcriptome, groups to participate in such large-scale​ organs not readily accessible with current proteome, methylome, microbiome, multi-omic​ profiling endeavours even delivery systems, such as the brain. immune system and before we can present them with immediate Second, programmable nucleases, measurements. Having these data on benefits. We can start with incentives in the including CRISPR nucleases, can cause the same individual and at multiple time form of informational summary reports unwanted on-​target and off-target​ points can reveal which omic layer is more of the data measured and gradually move mutations, which may contribute to perturbed and informative for each disease towards carefully and responsibly conveyed oncogenesis. Several cell-based​ and cell-free​ and identify associations between molecular actionable insights as we learn more. methods have been developed to identify markers and disease. Overcoming the aforementioned genome-wide​ CRISPR off-target​ sites in The challenge in using such observational challenges is not an easy task, but with the an unbiased manner129–131. But it remains data from human cohorts is to identify breathtaking advances that genomics has a challenge to validate off-target​ activity

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 593 Viewpoint

at sites with low mutation frequencies template in human embryos136. Recurrent 5Department of Medicine, Faculty of Health Sciences, (less than 0.1%) in a population of cells, or non-recurrent​ de novo mutations are University of Cape Town, Cape Town, South Africa. 6 owing to the intrinsic error rates of responsible for the vast majority of genetic Institute of Infectious Diseases and Molecular Medicine, Faculty of Health Sciences, University current sequencing technologies. Even at diseases. Cell-free​ fetal DNA in the maternal of Cape Town, Cape Town, South Africa. on-​target sites, CRISPR–Cas9 can induce blood can be used to detect these de novo 7Center for Human Genetics and Genomics, New York unexpected outcomes such as large deletions mutations in fetuses, which are absent in University Grossman School of Medicine, New York, of chromosomal segments132. It will be the parents. Some de novo mutations are NY, USA. important to understand the mechanisms manifested even before birth, leading to 8European Molecular Biology Laboratory, Genome behind the unusual on-target​ activity and miscarriage, disability or early death after Biology Department, Heidelberg, Germany. to measure and reduce the frequencies of birth; it is often too late and inefficient 9Department of Biosystems Science and Engineering, such events. to attempt gene editing in newborns. These ETH Zurich, Basel, Switzerland. Last but not least, Cas9 and other mutations could be corrected in utero 10Department of Genome Regulation, Max Planck programmable nucleases can be using base editors or prime editors without Institute for Molecular Genetics, Berlin, Germany. immunogenic, potentially causing undesired inducing unwanted indels and without relying 11Department of Stem Cell and Regenerative Biology, innate and adaptive immune responses. on inefficient homologous recombination. Harvard University, Cambridge, MA, USA. 12 In this regard, it makes sense that initial Compared with germline editing or Institute of Chemistry and Biochemistry, Freie Universität Berlin, Berlin, Germany. clinical trials have focused on ex vivo preimplantation genetic diagnosis, in utero 13Center for Personal Dynamic Regulomes, Howard delivery of Cas9 ribonucleoproteins into editing, if proven safe and effective in the Hughes Medical Institute, Stanford University, T cells or in vivo gene editing in the eye, future, should be ethically more acceptable Stanford, CA, USA. an immunologically privileged organ. because it does not involve the creation or 14Institute for Research in Biomedicine (IRB Barcelona), Cas9 epitope engineering or novel Cas9 destruction of human embryos. The Barcelona Institute of Science and Technology, orthologues derived from non-pathogenic​ As promising and powerful as they are, Barcelona, Spain. bacteria may avoid some of the immune current versions of base editors and prime 15Research Program on Biomedical Informatics, responses, offering therapeutic modalities editors can be further optimized and Universitat Pompeu Fabra, Barcelona, Spain. for in vivo genome editing in tissues or improved. For instance, Cas9 evolved in 16Institucio ́Catalana de Recerca i Estudis Avancats, organs with little or no immune privilege. microorganisms as a nuclease rather than Barcelona, Spain. Base editing133,134 and prime editing135 a nickase. Current Cas9 nickases used for 17Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, are promising new approaches that may base editing (D10A SpCas9 variant) and Israel. overcome some of the limitations of prime editing (H840A variant) can be 18Center for Genome Engineering, Institute for Basic nuclease-​mediated genome editing. Base engineered to increase their activities and Science, Daejon, Republic of Korea. editors and prime editors are composed of specificities. In parallel, deaminase and ✉e-mail:​ [email protected]; stacey@broadinstitute. a Cas9 nickase, rather than the wild-type​ reverse transcriptase moieties in base editors org; [email protected]; ambroise. Cas9 nuclease, and a nucleobase deaminase and prime editors, respectively, can be [email protected]; aravinda.chakravarti@ and a reverse transcriptase, respectively. engineered or replaced with appropriate nyulangone.org; [email protected]; barbara.treutlein@ Because a nickase, unlike a nuclease, orthologues to increase the efficiency bsse.ethz.ch; [email protected]; howchang@ stanford.edu; [email protected]; eran. produces DNA single-strand​ breaks or and scope of genome editing. It has [email protected]; [email protected] nicks, but not double-strand​ breaks (DSBs), been shown that base editors can cause https://doi.org/10.1038/s41576-020-0272-6 base editors and prime editors are unlikely both guide RNA-dependent​ and guide Published online 24 August 2020 to induce large deletions at on-target​ sites RNA-independent​ DNA or RNA off-target​ and chromosomal rearrangements resulting mutations, raising concerns for their 1. Collins F. The director of the NIH lays out his vision of the future of medical science. Time https:// from non-​homologous end joining (NHEJ) applications in medicine. Prime editors may time.com/5709207/medical-​science-age-​of-discovery repair of concurrent on-target​ and off-target​ also cause unwanted on-target​ and off-target​ (2019). 2. The National Academies of Sciences, Engineering, and DSBs. Furthermore, when it comes to gene mutations, which must be carefully studied Medicine Organizing Committee for the International correction rather than gene disruption, these before moving on to therapeutic applications. Summit on Human Gene Editing. On human gene editing: international summit statement. The National new types of gene editors are much more Biomedical researchers are now equipped Academies of Sciences, Engineering, and Medicine efficient and ‘cleaner’ than DSB-producing​ with powerful tools for genome editing. https://www.nationalacademies.org/news/2015/12/ on-​human-gene-​editing-international-​summit- nucleases because they neither require I expect that these tools will be developed statement (2015). donor template DNA nor rely on further and applied more broadly in both 3. Centers for Disease Control and Prevention. COVID-19 in racial and ethnic minority groups. CDC https:// error-​prone NHEJ; in human cells, DSBs research and medicine in the coming years. www.cdc.gov/coronavirus/2019-ncov/need-​extra- precautions/racial-​ethnic-minorities.html (2020). are preferentially repaired by NHEJ, leading 1 ✉ 2 ✉ Amy L. McGuire , Stacey Gabriel , 4. Edwards, F., Lee, H. & Esposito, M. Risk of being killed to small insertions or deletions (indels), Sarah A. Tishkoff 3,4 ✉ , Ambroise Wonkam 5,6 ✉ , by police use of force in the United States by age, rather than by homologous recombination Aravinda Chakravarti 7 ✉ , Eileen E. M. Furlong 8 ✉ , race–ethnicity, and sex. Proc. Natl Acad. Sci. USA Barbara Treutlein 9 ✉ , Alexander Meissner 2,10,11,12 ✉ , 116, 16793–16798 (2019). involving donor DNA. 5. Popejoy, A. B. & Fullerton, S. M. Genomics is failing on Howard Y. Chang 13 ✉ , Núria López-Bigas​ 14,15,16 ✉ , diversity. Nature 538, 161–164 (2016). Base editors and prime editors are also 17 ✉ 18 ✉ Eran Segal and Jin-​Soo Kim 6. Popejoy, A. B. et al. The clinical imperative for well suited for germline editing and in utero 1Center for Medical Ethics and Health Policy, Baylor inclusivity: race, ethnicity, and ancestry (REA) in genomics. Hum. Mutat. 39, 1713–1720 (2018). editing (that is, gene editing in the fetus), College of Medicine, Houston, TX, USA. 7. Artiga, S. & Orgera, K. Key facts on health and health which should be done with caution, in full 2Broad Institute of MIT and Harvard, Cambridge, care by race and ethnicity. Kaiser Family Foundation consideration of ethical, legal and societal MA, USA. https://www.kff.org/report-​section/key-​facts-on-​ health-and-​health-care-​by-race-​and-ethnicity-​ issues. In principle, CRISPR–Cas9 can 3Department of Genetics, University of Pennsylvania, coverage-access-​to-and-​use-of-​care/ (2019). be used for the correction of pathogenic Philadelphia, PA, USA. 8. Armstrong, K., Micco, E., Carney, A., Stopfer, J. & Putt, M. Racial differences in the use of BRCA1/2 4 mutations in human embryos; however, Department of Biology, University of Pennsylvania, testing among women with a family history of breast donor DNA is seldom used as a repair Philadelphia, PA, USA. or ovarian cancer. JAMA 293, 1729–1736 (2005).

594 | October 2020 | volume 21 www.nature.com/nrg Viewpoint

9. Bonham, V. L., Callier, S. L. & Royal, C. D. Will 36. Gulsuner, S. et al. Genetics of schizophrenia in 64. Narlikar, G. J. Phase-​separation in chromatin precision medicine move us beyond race? N. Engl. the South African Xhosa. Science 367, 569–573 organization. J. Biosci. 45, 5 (2020). J. Med. 374, 2003–2005 (2016). (2020). 65. Cao, J. et al. The single-​cell transcriptional landscape 10. Karczewski, K. J. et al. The mutational constraint 37. Shriner, D. & Rotimi, C. N. Whole-​genome-sequence-​ of mammalian organogenesis. Nature 566, 496–502 spectrum quantified from variation in 141,456 based haplotypes reveal single origin of the sickle (2019). humans. Nature 581, 434–443 (2020). allele during the Holocene wet phase. Am. J. Hum. 66. Farrell, J. A. et al. Single-​cell reconstruction 11. Khera, A. V. et al. Genome-​wide polygenic scores Genet. 102, 547–556 (2018). of developmental trajectories during zebrafish for common diseases identify individuals with risk 38. Wu, Y. et al. Highly efficient therapeutic gene editing embryogenesis. Science 360, eaar3131 (2018). equivalent to monogenic mutations. Nat. Genet. 50, of human haematopoietic stem cells. Nat. Med. 25, 67. Cusanovich, D. A. et al. The cis-​regulatory dynamics 1219–1224 (2018). 776–783 (2019). of embryonic development at single-​cell resolution. 12. The SIGMA Type 2 Diabetes Consortium. Sequence 39. Geard, A. et al. Clinical and genetic predictors of renal Nature 555, 538–542 (2018). variants in SLC16A11 are a common risk factor for dysfunctions in sickle cell anaemia in Cameroon. Br. J. 68. Tang, F. et al. mRNA-​Seq whole-​transcriptome type 2 diabetes in Mexico. Nature 506, 97–101 Haematol. 178, 629–639 (2017). analysis of a single cell. Nat. Methods 6, 377–382 (2014). 40. Lebeko, K. et al. Targeted genomic enrichment (2009). 13. Wetterstrand, K. A. DNA sequencing costs: data from and massively parallel sequencing identifies novel 69. Camp, J. G., Platt, R. & Treutlein, B. Mapping human the NHGRI genome sequencing program (GSP). nonsyndromic hearing impairment pathogenic cell phenotypes to genotypes with single-​cell genomics. National Human Genome Research Institute https:// variants in Cameroonian families. Clin. Genet. 90, Science 365, 1401–1405 (2019). www.genome.gov/sequencingcostsdata (2019). 288–290 (2016). 70. Lein, E., Borm, L. E. & Linnarsson, S. The promise of 14. Wall, J. D. et al. The GenomeAsia 100K Project 41. Genovese, G. et al. Association of trypanolytic ApoL1 spatial transcriptomics for neuroscience in the era enables genetic discoveries across Asia. Nature 576, variants with kidney disease in African Americans. of molecular cell typing. Science 358, 64–69 (2017). 106–111 (2019). Science 329, 841–845 (2010). 71. Satija, R., Farrell, J. A., Gennert, D., Schier, A. F. 15. Kowalski, M. H. et al. Use of >100,000 NHLBI Trans-​ 42. Sierra, B. et al. OSBPL10, RXRA and lipid metabolism & Regev, A. Spatial reconstruction of single-​cell gene Omics for Precision Medicine (TOPMed) Consortium confer African-​ancestry protection against dengue expression data. Nat. Biotechnol. 33, 495–502 whole genome sequences improves imputation quality haemorrhagic fever in admixed Cubans. PLoS Pathog. (2015). and detection of rare variant associations in admixed 13, e1006220 (2017). 72. Achim, K. et al. High-​throughput spatial mapping African and Hispanic/Latino populations. PLoS Genet. 43. Wonkam, A. & de Vries, J. Returning incidental of single-​cell RNA-seq​ data to tissue of origin. 15, e1008500 (2019). findings in African genomics research. Nat. Genet. Nat. Biotechnol. 33, 503–509 (2015). 16. Bycroft, C. et al. The UK Biobank resource with 52, 17–20 (2020). 73. Cao, J. et al. Comprehensive single-​cell transcriptional deep phenotyping and genomic data. Nature 562, 44. Provine, W. B. The Origins of Theoretical Population profiling of a multicellular organism. Science 357, 203–209 (2018). Genetics (University of Chicago Press, 1971) 661–667 (2017). 17. Gaziano, J. M. et al. Million veteran program: a mega-​ 45. Fisher, R. A. The correlation between relatives on 74. Cleary, B., Cong, L., Cheung, A., Lander, E. S. biobank to study genetic influences on health and the supposition of Mendelian inheritance. Trans. R. & Regev, A. Efficient generation of transcriptomic disease. J. Clin. Epidemiol. 70, 214–223 (2016). Soc. Edinb. 52, 399–433 (1918). profiles by random composite measurements. Cell 18. Nagai, A. et al. Overview of the BioBank Japan 46. Altenburg, E. & Muller, H. J. The genetic basis of 171, 1424–1436 e1418 (2017). Project: study design and profile. J. Epidemiol. 27, truncate wing – an inconstant and modifiable 75. Cao, J. et al. Joint profiling of chromatin accessibility S2–S8 (2017). character in Drosophila. Genetics 5, 1–59 (1920). and gene expression in thousands of single cells. 19. Denny, J. C. et al. Systematic comparison of phenome-​ 47. Morton, N. E. Analysis of family resemblance. I. Science 361, 1380–1385 (2018). wide association study of electronic medical record Introduction. Am. J. Hum. Genet. 26, 318–330 76. Kester, L. & van Oudenaarden, A. Single-​cell data and genome-​wide association study data. (1974). transcriptomics meets lineage tracing. Cell Stem Cell Nat. Biotechnol. 31, 1102–1110 (2013). 48. Visscher, P. M. et al. 10 Years of GWAS discovery: 23, 166–179 (2018). 20. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The missing biology, function and translation. Am. J. Hum. Genet. 77. Stuart, T. & Satija, R. Integrative single-​cell analysis. diversity in human genetic studies. Cell 177, 1080 101, 5–22 (2017). Nat. Rev. Genet. 20, 257–272 (2019). (2019). 49. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded 78. Zhu, C., Preissl, S. & Ren, B. Single-​cell multimodal 21. Martin, A. R. et al. Clinical use of current polygenic view of complex traits: from polygenic to omnigenic. omics: the power of many. Nat. Methods 17, 11–14 risk scores may exacerbate health disparities. Cell 169, 1177–1186 (2017). (2020). Nat. Genet. 51, 584–591 (2019). 50. Emison, E. S. et al. A common, sex-​dependent 79. Anzalone, A. V., Koblan, L. W. & Liu, D. R. Genome 22. McQuillan, M. A., Zhang, C., Tishkoff, S. A. & Platt, A. mutation in a putative RET enhancer underlies editing with CRISPR-​Cas nucleases, base editors, The importance of including ethnically diverse Hirschsprung disease susceptibility. Nature 434, transposases and prime editors. Nat. Biotechnol. populations in studies of quantitative trait evolution. 857–863 (2005). (2020). Curr. Opin. Genet. Dev. 62, 30–35 (2020). 51. Maurano, M. T. et al. Systematic localization of 80. Loeffler, D. et al. Asymmetric lysosome inheritance 23. Sohail, M. et al. Polygenic adaptation on height is common disease-​associated variation in regulatory predicts activation of haematopoietic stem cells. overestimated due to uncorrected stratification in DNA. Science 337, 1190–1195 (2012). Nature 573, 426–429 (2019). genome-wide​ association studies. eLife 8, e39702 52. Davidson, E. Emerging properties of animal gene 81. Schmidt, F., Cherepkova, M. Y. & Platt, R. J. (2019). regulatory networks. Nature 468, 911–920 (2010). Transcriptional recording by CRISPR spacer acquisition 24. Crawford, N. G. et al. Loci associated with skin 53. Chatterjee, S. et al. Enhancer variants synergistically from RNA. Nature 562, 380–385 (2018). pigmentation identified in African populations. Science drive dysregulation of the RET gene regulatory 82. Holliday, R. & Pugh, J. E. DNA modification 358, eaan8433 (2017). network in Hirschsprung disease. Cell 167, 355–368 mechanisms and gene activity during development. 25. Gasperini, M., Tome, J. M. & Shendure, J. Towards a (2016). Science 187, 226–232 (1975). comprehensive catalogue of validated and target-​ 54. Segal, E., Raveh-​Sadka, T., Schroeder, M., Unnerstall, U. 83. Strahl, B. D. & Allis, C. D. The language of covalent linked human enhancers. Nat. Rev. Genet. 21, & Gaul, U. Predicting expression patterns from histone modifications. Nature 403, 41–45 (2000). 292–310 (2020). regulatory sequence in Drosophila segmentation. 84. Jambhekar, A., Dhall, A. & Shi, Y. Roles and regulation 26. Racimo, F., Sankararaman, S., Nielsen, R. & Nature 451, 535–540 (2008). of histone methylation in animal development. Huerta-Sánchez, E. Evidence for archaic adaptive 55. Chakravarti, A. & Turner, T. N. Revealing rate-​limiting Nat. Rev. Mol. Cell Biol. 20, 625–641 (2019). introgression in humans. Nat. Rev. Genet. 16, steps in complex disease biology: The crucial 85. Smith, Z. D. et al. Epigenetic restriction of 359–371 (2015). importance of studying rare, extreme-​phenotype extraembryonic lineages mirrors the somatic 27. Skoglund, P. & Mathieson, I. Ancient genomics families. Bioessays 38, 578–586 (2016). transition to cancer. Nature 549, 543–547 (2017). of modern humans: the first decade. Annu. Rev. 56. Lancaster, M. A. et al. Cerebral organoids model 86. Jinek, M. et al. A programmable dual-​RNA-guided Genomics Hum. Genet. 19, 381–404 (2018). human brain development and microcephaly. Nature DNA endonuclease in adaptive bacterial immunity. 28. Vicente, M. & Schlebusch, C. M. African population 501, 373–379 (2013). Science 337, 816–821 (2012). history: an ancient DNA perspective. Curr. Opin. 57. Rothman, J. & Jarriault, S. Developmental plasticity 87. Nabet, B. et al. The dTAG system for immediate and Genet. Dev. 62, 8–15 (2020). and cellular reprogramming in caenorhabditis elegans. target-​specific protein degradation. Nat. Chem. Biol. 29. Sherman, R. M. et al. Assembly of a pan-​genome from Genetics 213, 723–757 (2019). 14, 431–441 (2018). deep sequencing of 910 humans of African descent. 58. Porrello, E. R. et al. Transient regenerative potential of 88. Clevers, H. Modeling development and disease with Nat. Genet. 51, 30–35 (2019). the neonatal mouse heart. Science 331, 1078–1080 organoids. Cell 165, 1586–1597 (2016). 30. Durvasula, A. et al. Recovering signals of ghost archaic (2011). 89. Dekker, J., Marti-​Renom, M. A. & Mirny, L. A. introgression in African populations. Sci. Adv. 12, 59. Mir, M., Bickmore, W., Furlong, E. E. M. & Narlikar, G. Exploring the three-​dimensional organization of eaax5097 (2020). Chromatin topology, condensates and gene regulation: genomes: interpreting chromatin interaction data. 31. Skov, L. et al. The nature of Neanderthal introgression shifting paradigms or just a phase? Development 146, Nat. Rev. Genet. 14, 390–403 (2013). revealed by 27,566 Icelandic genomes. Nature 582, dev182766 (2019). 90. Banani, S. F., Lee, H. O., Hyman, A. A. & Rosen, M. K. 78–83 (2020). 60. Ghavi-Helm,​ Y. et al. Highly rearranged chromosomes Biomolecular condensates: organizers of cellular 32. Adeyemo, A. A. et al. ZRANB3 is an African-​specific reveal uncoupling between genome topology and gene biochemistry. Nat. Rev. Mol. Cell Biol. 18, 285–298 type 2 diabetes locus associated with beta-​cell mass expression. Nat. Genet. 51, 1272–1282 (2019). (2017). and insulin response. Nat. Commun. 10, 3195 (2019). 61. Despang, A. et al. Functional dissection of the 91. Lupianez, D. G. et al. Disruptions of topological 33. Cohen, J. et al. Low LDL cholesterol in individuals Sox9-Kcnj2 locus identifies nonessential and chromatin domains cause pathogenic rewiring of African descent resulting from frequent nonsense instructive roles of TAD architecture. Nat. Genet. 51, of gene-enhancer​ interactions. Cell 161, 1012–1025 mutations in PCSK9. Nat. Genet. 37, 161–165 (2005). 1263–1271 (2019). (2015). 34. Gurdasani, D. et al. Uganda genome resource enables 62. Hnisz, D., Shrinivas, K., Young, R. A., Chakraborty, A. K. 92. Basu, S. et al. Unblending of transcriptional insights into population history and genomic discovery & Sharp, P. A. A phase separation model for condensates in human repeat expansion disease. in Africa. Cell 179, 984–002.e36 (2019). transcriptional control. Cell 169, 13–23 (2017). Cell 181, 1062–1079 e1030 (2020). 35. Gurdasani, D. et al. Genomics of disease risk in 63. Shrinivas, K. et al. Enhancer features that drive 93. Grosswendt, S. et al. Epigenetic regulator function globally diverse populations. Nat. Rev. Genet. 20, formation of transcriptional condensates. Mol. Cell through mouse gastrulation. Nature 584, 102–108 520–535 (2019). 75, 549–561 e547 (2019). (2020).

NAture RevIeWS | GENETiCS volume 21 | October 2020 | 595 Viewpoint

94. Johnson, T. B. & Coghill, R. D. Researches on 114. Jaiswal, S. et al. Age related clonal hematopoiesis 136. Ma, H. et al. Correction of a pathogenic gene mutation pyrimidines. C111. The discovery of 5-methyl-​cytosine associated with adverse outcomes. N. Engl. J. Med. in human embryos. Nature 548, 413–419 (2017). in tuberculinic acid, the nucleic acid of the tubercle 371, 2488–2498 (2014). bacillus. J. Am. Chem. Soc. 47, 2838–2844,47 115. Sabarinathan, R. et al. The whole-​genome panorama Acknowledgements (1925). of cancer drivers. Preprint at bioRxiv https://doi.org/ A.C. acknowledges that the ideas in his contribution were 95. Heard, E. et al. Ten years of genetics and genomics: 10.1101/190330 (2017). developed through studies on Hirschsprung disease and what have we achieved and where are we heading? 116. Pich, O. et al. The mutational footprints of cancer thanks the many trainees who have contributed to this work Nat. Rev. Genet. 11, 723–733 (2010). therapies. Nat. Genet. 51, 1732–1740 (2019). over the past 5 years. A.L.M. acknowledges A. Gutierrez, 96. Quinn, J. J. & Chang, H. Y. Unique features of long 117. Campbell, P. J. et al. Pan-​cancer analysis of whole K. Kostick, G. Lazaro, M. Majumder, K. Munoz, S. Pereira, non-​coding RNA biogenesis and function. Nat. Rev. genomes. Nature 578, 82–93 (2020). H. Smith and P. Zuk for feedback. A.M. thanks D. Hnisz, Genet. 17, 47–62 (2016). 118. Rozenblatt-Rosen,​ O., Stubbington, M. J. T., Regev, A. Z. D. Smith, J. Charlton and H. Kretzmer for feedback and the 97. Kopp, F. & Mendell, J. T. Functional classification & Teichmann, S. A. The Human Cell Atlas: from vision Max Planck Society for funding. A.W. is supported by NIH and experimental dissection of long noncoding RNAs. to reality. Nature 550, 451–453 (2017). awards U54HG009790, U01HG009716, U01HG007459 and Cell 172, 393–407 (2018). 119. ENCODE Project Consortium. An integrated U24HL135600, and Wellcome Trust award H3A/18/001, 98. Liu, S. J. et al. CRISPRi-​based genome-​scale encyclopedia of DNA elements in the human genome. and states that the funders had no role in study design, and identification of functional long noncoding RNA loci Nature 489, 57–74 (2012). analysis, decision to publish or preparation of the manuscript. in human cells. Science 355, eaah7111 (2017). 120. Damodaran, S. et al. Cancer Driver Log (CanDL): B.T. acknowledges J. G. Camp for helpful discussions. E.E.M.F. 99. Rubin, A. J. et al. Coupled single-​cell CRISPR catalog of potentially actionable cancer mutations. is very grateful to A. Ephrussi, M. Mir, M. Perino, screening and epigenomic profiling reveals causal J. Mol. Diagn. 17, 554–559 (2015). Y. Kherdjemil, T. Pollex and S. Secchia for useful comments. gene regulatory networks. Cell 176, 361–376.e17 121. High, K. A. & Roncarolo, M. G. Gene therapy. N. Engl. E. E. M. F is supported by European Research Council (2019). J. Med. 381, 455–464 (2019). (Advanced Grant) agreement no. 787611 (DeCRyPT). E.S. is 100. Quinn, J. J. et al. Rapid evolutionary turnover 122. Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: supported by grants from the European Research Council and underlies conserved lncRNA-​genome interactions. challenges and promises of big data in healthcare. the Israel Science Foundation. H.Y.C. is supported by NIH Genes. Dev. 30, 191–207 (2016). Nat. Med. 26, 29–38 (2020). RM1-HG007735​ and R35-CA209919.​ H.Y.C. is an investiga- 101. Kirk, J. M. et al. Functional classification of long 123. Kotler, E. et al. A systematic p53 mutation library links tor of the Howard Hughes Medical Institute. J.-​S.K. is sup- non-coding RNAs by k-​mer content. Nat. Genet. differential functional impact to cancer mutation ported by the Institute for Basic Science (IBS-​R021-​D1). 50, 1474–1482 (2018). pattern and evolutionary conservation. Mol. Cell 71, N.L-​B. acknowledges funding from the European Research 102. Carter, A. C. et al. Spen links RNA-​mediated 873 (2018). Council (Consolidator Grant 682398), the Spanish Ministry endogenous retrovirus silencing and X chromosome 124. Swanson, J. M. The UK Biobank and selection bias. of Economy and Competitiveness (SAF2015-66084-​R, inactivation. eLife 9, e54508 (2020). Lancet 380, 110 (2012). European Regional Development Fund) and the Asociación 103. Lubelsky, Y. & Ulitsky, I. Sequences enriched in Alu 125. Maini Rekdal, V., Bess, E. N., Bisanz, J. E., Española Contra el Cáncer (GC16173697BIGA). S.A.T. is repeats drive nuclear localization of long RNAs in Turnbaugh, P. J. & Balskus, E. P. Discovery and funded by NIH grants R35 GM134957-01 and NIAMS human cells. Nature 555, 107–111 (2018). inhibition of an interspecies gut bacterial pathway for R01AR076241-01A1 and American Diabetes Association 104. Shukla, C. J. et al. High-​throughput identification levodopa metabolism. Science 364, eaau6323 (2019). Pathway to Stop Diabetes grant #1-19-​VSN-02. of RNA nuclear enrichment sequences. EMBO J. 126. Hernán, M. A. & Robins, J. M. Using big data to 37, e98452 (2018). emulate a target trial when a randomized trial is not Competing interests 105. Czerminski, J. T. & Lawrence, J. B. Silencing Trisomy available. Am. J. Epidemiol. 183, 758–764 (2016). H.Y.C. is a co-​founder of Accent Therapeutics and Boundless 21 with XIST in neural stem cells promotes neuronal 127. Nelson, M. R. et al. The support of human genetic Bio and an advisor of 10x Genomics, Arsenal Biosciences and differentiation. Dev. Cell 52, 294–308 e3 (2020). evidence for approved drug indications. Nat. Genet. Spring Discovery. J.-S.K.​ is a co-founder​ of and holds stock in 106. Martínez-​Jiménez, F. et al. A compendium of 47, 856–860 (2015). ToolGen Inc. A.C., A.L.M., A.M., A.W., B.T., E.E.M.F., E.S., mutational cancer driver genes. Nat. Rev. Cancer 128. Kim, J.-S. Genome editing comes of age. Nat. Protoc. N.L.-B.,​ S.G. and S.A.T. declare no competing interests. https://doi.org/10.1038/s41568-020-0290-x (2020). 11, 1573–1578 (2016). 107. Wilkinson, M. et al. The FAIR guiding principles 129. Kim, D. et al. Digenome-​seq: genome-wide​ profiling Publisher’s note for scientific data management and stewardship. of CRISPR-Cas9​ off-​target effects in human cells. Springer Nature remains neutral with regard to jurisdictional Sci. Data 3 , 160018 (2016). Nat. Methods 12, 237–243 (2015). claims in published maps and institutional affiliations. 108. Alexandrov, L. B. et al. The repertoire of mutational 130. Tsai, S. Q. et al. GUIDE-​seq enables genome-​wide signatures in human cancer. Nature 578, 94–101 profiling of off-​target cleavage by CRISPR-​Cas (2020). nucleases. Nat. Biotechnol. 33, 187–197 (2015). Related links 109. Gonzalez-Perez,​ A., Radhakrishnan, S. & 131. Wienert, B. et al. Unbiased detection of CRISPR Beyond 1 Million Genomes: https://b1mg-​project.eu/ Lopez-​Bigas, N. Local determinants of the mutational off-targets in vivo using DISCOVER-​Seq. Science 364, Blueprint epigenome: https://www.blueprint-​epigenome.eu/ landscape of the human genome. Cell 177, 101–114 286–289 (2019). cBioportal for Cancer Genomics: https://www.cbioportal. (2019). 132. Kosicki, M., Tomberg, K. & Bradley, A. et al. Repair of org/ 110. Martincorena, I. et al. High burden and pervasive double-strand​ breaks induced by CRISPR-​Cas9 leads enCoDe: https://www.encodeproject.org/ positive selection of somatic mutations in normal to large deletions and complex rearrangements. Global Alliance for Genomics and Health: https://www. human skin. Science 348, 880–886 (2015). Nat. Biotechnol. 36, 765–771 (2018). ga4gh.org/ 111. Martincorena, I. et al. Somatic mutant clones colonize 133. Komor, A. C. et al. Programmable editing of a target gnomAD: https://gnomad.broadinstitute.org/ the human esophagus with age. Science 362, base in genomic DNA without double-​stranded DNA Gtex: https://www.gtexportal.org/home/ 911–917 (2018). cleavage. Nature 533, 420–424 (2016). GwAS Catalog: https://www.ebi.ac.uk/gwas 112. Yokoyama, A. et al. Age-​related remodelling of 134. Nishida, K. et al. Targeted nucleotide editing using H3Africa: https://h3africa.org oesophageal epithelia by mutated cancer drivers. hybrid prokaryotic and vertebrate adaptive immune Roadmap epigenomics project: http://www. Nature 565, 312–317 (2019). systems. Science 353, aaf8729 (2016). roadmapepigenomics.org/ 113. Genovese, G. et al. Clonal hematopoiesis and 135. Anzalone, A. V. et al. Search-​and-replace genome blood-cancer risk inferred from blood DNA sequence. editing without double-​strand breaks or donor DNA. N. Engl. J. Med. 371, 2477–2487 (2014). Nature 576, 149–157 (2019). © Springer Nature Limited 2020

596 | October 2020 | volume 21 www.nature.com/nrg