Constructing the Scientific Population in the Human Genome Diversity and 1000 Genome Projects
Joseph Vitti

I. Introduction: Populations Coming into Focus

In November 2012, some eleven years after the publication of the first draft sequence of a human genome, an article published in Nature reported a new ‘map’ of the human genome – created from not one, but 1,092 individuals. For many researchers, however, what was compelling was not the number of individuals sequenced, but rather the fourteen worldwide populations they represented. Comparisons that could be made within and among these populations represented new possibilities for the scientific study of human genetic variation. The paper – which has been cited over 400 times in the subsequent year – was the output of the first phase of the 1000 Genomes Project, one of several international research consortia launched with the intent of identifying and cataloguing such variation. With the project’s phase three data release, anticipated in early spring 2014, the sample size will rise to over 2,500 individuals representing twenty-six populations. Each individual’s full sequence data is made publicly available online, and is also preserved through the establishment of immortal cell lines, from which DNA can be extracted and distributed. With these developments, population-based science has been made genomic, and scientific conceptions of human populations have begun to crystallize (see appendix). Such extensive biobanking and databasing of human populations is remarkable for a number of reasons, not least among them the socially charged terrain that such an enterprise inevitably must navigate.
While the 1000 Genomes Project (1000G) has been relatively uncontroversial in its reception, predecessors such as the Human Genome Diversity Project (HGDP), first conceived in 1991, faced greater difficulty. Indeed, the latter is perhaps better known for the contentious dialogue it launched (e.g. Lock, 1994) than for its actual scientific accomplishments (e.g. Rosenberg et al., 2002). Though critiques of the HGDP were broad in scope, many of these were united by a discomfort with the population as an object of scientific study: such methodologies threaten to reify, legitimate or otherwise confront human difference, giving rise to sharp tension with liberal values of universality and egalitarianism. These concerns included the worries that such research would fuel discriminatory ideologies, that vulnerable groups would be exploited, and that western practices of informed consent did not and could not accommodate the collectives that were to become the subjects of research. As the overarching nature of these concerns suggests, many commentators are leery of human population genomics as an enterprise – not simply of its proposed implementation in the HGDP (Reardon, 2011). Against this background, the lack of controversy surrounding 1000G demands some explanation. Particularly striking is the cessation of dialogue surrounding the population as an object of scientific study: whereas such concerns were paramount in the dialogue throughout the 1990s and early 2000s surrounding the HGDP, by the time 1000G was underway population-level issues were not even mentioned in the relevant bioethical literature (Knoppers et al., 2012; Via et al., 2010). In this paper, I take a comparative approach between these two projects in order to make sense of their differing public receptions and the stabilization of the human population as an object of scientific study. 
I focus on these two projects in particular because of their visibility as public undertakings with international government sponsorship, driven by scientists at elite institutions. While they share the goal of identifying human genetic differences, the comparison between them is informative for understanding their divergent receptions, and indicates a shift in the ways that researchers hypostatize and engage with populations. It should be noted that differing social context is doubtless a major contributor to the difference in the two projects’ public receptions. After all, the HGDP held the public’s attention most strongly in the 1990s, whereas the Human Genome Project’s draft sequence was published in 2001 (Lander et al.). Genome science occupied a very different place in popular culture in 2008 when the 1000 Genomes Project was proposed. Additionally, the 1000 Genomes Project was preceded by the (similarly uncontroversial) HapMap Project, and indeed can be seen as an extension thereof, with their parallel goals of identifying genetic variants at population frequencies of >1% and >5%, respectively. Such considerations make it understandable that 1000G should enjoy a less critical reception than the HGDP. Nonetheless, the way that actors in the HGDP and 1000G Projects deploy scientific conceptions of human populations, as well as the way they interact with and/or treat humans in the groups that said scientific populations are intended to represent, differ meaningfully. While I do not take an evaluative approach in this paper – that is, I do not take a stance on the question of whether 1000G sufficiently addressed (or could sufficiently address at all) the social and ethical concerns that the HGDP drew attention to – it is my hope that this paper will provide a starting ground for such assessments. In what follows, I chronicle points of comparison between the HGDP and 1000G through analysis of scholarly publications and official project documentation. 
I begin by looking at the two projects holistically, noting differences in their methods and objectives (Section II). I then further examine one major shift in motivations for global genomic diversity studies, the shift from population genetics to medical genetics (Section III). In Section IV, I examine group consent and community engagement, and I conclude by discussing how all of these considerations influence the way scientific populations are constructed.

II. From Archiving Diversity to Cataloguing Variation

At an abstract level, both the HGDP and 1000G were created for the same end: to characterize genetic differences among human individuals and groups. Luigi Luca Cavalli-Sforza, the population geneticist who first proposed (and became emblematic of) the HGDP, described the task of “understanding when and how patterns of diversity were formed” as the project’s “ultimate goal” (Cavalli-Sforza, 2005). Similar language appears in the 1000G’s documentation, which describes the project’s directive as “measur[ing] the extent of human genetic variation systematically” (“About the 1000 Genomes Project”). Substantively, understanding human genetic difference means providing answers to questions such as, which sites or regions in human genomes are polymorphic (i.e., have multiple variants that may differ from person-to-person)? What are the frequencies of such variants in different endogamous groups (i.e., populations) – are there, for example, any such variants that are ‘fixed’ (i.e., have reached 100% frequency) within some groups but remain polymorphic in others? When examining polymorphic sites that are near to each other on the same chromosome, which variants at those sites tend to appear together (i.e., what haplotypes are present) in different populations – and what are the frequencies of these groupings of variants? 
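To make the three foundational questions above concrete, the following is a minimal sketch in Python. It is an illustration only: the populations, the two-site haplotype strings and the function names are all hypothetical, invented for this example rather than taken from HGDP or 1000G data or software. Each sampled chromosome is written as a short string with one character per site.

```python
from collections import Counter

# Hypothetical toy data (invented for illustration, not project data):
# each population maps to a list of haplotypes, one string per sampled
# chromosome and one character per site.
HAPLOTYPES = {
    "pop_A": ["AC", "AC", "GT", "AC"],
    "pop_B": ["GC", "GC", "GC", "GT"],
}


def allele_frequencies(haplotypes, site):
    """Return {allele: frequency} at one site within one population."""
    counts = Counter(h[site] for h in haplotypes)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def is_fixed(haplotypes, site):
    """A site is 'fixed' when a single allele is at 100% frequency."""
    return len(allele_frequencies(haplotypes, site)) == 1


def haplotype_frequencies(haplotypes):
    """Return {haplotype: frequency}: which variants appear together."""
    counts = Counter(haplotypes)
    total = len(haplotypes)
    return {hap: n / total for hap, n in counts.items()}


for name, haps in HAPLOTYPES.items():
    for site in range(2):
        status = "fixed" if is_fixed(haps, site) else "polymorphic"
        print(name, "site", site, allele_frequencies(haps, site), status)
    print(name, "haplotypes", haplotype_frequencies(haps))
```

In this toy data the first site is fixed in the hypothetical pop_B but polymorphic in pop_A, which is the kind of between-group contrast the second question asks about. Real datasets would of course involve far more sites and samples, and diploid genotypes rather than single phased strings.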
Answering such foundational questions, researchers argued, would create opportunities for answering applied questions in such fields as population genetics and medicine (see Section III). This shared end – identifying and characterizing human genetic differences – notwithstanding, contrasting the ways that goal is motivated and achieved in the two projects demonstrates a shift in ways of apprehending said differences. Broadly, this shift can be described as a move towards making difference appear more benign. In this section, I describe two instances of this shift. The first concerns the substantive output of the projects, in which the formation of cell lines became less important than the creation of internet databases (from ‘archive’ to ‘catalogue’). The second concerns the motivations for the projects, in which there was a move away from discourse that was overtly ‘otherizing’ (i.e., in the Foucauldian sense) towards a more universalist conception of human difference (from ‘diversity’ to ‘variation’).

The Material and the Informatic

As many scholars have noted (e.g. Thacker, 2005), genomes are ontologically precarious entities that can exist as collections of molecules on the one hand and as collections of characters (i.e. as sequence) on the other. While these ‘wet’ and ‘dry’ instantiations of genomes are taken to represent the same unified entity, they present different challenges and different opportunities for project management and analysis. Accordingly, both the HGDP and 1000G had to confront the question of how best to represent genomes. While both projects included the preservation of genomes in their ‘wet’ form, 1000G put much greater public emphasis on the creation of online databases. 
This shift can be seen first as a product of technological advances: the development of much more sophisticated sequencing technology by the early 2000s, together with advances in the internet-based management of large genetic datasets, made ‘dry’ representations of genomes feasible. Nevertheless, the fact that cell lines remained the “basis of the HGDP” even in the mid-2000s suggests that enabling technologies are not the only forces at play (Cavalli-Sforza, 2005). Rather,