
Constructing the Scientific Population in the Human Genome Diversity and 1000 Genome Projects

Joseph Vitti

I. Introduction: Populations Coming into Focus

In November 2012, some eleven years after the publication of the first draft sequence of a human genome, an article published in Nature reported a new ‘map’ of the human genome – created from not one, but 1,092 individuals. For many researchers, however, what was compelling was not the number of individuals sequenced, but rather the fourteen worldwide populations they represented. Comparisons that could be made within and among these populations represented new possibilities for the scientific study of human genetic variation. The paper – which has been cited over 400 times in the subsequent year – was the output of the first phase of the 1000 Genomes Project, one of several international research consortia launched with the intent of identifying and cataloguing such variation. With the project’s phase three data release, anticipated in early spring 2014, the sample size will rise to over 2500 individuals representing twenty-six populations. Each individual’s full sequence data is made publicly available online, and is also preserved through the establishment of immortal cell lines, from which DNA can be extracted and distributed. With these developments, population-based science has been made genomic, and scientific conceptions of human populations have begun to crystallize (see appendix).

Such extensive biobanking and databasing of human populations is remarkable for a number of reasons, not least among them the socially charged terrain that such an enterprise inevitably must navigate. While the 1000 Genomes Project (1000G) has been relatively uncontroversial in its reception, predecessors such as the Human Genome Diversity Project (HGDP), first conceived in 1991, faced greater difficulty. Indeed, the latter is perhaps better known for the contentious dialogue it launched (e.g. 
Lock, 1994) than for its actual scientific accomplishments (e.g. Rosenberg et al., 2002). Though critiques of the HGDP were broad in scope, many of these were united by a discomfort with the population as an object of scientific study: such methodologies threaten to reify, legitimate or otherwise confront human difference, giving rise to sharp tension with liberal values of universality and egalitarianism. These concerns included the worries that such research would fuel discriminatory ideologies, that vulnerable groups would be exploited, and that western practices of informed consent did not and could not accommodate the collectives that were to become the subjects of research. As the overarching nature of these concerns suggests, many commentators are leery of human population genomics as an enterprise – not simply of its proposed implementation in the HGDP (Reardon, 2011).

Against this background, the lack of controversy surrounding 1000G demands some explanation. Particularly striking is the cessation of dialogue surrounding the population as an object of scientific study: whereas such concerns were paramount in the dialogue surrounding the HGDP throughout the 1990s and early 2000s, by the time 1000G was underway population-level issues were not even mentioned in the relevant bioethical literature (Knoppers et al., 2012; Via et al., 2010).

In this paper, I take a comparative approach between these two projects in order to make sense of their differing public receptions and the stabilization of the human population as an object of scientific study. I focus on these two projects in particular because of their visibility as public undertakings with international government sponsorship, driven by scientists at elite institutions. 
While they share the goal of identifying human genetic differences, the comparison between them is informative for understanding their divergent receptions, and indicates a shift in the ways that researchers hypostatize and engage with populations.

It should be noted that differing social context is doubtless a major contributor to the difference in the two projects’ public receptions. After all, the HGDP held the public’s attention most strongly in the 1990s, whereas the Human Genome Project’s draft sequence was published in 2001 (Lander et al.). Genome science occupied a very different place in popular culture in 2008, when the 1000 Genomes Project was proposed. Additionally, the 1000 Genomes Project was preceded by the (similarly uncontroversial) HapMap Project, and indeed can be seen as an extension thereof: the two projects share the parallel goals of identifying genetic variants at population frequencies of >1% (1000G) and >5% (HapMap). Such considerations make it understandable that 1000G should enjoy a less critical reception than the HGDP. Nonetheless, the way that actors in the HGDP and 1000G Projects deploy scientific conceptions of human populations, as well as the way they interact with and/or treat humans in the groups that said scientific populations are intended to represent, differ meaningfully. While I do not take an evaluative approach in this paper – that is, I do not take a stance on the question of whether 1000G sufficiently addressed (or could sufficiently address at all) the social and ethical concerns that the HGDP drew attention to – it is my hope that this paper will provide a starting ground for such assessments.

In what follows, I chronicle points of comparison between the HGDP and 1000G through analysis of scholarly publications and official project documentation. I begin by looking at the two projects holistically, noting differences in their methods and objectives (Section II). 
I then further examine one major shift in motivations for global genomic diversity studies, the shift from population genetics to medical genetics (Section III). In Section IV, I examine group consent and community engagement, and I conclude by discussing how all of these considerations influence the way scientific populations are constructed.

II. From Archiving Diversity to Cataloguing Variation

At an abstract level, both the HGDP and 1000G were created for the same end: to characterize genetic differences among human individuals and groups. Luigi Luca Cavalli-Sforza, the population geneticist who first proposed (and became emblematic of) the HGDP, described the task of “understanding when and how patterns of diversity were formed” as the project’s “ultimate goal” (Cavalli-Sforza, 2005). Similar language appears in the 1000G’s documentation, which describes the project’s directive as “measur[ing] the extent of human genetic variation systematically” (“About the 1000 Genomes Project”).

Substantively, understanding human genetic difference means providing answers to questions such as: which sites or regions in human genomes are polymorphic (i.e., have multiple variants that may differ from person to person)? What are the frequencies of such variants in different endogamous groups (i.e., populations) – are there, for example, any such variants that are ‘fixed’ (i.e., have reached 100% frequency) within some groups but remain polymorphic in others? When examining polymorphic sites that are near to each other on the same chromosome, which variants at those sites tend to appear together (i.e., what haplotypes are present) in different populations – and what are the frequencies of these groupings of variants? Answering such foundational questions, researchers argued, would create opportunities for answering applied questions in such fields as population genetics and medicine (see Section III). 
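
The foundational questions listed above can be made concrete with a minimal sketch. Everything in it is a hypothetical illustration: the two population labels, the toy haplotypes (strings of alleles at three neighbouring sites), and the function names are assumptions introduced here, not data or methods from the HGDP or 1000G. It shows only how "polymorphic", "fixed", and "haplotype frequency" might be computed within each population.

```python
from collections import Counter

# Hypothetical toy data: each population is a list of sampled haplotypes,
# and each haplotype is a string of alleles at three neighbouring sites.
haplotypes = {
    "pop_A": ["ACG", "ACG", "ATG", "ACG"],
    "pop_B": ["ATG", "ATG", "ATG", "ATG"],
}

def allele_frequencies(haps, site):
    """Return {allele: frequency} at one site within one population."""
    counts = Counter(h[site] for h in haps)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_polymorphic(haps, site):
    """A site is polymorphic in a sample if more than one allele is seen."""
    return len(allele_frequencies(haps, site)) > 1

def haplotype_frequencies(haps):
    """Return {haplotype: frequency} for whole haplotypes in one population."""
    counts = Counter(haps)
    return {h: n / len(haps) for h, n in counts.items()}

for pop, haps in haplotypes.items():
    freqs = allele_frequencies(haps, 1)  # second site (index 1)
    status = "polymorphic" if is_polymorphic(haps, 1) else "fixed"
    print(pop, "site 1:", freqs, status)
    print(pop, "haplotypes:", haplotype_frequencies(haps))
```

In this toy sample the second site is polymorphic in pop_A but fixed in pop_B, which is the kind of between-group contrast the questions above are asking about. Real projects work from far larger samples and must also account for sequencing error and sampling design, which this sketch ignores.
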
This shared end – identifying and characterizing human genetic differences – notwithstanding, contrasting the ways that goal is motivated and achieved in the two projects demonstrates a shift in ways of apprehending said differences. Broadly, this shift can be described as a move towards making difference appear more benign. In this section, I describe two instances of this shift. The first concerns the substantive output of the projects, in which the formation of cell lines became less important than the creation of internet databases (from ‘archive’ to ‘catalogue’). The second concerns the motivations for the projects, in which there was a move away from discourse that was overtly ‘otherizing’ (i.e., in the Foucauldian sense) towards a more universalist conception of human difference (from ‘diversity’ to ‘variation’).

The Material and the Informatic

As many scholars have noted (e.g. Thacker, 2005), genomes are ontologically precarious entities that can exist as collections of molecules on the one hand and as collections of characters (i.e. as sequence) on the other. While these ‘wet’ and ‘dry’ instantiations of genomes are taken to represent the same unified entity, they present different challenges and different opportunities for project management and analysis. Accordingly, both the HGDP and 1000G had to confront the question of how best to represent genomes. While both projects included the preservation of genomes in their ‘wet’ form, 1000G put much greater public emphasis on the creation of online databases. This shift can be seen first as a product of technological advances: the development of much more sophisticated sequencing technology by the early 2000s, together with advances in internet-based management of large genetic datasets, made ‘dry’ representations of genomes feasible. 
Nevertheless, the fact that cell lines remained the “basis of the HGDP” even in the mid-2000s suggests that enabling technologies are not the only forces at play (Cavalli-Sforza, 2005). Rather,
