Constructing the Scientific Population in the Human Genome Diversity and 1000 Genomes Projects

Joseph Vitti

I. Introduction: Populations Coming into Focus

In November 2012, some eleven years after the publication of the first draft sequence of a human genome, an article published in Nature reported a new ‘map’ of the human genome – created from not one, but 1,092 individuals. For many researchers, however, what was compelling was not the number of individuals sequenced, but rather the fourteen worldwide populations they represented. Comparisons that could be made within and among these populations represented new possibilities for the scientific study of human genetic variation.

The paper – which has been cited over 400 times in the subsequent year – was the output of the first phase of the 1000 Genomes Project, one of several international research consortia launched with the intent of identifying and cataloguing such variation.

With the project’s phase three data release, anticipated in early spring 2014, the sample size will rise to over 2,500 individuals representing twenty-six populations. Each individual’s full sequence data is made publicly available online, and is also preserved through the establishment of immortal cell lines, from which DNA can be extracted and distributed. With these developments, population-based science has been made genomic, and scientific conceptions of human populations have begun to crystallize (see appendix).

Such extensive biobanking and databasing of human populations is remarkable for a number of reasons, not least among them the socially charged terrain that such an enterprise inevitably must navigate. While the 1000 Genomes Project (1000G) has been relatively uncontroversial in its reception, predecessors such as the Human Genome Diversity Project (HGDP), first conceived in 1991, faced greater difficulty. Indeed, the latter is perhaps better known for the contentious dialogue it launched (e.g. Lock, 1994) than for its actual scientific accomplishments (e.g. Rosenberg et al., 2002).

Though critiques of the HGDP were broad in scope, many of these were united by a discomfort with the population as an object of scientific study: such methodologies threaten to reify, legitimate or otherwise confront human difference, giving rise to sharp tension with liberal values of universality and egalitarianism. These concerns included the worries that such research would fuel discriminatory ideologies, that vulnerable groups would be exploited, and that western practices of informed consent did not and could not accommodate the collectives that were to become the subjects of research.

As the overarching nature of these concerns suggests, many commentators are leery of human population genetics as an enterprise – not simply of its proposed implementation in the HGDP (Reardon, 2011). Against this background, the lack of controversy surrounding 1000G demands some explanation. Particularly striking is the cessation of dialogue surrounding the population as an object of scientific study: whereas such concerns were paramount in the dialogue surrounding the HGDP throughout the 1990s and early 2000s, by the time 1000G was underway population-level issues were not even mentioned in the relevant bioethical literature (Knoppers et al., 2012; Via et al., 2010). In this paper, I take a comparative approach between these two projects in order to make sense of their differing public receptions and the stabilization of the human population as an object of scientific study.

I focus on these two projects in particular because of their visibility as public undertakings with international government sponsorship, driven by scientists at elite institutions. While they share the goal of identifying human genetic differences, the comparison between them is informative for understanding their divergent receptions, and indicates a shift in the ways that researchers hypostatize and engage with populations.

It should be noted that differing social context is doubtless a major contributor to the difference in the two projects’ public receptions. After all, the HGDP held the public’s attention most strongly in the 1990s, whereas the Human Genome Project’s draft sequence was published in 2001 (Lander et al.). Genome science occupied a very different place in popular culture in 2008 when the 1000 Genomes Project was proposed.

Additionally, the 1000 Genomes Project was preceded by the (similarly uncontroversial) HapMap Project, and indeed can be seen as an extension thereof, with their parallel goals of identifying genetic variants at population frequencies of >1% and >5%, respectively.

Such considerations make it understandable that 1000G should enjoy a less critical reception than the HGDP.

Nonetheless, the way that actors in the HGDP and 1000G projects deploy scientific conceptions of human populations, as well as the way they interact with and/or treat individuals in the groups that said scientific populations are intended to represent, differ meaningfully. While I do not take an evaluative approach in this paper – that is, I do not take a stance on the question of whether 1000G sufficiently addressed (or could sufficiently address at all) the social and ethical concerns that the HGDP drew attention to – it is my hope that this paper will provide a starting ground for such assessments.

In what follows, I chronicle points of comparison between the HGDP and 1000G through analysis of scholarly publications and official project documentation. I begin by looking at the two projects holistically, noting differences in their methods and objectives (Section II). I then further examine one major shift in motivations for global genomic diversity studies, the shift from population to medical genetics (Section III). In Section IV, I examine group consent and community engagement, and I conclude by discussing how all of these considerations influence the way scientific populations are constructed.

II. From Archiving Diversity to Cataloguing Variation

At an abstract level, both the HGDP and 1000G were created for the same end: to characterize genetic differences among human individuals and groups. Luigi Luca Cavalli-Sforza, the population geneticist who first proposed (and became emblematic of) the HGDP, described the task of “understanding when and how patterns of diversity were formed” as the project’s “ultimate goal” (Cavalli-Sforza, 2005). Similar language appears in the 1000G’s documentation, which describes the project’s directive as “measur[ing] the extent of human genetic variation systematically” (“About the 1000 Genomes Project”).

Substantively, understanding human genetic difference means providing answers to questions such as the following. Which sites or regions in human genomes are polymorphic (i.e., have multiple variants that may differ from person to person)? What are the frequencies of such variants in different endogamous groups (i.e., populations) – are there, for example, any such variants that are ‘fixed’ (i.e., have reached 100% frequency) within some groups but remain polymorphic in others? When examining polymorphic sites that are near to each other on the same chromosome, which variants at those sites tend to appear together (i.e., what haplotypes are present) in different populations – and what are the frequencies of these groupings of variants? Answering such foundational questions, researchers argued, would create opportunities for answering applied questions in other fields (see Section III).
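The first two of these questions can be made concrete with a minimal sketch. The populations, genotypes, and function names below are hypothetical and purely illustrative; they are not drawn from either project's data or tooling.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of diploid genotypes such as ("A", "G")."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}

def is_fixed(frequencies):
    """A site is 'fixed' in a group when a single allele is at 100% frequency."""
    return len(frequencies) == 1

# Hypothetical toy data: one genomic site genotyped in two made-up populations.
site = {
    "pop_A": [("A", "A"), ("A", "A"), ("A", "A")],
    "pop_B": [("A", "G"), ("G", "G"), ("A", "A"), ("A", "G")],
}

summary = {pop: allele_frequencies(g) for pop, g in site.items()}
for pop, freqs in summary.items():
    status = "fixed" if is_fixed(freqs) else "polymorphic"
    print(pop, freqs, status)
```

In this toy example the site is fixed in one group and polymorphic in the other, which is the kind of between-group contrast the questions above are asking about.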

This shared end – identifying and characterizing human genetic differences – notwithstanding, contrasting the ways that goal is motivated and achieved in the two projects demonstrates a shift in ways of apprehending said differences. Broadly, this shift can be described as a move towards making difference appear more benign. In this section, I describe two instances of this shift. The first concerns the substantive output of the projects, in which the formation of cell lines became less important than the creation of internet databases (from ‘archive’ to ‘catalogue’). The second concerns the motivations for the projects, in which there was a move away from discourse that was overtly ‘otherizing’ (i.e., in the Foucauldian sense) towards a more universalist conception of human difference (from ‘diversity’ to ‘variation’).

The Material and the Informatic

As many scholars have noted (e.g. Thacker, 2005), genomes are ontologically precarious entities that can exist as collections of molecules on the one hand and as collections of characters (i.e. as sequence) on the other. While these ‘wet’ and ‘dry’ instantiations of genomes are taken to represent the same unified entity, they present different challenges and different opportunities for project management and analysis.

Accordingly, both the HGDP and 1000G had to confront the question of how best to represent genomes.

While both projects included the preservation of genomes in their ‘wet’ form, 1000G put much greater public emphasis on the creation of online databases. This shift can be seen first as a product of technological advances: the development of much more sophisticated sequencing technology by the early 2000s, together with advances in the internet-based management of large genetic datasets, made ‘dry’ representations of genomes feasible. Nevertheless, the fact that cell lines remained the “basis of the HGDP” even in the mid-2000s suggests that enabling technologies are not the only forces at play (Cavalli-Sforza, 2005). Rather, the shift of attention from wet to dry representations of genomes answers to social needs as well: although genomic data is considered highly sensitive, it is nonetheless data and so occupies a more sterile place in public imaginings than living tissue.

Genomes are preserved in ‘wet’ form through the establishment of immortalized cell lines. Certain cells extracted from biological tissue (e.g. lymphoblastoid cells, the precursors to white blood cells, derived from a blood sample) can be treated in such a way that they will grow and divide indefinitely. For example, cells may be ‘transformed’ with Epstein-Barr virus, wherein the virus integrates its genetic material into the host cell’s genome and confers a cancer-like state of indefinite growth and division. Genomic material can also be amplified and collected through a process called polymerase chain reaction (PCR), but HGDP proponents argued that cell lines accrued fewer genetic errors and were more easily renewable – in addition to which, multiple population-based cell lines had already been initiated through other projects and were potentially available at the time of the HGDP’s conception (Bowcock et al., 1991).

The HGDP originally proposed the non-profit Coriell Institute for Medical Research in Camden, New Jersey as a facility to maintain such cell lines (Cavalli-Sforza et al., 1991). Ultimately, the controversy in the US surrounding the HGDP would cause project leaders to look elsewhere, and the collection found a home at the Centre d’Étude du Polymorphisme Humain (CEPH, Human Polymorphism Study Center), a genome research center in Paris. The later 1000G cell line collection would end up being housed at Coriell.

As previously indicated, the HGDP’s focus on the establishment of cell lines continued even as sequencing technology and internet databases came of age. Ostensibly, the HGDP saw its raison d’être as the creation of an archive of genomes, which would be preserved in their natural (wet) form, rather than the extraction and publication of sequence information from those genomes. Such labor would be left to researchers, who could request up to one microgram per sample for the cost of shipping (Cavalli-Sforza, 2005). This enduring emphasis on the preservation of actual biological tissue taken directly from human subjects may be one reason that the label ‘vampire project’ gained so much traction among critics of the HGDP (Rural Advancement Foundation International, 1993). Although the broader grievance indicated by the term ‘vampire project’ is the potential extraction of profit and other forms of capital from indigenous communities, the origin of such capital in human blood renders the project disturbing on a visceral level for many.

By contrast, the 1000G project from its initiation described its raison d’être as the output of data. While the establishment of cell lines is nevertheless employed as a means to this end, project planners put considerably less public emphasis on this aspect than on the creation of sequence databases. The NIH’s initial announcement of the project, for example, immediately describes plans to make all project data “swiftly available to the worldwide scientific community through freely accessible public databases”; the 2000-word announcement contains no mention of immortalized cell lines (“International Consortium Announces the 1000 Genomes Project,” 2008).

The shifting emphasis from the molecular to the informatic reflects scientific as well as social needs. With respect to the former, advances in sequencing and computing technology make internet-based sequence information more easily accessible, as well as more relevant. At the same time, this shift also serves to make 1000G more palatable to its public audiences. Whereas the establishment of immortal cell lines from blood tissue has the ring of science fiction to many, the construction of sequence databases appears a much more sober endeavor. By placing greater emphasis on its databasing endeavors, 1000G was able to leave the construction of cell lines as a technical detail rather than a critical project goal, and more easily avert charges of vampirism. Human differences, it was decided, would be rendered as sterile data (a ‘catalogue’) rather than living cells (an ‘archive’).

Scrapping Salvage Genetics

The idea of the HGDP as an archive becomes even more apt when one considers the early motivations for the project. The project proposal, published in the journal Genomics, opens by describing a “vanishing opportunity to preserve the record of our genetic heritage” (Cavalli-Sforza et al., 1991). Globalization was causing historically isolated groups to admix, meaning that their genetic information would lose much of its value from the viewpoint of population genetics (see Section III). Thus, HGDP planners presented themselves as acting at least partially from conservationist motives – what has been described as “salvage genetics” (El-Haj, 2012).

Ostensibly, such rhetoric functions to summon a sense of urgency, with the hope of cultivating favor (and funding) from audiences. The paper explains that the genetic diversity of living people contains “clues to the of our ,” but “the gate to preserve these clues is closing rapidly… it would be tragically ironic if, during the same decade that biological tools for understanding our species were created, major opportunities for applying them were squandered” (Cavalli-Sforza et al., 1991).

Unsurprisingly, this focus on the preservation of genetic material from isolated groups in the name of preserving “our common heritage” met with resistance (ibid.).

Construing such a rationale for the project had an otherizing effect, promoting a dichotomy (as well as a power imbalance) between conservationist-scientists on the one hand and subaltern communities on the other. Moreover, the invitation of representatives from non-profit cultural organizations to speak on population issues in early planning meetings, rather than indigenous people themselves, gave further grounds to the charge that the HGDP viewed populations as objects of scientific studies rather than respected partners in executing those studies (Reardon, 2001).

The HGDP abandoned the rhetoric of conservation by the mid-nineties. Such motivations had given rise to the perception that the HGDP’s explicit goal was to perform research on populations in danger of extinction, an idea that followed, according to project planners, from media misunderstandings of the project and of population admixture (“Diversity project,” 1996). Responding to the ongoing controversy, the US National Science Foundation (NSF) and National Institutes of Health (NIH) commissioned an investigation into the HGDP, meant to evaluate a broad range of questions, including the scientific value, technical feasibility, and socioethical dimensions of the project. The US National Research Council’s report recommended collecting samples from large, rather than small or isolated populations, though the report did not specify whether this recommendation came on scientific grounds, ethical grounds, or both (National Research Council (US) Committee on Human Genome Diversity, 1997).

With ongoing discussions in the 1990s problematizing salvage genetics, 1000G did not invoke such considerations in explaining its motivations. Whereas the HGDP described an “intense scrutiny of human diversity”, the 1000G project recast human differences as variation, a word that connotes difference as something normal and shared rather than distinguishing (Cavalli-Sforza et al., 1991; Patterson, 2011). The move away from overtly otherizing rationales and the corresponding efforts to make human genetic difference more benign can be seen not only in such linguistic shifts, but also in the proposed applications of such studies – as discussed in the following section.

III. From Population Genetics to Medical Genetics

Why should one be interested in genetic variation to begin with? There is ample reason to be suspicious of such interest; establishing differences between groups can be a mechanism of establishing control over said groups – what Foucault has called biopolitics (Foucault, 1997). Conversely, a shared sense of humanity is understood to be the foundation of liberal egalitarian value systems; the fact that 99% of human genomes are identical across individuals is widely touted as a reminder of such values (Pollard, 2009).

With this in mind, the scientific study of genetic difference begins to appear perverse.

How did planners in the HGDP and 1000G projects justify basic research into human genetic difference?

As indicated in the previous section, part of the HGDP’s mandate was to preserve (or archive) genetic diversity before globalization and population admixture rendered that diversity unintelligible. Despite the conservationist overtones of this logic, archiving genetic diversity was not, for the HGDP, an end-in-itself; rather it was a means to advance the field of population genetics (Cavalli-Sforza et al., 1991). Understanding genetic variation was seen as a key to understanding deeper questions about the natural processes that are understood to shape humans, as they do all other species.

Although 1000G data is commonly used for similar studies, population genetics is not listed among the motivations for the project. Rather, the focus has shifted towards a far more fundamental (though no less challenging) endeavor: the association of genotypes with phenotypes. Maps of genetic variation are meant to serve as a foundational resource for functional studies. Among these, medical studies – those that implicate genetic factors with clinical phenotypes, such as disease susceptibility – readily assume a primary role in publicized documentation.

Though both projects were envisioned as resources for evolutionary as well as medical studies, the relative emphasis from the HGDP to the 1000G has shifted towards the latter. In the following section, I document this shift, unpacking claims about how studies of genetic variation can be applied in population and medical genetics. I suggest that this shift is partially responsible for broader public acceptance of 1000G, and that the disappearance of population genetics from the forefront can be seen as a response to criticisms of the HGDP.

Evolution as a Population-Level Process and the HGDP

Population genetics is a field that provides a mathematical framework for studying evolution (Hartl and Clark, 2007). Organisms change across generations – that is, evolve – because of the existence of heritable variation within endogamous groups – that is, populations. Population geneticists use mathematical models to describe how such change occurs, and then compare the predictions of those models with empirical data in order to draw inferences about such historical processes as migration, admixture, and natural selection. When applied to humans, population genetics thus becomes a historical or anthropological endeavor.

In Cavalli-Sforza’s words, “population genetics has been going on since 1917. The HGDP is an effort to make it more efficient, specific and rational” (“Diversity project,” 1996). Whereas human population geneticists had historically performed studies in a piecemeal fashion, collecting data as needed, HGDP planners envisioned a large-scale study of human populations that would furnish the data necessary for systematic investigation of evolutionary questions. The primary scientific publication put forth from the project, for example, performed a clustering analysis in order to identify similarity relationships among groups; the analysis identified (among other things) that higher-order populations tend to cluster along continental lines (Rosenberg et al., 2002).

Although the HGDP was primarily conceived as a resource for population genetics, planners described an “added benefit” that information generated from the project was “likely to prove useful in several areas of biomedical research” (Cavalli-Sforza, 2005). Such a claim is surprising prima facie, given that no medical or phenotypic information would be collected from sample donors in order to preserve anonymity. The NRC’s report on the project included the recognition that collecting such information (as well as genealogical information) would “greatly increase the biomedical utility” of the endeavor, but ultimately disfavored such an approach as it would “substantially increase the cost and time required to obtain samples, as well as the cost of data management and quality assurance,” with these latter concerns presumably gesturing to the logistical and technical constraints imposed by questions of ethics and protection of confidentiality (National Research Council (US) Committee on Human Genome Diversity, 1997).

How could information from the HGDP benefit medical studies? Most such claims in HGDP documentation make reference to those disease-related genes that were known at the time, such as CCR5, APOE and HLA, implicated in AIDS, Alzheimer’s disease, and the major histocompatibility complex (MHC), respectively. Studying population-specific frequencies of different variants at HLA, for example, could aid in transplant matching – “a benefit primarily to minority groups in developed nations” (ibid). Such methods could also be used to estimate the incidence of diseases related to recessive variants (i.e., those genetic conditions in which an individual must carry two copies of the pathogenic variant in order to manifest the condition), although the relevance of such investigations to clinical care is not made explicit. The most straightforward medical application of the HGDP that was proposed was the provision of “small but adequate and reliable control samples for association studies” (Cavalli-Sforza, 2005). Although the use of genetic diversity studies in support of association studies – which would later become the grounding motivation for 1000G – was indicated in governmental discussions (“Statement of Mary-Claire King of the Human Genome Diversity Project to the National Academy of Sciences,” 1996), this idea was absent from more broadly visible project documentation.
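The source does not say how the incidence of a recessive condition would be estimated from variant frequencies. One standard way to do so, assumed here purely for illustration, is the Hardy-Weinberg model: if the pathogenic variant has frequency q in a randomly mating population, the expected fraction of people carrying two copies is q squared, and the expected fraction of unaffected carriers is 2q(1 - q). The function names and frequencies below are hypothetical.

```python
def recessive_incidence(q):
    """Expected fraction carrying two copies of the variant under Hardy-Weinberg: q^2."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return q * q

def carrier_frequency(q):
    """Expected fraction carrying exactly one copy under Hardy-Weinberg: 2q(1 - q)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    return 2.0 * q * (1.0 - q)

# Hypothetical variant frequencies in two made-up populations (not real data).
example_frequencies = {"pop_A": 0.01, "pop_B": 0.05}

for pop, q in example_frequencies.items():
    print(pop, recessive_incidence(q), carrier_frequency(q))
```

The sketch shows why population-specific frequencies matter for such estimates: because incidence scales with the square of the variant frequency, a fivefold difference in frequency between groups implies a twenty-five-fold difference in expected incidence.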

Taken together, these suggestions of possible biomedical uses of HGDP samples instantiate what Michael Fortun has called the ‘promissory mode’ of genomics (Fortun, 2008). Although the project was designed to create resources to advance the field of population genetics and enable the study of human evolution, planners speculated as to potential biomedical applications in order to bolster its appeal and relevance. By the mid-2000s, however, the significance of genetic variation in association studies (and, therefore, medical research) had gained broader recognition, giving 1000G planners substantially different grounds on which to advance their project.

Rare Variants and in 1000G

Unlike HGDP data, 1000G data is explicitly not to be used as controls for association studies: “no phenotypic information was collected with the samples, so we do not know what medical conditions the donors had” (Coriell Institute, n.d.). How, then, can genetic data without phenotypic information become medically relevant?

The 1000G pilot paper opens by describing the need for resources to assist in genotype-phenotype association studies (1000 Genomes Project Consortium et al., 2010). Specifically, the genome-wide association study (GWAS), in which researchers seek genetic variants that are overrepresented in groups of people with a certain trait (e.g., a disease), has become a mainstay of medical research (Stranger et al., 2011). Querying the entire genome computationally across many people is much more efficient than historical methods of linkage mapping, which necessitate the analysis of genetic pedigrees across several generations. Whole-genome sequencing, however, remains too costly for most GWAS – particularly because such studies necessitate samples from hundreds of individuals. Instead, researchers find an individual’s genotype at many markers that are scattered throughout the genome, which can be done at substantially lower cost using microarray technology.

With these developments in place, human genetic difference becomes clinically useful in two ways. First, by identifying how nearby genetic variants tend to be co-inherited in different populations (that is, identifying haplotypes – or, equivalently, looking for patterns of linkage disequilibrium or LD) researchers can develop methods that allow probabilistic inference of an individual’s genotype at a certain site, given their genotype at another site nearby, a process called imputation. Understanding population-specific patterns of LD allows researchers to impute genotypes for analysis, meaning that more extensive analyses can be done with sparser data. Additionally, this process is particularly critical for researchers using different microarrays that look at different genetic sites; as 1000G samples group co-chair Aravinda Chakravarti explains, “imputation is what allows many more people’s work to become comparable” (quoted in Patterson, 2011).
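The logic of LD-based imputation can be illustrated with a deliberately simplified sketch. Actual imputation methods work with large phased reference panels and statistical models across many sites; the two-site version below, with hypothetical haplotype panels and function names, only shows why the strength of LD in a given population determines how much one genotype reveals about another.

```python
from collections import Counter

def r_squared(haplotypes):
    """LD (r^2) between two biallelic sites, from a list of (allele1, allele2) haplotypes."""
    n = len(haplotypes)
    a, b = haplotypes[0]  # pick one reference allele at each site
    p_a = sum(h[0] == a for h in haplotypes) / n
    p_b = sum(h[1] == b for h in haplotypes) / n
    p_ab = sum(h == (a, b) for h in haplotypes) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # at least one site is not polymorphic in this panel
    return d * d / denom

def impute(haplotypes, observed_allele):
    """Return {allele at site 2: probability}, given the allele observed at site 1."""
    counts = Counter(h[1] for h in haplotypes if h[0] == observed_allele)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("observed allele is absent from the reference panel")
    return {allele: c / total for allele, c in counts.items()}

# Hypothetical reference panels for two made-up populations (not real data).
panel_x = [("A", "C")] * 4 + [("G", "T")] * 4                # sites always co-inherited
panel_y = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]  # sites independent

for name, panel in (("pop_X", panel_x), ("pop_Y", panel_y)):
    print(name, r_squared(panel), impute(panel, "A"))
```

In the first toy panel the observed allele fully determines its neighbor, while in the second it carries no information at all, which is why reference data drawn from the relevant population matter for imputation.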

The second way that systematic studies of genetic variation aid in functional and medical studies is by enabling the creation of better microarrays, which in turn can be used for studies such as GWAS described above. Materially, improving microarrays can involve the creation of probes to detect previously unknown low-frequency variants; the 1000G project’s specific goal is to detect most variants with frequencies as low as 1% (“About the 1000 Genomes Project”). Improving microarrays can also involve the creation of population-specific arrays, as well as more versatile arrays that minimize ascertainment bias (Via et al., 2010). This bias, which results from the overrepresentation of certain groups in the studies from which arrays are designed, has been noted as an ongoing issue for the scientific community (Rosenberg et al., 2010). Thus, part of the rationale of the 1000G project is to amend the “underrepresentation of genomes sequenced in nonwhite and non-Asian populations” in medical studies (Patterson, 2011).

Against this background, population genetics has not disappeared; 1000G planners note that the project “will also improve our knowledge of genomic configurations that were shaped by evolutionary processes,” though this application is considered to be a secondary pursuit, in a reversal of those priorities listed by Cavalli-Sforza in support of the HGDP several years prior (Via et al., 2010). Indeed, with projects like the International HapMap and 1000G, human population genomics has been gaining momentum, and studies reporting natural selection on a wide variety of traits have become commonplace (Akey, 2009). These studies stand in stark opposition to the previous conjecture, made in defense of the HGDP, that differences between human groups are “perhaps entirely the result of climatic adaptation and random drift” (Cavalli-Sforza, 2005). In addition to making data available as a resource for population genetics (much as the HGDP had envisioned), the project consortium performs population genetic analyses for inclusion in project publications (1000 Genomes Project Consortium et al., 2010; The 1000 Genomes Project Consortium, 2012).

Thus, from the HGDP to the 1000G, new applications for human genetic difference have become visible. Those applications further serve the social function of making human genetic difference relevant, and give population-based genetic studies a less contentious directive than studying human evolution or preserving genetic diversity.

Meanwhile, population genetic analyses have receded from the spotlight and ceased to be invoked as a rationale for studying difference, but have flourished in large part due to the data provided by projects like 1000G.

IV. From Group Consent to Community Engagement

Although the promise of medical application has upstaged population genetics, projects like 1000G remain decidedly population-based. One major question that was provoked by the HGDP proposal was whether such groups as human populations truly existed in any meaningful or biological sense (Lock, 1994). Project organizers took up this question in earnest, with Allan Wilson (second author on the 1991 paper that proposed the project) arguing for an alternative method of profiling diversity. Rather than fixing population identity and then finding individuals, Wilson proposed grid-sampling, in which individuals worldwide would be sampled at equal distances from each other, without reference to any ethnic identity. Wilson argued that such a method would allow researchers “to be explorers, finding out what is there, rather than presuming we know what a population is” (quoted in Reardon, 2001). Many argued that humans were not distributed as separate evolutionary lineages (clades), but rather that there was fundamentally one human population that could be described as separated along gradients (clines).

The group ultimately decided to implement a population-based approach, which was argued to “broaden the universe of testable hypotheses” (National Research Council (US) Committee on Human Genome Diversity, 1997). Although critics warned of dangerous social ramifications of hypostatizing populations, Cavalli-Sforza maintained on the other hand that “ignoring the social realities of populations also seems dangerous” inasmuch as it would, for example, “generate a badly biased history of the whole world” (Cavalli-Sforza, 2005). Such an approach was also easier from a logistical perspective, notwithstanding the project capacity that would then have to be allotted to ensure that the rights and interests of populations as well as individuals were protected.

For the HGDP, as well as for genome science and international governance more broadly, the social and ethical terrain surrounding the use of populations as objects of scientific study was terra incognita. Following the NRC report recommending that the project proceed in 1997, these issues came to the forefront of public discussion (Knoppers, 2003), whereas by the time 1000G was gaining momentum they had largely receded (Via et al., 2010). In what follows, I consider the disappearance of the genetic population from widespread social concern, while analyzing the ways in which the HGDP and 1000G came to engage with the populations at the centers of their projects.

The HGDP and the Call for Group Consent

The HGDP’s driving interest in evolution situated it in a precarious position with respect to the populations it sought to study. The demands of population genetics required that the populations identified be, in a sense, ‘pure’ – admixture between groups meant that historically isolated genomes would come together and mesh, and thus threatened to render genomic diversity unintelligible. In the original proposal for the project, organizers wrote that “the populations that can tell us the most about our evolutionary past are those that have been isolated for some time, are likely to be linguistically and culturally distinct, and are often surrounded by geographic barriers” (Cavalli-Sforza et al., 1991). Such scientific motivations notwithstanding, an emphasis on isolated populations becomes immediately troubling when considered against the historical background of colonialism. Jenny Reardon articulates how such an approach is implicitly Eurocentric as well as otherizing: the HGDP “imagines a ‘population’ corded off from modernity; what makes these ‘populations’ genetically interesting is precisely what defines them as not a part of modern Western social orders” (Reardon, 2001).

Project planners were conscious of the socially asymmetric relationship they were entering into – though the fact that discussions continued for some ten years before any scientific publication emerged suggests that the resulting issues proved more difficult to resolve than was initially believed. After discussing the need and motivations for a population-based study of human genetic diversity, the 1991 project proposal goes on to note:

“Among these very informative groups have been many peoples historically vulnerable to exploitation by outsiders. Hence, asking for samples alone, without consideration of a population’s needs for medical treatment and other benefits, will inevitably lead to the same sense of exploitation and abandonment experienced by survivors of Hiroshima and Nagasaki. It will be essential to integrate the study of peoples with response to their related needs.” (Cavalli-Sforza et al., 1991)

This acknowledgment of the ‘needs’ of these groups furnished a starting point for discussions. However, as many others quickly pointed out, what was particularly troubling about this statement was its lack of awareness of other, arguably more important, dimensions of engaging populations. For example, the 1991 proposal lacked any acknowledgment of the collective autonomy of the communities it was seeking to study, and their consequent right to determine, as a group, whether to participate – and to participate not simply as research subjects but as collaborators (Greely, 2001). The HGDP’s imagined points of discussion for the ethics of engaging populations – the question of benefit-sharing from any potential patents or other commercial ventures resulting from the project, for example (“Diversity project,” 1996) – already presupposed that these populations would occupy a subsidiary role. For many audiences, the inapt comparison of these populations to the victims of atomic bombings only served to demonstrate that planners had not given sufficient consideration to the social realities of the project they were proposing, and solidified a perception of the project’s mentality as otherizing.

By 1993, such attitudes had inspired protest from groups such as the Rural Advancement Foundation International. These groups objected to organizers’ perceptions of who could speak on behalf of indigenous people: planners invited representatives of Cultural Survival Enterprises, Inc. and the World Resources Institute, rather than representatives of any of the proposed populations themselves. It was at this point in project planning that ideas of group consent took hold, with John Moore, a social-cultural anthropologist from the University of Florida, cautioning against the “attack on the autonomy of the population” that failure to obtain some form of group consent would constitute (quoted in Reardon, 2001).

As the ensuing academic discussion would indicate (Knoppers, 2003), the actual implementation of group consent was far from straightforward. Critical questions – such as how representative voices for a collective can be identified, and how the group-level ramifications of participation in population-based research can be predicted – proved unanswerable except on a case-by-case basis. Heterogeneous deployment of group consent meant that it could be described to the public in increasingly vague terms, with some commentators likening it to a ‘slow code’ – a medical euphemism for a half-hearted attempt to resuscitate a patient when such an attempt is perceived to be futile (Juengst, 2003). In apparent recognition of the tension between Western legal conceptions of consent (which assume the existence of a rational autonomous agent, rather than a collective) and the realities of social group identities, the term ‘community engagement’ gradually came to replace ‘group consent’ as the ethical standard for population-based research.

Ironically, the extended dialogue surrounding group consent and community engagement in the 1990s had the ultimate effect of concealing the specifics of how HGDP researchers interacted with populations. By the time that new sample collections were launched in the early 2000s, the HGDP had for the most part receded from public attention – allowing some collections to proceed with “less ethical oversight than was proposed” (Macer, 2003). The 2005 paper on the project’s status included a discussion on ethical, legal and social issues; it mentioned confidentiality, anonymity, informed consent, subject awareness of intended data usage and the importance of “conforming with the legal needs of each country,” though it did not reference community engagement (Cavalli-Sforza, 2005). In some ways, this is unsurprising, as the document cites the 1997 NRC report as its ethical guide; the report’s argument was that “such a survey [of genetic diversity], if performed to protect the rights of individual donors, does merit federal funding” (Schull et al., 1997; emphasis added).

Engaging and Labeling Populations in 1000G

The recession of group consent from public dialogue can be observed in the 1000G project as well. In the project’s 2010 ethics paper, the importance of protecting individual rights remains central whereas group-level concerns are not addressed – except inasmuch as it is acknowledged that the Genetic Information Nondiscrimination Act (GINA) and other related protections are specific to the US and that project planners must consider the relevant local legal systems (Via et al., 2010). The implication of such an omission is that such considerations have been addressed, although primary material describing the implementation of community engagement in the 1000G project is similarly sparse, albeit with exceptions. The Phase 1 paper, for example, states that requests for samples are screened on the basis of intended research use, and that each community receives regular reports regarding such use – though it should be noted that such protections do not apply to sequence data, which is open-access (The 1000 Genomes Project Consortium, 2012). In addition, the paper describes the provision of funds in support of educational and outreach activities related to biomedical research, although numbers are not given.

Although community engagement occurs behind closed doors, the 1000G project does seek to address some of the concerns of studying populations in its documentation of sample use. In particular, project organizers emphasize the use of standardized population descriptors, recognizing that the way groups are named and identified in genetic studies has scientific, cultural, and social consequences (Coriell Institute, “Guidelines for Referring to the Populations in Publications and Presentations”). They describe the importance of respect for the local norms of communities and an acknowledgment of cultural heterogeneity. Speaking of ethics, they note that “precision is part of the obligation of researchers to participants” (ibid.). On the one hand, describing human groups in a way that is too specific runs the risk of singling out those communities and “imply[ing] that those communities are somehow genetically unique, of special interest, or very different from their close neighbors.” On the other, overgeneralization of human groups:

“…could erroneously lead those who interpret data from studies that use the samples to equate ethnicity or ancestral geography with race (an imprecise and in large part socially constructed category, which has very different meanings in various parts of the world). This could reinforce social and historical stereotypes, and lead to group stigmatization and discrimination.” (ibid.)

For all of these reasons, the project enforces the standardization of population descriptors. Those provided (Figure 1) typically invoke geographic or ethnic labels, and specify the site of sampling. The documentation notes that, after introducing populations using the full descriptors, it becomes acceptable for researchers to use the provided shorthand labels or abbreviations.

The consortium recognizes, of course, that rigorous naming practices, while necessary, are not sufficient to prevent the (mis)use of project data to draw problematic conclusions or fuel repugnant ideologies. This may be one reason why the consortium opts not to be listed as an author on population comparison studies that use project data (“1000 Genomes Data and Sample Information”).

V. Constructing Scientific Populations

How has the constitution of the scientific population changed from the HGDP to 1000G? As Margaret Lock and others have noted, the population, “far from being a readily definable natural fact, is a contested and pliable concept, created to assist in the answer of specific questions and hypotheses” (1994). The types of questions being asked – or, at least, publicly emphasized – have shifted, and so has the scientific population. For the HGDP, questions of evolution drove the construction of populations of “anthropological interest – that is, those that were in place before the great diasporas started in the fifteenth and sixteenth centuries, when navigation of the ocean became possible” (Cavalli-Sforza, 2005). By contrast, this emphasis on isolated populations with fixed and recognizable identities has been explicitly abandoned by 1000G, wherein the goal is “not to define populations in an anthropological sense, but to collect samples that addressed the project’s biomedical goals while recognizing the complexities of local populations and how they define themselves” (The 1000 Genomes Project Consortium, 2012).

The scientific population has gained recognition as a vulnerable entity, and for that reason the collectives asked to serve in this capacity no longer represent historically isolated groups to be archived, but are larger communities that are taken to be representative of globally shared patterns of variation. Ways of studying the scientific population have changed: owing to the existence of online sequence databases, researchers need not physically interact with members of a population or even tissue samples in order to perform research.

Most crucially, however, populations are believed to have taken on a form of autonomy as research participants, although the status of this identity is not well documented. If the comparison in the receptions of the HGDP and 1000G Projects is any indication, there has been a decrease in public demand for accountability in this endeavor. It will only become clear whether, and to what extent, populations are knowingly able to constitute themselves as scientific subjects when greater transparency in the implementation of ‘community engagement’ is achieved.

References

1000 Genomes Data and Sample Information. Accessed 11/30/13.

1000 Genomes Project Consortium, Abecasis, G.R., Altshuler, D., Auton, A., Brooks, L.D., Durbin, R.M., Gibbs, R.A., Hurles, M.E., McVean, G.A., 2010. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073.

About the 1000 Genomes Project. Accessed 11/30/13.

Akey, J.M., 2009. Constructing genomic maps of positive selection in humans: where do we go from here? Genome Res. 19, 711–722.

Bowcock, A.M., Kidd, J.R., Mountain, J.L., Hebert, J.M., Carotenuto, L., Kidd, K.K., Cavalli-Sforza, L.L., 1991. Drift, admixture, and selection in human evolution: a study with DNA polymorphisms. Proc. Natl. Acad. Sci. U. S. A. 88, 839–843.

Cavalli-Sforza, L.L., 2005. The Human Genome Diversity Project: past, present and future. Nat. Rev. Genet. 6, 333–340.

Cavalli-Sforza, L.L., Wilson, A.C., Cantor, C.R., Cook-Deegan, R.M., King, M.-C., 1991. Call for a worldwide survey of human genetic diversity: A vanishing opportunity for the Human Genome Project. Genomics 11, 490–491.

Coriell Institute. Guidelines for Referring to the Populations in Publications and Presentations. Accessed 12/1/13.

Diversity project: Cavalli-Sforza answers his critics, 1996. Nature 381, 14–14.

El-Haj, N.A., 2012. The Genealogical Science: The Search for Jewish Origins and the Politics of Epistemology. University of Chicago Press.

Fortun, M., 2008. Promising genomics: Iceland and deCODE Genetics in a world of speculation. University of California Press, Berkeley.

Foucault, M., 1997. The Birth of Biopolitics: Lectures at the College de France, 1978-1979. Palgrave Macmillan, New York.

Greely, H.T., 2001. Informed Consent and Other Ethical Issues in Human Population Genetics. Annu. Rev. Genet. 35, 785–800.

Hartl, D.L., Clark, A.G., 2007. Principles of population genetics. Sinauer Associates, Sunderland, Mass.

International Consortium Announces the 1000 Genomes Project, 2008. Accessed 12/3/13.

Juengst, E., 2003. Community Engagement in Genetic Research: The “Slow Code” of Research Ethics?, in: Populations and Genetics: Legal and Socio-Ethical Perspectives. Martinus Nijhoff Publishers.

Knoppers, B.M., 2003. Populations and Genetics: Legal and Socio-Ethical Perspectives. Martinus Nijhoff Publishers.

Knoppers, B.M., Zawati, M.H., Kirby, E.S., 2012. Sampling populations of humans across the world: ELSI issues. Annu. Rev. Genomics Hum. Genet. 13, 395–413.

Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M., FitzHugh, W., Funke, R., Gage, D., Harris, K., Heaford, A., Howland, J., Kann, L., Lehoczky, J., LeVine, R., McEwan, P., McKernan, K., Meldrim, J., Mesirov, J.P., Miranda, C., Morris, W., Naylor, J., Raymond, Christina, Rosetti, M., Santos, R., Sheridan, A., Sougnez, C., Stange-Thomann, N., Stojanovic, N., Subramanian, A., Wyman, D., Rogers, J., Sulston, J., Ainscough, R., Beck, S., Bentley, D., Burton, J., Clee, C., Carter, N., Coulson, A., Deadman, R., Deloukas, P., Dunham, A., Dunham, I., Durbin, R., French, L., Grafham, D., Gregory, S., Hubbard, T., Humphray, S., Hunt, A., Jones, M., Lloyd, C., McMurray, A., Matthews, L., Mercer, S., Milne, S., Mullikin, J.C., Mungall, A., Plumb, R., Ross, M., Shownkeen, R., Sims, S., Waterston, R.H., Wilson, R.K., Hillier, L.W., McPherson, J.D., Marra, M.A., Mardis, E.R., Fulton, L.A., Chinwalla, A.T., Pepin, K.H., Gish, W.R., Chissoe, S.L., Wendl, M.C., Delehaunty, K.D., Miner, T.L., Delehaunty, A., Kramer, J.B., Cook, L.L., Fulton, R.S., Johnson, D.L., Minx, P.J., Clifton, S.W., Hawkins, T., Branscomb, E., Predki, P., Richardson, P., Wenning, S., Slezak, T., Doggett, N., Cheng, J.-F., Olsen, A., Lucas, S., Elkin, C., Uberbacher, E., Frazier, M., Gibbs, R.A., Muzny, D.M., Scherer, S.E., Bouck, J.B., Sodergren, E.J., Worley, K.C., Rives, C.M., Gorrell, J.H., Metzker, M.L., Naylor, S.L., Kucherlapati, R.S., Nelson, D.L., Weinstock, G.M., Sakaki, Y., Fujiyama, A., Hattori, M., Yada, T., Toyoda, A., Itoh, T., Kawagoe, C., Watanabe, H., Totoki, Y., Taylor, T., Weissenbach, J., Heilig, R., Saurin, W., Artiguenave, F., Brottier, P., Bruls, T., Pelletier, E., Robert, C., Wincker, P., Rosenthal, A., Platzer, M., Nyakatura, G., Taudien, S., Rump, A., Smith, D.R., Doucette-Stamm, L., Rubenfield, M., Weinstock, K., Lee, H.M., Dubois, J., Yang, H., Yu, J., Wang, J., Huang, G., Gu, J., Hood, L., Rowen, L., Madan, A., Qin, S., Davis, R.W., Federspiel, N.A., Abola, A.P., Proctor, M.J., Roe, B.A., Chen, F., Pan, H., Ramser, J., Lehrach, H., Reinhardt, R., McCombie, W.R., Bastide, M. de la, Dedhia, N., Blöcker, H., Hornischer, K., Nordsiek, G., Agarwala, R., Aravind, L., Bailey, J.A., Bateman, A., Batzoglou, S., Birney, E., Bork, P., Brown, D.G., Burge, C.B., Cerutti, L., Chen, H.-C., Church, D., Clamp, M., Copley, R.R., Doerks, T., Eddy, S.R., Eichler, E.E., Furey, T.S., Galagan, J., Gilbert, J.G.R., Harmon, C., Hayashizaki, Y., Haussler, D., Hermjakob, H., Hokamp, K., Jang, W., Johnson, L.S., Jones, T.A., Kasif, S., Kaspryzk, A., Kennedy, S., Kent, W.J., Kitts, P., Koonin, E.V., Korf, I., Kulp, D., Lancet, D., Lowe, T.M., McLysaght, A., Mikkelsen, T., Moran, J.V., Mulder, N., Pollara, V.J., Ponting, C.P., Schuler, G., Schultz, J., Slater, G., Smit, A.F.A., Stupka, E., Szustakowki, J., Thierry-Mieg, D., Thierry-Mieg, J., Wagner, L., Wallis, J., Wheeler, R., Williams, A., Wolf, Y.I., Wolfe, K.H., Yang, S.-P., Yeh, R.-F., Collins, F., Guyer, M.S., Peterson, J., Felsenfeld, A., Wetterstrand, K.A., Myers, R.M., Schmutz, J., Dickson, M., Grimwood, J., Cox, D.R., Olson, M.V., Kaul, R., Raymond, Christopher, Shimizu, N., Kawasaki, K., Minoshima, S., Evans, G.A., Athanasiou, M., Schultz, R., Patrinos, A., Morgan, M.J., 2001. Initial sequencing and analysis of the human genome. Nature 409, 860–921.

Lock, M., 1994. Interrogating the Human Diversity Genome Project. Soc. Sci. Med. 39, 603–606.

Macer, D., 2003. Ethical Considerations in the HapMap Project: An Insider’s Personal View. Eubios J. Asian Int. Bioeth. 13, 125–127.

National Research Council (US) Committee on Human Genome Diversity, 1997. Evaluating Human Genetic Diversity, The National Academies Collection: Reports funded by National Institutes of Health. National Academies Press (US), Washington (DC).

Patterson, K., 2011. 1000 GENOMES: A World of Variation. Circ. Res. 108, 534–536.

Pollard, K.S., 2009. What Makes Us Human?: Scientific American [WWW Document]. URL http://www.scientificamerican.com/article.cfm?id=what-makes-us-human (accessed 12.9.13).

Reardon, J., 2001. The Human Genome Diversity Project: A Case Study in Coproduction. Soc. Stud. Sci. 31, 357–388.

Reardon, J., 2011. Human Population Genomics and the Dilemma of Difference, in: Reframing Rights: Bioconstitutionalism in the Genetic Age.

Rosenberg, N.A., Huang, L., Jewett, E.M., Szpiech, Z.A., Jankovic, I., Boehnke, M., 2010. Genome-wide association studies in diverse populations. Nat. Rev. Genet. 11, 356–366.

Rosenberg, N.A., Pritchard, J.K., Weber, J.L., Cann, H.M., Kidd, K.K., Zhivotovsky, L.A., Feldman, M.W., 2002. Genetic Structure of Human Populations. Science 298, 2381–2385.

Rural Advancement Foundation International, 1993. Patents, indigenous peoples, and human genetic diversity. Accessed 12/3/13.

Schull, W.J., Cavalli-Sforza, L.L., Bodmer, W., Dausset, J., 1997. Support for genetic diversity project. Nature 390, 221–221.

Statement of Mary-Claire King of the Human Genome Diversity Project to the National Academy of Sciences, 1996. Accessed 12/10/13.

Stranger, B.E., Stahl, E.A., Raj, T., 2011. Progress and Promise of Genome-Wide Association Studies for Human Complex Trait. Genetics 187, 367–383.

Thacker, E., 2005. The global genome: biotechnology, politics, and culture. MIT Press, Cambridge, Mass.

The 1000 Genomes Project Consortium, 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65.

Via, M., Gignoux, C., Burchard, E.G., 2010. The 1000 Genomes Project: new opportunities for research and social challenges. Genome Med. 2, 3.

Appendix

Figure 1. File describing population codes used in the 1000 Genomes Project. Accessed from ftp.1000genomes.ebi.ac.uk/vol1/ftp/README.populations.

Figure 2. Figure listing populations sampled in the Human Genome Diversity Project (Cavalli-Sforza, 2005).
