Human Genomics of Human Immunodeficiency Virus Type 1 and Hepatitis C Virus

Genetics of infections by the human immunodeficiency virus and the hepatitis C virus

(with a summary in Dutch)

Thesis

submitted for the degree of doctor at Utrecht University, on the authority of the rector magnificus, prof. dr. G.J. van der Zwaan, in accordance with the decision of the Doctorate Board, to be defended in public on 19 January 2015 at 4.15 p.m.

by

Jacques Casimir Fellay

born on 12 April 1974 in Martigny, Switzerland

Supervisors: prof. dr. P.I.W. de Bakker, prof. dr. F. Miedema

Contents

Chapter 1 Introduction

Chapter 2 A whole-genome association study of major determinants for host control of HIV-1

Chapter 3 Common genetic variation and the control of HIV-1 in humans

Chapter 4 Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population

Chapter 5 A genome-wide association study of resistance to HIV in highly exposed uninfected individuals with hemophilia A

Chapter 6 Association study of common genetic variants and HIV-1 acquisition in 6,300 infected cases and 7,200 controls

Chapter 7 Host genetic determinants of T cell responses to the MRKAd5 HIV-1 gag/pol/nef vaccine in the Step trial

Chapter 8 A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control

Chapter 9 Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance

Chapter 10 ITPA gene variants protect against anemia in patients treated for chronic hepatitis C

Chapter 11 A perspective on the future of host genetics in chronic viral diseases

Summary

Samenvatting (Dutch summary)

Acknowledgements

Curriculum Vitae

Publication List

Chapter 1

Introduction


Chronic viral infections: the global burden of HIV and HCV diseases

Chronic or persistent viral infections occur when primary infections are not cleared by the host immune response. In humans, chronic viral infections underlie a wide variety of medically important diseases that either follow directly from primary infection or may require several years to develop. Human Immunodeficiency Virus (HIV) and Hepatitis C Virus (HCV), which were only identified in the last decades of the twentieth century, are two of the most important viral causes of human disease.

HIV is a major global public health issue, having claimed close to 40 million lives over the past 30 years. The WHO estimates that there are approximately 35 million people living with HIV, and that 1.5 million people died last year from HIV-related causes [1]. A central feature of HIV is its ability to target the immune system. As the virus destroys immune cells and impairs their function, primarily that of CD4+ T lymphocytes, infected individuals gradually become immunodeficient. The most advanced stage of HIV infection is Acquired Immunodeficiency Syndrome (AIDS), which is characterized by an increased susceptibility to opportunistic infections and to certain types of cancer.

Since its discovery in 1989, HCV has been recognized as a major cause of liver disease worldwide. The virus can cause both acute and chronic hepatitis infection, ranging in severity from a mild illness lasting a few weeks to a serious, lifelong illness. About 150 million people globally have chronic hepatitis C infection, and a significant number of those who are chronically infected will develop liver cirrhosis or liver cancer. A total of 350,000 to 500,000 people die each year from hepatitis C-related liver diseases [2].

Human genetic variation

Human genetic variation partly explains why we look different, why we may have diverse sets of talents, but also why we are more or less susceptible to most diseases. Genetic variation notably plays a key role in determining the individual response to infectious agents. Indeed, even if susceptibility or resistance to a microbial challenge is the final result of dynamic interactions between host, pathogen and environmental influences, human genetic polymorphisms have been shown to have an important impact on the outcome of various infections [3,4]. During the past decade, spectacular progress in technology has profoundly changed our understanding of human genetic diversity, and the way human genetic research can be performed. Building on the success of the Human Genome Project, in which the sequence of our genome was first decoded [5,6], the international research community has continued working together to catalogue the genomic variation present at the population level through the HapMap project [7-9] and the 1000 Genomes Project [10], among others.

The human genome is made of more than 3 billion base pairs of deoxyribonucleic acid (DNA). The vast majority of them do not differ between any two individuals; still, every human being carries about 4 million genetic variants, i.e. genomic sites that differ from the reference human genome sequence. Most of these variants are single nucleotide changes, which are referred to as single nucleotide polymorphisms, or SNPs, when they are common enough in a population; that is, when they have a minor allele frequency of at least 1%. To date, more than 40 million single nucleotide variants have been described and validated in humans [11]. In addition, the genome contains a variety of structural changes: from short insertions or deletions (indels) to larger copy number variants (CNVs), variable number tandem repeats (VNTRs), block substitutions, inversions, translocations, chromosomal rearrangements or even aneuploidy (an abnormal number of chromosomes). All classes of genetic variants may have an impact on human phenotypes, including health-related outcomes.
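As a minimal sketch of the 1% threshold mentioned above, the following Python snippet computes the minor allele frequency at a single biallelic site and checks whether the site would count as a SNP under that definition. The function names and the two-letter genotype encoding are illustrative assumptions and do not correspond to any particular genotyping pipeline.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one biallelic site.

    `genotypes` holds one two-letter string per diploid individual,
    e.g. "AG" (a hypothetical encoding chosen for this example).
    """
    # Count every allele copy across all individuals.
    allele_counts = Counter(allele for g in genotypes for allele in g)
    if len(allele_counts) > 2:
        raise ValueError("expected a biallelic site")
    if len(allele_counts) < 2:
        return 0.0  # monomorphic site: no minor allele observed
    total = sum(allele_counts.values())
    return min(allele_counts.values()) / total


def is_common_snp(genotypes, threshold=0.01):
    """Apply the 'minor allele frequency of at least 1%' criterion."""
    return minor_allele_frequency(genotypes) >= threshold


# Made-up sample: 90 AA, 9 AG and 1 GG individuals.
sample = ["AA"] * 90 + ["AG"] * 9 + ["GG"] * 1
maf = minor_allele_frequency(sample)
```

In this made-up sample the G allele is observed 11 times among 200 chromosomes, so the minor allele frequency is 0.055 and the site passes the 1% criterion.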

Genome-wide association studies

The newly acquired knowledge about human diversity allowed the development of genome-wide association studies (GWAS) [12]. In GWAS, researchers search for statistical associations between common genetic variants present in the genome and any phenotype of interest, e.g. physical trait, disease susceptibility or drug response, hoping to identify changes that can help personalize patient care and better understand underlying pathophysiological mechanisms. To test for genome-wide associations, it is necessary to genotype large numbers of SNPs in a cost-effective manner. High-density oligonucleotide SNP arrays are used, in which hundreds of thousands of probes are distributed on a small chip, allowing for all selected SNPs to be interrogated simultaneously.

A key step in the elaboration of efficient GWAS arrays was the definition of patterns of linkage disequilibrium (LD) between SNPs. LD describes the tendency of neighbouring variants to be inherited together as part of genomic blocks called haplotypes. This makes it possible to identify SNPs that “tag” haplotype blocks and indirectly represent multiple other SNPs (tag SNPs) [Figure 1]. The SNPs genotyped on a GWAS chip are selected to represent haplotypes across the entire genome, covering common variants for all genes and non-genic regions, and contain information about almost all common variants present in human populations. The genome-wide genotyping chips used in GWAS contain hundreds of thousands, and now up to 5 million, tag SNPs. It is this ability of tag SNPs to indirectly represent non-genotyped SNPs that makes the technology truly genome-wide and particularly attractive for hypothesis-free, discovery-oriented genetic studies.

Figure 1: Identification of tag SNPs for association studies. (a) SNPs are identified in DNA samples from multiple individuals. (b) Adjacent SNPs that are inherited together are compiled into haplotypes. (c) Tag SNPs within haplotypes are identified that uniquely identify those haplotypes. By genotyping the three tag SNPs shown in this figure, researchers can identify which of the four haplotypes shown here are present in each individual. Figure is from the HapMap Project: http://hapmap.ncbi.nlm.nih.gov/whatishapmap.html
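The tagging idea can be made concrete with the usual pairwise LD measure r^2. The sketch below, written in Python under the assumption that phased haplotypes are available as pairs of 0/1 allele codes, computes r^2 between two biallelic SNPs; the function name and input format are hypothetical. An r^2 close to 1 means that genotyping one SNP is nearly as informative as genotyping the other, which is what makes a SNP a useful tag.

```python
def ld_r_squared(haplotypes):
    """Return r^2 between two biallelic SNPs.

    `haplotypes` is a list of (allele_at_snp1, allele_at_snp2) tuples,
    one per phased chromosome, with alleles coded 0 or 1
    (an assumed encoding for this example).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Made-up data: the two SNPs always travel together (perfect tagging).
perfect = [(1, 1)] * 3 + [(0, 0)] * 7
# Made-up data: all four haplotypes equally frequent (no LD).
independent = [(1, 1), (1, 0), (0, 1), (0, 0)]

r2_perfect = ld_r_squared(perfect)
r2_independent = ld_r_squared(independent)
```

In the first toy data set the two SNPs are perfect proxies for each other (r^2 of 1, up to floating-point rounding); in the second they carry no information about each other (r^2 of 0).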

Over the past 8 years, the GWAS approach has been successful in identifying a large number of genetic loci associated with hundreds of human traits and diseases. Nevertheless, two limitations of the approach make it clear that more in-depth exploration of human genetic diversity, notably using sequencing technology, will be needed to fully describe the impact of genetic variation on phenotypes: [A] because it relies on tag SNPs, the GWAS study design is not well suited to identifying the actual causal genetic variants; and [B] in most cases, even when applied to studies involving hundreds of thousands of participants, the approach has failed to explain the majority of the presumed genetic component of any given trait [13].

Host genetics of HIV disease

Differences between individuals exposed to HIV have been observed since the early days of the current pandemic. Susceptibility to HIV infection and natural history of the disease are highly variable, owing to the complex interplay between the virus, its human host, and the environment. Before genome-wide approaches became widely available, research in HIV host genetics had already unraveled a series of human gene variants that modulate the response to retroviral exposure [14,15], thereby partially explaining why some individuals remain uninfected even when repeatedly exposed to HIV, or why a subset of infected patients are able to maintain a normal level of immunity after years of infection. Most of these discoveries resulted from candidate gene studies, in which allelic variants were analyzed in genes that were known or suspected to play a role in HIV pathogenesis and immune response. As a consequence, the genetic markers relevant to HIV disease that were identified are related to genes that can broadly be classified into one of two categories: (1) host genes that are implicated in the HIV-1 life cycle, from entry into the target cells to the different intracellular steps that are required for viral replication and propagation; (2) immune-related genes, coding for canonical innate and adaptive immune response factors, as well as for proteins involved in immune regulation and in specific antiretroviral defense mechanisms.

As discussed above, the development of genome-wide approaches represented a paradigm shift for complex trait genetics, including host genetic research. Global host influences on pathogen susceptibility could be assessed in single experiments, and individual contributions of numerous genetic variants could be ranked and compared to get a comprehensive view of the impact of human genomic variation. GWAS were thus designed to search for human variants associated with HIV control (see chapters 2-4), acquisition risk (see chapters 5-6), and response to experimental vaccines (see chapter 7) [Figure 2].

[Figure 2 labels: Exposure; Infection; Genetics of resistance; Genetics of viral control; Genetics of vaccine trials]

Figure 2: HIV-related phenotypes for host genetic studies. Genome-wide association studies in HIV host genetics have used phenotypes reflecting resistance or susceptibility to infection, spontaneous control of viremia and disease progression in infected cohorts, and response to candidate vaccines.

Host genetics of HCV disease

Both natural history of HCV disease and response to anti-HCV treatment are highly variable between patients [Figure 3]. Upon acute infection with HCV, a minority of individuals is able to mount an effective immune response that will lead to spontaneous viral clearance in less than 6 months. On the other hand, approximately 70% of HCV-infected persons will progress to chronic infection, and are at risk for the development of extrahepatic manifestations, cirrhosis, and hepatocellular carcinoma. Even in chronically infected individuals, the progression of liver disease caused by chronic HCV infection is extremely variable. Some will never develop clinical disease, while others will die from liver-related complications within a few years if left untreated. Risk factors that have been consistently associated with faster progression of liver disease include older age at infection, male gender, alcohol intake and co-infection with HIV or hepatitis B virus [16]. However, none of these factors is sufficient to replace a careful assessment of liver disease by either liver biopsy or noninvasive measurement of liver disease.

Figure 3: Inter-individual variability in HCV outcomes. Schematic representation of the observed variability in natural history of HCV infection and in anti-HCV treatment response

Until very recently, the gold standard treatment for chronic HCV infection consisted of the combination of pegylated interferon and ribavirin (peg-IFN/RBV), prescribed for 12 to 18 months, with a global success rate (defined as sustained virological response, i.e. absence of HCV RNA in plasma samples collected 6 months after treatment interruption) of less than 50%. Both viral and host factors have long been known to strongly influence the response to peg-IFN/RBV. The viral genotype was long the strongest predictor of treatment success. Permanent viral eradication could be achieved in up to 80% of individuals infected with ‘favorable’ HCV genotypes (2 or 3), but only in approximately 40% of those infected with ‘unfavorable’ HCV genotypes (1 or 4). In addition, response rates were higher in females, in those with lower HCV RNA levels, and in Asians and Caucasians, compared with individuals of recent African ancestry [17]. The contribution of human genetic variation to treatment response, although suspected, could not be convincingly demonstrated before our discovery of the strong associations with IL28B and ITPA polymorphisms (see chapters 9-10).

Overview of the thesis

This thesis presents a series of GWAS that aimed at identifying human genetic determinants of HIV and HCV infections.

Chapters 2 to 4 describe three GWAS of HIV control in infected individuals. The very first GWAS performed for an infectious disease, presented in Chapter 2, marked a conceptual transition: whereas all previous genetic work involved testing candidate genes that had been implicated on the basis of functional work on immunology, the GWAS approach permitted a truly agnostic approach to gene discovery. We here confirmed the primary role of HLA class I in the control of HIV viremia, in populations of both European and African ancestry. The strongest determinant of viral load was HLA-B*57:01 in Europeans, and the closely related HLA-B*57:03 in African-Americans. In addition, we confirmed that heterozygosity for a 32 base pair deletion in CCR5 (CCR5Δ32) associates with better viral control and slower disease progression.

In Chapters 5 and 6, two GWAS of HIV susceptibility and acquisition are presented, in which HIV-infected individuals were compared to highly exposed, yet uninfected populations, or to large numbers of presumably healthy individuals from the general population. These studies did not identify any human genetic variant associated with HIV susceptibility, beyond the known protective effect of CCR5Δ32 homozygosity. Still, these negative results led to the important conclusion that common genetic polymorphisms with moderate to high effect size do not appreciably impact HIV acquisition.

The only GWAS that has been performed to date on human response to an experimental HIV vaccine is presented in Chapter 7. We searched for genetic determinants of the inter-individual variability observed in cellular immune responses to a T cell vaccine (the MRKAd5 HIV-1 gag/pol/nef vaccine, used in the Step trial). We showed that the same HLA alleles are implicated in vaccine-induced cellular immunity and in natural immune control, which is of relevance for vaccine design.

Chapter 8 describes an innovative “genome-to-genome” approach that we developed to better understand the impact of human genetic variation on chronic viral infections. Instead of focusing on clinically defined outcomes (resistance or hyper-susceptibility to infection) or on pathogen-related laboratory results (such as CD4+ T cell counts and viral load set point), as was done in previous studies, we used variation in the viral genome as an intermediate phenotype. This method identified the areas of the human genome that put pressure on HIV, and the regions of the virus that serve to escape human control.

Two GWAS performed in patients treated for chronic HCV infection are presented in Chapters 9 and 10. We showed that interferon-lambda (encoded by IL28B) is a key determinant of response to treatment with peg-interferon alpha and ribavirin, and that inosine triphosphatase deficiency is a major protective mechanism against drug-induced anemia.

Finally, Chapter 11 consists of a perspective on the future of host genetics in chronic viral diseases.

References

1. http://www.who.int/mediacentre/factsheets/fs360/en/
2. http://www.who.int/mediacentre/factsheets/fs164/en/
3. Hill AVS, Kaslow RA, McNicholl J (2008) Genetic susceptibility to infectious diseases. New York: Oxford University Press.
4. Quintana-Murci L, Alcais A, Abel L, Casanova JL (2007) Immunology in natura: clinical, epidemiological and evolutionary genetics of infectious diseases. Nat Immunol 8: 1165-1171.
5. Lander ES, Linton LM, Birren B, et al. (2001) Initial sequencing and analysis of the human genome. Nature 409: 860-921.
6. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, et al. (2001) The sequence of the human genome. Science 291: 1304-1351.
7. The International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437: 1299-1320.
8. The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: 851-861.
9. The International HapMap 3 Consortium (2010) Integrating common and rare genetic variation in diverse human populations. Nature 467: 52-58.
10. The 1000 Genomes Project Consortium (2010) A map of human genome variation from population-scale sequencing. Nature 467: 1061-1073.
11. http://www.ncbi.nlm.nih.gov/projects/SNP/snp_summary.cgi
12. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, et al. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9: 356-369.
13. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747-753.
14. O'Brien SJ, Nelson GW (2004) Human genes that limit AIDS. Nat Genet 36: 565-574.
15. Lama J, Planelles V (2007) Host factors influencing susceptibility to HIV infection and AIDS progression. Retrovirology 4: 52.
16. Thomas DL, Astemborski J, Rai RM, Anania FA, Schaeffer M, et al. (2000) The natural history of hepatitis C virus infection: host, viral, and environmental factors. JAMA 284: 450-456.
17. Kau A, Vermehren J, Sarrazin C (2008) Treatment predictors of a sustained virologic response in hepatitis B and C. J Hepatol 49: 634-651.

Chapter 2

A whole-genome association study of major determinants for host control of HIV-1

Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, Zhang K, Gumbs C, Castagna A, Cossarizza A, Cozzi-Lepri A, De Luca A, Easterbrook P, Francioli P, Mallal S, Martinez-Picado J, Miro JM, Obel N, Smith JP, Wyniger J, Descombes P, Antonarakis SE, Letvin NL, McMichael AJ, Haynes BF, Telenti A, Goldstein DB. A whole-genome association study of major determinants for host control of HIV-1. Science. 2007 Aug 17;317(5840):944-7


Jacques Fellay,1 Kevin V. Shianna,2* Dongliang Ge,1* Sara Colombo,3* Bruno Ledergerber,4* Mike Weale,1* Kunlin Zhang,3 Curtis Gumbs,1 Antonella Castagna,5 Andrea Cossarizza,6 Alessandro Cozzi-Lepri,7 Andrea De Luca,8 Philippa Easterbrook,9 Patrick Francioli,10 Simon Mallal,11 Javier Martinez-Picado,12 José M. Miro,13 Niels Obel,14 Jason P. Smith,2 Josiane Wyniger,3 Patrick Descombes,15 Stylianos E. Antonarakis,16 Norman L. Letvin,17 Andrew J. McMichael,18 Barton F. Haynes,19 Amalio Telenti,3† David B. Goldstein1†‡

Understanding why some people establish and maintain effective control of HIV-1 and others do not is a priority in the effort to develop new treatments for HIV/AIDS. Using a whole-genome association strategy, we identified polymorphisms that explain nearly 15% of the variation among individuals in viral load during the asymptomatic set-point period of infection. One of these is found within an endogenous retroviral element and is associated with major histocompatibility allele human leukocyte antigen (HLA) B*5701, whereas a second is located near the HLA-C gene. An additional analysis of the time to HIV disease progression implicated two genes, one of which encodes an RNA polymerase I subunit. These findings emphasize the importance of studying human genetic variation as a guide to combating infectious agents.

1 Center for Population Genomics and Pharmacogenetics, Duke Institute for Genome Sciences and Policy, Duke University, Durham, NC 27710, USA.
2 Duke Institute for Genome Sciences and Policy, Duke University, Durham, NC 27710, USA.
3 Institute of Microbiology, University Hospital Center; and University of [...], 1011 Lausanne, [...].
4 Division of Infectious Diseases, University Hospital, 8091 Zürich, Switzerland.
5 Clinic of Infectious Diseases, Istituto di Ricovero e Cura a Carattere Scientifico, San Raffaele Hospital, 20127 Milan, Italy.
6 Department of Biomedical Sciences, Section of General Pathology, University of Modena and Reggio Emilia, School of Medicine, 41100 Modena, Italy.
7 Department of Primary Care and Population Sciences, Royal Free and University College Medical School, University College London, London NW3 2PF, UK.
8 Institute of Clinical Infectious Diseases, Catholic University of the Sacred Heart, 00168 Rome, Italy.
9 Academic Department of HIV and Genitourinary Medicine, Kings College London, at Guy’s, King’s, and St. Thomas’ Hospitals, London SE5 9RJ, UK.
10 Service of Infectious Diseases, Department of Medicine and Service of Hospital Preventive Medicine, University Hospital Center, 1011 Lausanne, Switzerland.
11 Centre for Clinical Immunology and Biomedical Statistics, Royal Perth Hospital; and Murdoch University, Perth, WA 6000, Australia.
12 irsiCaixa Foundation and Hospital Germans Trias i Pujol, 08916 Badalona, Spain; and Institució Catalana de Recerca i Estudis Avançats, Barcelona, Spain.
13 Hospital Clinic–Institut d’Investigacions Biomèdiques August Pi i Sunyer, University of Barcelona, 08036 Barcelona, Spain.
14 Department of Infectious Diseases, Copenhagen University Hospital, Rigshospitalet, 2100 Copenhagen, Denmark.
15 Genomics Platform, National Centre of Competence in Research “Frontiers in Genetics,” University of Geneva, 1211 Geneva, Switzerland.
16 Department of Genetic Medicine and Development, University of Geneva Medical School, 1211 Geneva, Switzerland.
17 Division of Viral Pathogenesis, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA.
18 Medical Research Council Human Immunology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford OX3 9DS, UK.
19 Duke Human Vaccine Institute, Duke University, Durham, NC 27710, USA.
*These authors contributed equally to this work.
†To whom correspondence should be addressed. E-mail: [email protected] (A.T.); [email protected] (D.B.G.)
‡On behalf of the Center for HIV/AIDS Vaccine Immunology (CHAVI) and the Euro-CHAVI consortia.

Humans show remarkable variation in vulnerability to infection by HIV-1 and especially in the clinical outcome after infection. One striking and largely unexplained difference is the level of circulating virus in the plasma during the nonsymptomatic phase preceding the progression to AIDS. This is known as the viral set point and can vary among individuals by as much as 4 to 5 logs (1–6). We aimed to identify human genetic differences that influence this variation.

To define a homogeneous phenotype for genetic analyses, a consortium of nine cohorts was formed [termed Euro-CHAVI (Center for HIV/AIDS Vaccine Immunology) (7)], and a total of 30,000 patients were screened to identify those most appropriate for analysis. All longitudinal viral-load (VL) data were assessed through a computerized algorithm to eliminate VL not reflecting the steady state and were individually inspected by an experienced infectious-disease clinician (Fellay) to exclude suspicious VL data and patients that do not show a clear set point, leaving 486 patients with a consistent and accurately measured phenotype (7). For patients with at least four CD4 cell-count results, we defined a progression phenotype as the time to treatment initiation or to the predicted or observed drop of the CD4 cell count below 350 (7, 8).

All samples were genotyped with the use of Illumina’s HumanHap550 BeadChip with 555,352 single-nucleotide polymorphisms (SNPs). A series of quality-control steps resulted in the elimination of 20,251 polymorphisms (7). We applied methods to identify deletions and targeted copy-number variations and to assess whether they influenced the phenotype (7). Our core association analyses focused on single-marker genotype-trend tests of the quality control–passed SNPs, using linear regression (7). To control for the possibility of spurious associations resulting from population stratification, we used a modified EIGENSTRAT method (7, 9). We assessed significance with a Bonferroni correction (P cutoff = 9.3 × 10^-8). Analyses incorporating human leukocyte antigen (HLA) typing were carried out on a subgroup of 187 patients with available four-digit HLA class I allelic determination.

These analyses identified two independently acting groups of polymorphisms, associated with HLA loci B and C, that are estimated to explain 9.6 and 6.5% of the total variation in HIV-1 set point, respectively, and can thus be considered as major genetic determinants of viral set point. A third set located >1 Mb away in the major histocompatibility complex upstream of a gene that encodes an RNA polymerase I subunit explains 5.8% of the total variation in disease progression. Together, the three polymorphisms explain 14.1% of the variation in HIV-1 set point.

One polymorphism located in the HLA complex P5 (HCP5) gene explains 9.6% of the total variation in set point, despite a minor-allele frequency of 0.05 (Single Nucleotide Polymorphism database number rs2395029, P = 9.36 × 10^-12). A single copy of the controlling allele was found to result in a reduction in VL of >1 log (Fig. 1); at P = 9.36 × 10^-12, this genome-wide association is significant. The HCP5 gene is located 100 kb centromeric from HLA-B on chromosome 6 (Fig. 2), and the associated variant is known to be in high linkage disequilibrium (LD) with the HLA allele B*5701 (10) (r^2 = 1 in our data set). This allele itself has the strongest-described protective impact on HIV-1 disease progression (11) and has been associated with low VL (12).

Given the strong functional data supporting a role for HLA-B*5701 in restricting HIV-1, our first hypothesis is that the association observed here is due to the effect of HLA-B*5701, reflected in its tagging a SNP within HCP5 (10). We emphasize, however, that genetics allows no resolution of whether this effect is exclusively due to B*5701 or if HCP5 variation also contributes to the control. In fact, as a human endogenous retroviral element (HERV) with sequence homology to retroviral pol genes (13) and confirmed expression in lymphocytes (14), HCP5 is itself a good candidate to interact with HIV-1, possibly through an antisense mechanism (14). Moreover, HCP5 is predicted to encode two proteins, and the associated polymorphism results in an amino acid substitution in one of these proteins.

A model in which HCP5 and HLA-B*5701 have a combined haplotypic effect on the HIV-1 set point is consistent with the observation that suppression of viremia can be maintained in B*5701 patients with undetectable VL, even after HIV-1 undergoes mutations that allow escape from cytotoxic T lymphocyte (CTL)–mediated restriction (15). However, this observation has also been explained by a decrease in viral fitness associated with the escape variants (16). In addition, B*5701 patients present less frequently with symptoms during acute HIV-1 infection (12), suggesting control before the time of a maximal CTL response (17).

The second most significant polymorphism we identified, rs9264942, is located in the 5′ region of the HLA-C gene, 35 kb away from transcription initiation (Fig. 2) and 156 kb telomeric of the HCP5 gene. This SNP explains 6.5% of the variation in set point (Fig. 1) and shows a genome-wide significant association (P = 3.77 × 10^-9). Despite minor LD between the HCP5 and HLA-C SNPs (r^2 = 0.05, D′ = 0.84), nested regression models clearly demonstrate an independent effect of each of these variants. In a model including the HCP5 variant, the addition of the HLA-C variant results in a highly significant increase in explanatory power, as does the addition of the HCP5 variant to a model already including the HLA-C variant [supporting online material (SOM) text].

This SNP also associates strongly with differences in HLA-C expression levels, both in the Sanger Institute Genevar expression database (18) (table S1) and in a replication group of 48 healthy volunteers established for this study (SOM text). The protective allele leads to a lower VL and is associated with higher expression of the HLA-C gene. This strong and independent association with HLA-C expression levels suggests that genetic control of expression levels of a classical HLA gene influences viral control. Other HLA-C 5′ variants also associate with HLA-C expression but do not contribute independently to viral control (SOM text).

Although these data make a strong case for a causal role for HLA-C expression levels, extensive LD throughout the MHC region makes it necessary to directly test for alternative causal variants. Specifically, we used nested regression models to assess whether the observed association could be determined by described functional HLA class I alleles. In fact, the HLA-C expression SNP shows association with certain alleles or group of alleles (Tables 1 and 2). In each case, however, although the HLA-C expression variant can explain the effect of these alleles on the HIV-1 set point, the reverse is not true. When a linear regression model includes known HLA alleles, the addition of rs9264942 results in a significant increase in the explained variation. On the other hand, none of the HLA alleles considered, with the exception of HCP5/B*5701, adds significantly to a model that already incorporates the HLA-C variant (Tables 1 and 2).

No other single marker reached genome significance, and none of the identified copy-number variations (7) showed any association with the HIV-1 set point. An analysis comparing the observed set of P values to that expected under the null hypotheses shows no overall inflation of P values (indicating little contribution from population stratification) but does show an excess of low P values, beginning with the 355th most associated SNP (fig. S1). This indicates that additional real effects are likely to be present among the most associated polymorphisms in this study (complete list available in table S2). Potentially interesting candidates with a lesser association with the set point (listed in tables S3 to S5) were chosen on the basis of their ranking in the study or their link with HIV-1 biology.

We next identified polymorphisms that associate with progression rather than VL: The strongest association included a set of seven polymorphisms located in and near the ring finger protein 39 (RNF39) and zinc ribbon domain–containing 1 (ZNRD1) genes, respectively (rs9261174, rs3869068, rs2074480,

Table 1. The impact of HLA-C 5′ expression polymorphism rs9264942 on the set point is independent of its association with HLA alleles and groups of alleles previously implicated in HIV-1 control. The addition of rs9264942 to the linear regression model significantly improves the fit for all HLA alleles or groups of alleles that have been suspected to have an influence on HIV disease. N.A., not applicable.

| Allele | LD between rs9264942 and HLA alleles (r²) | Models with HLA alleles (P value) | Addition of rs9264942 to models with HLA alleles (P value) | Addition of HLA alleles to a model with rs9264942 (P value) |
| HLA-B*27 | 0.07 | 0.19 | 1.3 × 10⁻⁵ | 0.91 |
| HLA-B*5701 | 0.05 | 2.6 × 10⁻⁵ | 8.5 × 10⁻⁵ | 4.1 × 10⁻⁴ |
| HLA-B*35Px | 0.04 | 0.18 | 8.1 × 10⁻⁶ | 0.73 |
| HLA-B*08 | 0.09 | 0.042 | 3.6 × 10⁻⁵ | 0.41 |
| HLA-B*51 | <0.01 | 0.44 | 5.3 × 10⁻⁶ | 0.36 |
| All 5 above | N.A. | N.A. | 3.1 × 10⁻⁴ | 4.4 × 10⁻⁴ |
| B22 serogroup | 0.01 | 0.28 | 8.1 × 10⁻⁶ | 0.59 |
| B7 supertype | 0.17 | 0.007 | 1.7 × 10⁻⁴ | 0.35 |
| Bw4 serotype | 0.23 | 0.010 | 1.7 × 10⁻⁴ | 0.65 |
| Bw6 serotype | 0.16 | 0.033 | 6.3 × 10⁻⁵ | 0.76 |
| HLA-A*23 | <0.01 | 0.41 | 6.9 × 10⁻⁵ | 0.38 |
| A2 supertype | <0.01 | 0.72 | 6.4 × 10⁻⁵ | 0.53 |
| HLA-Cw*4 | 0.10 | 0.041 | 1.3 × 10⁻⁴ | 0.56 |
| HLA-Cw*7 | 0.25 | 0.065 | 1.0 × 10⁻⁴ | 0.83 |

Fig. 1. HIV-1 VL at the set point is highly correlated with (A) the HCP5 rs2395029 genotype, where T is the major allele and G is the minor allele, and with (B) the HLA-C 5′ region rs9264942 genotype, where T is the major allele and C is the minor allele. Mean and SEM (error bars) are represented for the respective genotypes. [Figure not reproduced. Vertical axis: HIV-1 VL (log₁₀ cp/ml). Panel A groups: TT, N=439; GT, N=45; GG, N=2. Panel B groups: TT, N=166; TC, N=238; CC, N=82.]
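The nested-model comparisons summarized in Table 1 can be illustrated with a minimal sketch. It is not the authors' analysis code: the data are simulated, the two variants and their effect sizes are hypothetical, and the helper names `ols_rss` and `nested_f` are introduced here for illustration only. The sketch fits a reduced and a full linear model by ordinary least squares and reports the F statistic and the gain in R² obtained by adding the second variant.

```python
import random

def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit of y on the columns of X."""
    n, k = len(X), len(X[0])
    # Build the augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2
               for r in range(n))

def nested_f(X_reduced, X_full, y):
    """F statistic and gain in R^2 when moving from the reduced to the full model."""
    n = len(y)
    rss0, rss1 = ols_rss(X_reduced, y), ols_rss(X_full, y)
    df_extra = len(X_full[0]) - len(X_reduced[0])
    df_resid = n - len(X_full[0])
    mean_y = sum(y) / n
    tss = sum((v - mean_y) ** 2 for v in y)
    f_stat = ((rss0 - rss1) / df_extra) / (rss1 / df_resid)
    return f_stat, (rss0 - rss1) / tss

# Simulated (not real) data: two hypothetical variants coded as minor-allele counts.
random.seed(1)
n = 400
snp_a = [sum(random.random() < 0.05 for _ in range(2)) for _ in range(n)]
snp_b = [sum(random.random() < 0.40 for _ in range(2)) for _ in range(n)]
vl = [4.5 - 0.7 * a - 0.3 * b + random.gauss(0, 0.5)
      for a, b in zip(snp_a, snp_b)]

reduced = [[1.0, a] for a in snp_a]                  # intercept + first variant
full = [[1.0, a, b] for a, b in zip(snp_a, snp_b)]   # ... + second variant
f_stat, delta_r2 = nested_f(reduced, full, vl)
print(f"F = {f_stat:.1f}, gain in R^2 = {delta_r2:.3f}")
```

Swapping the roles of the two variants gives the reverse comparison, which is how an "addition of A to a model with B" column and an "addition of B to a model with A" column can both be filled. Converting the F statistic to a P value requires the F distribution, which is left out of this sketch.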


rs7758512, rs9261129, rs2301753, and rs2074479). This group of polymorphisms explains 5.8% of the variation in progression, with a relative hazard of 0.64 (fig. S2), and approaches genome-wide significance (P = 3.89 × 10⁻⁷). It also associates with VL at the set point (P = 7.11 × 10⁻³). These variants are >1 Mb telomeric from the previous candidates (fig. S3), and their effect on both progression and set point is independent of HCP5- and HLA-C–related polymorphisms and HLA alleles or groups of alleles previously implicated in HIV-1 control (SOM text and table S6).

Using the Genevar database and our group of 48 healthy volunteers, we observed that ZNRD1 expression is significantly associated with the identified SNPs (SOM text and table S1). Two of them (rs3869068 and rs9261174) are located in a putative regulatory 5′ region, 25 and 32 kb away from the gene, respectively. Because ZNRD1 encodes an RNA polymerase I subunit, if this gene is responsible for the restriction of HIV-1, the mechanism could involve interference with the processing of HIV-1 transcripts by the HIV-1 regulatory protein Rev. Rev is known to be localized in the nucleolus (where RNA polymerase I transcribes ribosomal RNA), and the blockade of RNA polymerase I has been shown to influence the distribution of Rev, causing a shift from the nucleolus to the cytoplasm (19, 20). Efficiency in provirus transcription is highly variable among individuals; in one study, differences in transcription efficiency accounted for 64 to 83% of the total variance in virus production that was attributable to post-entry cellular factors (21).

The second gene, RNF39, is poorly characterized but cannot be ruled out as a candidate because two of the associated polymorphisms are located in its coding region and result in amino acid changes (rs2301753 and rs2074479). No other genome-wide significant association was observed in the analysis of the progression phenotype (SOM text).

We established an independent replication cohort of 140 Caucasian patients, drawn from the same participating cohorts. For this follow-up study, we relaxed the interval from a documented negative test to a positive test (for infection) from 2 years to 4 years to identify additional qualifying study participants. We genotyped representative polymorphisms

Fig. 2. Partial map of the HLA class I region (chromosome 6 p21.3). The P values [–log(P)] of all genotyped SNPs annotated with the gene structure are indicated. The two independent SNPs that show genome-wide significant association with HIV-1 VL at the set point are marked in red. The graph was drawn with WGAViewer software (www.genome.duke.edu/centers/pg2/index_html/downloads/AnnotationSoftware).

for the associations reported above (HCP5, rs2395029; HLA-C, rs9264942; and ZNRD1, rs9261174). Each association was replicated with effects all in the same direction: HCP5, P = 1.4 × 10⁻²; HLA-C, P = 2.8 × 10⁻³; and ZNRD1, P = 4.8 × 10⁻².

We have securely identified at least two mechanisms not previously known to restrict HIV-1: HLA-C, which has been suspected but never confirmed to contribute to HIV-1 control, and an RNA polymerase subunit that substantially changes the time course of HIV progression (fig. S1). We also suggest the possibility that a HERV-derived gene may contribute to the viral control attributed to the HLA-B*5701 allele. Our findings confirm and emphasize the central role of the MHC region in HIV-1 restriction, estimate its contribution against all genome influences, and open up new perspectives in the understanding of its mode of action: It is necessary to expand HLA analysis to include high-density genotyping. It is also noteworthy that this genome-wide study of host determinants has three clear discoveries, implying that determinants of host response may often include gene variants with major effects. This suggests a degree of urgency in carrying out similar studies for other infectious diseases.

Our results suggest two possible directions for therapeutic intervention. First, if HCP5 and ZNRD1 contribute to the control associated with HLA-B*5701, they could lead to therapeutic applications. On the other hand, the implication of HLA-C in HIV-1 control could present important opportunities, given that the HIV-1 accessory protein Nef selectively down-regulates the expression of HLA-A and -B but not that of HLA-C on the surface of infected cells (22). Originally, this strategy was considered advantageous for the virus because HLA-A and -B present foreign (notably viral) epitopes to CD8 T cells, resulting in cell destruction, whereas HLA-C binds self-peptides and interacts with natural killer (NK) cells to avoid NK attack. However, HLA-C also has the ability to present viral peptides to cytotoxic CD8+ T cells and consequently to restrict HIV-1 (23, 24). Our observations suggest that HLA-C–mediated restriction may be an important element of viral control in specific genetic backgrounds, and that the apparent immunity of HLA-C to Nef down-regulation could present an opportunity for vaccine strategies targeting HLA-C–mediated responses.

References and Notes
1. A. Telenti, D. B. Goldstein, Nat. Rev. Microbiol. 4, 865 (2006).
2. S. J. O'Brien, G. W. Nelson, Nat. Genet. 36, 565 (2004).
3. A. Telenti, G. Bleiber, Future Virol. 1, 55 (2006).
4. M. Carrington, S. J. O'Brien, Annu. Rev. Med. 54, 535 (2003).
5. G. W. Nelson, S. J. O'Brien, J. Acquired Immune Defic. Syndr. 42, 347 (2006).
6. G. Bleiber et al., J. Virol. 79, 12674 (2005).
7. Materials and methods are available as supporting material on Science Online.
8. D. C. Douek, L. J. Picker, R. A. Koup, Annu. Rev. Immunol. 21, 265 (2003).
9. A. L. Price et al., Nat. Genet. 38, 904 (2006).
10. P. I. de Bakker et al., Nat. Genet. 38, 1166 (2006).
11. S. A. Migueles et al., Proc. Natl. Acad. Sci. U.S.A. 97, 2709 (2000).
12. M. Altfeld et al., AIDS 17, 2581 (2003).
13. J. K. Kulski, R. L. Dawkins, Immunogenetics 49, 404 (1999).
14. C. Vernet et al., Immunogenetics 38, 47 (1993).
15. J. R. Bailey et al., J. Exp. Med. 203, 1357 (2006).
16. J. Martinez-Picado et al., J. Virol. 80, 3617 (2006).
17. A. J. McMichael, S. L. Rowland-Jones, Nature 410, 980 (2001).
18. B. E. Stranger et al., PLoS Genet. 1, e78 (2005).
19. A. Michienzi et al., Proc. Natl. Acad. Sci. U.S.A. 97, 8955 (2000).
20. M. Dundr et al., J. Cell Sci. 108, 2811 (1995).
21. A. Ciuffi et al., J. Virol. 78, 10747 (2004).
22. G. B. Cohen et al., Immunity 10, 661 (1999).
23. P. J. Goulder et al., AIDS 11, 1884 (1997).
24. S. Adnan et al., Blood 108, 3414 (2006).
25. We thank R. Sahli (University of Lausanne, Switzerland). Funding was provided by the NIH-funded Center for HIV/AIDS Vaccine Immunology. J.F. is supported by the Swiss Foundation for Grants in Biology and Medicine, and A.T. is supported by Infectigen and the Swiss National Science Foundation.

Supporting Online Material
www.sciencemag.org/cgi/content/full/1143767/DC1
Materials and Methods
SOM Text
Figs. S1 to S3
Tables S1 to S6
References

13 April 2007; accepted 2 July 2007
Published online 19 July 2007; 10.1126/science.1143767
Include this information when citing this paper.

Table 2. In contrast to Table 1, only HLA-B*5701 has an independent impact after taking into account the effect of rs9264942. The independence of HLA-C is also clearly seen in the mean values of the HIV-1 set point for each rs9264942 genotype: The minor allele C is associated with a decrease in VL, independent of all considered alleles and groups of alleles. Numbers refer to a subgroup of 187 patients with available four-digit HLA class I allelic results.

| Patient group | rs9264942 genotype | Number of patients | Mean | SD |
| All patients | TT | 67 | 4.37 | 0.85 |
| All patients | TC | 87 | 3.84 | 1.19 |
| All patients | CC | 33 | 3.24 | 1.28 |
| Patients without HLA-B*27 | TT | 66 | 4.39 | 0.83 |
| Patients without HLA-B*27 | TC | 78 | 3.86 | 1.21 |
| Patients without HLA-B*27 | CC | 25 | 3.13 | 1.23 |
| Patients without HLA-B*5701 | TT | 67 | 4.37 | 0.85 |
| Patients without HLA-B*5701 | TC | 76 | 3.97 | 1.08 |
| Patients without HLA-B*5701 | CC | 27 | 3.43 | 1.25 |
| Patients without HLA-B*35Px | TT | 56 | 4.32 | 0.88 |
| Patients without HLA-B*35Px | TC | 84 | 3.86 | 1.17 |
| Patients without HLA-B*35Px | CC | 32 | 3.23 | 1.30 |
| Patients without HLA-B*08 | TT | 50 | 4.26 | 0.87 |
| Patients without HLA-B*08 | TC | 81 | 3.83 | 1.22 |
| Patients without HLA-B*08 | CC | 33 | 3.24 | 1.28 |
| Patients without HLA-B*51 | TT | 56 | 4.38 | 0.85 |
| Patients without HLA-B*51 | TC | 73 | 3.77 | 1.17 |
| Patients without HLA-B*51 | CC | 28 | 3.21 | 1.38 |
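As a worked illustration of the trend visible in the "All patients" rows of Table 2, the sketch below estimates the average change in set point per copy of the rs9264942 C allele from the three genotype group sizes and means. This is a descriptive calculation on the published group summaries, assuming an additive coding (0, 1, 2 copies of C); it is not the covariate-adjusted regression reported in the paper, and the variable names are illustrative.

```python
# "All patients" rows of Table 2:
# (copies of the C allele, number of patients, mean set point in log10 cp/ml)
groups = [(0, 67, 4.37), (1, 87, 3.84), (2, 33, 3.24)]

n = sum(w for _, w, _ in groups)
mean_x = sum(w * x for x, w, _ in groups) / n
mean_y = sum(w * y for _, w, y in groups) / n

# Least-squares slope of set point on allele count; group means weighted by
# group size give the same slope as a fit to the individual-level data.
sxx = sum(w * (x - mean_x) ** 2 for x, w, _ in groups)
sxy = sum(w * (x - mean_x) * (y - mean_y) for x, w, y in groups)
slope = sxy / sxx

print(f"n = {n}, per-allele change = {slope:.2f} log10 cp/ml")
# prints: n = 187, per-allele change = -0.56 log10 cp/ml
```

The same three-line input can be replaced by any other block of Table 2 (for example, patients without HLA-B*5701) to see how little the per-allele trend moves when carriers of a given HLA allele are excluded.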


Chapter 3

Common genetic variation and the control of HIV-1 in humans

Fellay J*, Ge D*, Shianna KV, Colombo S, Ledergerber B, Cirulli ET, Urban TJ, Zhang K, Gumbs CE, Smith JP, Castagna A, Cozzi-Lepri A, De Luca A, Easterbrook P, Günthard HF, Mallal S, Mussini C, Dalmau J, Martinez-Picado J, Miro JM, Obel N, Wolinsky SM, Martinson JJ, Detels R, Margolick JB, Jacobson LP, Descombes P, Antonarakis SE, Beckmann JS, O'Brien SJ, Letvin NL, McMichael AJ, Haynes BF, Carrington M, Feng S, Telenti A, Goldstein DB; NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI). Common genetic variation and the control of HIV-1 in humans. PLoS Genet. 2009 Dec;5(12):e1000791

Common Genetic Variation and the Control of HIV-1 in Humans

Jacques Fellay1., Dongliang Ge1., Kevin V. Shianna1,2, Sara Colombo3, Bruno Ledergerber4, Elizabeth T. Cirulli1, Thomas J. Urban1, Kunlin Zhang3,5, Curtis E. Gumbs1, Jason P. Smith2, Antonella Castagna6, Alessandro Cozzi-Lepri7, Andrea De Luca8, Philippa Easterbrook9, Huldrych F. Günthard4, Simon Mallal10, Cristina Mussini11, Judith Dalmau12, Javier Martinez-Picado12,13, José M. Miro14, Niels Obel15, Steven M. Wolinsky16, Jeremy J. Martinson17, Roger Detels18, Joseph B. Margolick19, Lisa P. Jacobson20, Patrick Descombes21, Stylianos E. Antonarakis22, Jacques S. Beckmann23, Stephen J. O’Brien24, Norman L. Letvin25, Andrew J. McMichael26, Barton F. Haynes27, Mary Carrington28,29, Sheng Feng1, Amalio Telenti3*, David B. Goldstein1*, NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI)

1 Center for Human Genome Variation, Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America
2 Genomic Analysis Facility, Duke Institute for Genome Sciences and Policy, Duke University, Durham, North Carolina, United States of America
3 Institute of Microbiology, University of Lausanne, Lausanne, Switzerland
4 Division of Infectious Diseases and Hospital Epidemiology, University Hospital, University of Zürich, Zürich, Switzerland
5 Behavioral Genetics Center, Institute of Psychology, Chinese Academy of Sciences, Beijing, China
6 Clinic of Infectious Diseases, Vita-Salute San Raffaele University and Diagnostica and Ricerca San Raffaele, Milan, Italy
7 Research Department of Infection and Population Health, University College London, London, United Kingdom
8 Institute of Clinical Infectious Diseases, Catholic University of the Sacred Heart, Rome, Italy
9 Academic Department of HIV/GUM, Kings College London at Guy’s, King’s, and St. Thomas’ Hospitals, Weston Education Centre, London, United Kingdom
10 Centre for Clinical Immunology and Biomedical Statistics, Institute for Immunology and Infectious Diseases, Royal Perth Hospital and Murdoch University, Perth, Australia
11 Infectious Diseases Clinics, Azienda Ospedaliero-Universitaria, Modena, Italy
12 IrsiCaixa Foundation and Hospital Germans Trias i Pujol, Badalona, Spain
13 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
14 Hospital Clinic – IDIBAPS, University of Barcelona, Barcelona, Spain
15 Department of Infectious Diseases, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
16 Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
17 Infectious Diseases and Microbiology, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
18 Department of Medicine, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, California, United States of America
19 Department of Molecular Microbiology and Immunology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
20 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, United States of America
21 Genomics Platform, National Centre of Competence in Research “Frontiers in Genetics,” University of Geneva, Geneva, Switzerland
22 Department of Genetic Medicine and Development, University of Geneva Medical School, Geneva, Switzerland
23 Department of Medical Genetics, University of Lausanne, and Service of Medical Genetics, Centre Hospitalier Universitaire Vaudois, Lausanne, Switzerland
24 Laboratory of Genomic Diversity, National Cancer Institute, Frederick, Maryland, United States of America
25 Division of Viral Pathogenesis, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, United States of America
26 MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, United Kingdom
27 Duke Human Vaccine Institute, Duke University, Durham, North Carolina, United States of America
28 Cancer and Inflammation Program, Laboratory of Experimental Immunology, SAIC-Frederick, Inc., National Cancer Institute at Frederick, Frederick, Maryland, United States of America
29 Ragon Institute of Massachusetts General Hospital, MIT and Harvard, Boston, Massachusetts, United States of America

Abstract To extend the understanding of host genetic determinants of HIV-1 control, we performed a genome-wide association study in a cohort of 2,554 infected Caucasian subjects. The study was powered to detect common genetic variants explaining down to 1.3% of the variability in viral load at set point. We provide overwhelming confirmation of three associations previously reported in a genome-wide study and show further independent effects of both common and rare variants in the Major Histocompatibility Complex region (MHC). We also examined the polymorphisms reported in previous candidate gene studies and fail to support a role for any variant outside of the MHC or the chemokine receptor cluster on chromosome 3. In addition, we evaluated functional variants, copy-number polymorphisms, epistatic interactions, and biological pathways. This study thus represents a comprehensive assessment of common human genetic variation in HIV-1 control in Caucasians.

Citation: Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, et al. (2009) Common Genetic Variation and the Control of HIV-1 in Humans. PLoS Genet 5(12): e1000791. doi:10.1371/journal.pgen.1000791

Editor: Mark I. McCarthy, University of Oxford, United Kingdom

Received June 15, 2009; Accepted November 25, 2009; Published December 24, 2009

This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.

Funding: Funding was provided by the NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI) grant AI067854. This project has also been funded in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E, and by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. AT and SEA are supported by the Swiss National Science Foundation and by the Infectigen Foundation. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected] (DBG); [email protected] (AT)

. These authors contributed equally to this work.

PLoS Genetics | www.plosgenetics.org 1 December 2009 | Volume 5 | Issue 12 | e1000791 Human Genetic Variation and HIV Control

Author Summary

The ability to spontaneously control HIV-1 upon infection is highly variable between individuals. To evaluate the contribution of variation in human genes to differences in plasma viral load and in disease progression rates, we performed a genome-wide association study in >2,500 HIV-infected individuals. This study achieved two goals: it completed the analysis of common variation influencing viral control, and it re-assessed the majority of previously reported genetic associations. We show that genetic variants located near the HLA-B and HLA-C genes are the strongest determinants of viral control, and that other independent associations exist in the same region of chromosome 6, the Major Histocompatibility Complex, known to contain a large number of genes involved in immune defense. We could not replicate most of the previously published associations with HIV candidate genes in this large, well-characterized cohort. Overall, common human genetic variation, together with demographic variables, explains up to 22% of the variability in viral load in the Caucasian population.

Introduction

The clinical outcome of HIV-1 infection is highly variable and determined by complex interactions between virus, host and environment. Several human genetic factors have been reported to modulate HIV-1 disease [1,2], but current knowledge only explains a small fraction of the observed variability in the course of infection. Our first genome-wide association study (GWAS) of human genetic variants that associate with HIV-1 control analyzed 486 individuals of European ancestry and identified two genome-wide significant determinants of viremia at set point, and one determinant of disease progression [3]. The 3 single nucleotide polymorphisms (SNPs) collectively explained 14% of the variation in viral load at set point and 10% of the variation in disease progression. All 3 were located in the Major Histocompatibility Complex (MHC) region on chromosome 6, confirming, in a genome-wide context, the essential role played by the MHC region in HIV-1 control [4,5,6]. Since then, the findings have been replicated by several independent groups that used either targeted genotyping [7,8,9] or whole genome approaches [10,11].

The power of our initial study was limited to the detection of common variants (with minor allele frequency of 5% or more) which explain a sizable fraction of the phenotypic variability: it had an 80% power of identifying SNPs that explain at least 5% of the variability. Here we have increased the sample size to 2554 which allows us to provide a much more thorough investigation of the role of common variation in the control of HIV-1: considering the final number of participants included in the analysis, the study was powered to detect the effects of common variants down to 1.3% of explained variability.

One surprising feature of the first genome-wide investigation [3] was that it failed to identify any of the many non-MHC candidate gene variants that have been reported to associate with HIV-1 disease outcomes over the past 15 years. Candidate gene studies claimed associations for variants in genes selected for their known or suspected role in HIV-1 pathogenesis and in immune response [1,2] [compiled in http://www.hiv-pharmacogenomics.org]. Collectively, this body of work has implicated various components of innate, adaptive, and intrinsic immunity in HIV-1 control, cellular co-factors important in viral life cycle, and quite unexpected candidates such as the vitamin D receptor. We here use our large sample size and the power of genome-wide data, which allows precise correction for population stratification, to evaluate most candidate gene discoveries.

Genome-wide analysis also makes it possible to assess genetic interactions, copy number polymorphisms, enrichment of gene sets and of functional variants: these analyses were pursued in the present work. Collectively, our study circumscribes the impact of common variation in the control of HIV-1 in an adult and predominantly male Caucasian population.

Results

Subjects

A total of 2554 HIV-1 infected individuals of self-reported Caucasian ancestry were genotyped on Illumina whole-genome chips (Table S1). Participants were recruited between 1984 and 2007 in one of the 9 cohorts forming the Euro-CHAVI Consortium (N = 1397, including 486 subjects that were included in our previous study [3], 75.7% male, median age: 33 years) or in the MACS cohort (N = 1157, 100% male, median age: 33 years). A subset of 1113 patients had a proven date of seroconversion (“seroconverters”) and the remainder had a confirmed stable viremia profile but no known date of infection (“seroprevalent” patients). Several quality control steps (Text S1) resulted in the exclusion of 115 individuals with insufficient genotype call rates, of 17 individuals that were found to be genetically related with another study subject, and of 5 individuals with a gender discrepancy between phenotype and genotype data. To control for population stratification, we performed a principal component analysis of the whole-genome genotyping data (Eigenstrat [12], Text S1), which identified 12 significant axes after exclusion of 55 outlier subjects. The most significant principal component is a north-to-south European axis that has already been described [13] (Figure S1). A total of 2362 individuals were included in the set point association analyses (including 486 subjects studied before [3]), and 1071 seroconverters were eligible for the analysis of disease progression (including 337 subjects studied before [3]). A subset of 1204 subjects had complete 4-digit HLA Class I results and could be included in models assessing the respective influences of SNPs and HLA alleles on HIV-1 control.

Common variants and variation in viral load at set point

All QC-passed SNPs (Text S1, Table S11) were tested for association with HIV-1 viremia at set point in separate linear regression models that included gender, age, and the 12 significant PC axes as covariates. The global distribution of resulting p-values was very close to the null expectation (λ = 1.006, Figure S2) indicating that stratification was adequately controlled [14]. Male gender and older age both associated with higher set point (p = 1.9E–21 and p = 2.6E–05, respectively) and explained 4% of the inter-individual variability. The population sub-structure, reflected in the 12 significant PC axes, explained an additional 3.4%.

The 2 SNPs previously reported as genome-wide significant [3] were confirmed to be the strongest determinants of variation in HIV-1 viral load (Figure 1 and Figure 2): rs2395029, an HCP5 T>G variant, which is known to be an almost perfect proxy for HLA-B*5701 in Caucasians [15,16] (p = 4.5E–35), and rs9264942, a T>C SNP located in the 5′ region of HLA-C, 35 kb away from transcription initiation (p = 5.9E–32). Of note, the association signals were also convincingly genome-wide significant in an analysis restricted to the subjects that were not included in our initial study (Table 1). Each SNP explained 5 to 6% of the variability in set point viremia, as determined by the increase in


the r² value of the respective linear regression models: the effect size estimates are smaller in the expanded data set than in the previously reported study. This is due, at least in part, to the inclusion of seroprevalent subjects with less stringent phenotype definition. Interestingly, we observed a break in the strong linkage disequilibrium (LD) between HCP5 and HLA-B in 9/1204 subjects with HLA Class I results (0.7%, r² = 0.93): the set point values were lower for the 4 patients that had B*5701 without the rs2395029 minor allele than for the 5 patients with a G at rs2395029 but without B*5701 (median [IQR] log10 HIV-1 cp/ml = 3.14 [2.77–3.68] vs. 4.22 [4.14–4.28] respectively (Table S2); p = 0.05, Kruskal-Wallis rank test). Consequently, the association with set point was stronger for B*5701 (p = 3.8E–19) than it was for rs2395029 (p = 1.7E–17) in the same subset of patients.

A major challenge in assessing association evidence in the MHC is the long range pattern of LD, complicating the definitive identification of causal variants and making it necessary to consider evidence for association in the region as a whole. For this reason, we evaluated the evidence for independent effects of the two reported associations. Since the HCP5 and the HLA-C variants are in partial LD (r² = 0.06, D′ = 0.86), the combined strength of their associations with set point is less than the sum of the signals measured separately. It is also interesting to note that in 95% of cases the rs2395029 minor allele, tagging B*5701, is found in combination with the controlling C allele at HLA-C rs9264942. This means that the protective effect in this haplotype is a combination of the effect of both alleles and that analyses that are not adjusted for the HLA-C variant overestimate the B*5701-related effect. Nonetheless, nested regression models clearly demonstrated that each of the variants is independently genome-wide significant (p = 1.8E–23 for rs2395029; p = 2.4E–20 for rs9264942). We therefore can conclude unequivocally that the two SNPs represent independent effects: they together explain 9% of the variability in HIV-1 set point.

Further independent associations in the MHC region

We next addressed the question of whether there are any additional, independent and significant SNP effects in the MHC region. In addition to the top 2 associated variants, 86 other SNPs met the criteria of genome-wide significance (p<5E–08), all located in the MHC (Figure 1, Table S9).

We included a total of 331 MHC SNPs with p<1E–04 in a conservative forward selection algorithm within a linear regression model. SNPs were tested and selected into the model one at a time, independently of the top 2 associated variants and of the other covariates (gender, age and Eigenstrat axes). In order to control for multiple testing, a permutation procedure was performed to assess the empirical significance cut-off value. Four additional SNPs were found to significantly associate with set point in models including the previously associated variants (Table S3): [1] rs259919, located in an intron of the uncharacterized C6orf12 gene, 3.5 kb away from the ZNRD1 gene in the 5′ region; [2] rs9468692, located in the 3′ region of the TRIM10 gene, in high LD (r² = 0.87) with a non-synonymous coding SNP in the first exon of TRIM10 - rs12212092: 279A>G; H65R, which is predicted to be a high-risk change by FastSNP [17] (non-conservative amino acid change; possibly splicing regulation); [3] rs9266409, located in the 3′ region of HLA-B, 12 kb away; [4] rs8192591, a non-synonymous coding SNP located in the 9th exon of the NOTCH4 gene: 1739A>G; S534G (conservative amino acid change; possibly influencing splicing). We emphasize that these analyses demonstrate that there are further independent effects in the MHC region, but they do not prove that the specific SNPs implicated are responsible for those effects. Some or all of these SNPs are most likely markers for one or more variants that have

Figure 1. Significant hits in the MHC region. Representation of a 3 Mb stretch in the MHC region, encompassing the HLA Class I gene loci and the genome-wide significant SNPs identified in the study (red dots represent SNPs with p-value<5E–08). Results are shown for set point (upper plot) and for progression (lower plot). The figure was created with WGAViewer [40]. doi:10.1371/journal.pgen.1000791.g001


not been genotyped and which provide aspects of viral control independent of the two previously reported associations.

As one possible contribution to these associations we assessed the HLA Class I alleles in the subset of 1204 subjects who had full MHC typing results. The 4-digit alleles were tested separately and in models including the identified MHC SNPs (Table 2, Table S4). 15 alleles were found to associate with set point, but only 4 of them (A*3201, B*1302, B*2705 and B*3502) had an independent effect that was still detectable in models including the top associated SNPs. Several HLA-C alleles associated significantly with viral control before but not after adjustment for the top SNPs (Table 2, Table S6). This is partly because all HLA-C alleles are in LD with the HLA-C -35 rs9264942. They can in fact be perfectly divided into 2 mutually exclusive groups on the basis of their LD with the rs9264942 C or T allele. The C-related alleles, as a group, strongly associated with lower set point (p = 2.8E–14) but failed to entirely recapitulate the rs9264942 association signal (p = 8.4E–16 in this group). Homozygosity for the HLA Class I loci also showed a weak independent association (p = 0.03 after adjustment for the SNPs). Altogether, a model including 6 SNPs, 4 alleles and homozygosity status shows that MHC variation explains 12% of the set point variability in this cohort.

Figure 2. Correlation between HIV-1 set point and the genotypes of the top associated SNPs. HIV-1 viremia at set point strongly associates with rs2395029 (upper panel) and rs9264942 (lower panel) genotypes. The rs2395029 minor allele G has a frequency of 4.8% and each copy of this allele associates with a 0.7 log lower set point. The rs9264942 minor allele C has a frequency of 41.2%, and each copy of this allele associates with a 0.3 log lower set point. Mean and Standard Deviation (error bars) are represented for the respective genotypes. doi:10.1371/journal.pgen.1000791.g002

HIV-1 disease progression
We defined HIV-1 disease progression as the drop of CD4 T cell count to below 350 cells/µl or the initiation of potent antiretroviral treatment (cART) following a CD4 T cell count <500 cells/µl. A total of 1071 individuals with known date of seroconversion and at least two CD4 T cell determinations in absence of cART were included in a survival analysis. Of those, 765 (71.4%) progressed during follow-up: 612 (80%) because of a CD4 T cell value <350/µl and 153 (20%) because they started treatment with <500 CD4 T cells per µl. The top associated variants were HCP5/B*5701 rs2395029 (p = 1.2E–11) and HLA-C rs9264942 (p = 6.4E–12) (Figure 1 and Figure 3, Table S10). If viral load at set point is added to the models, the association signals are much weaker (p = 0.001 for rs2395029 and p = 0.02 for rs9264942), demonstrating that the HCP5/B*5701 and HLA-C effects on disease progression are mainly driven by their impact on early viral control. Another set of variants reached genome-wide significance: rs9261174, rs3869068, rs2074480, rs7758512, rs9261129, rs2301753 and rs2074479 (p = 1.8E–08), in high LD and located around the ZNRD1 and RNF39 genes, close to the HLA-A locus in the MHC region. This association is largely independent of viremia (p = 4.7E–05 in a model including set point as covariate), suggesting that a different mechanism of action is here modulating HIV disease progression. The causal variant(s) responsible for this association remain largely undetermined, and it seems possible that the association depends on the contribution of multiple causal sites. We do note, however, that no single HLA Class I allele can account for the association. Specifically, the recent claim that it is due entirely to the A10 serogroup of HLA-A alleles [7] is not supported by the LD data. When this claim is evaluated by including the A10 alleles in a regression model and testing the significance of the increased variation explained by the ZNRD1 SNPs, we observe an independent additional effect of the SNPs (p = 0.03). Indeed, the identified SNPs still associate with progression in individuals without HLA-A10 (Figure S3). Conversely, HLA-A10 alleles do not significantly associate with

Table 1. P-values and population effect sizes (or explained fraction of the inter-individual variability) for the strongest determinants of HIV-1 viremia at set point.

SNP | MAF | Gene | P-value (initial, N = 486) | Effect size (initial) | P-value (replication, N = 1896) | Effect size (replication) | P-value (all, N = 2362) | Effect size (all)
rs2395029 | 4.8% | HCP5/B*5701 | 9.4E–12 | 9.6% | 4.2E–23 | 4.9% | 4.5E–35 | 5.8%
rs9264942 | 41.2% | HLA-C | 3.8E–09 | 6.5% | 1.6E–23 | 4.8% | 5.9E–32 | 5.3%

Results are shown for the subjects that were included in our initial study [3] (initial), for the subset of subjects that are new to the present study (replication), and for the global study population (all). MAF: Minor allele frequency. doi:10.1371/journal.pgen.1000791.t001
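One plausible way to read the "explained fraction" reported in Table 1 is as the gain in R² when a SNP is added to a linear regression that already contains the covariates. The sketch below illustrates that idea on synthetic data only. The function names, the single covariate and the simulated effect sizes are assumptions made for illustration; this is not the authors' analysis pipeline.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def explained_fraction(covariates, genotype, y):
    """Gain in R^2 when an additively coded SNP (0/1/2) joins the covariates."""
    base = r_squared(covariates, y)
    full = r_squared(np.column_stack([covariates, genotype]), y)
    return full - base

# Synthetic data only: 2,000 hypothetical subjects, one covariate, one SNP.
rng = np.random.default_rng(0)
n = 2000
age = rng.normal(35, 8, n)
snp = rng.binomial(2, 0.05, n)  # minor allele frequency of about 5%
set_point = 4.5 - 0.7 * snp + 0.01 * age + rng.normal(0, 0.8, n)

frac = explained_fraction(age.reshape(-1, 1), snp, set_point)
print(f"explained fraction: {frac:.3f}")
```

Because the two models are nested, the gain in R² cannot be negative, which makes it a convenient per-variant summary once the covariates are fixed.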


Table 2. Associations between 4-digit HLA Class I alleles and HIV-1 set point in a subset of 1,204 subjects with full results.

HLA allele | Model with HLA allele only | Model with HLA allele, rs2395029 | Model with HLA allele, rs2395029, rs9264942 | Model with HLA allele, rs2395029, rs9264942, rs259919, rs9468692, rs9266409, rs8192591 | Effect
A*2402 | 3.4E–02 | 6.9E–02 | 1.1E–01 | 9.9E–01 |
A*2501 | 2.8E–02 | 2.7E–02 | 1.2E–01 | 3.2E–01 |
A*3201 | 5.0E–03 | 1.0E–03 | 6.0E–03 | 2.9E–02 | protective
B*0702 | 7.0E–03 | 1.5E–02 | 4.3E–01 | 3.8E–01 |
B*0801 | 7.0E–03 | 3.3E–02 | 5.8E–01 | 2.4E–01 |
B*1302 | 2.0E–03 | 2.1E–04 | 1.2E–02 | 1.6E–02 | protective
B*2705 | 5.2E–05 | 4.9E–06 | 2.0E–03 | 3.0E–03 | protective
B*3502 | 1.9E–05 | 3.9E–05 | 6.2E–04 | 2.0E–03 | deleterious
B*5601 | 4.2E–02 | 7.8E–02 | 1.9E–01 | 1.2E–01 |
C*0202 | 1.0E–03 | 2.5E–04 | 6.0E–02 | 5.0E–02 |
C*0401 | 5.0E–03 | 3.0E–02 | 9.3E–01 | 8.5E–01 |
C*0602 | 1.2E–11 | 4.9E–02 | 7.1E–01 | 7.8E–01 |
C*0701 | 5.4E–04 | 3.0E–03 | 1.8E–01 | 7.8E–01 |
C*0702 | 3.4E–02 | 6.3E–02 | 9.9E–01 | 9.5E–01 |
C*0802 | 1.2E–02 | 2.0E–03 | 1.6E–01 | 2.5E–01 |

P-values are shown for all Class I alleles that have a nominally significant association with HIV-1 viral load at set point (with the exception of B*5701, discussed in the text). All linear regression models include gender, age and the 12 Eigenstrat axes as covariates. Most of the association signals disappear once the top associated SNPs are added. However, A*3201, B*1302, B*2705 and B*3502 still have an independent effect. See Table S4 for a complete list of all HLA Class I allele results. In addition, Table S5 lists all pairs of HLA-B and HLA-C alleles that are in LD (with an r² > 0.1) and therefore can represent the same association signal (as for example in the case of HLA-C*0602, which is often on the same haplotype as HLA-B*5701). doi:10.1371/journal.pgen.1000791.t002

progression in models that include the ZNRD1 SNPs. The fact that A10 alleles (notably A*2501 and A*2601) are in LD with the ZNRD1 SNPs (r² = 0.46) is not sufficient evidence to assign responsibility for the association, although A10 may contribute to the association signal.

Genetic variants in HIV candidate genes
We tested a total of 34 SNPs in 21 genes, representing 27 previously reported associations with HIV-1 control (individual SNPs or haplotypes) (Table 3) [1,2,18]. Nine SNPs were directly genotyped and 12 had a good proxy (r² > 0.8) on all the chips that we used. Two were present on the Human1M chips only and 9 were not represented: these 11 SNPs were genotyped by TaqMan assays. The CCL3L1 copy number polymorphism was also assessed and the absence of any association with HIV-1 control has been recently reported [19]. The CCR5-Δ32 variant, a 32 bp deletion in the main HIV-1 co-receptor that protects against HIV-1 acquisition when present in homozygous form [20,21,22], strongly associated with both set point (p = 1.7E–10) and progression (p = 3.5E–07). CCR5-Δ32 explained 1.7% of the variability in viral control. Only two other variants showed significant associations: the CCR5 promoter variant P1 [23], found on a haplotype known to increase CCR5 expression [24], associated with higher set point and faster progression, whereas the CCR2-64I variant, a valine to isoleucine change in the HIV-1 minor receptor CCR2 [25,26], associated with better viral control (Table 3). They together explained 1% of the variability in viral control. Partial association results for the two CCR5 variants in a subset of our study population were reported elsewhere [19].

Effect of identified genetic determinants throughout the full phenotype range
By design, this study focused on the control of HIV-1 as a quantitative trait, with a particular focus on the amount of virus during the set point period. One important question to address therefore is whether the genetic determinants identified influence viral control throughout the full phenotypic range, from those with low to high viral loads, and from slow to fast progression times. To address this, we looked at allele frequencies in individuals that maintained variable degrees of viral control: we found a consistent enrichment of the protective alleles in categories of subjects with good viral control or slow disease progression rate in comparison to subjects with poor control or rapid progression (Figure 4).

Additional analyses
The large genotypic and phenotypic data set generated in this study is a resource that allows a more in-depth exploration of the role of human genetic variation: we ran additional analyses to test whether there is any evidence from the existing data, which mainly represents common variants, for other determinants of viral control. We first performed a genome-wide screen (Text S1) for SNPs that modify the effect of rs2395029 (HCP5/B*5701), rs9264942 (HLA-C), rs9261174 (ZNRD1/RNF39) and CCR5-Δ32: no interaction was large enough to reach genome-wide significance. To evaluate common structural variation, we tested 285 SNPs that were identified as tags for copy number polymorphisms (CNP) in HapMap CEU samples [27]. No significant association was observed when these CNP-tagging SNPs were considered as a set. We also used a gene set enrichment analysis (GSEA) [28,29] to ask whether groups of genes or pathways were enriched in SNPs with low association p-values: 5 gene sets were significant, some of them with suggestive evidence of involvement in HIV-1 pathogenesis (Table S7). Finally, we developed a permutation procedure to test whether genetic variants with known functional role were more likely to associate with differences in HIV-1 control than non-functional SNPs: we observed a significant effect that was however limited to the MHC region (p = 0.001) (Figure S4).
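The permutation test on functional variants mentioned above can be illustrated with a small sketch. It assumes, purely for illustration, that the test statistic is the sum of -log10(p) over the functional set and that the null distribution is built from random SNP sets of the same size. The actual procedure is specified in Text S1 and may differ; the function name `enrichment_p` and all data here are hypothetical.

```python
import numpy as np

def enrichment_p(pvals, is_functional, n_perm=1000, seed=0):
    """Empirical p-value for enrichment of low p-values in a SNP subset.

    Statistic: sum of -log10(p) over the subset (an assumed choice).
    Null: the same statistic for random subsets of equal size.
    """
    score = -np.log10(pvals)
    k = int(is_functional.sum())
    observed = score[is_functional].sum()
    rng = np.random.default_rng(seed)
    null = np.array([
        score[rng.choice(score.size, size=k, replace=False)].sum()
        for _ in range(n_perm)
    ])
    # Add-one correction keeps the empirical p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_perm)

# Synthetic example: 5,000 SNPs, 200 of them "functional" with skewed p-values.
rng = np.random.default_rng(1)
pvals = rng.uniform(size=5000)
is_functional = np.zeros(5000, dtype=bool)
is_functional[:200] = True
pvals[is_functional] = pvals[is_functional] ** 3  # pushes p-values toward 0

p_enriched = enrichment_p(pvals, is_functional)
print(f"empirical p-value: {p_enriched:.4f}")
```

With 1,000 permutations the smallest attainable empirical p-value is about 0.001, so the number of permutations bounds the resolution of the test.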


Figure 3. Kaplan-Meier survival estimates for the top associated variants. Results are shown for the 3 most associated SNPs identified in the genome-wide progression scan and for CCR5-Δ32. The survival curves show, for each genotype, the proportion of the individuals that do not reach a progression outcome over the first 10 years after seroconversion. doi:10.1371/journal.pgen.1000791.g003
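For readers unfamiliar with the estimator behind Figure 3, the following is a minimal sketch of a Kaplan-Meier computation for right-censored follow-up data. The function and the toy follow-up times are hypothetical and are not taken from the study; in practice one such curve would be computed per genotype group.

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate for right-censored data.

    times:  follow-up time per subject (e.g. years since seroconversion)
    events: 1 if progression was observed at that time, 0 if censored
    Returns the distinct event times and the survival estimate at each.
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(times[events == 1])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                # still followed just before t
        d = np.sum((times == t) & (events == 1))    # progressions at t
        s *= 1.0 - d / at_risk
        survival.append(s)
    return event_times, np.array(survival)

# Toy data: six hypothetical subjects, two of them censored.
toy_times = [1, 2, 2, 3, 5, 6]
toy_events = [1, 1, 0, 1, 0, 1]
t, s = kaplan_meier(toy_times, toy_events)
for ti, si in zip(t, s):
    print(f"t = {ti:.0f} years: S(t) = {si:.3f}")
```

Censored subjects contribute to the at-risk count until they leave follow-up but never trigger a drop in the curve, which is why the estimate differs from a simple proportion of non-progressors.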

Discussion
The results presented here reaffirm the central role of the MHC in HIV-1 control by first confirming with certainty the independence of two association signals in the genomic region that encompasses HLA-B and HLA-C. These common variants show the strongest association with both viral set point and progression. We initially speculated that the HCP5 gene itself, which contains the top-associated SNP rs2395029, contributes to HIV-1 control [3], but recent epidemiologic and functional reports [30,31,32] suggest that the gene and the variant itself have no such effect. In addition, we here present data from a small number of subjects with a recombination event between HCP5 and HLA-B indicating that HLA-B*5701 is the main contributor to the association signal: the HCP5 variant is therefore likely to be only a marker for the effect of HLA-B*5701 and possibly of other protective variants present on the same haplotype. The HLA-C variant rs9264942, which is in partial LD with all HLA-C alleles (Table S6), associates with mRNA and protein expression levels of HLA-C [3,33] (R. Thomas, M.C., personal communication): it is thus likely that this SNP is in fact a marker of the effect of HLA-C expression on HIV-1 control. More work is needed to understand the precise immunological and biological function of HLA-C in the context of HIV-1 infection.

Beyond the top 2 associated variants, we demonstrate that there are other, independent, genetic contributors to HIV-1 control in the MHC. The intricacy of the LD pattern in the region makes it difficult to find causal variants with certainty. Nevertheless, using both SNP and HLA Class I data, we identify additional variants that associate independently with viral set point and together explain at least 3.5% of the variability, on top of the 9% explained by the first 2 SNPs. We do not know at this stage whether any of the 4 SNPs identified in our permutation analysis have a direct functional role, though a non-synonymous coding change in a TRIM gene represents an attractive candidate [34]. Several of the HLA alleles that independently associate with control have good functional support for their involvement in HIV-1 pathogenesis: B*2705 presents epitopes that lead to efficient viral restriction [35], while B*3502 is a member of the B35Px group [36] that, according to recent data obtained on B*3503, has a preferential binding to the inhibitory myelomonocytic MHC class I receptor ILT4 on myeloid dendritic cells, which results in dysfunctional antigen-presenting properties of these cells (XG Yu, personal communication). Functional studies of the gene variants identified in the MHC and deeper understanding of the structure of the associations between SNPs and surrounding HLA alleles [16] are warranted.

We show data that question most previously reported associations in HIV-1 candidate genes (Table 3 and [19]). The chemokine/chemokine receptor locus on chromosome 3 is the only non-MHC region with convincing association evidence for an impact of genetic variation on HIV-1 phenotypes. Homozygosity for CCR5-Δ32 is known to confer almost complete protection


Table 3. Variants in HIV-1 candidate genes, previously reported to associate with viral control or disease progression.

Group | SNP | Gene | Variant | Genotyping | Proxy | r² | Model | p (set point) | N | p (progression) | N | Effect
Chemokine receptors | rs333 | CCR5 | delta 32 | TaqMan | no | | dominant | 1.7E–10 | 2333 | 3.5E–07 | 1054 | protective
 | rs1799988 | CCR5 | P1 - 627T>C | TaqMan | no | | recessive | 7.5E–05 | 1791 | 3.8E–04 | 1012 | deleterious
 | rs1799864 | CCR2 | V64I | TaqMan | no | | dominant | 8.1E–03 | 2317 | 1.5E–01 | 1056 | protective
 | rs3732378 | CX3CR1 | T280M | present on all chips | - | | recessive | 1.5E–01 | 2362 | 6.0E–01 | 1071 |
Chemokines | rs1719134 | CCL3/MIP1α | intron 459C>T | proxy on all chips | rs1634508 | 1 | dominant | 6.0E–01 | 2362 | 8.2E–01 | 1071 |
 | rs2107538 | CCL5/RANTES | promoter -403G>A | proxy on all chips | rs2291299 | 1 | haplotypes R1-R5: additive | 2.9E–01 | 2362 | 2.3E–01 | 1069 |
 | rs2280788 | | promoter -28C>G | proxy on all chips | rs4251739 | 1 | | | | | |
 | rs2280789 | | In1.1 T>C | proxy on all chips | rs2306630 | 1 | | | | | |
 | rs1801157 | SDF-1/CXCL12 | SDF-1 3′A | proxy on all chips | rs10900029 | 1 | recessive | 8.6E–01 | 2362 | 9.2E–01 | 1071 |
Cytokines | rs2243250 | IL-4 | promoter -589C>T | proxy on all chips | rs2243290 | 1 | additive | 3.4E–01 | 2362 | 6.3E–01 | 1065 |
 | rs1800872 | IL-10 | promoter -592C>A | proxy on all chips | rs3024490 | 1 | dominant | 6.5E–02 | 2362 | 1.4E–01 | 1065 |
 | rs1799946 | DEFB1 | promoter -52G>A | proxy on all chips | rs2741127 | 0.9 | recessive | 1.9E–01 | 1830 | 2.5E–01 | 931 |
Intracellular life cycle | rs8177826 | PPIA | promoter 1604C>G | TaqMan | no | | dominant | 6.0E–01 | 1808 | 6.3E–01 | 1065 |
 | rs6850 | | promoter 1650A>G | TaqMan | no | | dominant | 7.7E–01 | 1753 | 1.4E–01 | 1065 |
 | rs2292179 | TSG101 | promoter -183T>C | proxy on all chips | rs3781640 | 1 | haplotypes | 7.9E–01 | 1792 | 1.9E–01 | 929 |
 | rs1395319 | | intron 181A>C | TaqMan | no | | | | | | |
Intrinsic immunity | rs8177832 | APOBEC3G | NS coding H186R | present on all chips | - | | additive | 3.2E–01 | 2362 | 5.2E–01 | 1071 |
 | rs3740996 | TRIM5α | NS coding H43Y | present on all chips | - | | recessive | 7.3E–01 | 2362 | 9.3E–01 | 1071 |
 | rs10838525 | | NS coding R136Q | present on all chips | - | | additive | 9.0E–01 | 2362 | 8.3E–01 | 1071 |
Innate immunity | rs2287886 | DC-SIGN/CD209 | promoter -139T>C | present on all chips | - | | additive | 9.2E–01 | 2362 | 1.2E–01 | 1065 |
 | rs5030737 | MBL2 | NS coding R52C | 1M chip + TaqMan | no | | recessive | 4.5E–01 | 1735 | 2.9E–01 | 503 |
 | rs1800450 | | NS coding G54D | present on all chips | - | | recessive | 4.3E–01 | 2361 | 9.0E–01 | 1064 |
 | rs1800451 | | NS coding G57E | 1M chip + TaqMan | no | | recessive | 5.5E–01 | 1728 | 8.5E–01 | 501 |
 | rs352139 | TLR9 | intron 1174G>A | proxy on all chips | rs352163 | 0.9 | additive | 6.5E–01 | 2362 | 7.9E–01 | 1065 |
 | rs352140 | | syn coding P545P | proxy on all chips | | 0.9 | | | | | |
 | rs3764880 | TLR8 | NS coding V1M | present on all chips | - | | additive | 5.0E–01 | 2361 | 9.3E–01 | 1064 |
Others | rs601338 | FUT2 | W154stop | proxy on all chips | rs504963 | 0.8 | dominant | 1.1E–01 | 2362 | 7.0E–02 | 1065 |
 | rs1801274 | FCGR2A | NS coding H131R | present on all chips | - | | recessive | 7.9E–01 | 2360 | 2.0E–01 | 1065 |
 | rs1544410 | VDR | intron 8 variant | present on all chips | - | | recessive | 5.1E–01 | 2360 | 6.0E–01 | 1064 |

See http://www.hiv-pharmacogenomics.org/pdf/ref_tbl_nat_history/The_complete_reference_table_for_HIV_natural_history_modifiers.pdf for references. Dominant or additive genetic models were used in the analyses for individual SNPs on the basis of their described effect and/or their minor allele frequencies. The P1 variant in the CCR5 promoter region is defined by the SNP rs1799988 (627 C>T). Haplotypes R1 to R5 in the CCL5 (RANTES) gene were defined using 2 promoter variants (-403C>G, defining haplotype R1, and -28C>G, defining haplotype R5), and 1 intronic variant (375T>C, or In1.1, present in haplotypes R3, R4 and R5). The haplotype R4 is defined by a -222T>C SNP that is monomorphic in Caucasians and was therefore absent in our study population. A combined variable was then defined and tested in additive models: 0 = putatively deleterious haplotypes (presence of an R3 haplotype in the absence of R1 and R5), 1 = neutral haplotypes (all other) and 2 = haplotypes putatively protective (presence of an R1 or R5 haplotype in the absence of R3). For the TSG101 gene, 2 SNPs defined haplotype B (-183T/181C), haplotype C (-183C/181C) and haplotype A (-183T/181A). Again, a combined variable was defined and tested in additive models: 0 = haplotypes putatively deleterious (AC or CC), 1 = neutral haplotypes (AA or BC) and 2 = haplotypes putatively protective (AB or BB). Only variants from the chromosome 3 CCR5-CCR2 genomic region showed nominally significant association with the HIV-1-related outcomes under study. SNP: single nucleotide polymorphism. proxy: high-LD SNP (r² > 0.8) that can be used as a tag for the original variant. r²: r-squared. p: p-value. doi:10.1371/journal.pgen.1000791.t003

against infection [20,21,22] and we here show that heterozygosity for the 32 bp deletion is the strongest protective factor for VL control and progression outside of MHC. While it is possible that our analyses missed some real candidate gene associations, our failure to replicate most previous reports confirms the critical importance of adopting stringent standards for significance level and stratification control [14,37,38].

Our study was powered to detect a single marker association that explains just above 1% of the inter-individual variability in HIV-1 control. In the absence of further significant association at the individual SNP level, we sought to comprehensively assess the impact of common variation by using the genotyping results in several additional ways: these analyses provide no evidence for strong interactions between the MHC or CCR5 polymorphisms and any other common variant, or for CNP-related effects. We also used a more global approach that shows enrichment for associated variants in some interesting gene sets, but was not designed to identify novel genetic determinants. Limitations to


Figure 4. Allelic distribution of the significant variants in subsets of the study population. The bar graphs show the allelic distribution of the 4 variants that have a genome-wide significant association with HIV-1 set point and/or disease progression in subsets of the study population. Groups were defined according to HIV-1 set point (left-hand side graphs) and to progression time (right-hand side graphs). doi:10.1371/journal.pgen.1000791.g004
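The allelic distributions summarized in Figure 4 rest on a simple count: each subject carries two alleles, so the minor allele frequency in a group is the number of minor alleles divided by twice the number of subjects. A minimal sketch follows, using invented genotype counts for two hypothetical phenotype categories; none of these numbers come from the study.

```python
def minor_allele_frequency(n_major_hom, n_het, n_minor_hom):
    """Minor allele frequency from genotype counts in one group of subjects."""
    n_subjects = n_major_hom + n_het + n_minor_hom
    if n_subjects == 0:
        raise ValueError("group has no genotyped subjects")
    # Heterozygotes carry one minor allele, minor homozygotes carry two.
    return (n_het + 2 * n_minor_hom) / (2 * n_subjects)

# Hypothetical counts: (major homozygous, heterozygous, minor homozygous)
groups = {
    "good control": (80, 18, 2),
    "poor control": (95, 5, 0),
}
frequencies = {
    name: minor_allele_frequency(*counts) for name, counts in groups.items()
}
for name, freq in frequencies.items():
    print(f"{name}: {freq:.3f}")
```

Comparing such frequencies across categories is what shows whether a protective allele is enriched among subjects with good viral control.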

these additional analyses include [1] the deliberately limited scope of our interaction screen, in which only pairs of SNPs including one that is significant have been tested (Text S1): we did not run any exhaustive SNP by SNP or haplotype analyses; [2] the use of SNP tags for copy number assessment, which limits the analysis to previously described deletions and duplications that are in LD with common SNPs.

All variants securely identified in this study together explain 13% of the observed variability in HIV-1 viremia in a population of mostly male Caucasian adults. The addition of gender, age and residual population structure to the genetic model pushes this figure up to 22%. These fractions compare favorably to what is known for other complex traits, where dozens of SNPs often explain only a few percent of the variance. Comparable studies are certainly needed in additional populations, notably in other ethnic groups, in women and in children to fully assess the impact of common human genetic variation in HIV-1 control.

Many factors certainly contribute to the large unexplained portion of the inter-individual variability (viral genetics/fitness, environment, stochastic biological variation, noise in phenotype determination), but it is also expected that much more is attributable to human genetic variation. The data presented here suggest that common polymorphisms are unlikely to add much, unless more complex gene by gene and gene-environment interactions play a major role in the genetic architecture of HIV-1 control. After an era of candidate gene studies and a first wave of large-scale genomics projects that could only interrogate common genetic variation, resequencing strategies to identify rare causal variants, as well as integration of multiple genome-level data (genomic DNA, epigenetic marks, transcriptome, siRNA screens) will prove essential to better appreciate the global contribution of the human genome to HIV-1 control.

Methods

Ethics statement
All participating centers provided local institutional review board approval for genetic analysis, and each participant provided informed consent for genetic testing.

Patients/cohorts
Patients have been included from two sources (Text S1): [1] Euro-CHAVI, a Consortium of 8 European and 1 Australian Cohorts/Studies; and [2] MACS, the Multicenter AIDS Cohort Study that enrolled homosexual and bisexual men in 4 US cities. In general, patients had viral load (VL) and CD4 count monitoring at least every 6 months. Eligible patients had 3 or more stable plasma HIV RNA results in the absence of antiretroviral treatment, and met one of the following criteria: a valid seroconversion date estimation proven by documents or biological markers; or, for seroprevalent patients, VL data over a period of at least 3 years, diverging by no more than 0.5 log. Only individuals with known date of seroconversion and at least 2 CD4 T cell determinations in absence of potent antiretroviral treatment (cART) were included in the progression analysis.

Determination of phenotypes
HIV-1 set point was defined as the average of the remaining VL results after careful assessment of each individual data and elimination of VL outliers: see Text S1 for criteria used to identify and exclude outlier VL data. Of note, our phenotype definition leads to the exclusion of rapid progressors that never reach a stable VL plateau. The disease progression phenotype was defined as [1] the drop of CD4 T cells below 350/µl, or [2] the initiation of cART, but only if the last CD4 T cell count before cART start was <500/µl. This latter criterion was made necessary by the different rationales behind treatment initiation between patients and over time: a large proportion of patients starting cART with CD4 T cell counts approaching the 350/µl threshold did so because their CD4 slope actually showed a significant decrease, whereas most patients who started treatment with normal or subnormal CD4 T cell counts had stable CD4 T cell profiles. Since progression has been represented in a number of different ways in genetic studies, we used the known genetic determinants to confirm that our measure is the most accurate, by comparing it to survival analyses that used cART start either as a censoring event (i.e. all patients starting cART are considered non-progressors) or as a progression event (i.e. all patients starting cART are considered progressors) (Table S8).

Genotyping
All participants were genotyped using Illumina BeadChips: we used HumanHap550 BeadChips for 1633 samples and Human1M BeadChips for 921 samples. Most common SNPs found in the HapMap CEU population are readily covered by both chips; however, it is not clear yet whether common variants that have not been genotyped in the HapMap project will be measured equally well. To increase the coverage of the MHC region in the samples genotyped with the 550K chip, we designed a customized MHC chip that contained an additional 8000 SNPs, largely overlapping with the variants that are present on the Human1M chip. We carried out a series of data cleaning and quality control procedures: SNPs were filtered based on missingness (drop if call rate <99%), MAF (drop if <0.006) and Hardy-Weinberg Equilibrium deviation. Participants were filtered based on call rate, gender check (heterozygosity testing), cryptic relatedness and population structure (see EIGENSTRAT below). The genetic variants located in HIV-1 candidate genes that were not represented directly or indirectly on the genome-wide chips were independently genotyped with TaqMan assays.

Gender check
The quality control of the genotyping data included a check on the gender specification obtained from the phenotype database, using the observed genotypes of SNPs on chromosome X and Y. Subjects that were identified as "male" in the phenotype file but had a significant amount of heterozygous X genotypes (≥1%), as well as subjects that were identified as "female" in the phenotype file but had a high frequency of homozygous X genotypes (≥80%) and Y genotype readings, were excluded.

Control for population stratification
To control for the possibility of spurious associations resulting from population stratification we used a modified EIGENSTRAT approach. This method derives the principal components of the correlations among gene variants and corrects for those correlations in the association tests. In principle therefore the principal components in the analyses should reflect population ancestry. Having noticed however that some of the leading axes depend on other sources of correlation, such as sets of variants near one another that show extended association (LD), we inspected the SNP loadings and followed a series of pruning procedures to ensure that EIGENSTRAT axes reflected only effects that applied equally across the whole genome (Text S1).

Genome-wide association analysis
The core association analyses on HIV-1 set point focused on single-marker genotype-trend tests of the quality control–passed


SNPs using linear regression and including age, gender, and the significant EIGENSTRAT axes values as covariates. Associations with progression were tested using a Cox proportional hazards model. We assessed significance with a Bonferroni correction taking into account 1 million tests (p-value cutoff = 5E–08).

Search for independent associations in the MHC
To search for additional SNP effects in the MHC region we used a forward selection algorithm to investigate all MHC SNPs with p < 1E–04 (N = 331). The top 2 associated variants and the standard covariates were fixed in a linear regression model, while new SNPs were added and selected into the model one at a time. In the forward selection process, a newly selected SNP is expected to explain a certain proportion of the set point variation independently of all variables already in the model. To control for multiple testing, a permutation procedure was used to assess the empirical significance cut-off value. In the permutation, the LD patterns among SNPs were retained while the associations between SNPs and set point were permuted. 1000 permutation runs were performed and the 5th percentile of the empirical distributions of the partial R-squares and the p-values of the 1st, 2nd, 3rd, etc. selected SNPs were recorded and compared to the observed sample statistics. The permutation p-value was defined as the probability that the statistic observed in the permutation was more extreme than the statistic observed in the real sample. The forward selection algorithm is expected to be conservative. To verify this, a large-scale statistical simulation was performed, considering different effect sizes, various LD patterns among SNPs and different sample sizes (Text S1). The simulation results confirmed that the forward selection algorithm with permutation procedure is conservative.

Interaction analysis
We performed a two-way interaction test between each QC-passed SNP and the top associated variants. For each genome-wide significant polymorphism, the screening procedure involved the calculation of the p-value associated with each of the ~1 million multivariate linear models incorporating the 2 polymorphisms. The p-value associated with the interaction term for this model was retained and then the top p-values for all ~1 million tests considered. This approach followed recommendations that the search space for interactions can be appropriately reduced and therefore the power can be improved by focusing the search on pairs of polymorphisms including at least one that is known to be significant [39].

Gene set enrichment analysis (GSEA)
The SNPs that were represented in all genotyping chips were mapped to their closest gene (if <500 kb away) and used in the GSEA analysis if minor allele frequency was >0.05, Hardy-Weinberg Equilibrium test p-value was >0.001, and at least 90% of individuals were successfully genotyped. Among the 639 canonical pathway gene sets that are represented in the Molecular Signatures Database (MSigDB, http://www.broad.mit.edu/gsea/msigdb/collections.jsp), we tested the 298 gene sets that had at least 20 and at most 200 genes represented in the study (Text S1).

Functional variants
A permutation procedure was developed to test whether genetic variants with known functional role were more likely to have a lower p-value distribution than supposedly neutral variants (Text S1). We compared the overall p-value distribution of 12,535 putatively functional polymorphisms, including stop-gained, stop-lost, frame-shift coding, non-synonymous coding, and essential splicing site genetic variants, with the global distribution of p-values of all genotyped SNPs (with 10,000 permutations). We also ran the same analysis for the MHC region only (318 functional variants) and for the rest of the genome (12217 functional variants).

Supporting Information

Figure S1 Distribution of individual eigen values along the first axis identified by principal component analysis of the genotyping data (Eigenstrat). Participants are grouped by country of recruitment. The most important contributor to population stratification in our Caucasian population is a North-to-South European ancestry axis. Participants from the USA and Australia are most similar to northern Europeans.
Found at: doi:10.1371/journal.pgen.1000791.s001 (0.05 MB DOC)

Figure S2 QQ-plot of observed versus expected -log(p-values) for all SNPs in the set point association analysis. The plot shows no deviation from the expected line, except for the top 3000 SNPs (0.3%).
Found at: doi:10.1371/journal.pgen.1000791.s002 (0.03 MB DOC)

Figure S3 Associations between the SNPs identified in the ZNRD1/RNF39 region and disease progression in individuals with and without HLA-A alleles belonging to the serogroup A10. The rs9261174 minor allele C shows the strongest association with progression in individuals that also have an HLA-A10 (green), but it also associates when HLA-A10 is absent (red). The LD (r²) between the ZNRD1 SNP rs9261174 and HLA-A10 as a group is 0.46. It is 0.17 for HLA-A*2501, 0.25 for A*2601, 0.00 for A*3402 and 0.02 for A*6601.
Found at: doi:10.1371/journal.pgen.1000791.s003 (0.06 MB DOC)

Figure S4 P-value distribution of functional variants. P-value distributions of 12,535 functional genetic variants (red, A) in comparison with 12,535 randomly selected intergenic variants (red, B), both in the background of p-value distributions generated from 10,000 permutations (blue, A and B). The upper figures show the distributions of ranked -log10(P) for each of the observed and permutated p-value series. The lower figures show the distributions of the sum -log10(P) generated from the permutated series (blue), their 95% cutoff of empirical probability (cyan), and the relative position and probability of the observed sum -log10(P) (red) in these distributions. The sum -log10(P) from the observed functional genetic variants (red, A) is higher than what would be expected if there were no enrichment of low p-values in this series, and this phenomenon is very unlikely to be due to chance (p = 0.004), while the data from randomly selected 12,535 intergenic variants show no difference (p = 0.76). The randomly selected 12,535 intergenic variant series in (B) can be approximately viewed as one of the permutated data series in (A) except that they are restricted to be annotated as intergenic, and is presented here for better illustration of the results. Functional genetic variants are those annotated as falling into one or more of these categories: stop-gained, stop-lost, frame-shift coding, non-synonymous coding, and essential splicing site genetic variants. Ensembl database version 50_36i was used for these annotations.
Found at: doi:10.1371/journal.pgen.1000791.s004 (0.13 MB DOC)

Table S1 Participants included in the study.
Found at: doi:10.1371/journal.pgen.1000791.s005 (0.04 MB DOC)

PLoS Genetics | www.plosgenetics.org 10 December 2009 | Volume 5 | Issue 12 | e1000791 Human Genetic Variation and HIV Control

Table S2 VL setpoint values for groups of individuals with or without a recombination event between HCP5 and HLA-B.
Found at: doi:10.1371/journal.pgen.1000791.s006 (0.03 MB DOC)

Table S3 Results of the stepwise forward selection of MHC SNPs.
Found at: doi:10.1371/journal.pgen.1000791.s007 (0.04 MB DOC)

Table S4 Associations between 4-digit HLA Class I alleles and HIV-1 set point in the subset of 1204 subjects with complete SNP and HLA typing results.
Found at: doi:10.1371/journal.pgen.1000791.s008 (0.10 MB DOC)

Table S5 Pairs of HLA-B and HLA-C alleles that are in linkage disequilibrium (with an r2 of at least 0.1) in the subset of 1204 individuals with complete HLA typing results.
Found at: doi:10.1371/journal.pgen.1000791.s009 (0.07 MB DOC)

Table S6 Associations between HLA-C alleles and HIV-1 set point in the subset of 1204 subjects with complete SNP and HLA typing results.
Found at: doi:10.1371/journal.pgen.1000791.s010 (0.06 MB DOC)

Table S7 Results of the GSEA analysis.
Found at: doi:10.1371/journal.pgen.1000791.s011 (0.04 MB DOC)

Table S8 Comparison of the strength of the association results for polymorphisms with clear association with HIV outcomes using different definitions of progression in survival analyses.
Found at: doi:10.1371/journal.pgen.1000791.s012 (0.03 MB DOC)

Table S9 Top 500 SNPs in the setpoint analysis.
Found at: doi:10.1371/journal.pgen.1000791.s013 (0.74 MB DOC)

Table S10 Top 500 SNPs in the progression analysis.
Found at: doi:10.1371/journal.pgen.1000791.s014 (0.72 MB DOC)

Table S11 Number of SNPs discarded during quality control procedures.
Found at: doi:10.1371/journal.pgen.1000791.s015 (0.03 MB DOC)

Text S1 Supplementary text.
Found at: doi:10.1371/journal.pgen.1000791.s016 (0.09 MB DOC)

Acknowledgments

We thank Steven A. McCarroll for giving us the tag-SNPs for CNPs. We gratefully acknowledge all patients, clinicians, and researchers from the Euro-CHAVI Consortium and the Multicenter AIDS Cohort Study. A complete list of collaborators and funding agencies is available in Text S1.

The Euro-CHAVI Consortium is coordinated by A. Telenti (University of Lausanne, Switzerland), with S. Colombo (University of Lausanne, Switzerland) and B. Ledergerber (University of Zurich, Switzerland). Participating Cohorts/Studies (Principal Investigators) are: Swiss HIV Cohort Study, Switzerland (P. Francioli); Icona Foundation Study, Rome, Italy (A. De Luca); IrsiCaixa Foundation, Barcelona, Spain (J. Martinez-Picado, with the help of J. Dalmau); San Raffaele del Monte Tabor Foundation, Milan, Italy (A. Castagna); Danish Cohort, Denmark (N. Obel); Royal Perth Hospital, Perth, Australia (S. Mallal); Guy’s, King’s, and St. Thomas’ Hospitals, United Kingdom (P. Easterbrook); Modena Cohort, Modena, Italy (A. Cossarizza); Hospital Clinic-IDIBAPS cohort, Barcelona, Spain (J. M. Gatell).

The Multicenter AIDS Cohort Study has centers (Principal Investigators) at The Johns Hopkins Bloomberg School of Public Health (Joseph B. Margolick, Lisa P. Jacobson), Howard Brown Health Center, Feinberg School of Medicine, Northwestern University, and Cook County Bureau of Health Services (John P. Phair, Steven M. Wolinsky), University of California, Los Angeles (Roger Detels), and University of Pittsburgh (Charles R. Rinaldo).

Author Contributions

Conceived and designed the experiments: JF JSB NLL AJM BFH MC AT DBG. Performed the experiments: JF KVS TJU CEG JPS PD. Analyzed the data: JF DG BL ETC KZ SF. Contributed reagents/materials/analysis tools: KVS SC BL AC ACL ADL PE HFG SM CM JD JMP JMM NO SMW JJM RD JBM LPJ SEA SJO BFH MC AT. Wrote the paper: JF AT DBG.
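The forward selection with a permutation-based cut-off described in the Methods (Search for independent associations in the MHC) can be illustrated with a minimal sketch. It is not the authors' code and makes several simplifying assumptions: the data are simulated, no covariates or pre-fixed top variants are included, and the cut-off is taken as the 95th percentile of the permuted partial R-square (equivalent in spirit to the 5th percentile of permuted p-values). All function and variable names (`forward_select`, `permutation_cutoffs`, `snps`, `phenotype`) are hypothetical.

```python
import random


def center(values):
    """Subtract the mean, which implicitly puts an intercept in the model."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def residualize(vector, basis):
    """Remove from `vector` its projection on each mutually orthogonal basis vector."""
    for b in basis:
        norm = dot(b, b)
        if norm > 0.0:
            coef = dot(vector, b) / norm
            vector = [v - coef * w for v, w in zip(vector, b)]
    return vector


def forward_select(phenotype, snps, steps):
    """Greedy forward selection; returns [(snp_index, r2_gain), ...].

    r2_gain is the share of total phenotype variance newly explained by the
    SNP, given every SNP already in the model (its partial contribution).
    """
    y = center(phenotype)
    total_ss = dot(y, y)
    basis = []          # orthogonalised genotype vectors already in the model
    residual = y        # phenotype with the current model regressed out
    remaining = set(range(len(snps)))
    selected = []
    for _ in range(steps):
        best = None
        for j in sorted(remaining):
            x = residualize(center(snps[j]), basis)
            norm = dot(x, x)
            if norm < 1e-12:        # SNP fully explained by the model (r2 = 1)
                continue
            gain = dot(residual, x) ** 2 / norm / total_ss
            if best is None or gain > best[1]:
                best = (j, gain, x)
        if best is None:
            break
        j, gain, x = best
        selected.append((j, gain))
        basis.append(x)
        residual = residualize(residual, [x])
        remaining.remove(j)
    return selected


def permutation_cutoffs(phenotype, snps, steps, n_perm, alpha, rng):
    """Empirical per-step cut-offs: shuffle phenotypes, keep genotypes (and LD) intact."""
    null_gains = [[] for _ in range(steps)]
    for _ in range(n_perm):
        shuffled = phenotype[:]
        rng.shuffle(shuffled)
        for step, (_, gain) in enumerate(forward_select(shuffled, snps, steps)):
            null_gains[step].append(gain)
    cutoffs = []
    for gains in null_gains:
        if not gains:
            cutoffs.append(float("inf"))
            continue
        gains.sort()
        cutoffs.append(gains[min(len(gains) - 1, int((1 - alpha) * len(gains)))])
    return cutoffs


# Simulated example (assumption): 200 individuals, 6 independent SNPs, SNP 0 is causal.
rng = random.Random(2009)
n_individuals, n_snps = 200, 6
snps = [[int(rng.random() < 0.3) + int(rng.random() < 0.3) for _ in range(n_individuals)]
        for _ in range(n_snps)]
phenotype = [1.5 * snps[0][i] + rng.gauss(0.0, 1.0) for i in range(n_individuals)]

selected = forward_select(phenotype, snps, steps=2)
cutoffs = permutation_cutoffs(phenotype, snps, steps=2, n_perm=200, alpha=0.05, rng=rng)
for step, (snp, gain) in enumerate(selected):
    verdict = "retain" if gain > cutoffs[step] else "stop"
    print(f"step {step + 1}: SNP {snp}, R2 gain {gain:.3f}, cut-off {cutoffs[step]:.3f}, {verdict}")
```

Shuffling only the phenotype vector leaves the genotype matrix untouched, which is how the sketch mirrors the statement that LD patterns among SNPs are retained while SNP-to-set-point associations are permuted.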

References

1. Telenti A, Bleiber G (2006) Host genetics of HIV-1 susceptibility. Future Virology 1: 55–70.
2. O’Brien SJ, Nelson GW (2004) Human genes that limit AIDS. Nat Genet 36: 565–574.
3. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, et al. (2007) A whole-genome association study of major determinants for host control of HIV-1. Science 317: 944–947.
4. Martin MP, Carrington M (2005) Immunogenetics of viral infections. Curr Opin Immunol 17: 510–516.
5. Carrington M, O’Brien SJ (2003) The influence of HLA genotype on AIDS. Annu Rev Med 54: 535–551.
6. Detels R, Liu Z, Hennessey K, Kan J, Visscher BR, et al. (1994) Resistance to HIV-1 infection. Multicenter AIDS Cohort Study. J Acquir Immune Defic Syndr 7: 1263–1269.
7. Catano G, Kulkarni H, He W, Marconi VC, Agan BK, et al. (2008) HIV-1 disease-influencing effects associated with ZNRD1, HCP5 and HLA-C alleles are attributable mainly to either HLA-A10 or HLA-B*57 alleles. PLoS ONE 3: e3636. doi:10.1371/journal.pone.0003636.
8. van Manen D, Kootstra NA, Boeser-Nunnink B, Handulle MA, van’t Wout AB, et al. (2009) Association of HLA-C and HCP5 gene regions with the clinical course of HIV-1 infection. AIDS 23: 19–28.
9. Trachtenberg E, Bhattacharya T, Ladner M, Phair J, Erlich H, et al. (2009) The HLA-B/-C haplotype block contains major determinants for host control of HIV. Genes Immun;Aug 20 [Epub ahead of print].
10. Limou S, Le Clerc S, Coulonges C, Carpentier W, Dina C, et al. (2009) Genomewide Association Study of an AIDS-Nonprogression Cohort Emphasizes the Role Played by HLA Genes (ANRS Genomewide Association Study 02). J Infect Dis 199: 419–426.
11. Dalmasso C, Carpentier W, Meyer L, Rouzioux C, Goujard C, et al. (2008) Distinct genetic loci control plasma HIV-RNA and cellular HIV-DNA levels in HIV-1 infection: the ANRS Genome Wide Association 01 study. PLoS ONE 3: e3907. doi:10.1371/journal.pone.0003907.
12. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, et al. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38: 904–909.
13. Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, et al. (2008) Genes mirror geography within Europe. Nature.
14. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, et al. (2008) Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 9: 356–369.
15. Colombo S, Rauch A, Rotger M, Fellay J, Martinez R, et al. (2008) The HCP5 single-nucleotide polymorphism: a simple screening tool for prediction of hypersensitivity reaction to abacavir. J Infect Dis 198: 864–867.
16. de Bakker PI, McVean G, Sabeti PC, Miretti MM, Green T, et al. (2006) A high-resolution HLA and SNP haplotype map for disease association studies in the extended human MHC. Nat Genet 38: 1166–1172.
17. Yuan H-Y, Chiou J-J, Tseng W-H, Liu C-H, Liu C-K, et al. (2006) FASTSNP: an always up-to-date and extendable service for SNP function analysis and prioritization. Nucl Acids Res 34: W635–641.
18. Telenti A, Goldstein DB (2006) Genomics meets HIV-1. Nat Rev Microbiol 4: 865–873.
19. Urban TJ, Weintrob AC, Fellay J, Colombo S, Shianna KV, et al. (2009) CCL3L1 and HIV/AIDS susceptibility. Nat Med 15: 1110–1112.
20. Samson M, Libert F, Doranz BJ, Rucker J, Liesnard C, et al. (1996) Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 382: 722–725.


21. Dean M, Carrington M, Winkler C, Huttley GA, Smith MW, et al. (1996) Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science 273: 1856–1862.
22. Liu R, Paxton WA, Choe S, Ceradini D, Martin SR, et al. (1996) Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 86: 367–377.
23. Martin MP, Dean M, Smith MW, Winkler C, Gerrard B, et al. (1998) Genetic acceleration of AIDS progression by a promoter variant of CCR5. Science 282: 1907–1911.
24. Salkowitz JR, Bruse SE, Meyerson H, Valdez H, Mosier DE, et al. (2003) CCR5 promoter polymorphism determines macrophage CCR5 density and magnitude of HIV-1 propagation in vitro. Clin Immunol 108: 234–240.
25. Smith MW, Dean M, Carrington M, Winkler C, Huttley GA, et al. (1997) Contrasting genetic influence of CCR2 and CCR5 variants on HIV-1 infection and disease progression. Hemophilia Growth and Development Study (HGDS), Multicenter AIDS Cohort Study (MACS), Multicenter Hemophilia Cohort Study (MHCS), San Francisco City Cohort (SFCC), ALIVE Study. Science 277: 959–965.
26. Rizzardi GP, Morawetz RA, Vicenzi E, Ghezzi S, Poli G, et al. (1998) CCR2 polymorphism and HIV disease. Swiss HIV Cohort. Nat Med 4: 252–253.
27. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, et al. (2008) Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 40: 1166–1174.
28. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, et al. (2005) Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc Natl Acad Sci U S A 102: 15545–15550.
29. Wang K, Li M, Bucan M (2007) Pathway-Based Approaches for Analysis of Genomewide Association Studies. Am J Hum Genet 81.
30. Han Y, Lai J, Barditch-Crovo P, Gallant JE, Williams TM, et al. (2008) The role of protective HCP5 and HLA-C associated polymorphisms in the control of HIV-1 replication in a subset of elite suppressors. AIDS 22: 541–544.
31. Shrestha S, Aissani B, Song W, Wilson CM, Kaslow RA, et al. (2009) Host genetics and HIV-1 viral load set-point in African-Americans. AIDS 23: 673–677.
32. Yoon W, Ma B, Fellay J, WH, SX, et al. (2009) A polymorphism in the HCP5 gene associated with HLA-B*5701 does not restrict HIV-1 in vitro. AIDS In press.
33. Stranger BE, Forrest MS, Clark AG, Minichiello MJ, Deutsch S, et al. (2005) Genome-wide associations of gene expression variation in humans. PLoS Genet 1: e78. doi:10.1371/journal.pgen.0010078.
34. Towers G (2007) The control of viral infection by tripartite motif proteins and cyclophilin A. Retrovirology 4: 40.
35. Schneidewind A, Brockman MA, Yang R, Adam RI, Li B, et al. (2007) Escape from the dominant HLA-B27-restricted cytotoxic T-lymphocyte response in Gag is associated with a dramatic reduction in human immunodeficiency virus type 1 replication. J Virol 81: 12382–12393.
36. Gao X, Nelson GW, Karacki P, Martin MP, Phair J, et al. (2001) Effect of a single amino acid change in MHC class I molecules on the rate of progression to AIDS. N Engl J Med 344: 1668–1675.
37. Little J, Higgins JPT, Ioannidis JPA, Moher D, Gagnon F, et al. (2009) STrengthening the REporting of Genetic Association Studies (STREGA); An Extension of the STROBE Statement. PLoS Med 6: e22. doi:10.1371/journal.pmed.1000022.
38. Attia J, Ioannidis JP, Thakkinstian A, McEvoy M, Scott RJ, et al. (2009) How to use an article about genetic association: B: Are the results of the study valid? JAMA 301: 191–197.
39. Brem RB, Storey JD, Whittle J, Kruglyak L (2005) Genetic interactions between polymorphisms that affect gene expression in yeast. Nature 436: 701–703.
40. Ge D, Zhang K, Need AC, Martin O, Fellay J, et al. (2008) WGAViewer: software for genomic annotation of whole genome association studies. Genome Res 18: 640–643.


Chapter 4

Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population

Petrovski S*, Fellay J*, Shianna KV, Carpenetti N, Kumwenda J, Kamanga G, Kamwendo DD, Letvin NL, McMichael AJ, Haynes BF, Cohen MS, Goldstein DB; Center for HIV/AIDS Vaccine Immunology. Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population. AIDS. 2011 Feb 20;25(4):513-8

CONCISE COMMUNICATION

Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population

Slavé Petrovski (a,b)*, Jacques Fellay (a)*, Kevin V. Shianna (a), Nicole Carpenetti (c), Johnstone Kumwenda (c), Gift Kamanga (d), Deborah D. Kamwendo (d), Norman L. Letvin (e), Andrew J. McMichael (f), Barton F. Haynes (g), Myron S. Cohen (h), David B. Goldstein (a), on behalf of the Center for HIV/AIDS Vaccine Immunology

(a) Center for Human Genome Variation, School of Medicine, Duke University, Durham, North Carolina, USA, (b) Department of Medicine, University of Melbourne, Melbourne, Victoria, Australia, (c) College of Medicine, University of Malawi, Blantyre, (d) University of North Carolina Project, Lilongwe, Malawi, (e) Division of Viral Pathogenesis, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, USA, (f) MRC Human Immunology Unit, Weatherall Institute of Molecular Medicine, John Radcliffe Hospital, Oxford, UK, (g) Duke Human Vaccine Institute, Duke University, Durham, and (h) Department of Epidemiology, University of North Carolina, Chapel Hill, North Carolina, USA.

Correspondence to Slavé Petrovski, Center for Human Genome Variation, School of Medicine, Duke University, Durham, NC 27708, USA. Tel: +1 919 684 0896; fax: +1 919 668 6787; e-mail: [email protected]

*S.P. and J.F. contributed equally to the writing of this article.

Received: 24 September 2010; revised: 24 November 2010; accepted: 2 December 2010.

DOI:10.1097/QAD.0b013e328343817b

ISSN 0269-9370

Objective: To date, CCR5 variants remain the only human genetic factors to be confirmed to impact HIV-1 acquisition. However, protective CCR5 variants are largely absent in African populations, in which sporadic resistance to HIV-1 infection is still unexplained. We investigated whether common genetic variants associate with HIV-1 susceptibility in Africans.

Methods: We performed a genome-wide association study (GWAS) in a population of 1532 individuals from Malawi, a country with high prevalence of HIV-1 infection. Using single-nucleotide polymorphisms (SNPs) present on the genome-wide chip, we also investigated previously reported associations with HIV-1 susceptibility or acquisition. Recruitment was coordinated by the Center for HIV/AIDS Vaccine Immunology at two sexually transmitted infection clinics. HIV status was determined by HIV rapid tests and nucleic acid testing.

Results: After quality control, the population consisted of 848 high-risk seronegative and 531 HIV-1 seropositive individuals. Logistic regression testing in an additive genetic model was performed for SNPs that passed quality control. No single SNP yielded a significant P value after correction for multiple testing. The study was sufficiently powered to detect markers with genotype relative risk 2.0 or more and minor allele frequencies 12% or more.

Conclusion: This is the first GWAS of host determinants of HIV-1 susceptibility, performed in an African population. The absence of any significant association can have many possible explanations: rarer genetic variants or common variants with weaker effect could be responsible for the resistance phenotype; alternatively, resistance to HIV-1 infection might be due to nongenetic parameters or to complex interactions between genes, immunity and environment.

© 2011 Wolters Kluwer Health | Lippincott Williams & Wilkins

AIDS 2011, 25:513–518

Keywords: acquisition, Africa, genome-wide association study, HIV-1, resistance

Introduction

Throughout the history of the AIDS epidemic, subsets of individuals have appeared to resist HIV-1 infection despite multiple exposures to the virus [1,2]. However, almost 30 years after the first description of AIDS, variants in the CCR5 gene remain the only human genetic variants that have been proven to significantly impact HIV-1 acquisition [3,4]. When present in homozygous or combined heterozygous form, the Δ32 deletion and the much rarer m303T>A point mutation confer complete resistance to infection by viruses that use CCR5 as co-receptor. Nevertheless, these mutations only explain a fraction of the apparently HIV-1-exposed, yet uninfected cases. Importantly, they are only found in individuals with a northern European or central Asian heritage and thus are not responsible for resistance observed in African populations [5]. The identification of additional human genetic factors influencing HIV-1 susceptibility would shed new light on transmission mechanisms and pathogenesis, and potentially suggest novel preventive or therapeutic approaches.

Genome-wide association studies (GWASs) are a widely accepted approach for the investigation of common genetic variation in the human genome [6]. Not relying on candidate gene selection, these hypothesis-free studies have the potential to implicate new genomic regions and pathways affecting complex human traits and diseases. Several recent GWASs have provided a detailed description of how common variation influences control of HIV-1 in infected individuals of European and African-American ancestry [7–10]. To date, however, there have been no reported GWASs of HIV-1 resistance/susceptibility. A major reason for this has been the difficulty in recruiting enough well characterized, highly exposed, yet seronegative individuals for an adequately powered study [11]. We here describe the first GWAS of host determinants of HIV-1 susceptibility, performed in a homogeneous African population.

Methods

Study population

To identify common gene variants influencing HIV-1 acquisition in the highly affected sub-Saharan African region, we performed a GWAS in a population of over 1500 individuals recruited from two sexually transmitted infection (STI) clinics in Blantyre and Lilongwe, Malawi. These clinics are integrated in the Center for HIV/AIDS Vaccine Immunology (CHAVI) Clinical Core. The prevalence of HIV-1 infection in Malawi is one of the highest in the world, with an estimated 12% of adults infected [12]. An even higher prevalence (around 30%) was observed among the patients screened for this study. We therefore assume that sexually active individuals recruited at these sites were likely to have been exposed to HIV. We did not, however, collect information about individual exposure level, sexual orientation, or intravenous drug use.

This study was approved by all local and by the sponsoring institution’s ethics committees. All participants consented to a blood sample collection and genetic testing. HIV status was determined by HIV rapid tests and nucleic acid testing (NAT): a positive HIV-1 diagnosis required a confirmed positive rapid test and a negative HIV-1 diagnosis was based on two negative rapid tests followed by a negative NAT, or discordant results from rapid tests with a negative NAT.

Genome-wide association study genotyping and quality control

DNA samples were genotyped using either the Illumina Human 1M or 1M-Duo DNA Analysis BeadChips (Illumina Inc., San Diego, California, USA). Genotype clustering was performed using the Infinium BeadStudio program. Samples that obtained a very low intensity or call rate (<99%) were excluded. Further quality control was performed using PLINK [13], by checking the genetic sex and removing the sex-misclassified individuals. Then, cryptic relatedness was assessed using pair-wise identity-by-descent (IBD). All pairs of DNA samples showing an estimated proportion of alleles IBD of 0.125 or more were individually inspected, and one sample in each pair was excluded from further analyses.

To account for the possibility of spurious associations resulting from residual population stratification, we used a modified EIGENSTRAT method to correct for population ancestry within the remaining case and control data [14].

Genome-wide association analysis

We searched for an association between HIV infection status and each of the single-marker genotypes by logistic regression in an additive genetic model using PLINK, correcting for age, sex and the significant principal component analysis axes identified with EIGENSTRAT. Bonferroni correction was applied to correct for multiple testing; however, we first used a linkage disequilibrium pruning procedure to remove entirely dependent markers, defined as r2 = 1, and then used the Bonferroni adjustment based on this reduced set of SNPs. This allowed for improved control of multiple marker testing.

Power calculations for association analysis were performed using the genetic power calculator (GPC) (available at http://pngu.mgh.harvard.edu/purcell/gpc/) [15].

Previous studies have reported variants that might influence HIV-1 susceptibility in other populations. We investigated whether these previously reported associations could be replicated in a genome-wide context within the Malawi study by looking at the SNP variants that have previously been published with a P less than 0.05. If the originally reported SNP was not genotyped, we report the best available proxy SNP based on the HapMap YRI data, also reporting the r2 value. Moreover, we report the SNP with the lowest P value for each of the previously reported candidate genes.

Results

A total of 1532 Chichewa-speaking individuals recruited from Malawi STI clinics between December 2006 and August 2008 were genotyped. Of these, n = 922 (60.2%) were HIV-negative cases and n = 610 (39.8%) were HIV-positive controls.

DNA samples from 86 individuals (6%) did not pass initial quality control filtering. An additional 19 individuals were removed due to sex misclassification and 37 were removed due to cryptic relatedness. In addition, to assess population stratification, PCA was performed on a subset of 191 212 SNPs not in linkage disequilibrium. In the first iteration, 11 outliers were identified and excluded. Following the above quality control steps, the population adopted in the association testing consisted of 848 HIV-negative cases, of which 52% were women, and 531 HIV-positive controls, of which 62% were women. Age distribution significantly differed between the HIV-1 seropositive and seronegative samples [median 29 (range 18–62) vs. 29 (range 20–66), P = 0.002].

Logistic regression testing in an additive genetic model was performed for the 844 489 single markers that passed quality control. No single SNP yielded a P value below the Pcutoff = 6.03 × 10^-8 (Supplementary Fig. S1, Manhattan Plot, http://links.lww.com/QAD/A111). An annotated list of all markers obtaining a P value less than 2 × 10^-4 was generated using WGAviewer software [16] (Supplementary Table S1, http://links.lww.com/QAD/A110).

The Q–Q plot of the GWAS P value distribution shows that the distributions of the observed and expected P values are very similar, with a lambda value of 0.9982 that suggests no inflation of association signals after correction for population stratification.

As an additional subset analysis, based on previously published reports of variants associated with HIV-1 susceptibility, we checked the P values across 22 candidate genes of 36 previously reported candidate SNPs or their closest proxies within 100 kb. For each gene, we report the candidate SNPs when possible, and otherwise the best available proxy. We also report the lowest P value identified within each candidate gene, first uncorrected and then corrected for the number of SNPs analyzed in that gene (Table 1 [3,4,17–46]). However, failure to find a significant association within a candidate gene in which the originally reported SNP was not examined does not translate to a failure to replicate the original association. We were able to directly test 17 of the 36 SNPs previously reported. Of the SNPs not present on the chips, none has a good proxy (r2 more than 0.8). Of these 17 SNPs, only rs1946518-IL18, originally reported to increase susceptibility to HIV-1 infection in a pediatric Brazilian population (P = 0.02) [17], was significant in our Malawi study at the P less than 0.05 level: the C allele is significantly more represented in the HIV-positive Malawi group than in the high-risk seronegative group (67 vs. 62%, P = 0.004). The meta P value remains nonsignificant at the genome-wide level (P = 0.001, Stouffer’s z-trend).

Of the 22 candidate genes tested, 13 genes were found to have at least one SNP significant at the P less than 0.05 level, with the most strongly associated gene, CXCL12 [18], represented by eight SNPs below the P less than 0.001 level. After correcting for the number of SNPs per gene, only rs2437935-CXCL12 remained significant with a P value of 0.01 (corrected for 179 SNPs). However, that P value increased to 0.085 when correcting for all 1208 SNPs tested across the 22 candidate genes.

Table 1. Candidate study of previously reported variants associated with HIV-1 susceptibility.

Candidate genes: ABCB1, APOBEC3G, CCL2, CCL3, CCL4, CCL5, CCL7, CCL11, CCR5, CD4, CD209, CX3CR1, CXCL12, DARC, DEFB1, IL10, IL18, IRF1, MBL2, PPIA, PTPRC, TRIM5.

Previously associated SNPs listed: rs10838525, rs1801157, rs1719134, rs1946518, rs1800872, rs1800896, rs17848424, rs28919570, rs3732378, rs3732379, rs5030737, rs1800450, rs1800451, rs3740996, rs2814778, rs2107538, rs2280789, rs1024610, rs1024611, rs1045642. Closest proxies listed (r2 in YRI): rs10900029 (0.22), rs1719126 (0.43), rs10769175 (0.19).

Column 1, candidate gene; column 2, the identical candidate single-nucleotide polymorphisms (SNPs) originally identified and available after passing QC on the 1M-Duo chip; column 3, the candidate SNPs originally identified and available via a proxy found on the 1M-Duo chip; column 4, the proxy SNP and the corresponding r2 value as identified in the YRI population for the candidate SNP identified in column 3; column 5, the P value for the SNPs identified in column 2, or column 4, if a proxy SNP; column 6, the lowest P value for SNPs linked to the candidate gene; column 7, the number of SNPs tested in each candidate gene; column 8, gene-wide corrected P value for the lowest P value identified within the candidate gene; column 9, set-wide corrected P value for the lowest P value identified per candidate gene and corrected for all candidate genes investigated; column 10, genome-wide corrected P value for the lowest P value identified per candidate gene; column 11, studies reporting corresponding candidate SNPs and genes.

Discussion

Here we performed a GWAS of determinants of resistance to HIV-1 infection by testing for associations at over 800 000 SNPs. We failed to detect significant signals for differences in HIV-1 susceptibility in this study of samples collected from Malawi STI clinics. Our lowest P value was 3.97 × 10^-6, which is substantially higher than the Pcutoff estimated to be 6.03 × 10^-8 on this dataset.

This is, to our knowledge, the first report of a genome-wide search for genetic variants associated with differences in susceptibility to HIV-1 infection. We studied a homogeneous sub-Saharan African population, comparing genotypes of HIV-infected and noninfected individuals that attended the same STI clinics in Malawi. Due to the high prevalence of HIV-1 in this region, it is believed that HIV-negative individuals attending these STI clinics are in a high-risk category and are likely to have been exposed to the virus. However, clinical data on exposure details were not collected, and as such, we had no information on the number and type of sexual contacts, the number of partners, sexual orientation, co-infections or discordance in long-term relationship with a known HIV-1-infected partner.

Failing to detect a GWAS signal in this study can have many possible explanations, including the hypothesis that resistance or reduced susceptibility to HIV-1 infection might be due to complex interactions between innate and acquired immunity, modulated by epistasis and environment. Nongenetic factors might include mode of transmission, concurrent STIs, viral load of infected partner and multiple viral strain exposures. However, it is still possible that resistance or reduced susceptibility to HIV-1 infection is due to common human genetic variants not identified in the present study, either because they are not represented, directly or indirectly, on the genome-wide genotyping chip that we used (the Illumina 1M-Duo chip is the best currently available platform for investigating African populations [47], but it still has suboptimal coverage), or because they have relatively weak effects, undetected with our sample size. Even when there is incomplete exposure in the high-risk seronegative group, there is still an expectation of allele frequency imbalance, as the HIV-positive individuals are infectable, and the power depends on the precise degree of exposure. Under the assumption that all seronegative individuals recruited at the Malawi STI clinics have some protection, given MAFs of 5 and 20%, under an additive disease model with a type I error rate of 6.03 × 10^-8, our Malawi study provides at least 80% power to detect an association for HIV-1 reduced susceptibility, with genotype relative risks (GRR) of 2.65 or more and more than 1.7, respectively. Moreover, the power to detect markers with GRRs 2.0 or more was 13, 68 and 99% for markers with 5, 10 and 20% MAFs, respectively. Considering a larger population and thus increasing power might detect a signal from African populations. An alternative hypothesis to common causation would be that rare variants are causal, and therefore, resequencing efforts on highly exposed, HIV-1 uninfected individuals might return informative data on the genetic determinants of HIV-1 resistance.

Acknowledgements

J.F., K.V.S., N.L.L., A.J.M., B.F.H., M.S.C. and D.B.G. contributed to the design of the study. S.P. and J.F. analyzed the data. S.P. and J.F. wrote the paper and all coauthors reviewed the manuscript. N.C., J.K., G.K., D.D.K. and members of the CHAVI team designed, established and maintained the study cohort and provided the samples. All authors contributed to interpreting the data, revising the manuscript, and reading and approving the final version.

We thank all the individuals who agreed to participate in the study and the staff at the two STI clinics (Blantyre and Lilongwe) in Malawi that recruited the participants. Funding was provided by the NIAID Center for HIV-1/AIDS Vaccine Immunology grant AI067854. S.P. acknowledges a Sir Keith Murdoch Fellowship from the American Australian Association.

References

1. Detels R, Liu Z, Hennessey K, Kan J, Visscher BR, Taylor JM, et al. Resistance to HIV-1 infection. Multicenter AIDS Cohort Study. J Acquir Immune Defic Syndr 1994; 7:1263–1269.
2. Lederman MM, Alter G, Daskalakis DC, Rodriguez B, Sieg SF, Hardy G, et al. Determinants of protection among HIV-exposed seronegative persons: an overview. J Infect Dis 2010; 202 (S3):S333–S338.
3. Dean M, Carrington M, Winkler C, Huttley GA, Smith MW, Allikmets R, et al. Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science 1996; 273:1856–1862.
4. Samson M, Libert F, Doranz BJ, Rucker J, Liesnard C, Farber CM, et al. Resistance to HIV-1 infection in Caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 1996; 382:722–725.
5. Fowke KR, Nagelkerke NJ, Kimani J, Simonsen JN, Anzala AO, Bwayo JJ, et al. Resistance to HIV-1 infection among persistently seronegative prostitutes in Nairobi, Kenya. Lancet 1996; 348:1347–1351.
6. McCarthy MI, Abecasis GR, Cardon LR, Goldstein DB, Little J, Ioannidis JP, Hirschhorn JN. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008; 9:356–369.
7. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, et al. A whole-genome association study of major determinants for host control of HIV-1. Science 2007; 317:944–947.
8. Limou S, Le Clerc S, Coulonges C, Carpentier W, Dina C, Delaneau O, et al. Genomewide association study of an AIDS-nonprogression cohort emphasizes the role played by HLA genes (ANRS Genomewide Association Study 02). J Infect Dis 2009; 199:419–426.
9. Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, Cirulli ET, et al. Common genetic variation and the control of HIV-1 in humans. PLoS Genet 2009; 5:e1000791.
10. Pelak K, Goldstein DB, Walley NM, Fellay J, Ge D, Shianna KV, et al. Host determinants of HIV-1 control in African Americans. J Infect Dis 2010; 201:1141–1149.
11. Horton RE, McClaren PJ, Fowke K, Kimani J, Ball TB. Cohorts for the study of HIV-1-exposed but uninfected individuals: benefits and limitations. J Infect Dis 2010; 202:S377–S381.
12. UNAIDS. 2008 report on the global AIDS epidemic. UNAIDS World Health Organization (WHO); 2008. http://www.unaids.org/en/KnowledgeCentre/HIVData/GlobalReport/2008/2008_Global_report.asp. [Accessed 16 April 201]

13. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am J Hum Genet 2007; 81:557–575.
14. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 2006; 38:904–909.
15. Purcell S, Cherny SS, Sham PC. Genetic power calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 2003; 19:149–150.
16. Ge D, Zhang D, Need AC, Martin O, Fellay J, Urban TJ, et al. WGAViewer: software for genomic annotation of whole genome association studies. Genome Res 2008; 18:640–643.
17. Segat L, Bevilacqua D, Boniotto M, Arraes LC, de Souza PR, de Lima Filho JL, Crovella S. IL-18 gene promoter polymorphism is involved in HIV-1 infection in a Brazilian pediatric population. Immunogenetics 2006; 58:471–473.
18. Petersen DC, Glashoff RH, Shrestha S, Bergeron J, Laten A, Gold B, et al. Risk for HIV-1 infection associated with a common CXCL12 (SDF1) polymorphism and CXCR4 variation in an African population. J Acquir Immune Defic Syndr 2005; 40:521–526.
19. An P, Nelson GW, Wang L, Donfield S, Goedert JJ, Phair J, et al. Modulating influence on HIV/AIDS by interacting RANTES gene variants. Proc Natl Acad Sci U S A 2002; 99:10002–10007.
20. An P, Wang LH, Hutcheson-Dilks H, Nelson G, Donfield S, Goedert JJ, et al. Regulatory polymorphisms in the cyclophilin A gene, PPIA, accelerate progression to AIDS. PLoS Pathog 2007; 3:e88.
21. Ball TB, Ji H, Kimani J, McLaren P, Marlin C, Hill AV, Plummer FA. Polymorphisms in IRF-1 associated with resistance to HIV-1 infection in highly exposed uninfected Kenyan sex workers. AIDS 2007; 21:1091–1101.
22. Boniotto M, Braida L, Pirulli D, Arraes L, Amoroso A, Crovella S. MBL2 polymorphisms are involved in HIV-1 infection in Brazilian perinatally infected children. AIDS 2003; 17:779–780.
23. Braida L, Boniotto M, Pontillo A, Tovo PA, Amoroso A, Crovella S. A single-nucleotide polymorphism in the human beta-defensin 1 gene is associated with HIV-1 infection in Italian children. AIDS 2004; 18:1598–1600.
24. Colobran R, Adreani P, Ashhab Y, Llano A, Este JA, Dominguez O, et al. Multiple products derived from two CCL4 loci: high incidence of a new polymorphism in HIV+ patients. J Immunol 2005; 174:5655–5664.
25. Faure S, Meyer L, Costagliola D, Vaneensberghe C, Genin E, Autran B, et al. Rapid progression to AIDS in HIV+ individuals with a structural variant of the chemokine receptor CX3CR1. Science 2000; 287:2274–2277.
26. Fellay J, Marzolini C, Meaden ER, Back DJ, Buclin T, Chave JP, et al. Response to antiretroviral treatment in HIV-1-infected individuals with allelic variants of the multidrug resistance transporter 1: a pharmacogenetics study. Lancet 2002; 359:30–36.
27. Fernandez RM, Borrego S, Marcos I, Rubio A, Lissen E, Antinolo G. Fluorescence resonance energy transfer analysis of the RANTES polymorphisms -403G –> A and -28G –> C: evaluation of both variants as susceptibility factors to HIV type 1 infection in the Spanish population. AIDS Res Hum Retroviruses 2003; 19:349–352.
28. Garred P, Richter C, Andersen AB, Madsen HO, Mtoni I, Svejgaard A, Shao J. Mannan-binding lectin in the sub-Saharan HIV and tuberculosis epidemics. Scand J Immunol 1997; 46:204–208.
29. Gonzalez E, Dhanda R, Bamshad M, Mummidi S, Geevarghese R, Catano G, et al. Global survey of genetic variation in CCR5, RANTES, and MIP-1alpha: impact on the epidemiology of the HIV-1 pandemic. Proc Natl Acad Sci U S A 2001; 98:5199–5204.
30. He W, Neil S, Kulkarni H, Wright E, Agan BK, Marconi VC, et al. Duffy antigen receptor for chemokines mediates trans-infection of HIV-1 from red blood cells to target cells and affects HIV-AIDS susceptibility. Cell Host Microbe 2008; 4:52–62.
31. Huang Y, Paxton WA, Wolinsky SM, Neumann AU, Zhang L, He T, et al. The role of a mutant CCR5 allele in HIV-1 transmission and disease progression. Nat Med 1996; 2:1240–1243.
32. Javanbakht H, An P, Gold B, Petersen DC, O’Huigin C, Nelson GW, et al. Effects of human TRIM5alpha polymorphisms on antiretroviral function and susceptibility to human immunodeficiency virus infection. Virology 2006; 354:15–27.
33. Liu H, Hwangbo Y, Holte S, Lee J, Wang C, Kaupp N, et al. Analysis of genetic polymorphisms in CCR5, CCR2, stromal cell-derived factor-1, RANTES, and dendritic cell-specific intercellular adhesion molecule-3-grabbing nonintegrin in seronegative individuals repeatedly exposed to HIV-1. J Infect Dis 2004; 190:1055–1058.
34. Martin MP, Lederman MM, Hutcheson HB, Goedert JJ, Nelson GW, van Kooyk Y, et al. Association of DC-SIGN promoter polymorphism with increased risk for parenteral, but not mucosal, acquisition of human immunodeficiency virus type 1 infection. J Virol 2004; 78:14053–14056.
35. McDermott DH, Beecroft MJ, Kleeberger CA, Al-Sharif FM, Ollier WE, Zimmerman PA, et al. Chemokine RANTES promoter polymorphism affects risk of both HIV infection and disease progression in the Multicenter AIDS Cohort Study. AIDS 2000; 14:2671–2678.
36. Milanese M, Segat L, Pontillo A, Arraes LC, de Lima Filho JL, Crovella S. DEFB1 gene polymorphisms and increased risk of HIV-1 infection in Brazilian children. AIDS 2006; 20:1673–1675.
37. Modi WS, Goedert JJ, Strathdee S, Buchbinder S, Detels R, Donfield S, et al. MCP-1-MCP-3-Eotaxin gene cluster influences HIV-1 transmission. AIDS 2003; 17:2357–2365.
38. Modi WS, Scott K, Goedert JJ, Vlahov D, Buchbinder S, Detels R, et al. Haplotype analysis of the SDF-1 (CXCL12) gene in a longitudinal HIV-1/AIDS cohort study. Genes Immun 2005; 6:691–698.
39. Naicker DD, Werner L, Kormuth E, Passmore JA, Mlisana K, Karim SA, Ndung’u T. Interleukin-10 promoter polymorphisms influence HIV-1 susceptibility and primary HIV-1 pathogenesis. J Infect Dis 2009; 200:448–452.
40. Oyugi JO, Vouriot FC, Alimonti J, Wayne S, Luo M, Land AM, et al. A common CD4 gene variant is associated with an increased risk of HIV-1 infection in Kenyan female commercial sex workers. J Infect Dis 2009; 199:1327–1334.
41. Pastinen T, Liitsola K, Niini P, Salminen M, Syvänen AC. Contribution of the CCR5 and MBL genes to susceptibility to HIV type 1 infection in the Finnish population. AIDS Res Hum Retroviruses 1998; 14:695–698.
42. Rits MA, van Dort KA, Kootstra NA. Polymorphisms in the regulatory region of the Cyclophilin A gene influence the susceptibility for HIV-1 infection. PLoS One 2008; 3:e3975.
43. Shin HD, Winkler C, Stephens JC, Bream J, Young H, Goedert JJ, et al. Genetic restriction of HIV-1 pathogenesis to AIDS by promoter alleles of IL10. Proc Natl Acad Sci U S A 2000; 97:14467–14472.
44. Tchilian EZ, Wallace DL, Dawes R, Imami N, Burton C, Gotch F, Beverley PC. A point mutation in CD45 may be associated with an increased risk of HIV-1 infection. AIDS 2001; 15:1892–1894.
45. Valcke HS, Bernard NF, Bruneau J, Alary M, Tsoukas CM, Roger M. APOBEC3G genetic variants and their association with risk of HIV infection in highly exposed Caucasians. AIDS 2006; 20:1984–1986.
46. Vallinoto AC, Menezes-Costa MR, Alves AE, Machado LF, de Azevedo VN, Souza LL, et al. Mannose-binding lectin gene polymorphism and its impact on human immunodeficiency virus 1 infection. Mol Immunol 2006; 43:1358–1362.
47. Bhangale TR, Rieder MJ, Nickerson DA. Estimating coverage and power for genetic association studies using near-complete variation data. Nat Genet 2008; 40:841–843.
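The discussion of this chapter quotes statistical power as a function of minor allele frequency (MAF) and genotype relative risk (GRR). The sketch below illustrates how such a dependence arises; it is not the Genetic Power Calculator cited in the references, and it does not reproduce the published figures. It assumes a multiplicative (log-additive) risk model, a comparison group whose allele frequency equals the population MAF, and a two-sided allele-count z-test with a normal approximation. The sample sizes are hypothetical placeholders, since the cohort sizes are not given in this part of the text; only the type I error rate is taken from the discussion.

```python
from statistics import NormalDist


def case_allele_freq(maf: float, grr: float) -> float:
    """Allele frequency expected in the enriched group under a
    multiplicative model with per-allele relative risk `grr`."""
    return maf * grr / (1.0 + maf * (grr - 1.0))


def allelic_test_power(n_cases: int, n_controls: int, maf: float,
                       grr: float, alpha: float) -> float:
    """Approximate power of a two-sided z-test comparing allele
    frequencies between two groups (normal approximation)."""
    nd = NormalDist()
    p_ctrl = maf  # assumption: comparison group matches the population
    p_case = case_allele_freq(maf, grr)
    # Each diploid individual contributes two alleles.
    se = (p_case * (1.0 - p_case) / (2 * n_cases)
          + p_ctrl * (1.0 - p_ctrl) / (2 * n_controls)) ** 0.5
    z_crit = nd.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(p_case - p_ctrl) / se
    # Probability of exceeding the critical value in either tail.
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)


if __name__ == "__main__":
    ALPHA = 6.03e-8                    # type I error rate quoted in the discussion
    N_CASES, N_CONTROLS = 500, 800     # hypothetical sample sizes
    for maf in (0.05, 0.10, 0.20):
        power = allelic_test_power(N_CASES, N_CONTROLS, maf, 2.0, ALPHA)
        print(f"MAF {maf:.2f}, GRR 2.0: approximate power {power:.2f}")
```

Under these assumptions, power rises steeply with MAF at a fixed GRR, which is the qualitative pattern described in the discussion; the exact values depend on the true sample sizes and disease model.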

Chapter 5

A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A

Lane J*, McLaren PJ*, Dorrell L, Shianna KV, Stemke A, Pelak K, Moore S, Oldenburg J, Alvarez-Roman MT, Angelillo-Scherrer A, Boehlen F, Bolton-Maggs PH, Brand B, Brown D, Chiang E, Cid-Haro AR, Clotet B, Collins P, Colombo S, Dalmau J, Fogarty P, Giangrande P, Gringeri A, Iyer R, Katsarou O, Kempton C, Kuriakose P, Lin J, Makris M, Manco-Johnson M, Tsakiris DA, Martinez-Picado J, Mauser-Bunschoten E, Neff A, Oka S, Oyesiku L, Parra R, Peter-Salonen K, Powell J, Recht M, Shapiro A, Stine K, Talks K, Telenti A, Wilde J, Yee TT, Wolinsky SM, Martinson J, Hussain SK, Bream JH, Jacobson LP, Carrington M, Goedert JJ, Haynes BF, McMichael AJ, Goldstein DB, Fellay J; NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI). A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A. Hum Mol Genet. 2013 May 1;22(9):1903-10

Human Molecular Genetics, 2013, Vol. 22, No. 9, 1903–1910. doi:10.1093/hmg/ddt033. Advance Access published on January 30, 2013

Jérôme Lane1,†, Paul J. McLaren1,2,3,†, Lucy Dorrell4, Kevin V. Shianna5, Amanda Stemke6, Kimberly Pelak5, Stephen Moore4, Johannes Oldenburg7, Maria Teresa Alvarez-Roman8, Anne Angelillo-Scherrer9, Francoise Boehlen10, Paula H.B. Bolton-Maggs11, Brigit Brand12, Deborah Brown13, Elaine Chiang14, Ana Rosa Cid-Haro15, Bonaventura Clotet16, Peter Collins17, Sara Colombo2, Judith Dalmau16, Patrick Fogarty18, Paul Giangrande19, Alessandro Gringeri20, Rathi Iyer21, Olga Katsarou22, Christine Kempton23, Philip Kuriakose24, Judith Lin25, Mike Makris26, Marilyn Manco-Johnson27, Dimitrios A. Tsakiris28, Javier Martinez-Picado16, Evelien Mauser-Bunschoten29, Anne Neff30, Shinichi Oka31, Lara Oyesiku19, Rafael Parra32, Kristiina Peter-Salonen33, Jerry Powell34, Michael Recht35, Amy Shapiro36, Kimo Stine37, Katherine Talks38, Amalio Telenti2, Jonathan Wilde39, Thynn Thynn Yee40, Steven M. Wolinsky41, Jeremy Martinson42, Shehnaz K. Hussain43, Jay H. Bream44, Lisa P. Jacobson45, Mary Carrington46,47, James J. Goedert48, Barton F. Haynes6, Andrew J. McMichael4, David B. Goldstein5 and Jacques Fellay1,2,5,∗ for the NIAID Center for HIV/AIDS Vaccine Immunology (CHAVI)

1 School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
2 Institute of Microbiology, University Hospital and University of Lausanne, Lausanne, Switzerland
3 Program in Medical and Population Genetics, The Broad Institute of MIT and Harvard, Cambridge, MA, USA
4 Weatherall Institute of Molecular Medicine, University of Oxford, Oxford, UK
5 Center for Human Genome Variation, Duke University School of Medicine, Durham, NC, USA
6 Duke Human Vaccine Institute, Duke University, Durham, NC, USA
7 Institute of Experimental Haematology and Transfusion Medicine, University Clinic Bonn, Bonn, Germany
8 Unidad de Coagulopatias Congenitas, La Paz University Hospital, Madrid, Spain
9 Service and Central Laboratory of Hematology, Lausanne University Hospital, Lausanne, Switzerland
10 Division of Angiology and Haemostasis, Department of Specialties of Medicine, University Hospitals, Geneva, Switzerland
11 Manchester Haemophilia Comprehensive Care Centre, Manchester Royal Infirmary, University of Manchester, Manchester, UK
12 Division of Hematology, University Hospital, Zurich, Switzerland
13 Gulf States Hemophilia and Thrombophilia Center, Houston, TX, USA
14 University of Pennsylvania, Penn Comprehensive Hemophilia and Thrombosis Program, Philadelphia, PA, USA
15 Unidad de Coagulopatias Congenitas, La Fe University Hospital, Valencia, Spain
16 AIDS Research Institute IrsiCaixa, Institut d’Investigació en Ciències de la Salut Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona, Spain
17 Arthur Bloom Haemophilia Centre, School of Medicine, Cardiff University, Wales, UK
18 University of California San Francisco, San Francisco, CA, USA
19 Oxford Haemophilia and Thrombosis Centre, Oxford University Hospitals NHS Trust, Oxford, UK
20 Università degli Studi di Milano and Fondazione IRCCS Ca’ Granda Ospedale Policlinico di Milano, Milan, Italy
21 UMHC Clinic for Bleeding and Clotting Disorders, University of Mississippi Medical Center, Jackson, MS, USA
22 Blood Centre and National Reference Centre for Congenital Bleeding Disorders, Laiko General Hospital, Athens, Greece
23 Adult Hemophilia Program, Emory University, Atlanta, GA, USA
24 Henry Ford Health System, Detroit, MI, USA
25 Boston Hemophilia Center, Brigham and Women’s Hospital, Boston, MA, USA

∗To whom correspondence should be addressed at: EPFL-SV-GHI, Station 19, 1015 Lausanne, Switzerland. Tel: +41 216931849; Fax: +41 216937220; Email: jacques.fellay@epfl.ch †These authors contributed equally to this work.

© The Author 2013. Published by Oxford University Press. All rights reserved. For Permissions, please email: [email protected]

26 Department of Cardiovascular Science, University of Sheffield, Sheffield, UK
27 Department of Pediatrics, University of Colorado and Children’s Hospital Colorado, Aurora, USA
28 Hemophilia Comprehensive Care Center, University Hospital, Basel, Switzerland
29 University Medical Center Utrecht, Utrecht, Netherlands
30 Hemostasis–Thrombosis Clinic, Vanderbilt University, Nashville, TN, USA
31 National Center for Global Health and Medicine, Tokyo, Japan
32 Hemophilia Unit, Hospital Vall d’Hebron, Barcelona, Spain
33 Hematology, Inselspital, University Hospital Bern, Bern, Switzerland
34 UC Davis Hemophilia Treatment Center, Sacramento, CA, USA
35 The Hemophilia Center, Oregon Health & Science University, Portland, OR, USA
36 Indiana Hemophilia & Thrombosis Center, Indianapolis, IN, USA
37 Hemophilia Treatment Center of Arkansas, Arkansas Children’s Hospital, Little Rock, AR, USA
38 Newcastle upon Tyne Hospitals NHS Foundation Trust, Newcastle upon Tyne, UK
39 New Queen Elizabeth Hospital, Edgbaston, Birmingham, UK
40 Katharine Dormandy Haemophilia Centre & Thrombosis Unit, Royal Free Hospital, London, UK
41 Division of Infectious Diseases, Northwestern University Feinberg School of Medicine, Chicago, IL, USA
42 Department of Pathology, School of Medicine, University of Pittsburgh Medical Center, Pittsburgh, PA, USA
43 Department of Epidemiology, Fielding School of Public Health, University of California Los Angeles, Los Angeles, CA, USA
44 Department of Molecular Microbiology and Immunology and 45 Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA
46 Cancer and Inflammation Program, Laboratory of Experimental Immunology, SAIC Frederick, Inc., Frederick National Lab, Frederick, MD, USA
47 Ragon Institute of MGH, MIT and Harvard, Charlestown, MA, USA
48 Division of Cancer Epidemiology and Genetics, NCI, Bethesda, MD, USA

Received November 22, 2012; Revised January 18, 2013; Accepted January 28, 2013

Human genetic variation contributes to differences in susceptibility to HIV-1 infection. To search for novel host resistance factors, we performed a genome-wide association study (GWAS) in hemophilia patients highly exposed to potentially contaminated factor VIII infusions. Individuals with hemophilia A and a documented history of factor VIII infusions before the introduction of viral inactivation procedures (1979–1984) were recruited from 36 hemophilia treatment centers (HTCs), and their genome-wide genetic variants were compared with those from matched HIV-infected individuals. Homozygous carriers of known CCR5 resistance mutations were excluded. Single nucleotide polymorphisms (SNPs) and inferred copy number variants (CNVs) were tested using logistic regression. In addition, we performed a pathway enrichment analysis, a heritability analysis, and a search for epistatic interactions with CCR5 Δ32 heterozygosity. A total of 560 HIV-uninfected cases were recruited: 36 (6.4%) were homozygous for CCR5 Δ32 or m303. After quality control and SNP imputation, we tested 1 081 435 SNPs and 3686 CNVs for association with HIV-1 serostatus in 431 cases and 765 HIV-infected controls. No SNP or CNV reached genome-wide significance. The additional analyses did not reveal any strong genetic effect. Highly exposed, yet uninfected hemophiliacs form an ideal study group to investigate host resistance factors. Using a genome-wide approach, we did not detect any significant associations between SNPs and HIV-1 susceptibility, indicating that common genetic variants of major effect are unlikely to explain the observed resistance phenotype in this population.
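As a rough illustration of what "genome-wide significance" implies for the number of SNPs quoted in the abstract, the sketch below computes a Bonferroni-style per-test threshold and the proportion of excluded CCR5 homozygotes. The family-wise error level of 0.05 is an assumption made here for illustration; the abstract does not state how the threshold was derived, and the function names are hypothetical.

```python
# Counts quoted in the abstract above.
N_SNPS_TESTED = 1_081_435
N_RECRUITED = 560
N_CCR5_HOMOZYGOUS = 36


def bonferroni_threshold(n_tests: int, family_alpha: float = 0.05) -> float:
    """Per-test significance level that keeps the family-wise error
    rate at `family_alpha` (0.05 is an assumed, conventional value)."""
    return family_alpha / n_tests


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole


if __name__ == "__main__":
    threshold = bonferroni_threshold(N_SNPS_TESTED)
    print(f"Bonferroni threshold for the tested SNPs: {threshold:.2e}")
    share = percent(N_CCR5_HOMOZYGOUS, N_RECRUITED)
    print(f"CCR5 homozygotes among recruited cases: {share:.1f}%")
```

With roughly one million tests, this correction lands just below the conventional genome-wide level of 5E−08, so the two criteria are close for a study of this size.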

INTRODUCTION

A single exposure to a transfusion with HIV-contaminated blood or clotting factor concentrates carries a risk of infection of 90%, far greater than that of any other risk exposure (1). This is clearly illustrated by the disastrous consequences of widespread use of contaminated plasma-derived clotting factor concentrates for the treatment of hemophilia in the early days of the current AIDS pandemic.

Hemophilia A is an inherited X-linked bleeding disorder affecting 1 in 5000–1 in 10 000 males. It is caused by mutations in the factor 8 gene (F8) on the X chromosome, leading to reduced levels of factor VIII (FVIII) activity in the circulation. Replacement of FVIII is necessary to prevent morbidity and mortality associated with uncontrolled bleeding. The use of donor FVIII concentrates derived from pooled plasma from up to 100 000 donors was the mainstay of hemophilia treatment until recombinant factor products were introduced in the 1990s (2,3). Prior to 1984, factor concentrates were not subjected to viral inactivation processes and as a result, a large proportion of patients with hemophilia A became infected with HIV-1 (4–8). The risk of infection was correlated with the severity of the disease: individuals with severe hemophilia experienced more bleeding episodes requiring treatment with FVIII concentrates or were treated prophylactically two or three times per week, and the quantity of concentrates correlated directly with the probability of acquiring HIV-1 infection (9). However, infection was not universal, even among patients with severe hemophilia. Individuals who were likely exposed, yet remain HIV uninfected, are here referred to as exposed uninfected (EU).

It is already known that human genetic variation contributes to differences in susceptibility to HIV-1 infection. Homozygosity for a 32 bp deletion in the gene coding for the HIV-1 co-receptor CCR5 results in the absence of CCR5 expression at the cell surface, offering protection against R5 strains of HIV-1, the usual infecting virus (10–12). The CCR5 Δ32/Δ32 homozygous genotype is found in 1% of healthy individuals of European descent, but is rare in non-European populations (13). A rarer mutation in CCR5, m303, also abrogates expression and confers resistance in homozygotes or compound heterozygotes with CCR5 Δ32 (14). No other host genetic polymorphism has been consistently shown to protect against HIV-1 acquisition (15). Of note, several studies in Europe and North America have shown that the frequency of CCR5 Δ32/Δ32 is significantly higher in HIV-uninfected hemophiliacs than in the general population (up to 15% compared with ≤1%), with the highest frequencies in those with severe hemophilia (16–20). In contrast, CCR5 Δ32/Δ32 is rare or absent in hemophiliacs who acquired HIV infection (20).

We here present a genome-wide association study (GWAS) that aims at discovering additional genetic polymorphisms associated with reduced susceptibility to HIV-1. A clearer understanding of host genetic resistance against HIV-1 infection is of enormous importance to identifying novel prophylactic drug targets as well as correlates of protection for rational HIV-1 vaccine design.

RESULTS

Study participants and genotypes

A total of 483 individuals with hemophilia A were included in the CHAVI014 protocol from 36 hemophilia treatment centers (HTCs) in nine countries (Table 1). Samples from an additional group of 77 highly exposed hemophiliacs were obtained from the Multicenter Hemophilia Cohort Study (MHCS) (6,19). All study participants received potentially contaminated FVIII concentrates between 1979 and 1984 and had at least one documented negative HIV-1 screening test at a later date. Most patients had severe hemophilia and were positive for chronic or resolved hepatitis C virus (HCV) infection, a marker of blood-borne pathogen exposure (Table 1). The control population comprised 823 HIV positive, ethnically matched individuals from the Multicenter AIDS Cohort Study (MACS).

To reduce heterogeneity, we excluded the only three female subjects who were recruited in CHAVI014. Targeted genotyping of the CCR5 protective variants identified 35 hemophilia cases as homozygous for the Δ32 deletion and one case as homozygous for the m303 variant. There were no Δ32/m303 compound heterozygotes. We observed a consistent enrichment for CCR5 Δ32 homozygosity across study sites and countries, closely reflecting the known north–south decreasing cline of Δ32 allele frequency in European populations: 9.4% of individuals of northern European ancestry, 2.6% of central Europeans and 0.9% of southern Europeans were found to be homozygous, versus 2, 0.5 and 0.1% in the respective general populations (13,21). Because the HIV EU phenotype of the 36 subjects carrying CCR5 homozygous mutations was already explained genetically, they were not included in the GWAS. The frequency of Δ32 heterozygosity was not increased in the EU cohort (n = 92/560, 16.4%) in comparison to control populations, implying the absence of additional CCR5 variants that could form protective compound heterozygotes with the Δ32 deletion.

A total of 521 of the initial sample of 560 EU cases and 823 HIV positive controls were genotyped: 16 samples did not pass initial quality control filtering. An additional 43 individuals were removed due to cryptic relatedness: this high number of related individuals in our study population is unsurprising considering the familial clustering of hemophilia. Finally, we excluded 90 population outliers that were identified through principal component analysis of the genotyping data. The final study population consisted of 430 EU hemophilia cases and 765 HIV positive controls. After all quality control steps, 890 599 single-nucleotide polymorphisms (SNPs) were used for imputation based on the HapMap 3 CEU reference set, resulting in a total of 1 081 435 SNPs used for association testing. The coordinates of nine significant principal component axes were included as covariates in all regression models. Using PennCNV, we identified 2543 deletions and 1143 duplications in 3375 variable genomic regions: the number of copy number variants (CNVs) was consistent with data from the 1000 Genomes Project (22).

Power calculation

With our final numbers of cases and controls, the study had 80% power to detect associated variants with a genotype relative risk (GRR) of two or more. Table 2 shows the GRR required for a polymorphism to be significantly associated with resistance against HIV infection at the genome-wide level (Pthreshold = 5E−08), under different genetic models and with various minor allele frequencies (MAFs).

Genome-wide association analyses

After imputation, we tested all SNPs for association with HIV resistance under additive, dominant and recessive models using logistic regression. No SNP reached genome-wide significance (Fig. 1A) under any of the genetic models. The distribution of observed P-values was very similar to the null expectation, as shown in Figure 1B: a λ value of 1.01 indicated that the association statistics were not confounded by population stratification. As a comparison, the CCR5 Δ32 variant was strongly associated with HIV resistance under a recessive genetic model (P = 9.4E−15) in the initial study population consisting of 560 cases and 823 controls.

Since the only known genetic variant associated with HIV resistance (CCR5 Δ32) also impacts HIV disease progression (10), we sought to increase power for detecting genetic effects by meta-analyzing the current association results with

those from a GWAS on HIV control (23). We observed no genome-wide significant associations after meta-analysis of results under additive models or when combining additive results from the HIV control GWAS with recessive model results from the present study (i.e. imitating the known effects of CCR5 Δ32).

We further addressed whether combining SNP effects or testing for epistatic interactions between genome-wide SNPs and CCR5 Δ32 heterozygosity could improve sensitivity for detecting genetic influences on HIV resistance. To achieve this, we performed pathway enrichment, a heritability analysis assessing additive genetic effects and a genome-wide interaction analysis with CCR5 Δ32. In all cases, no significant evidence for enrichment or interaction was observed, further suggesting a lack of strong genetic effects of common variants on HIV resistance.

Table 1. Characteristics of EU hemophilia cases

Male gender (n, %)                                        557 (99.5)
Caucasian ethnicity (n, %)                                516 (92.1)
Birth year (median, IQR)                                  1967 (1954–1976)
Country (n, %)
  USA                                                     203 (36.3)
  UK                                                      110 (19.6)
  Switzerland                                             61 (10.9)
  Italy                                                   60 (10.7)
  Netherlands                                             40 (7.1)
  Spain                                                   36 (6.4)
  Greece                                                  20 (3.6)
  Germany                                                 19 (3.4)
  Japan                                                   11 (2)
Severity of hemophilia (n, %)a
  Severe (<1% normal FVIII activity)                      406 (84.1)
  Moderate (1% < normal FVIII activity < 5%)              77 (15.9)
Number of FVIII infusions 1979–1984 (median, IQR)a        51 (7–296)
Hepatitis C status (n, %)a
  Never infected                                          29 (6)
  Spontaneous clearance                                   70 (14.5)
  Chronically infected                                    262 (54.2)
  Successfully treated                                    122 (25.3)
Protective CCR5 genotype: homozygosity Δ32 or m303        36 (6.4)

aInformation was only available for the 483 CHAVI014 participants.

Table 2. Minimal GRR required for 80% power for variant detection in the genome-wide association analyses

        Genetic model
MAF     Additive    Dominant    Recessive
5%      2.9         3           36.5
20%     2           2.4         5.3

Power was calculated using the present sample size and a genome-wide significance threshold of P < 5E−08.

DISCUSSION

Individuals with hemophilia, who were exposed to potentially contaminated blood products, yet were not infected by HIV-1 in the early years of the pandemic, form an ideal study group to investigate host resistance factors. Our study represents an unprecedented effort to identify and prospectively recruit such individuals. Through an international collaboration involving 36 HTCs in nine countries and three continents, we obtained informed consent, clinical information and genetic material from 483 subjects. Those were combined with an additional set of EU individuals from the MHCS, resulting in a total number of 560 cases, which were compared with a higher number of ethnically matched controls at more than one million polymorphic sites across the genome.

The possibility of identifying associated variants depends on an actual enrichment of resistance alleles (or depletion of susceptibility alleles) in the case population. The incorrect inclusion of non-exposed individuals (misclassification bias) would decrease study power, because they would most likely be susceptible. Therefore, we applied strict selection criteria to ensure that our EU subjects had a very high likelihood of effective exposure to HIV-1. All included cases had a documented history of treatment with FVIII concentrates with a high likelihood of HIV-1 contamination. Due to the severity of hemophilia, they received a relatively high number of FVIII infusions (median 51), each derived from pooled plasma from thousands of donors. Most subjects were infected by HCV, confirming actual exposure to blood-borne pathogens. The majority of patients treated in the same HTCs were infected before 1984 (4–8). Finally, the observed enrichment of CCR5 Δ32 homozygosity in cases (6.4% versus ≤1% in European populations) is a clear indicator of effective HIV-1 exposure.

The choice of an adequate control population represented an essential step in the study design. We chose to compare EU cases with HIV-1-infected patients, to make certain that all controls were in fact susceptible. Alternative options would have been to select either unexposed hemophilia subjects or population-level samples as controls. We considered, however, that there was no advantage in matching cases and controls for a monogenic disease that is genetically unrelated to HIV-1 susceptibility, and that using either alternative group could reduce power because potentially resistant subjects would not be excluded. An additional concern was that hemophilia cases were selected on the basis of their resistance to intravenously administered blood products, whereas controls would largely consist of individuals infected through mucosal exposure. This is unlikely to lead to substantial bias, as mechanisms involved in susceptibility or resistance in the intravenous compartment should also impact the likelihood of HIV-1 acquisition after mucosal exposure, as observed for CCR5-associated resistance.

Our genome-wide analysis did not reveal any statistically significant association between SNPs or CNVs and resistance against HIV-1 infection. Furthermore, we did not find any evidence for genetic effects in a pathway enrichment analysis, a heritability analysis, and a search for epistatic interactions with CCR5 Δ32 heterozygosity. Our results strongly suggest that common genetic variants with a major effect do not play a major role in determining susceptibility to HIV-1 in our study population. During the past two decades, several cohorts of EU individuals, identified by different modes of HIV-1 exposure, have been studied for genetic factors that might account for their apparent resistance to infection (24), but only CCR5 variation has been consistently shown to confer any degree of protection. Additional gene variants

were reported to protect against acquisition or to increase susceptibility to infection, but they are, at best, supported by weak statistical evidence, and none has been convincingly replicated (15). The negative result of our GWAS (one of the largest genetic studies of HIV-1 acquisition to date, performed in the most exposed population, in accordance with the latest standard of genomic research) confirms the absence of common protective variants of large effect in individuals of Western European ancestry, beyond CCR5: it also means that similarly sized GWASs are unlikely to reveal any genetic association in lower exposure cohorts (sexual transmission, intravenous drug use, mother-to-child transmission).

Figure 1. Logistic regression results. (A) Manhattan plot of association results from the additive model shows no significant association signal throughout the genome (the dotted red line indicates the significance threshold of 5E-08). (B) QQ plot demonstrates that the observed distribution of P-values corresponds to the expected distribution under the null hypothesis for all models (additive in black, dominant in blue, recessive in green), indicating that potential confounders are well controlled.

The exclusive focus on whites is obviously an important limitation of our study. A small number of Japanese individuals were recruited in the CHAVI014 protocol (n = 11), but could not be included in the association analysis due to concerns about stratification and to a lack of ethnically matched controls. To date, the only GWAS of HIV-1 acquisition performed in non-whites did not identify any association in a population from Malawi (25). Additional, large-scale studies of individuals from various ethnic backgrounds are a clear priority in the field.

Our GWAS was designed to identify genetic variants with relatively high MAFs (>5%) and moderate to high GRR (≥2). Consequently, two categories of genetic variants could still be involved in the HIV-1 resistance phenotype observed in epidemiological studies: common variants with weak effects and rare variants (26,27). Several efforts are underway to explore these non-exclusive possibilities. The HIV host genetics community is notably joining forces to run a large meta-analysis of existing genome-wide studies (28). Also, thanks to affordable sequencing technology, it is now feasible to perform exhaustive searches for rare causal variants throughout the exome or the genome of EU individuals. The combination of these approaches in various populations will finally delineate the impact of human genetic diversity on HIV-1 susceptibility.

MATERIALS AND METHODS

Study participants

The CHAVI014 study was set up to obtain peripheral blood specimens from HIV-1 exposed, yet uninfected subjects with hemophilia A to study the genetic factors that may influence susceptibility and resistance to HIV-1 infection. Potentially eligible patients were identified from a search of hemophilia registries, surveillance programme databases or other HTC databases. Individuals 18 years of age or older with moderate or severe hemophilia A (<5 IU/dl or <1 IU/dl normal FVIII activity, respectively) were eligible if they had documented treatment with a plasma-derived FVIII concentrate between 1 January 1979 and 31 December 1984 and a documented HIV-negative test. The number of treatment episodes during the high-risk exposure period was recorded. In a separate recruitment effort, retrospective samples from a well-characterized cohort of highly exposed, yet uninfected hemophiliacs were obtained through collaboration with J.J.G. and the MHCS (6,19).

HIV-infected individuals from the MACS that had been genotyped in the context of a previous GWAS (23) were used as controls: none of them were hemophiliacs, and the predominant mode of HIV acquisition was homosexual contact. Briefly, the MACS was established in 1983 and includes 6972 adult homosexual and bisexual men from four metropolitan areas, Baltimore, Chicago, Los Angeles and Pittsburgh, recruited during three recruitment periods, 1984–1985, 1987–1991 and 2001–2003.

Local institutional review boards at each participating center approved the study, and all participants provided informed consent for genetic testing.

Genotypes

Genomic DNA was extracted from 10 ml of whole blood. CCR5 Δ32 and m303 genotypes were assessed by in-house taqman assays: individuals with known CCR5 protective genotypes (homozygosity for any of the variants or compound heterozygosity) were excluded from the subsequent GWAS. Genome-wide genotyping was done on Illumina Human 1M or 1Mduo SNP chip, containing 1 072 820 and 1 271 154 SNPs, respectively.

We carried out a series of SNP and sample quality control procedures: polymorphisms were filtered based on missingness (dropped if called in <99% of participants), MAF (dropped if the value was <1%) and severe deviation from Hardy–Weinberg equilibrium (dropped if P < 5E-08).
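The three SNP-level filters just described can be sketched in a few lines. This is a minimal Python illustration and not the pipeline used in the study: the function names are hypothetical, the thresholds are the ones quoted in the text, and the Hardy–Weinberg test is a 1-df chi-square approximation chosen for brevity (an assumption; the text does not state which test was applied).

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1-df chi-square test for Hardy-Weinberg proportions.

    Illustrative approximation only; an exact test may be preferable
    for rare alleles.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic site: nothing to test
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_snp_qc(n_aa, n_ab, n_bb, n_samples,
                  min_call_rate=0.99, min_maf=0.01, hwe_p=5e-8):
    """Apply the call-rate, MAF and HWE filters quoted in the text."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / n_samples
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1.0 - freq_a)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_chi2_pvalue(n_aa, n_ab, n_bb) >= hwe_p)
```

A SNP is kept only if all three conditions hold; genotype counts are passed per SNP, with `n_samples` being the number of participants genotyped on that chip.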

Samples were filtered based on genotyping quality (dropped if call rate <95%) and a gender check (heterozygosity testing). SNPs were then used to identify cryptic relatedness between study participants: we estimated the sharing of genetic information by estimating identity by descent (IBD), and excluded one randomly selected individual in each pair of DNA samples showing >12.5% of estimated IBD, corresponding to first-degree cousins. To control for the possibility of spurious associations resulting from population stratification, we used a modified Eigenstrat method, which derives the principal components of the correlations among gene variants (29): population outliers were discarded, and the coordinates of the significant principal component axes were included in the association tests to correct for residual stratification.

To increase the coverage of genomic variation, we imputed the genotyping data using MACH software with HapMap 3 CEU as a reference set (30); SNPs with r² <0.3 and/or an MAF <1% were removed.

CNVs were derived from non-imputed SNPs using PennCNV software (31), separately for the 1M and the 1Mduo chips. To avoid spurious CNV calls, deletions or duplications overlapping centromeric, telomeric and immunoglobulin regions were discarded. Finally, only CNVs present in at least two samples were considered for association analysis.

Association analyses

Logistic regression was used to assess the differences in genotype frequencies of SNPs and CNVs between EU individuals and HIV-infected controls under additive, recessive and dominant genetic models, and for interaction analysis between genome-wide SNPs and CCR5 Δ32 heterozygosity. To control for population structure, the coordinates of five significant Eigenstrat axes were included as covariates in all models. We used the CaTS Power Calculator for Genome-Wide Association Studies software (http://www.sph.umich.edu/csg/abecasis/CaTS/) for power calculations, PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/) for logistic regression analyses and WGAViewer (http://compute1.lsrc.duke.edu/softwares/WGAViewer/) for evaluation and annotation of the association statistics. Bonferroni’s correction was applied for multiple testing.

Meta-analysis of association results from an HIV control GWAS

The CCR5 Δ32 deletion provides proof of concept that genetic variants can impact both HIV acquisition and disease progression. Thus, we sought to improve power for variant detection by combining association results from the HIV resistance analysis with those from a GWAS on HIV control. The HIV control dataset includes 815 members of the Swiss HIV Cohort Study typed on the Illumina HumanHap 550 BeadChip and imputed using the HapMap 3 European sample as reference as previously described (23,32). We meta-analyzed the results from additive genetic models in both studies by combining z-scores that incorporated effect direction in both studies (assuming variants that decrease HIV susceptibility also decrease set point viral load) (33). For combining additive association results from the GWAS on HIV control with recessive model results from the recent study (i.e. directly mimicking the observed CCR5 Δ32 effect), we used Fisher’s method for combining P-values.

Assessment of enrichment of SNPs association signal in biological pathways

We used MAGENTA (34) to search for abundance of SNPs association signal across pathways using the default parameters to define gene boundaries (mapping to hg18) and to correct for the confounding effects of gene size and linkage disequilibrium between SNPs. Approximately 10 000 gene sets defined by publicly available resources were used. We further added custom gene sets relevant to HIV biology defined by: whole genome siRNA knockdown screens (35), human–HIV protein interactions (36) and a curated list of interferon-stimulated genes (37). We used the false discovery rate P-value correction within MAGENTA to assess significance.

Heritability analysis

To investigate a role for measurable, additive genetic contributions to the HIV resistance phenotype, we used GCTA (38). We performed strict sample and SNP quality control as described in (39) and estimated the total genetic variance explained by genome-wide SNPs and the narrow-sense heritability assuming a trait prevalence of 1%. Genetic variance was estimated using the underlying liability scale with the narrow-sense heritability calculated as the proportion of total phenotypic variance that is due to additive genetic effects.

ACKNOWLEDGEMENTS

We would like to thank all study participants and health care workers at the contributing hemophilia treatment centers, as well as Jo Roberts in Oxford for administration and coordination of the CHAVI014 study. The Multicenter AIDS Cohort Study (MACS, http://www.statepi.jhsph.edu/macs/macs.html) has centers (Principal Investigators) at The Johns Hopkins Bloomberg School of Public Health (Joseph B. Margolick, Lisa P. Jacobson), Howard Brown Health Center, Feinberg School of Medicine, Northwestern University and Cook County Bureau of Health Services (John P. Phair, Steven M. Wolinsky), University of California, Los Angeles (Roger Detels) and University of Pittsburgh (Charles R. Rinaldo).

Conflict of Interest statement. None declared.

FUNDING

This work was supported by the National Institute of Allergy and Infectious Diseases Center for HIV/AIDS Vaccine Immunology (CHAVI) (AI067854). This project has been funded in part with federal funds from the Frederick National Laboratory for Cancer Research (HHSN261200800001E). The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products
or organizations imply endorsement by the US Government. This research was supported in part by the Intramural Research Program of the National Institutes of Health, Frederick National Lab, Center for Cancer Research. The Multicenter AIDS Cohort Study is funded by the National Institute of Allergy and Infectious Diseases, with additional supplemental funding from the National Cancer Institute (UO1-AI-35042, UL1-RR025005, UO1-AI-35043, UO1-AI-35039, UO1-AI-35040, UO1-AI-35041).

REFERENCES

1. Baggaley, R., Boily, M., White, R. and Alary, M. (2006) Risk of HIV-1 transmission for parenteral exposure and blood transfusion: a systematic review and meta-analysis. AIDS, 20, 805–812.
2. Fricke, W.A. and Lamb, M.A. (1993) Viral safety of clotting factor concentrates. Semin. Thromb. Hemost., 19, 54–61.
3. Morgenthaler, J.J. (2001) Securing viral safety for plasma derivatives. Transfus. Med. Rev., 15, 224–233.
4. AIDS-Hemophilia French Study Group (1985) Immunologic and virologic status of multitransfused patients: role of type and origin of blood products. Blood, 66, 896–901.
5. Ragni, M., Winkelstein, A., Kingsley, L., Spero, J. and Lewis, J. (1987) 1986 update of HIV seroprevalence, seroconversion, AIDS incidence, and immunologic correlates of HIV infection in patients with hemophilia A and B. Blood, 70, 786–790.
6. Kroner, B., Rosenberg, P., Aledort, L., Alvord, W. and Goedert, J. (1994) HIV-1 infection incidence among persons with hemophilia in the United States and western Europe, 1978–1990. Multicenter Hemophilia Cohort Study. J. Acquir. Immune Defic. Syndr. (1999), 7, 279–286.
7. Darby, S.C., Kan, S.W., Spooner, R.J., Giangrande, P.L., Lee, C.A., Makris, M., Sabin, C.A., Watson, H.G., Wilde, J.T. and Winter, M. (2004) The impact of HIV on mortality rates in the complete UK haemophilia population. AIDS, 18, 525–533.
8. Dietrich, S.L., Mosley, J.W., Lusher, J.M., Hilgartner, M.W., Operskalski, E.A., Habel, L., Aledort, L.M., Gjerset, G.F., Koerper, M.A., Lewis, B.H. et al. (1990) Transmission of human immunodeficiency virus type 1 by dry-heated clotting factor concentrates. Transfusion Safety Study Group. Vox Sang., 59, 129–135.
9. Gjerset, G.F., Clements, M.J., Counts, R.B., Halvorsen, A.S. and Thompson, A.R. (1991) Treatment type and amount influenced human immunodeficiency virus seroprevalence of patients with congenital bleeding disorders. Blood, 78, 1623–1627.
10. Dean, M., Carrington, M., Winkler, C., Huttley, G.A., Smith, M.W., Allikmets, R., Goedert, J.J., Buchbinder, S.P., Vittinghoff, E., Gomperts, E. et al. (1996) Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science, 273, 1856–1862.
11. Liu, R., Paxton, W.A., Choe, S., Ceradini, D., Martin, S.R., Horuk, R., MacDonald, M.E., Stuhlmann, H., Koup, R.A. and Landau, N.R. (1996) Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell, 86, 367–377.
12. Samson, M., Libert, F., Doranz, B.J., Rucker, J., Liesnard, C., Farber, C.M., Saragosti, S., Lapoumeroulie, C., Cognaux, J., Forceille, C. et al. (1996) Resistance to HIV-1 infection in Caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature, 382, 722–725.
13. Martinson, J.J., Chapman, N.H., Rees, D.C., Liu, Y.-T. and Clegg, J.B. (1997) Global distribution of the CCR5 gene 32-basepair deletion. Nat. Genet., 16, 100–103.
14. Quillent, C., Oberlin, E., Braun, J., Rousset, D., Gonzalez-Canali, G., Metais, P., Montagnier, L., Virelizier, J.-L., Arenzana-Seisdedos, F. and Beretta, A. (1998) HIV-1-resistance phenotype conferred by combination of two separate inherited mutations of CCR5 gene. Lancet, 351, 14–18.
15. Telenti, A. and McLaren, P. (2010) Genomic approaches to the study of HIV-1 acquisition. J. Infect. Dis., 202(Suppl 3), S382–S386.
16. Wilkinson, D.A., Operskalski, E.A., Busch, M.P., Mosley, J.W. and Koup, R.A. (1998) A 32-bp deletion within the CCR5 locus protects against transmission of parenterally acquired human immunodeficiency virus but does not affect progression to AIDS-defining illness. J. Infect. Dis., 178, 1163–1166.
17. Kupfer, B., Kaiser, R., Brackmann, H.H., Effenberger, W., Rockstroh, J.K., Matz, B. and Schneweis, K.E. (1999) Protection against parenteral HIV-1 infection by homozygous deletion in the C-C chemokine receptor 5 gene. AIDS, 13, 1025–1028.
18. Barretina, J., Blanco, J., Gutierrez, A., Puig, L., Altisent, C., Espanol, T., Caragol, I., Clotet, B. and Este, J.A. (2000) Evaluation of the putative role of C-C chemokines as protective factors of HIV-1 infection in seronegative hemophiliacs exposed to contaminated hemoderivatives. Transfusion, 40, 461–467.
19. Salkowitz, J.R., Purvis, S.F., Meyerson, H., Zimmerman, P., O’Brien, T.R., Aledort, L., Eyster, M.E., Hilgartner, M., Kessler, C., Konkle, B.A. et al. (2001) Characterization of high-risk HIV-1 seronegative hemophiliacs. Clin. Immunol., 98, 200–211.
20. Zhang, M., Goedert, J.J. and O’Brien, T.R. (2003) High frequency of CCR5-delta32 homozygosity in HCV-infected, HIV-1-uninfected hemophiliacs results from resistance to HIV-1. Gastroenterology, 124, 867–868.
21. Novembre, J., Galvani, A.P. and Slatkin, M. (2005) The geographic spread of the CCR5 Δ32 HIV-resistance allele. PLoS Biol., 3, e339.
22. Mills, R.E., Walter, K., Stewart, C., Handsaker, R.E., Chen, K., Alkan, C., Abyzov, A., Yoon, S.C., Ye, K., Cheetham, R.K. et al. (2011) Mapping copy number variation by population-scale genome sequencing. Nature, 470, 59–65.
23. Fellay, J., Ge, D., Shianna, K.V., Colombo, S., Ledergerber, B., Cirulli, E.T., Urban, T.J., Zhang, K., Gumbs, C.E., Smith, J.P. et al. (2009) Common genetic variation and the control of HIV-1 in humans. PLoS Genet., 5, e1000791.
24. Lederman, M.M., Alter, G., Daskalakis, D.C., Rodriguez, B., Sieg, S.F., Hardy, G., Cho, M., Anthony, D., Harding, C., Weinberg, A. et al. (2010) Determinants of protection among HIV-exposed seronegative persons: an overview. J. Infect. Dis., 202(Suppl 3), S333–S338.
25. Petrovski, S., Fellay, J., Shianna, K.V., Carpenetti, N., Kumwenda, J., Kamanga, G., Kamwendo, D.D., Letvin, N.L., McMichael, A.J., Haynes, B.F. et al. (2011) Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population. AIDS, 25, 513–518.
26. Manolio, T.A., Collins, F.S., Cox, N.J., Goldstein, D.B., Hindorff, L.A., Hunter, D.J., McCarthy, M.I., Ramos, E.M., Cardon, L.R., Chakravarti, A. et al. (2009) Finding the missing heritability of complex diseases. Nature, 461, 747–753.
27. Fellay, J., Shianna, K.V., Telenti, A. and Goldstein, D.B. (2010) Host genetics and HIV-1: the final phase? PLoS Pathogens, 6, e1001033.
28. McLaren, P.J., Zagury, J.-F. and Fellay, J. and The International HIV Acquisition Consortium (2012), Abstract 295, CROI 2012, Seattle.
29. Price, A.L., Patterson, N.J., Plenge, R.M., Weinblatt, M.E., Shadick, N.A. and Reich, D. (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet., 38, 904–909.
30. Li, Y., Willer, C.J., Ding, J., Scheet, P. and Abecasis, G.R. (2010) MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes. Genet. Epidemiol., 34, 816–834.
31. Wang, K., Li, M., Hadley, D., Liu, R., Glessner, J., Grant, S.F., Hakonarson, H. and Bucan, M. (2007) PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res., 17, 1665–1674.
32. The International HIV Controllers Study (2010) The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science, 330, 1551–1557.
33. de Bakker, P.I.W., Ferreira, M.A.R., Jia, X., Neale, B.M., Raychaudhuri, S. and Voight, B.F. (2008) Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum. Mol. Genet., 17, R122–R128.
34. Segrè, A.V., Groop, L., Mootha, V.K., Daly, M.J., Altshuler, D., Consortium, D. and investigators, M. (2010) Common inherited variation in mitochondrial genes is not enriched for associations with Type 2 diabetes or related glycemic traits. PLoS Genet., 6, e1001058.
35. Bushman, F.D., Malani, N., Fernandes, J., D’Orso, I., Cagney, G., Diamond, T.L., Zhou, H., Hazuda, D.J., Espeseth, A.S., Konig, R. et al. (2009) Host cell factors in HIV replication: meta-analysis of genome-wide studies. PLoS Pathogens, 5, e1000437.
36. Jager, S., Cimermancic, P., Gulbahce, N., Johnson, J.R., McGovern, K.E., Clarke, S.C., Shales, M., Mercenne, G., Pache, L., Li, K. et al. (2012) Global landscape of HIV–human protein complexes. Nature, 481, 365–370.
37. Schoggins, J.W., Wilson, S.J., Panis, M., Murphy, M.Y., Jones, C.T., Bieniasz, P. and Rice, C.M. (2011) A diverse range of gene products are effectors of the type I interferon antiviral response. Nature, 472, 481–485.
38. Yang, J., Lee, S.H., Goddard, M.E. and Visscher, P.M. (2011) GCTA: a tool for genome-wide complex trait analysis. Am. J. Hum. Genet., 88, 76–82.
39. Lee, S.H., Wray, N.R., Goddard, M.E. and Visscher, P.M. (2011) Estimating missing heritability for disease from genome-wide association studies. Am. J. Hum. Genet., 88, 294–305.

Chapter 6

Association study of common genetic variants and HIV-1 acquisition in 6,300 infected cases and 7,200 controls

McLaren PJ*, Coulonges C*, Ripke S, van den Berg L, Buchbinder S, Carrington M, Cossarizza A, Dalmau J, Deeks SG, Delaneau O, De Luca A, Goedert JJ, Haas D, Herbeck JT, Kathiresan S, Kirk GD, Lambotte O, Luo M, Mallal S, van Manen D, Martinez-Picado J, Meyer L, Miro JM, Mullins JI, Obel N, O'Brien SJ, Pereyra F, Plummer FA, Poli G, Qi Y, Rucart P, Sandhu MS, Shea PR, Schuitemaker H, Theodorou I, Vannberg F, Veldink J, Walker BD, Weintrob A, Winkler CA, Wolinsky S, Telenti A, Goldstein DB, de Bakker PI, Zagury JF, Fellay J. Association study of common genetic variants and HIV-1 acquisition in 6,300 infected cases and 7,200 controls. PLoS Pathog. 2013;9(7):e1003515

Association Study of Common Genetic Variants and HIV-1 Acquisition in 6,300 Infected Cases and 7,200 Controls

Paul J. McLaren1,2,3☯, Cédric Coulonges4,5☯, Stephan Ripke3,6, Leonard van den Berg7, Susan Buchbinder8, Mary Carrington9,10, Andrea Cossarizza11, Judith Dalmau12, Steven G. Deeks13, Olivier Delaneau14, Andrea De Luca15,16, James J. Goedert17, David Haas18, Joshua T. Herbeck19, Sekar Kathiresan3,20, Gregory D. Kirk21, Olivier Lambotte22,23,24, Ma Luo25,26, Simon Mallal27, Daniëlle van Manen28¤, Javier Martinez-Picado12,29, Laurence Meyer5,30, José M. Miro31, James I. Mullins19, Niels Obel32, Stephen J. O’Brien33, Florencia Pereyra10,34, Francis A. Plummer25,26, Guido Poli35, Ying Qi9, Pierre Rucart4,5, Manj S. Sandhu36,37, Patrick R. Shea38, Hanneke Schuitemaker28¤, Ioannis Theodorou5,39, Fredrik Vannberg40, Jan Veldink7, Bruce D. Walker10,41, Amy Weintrob42, Cheryl A. Winkler43, Steven Wolinsky44, Amalio Telenti2, David B. Goldstein38, Paul I. W. de Bakker3,45,46,47, Jean-François Zagury4,5, Jacques Fellay1,2*

1 School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
2 Institute of Microbiology, University Hospital Center and University of Lausanne, Lausanne, Switzerland
3 Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America
4 Laboratoire Génomique, Bioinformatique, et Applications, EA4627, Chaire de Bioinformatique, Conservatoire National des Arts et Métiers, Paris, France
5 ANRS Genomic Group (French Agency for Research on AIDS and Hepatitis), Paris, France
6 Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
7 Department of Neurology, Rudolf Magnus Institute of Neuroscience, University Medical Center Utrecht, Utrecht, The Netherlands
8 Bridge HIV, San Francisco Department of Public Health, San Francisco, California, United States of America
9 Cancer and Inflammation Program, Laboratory of Experimental Immunology, SAIC Frederick, Inc., Frederick National Laboratory for Cancer Research, Frederick, Maryland, United States of America
10 Ragon Institute of MGH, MIT and Harvard, Boston, Massachusetts, United States of America
11 Department of Surgery, Medicine, Dentistry and Morphological Sciences University of Modena and Reggio Emilia School of Medicine, Modena, Italy
12 AIDS Research Institute IrsiCaixa, Institut d’Investigació en Ciències de la Salut Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona, Spain
13 Department of Medicine, University of California, San Francisco, California, United States of America
14 Department of Statistics, University of Oxford, Oxford, United Kingdom
15 University Division of Infectious Diseases, Siena University Hospital, Siena, Italy
16 Institute of Clinical Infectious Diseases, Università Cattolica del Sacro Cuore, Roma, Italy
17 Infections and Immunoepidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, Maryland, United States of America
18 Vanderbilt University School of Medicine, Nashville, Tennessee, United States of America
19 Department of Microbiology, University of Washington, Seattle, Washington, United States of America
20 Cardiovascular Research Center and Center for Human Genetic Research, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
21 Department of Epidemiology, Johns Hopkins University, Baltimore, Maryland, United States of America
22 INSERM U1012, Bicêtre, France
23 University Paris-Sud, Bicêtre, France
24 AP-HP, Department of Internal Medicine and Infectious Diseases, Bicêtre Hospital, Bicêtre, France
25 Department of Medical Microbiology, University of Manitoba, Winnipeg, Manitoba, Canada
26 National Microbiology Laboratory, Winnipeg, Manitoba, Canada
27 Institute for Immunology & Infectious Diseases, Murdoch University and Pathwest, Perth, Australia
28 Department of Experimental Immunology, Sanquin Research, Landsteiner Laboratory, and Center for Infectious Diseases and Immunity Amsterdam (CINIMA) at the Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands
29 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
30 Inserm, CESP U1018, University Paris-Sud, UMRS 1018, Faculté de Médecine Paris-Sud; AP-HP, Hopital Bicêtre, Epidemiology and Public Health Service, Le Kremlin Bicêtre, France
31 Infectious Diseases Service. Hospital Clinic – IDIBAPS, University of Barcelona, Barcelona, Spain
32 Department of Infectious Diseases, The National University Hospital, Rigshospitalet, Copenhagen, Denmark
33 Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
34 Division of Infectious Disease, Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
35 Division of Immunology, Transplantation and Infectious Diseases, Vita-Salute San Raffaele University, School of Medicine & San Raffaele Scientific Institute, Milan, Italy
36 Genetic Epidemiology Group, Wellcome Trust Sanger Institute, Hinxton, United Kingdom
37 Non-Communicable Disease Research Group, Department of Public Health and Primary Care, University of Cambridge, Cambridge, United Kingdom
38 Center for Human Genome Variation, Duke University School of Medicine, Durham, North Carolina, United States of America
39 INSERM UMRS 945, Paris, France
40 School of Biology, Georgia Institute of Technology, Atlanta, Georgia, United States of America
41 Howard Hughes Medical Institute, Chevy Chase, Maryland, United States of America
42 Infectious Disease Clinical Research Program, Uniformed Services University of the Health Sciences, Bethesda, Maryland, United States of America
43 Basic Research Laboratory, Molecular Genetic Epidemiology Section, Center for Cancer Research, NCI, SAIC-Frederick, Inc., Frederick National Laboratory, Frederick, Maryland, United States of America
44 Division of Infectious Diseases, The Feinberg School of Medicine, Northwestern University, Chicago, Illinois, United States of America
45 Division of Genetics Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts, United States of America
46 Department of Medical Genetics, University Medical Center Utrecht, Utrecht, The Netherlands
47 Department of Epidemiology, University Medical Center Utrecht, Utrecht, The Netherlands

Abstract

Multiple genome-wide association studies (GWAS) have been performed in HIV-1 infected individuals, identifying common genetic influences on viral control and disease course. Similarly, common genetic correlates of acquisition of HIV-1 after exposure have been interrogated using GWAS, although in generally small samples. Under the auspices of the International Collaboration for the Genomics of HIV, we have combined the genome-wide single nucleotide polymorphism (SNP) data collected by 25 cohorts, studies, or institutions on HIV-1 infected individuals and compared them to carefully matched population-level data sets (a list of all collaborators appears in Note S1 in Text S1). After imputation using the 1,000 Genomes Project reference panel, we tested approximately 8 million common DNA variants (SNPs and indels) for association with HIV-1 acquisition in 6,334 infected patients and 7,247 population samples of European ancestry. Initial association testing identified the SNP rs4418214, the C allele of which is known to tag the HLA-B*57:01 and B*27:05 alleles, as genome-wide significant (p = 3.6×10^-11). However, restricting analysis to individuals with a known date of seroconversion suggested that this association was due to the frailty bias in studies of lethal diseases. Further analyses including testing recessive genetic models, testing for bulk effects of non-genome-wide significant variants, stratifying by sexual or parenteral transmission risk and testing previously reported associations showed no evidence for genetic influence on HIV-1 acquisition (with the exception of CCR5Δ32 homozygosity). Thus, these data suggest that genetic influences on HIV acquisition are either rare or have smaller effects than can be detected by this sample size.

PLOS Pathogens | www.plospathogens.org 1 July 2013 | Volume 9 | Issue 7 | e1003515 GWAS of HIV-1 Acquisition in 13,500 Individuals

Citation: McLaren PJ, Coulonges C, Ripke S, van den Berg L, Buchbinder S, et al. (2013) Association Study of Common Genetic Variants and HIV-1 Acquisition in 6,300 Infected Cases and 7,200 Controls. PLoS Pathog 9(7): e1003515. doi:10.1371/journal.ppat.1003515

Editor: François Balloux, University College London, United Kingdom

Received March 8, 2013; Accepted June 7, 2013; Published July 25, 2013

This is an open-access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Funding: Initial funding for the International Collaboration for the Genomics of HIV was provided by the NIH Office of AIDS Research. This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under contract HHSN26120080001E. The MIGen study was funded by the US NIH National Heart, Lung, and Blood Institute’s STAMPEED genomics research program through grant R01 HL087676. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government. This research was supported (in part) by the Intramural Research Program of the NIH, National Cancer Institute, Center for Cancer Research. A full statement of funding for the contributing cohorts is listed in the supplementary materials. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

* E-mail: [email protected]

¤ Current address: Crucell Holland BV, Leiden, The Netherlands.

☯ These authors contributed equally to this work.

Author Summary

Comparing the frequency differences between common DNA variants in disease-affected cases and in unaffected controls has been successful in uncovering the genetic component of multiple diseases. This approach is most effective when large samples of cases and controls are available. Here we combine information from multiple studies of HIV infected patients, including more than 6,300 HIV+ individuals, with data from 7,200 general population samples of European ancestry to test nearly 8 million common DNA variants for an impact on HIV acquisition. With this large sample we did not observe any single common genetic variant that significantly associated with HIV acquisition. We further tested 22 variants previously identified by smaller studies as influencing HIV acquisition. With the exception of a deletion polymorphism in the CCR5 gene (CCR5Δ32) we found no convincing evidence to support these previous associations. Taken together these data suggest that genetic influences on HIV acquisition are either rare or have smaller effects than can be detected by this sample size.

Introduction

Variation in infection susceptibility and severity is a hallmark of infectious disease biology. This natural variation can be attributed to a variety of host, pathogen and environmental factors, including host genetics. Several genome-wide association studies (GWAS) of HIV-1 outcomes have been performed primarily to assess the impact of human genetic variation on plasma viral load and/or disease progression [1,2,3,4,5,6,7,8,9,10,11]. These studies have confirmed the key role of major histocompatibility complex (MHC) polymorphisms in HIV-1 control, with a minor impact of variants in the CCR5 gene region.

A smaller number of GWAS have also investigated host genetic influences on HIV-1 acquisition using samples of individuals with known or presumed exposure to an HIV-1 infected source [12,13,14,15,16]. With the exception of CCR5Δ32 homozygosity (known to explain a proportion of HIV-1 resistance in Europeans [17]), no reproducible associations with increased or reduced HIV-1 acquisition have been observed. Additionally, several variants reported to influence HIV-1 acquisition by candidate gene studies have either failed to be replicated or lacked sufficient investigation as to be considered confirmed.

We here describe a large study of human genetic determinants of HIV-1 acquisition, performed under the auspices of the International Collaboration for the Genomics of HIV, a collaborative research effort bringing together the HIV-1 host genetics community. By collecting for the first time all available genome-wide single nucleotide polymorphism (SNP) data on HIV-1 infected individuals and comparing them with population-level control data sets we sought to uncover common genetic markers that influence HIV-1 acquisition.

Results

Association testing and meta-analysis

Genome-wide genotype data were collected from 25 cohort studies and clinical centers (listed at the end of the paper and in Note S1 in Text S1). We obtained a data set of 11,860 HIV-1 infected individuals genotyped at multiple centers using several platforms (Table S1 in Text S1). The present analysis focused on the subset of these individuals that are of European ancestry as assessed by principal components (PCs) analysis (see methods). For two of the genotyping centers, matched HIV-1 uninfected controls were available. For the remaining samples, large population-level control data sets were accessed from the Illumina Genotype Control Database (www.illumina.com) and the Myocardial Infarction Genetics (MIGen) Consortium (genotyped using the Affymetrix 6.0 platform) [18]. Sample-level quality control and case-control matching (Figure S1 in Text S1) resulted in six non-overlapping data sets including 6,334 HIV-1 infected cases and 7,247 controls (Table S1 in Text S1). After imputation, each variant was individually tested for association with HIV-1 status by logistic regression including PCs to correct for residual population structure, under additive and recessive genetic models. Association results were then combined across data sets.

Restricting to variants observed in all six data sets with >1% frequency and a minimum imputation quality of 0.8 in at least 2 groups, approximately 8×10⁶ common variants (SNPs and indels) were tested. The overall distribution of p-values was highly consistent with the null hypothesis (λ₁₀₀₀ = 1.01) suggesting that the matching strategy was successful in minimizing inflation (Figure 1a).

We observed 11 SNPs with combined evidence for association passing the genome-wide significance threshold (p < 5×10⁻⁸, Figure 1b) under an additive genetic model. All genome-wide significant SNPs were located in the MHC region, centered on the class I HLA genes HLA-B/HLA-C (Figure 2a and Table S2 in Text S1). The top SNP, rs4418214 (p = 3.6×10⁻¹¹, odds ratio (OR) for the C allele = 1.52) has previously been associated with control of HIV-1 viral load [8], with the C allele tagging the classical HLA-B alleles 57:01 and 27:05, both known to associate with lower viral load and longer survival after infection. Analysis assuming a recessive genetic model did not identify any genome-wide significant associations (data not shown).

Exploration of top associations

Since variation in the HLA region is well known to impact rate of HIV-1 disease progression and not acquisition, we sought to better understand the observed associations at this locus. Due to their shorter survival time, patients with rapid disease progression are underrepresented in seroprevalent cohorts, while individuals with prolonged disease-free survival times are more likely to be included, leading to an enrichment of factors that protect against disease progression in such populations. Additionally, some of the cohorts accessed for this analysis specifically recruited long-term non-progressors (LTNPs, Groups 2, 3 and 4). Inspection of the effect estimates at the top SNP (rs4418214) per data set showed that the majority of the association signal was driven by groups specifically enriched for LTNPs (Figure 2b) suggesting a possible frailty bias in the overall results.

To assess the potential contribution of frailty bias, we ran association testing as previously but restricting the case population to 2,173 individuals with a known date of seroconversion that were not enrolled in LTNP cohorts. Association testing in this sample showed no variants passing the genome-wide significance threshold. Additionally, rs4418214 dramatically dropped in strength of association to p = 0.02, with all other previously genome-wide significant SNPs suffering a similar loss in association strength (Figure 2c and Table S2 in Text S1). In order to address whether this loss of association signal could be due to the reduced size of the case population rather than frailty bias, we performed a sensitivity analysis where we tested for association at rs4418214 restricting the HIV+ cases to 2,173 individuals randomly selected from the full case sample. We repeated this procedure 1,000 times and compared the p-value from the random case selection to that obtained when restricting to seroconverters. Of these 1,000 tests, only one resulted in a loss of association signal that was similar to what was observed when restricting to seroconverters (Figure S2 in Text S1). This suggests that the signal observed in the full acquisition analysis is most likely due to frailty bias.

Polygenic analysis

Previous studies in large cohorts have shown that multiple genetic variants with small effect sizes that contribute to complex traits, but fall below the genome-wide significance threshold, can be detected by examining the consistency of their combined effects across studies [19]. We sought to test for evidence of such polygenic inheritance in our study population. To do this (and to avoid overfitting), we split our sample into a discovery set (Groups 1, 2, 4, 5 and 6) and a test set (Group 3) and performed genome-wide association testing and meta-analysis on the discovery set. Based on these results, we generated sets of high-quality SNPs (minor allele frequency >0.1, imputation accuracy >0.9) in relative linkage equilibrium (r² < 0.1, informed by p-value in the discovery set, see methods) falling below various p-value thresholds (PT). Scores were then generated for all individuals in Group 3 by summing the weighted genotype dosage (using the log odds ratio from the discovery set as weights) of all SNPs below a given PT. Phenotype was then regressed on this score using logistic regression including covariates. We assessed both the significance of the score and the phenotypic variance explained (using Nagelkerke’s pseudo-R² [20]). We did not observe a significant association between the calculated score and phenotype in the discovery set at any PT (Figure 3). This further suggests that effects of common variants on HIV-1 acquisition detectable by this study design are negligible.

Analysis by transmission risk

Since different modes of HIV-1 transmission may be influenced by different host factors, we further investigated if genetic variants may contribute to enhanced HIV-1 acquisition within transmission risk sub-groups. We stratified the study population by reported risk groups that were either primarily sexual (homosexual and heterosexual, n = 3,311) or parenteral (injection drug use and transfusion, n = 1,046). Association results in these sub-groups were consistent with those observed in the full set with no genome-wide significant signals detected (data not shown).

Association testing of variants previously reported to influence HIV-1 acquisition

With the exception of CCR5Δ32 (addressed in the next section), many variants reported to influence HIV-1 acquisition have remained unconfirmed. We sought to assess the evidence for

Figure 1. Association results for approximately 8 million common DNA variants tested for an impact on HIV-1 acquisition. A) Quantile-quantile plot of association results after meta-analysis across the six groups. For each variant tested, the observed −log₁₀ p-value is plotted against the null expectation (dashed line). P-values lower than 5×10⁻⁸ are truncated for visual effect. B) Manhattan plot of association results where each variant is plotted by genomic position (x-axis) and −log₁₀ p-value (y-axis). Only variants in the MHC region on chromosome 6 have p-values below genome-wide significance (p < 5×10⁻⁸, dashed line, large diamonds). doi:10.1371/journal.ppat.1003515.g001
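The quantile-quantile plot in panel A is commonly summarized by a genomic inflation factor, and the Results report this factor rescaled to 1,000 cases and 1,000 controls. The sketch below is a minimal illustration and not the authors' pipeline: it assumes the usual median-based estimator and the standard sample-size rescaling, neither of which is spelled out in the text, and the function names `genomic_lambda` and `lambda_1000` are hypothetical.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a 1-df chi-square variable: the squared 75th-percentile z-score.
MEDIAN_CHI2_1DF = _STD_NORMAL.inv_cdf(0.75) ** 2


def genomic_lambda(p_values):
    """Median-based inflation factor from two-sided p-values in (0, 1]."""
    # A two-sided p-value p corresponds to a 1-df chi-square statistic z**2,
    # where z is the standard normal quantile at p / 2.
    chi2 = [_STD_NORMAL.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / MEDIAN_CHI2_1DF


def lambda_1000(lam, n_cases, n_controls):
    """Rescale an inflation factor to 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


if __name__ == "__main__":
    # Evenly spaced p-values mimic the null expectation, so lambda is near 1.
    null_p = [(i + 0.5) / 10_000 for i in range(10_000)]
    print(round(genomic_lambda(null_p), 3))
    # Sample sizes are from the text; the raw lambda of 1.07 is illustrative only.
    print(round(lambda_1000(1.07, 6334, 7247), 3))
```

Because the study is much larger than the 1,000 + 1,000 reference design, the rescaling pulls any raw inflation strongly toward 1, which is why the rescaled value is the one usually compared across studies of different size.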


association of 22 variants previously reported to influence HIV-1 acquisition in this sample. All 22 of these variants could be measured in this sample either through direct genotyping or imputation. Of these, only one variant (rs1800872) showed nominal significance (p < 0.05, Table 1) although it did not survive correction for the number of variants tested (p > 2.5×10⁻³). Thus, none of the previously reported associations can be considered confirmed in this large sample.

Figure 2. Common DNA variants within the MHC region that are associated with HIV-1 acquisition comparing 6,334 HIV-1 infected patients to 7,247 population controls are driven by HIV-1 controllers and not maintained when restricting to patients with known dates of seroconversion. A) Regional association plot of the locus containing genome-wide significant SNPs after meta-analysis. The signal of association is centered on the HLA-B/HLA-C genes. The association result for the top SNP, rs4418214, is indicated by the purple diamond, with dark blue indicating SNPs in high LD (r² > 0.8), light blue indicating moderate LD (r² between 0.2 and 0.8) and grey indicating low or no LD (r² < 0.2) with rs4418214. The dashed line indicates genome-wide significance (p < 5×10⁻⁸). The location of classical class I and class II HLA genes (green arrows) is given as reference. B) Forest plot of effect estimates for the C allele at rs4418214 with 95% confidence intervals per group (box and whiskers) and after meta-analysis (diamond). The majority of the association signal is contributed by Groups 3 and 4, which are enriched for HIV-1 controllers. C) Regional association plot of the same variants as in A) but restricting analysis to include only individuals with a known date of seroconversion to limit frailty bias. doi:10.1371/journal.ppat.1003515.g002

Power for variant detection

Parameters required for determining power for variant detection, specifically the trait prevalence and the level of enrichment of enhanced HIV-1 acquisition, are difficult to estimate given this study design. Thus, we sought to determine the extent to which we could detect known genetic influences on HIV-1 acquisition in this sample by assessing the depletion of CCR5Δ32 homozygosity in the HIV-1 infected sample. Although this variant is not captured by commercial arrays (and is not included in the 1,000 Genomes Project reference panel), genotypes of the deletion were available for a majority of the HIV-1 infected individuals (n = 4,854). As expected, we observed very few Δ32/Δ32 homozygous individuals in this sample (n = 4) and a large deviation from Hardy-Weinberg equilibrium (Table S3 in Text S1).

To assess the association strength of this variant, we used a subset of our sample with available CCR5Δ32 genotypes to build a reference panel, which was then used for imputation of CCR5Δ32 in both cases and controls (see methods). Overall the imputation accuracy was acceptable (average information score = 0.82) and we observed good correspondence between typed and imputed dosage (Figure S3 in Text S1). Using a recessive genetic model, we observed a genome-wide significant association between CCR5Δ32 homozygosity and HIV-1 acquisition (p = 5×10⁻⁹, OR = 0.2). No impact on HIV-1 acquisition was observed under any other genetic model.
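The CCR5Δ32 tests above differ only in how an imputed genotype is turned into a predictor for logistic regression. As a hedged illustration (the encoding actually used by the imputation and association tools is not described in the text), the sketch below assumes each individual has three imputed genotype probabilities and derives additive, recessive and heterozygote predictors from them. The type alias and function names are hypothetical.

```python
from typing import Tuple

# Assumed input: (P(0 copies), P(1 copy), P(2 copies)) of the deletion allele
# for one individual, as produced by a generic imputation step.
GenotypeProbs = Tuple[float, float, float]


def additive_dosage(probs: GenotypeProbs) -> float:
    """Expected number of deletion alleles, between 0 and 2."""
    _, p_het, p_hom = probs
    return p_het + 2.0 * p_hom


def recessive_dosage(probs: GenotypeProbs) -> float:
    """Expected value of the 'homozygous for the deletion' indicator."""
    return probs[2]


def heterozygote_dosage(probs: GenotypeProbs) -> float:
    """Expected value of the 'carries exactly one copy' indicator."""
    return probs[1]


if __name__ == "__main__":
    # Illustrative probabilities only; these are not data from the study.
    confident_homozygote = (0.0, 0.0, 1.0)
    uncertain_carrier = (0.1, 0.8, 0.1)
    for person in (confident_homozygote, uncertain_carrier):
        print(additive_dosage(person),
              recessive_dosage(person),
              heterozygote_dosage(person))
```

Under the recessive coding only the probability of carrying two copies enters the model, which is why a strongly recessive effect can reach significance there while the additive coding of the same variant shows nothing.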


To address whether the association signal at CCR5Δ32 was subject to the same frailty bias as the MHC SNPs, we next tested for association between CCR5Δ32 and HIV acquisition restricting only to the 2,173 HIV+ individuals with known dates of seroconversion. Using these individuals, CCR5Δ32 remains strongly associated (p = 1×10⁻⁶ for the recessive model), suggesting that the observed association statistic in the full set is not simply due to frailty bias. This demonstrates that, despite an inability to precisely estimate power, other variants of similar or somewhat weaker effect could also have been detected in this sample.

Figure 3. Analysis of bulk SNP effects shows no evidence for enrichment of association signal across data sets. LD pruned SNP sets falling below various p-value thresholds (grey shades, x-axis) were selected based on association results calculated in five of six groups (discovery set). Per individual scores were calculated in a non-overlapping test set (Group 3) by summing the beta-weighted dosage of all SNPs in that set. Model p-value (listed above bars) and variance explained (using Nagelkerke’s pseudo-R², y-axis) were calculated by regressing phenotype on per individual score using logistic regression. doi:10.1371/journal.ppat.1003515.g003

Discussion

By assembling a large collaborative network of cohorts and institutions involved in HIV-1 host genetic studies we sought to test for common genetic polymorphisms that influence HIV-1 acquisition. Through this network, we were able to combine genome-wide SNP data on over 6,300 HIV-1 infected patients of European ancestry. In order to maximize power, we further accessed large population-level genotype data sets to use as controls. Where necessary, case/control samples were iteratively matched to limit inflation in the test statistic due to platform or cohort effects. Genome-wide imputation using the 1,000 Genomes Project CEU sample as a reference panel resulted in a set of approximately 8×10⁶ high-quality variants that were tested for association with HIV-1 acquisition. We observed 11 variants that passed the genome-wide significance threshold, all located in the MHC region. Imputation and association testing of the CCR5Δ32 polymorphism demonstrated that this sample size and study design are appropriate to detect strong associations that impact HIV-1 acquisition.

The fact that the top association in the full analysis (rs4418214) is a tag SNP for HLA-B*57:01 and 27:05 highlights the frailty bias inherent to studies of diseases with high mortality rates. HLA-B alleles have been associated with reduced HIV-1 transmission in heterosexual couples [21], likely due to the effects of HLA-B on HIV-1 viral load, which decreases infectiousness. To further explore the possibility that HLA-B alleles are also associated with HIV-1 acquisition, we ran an analysis restricting the case population to the 2,173 individuals with a known date of seroconversion, assuming that cohorts of patients recruited soon after HIV-1 acquisition are less likely to suffer from frailty bias. This analysis resulted in an almost complete loss of signal at rs4418214 that is unlikely to be due to the reduction in size of the case population. Thus, the most parsimonious explanation for the association result in the HLA class I region is that it reflects an enrichment of alleles that protect against disease progression (hence survival) rather than increasing acquisition.

Under ideal circumstances, this sample size provides approximately 80% power to detect a common variant (MAF = 0.1) with genotypic relative risk of 1.3 at genome-wide significance. However, we recognize that the present study design allows for a proportion of the sample to be misclassified (i.e. individuals at average or low susceptibility to HIV-1 infection included as cases) which can reduce power [22]. Nevertheless, even under assumptions including a large proportion of controls in the case group, this sample size is suitable to discover large effect variants (GRR > 3, Figure S4 in Text S1). This is further evidenced by our ability to detect the known effect of CCR5Δ32 homozygosity on HIV-1 acquisition in this sample, even given imperfect imputation.

Additionally, the lack of enrichment of the control population for individuals with proven or suspected resistance against HIV-1 infection may also influence power [23]. However, in line with our results, GWAS looking at HIV-1 acquisition in mother-to-child transmission pairs [12], discordant couples [13], areas of heightened prevalence [14] and in hemophiliacs exposed to potentially contaminated blood products [16] (although much smaller than the present study) have been similarly unable to discover novel associations.

This large study population is useful for attempting to replicate previous associations, particularly with genetic variants thought to reduce HIV-1 acquisition, as they would be depleted in infected individuals. None of the 22 previously reported variants tested in this sample were associated with HIV-1 acquisition after correcting for multiple tests. This lack of replication is consistent with other, smaller GWAS of this phenotype [14]. These data suggest that many or all of these variants do not appreciably impact HIV-1 acquisition. Thus, evidence is mounting that common polymorphisms affecting acquisition are either very difficult to detect (perhaps due to weak effects) or absent, with the exception of CCR5Δ32 homozygosity.

The early observation that CCR5Δ32 influences both acquisition (when homozygous) and disease progression (when heterozygous) suggested shared biology between these phenotypes. However, this proved not to be a generalizable observation since variation at other loci, such as HLA class I and KIR, associate with disease progression but are not generally believed to modulate acquisition. Mechanisms mediating acquisition i.e. permissiveness to HIV upon parenteral or mucosal exposure, likely involve cellular targets and innate immune factors that play none or a limited role in disease progression. On the other hand, mediators of host tolerance (as defined by [24]) and of acquired immunity are only expected to exert their effects after infection has been established.

Although this study focuses on the host genetics of HIV-1 acquisition, it is possible that the extensive variation in HIV-1 genotype also plays a role in determining susceptibility. This


Table 1. Results for 22 SNPs previously reported to affect HIV-1 acquisition sorted by reported effect and genomic location.

SNP         CHR  BP (hg19)  A1  A2  Freq HIV+  Freq HIV−  OR    SE     P     Gene    Reported effect on acquisition  Reference
rs1800872   1    206946407  T   G   0.245      0.232      1.08  0.030  0.01  IL10    Increased                       [34]
rs3732378   3    39307162   A   G   0.163      0.164      0.97  0.035  0.35  CX3CR1  Increased                       [35]
rs3732379   3    39307256   T   C   0.279      0.282      0.98  0.028  0.46  CX3CR1  Increased                       [35]
rs6850      7    44836314   G   A   0.123      0.133      0.94  0.039  0.09  PPIA    Increased                       [36]
rs754618    10   44886206   T   C   0.311      0.304      1.01  0.028  0.73  CXCL12  Increased                       [37]
rs1946518   11   112035458  G   T   0.590      0.592      0.98  0.026  0.49  IL18    Increased                       [38]
rs2280789   17   34207003   G   A   0.136      0.134      1.04  0.038  0.30  CCL5    Increased                       [39]
rs2280788   17   34207405   C   G   0.022      0.023      0.90  0.088  0.25  CCL5    Increased                       [39]
rs2107538   17   34207780   T   C   0.183      0.180      1.02  0.034  0.49  CCL5    Increased                       [39]
rs2549782   5    96231000   T   G   0.477      0.477      1.00  0.026  0.94  ERAP2   Decreased                       [40]
rs2070729   5    131819921  A   C   0.428      0.426      1.02  0.026  0.50  IRF1    Decreased                       [41]
rs2070721   5    131825842  G   T   0.427      0.426      1.02  0.026  0.50  IRF1    Decreased                       [41]
rs6996198   8    65463442   T   C   0.159      0.167      0.97  0.035  0.46  CYP7B1  Decreased                       [42]
rs1552896   9    14841387   G   C   0.227      0.227      1.01  0.032  0.77  FREM1   Decreased                       [15]
rs1801157   10   44868257   T   C   0.200      0.209      0.97  0.032  0.36  CXCL12  Decreased                       [37]
rs10838525  11   5701001    T   C   0.357      0.355      1.00  0.027  0.95  TRIM5   Decreased                       [43]
rs3740996   11   5701281    A   G   0.113      0.117      0.93  0.040  0.05  TRIM5   Decreased                       [44]
rs1024611   17   32579788   G   A   0.267      0.277      0.95  0.029  0.08  CCL2    Decreased                       [45]
rs1024610   17   32580231   T   A   0.200      0.205      0.97  0.032  0.31  CCL2    Decreased                       [46]
rs2857657   17   32583132   G   C   0.196      0.200      0.97  0.032  0.32  CCL2    Decreased                       [46]
rs4795895   17   32611446   A   G   0.193      0.196      0.97  0.032  0.40  CCL11   Decreased                       [46]
rs1719134   17   34416946   A   G   0.240      0.231      1.05  0.031  0.13  CCL3    Decreased                       [45]

Reported effects correspond to the A1 allele. Frequency and odds ratio (OR) are calculated for the A1 allele with an OR > 1 indicating a higher frequency of A1 in the HIV-1 infected sample. doi:10.1371/journal.ppat.1003515.t001

notion is supported by the observation that amino acid changes, generally in response to host HLA pressure, result in decreased viral fitness (reviewed in [25]). However, defining viral variants that limit or enhance infection would require large-scale epidemiological investigations in HIV-1 endemic areas.

Despite the large sample size and comprehensive genotype information obtained through imputation, this study is still limited to analysis of common variation with detectable effects present in European samples. Thus, we cannot rule out whether multiple common variants of small effect, population-specific variants or rare variants exist that influence HIV-1 acquisition. Of particular note, in light of the well-known effect of CCR5Δ32 on HIV acquisition, is the inability to comprehensively test structural variation using array-based genotyping platforms. Although SNPs contained on commercial arrays have been shown to tag a large proportion of common structural variants [26] it is still possible that unobserved or poorly tagged structural variants contribute to HIV acquisition. Detection of these types of effects will require large-scale sequencing efforts, preferably in samples with known levels of exposure to HIV-1.

Materials and Methods

Ethics statement

Ethical approval for this study was obtained from institutional review boards at each of the Cohorts, Studies and Centers listed at the end of the manuscript. All subjects provided written informed consent.

Sample collection, genotyping and quality control

The International Collaboration for the Genomics of HIV was established as a platform to combine all available genome-wide SNP data sets obtained on HIV-1 infected individuals worldwide. Patient material was collected at multiple clinical centers across North America, Europe, Australia and Africa (a list of contributing cohort studies and centers is given at the end of the paper). Genotypes for uninfected control individuals were obtained directly from three of the participating centers (GRIV, ACS, CHAVI) and from the Illumina genotype control database (www.illumina.com/icontroldb) and the Myocardial Infarction Genetics Consortium (MIGen) (NIH NCBI dbGaP Study Accession: phs000294.v1.p1) [18]. Each data set was subject to quality control procedures performed prior to centralization of all data for the combined analysis. However, to ensure consistency, all data were subject to further quality control once submitted. Per data set, samples with high missingness (<95% of sites successfully genotyped) and high heterozygosity (inbreeding coefficient >0.1) were removed. Ancestry was determined using EIGENSTRAT to project sample data onto the HapMap III reference panel. For the present analysis, only individuals clustering with the CEU/TSI subset were retained. To remove samples genotyped by multiple centers (and those with high relatedness) we performed identity-by-state analysis taking the intersection of SNPs across all genotyping platforms, using PLINK version 1.07 [27]. In the case of duplicates, the sample contributing the larger number of genotyped SNPs was retained. We further filtered out individuals with relatedness higher than 0.125, adopting a strategy to

maximize sample retention. After sample removal, SNPs with high missingness (>2%), low minor allele frequency (<1%) or that were out of Hardy-Weinberg equilibrium (p < 1×10⁻⁶) were removed.

Case/control matching

To limit bias introduced due to the majority of the control samples being genotyped separately from cases we used a 2-stage case/control matching strategy. In the first round, cases and controls were matched by platform and geographic origin. This resulted in four clusters: The Netherlands (Illumina, Group 1), France (Illumina, Group 2), North America and non-Dutch/non-French European (Illumina, Groups 3 and 4), North America and non-Dutch/non-French European (Affymetrix, Groups 5 and 6). To test the success of this method at controlling inflation, we ran association testing on all genotyped SNPs including the top PCs as covariates per cluster and assessed lambda (Figure S1 in Text S1). For samples ascertained from France and The Netherlands, this was sufficient to control inflation in the test statistic (λ~1, Figures S1a–d in Text S1). For the remaining two clusters, we plotted each sample based on their coordinates across the top two PCs and split each cluster into two sub-clusters based on these coordinates. Sub-clusters then underwent either 1:3 or 1:1 case/control matching using Euclidean distance across the top 10 PCs (with the top PC given twice the weight of the others). Samples were removed if no suitable match could be identified. This strategy proved sufficient to control inflation in these remaining clusters (Figures S1e–l in Text S1).

Imputation and association testing

After sample matching and per group quality control, unobserved SNP genotypes were imputed using the 1,000 Genomes Project Phase I release integrated SNPs and indels (March 2012). Two teams from this Collaboration performed the analysis independently using different tools. The first team used BEAGLE [28], the second team used the pipeline IMPUTE2, SNPTEST and META [29,30] with a pre-phasing step using ShaPEIT [31]. Per group, phenotype was regressed on genotype dosage including population covariates calculated by EIGENSTRAT to control for residual structure under both additive and recessive genetic models. Association results were then combined using inverse-variance weighted meta-analysis including a covariate to correct for group-specific effects. The results obtained by each team were compared for cross-validation and found to be highly consistent (Figure S5 in Text S1). SNPs were considered associated if the combined p-value was below the accepted level of genome-wide significance (p < 5×10⁻⁸).

Polygenic analysis

We performed analysis to test for evidence of polygenic effects using five of the six groups as a discovery set and Group 3 (the largest single group) as the test set. To build a SNP set we first filtered out all SNPs with low minor allele frequency (MAF < 0.1) and low imputation quality (R² < 0.9) and removed the MHC region. We then performed LD pruning informed by the p-value calculated in the discovery set such that the SNP with the lowest p-value was selected and all other SNPs in LD (r² > 0.1) were removed. The SNP with the lowest remaining p-value was then selected and again all other SNPs in LD were removed. This procedure was repeated until no remaining SNPs fell below the selected PT. In the test set, per individual scores were generated by summing the dosage of all SNPs in a set weighted by the effect size (beta) calculated in the discovery set. We then regressed phenotype on this score using logistic regression including top PCs. SNP set pruning was performed using PLINK version 1.07 [27], logistic regression, calculation of variance explained and results visualization was performed using R version 2.12 (www.r-project.org) and the Design package [32].

Testing previous associations

A list of SNPs previously reported to associate with HIV-1 acquisition was taken from Petrovski et al [14] and updated to include recently reported associations. All SNPs were either directly genotyped or imputed, and tested in the same logistic regression/meta-analysis framework as all other variants.

Imputation and association testing of CCR5Δ32

CCR5Δ32 genotypes were obtained by individual cohorts using either Sequenom genotyping, PCR or direct sequencing as described in the original publications. Since genotype of this deletion was not available in the control populations we used a subset of the HIV+ sample with both genome-wide genotypes and CCR5Δ32 types as a reference panel for imputation. For this, we used the subset typed on the Illumina 1M platform (n = 1,100) to maximize SNP coverage. Additionally, we included 383 non-overlapping individuals with known CCR5Δ32 genotype from a recent GWAS in hemophiliacs [16]. Phasing of the reference panel and imputation was performed using ShaPEIT [31] and IMPUTE2 [29,30]. We imputed CCR5Δ32 genotype in both cases and controls using a leave-one-out strategy such that, if an individual was present in both the reference and test sample, their genotype information was removed from the reference panel and imputation was carried out using the remaining samples as reference. Association was tested under a recessive model and assuming an additive or heterozygous advantage model.

Estimating power for variant detection

Power for variant detection was estimated over a wide range of possible proportions of controls being misclassified as cases (Figure S4 in Text S1). Calculations were made under an additive genetic model assuming a risk variant of 10% frequency for a study of 6,300 cases and 7,200 controls at genome-wide significance (p < 5×10⁻⁸). Calculations were performed using PAWE-3D [22,33].

Cohorts, studies, and centers participating in the International Collaboration for the Genomics of HIV

1. The AIDS clinical Trial Group (ACTG) in the USA
2. The AIDS Linked to the IntraVenous Experience (ALIVE) Cohort in Baltimore, USA
3. The Amsterdam Cohort Studies on HIV infection and AIDS (ACS) in the Netherlands
4. The ANRS CO18 in France
5. The ANRS PRIMO Cohort in France
6. The Center for HIV/AIDS Vaccine Immunology (CHAVI) in the USA
7. The Danish HIV Cohort Study in Denmark
8. The Genetic and Immunological Studies of European and African HIV-1+ Long Term Non-Progressors (GISHEAL) Study, in France and Italy
9. The GRIV Cohort in France
10. The Hemophilia Growth and Development Study (HGDS) in the USA
11. The Hospital Clinic-IDIBAPS Acute/Recent HIV-1 Infection cohort in Barcelona, Spain
12. The Icona Foundation Study in Italy

PLOS Pathogens | www.plospathogens.org | July 2013 | Volume 9 | Issue 7 | e1003515 | GWAS of HIV-1 Acquisition in 13,500 Individuals

13. The International HIV Controllers Study in Boston, USA
14. The IrsiCaixa Foundation Acute/Recent HIV-1 Infection cohort in Barcelona, Spain
15. The Modena Cohort in Modena, Italy
16. The Multicenter AIDS Cohort Study (MACS), in Baltimore, Chicago, Pittsburgh and Los Angeles, USA
17. The Multicenter Hemophilia Cohort Studies (MHCS)
18. The NCI Laboratory of Genomic Diversity in Frederick, USA
19. The Pumwani Sex Workers Cohort in Nairobi, Kenya, and Winnipeg, Canada
20. The San Francisco City Clinic Cohort (SFCCC) in San Francisco, USA
21. The Sanger RCC Study in Oxford, UK, and in Uganda
22. The Swiss HIV Cohort Study (SHCS), in Switzerland
23. The US military HIV Natural History Study (NHS)
24. The Wellcome Trust Case Control Consortium (WTCCC3) study of the genetics of host control of HIV-1 infection in the Gambia
25. The West Australian HIV cohort Study

Supporting Information

Text S1 Includes Note S1: the cohorts and individuals contributing to the International Consortium for the Genomics of HIV, Tables S1, S2, S3, Figures S1, S2, S3, S4, S5 and supplementary references. (DOC)

Acknowledgments

We would like to thank Stuart Z Shapiro (Program Officer, Division of AIDS, National Institute of Allergy and Infectious Diseases) and Stacy Carrington-Lawrence (Chair of Etiology and Pathogenesis, NIH Office of AIDS Research) for their continued support.

Author Contributions

Conceived and designed the experiments: PJM MC AT DBG PIWdB JFZ JF. Performed the experiments: PJM CC SR. Analyzed the data: PJM CC SR OD PR. Contributed reagents/materials/analysis tools: LvdB SB MC AC JD SGD ADL JJG DH JTH SK GDK OL ML SM DvM JMP LM JMM JIM NO SJO FP FAP GP YQ MSS PRS HS IT FV JV BDW AW CAW SW AT DBG JFZ JF. Wrote the paper: PJM CC JFZ JF.
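The p-value-informed LD pruning and per-individual scoring described under Polygenic analysis can be illustrated with a minimal Python sketch. This is not the PLINK 1.07 procedure used in the study. The function names (`clump_snps`, `polygenic_score`), the dictionary-based inputs, and the convention that SNP pairs absent from the LD table have r² = 0 are assumptions introduced here for illustration only.

```python
from typing import Dict, List, Tuple


def pair_r2(ld_r2: Dict[Tuple[str, str], float], a: str, b: str) -> float:
    """Look up r^2 for an unordered SNP pair; missing pairs are ASSUMED to have r^2 = 0."""
    return ld_r2.get((a, b), ld_r2.get((b, a), 0.0))


def clump_snps(
    pvalues: Dict[str, float],
    ld_r2: Dict[Tuple[str, str], float],
    p_threshold: float,
    r2_threshold: float = 0.1,
) -> List[str]:
    """Greedy LD pruning informed by discovery-set p-values.

    Repeatedly keep the SNP with the lowest remaining p-value and discard
    every SNP in LD with it (r^2 > r2_threshold), considering only SNPs
    whose p-value falls below p_threshold.
    """
    # Candidates below the p-value threshold, most significant first.
    candidates = sorted(
        (snp for snp, p in pvalues.items() if p < p_threshold),
        key=lambda snp: pvalues[snp],
    )
    selected: List[str] = []
    for snp in candidates:
        # A SNP survives only if it is not in LD with any SNP already kept.
        if all(pair_r2(ld_r2, snp, kept) <= r2_threshold for kept in selected):
            selected.append(snp)
    return selected


def polygenic_score(
    dosages: Dict[str, float],
    betas: Dict[str, float],
    snp_set: List[str],
) -> float:
    """Sum of allele dosages weighted by discovery-set effect sizes (beta)."""
    return sum(betas[snp] * dosages[snp] for snp in snp_set)


if __name__ == "__main__":
    # Toy discovery-set results (hypothetical SNP names and values).
    pvalues = {"snpA": 1e-6, "snpB": 1e-4, "snpC": 0.03, "snpD": 0.2}
    ld = {("snpA", "snpB"): 0.8, ("snpB", "snpC"): 0.05}
    betas = {"snpA": 0.5, "snpB": 0.4, "snpC": -0.25, "snpD": 0.1}

    snp_set = clump_snps(pvalues, ld, p_threshold=0.05)
    one_person = {"snpA": 2.0, "snpB": 1.0, "snpC": 1.0, "snpD": 0.0}
    print(snp_set, polygenic_score(one_person, betas, snp_set))
```

In this toy example snpB is dropped because it is in LD with the more significant snpA, and snpD never enters because its p-value exceeds the threshold. The resulting score would then be the predictor in the logistic regression on case/control status described in the text.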

References

1. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, et al. (2007) A whole-genome association study of major determinants for host control of HIV-1. Science 317: 944–947.
2. Dalmasso C, Carpentier W, Meyer L, Rouzioux C, Goujard C, et al. (2008) Distinct genetic loci control plasma HIV-RNA and cellular HIV-DNA levels in HIV-1 infection: the ANRS Genome Wide Association 01 study. PLoS One 3: e3907.
3. Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, et al. (2009) Common genetic variation and the control of HIV-1 in humans. PLoS Genet 5: e1000791.
4. Le Clerc S, Limou S, Coulonges C, Carpentier W, Dina C, et al. (2009) Genomewide association study of a rapid progression cohort identifies new susceptibility alleles for AIDS (ANRS Genomewide Association Study 03). J Infect Dis 200: 1194–1201.
5. Limou S, Le Clerc S, Coulonges C, Carpentier W, Dina C, et al. (2009) Genomewide association study of an AIDS-nonprogression cohort emphasizes the role played by HLA genes (ANRS Genomewide Association Study 02). J Infect Dis 199: 419–426.
6. Herbeck JT, Gottlieb GS, Winkler CA, Nelson GW, An P, et al. (2010) Multistage genomewide association study identifies a locus at 1q41 associated with rate of HIV-1 disease progression to clinical AIDS. J Infect Dis 201: 618–626.
7. Pelak K, Goldstein DB, Walley NM, Fellay J, Ge D, et al. (2010) Host determinants of HIV-1 control in African Americans. The Journal of infectious diseases 201: 1141–1149.
8. Pereyra F, Jia X, McLaren PJ, Telenti A, de Bakker PI, et al. (2010) The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science 330: 1551–1557.
9. Le Clerc S, Coulonges C, Delaneau O, Van Manen D, Herbeck JT, et al. (2011) Screening low-frequency SNPS from genome-wide association study reveals a new risk allele for progression to AIDS. Journal of acquired immune deficiency syndromes 56: 279–284.
10. van Manen D, Delaneau O, Kootstra NA, Boeser-Nunnink BD, Limou S, et al. (2011) Genome-wide association scan in HIV-1-infected individuals identifying variants influencing disease course. PLoS One 6: e22208.
11. McLaren PJ, Ripke S, Pelak K, Weintrob AC, Patsopoulos NA, et al. (2012) Fine-mapping classical HLA variation associated with durable host control of HIV-1 infection in African Americans. Human molecular genetics 21: 4334–4347.
12. Joubert BR, Lange EM, Franceschini N, Mwapasa V, North KE, et al. (2010) A whole genome association study of mother-to-child transmission of HIV in Malawi. Genome medicine 2: 17.
13. Lingappa JR, Petrovski S, Kahle E, Fellay J, Shianna K, et al. (2011) Genomewide association study for determinants of HIV-1 acquisition and viral set point in HIV-1 serodiscordant couples with quantified virus exposure. PLoS One 6: e28632.
14. Petrovski S, Fellay J, Shianna KV, Carpenetti N, Kumwenda J, et al. (2011) Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population. AIDS 25: 513–518.
15. Luo M, Sainsbury J, Tuff J, Lacap PA, Yuan XY, et al. (2012) A genetic polymorphism of FREM1 is associated with resistance against HIV infection in the Pumwani Sex Worker Cohort. Journal of virology 86: 11899–11905.
16. Lane J, McLaren PJ, Dorrell L, Shianna KV, Stemke A, et al. (2013) A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A. Human molecular genetics 22: 1903–1910.
17. Dean M, Carrington M, Winkler C, Huttley GA, Smith MW, et al. (1996) Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science 273: 1856–1862.
18. Kathiresan S (2008) A PCSK9 missense variant associated with a reduced risk of early-onset myocardial infarction. The New England journal of medicine 358: 2299–2300.
19. Purcell SM, Wray NR, Stone JL, Visscher PM, O’Donovan MC, et al. (2009) Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460: 748–752.
20. Nagelkerke NJD (1991) A note on a general definition of the coefficient of determination. Biometrika 78: 691–692.
21. Welzel TM, Gao X, Pfeiffer RM, Martin MP, O’Brien SJ, et al. (2007) HLA-B Bw4 alleles and HIV-1 transmission in heterosexual couples. AIDS 21: 225–229.
22. Gordon D, Finch SJ, Nothnagel M, Ott J (2002) Power and sample size calculations for case-control genetic association tests when errors are present: application to single nucleotide polymorphisms. Human heredity 54: 22–33.
23. Telenti A, McLaren P (2010) Genomic approaches to the study of HIV-1 acquisition. The Journal of infectious diseases 202 Suppl 3: S382–386.
24. Medzhitov R, Schneider DS, Soares MP (2012) Disease tolerance as a defense strategy. Science 335: 936–941.
25. Virgin HW, Walker BD (2010) Immunology and the elusive AIDS vaccine. Nature 464: 224–231.
26. Craddock N, Hurles ME, Cardin N, Pearson RD, Plagnol V, et al. (2010) Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls. Nature 464: 713–720.
27. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, et al. (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81: 559–575.
28. Browning BL, Browning SR (2009) A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am J Hum Genet 84: 210–223.
29. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nature genetics 39: 906–913.
30. Howie B, Fuchsberger C, Stephens M, Marchini J, Abecasis GR (2012) Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Nature genetics 44: 955–959.
31. Delaneau O, Marchini J, Zagury JF (2012) A linear complexity phasing method for thousands of genomes. Nature methods 9: 179–181.
32. Harrell Jr FE (2009) Design. R package version 2.3-0. http://cran.r-project.org/src/contrib/Archive/Design/
33. Gordon D, Haynes C, Blumenfeld J, Finch SJ (2005) PAWE-3D: visualizing power for association with error in case-control genetic studies of complex traits. Bioinformatics 21: 3935–3937.


34. Shin HD, Winkler C, Stephens JC, Bream J, Young H, et al. (2000) Genetic restriction of HIV-1 pathogenesis to AIDS by promoter alleles of IL10. Proc Natl Acad Sci U S A 97: 14467–14472.
35. Faure S, Meyer L, Costagliola D, Vaneensberghe C, Genin E, et al. (2000) Rapid progression to AIDS in HIV+ individuals with a structural variant of the chemokine receptor CX3CR1. Science 287: 2274–2277.
36. An P, Wang LH, Hutcheson-Dilks H, Nelson G, Donfield S, et al. (2007) Regulatory polymorphisms in the cyclophilin A gene, PPIA, accelerate progression to AIDS. PLoS Pathog 3: e88.
37. Modi WS, Scott K, Goedert JJ, Vlahov D, Buchbinder S, et al. (2005) Haplotype analysis of the SDF-1 (CXCL12) gene in a longitudinal HIV-1/AIDS cohort study. Genes and immunity 6: 691–698.
38. Segat L, Bevilacqua D, Boniotto M, Arraes LC, de Souza PR, et al. (2006) IL-18 gene promoter polymorphism is involved in HIV-1 infection in a Brazilian pediatric population. Immunogenetics 58: 471–473.
39. An P, Nelson GW, Wang L, Donfield S, Goedert JJ, et al. (2002) Modulating influence on HIV/AIDS by interacting RANTES gene variants. Proc Natl Acad Sci U S A 99: 10002–10007.
40. Cagliani R, Riva S, Biasin M, Fumagalli M, Pozzoli U, et al. (2010) Genetic diversity at endoplasmic reticulum aminopeptidases is maintained by balancing selection and is associated with natural resistance to HIV-1 infection. Human molecular genetics 19: 4705–4714.
41. Ball TB, Ji H, Kimani J, McLaren P, Marlin C, et al. (2007) Polymorphisms in IRF-1 associated with resistance to HIV-1 infection in highly exposed uninfected Kenyan sex workers. AIDS 21: 1091–1101.
42. Limou S, Delaneau O, van Manen D, An P, Sezgin E, et al. (2012) Multicohort genomewide association study reveals a new signal of protection against HIV-1 acquisition. The Journal of infectious diseases 205: 1155–1162.
43. Javanbakht H, An P, Gold B, Petersen DC, O’Huigin C, et al. (2006) Effects of human TRIM5alpha polymorphisms on antiretroviral function and susceptibility to human immunodeficiency virus infection. Virology 354: 15–27.
44. Sawyer SL, Wu LI, Akey JM, Emerman M, Malik HS (2006) High-frequency persistence of an impaired allele of the retroviral defense gene TRIM5alpha in humans. Current biology 16: 95–100.
45. Gonzalez E, Rovin BH, Sen L, Cooke G, Dhanda R, et al. (2002) HIV-1 infection and AIDS dementia are influenced by a mutant MCP-1 allele linked to increased monocyte infiltration of tissues and MCP-1 levels. Proc Natl Acad Sci U S A 99: 13795–13800.
46. Modi WS, Goedert JJ, Strathdee S, Buchbinder S, Detels R, et al. (2003) MCP-1-MCP-3-Eotaxin gene cluster influences HIV-1 transmission. AIDS 17: 2357–2365.


Chapter 7

Host genetic determinants of T cell responses to the MRKAd5 HIV-1 gag/pol/nef vaccine in the Step trial

Fellay J, Frahm N, Shianna KV, Cirulli ET, Casimiro DR, Robertson MN, Haynes BF, Geraghty DE, McElrath MJ, Goldstein DB; National Institute of Allergy and Infectious Diseases Center for HIV/AIDS Vaccine Immunology; NIAID HIV Vaccine Trials Network. Host genetic determinants of T cell responses to the MRKAd5 HIV-1 gag/pol/nef vaccine in the Step trial. J Infect Dis. 2011 Mar 15;203(6):773-9

MAJOR ARTICLE

Host Genetic Determinants of T Cell Responses to the MRKAd5 HIV-1 gag/pol/nef Vaccine in the Step Trial

Jacques Fellay,¹ Nicole Frahm,³ Kevin V. Shianna,¹ Elizabeth T. Cirulli,¹ Danilo R. Casimiro,⁴ Michael N. Robertson,⁴ Barton F. Haynes,² Daniel E. Geraghty,³ M. Juliana McElrath,³ and David B. Goldstein,¹ for the National Institute of Allergy and Infectious Diseases (NIAID) Center for HIV/AIDS Vaccine Immunology (CHAVI) and the NIAID HIV Vaccine Trials Network (HVTN)

¹Center for Human Genome Variation, Duke University School of Medicine and ²Human Vaccine Institute, Duke University, Durham, North Carolina; ³Vaccine and Infectious Disease Institute and Clinical Research Division, Fred Hutchinson Cancer Research Center, Seattle, Washington; and ⁴Merck Research Laboratories, West Point, Pennsylvania

Understanding how human genetic variation impacts individual response to immunogens is fundamental for rational vaccine development. To explore host mechanisms involved in cellular immune responses to the MRKAd5 human immunodeficiency virus type 1 (HIV-1) gag/pol/nef vaccine tested in the Step trial, we performed a genome-wide association study of determinants of HIV-specific T cell responses, measured by interferon γ enzyme-linked immunospot assays. No human genetic variant reached genome-wide significance, but polymorphisms located in the major histocompatibility complex (MHC) region showed the strongest association with response to the HIV-1 Gag protein: HLA-B alleles known to be associated with differences in HIV-1 control were responsible for these associations. The implication of the same HLA alleles in vaccine-induced cellular immunity and in natural immune control is of relevance for vaccine design. Furthermore, our results demonstrate the importance of considering the host immunogenetic background in the analysis of immune responses to T cell vaccines.

Received 19 July 2010; accepted 1 November 2010.
Potential conflicts of interest: D.R.C. and M.N.R. are employees and shareholders of Merck. M.J.M. has served as an investigator on Merck-funded research. All other authors report no potential conflicts.
Reprints or correspondence: Jacques Fellay, MD, PhD, Center for Human Genome Variation, Duke University School of Medicine, Durham, NC 27708 ([email protected]).
The Journal of Infectious Diseases 2011;203:773–779
Published by Oxford University Press on behalf of the Infectious Diseases Society of America 2011.
1537-6613/2011/2036-0001$15.00
DOI: 10.1093/infdis/jiq125

The Step trial was a phase 2b proof-of-concept study conducted by the HIV Vaccine Trials Network (HVTN) and Merck that was designed to assess the safety and efficacy of a 3-dose regimen of the Merck adenovirus serotype 5 (MRKAd5) human immunodeficiency virus type 1 (HIV-1) gag/pol/nef vaccine, a replication-defective adenovirus serotype 5 (Ad5) vaccine expressing Gag, Pol, and Nef proteins from an HIV-1 clade B strain, which aimed to elicit cellular immune responses. It was a multicenter, double-blind, randomized, placebo-controlled study performed in adults at high risk of HIV-1 infection [1]. Vaccinations in the trial were terminated following a planned interim analysis in September 2007 because of a lack of efficacy of the tested vaccine. Vaccination was not associated with a lower viral load in participants who became infected, and the rate of HIV-1 acquisition was actually higher among vaccine recipients than among placebo recipients, particularly in the subgroup of uncircumcised male participants with high pre-existing Ad5 neutralizing antibody titers [1, 2]. Understanding this disappointing and paradoxical outcome is a high priority for the field.

Recombinant Ad5 vaccine candidates had previously been shown to elicit strong cellular immune responses against HIV-1 epitopes [3, 4]. Accordingly, the MRKAd5 HIV-1 gag/pol/nef vaccine was highly immunogenic for inducing HIV-specific T cell responses: at least three-quarters of vaccine recipients generated a response to 1 or more of the HIV-1 proteins included in the vaccine [2]. Still, those responses were quantitatively highly variable: higher response rates were observed in participants with low pre-existing Ad5 neutralizing antibody titers [2], but a significant fraction of variability in response intensity remains unexplained and could be at least partially attributable to human genetic variation.

Thanks to recent advances in genomics, technology, and bioinformatics, it is now possible to comprehensively assess associations between common human genetic variants and biological phenotypes [5]. Here we combine whole-genome genotyping results and HLA class I typing data to explore the host mechanisms associated with variability in T cell responses to the HIV-1 proteins present in the MRKAd5 HIV-1 gag/pol/nef vaccine tested in the Step trial.

METHODS

Participants

Study participant enrollment inclusion and exclusion criteria for the Step study have been described elsewhere [1]. Each participant provided informed consent for genetic testing. All male Step trial participants who received at least 2 doses of the MRKAd5 HIV-1 gag/pol/nef vaccine and were HIV-1 seronegative at week 8 (ie, 4 weeks after the second vaccination) were eligible for our study. This investigation was initiated after the interim analysis when all but 1 of the case individuals identified were male [1]. Most immunogenicity and covariate analyses subsequently performed were confined to male participants, and accordingly this study was restricted to male vaccine recipients.

Immunological Assays

Validated interferon γ (IFN-γ) enzyme-linked immunospot (ELISPOT) assays were performed on previously cryopreserved peripheral blood mononuclear cells (PBMCs) that were obtained at week 8; the Merck laboratory (hereafter Merck) tested one-quarter of the samples, which were randomly selected and stratified by treatment assignment and study site. The HVTN laboratory at Fred Hutchinson Cancer Research Center laboratory (Seattle, WA; hereafter HVTN) tested the remaining samples. Cells were stimulated ex vivo with pools of peptides that were 15 amino acids in length and overlapping in sequence by 11 amino acids. The peptide sequences matched the HIV-1 proteins encoded by the vaccine, and a total of 4 nonoverlapping pools of peptides were tested: 1 Gag pool, 2 Pol pools, and 1 Nef pool. If any of the protein-level responses was positive, then the overall response was considered positive. Responses were reported as the number of spot-forming cells (SFCs) per 10⁶ PBMCs. A sample was considered positive if the background-adjusted mean of the experimental wells was >4 times the mean of the negative controls and >55 SFCs per 10⁶ PBMCs for the assays performed at Merck, whereas positivity was assessed using a bootstrapping method specifically developed for in-house ELISPOT data for the samples tested at HVTN [6, 7]. We also compared the HIV-1-specific IFN-γ ELISPOT results to CD4+ and CD8+ T cell responses measured on a subset of samples by IFN-γ and interleukin 2 (IL-2) intracellular cytokine staining assays [2].

Genotyping

DNA samples were genotyped using the Human1M-Duo Infinium HD BeadChip (Illumina), which features >1 million single-nucleotide polymorphisms (SNPs) and an additional 52,167 markers designed to specifically target copy number variant regions. We also inferred larger copy number variants from the genotyping data using PennCNV software (2009Aug27 version) [8]. A series of data cleaning and quality control procedures were performed: SNPs were filtered on the basis of missingness (dropped if called in <99% of participants), minor allele frequency (dropped if the value was <.005), and deviation from Hardy-Weinberg equilibrium. Study participants were filtered on the basis of genotyping quality, a sex check (heterozygosity testing), and cryptic relatedness (the sharing of genetic information was first predicted by estimating identity by descent [IBD], and then 1 sample in each pair of DNA samples showing >12.5% of IBD was excluded). High-resolution HLA class I typing (4 digits; HLA-A, HLA-B, and HLA-C) was obtained using sequence-based methods.

Association Analysis

SNPs, copy number variants, and HLA class I alleles were tested for association with IFN-γ ELISPOT responses to Gag, Pol, and Nef in regression models that also included age, prevaccination Ad5 serology titers, and the laboratory in which the ELISPOT assay was performed (HVTN vs Merck) as covariates. In addition, to correct for population stratification, we applied a modified Eigenstrat method [9], which is a principal component analysis of the genotyping data, to define ethnic groups and correct for residual population ancestry within each group. The analyses were performed separately in 3 ethnic groups by means of Plink software (version 1.06) [10]. Combined P values for all 3 populations were obtained using the Stouffer weighted Z method [11]. We used linear regression to test for association between genetic variants and natural log-transformed quantitative ELISPOT responses, and we used logistic regression for qualitative assessment of the ELISPOT responses (case-control comparisons between individuals with and individuals without detectable responses to the HIV-1 peptide pools). Significance was assessed with a straight Bonferroni correction (threshold for genome-wide significance, P = 5 × 10⁻⁸). Power calculations were performed using GWASpower/QT software (available at http://humangenome.duke.edu/software).

RESULTS

Participants and Samples

A total of 831 participants were eligible for the study. They all were male vaccine recipients and HIV-1 negative at week 8. However, 36 of them became infected later in the study. Participants were enrolled in 9 countries in the Americas and Australia, and self-reported ethnicities were very diverse, with a majority of the individuals being white and multiracial individuals (Table 1). All DNA samples were tested on Human1M-Duo chips: 22 samples were excluded because of insufficient genotyping quality; 3 individuals were excluded because of cryptic relatedness (1 in each of 2 pairs of 100% identical genotypes and 1 in a pair of 50% identical genotypes); and 2 participants were excluded because of a sex discrepancy between genotype and the phenotypic database (female participants misclassified as male).

Immunological Assays

Of the 831 samples, 792 had sufficient cryopreserved PBMCs to be tested in IFN-γ ELISPOT assays: 30.6% were analyzed by Merck (N = 242) and 69.4% by HVTN (N = 550). IFN-γ ELISPOT responses were detected in 675 (85.2%) of 792 vaccine recipients. The percentages of participants with positive responses were 75.6% for Gag, 69.3% for Nef, and 63.6% and 55.3% for the 2 Pol peptide pools. Gag-specific expression of IFN-γ and/or IL-2 measured by intracellular cytokine staining assays gated on CD4+ and CD8+ T cell populations, available for 150 samples [2], were compared to Gag-specific responses measured by IFN-γ ELISPOT assays (Figure 1). We observed a strong correlation between ELISPOT results and CD8+ T cell responses (r² = .4; P < .001), but no correlation with CD4+ T cell responses (r² = .05; P = .1).

Population Structure

Altogether, 768 participants had complete genotype and phenotype results (Table 1). To avoid spurious associations due to population structure, we ran separate analyses in 3 ethnic groups corresponding to the major clusters identified in the principal component analysis of the genotyping data. After removal of individuals who could not be classified into a homogeneous group, a total of 318 white, 87 black, and 115 Mestizo participants were available for regression analyses (Table 2). The study had 80% power to detect genetic variants explaining at least 6.6% of the variability in quantitative HIV-1-specific responses in the combined population.

Association Analysis

The genome-wide analyses did not reveal any significant association between IFN-γ ELISPOT responses to Gag, Pol, or Nef and SNPs or copy number variants after correction for multiple testing. However, in the analysis of association with quantitative responses to Gag, the SNPs with the lowest P values clustered in the MHC region, around the HLA-B and HLA-C genes

Table 1. Characteristics of Study Participants

Characteristic                         Eligible participants   Participants in association analyses
Male sex, no. (%)                      831 (100)               768 (100)
Vaccine recipient, no. (%)             831 (100)               768 (100)
Age in years, mean (SD)                30.4 (7.7)              30.4 (7.7)
Circumcision, no. (%)a                 458 (55.8)              429 (55.9)
HIV infection during trial, no. (%)    36 (4.3)                34 (4.4)
Ethnicity, no. (%)
  White                                410 (49.3)              375 (48.8)
  Black                                84 (10.1)               76 (9.9)
  Hispanic American                    72 (8.7)                69 (9.0)
  Asian                                13 (1.6)                13 (1.7)
  Native American                      8 (1.0)                 7 (.9)
  Polynesian                           1 (.1)                  1 (.1)
  Multiracial                          242 (29.1)              226 (29.4)
Country, no. (%)
  United States                        484 (58.2)              449 (58.5)
  Peru                                 217 (26.1)              202 (26.3)
  Brazil                               50 (6.0)                47 (6.1)
  Canada                               27 (3.3)                24 (3.1)
  Haiti                                18 (2.2)                15 (2.0)
  Dominican Republic                   13 (1.6)                11 (1.4)
  Australia                            10 (1.2)                9 (1.2)
  Puerto Rico                          10 (1.2)                10 (1.3)
  Jamaica                              2 (.2)                  1 (.1)

NOTE. a Information was missing for 10 participants.

Figure 1. Comparison between Gag-specific enzyme-linked immunospot (ELISPOT) and intracellular cytokine staining (ICS) results: interferon γ (IFN-γ) ELISPOT and CD8+ T cell IFN-γ and interleukin 2 (IL-2) ICS results strongly correlate (top), whereas there is no correlation between IFN-γ ELISPOT and CD4+ T cell IFN-γ and IL-2 ICS results (bottom).
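The positivity criterion given in the Methods for ELISPOT assays performed at Merck can be written out as a short check. The Python sketch below is illustrative only. It assumes that "background-adjusted" means the mean of the experimental wells minus the mean of the negative-control wells, and that all counts are already expressed as SFCs per 10⁶ PBMCs. The function name `is_positive_merck_rule` and its parameters are hypothetical, and the bootstrapping method applied to the HVTN samples [6, 7] is not reproduced here.

```python
from statistics import mean
from typing import Sequence


def is_positive_merck_rule(
    experimental_sfc: Sequence[float],
    negative_control_sfc: Sequence[float],
    fold_cutoff: float = 4.0,
    sfc_cutoff: float = 55.0,
) -> bool:
    """Apply the two-part positivity rule quoted in the Methods.

    A sample is called positive when the background-adjusted mean of the
    experimental wells is both (1) more than `fold_cutoff` times the mean
    of the negative controls and (2) more than `sfc_cutoff` SFCs per 10^6
    PBMCs. Inputs are ASSUMED to be in SFCs per 10^6 PBMCs already.
    """
    background = mean(negative_control_sfc)
    # ASSUMPTION: background adjustment = subtracting the negative-control mean.
    adjusted = mean(experimental_sfc) - background
    return adjusted > fold_cutoff * background and adjusted > sfc_cutoff


if __name__ == "__main__":
    # Hypothetical well counts for one peptide pool.
    print(is_positive_merck_rule([300, 320, 310], [10, 12, 8]))
    print(is_positive_merck_rule([60, 70, 65], [20, 20, 20]))
```

Both conditions must hold, so a sample with low background but a modest adjusted count, or one with a high count over an equally high background, would be scored as negative under this rule.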

(Figure 2); no such clustering was observed for Pol or Nef associations. Therefore, we tested all 4-digit HLA class I alleles present in the study population for association with the same phenotype. In white participants with positive Gag-specific responses (N = 265), 3 HLA-B alleles were associated with higher responses: HLA-B*2705 (N = 27; P = 4.6 × 10⁻⁵), B*5101 (N = 29; P = .02), and B*5701 (N = 17; P = .05); whereas 2 alleles were associated with lower responses: HLA-B*0801 (N = 60; P = 2.7 × 10⁻⁴) and B*4501 (N = 5; P = .01) (Figure 3). Together, these HLA-B alleles accounted for 13.6% of the variability in Gag responses. No significant association was found in the smaller groups of black and Mestizo individuals, although a trend toward higher Gag responses was observed for HLA-B*5703 (P = .1). We did not observe any significant association between HLA-C alleles and quantitative responses to Gag. We then tested whether the identified HLA-B alleles were responsible for the association signals detected in the MHC region in the genome-wide scan, using nested linear regression. When the SNPs that showed the strongest associations in the genome-wide association study (GWAS) were added to

Table 2. Participants Included in Linear Regression Analyses, by Ethnic Group

Ethnic group   Total no. of participants   No. (%) of participants with positive responses to any HIV-1 peptide pool   No. (%) of participants with positive Gag-specific responses
White          318                         283 (89.0)                                                                  265 (83.3)
Black          87                          72 (87.8)                                                                   59 (73.0)
Mestizoa       115                         101 (82.8)                                                                  84 (67.8)

NOTE. a Mestizo (“mixed race”) is used by Peruvians to describe their ethnic group.

Figure 2. Genomic overview of the major histocompatibility complex (MHC) region that includes the most associated variants. Indicated are the P values for association with Gag-specific responses of all genotyped single-nucleotide polymorphisms in the region and the structures of the surrounding genes. A P value of <5 × 10⁻⁸ was required to declare significance at the genome-wide level. HLA-B, major histocompatibility complex, class I, B; HLA-C, major histocompatibility complex, class I, C; MICA, MHC class I polypeptide-related sequence A.

regression models that already incorporated HLA-B*2705, B*5101, B*5701, B*0801, and B*4501 as covariates, these SNPs did not result in a significant increase in the explained variation, which confirms that the GWAS signals are indeed explained by the combined effect of functional HLA alleles (Table 3).

Figure 3. HLA-B alleles associated with quantitative T cell responses to Gag in the interferon γ enzyme-linked immunospot assay. Only white participants are included. PBMC, peripheral blood mononuclear cell; SFC, spot-forming cell.

Table 3. P Values for Single-nucleotide Polymorphism (SNP) Associations for Top 5 SNPs in Genome-wide Association Study (GWAS)

Top 5 SNPs in GWAS   GWAS P value   Adjusted P valuea
rs4713462            1.9 × 10⁻⁶     .02
rs4713460            2.4 × 10⁻⁶     .03
rs2247056            1.4 × 10⁻⁵     .07
rs2853935            1.4 × 10⁻⁵     .06
rs2523535            2.2 × 10⁻⁵     .11

NOTE. The HLA-B alleles HLA-B*2705, *5101, *5701, *0801, and *4501 together explain most of the genome-wide association signals detected in the region. Results are shown for the 5 most associated SNPs in the analysis of Gag-specific responses in white participants. All linear regression models include age, adenovirus serotype 5 titers, laboratory (HIV Vaccine Trials Network vs Merck), and EIGENSTRAT values as covariates.
a The adjusted P value is obtained for each SNP in a model in which the 5 HLA-B alleles are incorporated.

DISCUSSION

In an attempt to better delineate the host mechanisms involved in the immune response to T cell vaccines, we performed a GWAS to search for human genetic determinants of the interindividual variability observed in cellular immune responses to the MRKAd5 HIV-1 gag/pol/nef vaccine. HIV-1-specific T cell responses were measured by IFN-γ ELISPOT assays in a large cohort of vaccine recipients. We observed a good correlation between the magnitudes of T cell responses detected by IFN-γ ELISPOT and those detected by CD8+ intracellular cytokine staining, which is not surprising since the vaccine preferentially induced CD8+ T cells, and the CD4+ T cells that were elicited primarily secreted IL-2 and tumor necrosis factor α [2].

Although no SNP reached genome-wide significance, of all tested polymorphisms, those located in the MHC region showed the strongest association with responses to Gag, leading us to look for possible associations between HLA class I alleles and that immunological outcome.

Several HLA-B alleles were found to be responsible for the SNP-associated signal: HLA-B*2705, B*5101, and B*5701 were associated with higher Gag responses, whereas HLA-B*0801 and B*4501 were associated with lower responses. Strikingly, all 5 alleles are also known to be associated with differences in HIV-1 control [12]. HLA-B*57 is the human genetic factor that has been most consistently associated with potent control of HIV-1 [13–17], with HLA-B*5701 observed almost exclusively in white individuals and HLA-B*5703 mostly found in individuals of African ancestry. There is clear epidemiological and functional evidence for effective restriction of HIV-1 by HLA-B*27 [18, 19]. HLA-B*5101 was also known to be associated with slower progression to AIDS [20], and its protective effect was nicely confirmed in a recent study that showed that the allele drives HIV-1 adaptation at a population level [21]. Conversely, HLA-B*0801 and HLA-B*4501 have both been associated with higher viral load and more rapid disease progression [20, 22, 23]. The identification of the same HLA alleles, acting in comparable directions, strongly suggests that overlapping mechanisms are implicated in the development of vaccine-induced cellular immunity to HIV-1 and of natural immune control of HIV-1. Of course, the ELISPOT results used in this study only provide information about the immunogenicity of the vaccine and do not represent a correlate of protection for the tested vaccine. It is possible that the vaccine elicited appropriate CD8+ T cell immune responses to enhance viral control, as suggested by recent data [24], but that these are either too weak in function or magnitude, occur in too small a subset of the population, or both to confirm meaningfully enhanced control at a population level.

Comparable results had been reported previously in an immunogenetic study of an HIV-1 vaccine. Responses to HIV-1 proteins were measured in samples from ALVAC HIV recombinant canarypox vaccine trials by means of a lytic cytotoxic T lymphocyte assay, and the proportions of samples responding to Gag or Env were found to be significantly higher among those carrying HLA-B*27 or B*57 [25]. The canarypox T cell vaccine that was used was poorly immunogenic for eliciting CD8+ T cells, and the study population was smaller; still, the concordance between our observations and those earlier results strengthens the case for an important role of protective HLA-B alleles in differential cellular immune responses to HIV-1 vaccines.

Gag-specific CD8+ T cell responses have repeatedly been found to be related to improved immune control of HIV-1 infection [12, 26, 27], which is explained largely because Gag is both highly immunogenic and highly conserved in sequence, and because mutational escapes often result in a decrease in viral fitness. The results presented here show that Gag-specific immune responses to T cell vaccines are modulated by the host immunogenetic background. HLA class I typing data should thus be discussed at the design stage and included in the analysis of future studies of candidate vaccines eliciting robust CD8+ T cells.

Finally, it is noteworthy that most of the interindividual variation in cellular responses to the MRKAd5 HIV-1 gag/pol/nef vaccine remains unexplained. To help understand the rest of the variation, it seems appropriate to perform additional studies, for example, by applying resequencing technology to uncover rare causal genetic variants, since we here confirmed the existence of a connection between natural viral control and vaccine response.

Funding

This work was supported by the National Institute of Allergy and Infectious Diseases (NIAID) Center for HIV/AIDS Vaccine Immunology (grant number U19 AI067854-05); and the NIAID HIV Vaccine Trials Network.

References

1. Buchbinder SP, Mehrotra DV, Duerr A, et al. Efficacy assessment of a cell-mediated immunity HIV-1 vaccine (the Step study): a double-blind, randomised, placebo-controlled, test-of-concept trial. Lancet 2008; 372:1881–93.
2. McElrath MJ, De Rosa SC, Moodie Z, et al. HIV-1 vaccine-induced immunity in the test-of-concept Step study: a case-cohort analysis. Lancet 2008; 372:1894–905.
3. Catanzaro AT, Koup RA, Roederer M, et al. Phase 1 safety and immunogenicity evaluation of a multiclade HIV-1 candidate vaccine delivered by a replication-defective recombinant adenovirus vector. J Infect Dis 2006; 194:1638–49.
4. Priddy FH, Brown D, Kublin J, et al. Safety and immunogenicity of a replication-incompetent adenovirus type 5 HIV-1 clade B gag/pol/nef vaccine in healthy adults. Clin Infect Dis 2008; 46:1769–81.
5. McCarthy MI, Abecasis GR, Cardon LR, et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat Rev Genet 2008; 9:356–69.
6. Moodie Z, Huang Y, Gu L, Hural J, Self SG. Statistical positivity criteria for the analysis of ELISpot assay data in HIV-1 vaccine trials. J Immunol Methods 2006; 315:121–32.
7. Moodie Z, Price L, Gouttefangeas C, et al. Response definition criteria for ELISPOT assays revisited. Cancer Immunol Immunother 2010; 59:1489–501.
16. Fellay J, Ge D, Shianna KV, et al. Common genetic variation and the control of HIV-1 in humans. PLoS Genet 2009; 5:e1000791.
17. Pelak K, Goldstein DB, Walley NM, et al. Host determinants of HIV-1 control in African Americans. J Infect Dis 2010; 201:1141–9.
18. Goulder PJ, Phillips RE, Colbert RA, et al. Late escape from an immunodominant cytotoxic T-lymphocyte response associated with progression to AIDS. Nat Med 1997; 3:212–7.
19. Schneidewind A, Brockman MA, Yang R, et al. Escape from the dominant HLA-B27-restricted cytotoxic T-lymphocyte response in Gag is associated with a dramatic reduction in human immunodeficiency virus type 1 replication. J Virol 2007; 81:12382–93.
8. Wang K, Li M, Hadley D, et al.
PennCNV: an integrated hidden 20. Kaslow RA, Carrington M, Apple R, et al. Influence of combinations of Markov model designed for high-resolution copy number variation human major histocompatibility complex genes on the course of HIV- detection in whole-genome SNP genotyping data. Genome Res 2007; 1 infection. Nat Med 1996; 2:405–11. 17:1665–74. 21. Kawashima Y, Pfafferott K, Frater J, et al. Adaptation of HIV-1 to 9. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, human leukocyte antigen class I. Nature 2009; 458:641–5. Reich D. Principal components analysis corrects for stratification in 22. Steel CM, Ludlam CA, Beatson D, et al. HLA haplotype A1 B8 DR3 as genome-wide association studies 2006; 38:904–9. a risk factor for HIV-related disease. Lancet 1988; 1:1185–8. 10. Purcell S, Neale B, Todd-Brown K, et al. PLINK: a tool set for whole- 23. Kiepiela P, Leslie AJ, Honeyborne I, et al. Dominant influence of HLA- genome association and population-based linkage analyses. Am J Hum B in mediating the potential co-evolution of HIV and HLA. Nature Genet 2007; 81:559–75. 2004; 432:769–75. 11. Whitlock MC. Combining probability from independent tests: the 24. Frahm N, Janes H, Friedrich DP, et al. Beneficial effects of protective weighted Z-method is superior to Fisher’s approach. J Evol Biol 2005; HLA class I allele expression and breadth of epitope recognition after 18:1368–73. vaccination on HIV viral load post-infection. AIDS Vaccine 2010; 12. Goulder PJ, Watkins DI. Impact of MHC class I diversity on immune P17.20. Atlanta, Georgia, USA. control of immunodeficiency virus replication. Nat Rev Immunol 25. Kaslow RA, Rivers C, Tang J, et al. Polymorphisms in HLA class I genes 2008; 8:619–30. associated with both favorable prognosis of human immunodeficiency 13. Migueles SA, Sabbaghian MS, Shupert WL, et al. 
HLA B*5701 is highly virus (HIV) type 1 infection and positive cytotoxic T-lymphocyte re- associated with restriction of virus replication in a subgroup of HIV- sponses to ALVAC-HIV recombinant canarypox vaccines. J Virol 2001; infected long term nonprogressors. Proc Natl Acad Sci U S A 2000; 75:8681–9. 97:2709–14. 26. Riviere Y, McChesney MB, Porrot F, et al. Gag-specific cytotoxic 14. Altfeld M, Addo MM, Rosenberg ES, et al. Influence of HLA-B57 on responses to HIV type 1 are associated with a decreased risk of clinical presentation and viral control during acute HIV-1 infection. progression to AIDS-related complex or AIDS. AIDS Res Hum IDS 2003; 17:2581–91. Retroviruses 1995; 11:903–7. 15. Fellay J, Shianna KV, Ge D, et al. A whole-genome association study of 27. Kiepiela P, Ngumbela K, Thobakgale C, et al. CD81 T-cell responses to major determinants for host control of HIV-1. Science 2007; different HIV proteins have discordant associations with viral load. Nat 317:944–7. Med 2007; 13:46–53.

Chapter 8

A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control

Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, Haas DW, Martinez-Picado J, Dalmau J, López-Galíndez C, Casado C, Rauch A, Günthard HF, Bernasconi E, Vernazza P, Klimkait T, Yerly S, O'Brien SJ, Listgarten J, Pfeifer N, Lippert C, Fusi N, Kutalik Z, Allen TM, Müller V, Harrigan PR, Heckerman D, Telenti A, Fellay J. A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control. Elife. 2013 Oct 29;2:e01123


István Bartha (1,2,3,4), Jonathan M Carlson (5)†, Chanson J Brumme (6)†, Paul J McLaren (1,2,4)†, Zabrina L Brumme (6,7), Mina John (8), David W Haas (9), Javier Martinez-Picado (10,11), Judith Dalmau (10), Cecilio López-Galíndez (12), Concepción Casado (12), Andri Rauch (13), Huldrych F Günthard (14), Enos Bernasconi (15), Pietro Vernazza (16), Thomas Klimkait (17), Sabine Yerly (18), Stephen J O’Brien (19), Jennifer Listgarten (5), Nico Pfeifer (5)‡, Christoph Lippert (5), Nicolo Fusi (5), Zoltán Kutalik (4,20), Todd M Allen (21), Viktor Müller (3), P Richard Harrigan (6,22), David Heckerman (5), Amalio Telenti (2)*, Jacques Fellay (1,2,4)*, for the HIV Genome-to-Genome Study and the Swiss HIV Cohort Study

1 School of Life Sciences, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland
2 Institute of Microbiology, University Hospital and University of Lausanne, Lausanne, Switzerland
3 Research Group of Theoretical Biology and Evolutionary Ecology, Eötvös Loránd University and the Hungarian Academy of Sciences, Budapest, Hungary
4 Swiss Institute of Bioinformatics, Lausanne, Switzerland
5 eScience Group, Microsoft Research, Los Angeles, United States
6 BC Centre for Excellence in HIV/AIDS, Vancouver, Canada
7 Faculty of Health Sciences, Simon Fraser University, Burnaby, Canada
8 Institute of Immunology and Infectious Diseases, Murdoch University, Murdoch, Australia
9 Vanderbilt University Medical Center, Nashville, United States
10 AIDS Research Institute IrsiCaixa, Institut d’Investigació en Ciències de la Salut Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona, Spain
11 Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
12 Centro Nacional de Microbiología, Instituto de Salud Carlos III, Madrid, Spain
13 Clinic of Infectious Diseases, University of Bern & Inselspital, Bern, Switzerland
14 Division of Infectious Diseases and Hospital Epidemiology, University Hospital and University of Zürich, Zürich, Switzerland
15 Division of Infectious Diseases, Regional Hospital of Lugano, Lugano, Switzerland
16 Division of Infectious Diseases and Hospital Epidemiology, Cantonal Hospital, St. Gallen, Switzerland
17 Department of Biomedicine, University of Basel, Basel, Switzerland
18 Laboratory of Virology, Geneva University Hospitals, Geneva, Switzerland
19 Theodosius Dobzhansky Center for Genome Bioinformatics, St. Petersburg State University, St. Petersburg, Russia
20 Institute of Social and Preventive Medicine, University Hospital and University of Lausanne, Lausanne, Switzerland
21 Ragon Institute of MGH, MIT, and Harvard, Massachusetts General Hospital, Boston, United States
22 Faculty of Medicine, University of British Columbia, Vancouver, Canada

*For correspondence: Amalio.[email protected] (AT); jacques.[email protected] (JF)
†These authors contributed equally to this work
‡Present address: Department of Computational Biology and Applied Algorithmics, Max Planck Institute for Informatics, Saarbrücken, Germany
Competing interests: The authors declare that no competing interests exist.
Received: 01 July 2013
Accepted: 26 September 2013
Published: 29 October 2013
Reviewing editor: Gil McVean, Oxford University, United Kingdom
Copyright Bartha et al. This article is distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use and redistribution provided that the original author and source are credited.

Bartha et al. eLife 2013;2:e01123. DOI: 10.7554/eLife.01123 1 of 16 Research article Genomics and evolutionary biology | Microbiology and infectious disease

Abstract
HIV-1 sequence diversity is affected by selection pressures arising from host genomic factors. Using paired human and viral data from 1071 individuals, we ran >3000 genome-wide scans, testing for associations between host DNA polymorphisms, HIV-1 sequence variation and plasma viral load (VL), while considering human and viral population structure. We observed significant human SNP associations to a total of 48 HIV-1 amino acid variants (p < 2.4 × 10^−12). All associated SNPs mapped to the HLA class I region. Clinical relevance of host and pathogen variation was assessed using VL results. We identified two critical advantages to the use of viral variation for identifying host factors: (1) association signals are much stronger for HIV-1 sequence variants than VL, reflecting the ‘intermediate phenotype’ nature of viral variation; (2) association testing can be run without any clinical data. The proposed genome-to-genome approach highlights sites of genomic conflict and is a strategy generally applicable to studies of host–pathogen interaction.
DOI: 10.7554/eLife.01123.001

eLife digest
Developing treatments or vaccines for HIV is challenging because the genetic makeup of the virus is constantly changing in an effort to outwit the human immune system. Moreover, the immune system is highly variable as a result of the long-standing co-evolution of humans and microbes. Each individual will try to oppose the invading virus in a unique way, forcing the virus to acquire specific mutations that can be interpreted as the genetic signature of this one-against-one battle.

To explore the influence of co-evolution on HIV, Bartha et al. took samples of both human and viral genomes from 1071 individuals infected with HIV, the AIDS virus, and used genotyping and sequencing technology to obtain a comprehensive description of the genetic variation in both. Computational techniques were then used to search for links between variants in the human DNA sequences and variants in the viral sequences.

The most common type of genetic variation found in the human genome is a single nucleotide polymorphism, or SNP for short: a SNP is produced when a single nucleotide (an A, C, G or T) is replaced by a different nucleotide. Bartha et al. found that SNPs within the human DNA sequences in their study were linked to variations in 48 amino acids in HIV. Moreover, all these SNPs were found within a group of genes known as the HLA (human leukocyte antigen) system, which encodes proteins that play a vital role in the immune response. This work identified the areas of the human genome that put pressure on the AIDS virus, and the regions of the virus that serve to escape human control.

The approach developed by Bartha et al. allows the interactions between a microbe and a human host to be studied by looking at the genome of the microbe and the genome of the infected person. It also differentiates host-induced mutations that limit the capacity of the virus to do harm from those that are tolerated by the pathogen. A similar strategy could be used to study other infectious diseases.
DOI: 10.7554/eLife.01123.002

Introduction
Through multiple rounds of selection and escape, host and pathogen genomes are imprinted with signatures of co-evolution that are governed by Darwinian forces. On the host side, well-characterized anti-retroviral restriction factors, such as TRIM5α, APOBEC3G and BST2, harbor strong signals of selection in primate genomes, clear examples of retroviral pressure (Ortiz et al., 2009). On the virus side, obvious signs of selection are observable in the HIV-1 genome: escape mutations and reversions have been described in epitopes restricted by human leukocyte antigen (HLA) class I molecules and targeted by cytotoxic T lymphocyte (CTL) responses (Goulder et al., 2001; Kawashima et al., 2009). Sequence polymorphisms have also been reported recently in regions targeted by killer immunoglobulin-like receptors (KIR), suggesting evasion from immune pressure by natural killer (NK) cells (Alter et al., 2011). Evidence for the remodeling of retroviral genomes by host genetic pressure also comes from simian immunodeficiency virus (SIV) infection studies in rhesus macaques, where escape from restrictive TRIM5α alleles has been observed in the viral capsid upon cross-species transmission of SIVsm (Kirmaier et al., 2010). In contrast, human alleles of TRIM5α do not result in escape mutations, likely because of adaptation of the pathogen to the host (Rahm et al., 2013). Sequence adaptation is also a known feature of cross-species transmission. For example, a methionine in the matrix protein (Gag-30) in SIVcpzPtt changed to arginine in lineages leading to HIV-1 and reverted to methionine when HIV-1 was passaged through chimpanzees (Wain et al., 2007).

To date, combined analyses of human and HIV-1 genetic data have addressed the association of HLA and KIR genes with variants in the retroviral genome (Moore et al., 2002; Brumme et al., 2007; Bhattacharya et al., 2007; Kawashima et al., 2009; Alter et al., 2011; Carlson et al., 2012; Wright et al., 2012). Additionally, genome-wide association studies (GWAS) performed in the host have focused on various HIV-related clinical phenotypes (Fellay et al., 2007; Fellay et al., 2009; Pereyra et al., 2010). In parallel, large amounts of HIV-1 sequence data have been generated for phylogenetic studies, which shed new light on viral transmission and evolution (Kouyos et al., 2010; Alizon et al., 2010; Von Wyl et al., 2011), or allow clinically driven analyses of viral genes targeted by antiretroviral drugs (resistance testing) (Von Wyl et al., 2009).

Building on the unprecedented possibility to acquire and combine paired human and viral genomic information from the same infected individuals, we employ an innovative strategy for global genome-to-genome host–pathogen analysis. By simultaneously testing for associations between genome-wide human variation, HIV-1 sequence diversity, and plasma viral load (VL), our approach allows the mapping of all sites of host–pathogen genomic interaction, the correction for both host and viral population stratification, and the assessment of the respective impact of human and HIV-1 variation on a clinical outcome (Figure 1).

Results

Study participants, host genotypes, and HIV-1 sequence variation
Full-length HIV-1 genome sequence and human genome-wide SNP data were obtained from seven studies or institutions on a total of 1071 antiretroviral-naive patients of Western European ancestry, infected with HIV-1 subtype B. The homogeneity of the study population was confirmed by principal component analysis of the genotype matrix: together, the first five principal components explained 1% of total genotypic variation. After quality control of the human genotype data, imputation and filtering, ∼7 million SNPs were available for association testing.

The full-length HIV-1 sequence is approximately 9.5 kb long, corresponding to over 3000 encoded amino acids. Not all sequences were complete; on average, viral residues were covered in 85% of the study population (range: 75% in Tat to 95% in Gag). Due to its hypervariable nature, the portion of the HIV-1 envelope gene that encodes the gp120 protein was not sequenced in most study samples and was therefore excluded. Overall, 1126 residues of the HIV-1 proteome were found to be variable in at least 10 samples, for a total of 3381 different viral amino acids that could be represented by 3007 distinct binary variables.

Host VL GWAS
We first performed a classical GWAS of host determinants of HIV-1 VL (Figure 2A, Study A) using data from 698 patients (65% of the study population) for whom a VL phenotype could be reliably estimated. The top associations were observed in the HLA class I region on chromosome 6 and were highly consistent with results observed previously (Fellay et al., 2009; Pereyra et al., 2010). The strongest associated SNP, rs9267454 (p = 1.5 × 10^−8), is in partial linkage disequilibrium (LD) with HLA-B*57:01 (r² = 0.47, D′ = 0.92), HLA-B*14:01 (r² = 0.12, D′ = 1.0), HLA-B*27:05 (r² = 0.01, D′ = 0.99), and the HLA-C -35 rs9264942 SNP (r² = 0.07, D′ = 0.77), and thus reflects these well-known associations with HIV-1 control. These results confirm the quality of the study population for the purpose of genome analysis of determinants of HIV-1-related outcomes.

Figure 1. A triangle of association testing. The following association analyses were performed: [Study A] human SNPs vs plasma viral load (1 GWAS); [Study B] human SNPs vs variable HIV-1 amino acids (3007 GWAS); and [Study C] variable HIV-1 amino acids vs plasma viral load (1 proteome-wide association study).
DOI: 10.7554/eLife.01123.003


Figure 2. Results of the genome-wide association analyses. (A) Associations between human SNPs and HIV-1 plasma viral load. The dotted line shows the Bonferroni-corrected significance threshold (p-value < 7.25 × 10^−9). (B) Associations between human SNPs and HIV-1 amino acid variants, with 3007 GWAS collapsed in a single Manhattan plot. The dotted line shows the Bonferroni-corrected significance threshold (p-value < 2.4 × 10^−12). (C) Schematic representation of the HLA class I genes and of the SNPs associated with HIV-1 amino acid variants in the region. (D) Same association results as in panel B, projected on the HIV-1 proteome. Only the strongest association is shown for each amino acid. Significant associations are indicated by a blue dot. The gp120 part of the HIV-1 proteome was not tested. The colored bar below the plot area shows the positions of the optimally defined CD8+ T cell epitopes. An interactive version of this figure can be found at http://g2g.labtelenti.org (which is also available to download from Zenodo, http://dx.doi.org/10.5281/zenodo.7138).
DOI: 10.7554/eLife.01123.004
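The two Bonferroni thresholds quoted in the Figure 2 caption follow from dividing a family-wise error rate by the number of tests. The sketch below reproduces that arithmetic; the significance level of 0.05 and the SNP count of 6.9 million are assumptions (the text says only "∼7 million SNPs"), chosen because they are consistent with the reported values.

```python
ALPHA = 0.05  # assumed family-wise error rate


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


n_snps = 6_900_000        # assumption: the text reports "~7 million SNPs"
n_viral_variables = 3007  # binary HIV-1 amino acid variables (Study B)

# Study A: one GWAS of viral load.
study_a = bonferroni_threshold(n_snps)
# Study B: one GWAS per viral variable, so SNPs x variables tests in total.
study_b = bonferroni_threshold(n_snps * n_viral_variables)

print(f"Study A: {study_a:.2e}")  # Study A: 7.25e-09
print(f"Study B: {study_b:.2e}")  # Study B: 2.41e-12
```

Because Study B multiplies the number of tests by 3007, its threshold is more than three orders of magnitude stricter than that of a single GWAS.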

Genome-to-genome analyses
In total, 3007 genome-wide analyses of associations between human SNPs and HIV-1 amino acid variants were performed in the full sample of 1071 individuals (Figure 2B, Study B) using logistic regression corrected for viral phylogeny (Carlson et al., 2008; Carlson et al., 2012). Highly significant associations were observed between SNPs in the major histocompatibility complex (MHC) region and multiple amino acids throughout the HIV-1 proteome (except in Vpu, Rev and the RNaseH subunit of RT) (Figure 2C), with Gag and Nef having a significantly higher density of associated variable sites than the rest of the proteome (Gag: 6.8% vs 2.6%, p = 0.001; Nef: 11% vs 2.6%, p = 1.2 × 10^−5; binomial tests). Using Bonferroni correction for multiple testing (threshold p = 2.4 × 10^−12), significant human SNP associations were observed with 48 viral amino acids (Figure 2 and Table 1). None of these 48 amino acids mapped to known sites of major antiretroviral drug resistance mutations (Hirsch et al., 2008). The strongest association found was between rs72845950 and Nef position 135 (p = 2.7 × 10^−66).

Associations were much stronger between human SNPs and HIV-1 amino acids than with VL. For example, the SNP rs2395029, a proxy for HLA-B*57:01 (r² = 0.93), has a p-value of 1.21 × 10^−6 for association with VL, while it reaches a p-value of 4 × 10^−59 for association with amino acid variation in Gag at position 242 (a well-known position of escape from HLA-B*57:01). No significant signals were identified outside the MHC. A link to the complete set of association results can be found at http://g2g.labtelenti.org (which is also available to download from Zenodo, http://dx.doi.org/10.5281/zenodo.7139). These results demonstrate the feasibility and improved power of performing association testing using viral genetic variation as outcome, independent of clinical phenotype.

SNPs, HLA alleles and CTL epitopes
We next assessed whether the top SNPs associated with HIV-1 amino acids represent indirect markers of HLA class I alleles known to exert evolutionary pressure on HIV-1 (Table 1). We tested pairwise correlations between significant MHC SNPs and HLA class I alleles. The analysis confirmed the existence of high LD between SNPs and HLA alleles targeting corresponding epitopes. For example, the strongest association (p = 2.7 × 10^−66) was observed between residue 135 in Nef, located in an optimally defined A*24:02 epitope, and rs72845950, which strongly tags HLA-A*24:02 (r² = 0.89). Furthermore, we observed that a substantial fraction of the identified viral amino acids (24/48, 50%) were located within an optimally defined CTL epitope restricted by one or more HLA alleles tagged by the associated SNP (http://www.hiv.lanl.gov/content/immunology/tables/optimal_ctl_summary.html, supplemented with a recently updated list of epitopes [Carlson et al., 2012]). However, in seven cases, the classical HLA allele implicated through LD with a tagging SNP did not match previously reported restriction patterns (Table 1). These data demonstrate that this approach can reconstruct a map of targets of HLA pressure across the viral proteome and identify sites outside classical epitopes that could represent additional escape variants or compensatory mutations. That a substantial proportion of associated viral amino acids lay outside known CTL epitopes also highlights this approach as a tool to guide novel epitope discovery (Bhattacharya et al., 2007; Almeida et al., 2011).

Analysis of polymorphic amino acids within the HLA genes has been shown to improve power for detection of association with clinical outcome and has demonstrated the biological relevance of key residues in the HLA-B binding groove (Pereyra et al., 2010). Therefore, we used the genome-to-genome framework to characterize the evolutionary pressure of HLA class I amino acids on the viral genome. The top associations in all classical class I genes mapped to discrete residues in the binding grooves of the HLA molecules: HLA-A position 62 (p = 3.3 × 10^−76 with HIV Nef 135), HLA-B position 70 (p = 7.1 × 10^−57 with HIV Gag 242), HLA-C position 99 (p = 5.4 × 10^−63 with HIV Nef 70). These data indicate that all class I HLA genes can exert strong pressure on the viral proteome through a shared mechanism. The association results for HLA amino acids can also be found at http://g2g.labtelenti.org (which is also available to download from Zenodo).
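The pairwise SNP-to-HLA correlations described above are summarized by two linkage disequilibrium statistics, D′ and r². The sketch below computes both from the standard definitions, assuming phased data with one 0/1 allele indicator per haplotype; the function name and the toy haplotypes are hypothetical, and the section does not state how the authors estimated LD.

```python
def ld_stats(snp, hla):
    """Return (D_prime, r_squared) for two 0/1 indicators per haplotype.

    Assumes phased haplotypes; both loci must be polymorphic.
    """
    n = len(snp)
    p_a = sum(snp) / n                                # SNP allele frequency
    p_b = sum(hla) / n                                # HLA allele frequency
    p_ab = sum(a * b for a, b in zip(snp, hla)) / n   # joint frequency
    d = p_ab - p_a * p_b                              # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d_prime, r_squared


# Toy example: a rare HLA allele (20 of 1000 haplotypes) found only on
# haplotypes that carry a common SNP allele (400 of 1000).
snp = [1] * 400 + [0] * 600
hla = [1] * 20 + [0] * 980
d_prime, r_squared = ld_stats(snp, hla)
print(round(d_prime, 2), round(r_squared, 3))  # approximately 1.0 and 0.031
```

The toy case shows why a pair can combine D′ close to 1 with a very low r²: D′ only asks whether the rarer allele is nested within the other, whereas r² also requires the two allele frequencies to match, which is what makes a SNP a useful tag for an HLA allele.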

Table 1. Associations between HIV-1 amino acid variants and human polymorphisms

Significant associations (p < 2.4 × 10^−12) were observed for 48 HIV-1 amino acid variants. The table shows the major amino acid variants present at each specific HIV-1 position, the strongest associated SNP and its linked HLA class I allele(s), if applicable. The column ‘CTL Epitope (codons)’ lists published, optimally described CTL epitopes (available at http://www.hiv.lanl.gov/content/immunology/tables/optimal_ctl_summary.html and in [Carlson et al., 2012]) restricted by the tagged HLA class I allele(s) specified, and their positions within the protein. Where multiple overlapping epitopes restricted by the same HLA class I allele have been described, only one is shown. Associations where no relevant CTL epitope has been described are indicated with a dash. The last three columns give association p-values for comparisons between human SNPs and viral amino acids, human SNPs and plasma VL, and viral amino acids and plasma VL, respectively. For tests involving viral amino acids accommodating more than 1 alternate allele, the smallest association p-value observed at that position is reported.
DOI: 10.7554/eLife.01123.005

HIV gene: GP41 GAG GAG GAG GAG GAG GAG GAG GAG NEF GAG INT GAG INT GAG INT GAG INT GAG INT GAG GP41
HIV position: 71 79 28 32 26 11 12 206 437 403 397 357 340 268 264 248 242 124 147 122 119 267
SNP: rs17881210 rs34268928 rs28896571 rs61754472 rs2523612 rs11966319 rs2249935 chr6:31376564 rs41557213 rs2596488 rs73392116 chr6:31345421 rs1055821 rs9264419 chr6:31267544 rs9264954 rs1655912 rs1050502 rs12524487 rs2596477 chr6:31285512 rs9278477
CTL epitope (codons): – RQANFLGKI (429–437) – – – – GPGHKARVL (355–363) – GEIYKRWIIL (259–268) KRWIILGLNK (263–272) – TSTLQEQIGW (240–249) FPVTPQVPLR (68–77) TSTLQEQIGW (240–249) STTVKAACWW (123–132) – – LYNTVATL (78–85) LYNTVATL – RLRPGGKKK (20–28) – – – – RLRDLLLIVTR (259–269)
Tagging HLA (D′/r²): B*15:01 (1.00/0.88) B*13:02 (1.00/0.96) – C*07:02 (0.99/0.84) A*31:01 (0.97/0.83) C*03:04 (0.99/0.59) B*07:02 (0.99/0.95) B*15:01 (0.94/0.42) B*08:01 (1.00/0.43) B*27:05 (1.00/0.92) C*07:02 (0.95/0.83) B*57:01 (1.00/0.97) B*07:02 (1.00/0.98) B*57:01 (1.00/0.98) B*57:01 (1.00/1.00) C*06:02 (0.95/0.71) C*05:01 (1.00/0.95) C*14:02 (1.00/0.96) C*05:01 (1.00/1.00) A*03:01 (1.00/0.81) B*51:01 (0.97/0.92) B*15:01 (1.00/0.82) B*44:02 (1.00/0.64) B*49:01 (1.00/1.00) A*03:01 (1.00/0.01)
SNP vs aa (p): 6.10E-17 8.70E-14 8.90E-21 8.80E-21 2.70E-19 6.70E-14 2.20E-14 2.30E-13 4.80E-15 3.80E-55 2.40E-62 3.00E-13 3.10E-17 4.50E-22 2.40E-12 1.30E-24 2.70E-55 4.80E-18 6.10E-19 5.10E-33 2.20E-13 1.00E-12
SNP vs VL (p): 1.80E-02 7.90E-01 3.50E-01 2.20E-01 4.60E-01 5.10E-01 5.50E-02 2.00E-06 2.50E-01 1.90E-06 1.10E-06 3.30E-07 8.30E-01 3.50E-01 7.90E-01 5.60E-01 7.20E-01 2.10E-01 1.50E-01 6.70E-01 7.80E-01 6.10E-01
aa vs VL (p): 6.80E-02 8.60E-01 8.30E-01 1.20E-01 7.70E-01 1.90E-01 3.50E-01 5.30E-03 8.10E-02 1.70E-05 9.70E-03 2.90E-05 7.80E-01 2.80E-01 1.10E-01 2.00E-02 4.00E-01 1.40E-01 1.80E-01 5.60E-01 2.60E-01 3.00E-01

Table 1. Continued

HIV gene: RNASE PR PR NEF NEF NEF NEF NEF NEF NEF RT NEF RT NEF RT NEF RT NEF RT NEF
HIV position: 28 93 35 94 92 85 83 81 135 133 126 120 116 105 102 395 369 277 245 135
SNP: rs2263323 rs2523577 rs72845950 chr6:31397689 chr6:31102273 chr6:31236168 chr6:31402358 rs1049709 rs2524277 rs17194293 rs9265972 rs17190134 rs16896166 rs3128902 rs2395475 chr6:31411714 rs34768512 rs1050502 rs9295987 rs2428481
CTL epitope (codons): – EEMNLPGRW (34–42) RYPLTFGW (134–141) RYPLTFGW – – – HTQGYFPDW (116–124) – – – FLKEKGGL (90–97) – – – AVDLSHFLK (84–92) QIYPGIKVR (269–277) – RPMTYKAAL (77–85) IVLPEKDSW (244–252) – – TAFTIPSI (128–135) TAFTIPSI RPMTYKAAL (77–85) –
Tagging HLA (D′/r²): B*15:01 (0.98/0.92) B*44:02 (1.00/0.64) A*24:02 (1.00/0.88) B*35:01 (0.95/0.89) B*51:01 (1.00/0.18) C*14:02 (1.00/1.00) B*57:01 (1.00/1.00) C*07:01 (1.00/0.98) B*44:03 (0.98/0.96) – B*08:01 (1.00/0.97) B*13:02 (0.93/0.86) B*08:01 (1.00/0.22) C*07:02 (0.97/0.30) A*11:01 (1.00/0.99) A*03:01 (1.00/0.99) C*03:04 (0.96/0.54) B*07:02 (1.00/0.29) B*57:01 (1.00/0.98) C*04:01 (0.90/0.63) B*15:01 (1.00/0.47) B*51:01 (0.97/0.92) B*07:02 (1.00/0.01) B*08:01 (1.00/1.00)
SNP vs aa (p): 5.60E-30 1.70E-18 2.70E-66 2.80E-19 1.10E-12 4.40E-16 3.00E-22 1.10E-35 1.10E-13 1.50E-12 9.60E-35 3.50E-20 1.00E-27 1.20E-65 1.90E-24 2.90E-21 2.20E-17 6.70E-45 4.80E-36 1.80E-12
SNP vs VL (p): 4.70E-01 1.60E-01 9.10E-02 2.50E-01 1.80E-01 3.60E-01 1.90E-06 9.00E-01 4.40E-01 1.20E-01 9.80E-01 6.40E-02 5.30E-01 8.20E-01 8.10E-01 1.20E-06 2.80E-01 7.20E-01 2.50E-01 8.10E-01
aa vs VL (p): 9.50E-01 5.70E-01 5.50E-03 3.40E-01 4.90E-02 1.20E-02 3.30E-01 2.70E-01 2.40E-01 7.70E-02 1.20E-01 1.40E-01 1.50E-01 2.70E-01 1.30E-03 5.40E-02 1.50E-02 3.00E-01 9.50E-02 6.20E-01

Table 1. Continued

HIV gene: VPR VIF VIF VIF TAT TAT
HIV position: 32 74 51 33 32 29
SNP: chr6:31362941 rs2395029 rs7767850 chr6:31430060 rs16899214 rs9260615
CTL epitope (codons): VRHFPRIWL (31–39) – – ISKKAKGWF (31–39) CCFHCQVC (30–37) –
Tagging HLA (D′/r²): B*27:05 (1.00/0.94) B*57:01 (1.00/0.98) B*49:01 (1.00/1.00) B*57:01 (1.00/0.98) C*12:03 (0.98/0.96) A*32:01 (0.98/0.95)
SNP vs aa (p): 3.10E-13 5.40E-13 1.40E-12 1.50E-13 6.40E-21 4.40E-14
SNP vs VL (p): 5.40E-02 9.70E-07 5.20E-01 9.90E-07 3.40E-01 3.90E-01
aa vs VL (p): 6.50E-01 2.80E-01 2.10E-01 9.30E-03 4.90E-01 1.40E-01
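The ‘aa vs VL’ column of Table 1 comes from testing each viral amino acid variant against plasma viral load. As a rough illustration of such a per-variant test, the sketch below fits an ordinary least-squares regression of log10 VL on a 0/1 variant indicator and returns the slope with its standard error. The plain OLS model without covariates, the function name and the toy measurements are assumptions; the model actually used by the authors is not described in this section.

```python
import math


def univariate_slope(x, y):
    """OLS regression of y on one predictor x: returns (slope, std_error)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    std_error = math.sqrt(rss / (n - 2) / sxx)
    return slope, std_error


# Hypothetical toy data: variant carriers (1) vs non-carriers (0)
# and their viral loads in log10 copies/ml.
has_variant = [1, 1, 1, 0, 0, 0, 0]
log10_vl = [4.1, 4.3, 4.2, 4.5, 4.6, 4.4, 4.5]
slope, std_error = univariate_slope(has_variant, log10_vl)
print(round(slope, 2))  # -0.3
```

With a binary predictor the slope equals the difference in mean log10 VL between carriers and non-carriers, so a negative value means that the variant is found in individuals with lower viral load.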

HIV-1 amino acids vs plasma viral load

To address whether there was an observable impact of viral mutation on a clinical outcome in this sample, we tested for associations between all HIV-1 amino acid variants and VL (Study C). After correction for multiple testing (p threshold = 1.6 × 10⁻⁵ based on 3125 viral amino acids), we did not observe any significant associations. We then focused on estimating the changes in VL associated with the 48 HIV-1 amino acid variants that were identified as significantly associated with host SNPs (Figure 2D).
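The threshold quoted above can be reproduced with a one-line Bonferroni calculation. The following is a minimal sketch, assuming a family-wise error rate of 0.05 (the value implied by 1.6 × 10⁻⁵ and 3125 tests, not stated explicitly in the text); the function name is illustrative only.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha.

    alpha = 0.05 is an assumption of this sketch.
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# 3125 variable viral amino acids were tested against VL in Study C.
threshold_c = bonferroni_threshold(3125)
print(f"{threshold_c:.1e}")  # prints 1.6e-05
```

Bonferroni correction is conservative when tests are correlated, as neighbouring viral residues often are, which is one reason a null result at this threshold does not rule out modest effects.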

The effects of amino acid variation at these sites on VL ranged from –0.16 to +0.07 log10 copies/ml (Figure 3A). We also explored the combined fitness effect of multiple HIV-1 amino acid variants targeted by a single host marker using the well-understood model of HLA-B*57:01. We evaluated the effect on VL of 23 viral residues that associated with host variant rs2395029 (r² = 0.93 with HLA-B*57:01) in the genome-to-genome analysis (selection cutoff: p<0.001). The marker rs2395029 was associated with a 0.38 log10 decrease in viral RNA copies/ml. The univariate effect on VL for each of the 23 viral amino acids targeted by this allele ranged from –0.16 to +0.12 log10 copies/ml (Figure 3B). These results suggest that the genome-to-genome approach can be linked to clinical/laboratory phenotypes, allowing for detailed understanding of the distribution and relative contribution of sites of host–pathogen interaction to disease outcome.

Figure 3. Association of HIV-1 amino acid variants with plasma viral load. (A) Changes in VL (slope coefficients from the univariate regression model and standard error, log10 copies/ml) for the 48 HIV-1 amino acids that are associated with host SNPs in the genome-to-genome analysis. (B) rs2395029, a marker of HLA-B*57:01, is associated with a 0.38 log10 copies/ml lower VL (black bar) in comparison to the population mean. Gray bars represent changes in VL for amino acid variants associated with rs2395029 (p<0.001). In case of multiallelic positions, the change in VL is shown for all minor amino acids combined vs the major amino acid (e.g., GAG147 not I). DOI: 10.7554/eLife.01123.006

Discussion

HIV-1 host genomic studies performed so far have focused on clinically defined outcomes (resistance to infection, clinical presentation, disease progression or death) or on pathogen-related laboratory results (such as CD4+ T cell counts and VL set point). While useful, these phenotypes have significant drawbacks. First, consistency of phenotypic determination can be hard to achieve, and such inconsistency can adversely affect power in large-scale genetic studies performed across multiple centers (Evangelou et al., 2011). Second, a relatively long follow-up in the absence of antiretroviral treatment is necessary to obtain informative data about the natural history of infection. However, international guidelines now propose an early start of antiretroviral therapy in most HIV-1 infected individuals (Thompson et al., 2012), making the collection of large numbers of long-term untreated patients not only unrealistic but also ethically questionable.

To overcome these limitations, we developed a novel approach for host genetic studies of infectious diseases, built on the unprecedented possibility to obtain paired genome-wide information from hosts and pathogens. We combined human polymorphism and HIV-1 sequence diversity in the same analytical framework to search for sites of human-virus genomic conflict, effectively using variation in HIV-1 amino acids as an ‘intermediate phenotype’ for association studies.
Intermediate phenotypes have recently been shown to be useful in uncovering association signals that are not detectable using more complex clinical endpoints: illustrative examples include metabolomic biomarkers in cardiovascular research (Suhre et al., 2011), serum IgE concentration in the study of asthma (Moffatt et al., 2010), or neuroimaging-based phenotypes in psychiatric genetics (Rasetti and Weinberger, 2011). Variation in the pathogen sequence is an as-yet-untapped intermediate phenotype, specific by nature to genomic research in infectious diseases. Importantly, it depends on sequencing the pathogen, which could prove in many cases easier and more standardized than obtaining detailed clinical phenotypes.

Our approach allowed the mapping of host genetic pressure on the HIV-1 genome. The strongest association signals genome-wide were observed between human SNPs tagging HLA class I alleles and viral mutations in their corresponding CTL epitopes. Additional association signals were observed outside of optimally defined CTL epitopes, which could indicate novel epitopes, or represent secondary (compensatory) mutations. In a single experiment, these results recapitulate extensive epidemiological and immunogenetic research and represent a proof-of-concept that biologically meaningful association signals are identifiable using a hypothesis-free strategy. Indeed, host factors leading to viral adaptation can be uncovered by searching for associated imprints in the viral genome. Of note, the International HIV Controllers Study demonstrated the importance of specific amino acid positions in the HLA-B binding groove on a clinical outcome (elite control) (Pereyra et al., 2010). We here extend this observation to the HLA-A and HLA-C grooves, emphasizing the similarity in mechanism of host pressure on the viral proteome that is not necessarily translated into observable clinical outcomes.
We found a higher density of amino acid positions under selection in Gag and Nef compared with the rest of the HIV proteome. This is consistent with earlier findings that indicate the importance of Gag p24-specific CTL responses in slower progression to AIDS (Borghans et al., 2007; Brennan et al., 2012) or controller status (Dyer et al., 2008). Moreover, this further demonstrates that mapping host pressure on the pathogen proteome can reveal biologically relevant effects.

Analyses were performed using samples from clinically well-characterized patients, most of them with repeated and reliable HIV-1 VL measurements in the absence of antiretroviral therapy. We were thus able to compare the results of GWAS assessing human genetic determinants of mean VL, a standard clinical correlate of HIV-1 control, and genome-to-genome GWAS on amino acid variants in the viral proteome. The use of HIV-1 variation as outcome resulted in a considerable gain in power to detect host factors: the lowest p-values were observed for SNPs mapping to the HLA class I region in both approaches, but associations were much stronger with HIV-1 amino acid variation than with HIV-1 VL (2.7 × 10⁻⁶⁶ vs 1 × 10⁻⁸), even when accounting for the increased number of multiple tests.

In addition to identifying sites of interaction between the host and the pathogen, the study design allowed the scoring of biological consequences of such interaction, by assessing associations between host-driven escape at viral sites and an in vivo phenotype (VL). For example, we decomposed the effect of rs2395029 (a marker of HLA-B*57:01) on VL into the effects of the multiple viral amino acid variants that are associated with that SNP. While some HIV-1 amino acid changes individually associate with a decrease in VL, the compound image that emerges is one of a multiplicity of modest effects distributed across many residues. Correlations between host-associated variants and VL are difficult to interpret, because they may reflect fitness costs or compensation, the existence of strong (Iversen et al., 2006; Carlson et al., 2012) or novel (Almeida et al., 2011) immune responses, or the indirect impact of specific HLA class I alleles. Nevertheless, the observation that the majority of host-associated HIV-1 mutations do not correlate with any detectable change in VL confirms HIV’s remarkable capacity to adapt to and compensate for immune pressure, often without measurable fitness cost.

A significant confounder in both human and viral genomic analyses is the existence of population stratification, where shared ancestry between infected individuals, stratification by ethnic groups, non-random distribution of HIV-1 subtypes, or clusters of viral transmission can all have an influence on the population frequencies of specific mutations, and thus create spurious associations if not carefully controlled for. Previous studies usually controlled for viral population substructure but were limited in the control of human population stratification (Moore et al., 2002; Bhattacharya et al., 2007).
Our approach offers the opportunity to correct for both factors, thanks to the availability of extensive host and viral genomic information. The present sample size provided approximately 80% power to detect a common human variant (minor allele frequency of 10%) with an odds ratio of 4.2 in the genome-to-genome analysis (Study B) and a viral amino acid explaining approximately 4% of the variation in plasma viral load (Study C) at the respective significance thresholds (Purcell et al., 2003).

Consistent with most studies performed in HIV-1 host genetics over the past few years (reviewed in Telenti and Johnson (2012)), we did not identify previously unknown host genetic loci involved in host-viral interaction and HIV-1 restriction. The proposed approach can only detect polymorphic host factors that leave an imprint on the virus, which may exclude mediators of immunopathogenesis or genes involved in the establishment of tolerance (Medzhitov et al., 2012). An additional limitation is the incomplete nature of genomic information available both on the host side (common genotypes from GWAS) and on the viral side (near full-length consensus sequence; gp120 was not included in the analyses). Finally, the multiple hypothesis burden of a genome-to-genome scan is extremely high. It is conceivable that larger studies, or studies that focus on a subgroup of predefined host genes, would have power to detect novel associations. A comprehensive but computationally challenging description of host–pathogen genomic interactions would require human genome sequencing, coupled with deep sequencing of intra-host retroviral subpopulations.

In summary, we used a genome-to-genome, hypothesis-free approach to identify associations between host polymorphisms and HIV-1 genomic variation. This strategy allows a global assessment of host–pathogen interactions at the genome level and reveals sites of genomic conflict.
Comparable approaches are immediately applicable to explore other important infectious diseases, as long as polymorphic host factors exert sufficient selective pressure to trigger escape mutations in the pathogen. The observation that pathogen sequence variation, used as an intermediate phenotype, is more powerful than clinical and laboratory outcomes to identify some host factors allows smaller-scale studies and encourages analyses of less prevalent infectious diseases. Researchers involved in pathogen genome studies and host genetic studies should strongly consider the gathering of paired host–pathogen data.

Materials and methods

Ethics statement

Participating centers provided local Institutional Review Board approval for genetic analysis. Study participants provided informed consent for genetic testing, with the exception of a subset where a procedure approved by the relevant Research Ethics Board allowed the use of anonymized historical specimens in the absence of a specific informed consent.

Participants

Study participants are treatment-naïve individuals followed in one of the following cohorts or institutions: the Swiss HIV Cohort Study (SHCS, www.shcs.ch, [Schoeni-Affolter et al., 2010]); the HAART Observational Medical Evaluation and Research (HOMER) study in Vancouver, Canada (www.cfenet.ubc.ca/our-work/initiatives/homer); the AIDS Clinical Trials Group (ACTG) Network in the USA (actgnetwork.org); the International HIV Controllers Study in Boston, USA (IHCS, www.hivcontrollers.org); the Western Australian HIV Cohort Study, Perth, Australia; the AIDS Research Institute IrsiCaixa in Badalona, Spain; and the Instituto de Salud Carlos III in Madrid, Spain. To reduce noise due to host and viral diversity, we only included individuals of recent Western European ancestry (confirmed by clustering with HapMap CEU individuals in principal component analysis of the genotype data [Price et al., 2006]), and infected with HIV-1 subtype B (as assessed by the REGA Subtyping Tool [De Oliveira et al., 2005]). Plasma VL determinations in the absence of antiretroviral therapy were available from patients from the SHCS and the HOMER study. The VL phenotype was defined as the average of the log10-transformed numbers of HIV-1 RNA copies per ml of plasma, excluding measurements obtained in the first 6 months after seroconversion and during advanced immunosuppression (i.e., with <100 CD4+ T cells per µl of blood). Consequently, 698 study participants were eligible for VL analysis.

Human genotype data

DNA samples were genotyped in the context of previous GWAS (Fellay et al., 2009; Pereyra et al., 2010) or for the current study on various platforms, including the HumanHap550, Human660W-Quad, Human1M and HumanOmniExpress BeadChips (Illumina Inc., San Diego, CA, USA), as well as the Genome-Wide Human SNP Array 6.0 (Affymetrix Inc., Santa Clara, CA, USA) (Table 2). Study participants were filtered on the basis of genotyping quality, a sex check, and cryptic relatedness. SNP quality control was performed separately for each dataset: SNPs were filtered on the basis of missingness (excluded if called in <99% of participants), minor allele frequency (excluded if <0.01), and marked deviation from Hardy-Weinberg equilibrium (excluded if p<0.00005). Missing genotype imputation was performed with the Mach software per genotyping platform (in separate batches for Illumina 1M, OmniExpress, 550K and Affymetrix data) using 1000 Genomes Phase I CEU population data as reference haplotypes. Imputed markers were filtered on minor allele frequency (excluded if <0.01) and imputation quality using Mach’s reported r-squared measure (excluded if <0.3). SNPs with a deviation in the allele frequencies between platforms were excluded. High-resolution HLA class I typing (4 digits; HLA-A, HLA-B, and HLA-C) was obtained using sequence-based methods, or imputed from the SNP genotyping data as described elsewhere (Jia et al., 2013).

Table 2. Distribution of samples across genotyping platforms and cohorts

N | Genotyping platform | Cohort
140 | Illumina 1M | ACTG
6 | Illumina OmniExpress12v1H | CARLOS III
518 | Affymetrix 6.0 | HOMER
136 | Illumina OmniExpress12v1H | HOMER
47 | Illumina 650k | IHCS
6 | Illumina 660W-Quad | IRSICAIXA
2 | Illumina 1M | SHCS
79 | Illumina 550k | SHCS
122 | Illumina OmniExpress12v1H | SHCS
15 | Illumina 550k | WAHCS

ACTG = AIDS Clinical Trials Group Network; CARLOS III = Instituto de Salud Carlos III; HOMER = HAART Observational Medical Evaluation and Research Study; IHCS = International HIV Controllers Study; IRSICAIXA = AIDS Research Institute IrsiCaixa; SHCS = Swiss HIV Cohort Study; WAHCS = Western Australian HIV Cohort Study.
DOI: 10.7554/eLife.01123.007

HIV-1 sequence data

Near full-length retroviral sequence data were obtained by bulk sequencing of viral RNA present in pretreatment-stored plasma, and in 11 cases, of proviral DNA isolated from peripheral blood mononuclear cells, as previously described (Sandonís et al., 2009; John et al., 2010). We defined an amino acid residue as variable if at least 10 study samples presented an alternative allele. Per position, separate binary variables were generated for each alternate amino acid, indicating the presence or absence of that allele in a given sample.

Association analyses

To globally assess the association between human genomic variation (SNPs), HIV-1 proteomic variation (amino acids) and clinical outcome (VL), we performed three series of analyses (Figure 1): [A] human SNPs vs VL; [B] human SNPs vs HIV-1 amino acids; and [C] HIV-1 amino acids vs VL. To test for association between human SNPs and HIV-1 amino acids, we used phylogenetically corrected logistic regression (Carlson et al., 2008; Carlson et al., 2012). For association testing between polymorphic amino acids in human HLA genes and HIV sequence variation, we used standard logistic regression (for a binary HLA amino acid) or a multivariate omnibus test (when more than one alternate allele was present) including sex, cohort, and the coordinates of the first two principal component axes as covariates. We used linear regression models in PLINK to test for association between human SNPs and VL, and between HIV-1 amino acids and VL (Purcell et al., 2007), including sex, cohort, and the coordinates of the first two principal component axes as covariates (Price et al., 2006). An additive genetic model was used for all analyses involving human SNPs. Significance was assessed using Bonferroni correction (significance thresholds of 7.25 × 10⁻⁹, 2.4 × 10⁻¹², and 1.6 × 10⁻⁵ for analyses A, B, and C, respectively, Figure 1).
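The per-position coding of viral amino acids described under "HIV-1 sequence data" could be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the function name is hypothetical, the most frequent residue is taken as the reference allele, the 10-sample rule is applied to all non-reference residues combined, and gaps or ambiguous calls are not handled.

```python
from collections import Counter


def encode_position(residues, min_carriers=10):
    """Binary-code one alignment column.

    residues: one amino acid letter per sample at this position.
    Returns {alternate_residue: [0/1 per sample]}, or {} when fewer than
    min_carriers samples carry a residue other than the most frequent one.
    Using the most frequent residue as the reference is an assumption here.
    """
    counts = Counter(residues)
    major, _ = counts.most_common(1)[0]
    n_alternate = len(residues) - counts[major]
    if n_alternate < min_carriers:
        return {}  # position is not treated as variable
    return {
        aa: [1 if r == aa else 0 for r in residues]
        for aa in sorted(counts)
        if aa != major
    }


# Hypothetical column: 25 samples with I, 8 with L, 4 with M.
column = ["I"] * 25 + ["L"] * 8 + ["M"] * 4
encoded = encode_position(column)
print(sorted(encoded), sum(encoded["L"]), sum(encoded["M"]))
```

Each resulting 0/1 vector can then serve as the outcome in a SNP-vs-amino-acid test or as the predictor in an amino-acid-vs-VL regression.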

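The VL phenotype defined under "Participants" (mean of log10-transformed viral loads, excluding early and late-stage measurements) can be illustrated with a short function. This is a minimal sketch: the tuple layout, the function name and the handling of the boundary values are assumptions, and the input is assumed to contain only measurements taken in the absence of antiretroviral therapy.

```python
import math


def vl_phenotype(measurements, min_months=6.0, min_cd4=100.0):
    """Mean log10 viral load over eligible measurements, or None if none qualify.

    measurements: iterable of
        (months_since_seroconversion, cd4_cells_per_ul, hiv_rna_copies_per_ml)
    A measurement is kept when it was taken more than min_months after
    seroconversion and the CD4+ T cell count is at least min_cd4.
    """
    eligible = [
        math.log10(copies)
        for months, cd4, copies in measurements
        if months > min_months and cd4 >= min_cd4 and copies > 0
    ]
    if not eligible:
        return None
    return sum(eligible) / len(eligible)


# Hypothetical patient: the first value is too early, the last has CD4 < 100.
history = [(2, 600, 1e6), (12, 500, 1e4), (24, 450, 1e5), (60, 80, 1e6)]
print(vl_phenotype(history))
```

Averaging on the log10 scale keeps a single very high measurement from dominating the phenotype.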
Acknowledgements

We would like to thank all the patients participating in these genetic studies, the many study nurses, physicians, data managers and laboratories involved in all the cohorts; Tanja Stadler and Sebastian Bonhoeffer (at ETH Zürich, Switzerland) and Samuel Alizon (at MIGEVEC, Montpellier, France) for helpful discussions; and Jennifer Troyer (at the Laboratory for Genomic Diversity, NCI) for her work on the HOMER genotyping data.

Additional information

Funding

Funder | Grant reference number | Author
Swiss National Science Foundation | 33CS30_134277/Swiss HIV Cohort Study, 31003A_132863/1, PP00P3_133703/1 | Amalio Telenti, Jacques Fellay
Santos Suarez Foundation, Lausanne |  | Amalio Telenti, Jacques Fellay
Hungarian Academy of Sciences | Bolyai János Research Fellowship | Viktor Müller
Michael Smith Foundation for Health Research |  | Zabrina L Brumme
Canadian Institutes of Health Research |  | Zabrina L Brumme
Sciex-NMS Program | 10.267 | István Bartha
Spanish Ministry of Science and Innovation | SAF 2007-61036, 2010-17226 and 2010-18917 | Cecilio López-Galíndez, Concepción Casado
Fundacion para la investigacion y prevencion del SIDA en Espana | 36558/06, 36641/07, 36779/08, 360766/09 | Cecilio López-Galíndez, Concepción Casado
RETIC de Investigacion en SIDA | RD06/006/0036 | Cecilio López-Galíndez, Concepción Casado
National Institute of Allergy and Infectious Diseases (NIAID) | P01-AI074415 | Todd M Allen
Bill and Melinda Gates Foundation |  | Todd M Allen
SNF Professorship | PP00P3_133703/1 | Jacques Fellay

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.


Author contributions

IB, JMC, CJB, JL, NP, CL, NF, ZK, VM, Analysis and interpretation of data, Drafting or revising the article; PJM, AT, JF, Conception and design, Analysis and interpretation of data, Drafting or revising the article; ZLB, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents; MJ, Acquisition of data, Drafting or revising the article, Contributed unpublished essential data or reagents; DWH, JM-P, JD, CL-G, CC, AR, HFG, EB, PV, TK, SY, SJO’B, PRH, Acquisition of data, Drafting or revising the article; TMA, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article; DH, Analysis and interpretation of data, Drafting or revising the article, Contributed unpublished essential data or reagents

Ethics Human subjects: Participating centers provided local Institutional Review Board approval for genetic analysis. Study participants provided informed consent for genetic testing, with the exception of a subset where a procedure approved by the relevant Research Ethics Board allowed the use of anonymized historical specimens in the absence of a specific informed consent.

Additional files

Major dataset

The following datasets were generated:

Author(s) | Year | Dataset title | Dataset ID and/or URL | Database, license, and accessibility information
Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, et al. | 2013 | Interactive HIV-Host Genome-to-Genome Map | http://dx.doi.org/10.5281/zenodo.7138 | Publicly available at Zenodo (https://zenodo.org).
Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, et al. | 2013 | Online Supplementary Dataset of the HIV Genome-to-Genome Study | http://dx.doi.org/10.5281/zenodo.7139 | Publicly available at Zenodo (https://zenodo.org).

References

Alizon S, von Wyl V, Stadler T, Kouyos RD, Yerly S, Hirschel B, Böni J, et al. 2010. Phylogenetic approach reveals that virus genotype largely determines HIV set-point viral load. PLOS Pathogens 6:e1001123. doi: 10.1371/journal.ppat.1001123.
Almeida CA, Bronke C, Roberts SG, McKinnon E, Keane NM, Chopra A, Kadie C, et al. 2011. Translation of HLA-HIV associations to the cellular level: HIV adapts to inflate CD8 T cell responses against Nef and HLA-adapted variant epitopes. J Immunol 187:2502–13. doi: 10.4049/jimmunol.1100691.
Alter G, Heckerman D, Schneidewind A, Fadda L, Kadie CM, Carlson JM, Oniangue-Ndza C, et al. 2011. HIV-1 adaptation to NK-cell-mediated immune pressure. Nature 476:96–100. doi: 10.1038/nature10237.
Bhattacharya T, Daniels M, Heckerman D, Foley B, Frahm N, Kadie C, Carlson J, et al. 2007. Founder effects in the assessment of HIV polymorphisms and HLA allele associations. Science 315:1583–6. doi: 10.1126/science.1131528.
Borghans JA, Mølgaard A, de Boer RJ, Keşmir C. 2007. HLA alleles associated with slow progression to AIDS truly prefer to present HIV-1 P24. PLOS ONE 2:e920. doi: 10.1371/journal.pone.0000920.
Brennan CA, Ibarrondo FJ, Sugar CA, Hausner MA, Shih R, Ng HL, Detels R, et al. 2012. Early HLA-B*57-restricted CD8+ T lymphocyte responses predict HIV-1 disease progression. J Virol 86:10505–16. doi: 10.1128/JVI.00102-12.
Brumme ZL, Brumme CJ, Heckerman D, Korber BT, Daniels M, Carlson J, Kadie C, et al. 2007. Evidence of differential HLA class I-mediated viral evolution in functional and accessory/regulatory genes of HIV-1. PLOS Pathogens 3:e94. doi: 10.1371/journal.ppat.0030094.
Carlson JM, Brumme CJ, Martin E, Listgarten J, Brockman MA, Le AQ, Chui CK, et al. 2012. Correlates of protective cellular immunity revealed by analysis of population-level immune escape pathways in HIV-1. J Virol 86:13202–16. doi: 10.1128/JVI.01998-12.
Carlson JM, Brumme ZL, Rousseau CM, Brumme CJ, Matthews P, Kadie C, Mullins JI, et al. 2008. Phylogenetic dependency networks: inferring patterns of CTL escape and codon covariation in HIV-1 Gag. Edited by Rob J De Boer. PLOS Comput Biol 4:23. doi: 10.1371/journal.pcbi.1000225.
Carlson JM, Listgarten J, Pfeifer N, Tan V, Kadie C, Walker BD, Ndung’u T, et al. 2012. Widespread impact of HLA restriction on immune control and escape pathways of HIV-1. J Virol 86:5230–43. doi: 10.1128/JVI.06728-11.


De Oliveira T, Deforche K, Cassol S, Salminen M, Paraskevis D, Seebregts C, Snoeck J, et al. 2005. An automated genotyping system for analysis of HIV-1 and other microbial sequences. Bioinformatics 21:3797–800. doi: 10.1093/bioinformatics/bti607.
Dyer WB, Zaunders JJ, Yuan FF, Wang B, Learmont JC, Geczy AF, Saksena NK, McPhee DA, Gorry PR, Sullivan JS. 2008. Mechanisms of HIV non-progression; robust and sustained CD4+ T-cell proliferative responses to P24 antigen correlate with control of viraemia and lack of disease progression after long-term transfusion-acquired HIV-1 infection. Retrovirology 5:112. doi: 10.1186/1742-4690-5-112.
Evangelou E, Fellay J, Colombo S, Martinez-Picado J, Obel N, Goldstein DB, Telenti A, Ioannidis JPA. 2011. Impact of phenotype definition on genome-wide association signals: empirical evaluation in human immunodeficiency virus type 1 infection. Am J Epidemiol 173:1336–42. doi: 10.1093/aje/kwr024.
Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, Cirulli ET, Urban TJ, et al. 2009. Common genetic variation and the control of HIV-1 in humans. PLOS Genet 5:e1000791. doi: 10.1371/journal.pgen.1000791.
Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, Zhang K, et al. 2007. A whole-genome association study of major determinants for host control of HIV-1. Science 317:944–7. doi: 10.1126/science.1143767.
Goulder PJ, Brander C, Tang Y, Tremblay C, Colbert RA, Addo MM, Rosenberg ES, et al. 2001. Evolution and transmission of stable CTL escape mutations in HIV infection. Nature 412:334–8. doi: 10.1038/35085576.
Hirsch MS, Günthard HF, Schapiro JM, Brun-Vézinet F, Clotet B, Hammer SM, Johnson VA, et al. 2008. Antiretroviral drug resistance testing in adult HIV-1 infection: 2008 recommendations of an international AIDS Society-USA panel. Clin Infect Dis 47:266–85. doi: 10.1086/589297.
Iversen AKN, Stewart-Jones G, Learn GH, Christie N, Sylvester-Hviid C, Armitage AE, Kaul R, et al. 2006. Conflicting selective forces affect T cell receptor contacts in an immunodominant human immunodeficiency virus epitope. Nat Immunol 7:179–89. doi: 10.1038/ni1298.
Jia X, Han B, Onengut-Gumuscu S, Chen WM, Concannon PJ, Rich SS, Raychaudhuri S, de Bakker PIW. 2013. Imputing amino acid polymorphisms in human leukocyte antigens. PLOS ONE 8:e64683. doi: 10.1371/journal.pone.0064683.
John M, Heckerman D, James I, Park LP, Carlson JM, Chopra A, Gaudieri S, et al. 2010. Adaptive interactions between HLA and HIV-1: highly divergent selection imposed by HLA class I molecules with common supertype motifs. J Immunol 184:4368–77. doi: 10.4049/jimmunol.0903745.
Kawashima Y, Pfafferott K, Frater J, Matthews P, Payne R, Addo M, Gatanaga H, et al. 2009. Adaptation of HIV-1 to human leukocyte antigen class I. Nature 458:641–5. doi: 10.1038/nature07746.
Kirmaier A, Wu F, Newman RM, Hall LR, Morgan JS, O’Connor S, Marx PA, et al. 2010. TRIM5 suppresses cross-species transmission of a primate immunodeficiency virus and selects for emergence of resistant variants in the new species. PLOS Biol 8:e1000462. doi: 10.1371/journal.pbio.1000462.
Kouyos RD, von Wyl V, Yerly S, Böni J, Taffé P, Shah C, Bürgisser P, et al. 2010. Molecular epidemiology reveals long-term changes in HIV type 1 subtype B transmission in Switzerland. J Infect Dis 201:1488–97. doi: 10.1086/651951.
Medzhitov R, Schneider DS, Soares MP. 2012. Disease tolerance as a defense strategy. Science 335:936–41. doi: 10.1126/science.1214935.
Moffatt MF, Gut IG, Demenais F, Strachan DP, Bouzigon E, Heath S, von Mutius E, Farrall M, Lathrop M, Cookson WOCM. 2010. A large-scale, consortium-based genomewide association study of asthma. N Engl J Med 363:1211–21. doi: 10.1056/NEJMoa0906312.
Moore CB, John M, James IR, Christiansen FT, Witt CS, Mallal SA. 2002. Evidence of HIV-1 adaptation to HLA-restricted immune responses at a population level. Science 296:1439–43. doi: 10.1126/science.1069660.
Ortiz M, Guex N, Patin E, Martin O, Xenarios I, Ciuffi A, Quintana-Murci L, Telenti A. 2009. Evolutionary trajectories of primate genes involved in HIV pathogenesis. Mol Biol Evol 26:2865–75. doi: 10.1093/molbev/msp197.
Pereyra F, Jia X, McLaren PJ. 2010. The major genetic determinants of HIV-1 control affect HLA class I peptide presentation. Science 330:1551–7. doi: 10.1126/science.1195271.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. 2006. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38:904–9. doi: 10.1038/ng1847.
Purcell S, Cherny SS, Sham PC. 2003. Genetic Power Calculator: design of linkage and association genetic mapping studies of complex traits. Bioinformatics 19:149–50. doi: 10.1093/bioinformatics/19.1.149.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Human Genet 81:559–75. doi: 10.1086/519795.
Rahm N, Gfeller D, Snoeck J, Martinez R, McLaren PJ, Ortiz M, Ciuffi A, Telenti A. 2013. Susceptibility and adaptation to human TRIM5α alleles at positive selected sites in HIV-1 capsid. Virology:1–9. doi: 10.1016/j.virol.2013.03.021.
Rasetti R, Weinberger DR. 2011. Intermediate phenotypes in psychiatric disorders. Curr Opin Genet Dev 21:340–8. doi: 10.1016/j.gde.2011.02.003.
Sandonís V, Casado C, Alvaro T, Pernas M, Olivares I, García S, Rodríguez C, del Romero J, López-Galíndez C. 2009. A combination of defective DNA and protective host factors are found in a set of HIV-1 ancestral LTNPs. Virology 391:73–82. doi: 10.1016/j.virol.2009.05.022.
Schoeni-Affolter F, Ledergerber B, Rickenbach M, Rudin C, Günthard HF, Telenti A, Furrer H, Yerly S, Francioli P. 2010. Cohort profile: the Swiss HIV cohort study. Int J Epidemiol 39:1179–89. doi: 10.1093/ije/dyp321.
Suhre K, Shin SY, Petersen AK, Mohney RP, Meredith D, Wägele B, Altmaier E, et al. 2011. Human metabolic individuality in biomedical and pharmaceutical research. Nature 477:54–60. doi: 10.1038/nature10354.


Telenti A, Johnson WE. 2012. Host genes important to HIV replication and evolution. Cold Spring Harbor Perspectives in Medicine 2:a007203. doi: 10.1101/cshperspect.a007203.
Thompson MA, Aberg JA, Hoy JF, Telenti A, Benson C, Cahn P, Eron JJ, et al. 2012. Antiretroviral treatment of adult HIV infection: 2012 recommendations of the International Antiviral Society-USA panel. J Am Med Assoc 308:387–402. doi: 10.1001/jama.2012.7961.
Von Wyl V, Kouyos RD, Yerly S, Böni J, Shah C, Bürgisser P, Klimkait T, et al. 2011. The role of migration and domestic transmission in the spread of HIV-1 non-B subtypes in Switzerland. J Infect Dis 204:1095–103. doi: 10.1093/infdis/jir491.
Von Wyl V, Yerly S, Bürgisser P, Klimkait T, Battegay M, Bernasconi E, Cavassini M, et al. 2009. Long-term trends of HIV type 1 drug resistance prevalence among antiretroviral treatment-experienced patients in Switzerland. Clin Infect Dis 48:979–87. doi: 10.1086/597352.
Wain LV, Bailes E, Bibollet-Ruche F, Decker JM, Keele BF, Van Heuverswyn F, Li Y, et al. 2007. Adaptation of HIV-1 to its human host. Mol Biol Evol 24:1853–60. doi: 10.1093/molbev/msm110.
Wright JK, Brumme ZL, Julg B, van der Stok M, Mncube Z, Gao X, Carlson JM, et al. 2012. Lack of association between HLA class II alleles and in vitro replication capacities of recombinant viruses encoding HIV-1 subtype C Gag-protease from chronically infected individuals. J Virol 86:1273–6. doi: 10.1128/JVI.06533-11.


Chapter 9

Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance

Ge D*, Fellay J*, Thompson AJ, Simon JS, Shianna KV, Urban TJ, Heinzen EL, Qiu P, Bertelsen AH, Muir AJ, Sulkowski M, McHutchison JG, Goldstein DB. Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance. Nature. 2009 Sep 17;461(7262):399-401

Vol 461 | 17 September 2009 | doi:10.1038/nature08309 LETTERS

Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance

Dongliang Ge¹, Jacques Fellay¹, Alexander J. Thompson², Jason S. Simon³, Kevin V. Shianna¹, Thomas J. Urban¹, Erin L. Heinzen¹, Ping Qiu³, Arthur H. Bertelsen³, Andrew J. Muir², Mark Sulkowski⁴, John G. McHutchison² & David B. Goldstein¹

¹Institute for Genome Sciences & Policy, Center for Human Genome Variation, Duke University, Durham, North Carolina 27708, USA. ²Duke Clinical Research Institute and Division of Gastroenterology, School of Medicine, Duke University, Durham, North Carolina 27705, USA. ³Schering-Plough Research Institute, Kenilworth, New Jersey 07033, USA. ⁴Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA.

Chronic infection with hepatitis C virus (HCV) affects 170 million people worldwide and is the leading cause of cirrhosis in North America [1]. Although the recommended treatment for chronic infection involves a 48-week course of peginterferon-α-2b (PegIFN-α-2b) or -α-2a (PegIFN-α-2a) combined with ribavirin (RBV), it is well known that many patients will not be cured by treatment, and that patients of European ancestry have a significantly higher probability of being cured than patients of African ancestry. In addition to limited efficacy, treatment is often poorly tolerated because of side effects that prevent some patients from completing therapy. For these reasons, identification of the determinants of response to treatment is a high priority. Here we report that a genetic polymorphism near the IL28B gene, encoding interferon-λ-3 (IFN-λ-3), is associated with an approximately twofold change in response to treatment, both among patients of European ancestry (P = 1.06 × 10⁻²⁵) and African-Americans (P = 2.06 × 10⁻³). Because the genotype leading to better response is in substantially greater frequency in European than African populations, this genetic polymorphism also explains approximately half of the difference in response rates between African-Americans and patients of European ancestry.

To identify human genetic contributions to anti-HCV treatment response, we have performed a genome-wide association study of more than 1,600 individuals who were part of the IDEAL study [2], and we included a further 67 patients from another prospective treatment study [3]. Briefly, the IDEAL study compared the effectiveness of three treatment regimens involving PegIFN-α-2b or PegIFN-α-2a combined with RBV. It demonstrated similar efficacy of the two IFN preparations and a significantly lower efficacy in self-reported African-Americans compared with Americans of European ancestry (European-Americans). All patients included were treatment-naive Americans who were chronically infected with genotype 1 HCV. Patients received 48 weeks of treatment and 24 weeks of follow-up.

A total of 1,671 individuals were genotyped using the Illumina Human610-quad BeadChip, and we then searched for determinants of treatment response as a primary endpoint. We defined successful treatment response and non-response according to standard definitions [4], concentrating on sustained virological response (SVR), which is the absence of detectable virus at the end of follow-up evaluation (Supplementary Information I). We included 1,137 patients who satisfied stringent compliance criteria (Supplementary Information I) in the analyses of treatment response, and 1,475 patients in a separate analysis of baseline viral load.

We found that a polymorphism on chromosome 19, rs12979860, is strongly associated with SVR in all patient groups (Fig. 1), with the European-American population sample showing overwhelming genome-wide significance (P = 1.06 × 10⁻²⁵). Combining the P values across the population groups, the variant shows association at 1.37 × 10⁻²⁸. The polymorphism resides 3 kilobases (kb) upstream of the IL28B gene (Fig. 2), encoding IFN-λ-3.

In patients of European ancestry, the CC genotype is associated with a twofold (95% confidence interval 1.8–2.3) greater rate of SVR than the TT genotype (Fig. 1), with similar ratios in both the African-American (threefold, 95% confidence interval 1.9–4.7) and the Hispanic (twofold, 95% confidence interval 1.4–3.2) population groups. The magnitude of this association is compared in Table 1 with other host or viral factors known to influence SVR in patients infected with genotype 1 HCV, including baseline viral load, fibrosis and ethnicity [4,5].

Not only does the IL28B polymorphism strongly influence response within each of the major population groups, it also appears to explain much of the difference in response between different population groups (European-Americans compared with African-Americans). We estimate that approximately half of the difference in SVR between populations can be accounted for by the difference in frequency of the C allele between African-Americans and individuals of European ancestry (Supplementary Information XI). Interestingly, it has also been well documented that East Asians have higher SVR rates than patients of European ancestry [6,7]. By looking at a random multi-ethnic population sample with unknown hepatitis C status (Supplementary Information II), we observed a substantially higher frequency of the C allele in East Asians (Fig. 3). Collectively, the SVR rates across different population groups displayed a striking concordance with C-allele frequency (Fig. 3). Finally, it is also noteworthy that African-Americans with the CC genotype have a significantly higher rate of response (53.3%) than individuals of European ancestry who have the TT genotype (33.3%, P < 0.05), which emphasizes the greater importance of individual genotype compared with ethnicity in predicting treatment response [8].

We next tested whether this variant influences baseline (pre-treatment) viral load and found a significant association in all groups (Supplementary Information XIII). Interestingly, the C allele, associated with better treatment response, is also associated with higher baseline viral load (CC 6.35, n = 485; TC 6.33, n = 744; TT 6.16, n = 246; P = 1.21 × 10⁻¹⁰; viral loads given as log10 international units (IU) ml⁻¹). Although this finding is counter-intuitive in that lower baseline viral loads predict a better response to treatment, it could relate to recent speculation about the role of IFN-stimulated genes in modulating response to PegIFN [9], and it seems plausible that the IL28B polymorphism has a role in the regulation of intrahepatic IFN-stimulated gene expression with consequences both for viral load and treatment response (Supplementary Information XIII). We also note that the polymorphism has no association with whether individuals are classified as having baseline viral loads above or below a commonly used threshold that predicts respectively worse or better treatment response (Supplementary Information XIII), indicating that the association of the polymorphism with clearance and viral load may be independent. In addition, we note that the C-allele frequency was significantly reduced in the chronically infected cohort compared with ethnically matched controls (0.63 versus 0.73, controlled for population structure, P < 2.5 × 10⁻⁶, Supplementary Information II and XVI), which suggests an association between the C allele and a higher rate of natural clearance of hepatitis C. We note, however, that determination of the precise effect of the C allele on clearance will require comparison between matched cohorts known to have and have not naturally cleared this viral infection.

Figure 1 | Percentage of SVR by genotypes of rs12979860. Data are percentages ± s.e.m. [Bar chart: y-axis, SVR (%), 0–100; x-axis, rs12979860 genotype (T/T, T/C, C/C) within each population group; bars show SVR (%) and non-SVR (%). P values: European-Americans, P = 1.06 × 10⁻²⁵; African-Americans, P = 2.06 × 10⁻³; Hispanics, P = 4.39 × 10⁻³; combined, P = 1.37 × 10⁻²⁸. Sample sizes (T/T, T/C, C/C): European-Americans, n = 102, 433, 336; African-Americans, n = 70, 91, 30; Hispanics, n = 14, 35, 26; combined, n = 186, 559, 392.]

Figure 2 | Genomic overview of the region of 19q13.13 surrounding the genome-wide significant determinant of response to treatment and including the IL28B gene. The top panel shows a genome-wide view of the P values [−log10(P)]. Panels below show all genotyped SNPs in the region of significance and the structures of the surrounding genes. The SNPs that show genome-wide significant association with SVR are marked in red. The polymorphism rs12979860 (red arrow) is 3 kb upstream to the gene encoding IFN-λ-3 (IL28B, blue arrow). Other SNPs in the same region showing genome-wide significant P values largely reflect the same signal (Supplementary Information IX). The results were annotated using the WGAViewer software [19]. [Panels: genome-wide −log10(P) plot by chromosome, with rs12979860 at P = 1.37 × 10⁻²⁸; chromosome 19 ideogram; regional plots covering 39,623–39,835 kb and 39,711–39,764 kb, with the genes NCCRP1, SYCN, IL28B, AC011445.6, IL28A, IL29, LRFN1, GMFG and PAK4.]

Figure 3 | Rate of SVR and rs12979860 C-allele frequency in diverse ethnic groups. The SVR rate in East Asians is adopted from Liu et al. [7]. Sample sizes for C-allele frequency: n = 61 (African-Americans); n = 271 (European-Americans); n = 16 (Hispanics); n = 107 (East Asians); sample sizes for SVR rate: n = 191 (African-Americans); n = 871 (European-Americans); n = 75 (Hispanics); n = 154 (East Asians). [Scatter plot: x-axis, rs12979860 C-allele frequency (0.3–1.0); y-axis, SVR (%); points for African-Americans, Hispanics, European-Americans and East Asians.]

We sequenced the IL28B gene in 96 individuals, and found two variants highly associated with rs12979860 (r² > 0.85 for all comparisons in all populations): a G > C transition 37 base pairs (bp) upstream of the translation initiation codon (rs28416813), and a non-synonymous coding single nucleotide polymorphism (SNP) (rs8103142) encoding the amino-acid substitution Lys70Arg. These new variants were then genotyped in the full cohort. Owing to the high degree of correlation among the three SNPs, tests for independence among these variants, using all available patients, were not able

to resolve which, if any, of these sites is uniquely responsible for the association with SVR. Additional HCV-infected cohorts may help to determine whether one of these SNPs, or any other SNP in the region, is causal for the association, as the pattern of association suggests the possibility of more than one functional variant in the region (Supplementary Information IX). Ultimately, identification and elucidation of the functional SNPs will depend on in-depth functional studies.

Table 1 | Comparison between the genetic and conventional clinical factors associating with SVR

Odds ratio (95% confidence interval)
Factor | European-Americans | African-Americans | Hispanics
IL28B rs12979860 genotype CC (versus CT and TT)* | 7.3 (5.1–10.4) | 6.1 (2.3–15.9) | 5.6 (1.4–22.1)
Baseline viral load (<600,000 IU ml⁻¹ versus ≥600,000 IU ml⁻¹)† | 4.2 (2.6–6.6) | 5.1 (1.9–13.9) | 2.4 (0.7–8.8)
Baseline fibrosis (METAVIR F0-2 versus F3-4)‡ | 3.0 (1.8–5.1) | 1.1 (0.3–5.2) | 4.1 (0.7–25.5)
Ethnicity (European-Americans/African-Americans) | 3.1 (2.1–4.7)

Odds ratios and 95% confidence intervals are generated from the logistic regression model.
* Corresponding relative risks for rs12979860: 2.0 (95% confidence interval 1.8–2.3) in European-Americans; 3.0 (95% confidence interval 1.9–4.7) in African-Americans; 2.1 (95% confidence interval 1.4–3.2) in Hispanics.
† In clinical practice it is customary to divide patients into high and low viral-load groups, reflecting a well-described threshold effect. The IDEAL trial used a threshold of 600,000 (ref. 2).
‡ Fibrosis was scored by METAVIR stage on a baseline centrally evaluated liver biopsy [2,18].

Given the significant effect of the IL28B polymorphism on treatment response, and its likely clinical relevance, it was considered important to compare the magnitude of different predictors of response for the patients studied here. We developed a logistic regression model that related clinical predictors to response rates (Supplementary Information XI). We noted that the regression model showed that the CC genotype is associated with a more substantial difference in rate of response than other known baseline predictors included in the model.

It seems likely therefore that advance knowledge of host genotype of patients infected with HCV could in the future become an important component of the clinical decision to initiate treatment with PegIFN and RBV. Many important clinical questions remain. The current data are specific to patients with genotype 1 infection. It will therefore be necessary to evaluate the role of host IL28B genotype and treatment response in other less common HCV genotypes. Novel small molecules, including HCV protease inhibitors, are currently being developed and may soon be used in combination with PegIFN and RBV for the treatment of genotype 1 HCV [10]; the role of the IL28B genotype in these novel treatment regimens should therefore be investigated.

In conclusion, we have identified a polymorphism 3 kb upstream of IL28B that is significantly associated with response to PegIFN and RBV for patients with chronic genotype 1 HCV infection. The polymorphism explains much of the difference in response between European-American and African-American patients. Given that the polymorphism appears to associate with natural clearance as well as treatment response, it seems likely that the gene product is involved in the innate control of HCV. Indeed, IFN-λs have demonstrated antiviral activity against genotype 1 HCV in vitro [11] and in vivo [12]. The IFN-λ proteins, encoded by the IL28A/B and IL29 genes, were first described in 2003 (refs 13, 14). These IFNs signal through a unique receptor but appear to share a common downstream signalling system with the type 1 IFNs, including IFN-α. These findings, and further study of the functional mechanism underlying the IL28B-response association, may help identify patients for whom therapy is likely to be successful, and highlight the IFN-λ signalling axis as a potential target for novel antiviral drug development.

METHODS SUMMARY

Our primary association tests involved single-marker genotype trend tests performed in three independent groups (European-Americans, n = 871; African-Americans, n = 191; Hispanics, n = 75; Supplementary Information I), using logistic regression models for treatment response and linear regression for baseline viral load (Supplementary Information VI). Association tests were implemented in the PLINK software [15], correcting for several clinical covariates, including baseline (pre-treatment) HCV viral load and severity of fibrosis. Then the association signals (P values) were combined using Stouffer's weighted Z-method [16], correctly taking into account sample sizes, effect sizes and effect directions in each population. This combined P value was then reported as the main result, along with the P values in each ethnic group. A series of quality-control steps resulted in 565,759 polymorphisms for the association tests. We applied methods to assess copy number variants and tested the relation between copy number variants and SVR. To control for the possibility of spurious associations resulting from population stratification, we used a modified EIGENSTRAT [17] method and corrected for population ancestry within each group. We assessed significance with a Bonferroni correction (P cutoff = 4.4 × 10⁻⁸; see Supplementary Information VIII for details).

Received 27 May; accepted 17 July 2009. Published online 16 August; corrected 17 September 2009 (see full-text HTML version for details).

1. World Health Organization. Hepatitis C <http://www.who.int/mediacentre/factsheets/fs164/en/> (2009).
2. McHutchison, J. G. et al. Peginterferon alfa-2b or alfa-2a with ribavirin for treatment of hepatitis C infection. N. Engl. J. Med. 361, 580–593 (2009).
3. Muir, A. J., Bornstein, J. D. & Killenberg, P. G. Peginterferon alfa-2b and ribavirin for the treatment of chronic hepatitis C in blacks and non-Hispanic whites. N. Engl. J. Med. 350, 2265–2271 (2004).
4. Ghany, M. G., Strader, D. B., Thomas, D. L. & Seeff, L. B. Diagnosis, management, and treatment of hepatitis C: an update. Hepatology 49, 1335–1374 (2009).
5. Shiffman, M. L. et al. Peginterferon alfa-2a and ribavirin in patients with chronic hepatitis C who have failed prior treatment. Gastroenterology 126, 1015–1023; discussion 947 (2004).
6. Yan, K. K. et al. Treatment responses in Asians and Caucasians with chronic hepatitis C infection. World J. Gastroenterol. 14, 3416–3420 (2008).
7. Liu, C. H. et al. Pegylated interferon-alpha-2a plus ribavirin for treatment-naive Asian patients with hepatitis C virus genotype 1 infection: a multicenter, randomized controlled trial. Clin. Infect. Dis. 47, 1260–1269 (2008).
8. Wilson, J. F. et al. Population genetic structure of variable drug response. Nature Genet. 29, 265–269 (2001).
9. Sarasin-Filipowicz, M. et al. Interferon signaling and treatment outcome in chronic hepatitis C. Proc. Natl Acad. Sci. USA 105, 7034–7039 (2008).
10. McHutchison, J. G. et al. Telaprevir with peginterferon and ribavirin for chronic HCV genotype 1 infection. N. Engl. J. Med. 360, 1827–1838 (2009).
11. Robek, M. D., Boyd, B. S. & Chisari, F. V. Lambda interferon inhibits hepatitis B and C virus replication. J. Virol. 79, 3851–3854 (2005).
12. Shiffman, M. L. et al. PEG-IFN-λ: antiviral activity and safety profile in a 4-week phase 1b study in relapsed genotype 1 hepatitis C infection. J. Hepatol. 50 (suppl. 1), abstr. A643 s237 (2009).
13. Kotenko, S. V. et al. IFN-λs mediate antiviral protection through a distinct class II cytokine receptor complex. Nature Immunol. 4, 69–77 (2003).
14. Sheppard, P. et al. IL-28, IL-29 and their class II cytokine receptor IL-28R. Nature Immunol. 4, 63–68 (2003).
15. Purcell, S. et al. PLINK: a toolset for whole-genome association and population-based linkage analysis. Am. J. Hum. Genet. 81, 559–575 (2007).
16. Whitlock, M. C. Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. J. Evol. Biol. 18, 1368–1373 (2005).
17. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).
18. The French METAVIR Cooperative Study Group. Intraobserver and interobserver variations in liver biopsy interpretation in patients with chronic hepatitis C. Hepatology 20, 15–20 (1994).
19. Ge, D. et al. WGAViewer: software for genomic annotation of whole genome association studies. Genome Res. 18, 640–643 (2008).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We are indebted to the IDEAL principal investigators, the study coordinators, nurses and patients involved in the study. We also recognize E. Gustafson, P. Savino, D. Devlin, S. Noviello, M. Geffner, J. Albrecht and A. C. Need for their contributions to the study. This study was funded by Schering-Plough Research Institute, Kenilworth, New Jersey. A.J.T. received funding support from the National Health and Medical Research Council of Australia and the Gastroenterological Society of Australia.

Author Contributions D.G., J.F., A.J.T. and J.S.S. contributed equally to this work. D.G., J.F. and A.J.T. performed the statistical and bioinformatical analyses. J.F. and A.J.T. defined the clinical phenotypes. K.V.S. performed the genotyping. D.B.G. and D.G. drafted the manuscript. J.S.S., J.G.M. and D.B.G. designed the study. All authors collected and analysed data and contributed to preparing the manuscript.

Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature. Correspondence and requests for materials should be addressed to D.B.G. ([email protected]).
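The Methods Summary states that the per-population P values were combined with Stouffer's weighted Z-method. The following is a minimal sketch of that technique, not the authors' implementation. It assumes two-sided P values, the same effect direction in all three groups, and weights equal to the square root of each sample size; the weighting actually used is described only in the Supplementary Information, so the result should not be expected to reproduce the published combined P value exactly. The function names are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist


def signed_z(p_two_sided: float, direction: int) -> float:
    """Convert a two-sided P value and an effect direction (+1 or -1) to a Z score."""
    # inv_cdf(p / 2) is the lower-tail quantile (a negative number). Working in
    # the lower tail avoids computing 1 - p, which rounds to 1.0 for tiny P values.
    magnitude = -NormalDist().inv_cdf(p_two_sided / 2.0)
    return direction * magnitude


def stouffer_weighted(p_values, directions, weights):
    """Combine two-sided P values: Z = sum(w_i * z_i) / sqrt(sum(w_i ** 2))."""
    z_scores = [signed_z(p, d) for p, d in zip(p_values, directions)]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    # Two-sided P value for |Z|; erfc stays accurate far into the tail.
    p_combined = erfc(abs(z_combined) / sqrt(2.0))
    return z_combined, p_combined


# P values and sample sizes reported in the text for European-Americans,
# African-Americans and Hispanics.
p_values = [1.06e-25, 2.06e-3, 4.39e-3]
sample_sizes = [871, 191, 75]
directions = [1, 1, 1]                      # assumption: same direction in every group
weights = [sqrt(n) for n in sample_sizes]   # assumption: sqrt(n) weights

z, p = stouffer_weighted(p_values, directions, weights)
print(f"combined Z = {z:.2f}, combined P = {p:.2e}")
```

Because each Z score carries a sign, groups with opposite effect directions partly cancel instead of reinforcing each other, which is the reason direction enters the calculation alongside sample size.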


Chapter 10

ITPA gene variants protect against anemia in patients treated for chronic hepatitis C

Fellay J, Thompson AJ, Ge D, Gumbs CE, Urban TJ, Shianna KV, Little LD, Qiu P, Bertelsen AH, Watson M, Warner A, Muir AJ, Brass C, Albrecht J, Sulkowski M, McHutchison JG, Goldstein DB. ITPA gene variants protect against anaemia in patients treated for chronic hepatitis C. Nature. 2010 Mar 18;464(7287):405-8

Vol 464 | 18 March 2010 | doi:10.1038/nature08825 LETTERS

ITPA gene variants protect against anaemia in patients treated for chronic hepatitis C

Jacques Fellay¹*, Alexander J. Thompson²*, Dongliang Ge¹*, Curtis E. Gumbs¹, Thomas J. Urban¹, Kevin V. Shianna¹, Latasha D. Little¹, Ping Qiu³, Arthur H. Bertelsen³, Mark Watson³, Amelia Warner³, Andrew J. Muir², Clifford Brass³, Janice Albrecht³, Mark Sulkowski⁴, John G. McHutchison² & David B. Goldstein¹

¹Institute for Genome Sciences & Policy, Center for Human Genome Variation, Duke University, Durham, North Carolina 27708, USA. ²Duke Clinical Research Institute and Division of Gastroenterology, School of Medicine, Division of Gastroenterology, Duke University, Durham, North Carolina 27705, USA. ³Schering-Plough Research Institute, Kenilworth, New Jersey 07033, USA. ⁴Johns Hopkins University School of Medicine, Baltimore, Maryland 21205, USA. *These authors contributed equally to this work.

Chronic infection with the hepatitis C virus (HCV) affects 170 million people worldwide and is an important cause of liver-related morbidity and mortality [1]. The standard of care therapy combines pegylated interferon (pegIFN) alpha and ribavirin (RBV), and is associated with a range of treatment-limiting adverse effects [2]. One of the most important of these is RBV-induced haemolytic anaemia, which affects most patients and is severe enough to require dose modification in up to 15% of patients. Here we show that genetic variants leading to inosine triphosphatase deficiency, a condition not thought to be clinically important, protect against haemolytic anaemia in hepatitis-C-infected patients receiving RBV.

Using DNA from consenting participants of the IDEAL study [2], we performed a genome-wide association study (GWAS) of determinants of treatment-related anaemia in individuals with chronic genotype 1 hepatitis C. A total of 1,602 DNA samples were genotyped in the context of a previously reported study of anti-HCV treatment response [3]. Following quality control steps (Supplementary Information I), 1,286 individuals, who were classified into three ethnic groups (988 European-Americans, 198 African-Americans, 100 Hispanics), were available for analyses (Table 1). Our primary analysis focused on the quantitative change in haemoglobin (Hb) levels from baseline to the fourth week of treatment, historically the point at which many patients begin erythropoietin treatment to stimulate red blood cell production. We tested each of 565,759 single nucleotide polymorphisms (SNPs) passing quality control measures in a linear regression model incorporating significant clinical covariates including baseline Hb levels.

Table 1 | Clinical characteristics of the study population

Characteristic | European-Americans | African-Americans | Hispanics
n | 988 | 198 | 100
Sex (F/M) | 378/610 | 78/120 | 36/64
Age (yrs) | 47.3 (7.4) | 49.7 (6.6) | 44.8 (9.3)
BMI (kg m⁻²) | 27.9 (4.5) | 29.7 (5.0) | 29.3 (5.4)
Baseline weight (kg) | 83.3 (16.1) | 88.7 (14.3) | 83.0 (16.7)
Baseline liver fibrosis stage* (n, %)
  Minimal (F0–2) | 876 (88.7%) | 182 (91.9%) | 86 (86.0%)
  Advanced (F3–4) | 112 (11.3%) | 16 (8.1%) | 14 (14.0%)
Initial daily RBV dose† (n, %)
  800 mg | 88 (8.9%) | 4 (2.0%) | 6 (6.0%)
  1,000 mg | 377 (38.2%) | 63 (31.8%) | 41 (41.0%)
  1,200 mg | 460 (46.6%) | 117 (59.1%) | 45 (45.0%)
  1,400 mg | 63 (6.4%) | 14 (7.1%) | 8 (8.0%)
Peg-interferon treatment‡
  PegIFNα-2a | 330 (33.4%) | 66 (33.3%) | 31 (31.0%)
  PegIFNα-2b 1.0 | 333 (33.7%) | 69 (34.9%) | 32 (32.0%)
  PegIFNα-2b 1.5 | 325 (32.9%) | 63 (31.8%) | 37 (37.0%)
Baseline Hb value (g dl⁻¹) | 15.1 (1.2) | 14.6 (1.2) | 15.2 (1.3)
Hb reduction at week 4, mean (g dl⁻¹) | 2.9 (1.5) | 2.4 (1.3) | 3.2 (1.5)
Severe anaemia, Hb <10 g dl⁻¹ at week 4 (n, %) | 89 (9.0%) | 18 (9.1%) | 11 (11.0%)

Data shown are mean with s.d. in parentheses unless otherwise indicated. BMI, body mass index.
* Liver biopsies were evaluated for METAVIR fibrosis staging. The METAVIR scoring system classifies fibrosis on a 5-point scale: F0, no fibrosis; F1, portal fibrosis without septa; F2, few septa; F3, numerous bridging septa without cirrhosis; F4, cirrhosis.
† Initial daily RBV dose was weight-based on a sliding scale in subjects' baseline weight, from 40–125 kg.
‡ PegIFNα-2b 1.0 and PegIFNα-2b 1.5 refer to 1.0 μg kg⁻¹ week⁻¹ and 1.5 μg kg⁻¹ week⁻¹ interferon, respectively.

Several SNPs on chromosome 20 (20p13 region) were found to be strongly associated with treatment-induced reduction in Hb at week 4 (Fig. 1), with the European-American sample showing overwhelming genome-wide significance (P = 1.1 × 10⁻⁴⁵) for an association between quantitative Hb reduction and the SNP rs6051702. Associations with smaller effect sizes but in the same direction were observed in both the African-American and Hispanic samples (Table 2).

Further association signals were detected in the hexokinase 1 gene (HK1) on chromosome 10 (P = 5.3 × 10⁻⁷ for the intronic SNP rs10159477 in European-Americans; Supplementary Information II). This result is not genome-wide significant, but supported by other lines of evidence: rare HK1 mutations cause severe haemolytic anaemia in both humans [4–6] and mice [7]; in a recent GWAS, HK1 SNPs associated with differences in Hb concentration and haematocrit in Europeans [8].

The SNPs showing a genome-wide significant association with quantitative week-4 Hb reduction were spread over a 250 kilobase (kb) region that contains five different protein-coding genes (Fig. 1). We tested the independence of the top association signals in the European-American population using nested linear regression models, in which individual SNPs were added after inclusion of rs6051702, and found evidence for multiple independent signals of association: the most strongly associated SNPs after accounting for the contribution of rs6051702 have P values of 1.4 × 10⁻⁹ (rs2295547) and 2.3 × 10⁻⁴ (rs6051855). The persistence of strongly significant association in the region after accounting for the discovery variant suggests the possibility of multiple causal variants and/or rarer causal variants [9].

Figure 1 | Genomic overview of the 20p13 region including the genome-wide significant associated variants and the ITPA gene. Indicated are the P values [−log10(P)] of all genotyped SNPs in the region and the structures of the surrounding genes. The SNPs that show genome-wide significant association with quantitative reduction in haemoglobin levels are marked in red. The results were annotated using the WGAViewer software [27]. [Panels: genome-wide −log10(P) plot by chromosome; chromosome 20 ideogram; regional plot covering 3,090–3,305 kb with the genes AL121891.1, RP5, DDRGK1, ITPA, SLC4A11, C20orf194, UBOX5 and FASTKD5; r² scale from 0.0 to 1.0.]

To identify candidate causal sites we searched the region for known functional variants. We focused first on the inosine triphosphatase (ITPA) gene, which encodes a protein that hydrolyses inosine triphosphate (ITP). Several gene mutations have been described that lead to ITPA deficiency, a benign red cell enzymopathy characterized by the accumulation of ITP in erythrocytes and increased toxicity of purine analogue drugs [10,11]. In particular, reduced ITPA activity has been documented for a missense variant in exon 2 (rs1127354, resulting in a proline-to-threonine substitution denoted P32T) and a splicing-altering SNP located in the second intron (rs7270101) [12–15]. Several studies have documented and replicated the functional effects of these polymorphisms, showing that both minor alleles independently reduce ITPA activity and that homozygosity for the P32T mutation results in non-detectable ITPA activity and strong accumulation of ITP in red blood cells (Supplementary Table 1) [12,16–18]. Both SNPs can therefore be considered validated functional variants.

Using HapMap data from CEU parents (Utah residents with ancestry from northern and western Europe) [19] we found both of these polymorphisms to be associated with rs6051702, with the low-activity variants preferentially associating with the protective rs6051702 C allele. We can directly test whether the association signal reflects the functional alleles by considering them equivalent and defining a new allele as a 'low-activity allele' made up of either functional variant (Supplementary Information III). We tested this allele for association with surrounding polymorphisms and found that of all the HapMap SNPs located in the surrounding 1 megabase (Mb) region, the highest r² is 0.65, which is what is observed for our discovery variant (Supplementary Table 2).

This observation suggested the possibility that these two known functional ITPA variants are responsible for part or all of the observed association and confer protection against anaemia. To test this possibility, we first sequenced the entire coding region of the gene in a subset of 168 samples, and found no other obvious reduced function mutations (Supplementary Table 3). We then genotyped the known functional SNPs rs1127354 and rs7270101 in our entire cohort. We observed that in European-Americans, the two functional ITPA variants were found almost exclusively on chromosomes that also carry the rs6051702 minor allele C (Fig. 2 and Supplementary Table 4). Moreover, when these two variants were incorporated into a regression model, they entirely explained the association signal initially identified (Supplementary Table 5). Finally, in European-Americans, each functional variant was strongly independently associated with protection (Table 3). A similar picture was observed in African-Americans and in Hispanics: the SNP that has the strongest association with treatment-induced Hb reduction in the region was different between populations (Table 2), but the signal is explained in each case by the associations between the most significant SNP and the ITPA low-activity variants. When the evidence for each functional SNP is combined across the populations, we find that the P values strengthen considerably to 1.7 × 10⁻⁵⁸ for rs1127354 and 5.9 × 10⁻²⁶ for rs7270101 (Table 3). Therefore, several lines of evidence confirm that the two known functional variants conferring reduced ITPA activity are responsible for the protection against anaemia identified in the original GWAS.

Figure 2 | Two ITPA polymorphisms known to be responsible for inosine triphosphatase deficiency co-segregate with the rs6051702 C allele that strongly associates with protection against Hb reduction in European-Americans. In the European-American population included in the study, the low-activity ITPA variants rs1127354 (missense P32T) and rs7270101 (intronic, splicing-altering) are found almost exclusively on chromosomes that also carry the rs6051702 minor allele C (Supplementary Table 4). The graph shows, for each rs6051702 genotype, the percentages of individuals that are either heterozygous or homozygous for at least one ITPA variant. [Stacked bar chart: x-axis, rs6051702 genotype (CC, AC, AA); y-axis, 0–100%; categories: wild type both variants, heterozygous rs7270101, heterozygous rs1127354, combined heterozygous, homozygous rs7270101, homozygous rs1127354.]

To assess the effect of both functional alleles on anaemia, we considered the combined low-activity allele described earlier in an additive regression model. We found that the combined allele has an association of P = 2.2 × 10⁻⁹¹ and that the resulting difference in ITPA function explains 28.7% of the variability in quantitative Hb reduction in the European-American sample, 19.1% in African-Americans and 23.2% in Hispanics.

We made a direct assessment of the clinical relevance of these variants by considering the proportion of patients suffering clinically

Table 2 | GWAS of quantitative Hb reduction at week 4 SNP Sample in which the SNP shows the P value in European-Americans P value in African-American P value in Hispanics strongest association (n 5 988) (n 5 198) (n 5 100) rs6051702 European-Americans 1.1 3 10245 1.9 3 1021 9.5 3 1023 rs3810560 African-Americans 2.6 3 1022 1.2 3 1024 3.0 3 1021 rs11697114 Hispanics 2.0 3 1026 2.8 3 1022 2.1 3 1024 The 20p13 region SNPs showing the strongest association in each of the three study populations are reported, together with their P values in the different ethnic groups. 406 NATURE | Vol 464 | 18 March 2010 LETTERS

Table 3 | ITPA variants protect against Hb reduction ITP competes with RBV-TP in whatever cellular processes are affected ITPA variant Population MAF (%) P value Independent by RBV-TP, and thereby protects cells from the lytic effects of RBV- P value TP. An alternative explanation is that ITPA activity and/or ITP levels rs1127354 European-Americans 7.64.6 3 10252 2.3 3 10268 directly or indirectly influence RBV pharmacokinetics. We did not, African-Americans 4.62.7 3 1027 5.1 3 1027 however, observe any significant association between the ITPA- 23 25 Hispanics 4.01.2 3 10 5.6 3 10 deficiency variants and early or late anti-HCV treatment outcomes All (combined) 6.91.7 3 10258 5.9 3 10226 rs7270101 European-Americans 12.36.8 3 10222 3.6 3 10238 (Supplementary Information IV), which suggests that RBV pharma- African-Americans 7.93.0 3 1025 6.6 3 1025 cokinetics is not significantly altered in hepatic cells. We also evaluated Hispanics 8.03.8 3 1024 1.9 3 1025 global differences in the frequencies of the ITPA variants. We found 276 243 All (combined) 11.28.5 3 10 2.6 3 10 that although each of the two functional variants showed strong geo- The two ITPA functional SNPs rs1127354 and rs7270101 show strong independent association graphic differences in frequency (Supplementary Table 6), these are with protection against treatment-induced Hb reduction. Independent P values were calculated in models in which the other functional variant was already included. Combined P values for not correlated, and the predicted overall ITPA activity levels do not all three populations were obtained using the weighted Z-method26. MAF, minor allele seem to show important variation. This suggests that these variants do frequency. not create strong population-level differences in susceptibility to anaemia, as was observed for the effect of IL28B variation on treatment significant anaemia, which we defined as either a decline in Hb of response3. 
.3gdl21 or Hb levels ,10 g dl21, which is the threshold at which Two related features of these observations are worth emphasizing. RBV dose reduction is recommended. As depicted in Fig. 3, we First, the ITPA variants constitute a clear example of a synthetic asso- modelled this using both the individual genotypes and the degree ciation in which the effects of rarer functional variants are observed as of ITPA deficiency, predicted from the residual enzymatic activity an association for a more common variant present on a whole- measured in the presence of the two functional variants12,16–18.We genome genotyping chip: indeed, the minor-allele frequency is higher found that of 184 patients predicted to have deficient ITPA function for the top-associated SNP rs6051702 (19.4%) than for the causal corresponding to less than one-third of the normal enzymatic activity, variants rs1127354 (7.6%) and rs7270101 (12.3%) in European- none was observed to have Hb levels ,10 g dl21, and only seven Americans. Second, ITPA deficiency seems to behave as a classical (3.8%) had a decline in Hb of .3gdl21 at week 4. On the other hand, pharmacogenetic trait as first described by Motulsky22, in which of the 863 patients with predicted normal ITPA function, 11.7% had genetic variation that is otherwise innocuous confers a strong drug Hb levels ,10 g dl21 and 55.9% had a decline in Hb of .3gdl21. response phenotype. It is interesting to note that, in contrast with The mechanism of RBV-induced haemolytic anaemia remains many common diseases23, relatively common genetic variants have poorly understood20, but clearly involves an accumulation of active been identified by GWAS that explain very significant proportions forms of RBV in red blood cells, including the triphosphate form of the population variation in drug responses, including the ITPA (RBV-TP). 
Similarly, it has been well-documented that inosine tri- variants described here and the IL28B variants reported recently3.It phosphatase deficiency leads to an accumulation of ITP in red blood is possible that this difference stems in part from the fact that drug cells16,21. A simple model to explain these observations would be that responses represent new environmental challenges, and that variants with a strong effect would not necessarily be purged by selection, as Hb decline >3 g dl–1 Hb level <10 g dl–1 would be the case for variants affecting relatively early-onset diseases. 60 60 We note, however, that like many common diseases, drug responses 50 47.6 50 48.4 are also likely to be heavily influenced by variants that are too rare for effective representation in GWAS23. 40 40 30 30 Finally, because ITPA deficiency seems to be a benign condition, it 18.7 20 20 may be possible to protect against RBV-induced anaemia by phar- Percentage 9.3* Percentage 10* macological intervention against ITPA. The identification of inosine 10 4.5 10 0 0 0.8 0 0 triphosphatase deficiency as a major projective factor against RBV- AA CA CC CC AC AA induced haemolytic anaemia therefore not only provides a valuable n n n n n n = 10 = 156 = 1,114 = 18 = 252 = 1,010 pharmacogenetic diagnostic, but also a window into red blood cell rs1127354 rs7270101 biology and the processes governing lysis. 60 55.9 METHODS SUMMARY 50 All participants were included in the IDEAL study2, which compared the effec- 40 tiveness of three anti-HCV treatment regimens including RBV. The genotype 30 data used was described previously (Supplementary Information I)3. 20.4 20 Haemoglobin values were measured at baseline, at weeks 2, 4, 8 and 12, and Percentage 11.7* then every 6 weeks up to treatment completion (48 weeks). Follow-up measure- 10 4.6 0 0.9 ments were obtained 4, 12 and 24 weeks after treatment. 
We considered Hb 0 +++ ++ + None decline at week 4 to be a clinically important time point at which significant n = 31 n = 153 n = 230 n = 863 anaemia had occurred but no growth factor therapy had been instituted. We excluded patients who were ,80% adherent to pegIFN or RBV up to week 4 Predicted ITPA deficiency (n 5 95), and patients with missing week-4 Hb data (n 5 21). Three phenotypes Figure 3 | ITPA deficiency protects against clinically significant decline in were considered: (1) absolute reduction in Hb, (2) a reduction in Hb of Hb concentration induced by HCV anti-viral treatment. Percentages of .3gdl21, and (3) a reduction in Hb levels to ,10 g dl21. The threshold of Hb treated subjects with Hb decline of .3gdl21 (blue) or Hb concentrations reduction of .3gdl21 was chosen as a clinically significant Hb decline; the ,10 g dl21 (red) at week 4 are shown for the two ITPA low function variants threshold of Hb ,10 g dl21 was chosen because it is the level at which RBV dose rs1127354 and rs7270101 (top), and for different degrees of predicted ITPA reduction is recommended according to package insert. deficiency (bottom). Severity of ITPA deficiency was estimated from refs 12, Our association tests involved single-marker genotype trend tests of asso- 16–18: compared to wild type, ITPA activity decreased to 60% with ciation between each SNP and the phenotypes, using linear and logistic regres- rs7270101 heterozygosity (1); to 30% with rs1127354 heterozygosity or sion models implemented in PLINK24. Covariates included age, gender, weight, rs7270101 homozygosity (11); and to very low residual activity with fibrosis severity on pretreatment liver biopsy, baseline Hb level, RBV dose, and combined heterozygosity or rs1127354 homozygosity (111). Asterisk type/dose of pegIFN. 
To control for the possibility of spurious associations denotes that six subjects had Hb concentrations ,10 g dl21 at week 4 but Hb resulting from population stratification, we used a modified EIGENSTRAT decline of ,3gdl21. method25 (Supplementary Information I): we defined three ethnic groups, in 407 LETTERS NATURE | Vol 464 | 18 March 2010

which separate analyses were run, and corrected for population ancestry within each group. Combined P values for all three populations were obtained using Stouffer's weighted Z-method [26]. We assessed significance with Bonferroni correction using the total number of tests (565,759) as the denominator for the calculation (P cutoff = 8.8 × 10^-8).

Received 13 November 2009; accepted 11 January 2010. Published online 21 February 2010.

1. Lavanchy, D. The global burden of hepatitis. Liver Int. 29, 74–81 (2009).
2. McHutchison, J. G. et al. Peginterferon alfa-2b or alfa-2a with ribavirin for treatment of hepatitis C infection. N. Engl. J. Med. 361, 580–593 (2009).
3. Ge, D. et al. Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance. Nature 461, 399–401 (2009).
4. van Wijk, R. et al. HK Utrecht: missense mutation in the active site of human hexokinase associated with hexokinase deficiency and severe nonspherocytic hemolytic anemia. Blood 101, 345–347 (2003).
5. de Vooght, K. M. K., van Solinge, W. W., van Wesel, A. C., Kersting, S. & van Wijk, R. First mutation in the red blood cell-specific promoter of hexokinase combined with a novel missense mutation causes hexokinase deficiency and mild chronic hemolysis. Haematologica 94, 1203–1210 (2009).
6. Bonnefond, A. et al. A genetic variant in HK1 is associated with pro-anemic state and HbA1c but not other glycemic control related traits. Diabetes 58, 2687–2697 (2009).
7. Peters, L. L. et al. Downeast anemia (dea), a new mouse model of severe nonspherocytic hemolytic anemia caused by hexokinase (HKI) deficiency. Blood Cells Mol. Dis. 27, 850–860 (2001).
8. Ganesh, S. K. et al. Multiple loci influence erythrocyte phenotypes in the CHARGE Consortium. Nature Genet. 41, 1191–1198 (2009).
9. Dickson, S. P., Wang, K., Kranz, I., Hakonarson, H. & Goldstein, D. B. Rare variants create synthetic genome-wide associations. PLoS Biol. 8, e1000294 (2010).
10. Bierau, J., Lindhout, M. & Bakker, J. A. Pharmacogenetic significance of inosine triphosphatase. Pharmacogenomics 8, 1221–1228 (2007).
11. Stocco, G. et al. Genetic polymorphism of inosine triphosphate pyrophosphatase is a determinant of mercaptopurine metabolism and toxicity during treatment for acute lymphoblastic leukemia. Clin. Pharmacol. Ther. 85, 164–172 (2009).
12. Sumi, S. et al. Genetic basis of inosine triphosphate pyrophosphohydrolase deficiency. Hum. Genet. 111, 360–367 (2002).
13. Cao, H. & Hegele, R. A. DNA polymorphisms in ITPA including basis of inosine triphosphatase deficiency. J. Hum. Genet. 47, 620–622 (2002).
14. Arenas, M., Duley, J., Sumi, S., Sanderson, J. & Marinaki, A. The ITPA c.94C>A and g.IVS2+21A>C sequence variants contribute to missplicing of the ITPA gene. Biochim. Biophys. Acta 1772, 96–102 (2007).
15. Stepchenkova, E. I. et al. Functional study of the P32T ITPA variant associated with drug sensitivity in humans. J. Mol. Biol. 392, 602–613 (2009).
16. Shipkova, M., Lorenz, K., Oellerich, M., Wieland, E. & von Ahsen, N. Measurement of erythrocyte inosine triphosphate pyrophosphohydrolase (ITPA) activity by HPLC and correlation of ITPA genotype-phenotype in a Caucasian population. Clin. Chem. 52, 240–247 (2006).
17. Atanasova, S. et al. Analysis of ITPA phenotype-genotype correlation in the Bulgarian population revealed a novel gene variant in exon 6. Ther. Drug Monit. 29, 6–10 (2007).
18. Maeda, T. et al. Genetic basis of inosine triphosphate pyrophosphohydrolase deficiency in the Japanese population. Mol. Genet. Metab. 85, 271–279 (2005).
19. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
20. Russmann, S., Grattagliano, I., Portincasa, P., Palmieri, V. O. & Palasciano, G. Ribavirin-induced anemia: mechanisms, risk factors and related targets for future research. Curr. Med. Chem. 13, 3351–3357 (2006).
21. Fraser, J. H., Meyers, H., Henderson, J. F., Brox, L. W. & McCoy, E. E. Individual variation in inosine triphosphate accumulation in human erythrocytes. Clin. Biochem. 8, 353–364 (1975).
22. Motulsky, A. G. Drug reactions, enzymes, and biochemical genetics. J. Am. Med. Assoc. 165, 835–837 (1957).
23. Goldstein, D. B. Common genetic variation and human traits. N. Engl. J. Med. 360, 1696–1698 (2009).
24. Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
25. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006).
26. Whitlock, M. C. Combining probability from independent tests: the weighted Z-method is superior to Fisher's approach. J. Evol. Biol. 18, 1368–1373 (2005).
27. Ge, D. et al. WGAViewer: software for genomic annotation of whole genome association studies. Genome Res. 18, 640–643 (2008).

Supplementary Information is linked to the online version of the paper at www.nature.com/nature.

Acknowledgements We are indebted to the IDEAL principal investigators, the study coordinators, nurses and patients involved in the study. We also thank E. Gustafson, P. Savino, D. Devlin, S. Noviello, M. Geffner, E. L. Heinzen, A. C. Need and E. T. Cirulli for their contributions to the study. This study was funded by the Schering-Plough Research Institute, Kenilworth, New Jersey. A.J.T. receives funding support from the National Health and Medical Research Council of Australia and the Gastroenterological Society of Australia.

Author Contributions J.F., A.J.T. and D.G. contributed equally. D.B.G., J.G.M., J.A., A.J.T. and J.F. defined the clinical phenotypes. J.F., A.J.T., D.G. and T.J.U. performed the statistical and bioinformatical analyses. K.V.S. performed the genotyping. C.E.G. and L.D.L. performed the sequencing experiments. P.Q., A.H.B., M.W., A.W., A.J.M., M.S., C.B. and J.A. collected and analysed the clinical data. A.J.M., M.S., J.G.M. and D.B.G. designed the study. J.F., A.J.T. and D.B.G. wrote the manuscript with critical input from all authors.

Author Information Reprints and permissions information is available at www.nature.com/reprints. The authors declare competing financial interests: details accompany the full-text HTML version of the paper at www.nature.com/nature. Correspondence and requests for materials should be addressed to D.B.G. ([email protected]) or J.G.M. ([email protected]).
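The two statistical steps named in the Methods Summary of this letter, the Bonferroni cutoff and the weighted Z-method for combining P values across populations, can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the significance level of 0.05 is an assumption (it reproduces the quoted cutoff of 8.8 × 10^-8), the P values are treated as one-sided and pointing in the same direction, and the choice of weights is left to the caller because the text does not state which weights were used.

```python
from math import erfc, sqrt
from statistics import NormalDist


def bonferroni_cutoff(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def weighted_z_combine(p_values: list[float], weights: list[float]) -> float:
    """Combine one-sided P values with Stouffer's weighted Z-method.

    Each P value is converted to a Z score, the Z scores are averaged with
    the given weights, and the combined Z is converted back to a P value.
    """
    std_normal = NormalDist()
    # Upper-tail convention: a small P value gives a large positive Z.
    z_scores = [-std_normal.inv_cdf(p) for p in p_values]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = sqrt(sum(w * w for w in weights))
    z_combined = numerator / denominator
    # erfc keeps precision in the far upper tail.
    return 0.5 * erfc(z_combined / sqrt(2.0))


# Assumed alpha of 0.05 with the 565,759 tests quoted in the text.
print(f"{bonferroni_cutoff(0.05, 565_759):.1e}")  # 8.8e-08

# Two equally weighted, hypothetical P values of 0.05 combine to about 0.01.
print(round(weighted_z_combine([0.05, 0.05], [1.0, 1.0]), 3))  # 0.01
```

A common choice of weights is the square root of each sample size, but that is a convention from the statistical literature rather than something this section specifies.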

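The legend of Figure 3 in the preceding letter maps genotypes at the two functional SNPs to a predicted degree of ITPA deficiency. The sketch below encodes that mapping for illustration only. The function name and the integer encoding (number of minor alleles per SNP, A for rs1127354 and C for rs7270101, following the genotype labels of the figure) are assumptions, and genotype combinations the legend does not mention, such as rs1127354 heterozygosity together with rs7270101 homozygosity, are assigned to the most severe class as a labeled guess.

```python
def predicted_itpa_deficiency(rs1127354_minor: int, rs7270101_minor: int) -> str:
    """Classify predicted ITPA deficiency from minor-allele counts (0, 1 or 2).

    Categories follow the Figure 3 legend:
      '+'    rs7270101 heterozygosity (about 60% of wild-type activity)
      '++'   rs1127354 heterozygosity or rs7270101 homozygosity (about 30%)
      '+++'  combined heterozygosity or rs1127354 homozygosity (very low)
    """
    for count in (rs1127354_minor, rs7270101_minor):
        if count not in (0, 1, 2):
            raise ValueError("minor-allele counts must be 0, 1 or 2")

    # Assumption: any carrier of both variants is treated like the
    # 'combined heterozygosity' case, including combinations the legend omits.
    if rs1127354_minor == 2 or (rs1127354_minor >= 1 and rs7270101_minor >= 1):
        return "+++"
    if rs1127354_minor == 1 or rs7270101_minor == 2:
        return "++"
    if rs7270101_minor == 1:
        return "+"
    return "None"


print(predicted_itpa_deficiency(0, 1))  # +
print(predicted_itpa_deficiency(1, 1))  # +++
```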

Chapter 11

A perspective on the future of host genomics of chronic viral infections

This chapter has been modified from: Fellay J, Shianna KV, Telenti A, Goldstein DB. Host genetics and HIV-1: the final phase? PLoS Pathog. 2010 Oct 14;6(10):e1001033


This is a crucial transition time for human genetics in general, and for host genetics of chronic viral infections in particular. After years of equivocal results from candidate gene analyses, multiple genome-wide association studies have been published that looked at various clinical and laboratory outcomes. Results from other studies that used various large-scale approaches (siRNA screens, transcriptome or proteome analysis, comparative genomics) have also shed new light on viral pathogenesis. However, most of the inter-individual variability in response to chronic viral infections remains to be explained. Focusing on HIV infection as a paradigmatic example, this perspective describes recent advances and proposes that sequencing and systems biology approaches should now be used to progress toward a better understanding of the complex interactions between viruses and their human host.

Introduction

Many fundamental questions about how and why humans differ in their susceptibility to chronic viral pathogens remain largely unanswered. For example, it has long been known that a fraction of the human population cannot be infected by HIV [1,2]. We still do not know, however, whether most of those who are resistant to infection are resistant due to innate or adaptive immunity, or to some other mechanism. Nor are the precise pathways well understood that allow apparently permanent control of the virus in a subset of those who do become infected. These questions are obviously central to the effort to develop effective antiviral strategies, and at their heart, they are genetic. Until recently, our capacity to systematically address these issues was limited. But genomic analyses have advanced to the point that comprehensive or nearly comprehensive analyses of the role of genetic variation in viral control are now within reach. A series of genome-wide association studies (GWAS) have provided a detailed description of how common variation influences outcomes of HIV [3-16]. More importantly, next-generation sequencing is now sufficiently advanced that a dedicated effort to uncover the role of rare variation has become feasible. With these new developments, the rich cohorts being built under the auspices of international groupings such as the International Collaboration for the Genomics of HIV, and novel systems approaches, all ingredients are now available for drawing conclusive answers to these fundamental questions.

The role of human genetic variation: the complete picture

Considering the number of HIV-related GWAS that have now been published, it seems reasonable to conclude that most of the common variants important in control of HIV (i.e. with moderate to high effect size) have been identified, at least in individuals of European ancestry. Despite this, it appears that most of the inter-individual differences in control remain to be explained. The confirmed host genetic determinants of HIV control are only able to explain about 20% of the observed variation in viral load or disease progression [8]. Notably, even such limited genetic knowledge can already be used to refine the prediction of disease progression, beyond the information provided by viral load only, as shown in Figure 1.

Figure 1: An additive genetic score helps predict HIV disease progression. Data are from Fellay et al. [8]: 1071 HIV infected subjects of Caucasian ancestry are included in the analysis. The columns show the proportions of study subjects that reached a progression outcome (CD4+ T cells <350/ul or initiation of combined antiretroviral treatment with CD4+ T cells <500/ul) during the first 5 years after estimated date of seroconversion in categories defined by HIV viral load and by a simple additive genetic score, in which 1 unit is counted for each “protective” allele. The minimum score is 0 for individuals that are homozygous for the major allele at rs2395029 (a proxy for HLA-B*5701), rs9264942 (HLA-C -35 variant), rs9261174 (ZNRD1) and CCR5-Δ32. The maximal observed score is 3 since no subject was heterozygous or homozygous for the minor allele at all 4 sites. Subjects were grouped in 3 categories to clearly show that the genetic score refines the prediction of progression, beyond the information provided by viral load only, throughout the range of set point values.
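The additive score described in the Figure 1 legend can be illustrated with a short sketch. It is a hedged reading of the legend rather than the original analysis: because the legend states that the maximal observed score is 3 when no subject carried a minor allele at all 4 sites, the default below counts one unit per locus that carries at least one protective allele, and a per-allele count is offered as an alternative. The locus keys and the function name are illustrative.

```python
# Loci named in the Figure 1 legend; the key spellings are illustrative.
LOCI = ("rs2395029", "rs9264942", "rs9261174", "CCR5-delta32")


def genetic_score(protective_alleles: dict[str, int], per_allele: bool = False) -> int:
    """Additive genetic score from protective (minor) allele counts per locus.

    By default each locus carrying at least one protective allele adds 1
    (an assumption based on the legend). With per_allele=True every
    protective allele adds 1 instead.
    """
    score = 0
    for locus in LOCI:
        count = protective_alleles[locus]
        if count not in (0, 1, 2):
            raise ValueError(f"allele count for {locus} must be 0, 1 or 2")
        score += count if per_allele else min(count, 1)
    return score


# Hypothetical subject: homozygous at one locus, heterozygous at two others.
subject = {"rs2395029": 0, "rs9264942": 2, "rs9261174": 1, "CCR5-delta32": 1}
print(genetic_score(subject))                   # 3
print(genetic_score(subject, per_allele=True))  # 4
```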

So, what is responsible for the large variability in HIV control that still remains unexplained? Clearly, part of it is attributable to the virus itself, as demonstrated by sudden changes in disease course and/or viral set point upon super-infection in chronically infected patients [17] and by sizeable differences in viral load set point that can be observed between the donor and the recipient in HIV transmission pairs [18]. Environmental influences also play a role: for example, pro-inflammatory diseases are often associated with a significant increase in HIV viral load, while co-infections with viruses like GB virus C, HTLV-1 and HIV-2 have been reported to have an inhibitory effect on HIV replication [19]. However, as is the case for many other complex human traits, it is not unreasonable to assume that rarer genetic variants are responsible for a sizeable fraction of the unexplained inter-individual differences [20]. For example, it is clear that a fraction of the population is highly resistant to infection by HIV. While homozygosity for CCR5Δ32 is responsible for some of these cases [21-23], it appears to explain only a minority of such observations. Several studies have shown a higher frequency of CCR5Δ32/Δ32 in HIV-uninfected hemophiliacs than in the general population (up to 25% compared to 1%, respectively), with the highest frequencies in those with severe hemophilia [24-26]. Homozygosity for CCR5Δ32 is also significantly enriched in highly exposed, yet seronegative homosexual men [21,27]. While those numbers clearly illustrate the high degree of exposure in these populations, they also suggest that other protective mechanisms are responsible for most individual cases of resistance. GWAS in these groups also failed to reveal any strong common variants conferring further protection [14,16]. The identification of the other variants responsible for protection will therefore require a deeper interrogation of the human genome than is possible using GWAS. While still relatively expensive and difficult to implement due to computational and bioinformatic challenges, systematic discovery genetics using exome or genome sequencing is feasible [28]. Several recent reports demonstrated that the cause of Mendelian diseases could be identified using such sequencing strategies [29-31]. In the HIV field, several sequencing projects are underway; so far, however, they have not identified novel human determinants of resistance against HIV infection or of viral control (unpublished results).
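To put the homozygote frequencies quoted above in perspective, the following back-of-the-envelope sketch converts the 1% general-population frequency of CCR5Δ32/Δ32 into an allele frequency and computes the fold enrichment implied by the 25% upper figure. Hardy-Weinberg equilibrium is an assumption made here for illustration and is not stated in the text; the function name is hypothetical.

```python
from math import sqrt


def hardy_weinberg_from_homozygotes(homozygote_freq: float) -> dict[str, float]:
    """Allele and heterozygote frequencies implied by a homozygote frequency.

    Assumes Hardy-Weinberg equilibrium: homozygotes = q^2, heterozygotes = 2pq.
    """
    q = sqrt(homozygote_freq)
    p = 1.0 - q
    return {"allele_freq": q, "heterozygote_freq": 2.0 * p * q}


general = hardy_weinberg_from_homozygotes(0.01)   # 1% homozygotes, from the text
print(round(general["allele_freq"], 3))           # 0.1
print(round(general["heterozygote_freq"], 3))     # 0.18

# Fold enrichment of homozygotes in HIV-uninfected hemophiliacs (up to 25%).
print(round(0.25 / 0.01))                         # 25
```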

Systems Approaches

Beyond the information generated by studies of DNA variation, the field can now press ahead with novel approaches that use a range of technologies. Prominent among these are analyses of the transcriptome and proteome, and small interfering RNA (siRNA) screens (Figure 2). These genome-wide studies generate large data sets that can be analyzed in isolation and, increasingly, in an integrated manner [32-34]. Below we summarize key studies in the HIV field using these techniques, and the first efforts at feeding information across studies. This section addresses the prospects for a systems biology approach in the study of HIV biology and pathogenesis.

[Figure 2 panel labels: genome-wide association studies; evolutionary analyses; genome-wide transcriptome studies; siRNA/shRNA screens; miRNA profiling; gain-of-function screens; proteome studies; protein interaction databases.]

Figure 2: Types of genome-wide and other large-scale studies published in the HIV field.

Transcriptome analyses. Hybridization-based assays have been used to perform genome-wide analysis of infected cells in vitro, and in vivo in HIV-infected individuals [35-38]. Dynamic analyses using microarrays and RNA sequencing have also been completed [39,40]. The cell types investigated varied from the collective study of PBMCs to cell-type-specific studies [41]. The overarching messages from these studies are (i) the massive modulation of the antiviral defense systems (the interferon response, including the antiretroviral intrinsic cellular defense apparatus), (ii) the prominent modulation of genes involved in the cell cycle and degradation/proteasome pathways, and (iii) the absence of a characteristic expression pattern of effective control of viral replication (e.g. in elite controllers). The precision of transcriptome analyses is about to be greatly improved through the capacity to look at the transcriptome in single cells [42].

Proteome analyses. The number of proteins that could be assessed in a quantitative fashion used to be a limiting factor for large-scale proteomic studies. However, in 2012 Jäger et al. published the first description of the "global landscape of HIV-human protein complexes" [43]: using affinity tagging and purification mass spectrometry, they systematically determined the physical interactions of all 18 HIV-1 proteins and polyproteins with human proteins in two different cell lines, and identified with high confidence 497 HIV-human protein-protein interactions. Another important resource in the field is the NCBI HIV-1 human protein interaction database, which currently includes over 12,000 interactions with more than 3,000 human genes, collected from about 5,500 publications [44].

siRNA and gain-of-function screens. Several siRNA transfection [45-47] and short hairpin RNA (shRNA) transduction [48] studies have targeted the coding RNA for >20,000 human proteins. Approximately 1,000 proteins have been identified as potentially necessary for optimal viral replication. However, there was minimal overlap across studies, possibly because of differences in cell types and in study design. None of the studies captured or were designed to identify genes that would restrict viral replication, i.e., genes whose silencing would result in greater viral production. Overall, 34 genes were identified in 2 or more of the screens. However, among those genes that were shared by two or more studies, a pattern emerged that involves the nuclear pore machinery, the mediator complex, a number of key kinases, and components of the NF-κB complex (Figure 3). One gain-of-function screen used a cDNA library representing 15,000 unique genes in an infectious HIV-1 system [49].
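The overlap statistic quoted above (genes identified in 2 or more screens) corresponds to a simple counting operation across hit lists. The sketch below shows one way to compute it; the screen names and gene labels are placeholders invented for illustration and are not data from the cited studies.

```python
from collections import Counter


def genes_in_multiple_screens(screens: dict[str, set[str]], min_screens: int = 2) -> set[str]:
    """Return the genes reported as hits in at least `min_screens` screens."""
    counts = Counter(gene for hits in screens.values() for gene in hits)
    return {gene for gene, n in counts.items() if n >= min_screens}


# Hypothetical hit lists; real screens each report hundreds of genes.
screens = {
    "screen_1": {"GENE_A", "GENE_B", "GENE_C"},
    "screen_2": {"GENE_B", "GENE_D"},
    "screen_3": {"GENE_C", "GENE_B", "GENE_E"},
    "screen_4": {"GENE_F"},
}

shared = genes_in_multiple_screens(screens)
print(sorted(shared))  # ['GENE_B', 'GENE_C']
```

Representing each hit list as a set means a gene is counted at most once per screen, so the count reflects the number of independent studies that reported it.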

Figure 3: Predicted interaction networks of genes identified as HIV dependency factors in siRNA/shRNA screens [45-48]. Links were predicted using STRING (http://string.embl.de/). Predicted interactions are depicted according to the type of available evidence. The interactions (see color labels) include direct (physical) and indirect (functional) associations; they are derived from four sources: genomic context, high-throughput experiments, conserved co-expression, and previous knowledge from literature. The nature of the supporting evidence is indicated by the color lines: yellow, text mining; purple, experimental; red, gene fusion; light blue, protein-protein interactions; blue, genomic co-occurrence evidence.

Evolutionary data. Genes involved in immunity and inflammation are among those exhibiting the strongest signatures of positive selection both across species and within humans [50-57]. Increasingly, evolutionary and comparative sequence analyses across species or within human populations can be used to identify genes that have played a major role in host survival and are therefore likely to influence modern susceptibility to, or pathogenesis of, infectious diseases [58]. Given the relevance of endogenous and exogenous retroviruses in primate evolution, the identification of genomic signatures can provide an additional layer of data for analysis of contemporary susceptibility to HIV. A number of evolutionary analyses of genes involved in cellular defense against retroviruses have been reported [59-63].

Data integration. Progressively, researchers aim at integrating different layers of data. Rotger et al. [38] examined the correspondence of results from genome-wide transcriptome analysis of differentially expressed mRNA in CD4 T cells from infected individuals with results from analysis of cis-acting genetic variants modulating gene expression in the same samples. Bushman and colleagues [32] applied meta-analytical procedures to assess a wider range of genome-wide studies and public interaction databases. More recently, Bartha et al. [34] developed a public resource supporting multipurpose analysis of genome-wide genetic variation and gene expression profiles across multiple phenotypes relevant to HIV biology.

Outlook

There is growing interest in applying non-reductionist approaches such as systems biology to the study of human diseases. The general premise of systems biology includes a high-throughput quantitative approach to a biological system that can be subjected to iterative cycles of perturbation, and the modeling of the collected data. Viral infections, which result in a perturbed environment that can be exogenously manipulated through treatment, or modulated by genetic determinants, should now be approached under this research paradigm. The aim of host genetic research is to comprehensively describe human genetic influences on infectious diseases. Some genetic factors have now been convincingly associated with HIV and HCV outcomes, yet much effort is still needed to get the full picture. The field is now moving simultaneously towards greater depth in genome analysis and towards more breadth and integration through systems biology. This ongoing transition brings renewed hopes that genetic analysis of the human host will contribute substantially to understanding viral pathogenesis.

References 1. Kroner BL, Rosenberg PS, Aledort LM, Alvord WG, Goedert JJ (1994) HIV-1 infection incidence among persons with hemophilia in the United States and western Europe, 1978-1990. Multicenter Hemophilia Cohort Study. J Acquir Immune Defic Syndr 7: 279-286. 2. Kulkarni PS, Butera ST, Duerr AC (2003) Resistance to HIV-1 infection: lessons learned from studies of highly exposed persistently seronegative (HEPS) individuals. AIDS Rev 5: 87-103. 3. The International HIV Controllers Study (2010) The Major Genetic Determinants of HIV-1 Control Affect HLA Class I Peptide Presentation. Science 330: 1551-1557. 4. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, et al. (2007) A whole-genome association study of major determinants for host control of HIV-1. Science 317: 944-947. 5. Limou S, Le Clerc S, Coulonges C, Carpentier W, Dina C, et al. (2009) Genomewide Association Study of an AIDS-Nonprogression Cohort Emphasizes the Role Played by HLA Genes (ANRS Genomewide Association Study 02). J Infect Dis 199: 419-426. 6. Dalmasso C, Carpentier W, Meyer L, Rouzioux C, Goujard C, et al. (2008) Distinct genetic loci control plasma HIV-RNA and cellular HIV-DNA levels in HIV-1 infection: the ANRS Genome Wide Association 01 study. PLoS ONE 3: e3907. 7. Le Clerc S, Limou S, Coulonges Cd, Carpentier W, Dina C, et al. (2009) Genomewide Association Study of a Rapid Progression Cohort Identifies New Susceptibility Alleles for AIDS (ANRS Genomewide Association Study 03). J Infect Dis 200: 1194-1201. 8. Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, et al. (2009) Common Genetic Variation and the Control of HIV-1 in Humans. PLoS Genet 5: e1000791. 9. Herbeck JT, Gottlieb GS, Winkler CA, Nelson GW, An P, et al. (2010) Multistage Genomewide Association Study Identifies a Locus at 1q41 Associated with Rate of HIV-1 Disease Progression to Clinical AIDS. J Infect Dis 201: 618-626. 10. Pelak K, Goldstein DB, Walley NM, Fellay J, Ge D, et al. 
(2010) Host determinants of HIV-1 control in African Americans. J Infect Dis 201: 1141-1149. 11. Petrovski S, Fellay J, Shianna KV, Carpenetti N, Kumwenda J, et al. (2011) Common human genetic variants and HIV-1 susceptibility: a genome-wide survey in a homogeneous African population. Aids 25: 513- 518.

12. Bol SM, Moerland PD, Limou S, van Remmerden Y, Coulonges C, et al. (2011) Genome-Wide Association Study Identifies Single Nucleotide Polymorphism in DYRK1A Associated with Replication of HIV-1 in Monocyte-Derived Macrophages. PLoS ONE 6: e17190.
13. Lingappa JR, Petrovski S, Kahle E, Fellay J, Shianna K, et al. (2011) Genomewide Association Study for Determinants of HIV-1 Acquisition and Viral Set Point in HIV-1 Serodiscordant Couples with Quantified Virus Exposure. PLoS ONE 6: e28632.
14. McLaren PJ, Coulonges C, Ripke S, van den Berg L, Buchbinder S, et al. (2013) Association Study of Common Genetic Variants and HIV-1 Acquisition in 6,300 Infected Cases and 7,200 Controls. PLoS Pathog 9: e1003515.
15. Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, et al. (2013) A genome-to-genome analysis of associations between human genetic variation, HIV-1 sequence diversity, and viral control. Elife 2: e01123.
16. Lane J, McLaren PJ, Dorrell L, Shianna KV, Stemke A, et al. (2013) A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A. Hum Mol Genet 22: 1903-1910.
17. Smith DM, Richman DD, Little SJ (2005) HIV superinfection. J Infect Dis 192: 438-444.
18. Tang J, Tang S, Lobashevsky E, Zulu I, Aldrovandi G, et al. (2004) HLA allele sharing and HIV type 1 viremia in seroconverting Zambians with known transmitting partners. AIDS Res Hum Retroviruses 20: 19-25.
19. Kannangara S, DeSimone JA, Pomerantz RJ (2005) Attenuation of HIV-1 infection by other microbial agents. J Infect Dis 192: 1003-1009.
20. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, et al. (2009) Finding the missing heritability of complex diseases. Nature 461: 747-753.
21. Dean M, Carrington M, Winkler C, Huttley GA, Smith MW, et al. (1996) Genetic restriction of HIV-1 infection and progression to AIDS by a deletion allele of the CKR5 structural gene. Hemophilia Growth and Development Study, Multicenter AIDS Cohort Study, Multicenter Hemophilia Cohort Study, San Francisco City Cohort, ALIVE Study. Science 273: 1856-1862.
22. Liu R, Paxton WA, Choe S, Ceradini D, Martin SR, et al. (1996) Homozygous defect in HIV-1 coreceptor accounts for resistance of some multiply-exposed individuals to HIV-1 infection. Cell 86: 367-377.
23. Samson M, Libert F, Doranz BJ, Rucker J, Liesnard C, et al. (1996) Resistance to HIV-1 infection in caucasian individuals bearing mutant alleles of the CCR-5 chemokine receptor gene. Nature 382: 722-725.
24. Wilkinson DA, Operskalski EA, Busch MP, Mosley JW, Koup RA (1998) A 32-bp deletion within the CCR5 locus protects against transmission of parenterally acquired human immunodeficiency virus but does not affect progression to AIDS-defining illness. J Infect Dis 178: 1163-1166.
25. Zhang M, Goedert JJ, O'Brien TR (2003) High frequency of CCR5-delta32 homozygosity in HCV-infected, HIV-1-uninfected hemophiliacs results from resistance to HIV-1. Gastroenterology 124: 867-868.

26. Kupfer B, Kaiser R, Brackmann HH, Effenberger W, Rockstroh JK, et al. (1999) Protection against parenteral HIV-1 infection by homozygous deletion in the C-C chemokine receptor 5 gene. AIDS 13: 1025-1028.
27. Huang Y, Paxton WA, Wolinsky SM, Neumann AU, Zhang L, et al. (1996) The role of a mutant CCR5 allele in HIV-1 transmission and disease progression. Nat Med 2: 1240-1243.
28. Cirulli ET, Goldstein DB (2010) Uncovering the roles of rare variants in common disease through whole-genome sequencing. Nat Rev Genet 11: 415-425.
29. Choi M, Scholl UI, Ji W, Liu T, Tikhonova IR, et al. (2009) Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc Natl Acad Sci U S A 106: 19096-19101.
30. Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW, et al. (2009) Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461: 272-276.
31. Ng SB, Buckingham KJ, Lee C, Bigham AW, Tabor HK, et al. (2009) Exome sequencing identifies the cause of a mendelian disorder. Nat Genet 42: 30-35.
32. Bushman FD, Malani N, Fernandes J, D'Orso I, Cagney G, et al. (2009) Host cell factors in HIV replication: meta-analysis of genome-wide studies. PLoS Pathog 5: e1000437.
33. Telenti A (2009) HIV-1 host interactions - integration of large scale datasets. F1000 Biology Reports 1: 71.
34. Bartha I, McLaren PJ, Ciuffi A, Fellay J, Telenti A (2014) GuavaH: a compendium of host genomic data in HIV biology and disease. Retrovirology 11: 6.
35. Sedaghat AR, German J, Teslovich TM, Cofrancesco J, Jr., Jie CC, et al. (2008) Chronic CD4+ T-cell activation and depletion in human immunodeficiency virus type 1 infection: type I interferon-mediated disruption of T-cell dynamics. J Virol 82: 1870-1883.
36. Hyrcza MD, Kovacs C, Loutfy M, Halpenny R, Heisler L, et al. (2007) Distinct transcriptional profiles in ex vivo CD4+ and CD8+ T cells are established early in human immunodeficiency virus type 1 infection and are characterized by a chronic interferon response as well as extensive transcriptional changes in CD8+ T cells. J Virol 81: 3477-3486.
37. Giri MS, Nebozhyn M, Raymond A, Gekonge B, Hancock A, et al. (2009) Circulating monocytes in HIV-1-infected viremic subjects exhibit an antiapoptosis gene signature and virus- and host-mediated apoptosis resistance. J Immunol 182: 4459-4470.
38. Rotger M, Dang KK, Fellay J, Heinzen EL, Feng S, et al. (2010) Genome-Wide mRNA Expression Correlates of Viral Control in CD4+ T-Cells from HIV-1-Infected Individuals. PLoS Pathog 6: e1000781.
39. Mohammadi P, Desfarges S, Bartha I, Joos B, Zangger N, et al. (2013) 24 hours in the life of HIV-1 in a T cell line. PLoS Pathog 9: e1003161.
40. Mohammadi P, di Iulio J, Munoz M, Martinez R, Bartha I, et al. (2014) Dynamics of HIV latency and reactivation in a primary CD4+ T cell model. PLoS Pathog 10: e1004156.

41. Giri MS, Nebozhyn M, Showe L, Montaner LJ (2006) Microarray data on gene modulation by HIV-1 in immune cells: 2000-2006. J Leukoc Biol 80: 1031-1043.
42. Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, et al. (2009) mRNA-Seq whole-transcriptome analysis of a single cell. Nat Meth 6: 377-382.
43. Jager S, Cimermancic P, Gulbahce N, Johnson JR, McGovern KE, et al. (2012) Global landscape of HIV-human protein complexes. Nature 481: 365-370.
44. Fu W, Sanders-Beer BE, Katz KS, Maglott DR, Pruitt KD, et al. (2009) Human immunodeficiency virus type 1, human protein interaction database at NCBI. Nucleic Acids Res 37: D417-422.
45. Brass AL, Dykxhoorn DM, Benita Y, Yan N, Engelman A, et al. (2008) Identification of host proteins required for HIV infection through a functional genomic screen. Science 319: 921-926.
46. Konig R, Zhou Y, Elleder D, Diamond TL, Bonamy GM, et al. (2008) Global analysis of host-pathogen interactions that regulate early-stage HIV-1 replication. Cell 135: 49-60.
47. Zhou H, Xu M, Huang Q, Gates AT, Zhang XD, et al. (2008) Genome-scale RNAi screen for host factors required for HIV replication. Cell Host Microbe 4: 495-504.
48. Yeung ML, Houzet L, Yedavalli VS, Jeang KT (2009) A genome-wide short hairpin RNA screening of jurkat T-cells for human proteins contributing to productive HIV-1 replication. J Biol Chem 284: 19463-19473.
49. Nguyen DG, Yin H, Zhou Y, Wolff KC, Kuhen KL, et al. (2007) Identification of novel therapeutic targets for HIV infection through functional genomic cDNA screening. Virology 362: 16-25.
50. Sabeti PC, Schaffner SF, Fry B, Lohmueller J, Varilly P, et al. (2006) Positive natural selection in the human lineage. Science 312: 1614-1620.
51. Sabeti PC, Varilly P, Fry B, Lohmueller J, Hostetter E, et al. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449: 913-918.
52. Voight BF, Kudaravalli S, Wen X, Pritchard JK (2006) A map of recent positive selection in the human genome. PLoS Biol 4: e72.
53. Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (2008) Natural selection has driven population differentiation in modern humans. Nat Genet 40: 340-345.
54. Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, et al. (2005) Natural selection on protein-coding genes in the human genome. Nature 437: 1153-1157.
55. Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, et al. (2003) Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302: 1960-1963.
56. The Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437: 69-87.

57. Kosiol C, Vinar T, da Fonseca RR, Hubisz MJ, Bustamante CD, et al. (2008) Patterns of positive selection in six Mammalian genomes. PLoS Genet 4: e1000144.
58. Karlsson EK, Kwiatkowski DP, Sabeti PC (2014) Natural selection and infectious disease in human populations. Nat Rev Genet 15: 379-393.
59. Esnault C, Heidmann O, Delebecque F, Dewannieux M, Ribet D, et al. (2005) APOBEC3G cytidine deaminase inhibits retrotransposition of endogenous retroviruses. Nature 433: 430-433.
60. Stremlau M, Owens CM, Perron MJ, Kiessling M, Autissier P, et al. (2004) The cytoplasmic body component TRIM5alpha restricts HIV-1 infection in Old World monkeys. Nature 427: 848-853.
61. Sheehy AM, Gaddis NC, Choi JD, Malim MH (2002) Isolation of a human gene that inhibits HIV-1 infection and is suppressed by the viral Vif protein. Nature 418: 646-650.
62. Neil SJ, Zang T, Bieniasz PD (2008) Tetherin inhibits retrovirus release and is antagonized by HIV-1 Vpu. Nature 451: 425-430.
63. Ortiz M, Guex N, Patin E, Martin O, Xenarios I, et al. (2009) Evolutionary Trajectories of Primate Genes Involved in HIV Pathogenesis. Mol Biol Evol 26: 2865-2875.

Summary

Recent advances in genetic knowledge, bioinformatics and technology have transformed host genetic research and allowed comprehensive studies of the human genome. Genome-wide association studies (GWAS) based on genotyping arrays have been considered the approach of choice to search for host factors involved in the pathogenesis of chronic viral infections, and to assess their relative contribution.

In the first and larger part of the thesis, I present GWAS of human genetic determinants of various phenotypes related to HIV infection. Studies of spontaneous viral control in untreated, HIV-infected individuals affirmed the key role of the HLA region, and in particular of the HLA-B and HLA-C loci, in modulating viral load and disease progression. We also defined more precisely the known associations of CCR5Δ32, a 32 base pair deletion in the CCR5 gene, with viral control (in heterozygous individuals) and with resistance to HIV infection (in homozygous individuals). Beyond CCR5, the GWAS of HIV acquisition did not identify any new associated variants, implying that common genetic polymorphisms with moderate to high effect size do not appreciably impact HIV acquisition. An additional GWAS looked at T cell responses to an experimental HIV vaccine, and showed that the same HLA alleles are implicated in vaccine-induced cellular immunity and in natural immune control, which is of relevance for vaccine design. Finally, we used variation in the viral genome as the phenotype in multiple parallel GWAS. This method allowed the establishment of the first global map of human-HIV genomic interactions, by identifying the areas of the human genome that put pressure on HIV, and the regions of the virus that serve to escape human control.

The second part of the thesis presents two GWAS performed in patients treated for chronic HCV infection, investigating the efficacy and toxicity of peg-interferon alpha and ribavirin treatment. Interferon-lambda 3 (encoded by IL28B) was identified as the key determinant of treatment response, and inosine triphosphatase deficiency was shown to play a major protective role against drug-induced anemia. These discoveries were considered major breakthroughs in virology and hepatology: by highlighting host factors that are responsible for important differences in pathogenesis, they facilitated the development of new diagnostic and therapeutic strategies. As an illustration, the IL28B genotyping test has already been used by several pharmaceutical companies as a randomization criterion for clinical trials and has been marketed since 2011 as a diagnostic tool that can be helpful in therapeutic decision-making.

The studies presented here provide a detailed description of how common variation influences control of HIV and HCV infections. Still, most of the inter-individual variability remains to be explained. As an example, the confirmed host genetic determinants of HIV control are only able to explain about 20% of the observed variation in viral load or disease progression. This is partly due to the fact that GWAS were only designed to detect common human genetic variants (generally >1-3% minor allele frequency) with effect sizes that are large enough to create significant associations after correction for multiple testing. More common variants with smaller population effect sizes could be identified through a continuous increase in the number of subjects included in genome-wide studies, but this requires access to large numbers of phenotypically well characterized patients, which will be hard to achieve in an era of near universal anti-HIV therapy and highly efficient anti-HCV drugs. However, as is the case for many other complex human traits, it is not unreasonable to assume that rarer genetic variants are responsible for a sizeable fraction of the unexplained inter-individual differences in response to viral infections. Therefore, targeted or whole genome sequencing strategies will prove essential to better appreciate the global contribution of the human genome to phenotypic outcomes. The recent development of high-throughput sequencing platforms has made this feasible in an efficient manner, and exome sequencing studies are currently ongoing in the HIV and HCV fields.

The ongoing historical transition into the genomic era brings new hopes that host genetics will contribute to a better understanding of viral infections, to the identification of new targets for drug and vaccine development, and to the creation of predictive tools that will make personalized medicine a reality in infectious diseases.
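The multiple-testing constraint mentioned in this summary can be made concrete with a short calculation. The sketch below is illustrative only: it assumes a Bonferroni correction and roughly one million independent common variants, a conventional approximation that is not stated in this thesis, and the function name is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Assumption: about one million independent tests across the genome.
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"Per-variant significance threshold: {threshold:.1e}")
```

Under these assumptions a variant must reach a p-value near 5e-08 to be declared significant, which is why only variants with sufficiently large effects (or studies with very large sample sizes) yield detectable associations.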

Samenvatting

Thanks to technological developments, it is now possible to study variation in the human genome and its influence on disease. Using microarray chips, genome-wide association studies (GWAS) have yielded important insights into the etiology of chronic infectious diseases. In the first part of this thesis, I described the results of GWAS of the disease course of HIV infection. These studies show that there are genetic factors in the HLA genes (in particular HLA-B and HLA-C) that can suppress HIV replication in untreated seropositive patients. The role of the CCR5Δ32 allele, a deletion of 32 base pairs in the CCR5 gene, is also described precisely, both in the suppression of HIV replication (in heterozygous carriers) and in resistance to infection (in homozygous carriers). Apart from this CCR5Δ32 association, the GWAS of HIV resistance found no new associations, which indicates that (other) high-frequency genetic variants with a strong effect size probably do not exist. Another GWAS, of the immune response to an experimental HIV vaccine, showed that the same HLA alleles are involved in spontaneous suppression of the virus and in vaccine-induced immunity. Finally, we also developed a statistical method to study the interaction between the genome of the virus and the human genome. With this method it is possible to identify genes in the human genome that suppress the virus, and genes in the viral genome that try to escape such evolutionary pressure.

The second part of this thesis comprises two GWAS in patients treated for chronic hepatitis C virus infection, to determine the efficacy and toxicity of treatment with peg-interferon alpha and ribavirin. The gene encoding interferon-lambda 3 (IL28B) turns out to be an important determinant of the extent to which patients respond to treatment. The study also shows that inosine triphosphatase deficiency protects against anemia in patients treated with ribavirin. These insights have led to a genetic test that is now used in clinical studies and also as a diagnostic aid in the treatment of HCV infection.

The studies in this thesis give a detailed picture of common genetic variation in the human genome and how it can influence HCV and HIV infections. Despite these first GWAS findings, a large part of the inter-individual risk remains unexplained. Genetic factors (such as HLA and CCR5), for example, account for only 20% of the observed variance in disease course. This can partly be explained by the fact that GWAS can only detect common (high-frequency) alleles. The studies are also limited in sample size; larger studies could detect smaller effects, but this will be difficult to achieve in practice. Differences in DNA that are less common in the population probably also play an important role. Such low-frequency alleles can be studied by DNA sequencing in large numbers of patients. The results in this thesis illustrate how human genetics can be used to better understand the mechanisms underlying infectious diseases. Current developments in genetics will hopefully lead to better therapies, effective vaccines and new diagnostic tools.

Acknowledgements

Many thanks, first of all, to the two mentors who made me discover, learn and love science: Amalio Telenti and David Goldstein. Special thanks also to my “promotors” at UMC Utrecht: Paul de Bakker and Frank Miedema. I want to acknowledge the numerous colleagues and friends who contributed directly or indirectly to the work presented in this thesis. Here is a necessarily incomplete list, in no particular order: Paul McLaren, Mary Carrington, István Bartha, Kevin Shianna, Dongliang Ge, Thomas Urban, Alex Thompson, Slavé Petrovski, Curtis Gumbs, Kim Pelak, Liz Cirulli, Sara Colombo, Margalida Rotger, John McHutchison, Patrick Shea, Lucy Dorrell, Andrew McMichael, Norman Letvin, Myron Cohen, Barton Haynes, Steve Deeks, Galit Alter, Nicole Frahm, Chanson Brumme, Zabrina Brumme, Mina John, Andri Rauch, Huldrych Günthard, Patrick Francioli, Bruno Ledergerber, Patrick Descombes, Zoltán Kutalik, Viktor Müller, Richard Harrigan, Steven Wolinsky, Jean-François Zagury, Steve O’Brien, Cheryl Winkler, Simon Mallal, Javier Martinez-Picado, Manj Sandhu, Jonathan Carlson, David Heckerman. Finally, I would like to extend my appreciation to all the patients who agreed to participate in the studies presented in this thesis and to share biological samples and genetic data. Without them, none of this would have been possible.

Curriculum Vitae

Jacques Fellay (Martigny, Switzerland, August 12th 1974) graduated from the Collège of St-Maurice in 1993, before enrolling in medical school at the University of Fribourg and later at the University of Lausanne, Switzerland. During his studies, he spent a year as an exchange student in Vienna, Austria. He graduated as a medical doctor in 2000 and obtained his MD degree from the University of Lausanne in 2002 for his thesis entitled “Differences in response to antiretroviral treatment: pharmacokinetics, pharmacogenetics & toxicity”. Jacques Fellay subsequently worked as a resident and a fellow in internal medicine and infectious diseases in various Swiss hospitals, and was certified as an infectious diseases specialist in 2006. He then moved to the USA and joined David Goldstein’s research group at the Duke Center for Human Genetic Variation, where he worked on human genetic determinants of viral infections for four years. Since 2011, he has been an Assistant Professor at the EPFL School of Life Sciences and a Group Leader at the Swiss Institute of Bioinformatics, in Lausanne. His research group uses a range of large-scale approaches to explore the genetic roots of inter-individual differences in response to infections. In 2013, Jacques Fellay was awarded the National Latsis Prize by the Swiss National Science Foundation in recognition of his work on the human genome and its defense mechanisms against viral diseases.

Publication List

First or co-first author

1. Pelak K*, Need AC*, Fellay J*, Shianna KV, Feng S, Urban TJ, Ge D, De Luca A, Martinez-Picado J, Wolinsky SM, Martinson JJ, Jamieson BD, Bream JH, Martin MP, Borrow P, Letvin NL, McMichael AJ, Haynes BF, Telenti A, Carrington M, Goldstein DB, Alter G. Copy number variation of genes encoding killer cell immunoglobulin-like receptors and the control of HIV-1. PLoS Biology 2011; 9(11):e1001208.
2. Fellay J, Frahm N, Shianna KV, Cirulli ET, Casimiro DR, Robertson MN, Haynes BF, Geraghty DE, McElrath MJ, Goldstein DB. Host genetic determinants of T cell responses to the MRKAd5 HIV-1 gag/pol/nef vaccine in the Step trial. Journal of Infectious Diseases 2011; 203(6):773-9
3. Petrovski S*, Fellay J*, Shianna KV, Carpenetti N, Kumwenda J, Kamanga G, Kamwendo DD, Haynes BF, Cohen MS, Goldstein DB. Common Human Genetic Variants and HIV-1 susceptibility: A genome-wide survey in a homogeneous African population. AIDS 2011; 25(4):513-8
4. Fellay J, Thompson AJ, Ge D, Gumbs CE, Urban TJ, Shianna KV, Little LD, Qiu P, Bertelsen AH, Watson M, Warner A, Muir AJ, Brass C, Albrecht J, Sulkowski M, McHutchison JG, Goldstein DB. ITPA gene variants protect against anemia in patients treated for chronic hepatitis C. Nature 2010; 464(7287):405-8
5. Rotger M*, Dang K*, Fellay J*, Heinzen EL, Feng S, Descombes P, Shianna KV, Ge D, Günthard HF, Goldstein DB, Telenti A, Swiss HIV Cohort Study, Center for HIV/AIDS Vaccine Immunology. Genome-wide mRNA expression correlates of viral control in CD4+ T-cells from HIV-1-Infected Individuals. PLoS Pathogens 2010; 6(2):e1000781
6. Fellay J, Shianna KV, Telenti A, Goldstein DB. Host Genetics and HIV-1: the final phase? PLoS Pathogens 2010; 6(10):e1001033
7. Fellay J, Ge D, Shianna KV, Colombo S, Ledergerber B, Cirulli ET, Urban TJ, Zhang K, Gumbs CE, Smith JP, Castagna A, Cozzi-Lepri A, De Luca A, Easterbrook P, Günthard HF, Mallal S, Mussini C, Dalmau J, Martinez-Picado J, Miro JM, Obel N, Wolinsky SM, Martinson JJ, Detels R, Margolick JB, Jacobson LP, Descombes P, Antonarakis SE, Beckmann JS, O’Brien SJ, Letvin NL, McMichael AJ, Haynes BF, Carrington M, Feng S, Telenti A, Goldstein DB. Common genetic variation and the control of HIV-1 in humans. PLoS Genetics 2009; 5(12):e1000791
8. Ge D*, Fellay J*, Thompson AJ, Simon JS, Shianna KV, Urban TJ, Heinzen EL, Qiu P, Bertelsen AH, Muir AJ, Sulkowski M, McHutchison JG, Goldstein DB. Genetic variation in IL28B predicts hepatitis C treatment-induced viral clearance. Nature 2009; 461(7262):399-401
9. Fellay J, Shianna KV, Ge D, Colombo S, Ledergerber B, Weale M, Zhang K, Gumbs C, Castagna A, Cossarizza A, Cozzi-Lepri A, De Luca A, Easterbrook P, Francioli P, Mallal S, Martinez-Picado J, Miro JM, Obel N, Smith JP, Wyniger J, Descombes P, Antonarakis SE, Letvin NL, McMichael AJ, Haynes BF, Telenti A, Goldstein DB. A whole-genome association study of major determinants for host control of HIV-1. Science 2007; 317(5840):944-7
10. Fellay J, Goldstein DB. Host genetics: fine-tuning innate signalling. Current Biology 2007; 17(13):R516-8

11. Fellay J, Cavassini M. HIV-AIDS: answers to new questions. Revue Médicale Suisse. 2006; 2(60):912-7.
12. Fellay J, Marzolini C, Decosterd LA, Golay KP, Baumann P, Buclin T, Telenti A, Eap CB. Variations of CYP3A activity induced by antiretroviral treatment in HIV-1 infected patients. European Journal of Clinical Pharmacology 2005; 60(12):865-73
13. Fellay J, Venetz JP, Pascual M, Aubert JD, Seydoux C, Meylan PR. Treatment of Cytomegalovirus infection or disease in solid organ transplant recipients with Valganciclovir. American Journal of Transplantation 2005; 5(7):1781-2
14. Fellay J, Marzolini C, Meaden ER, Back DJ, Buclin T, Chave JP, Decosterd LA, Furrer H, Opravil M, Pantaleo G, Retelska D, Ruiz L, Schinkel AH, Vernazza P, Eap CB, Telenti A, Swiss HIV Cohort Study. Response to antiretroviral treatment in HIV-1-infected individuals with allelic variants of the multidrug resistance transporter 1: a pharmacogenetics study. Lancet 2002; 359:30-36
15. Fellay J, Boubaker K, Ledergerber B, Bernasconi E, Furrer H, Battegay M, Hirschel B, Vernazza P, Francioli P, Greub G, Flepp M, Telenti A, Swiss HIV Cohort Study. Prevalence of adverse events associated with potent antiretroviral treatment: Swiss HIV Cohort Study. Lancet 2001; 358:1322

Senior author

16. Regoes RR, McLaren PJ, Battegay M, Bernasconi E, Calmy A, Günthard HF, Hoffmann M, Rauch A, Telenti A, Fellay J. Disentangling human tolerance and resistance against HIV. PLoS Biology 2014; 12(9):e1001951
17. Bartha I, McLaren PJ, Ciuffi A, Fellay J*, Telenti A*. GuavaH: A compendium of host genomic data in HIV biology and disease. Retrovirology 2014; 11(1):6
18. Fellay J. Genomic Medicine and Infectious Diseases. Praxis 2014; 103(10):587-90
19. Bartha I, Carlson JM, Brumme CJ, McLaren PJ, Brumme ZL, John M, Haas DW, Martinez-Picado J, Dalmau J, López-Galíndez C, Casado C, Rauch A, Günthard HF, Bernasconi E, Vernazza P, Klimkait T, Yerly S, O’Brien SJ, Listgarten J, Pfeifer N, Lippert C, Fusi N, Kutalik Z, Allen TM, Müller V, Harrigan PR, Heckerman D, Telenti A, Fellay J. A Genome-to-Genome Analysis of Associations between Human Genetic Variation, HIV-1 Sequence Diversity, and Viral Control. eLife 2013; 2:e01123
20. McLaren PJ, Coulonges C, Ripke S, van den Berg L, Buchbinder S, Carrington M, Cossarizza A, Dalmau J, Deeks SG, Delaneau O, De Luca A, Goedert JJ, Haas D, Herbeck JT, Kathiresan S, Kirk GD, Lambotte O, Luo M, Mallal S, van Manen D, Martinez-Picado J, Meyer L, Miro JM, Mullins JI, Obel N, O'Brien SJ, Pereyra F, Plummer FA, Poli G, Qi Y, Rucart P, Sandhu MS, Shea PR, Schuitemaker H, Theodorou I, Vannberg F, Veldink J, Walker BD, Weintrob A, Winkler CA, Wolinsky S, Telenti A, Goldstein DB, de Bakker PIW, Zagury J-F, Fellay J. Association Study of Common Genetic Variants and HIV-1 Acquisition in 6,300 Infected Cases and 7,200 Controls. PLoS Pathogens 2013; 9(7):e1003515
21. Lane J, McLaren PJ, Dorrell L, Shianna KV, Stemke A, Pelak K, Moore S, Oldenburg J, Alvarez-Roman MT, Angelillo-Scherrer A, Boehlen F, Bolton-Maggs PHB, Brand B, Brown D, Chiang E, Cid-Haro AR, Clotet B, Collins P, Colombo S, Dalmau J, Fogarty P, Giangrande P, Gringeri A, Iyer R, Katsarou O, Kempton C, Kuriakose P, Lin J, Makris M, Manco-Johnson M, Tsakiris DA, Martinez-Picado J, Mauser-Bunschoten E, Neff A, Oka S, Oyesiku L, Parra R, Peter-Salonen K, Powell J, Recht M, Shapiro A, Stine K, Talks K, Telenti A, Wilde J, Yee TT, Wolinsky SM, Martinson J, Hussain SK, Bream JH, Jacobson LP, Carrington M, Goedert JJ, Haynes BF, McMichael AJ, Goldstein DB, Fellay J. A genome-wide association study of resistance to HIV infection in highly exposed uninfected individuals with hemophilia A. Human Molecular Genetics 2013; 22:1903-10
22. Rauch A, Fellay J. Personalized Hepatitis C Therapy: Opportunities and Pitfalls. Expert Review of Molecular Diagnostics 2011; 11(2):127-9
23. Alter G, Brenchley J, Fellay J. Challenges Facing Young Investigators. IAVI Reports 2009; 13:14-7
24. Fellay J. Host genetics influences on HIV-1 disease. Antiviral Therapy 2009; 10.3851/IMP1253

Co-author

25. Rausell A, Mohammadi P, McLaren PJ, Xenarios I, Fellay J, Telenti A. Analysis of stop-gain and frameshift variants in human innate immunity genes. PLoS Computational Biology 2014, 10(7):e1003757
26. Mohammadi P, di Iulio J, Muñoz M, Bartha I, Cavassini M, Thorball C, Fellay J, Beerenwinkel N, Ciuffi A, Telenti A. Dynamics of HIV latency and reactivation in a primary CD4+ T cell model. PLoS Pathogens 2014, 10(5):e1004156
27. Vince N, Bashirova A, Lied A, Gao X, Dorrell L, McLaren PJ, Fellay J, Carrington M. HLA class I and KIR genes do not protect against HIV-infection in highly exposed uninfected individuals with hemophilia A. Journal of Infectious Diseases 2014; 210(7):1047-51
28. Bashirova AA, Martin-Gayo E, Jones DC, Qi Y, Apps R, Gao X, Burke PS, Taylor CJ, Rogich J, Wolinsky S, Bream JH, Duggal P, Hussain S, Martinson J, Weintrob A, Kirk GD, Fellay J, Buchbinder SP, Goedert JJ, Deeks SG, Pereyra F, Trowsdale J, Lichterfeld M, Telenti A, Walker BD, Allen RL, Carrington M, Yu XG. LILRB2 interaction with HLA class I correlates with control of HIV-1 infection. PLoS Genetics 2014; 10(3):e1004196
29. Hasse B, Walker AS, Fehr J, Furrer H, Hoffmann M, Battegay M, Calmy A, Fellay J, Di Benedetto C, Weber R, Ledergerber B. Co-Trimoxazole Prophylaxis Is Associated with Reduced Risk of Incident Tuberculosis in Participants in the Swiss HIV Cohort Study. Antimicrobial Agents and Chemotherapy. 2014; 58(4):2363-2368
30. Kulkarni S, Qi Y, O'huigin C, Pereyra F, Ramsuran V, McLaren P, Fellay J, Nelson G, Chen H, Liao W, Bass S, Apps R, Gao X, Yuki Y, Lied A, Ganesan A, Hunt PW, Deeks SG, Wolinsky S, Walker BD, Carrington M. Genetic interplay between HLA-C and MIR148A in HIV control and Crohn's disease. Proc Natl Acad Sci U S A 2013; 110:20705-10
31. Apps R, Qi Y, Carlson JM, Chen H, Gao X, Thomas R, Yuki Y, Del Prete G, Goulder P, Brumme ZL, Brumme CJ, John M, Mallal S, Nelson G, Bosch R, Heckerman D, Stein J, Soderberg KA, Moody MA, Denny TN, Zeng X, Fang J, Moffett A, Lifson JD, Goedert JJ, Buchbinder S, Kirk GD, Fellay J, McLaren PJ, Deeks SG, Pereyra F, Walker B, Michael NL, Weintrob A, Wolinsky S, Liao W, Carrington M. Opposing effects of HLA-C expression level in viral versus inflammatory disease. Science 2013; 340(6128):87-91

32. McLaren PJ, Fellay J, Telenti A. European genetic diversity and susceptibility to pathogens. Human Heredity 2013; 76:187-193
33. Rotger M, Glass TR, Junier T, Lundgren J, Neaton JD, Poloni ES, van 't Wout AB, Lubomirov R, Colombo S, Martinez R, Rauch A, Günthard HF, Neuhaus J, Wentworth D, van Manen D, Gras LA, Schuitemaker H, Albini L, Torti C, Jacobson LP, Li X, Kingsley LA, Carli F, Guaraldi G, Ford ES, Sereti I, Hadigan C, Martinez E, Arnedo M, Egaña-Gorroño L, Gatell JM, Law M, Bendall C, Petoumenos K, Rockstroh J, Wasmuth JC, Kabamba K, Delforge M, De Wit S, Berger F, Mauss S, de Paz Sierra M, Losso M, Belloso WH, Leyes M, Campins A, Mondi A, De Luca A, Bernardino I, Barriuso-Iglesias M, Torrecilla-Rodriguez A, Gonzalez-Garcia J, Arribas JR, Fanti I, Gel S, Puig J, Negredo E, Gutierrez M, Domingo P, Fischer J, Fätkenheuer G, Alonso-Villaverde C, Macken A, Woo J, McGinty T, Mallon P, Mangili A, Skinner S, Wanke CA, Reiss P, Weber R, Bucher HC, Fellay J, Telenti A, Tarr PE. Contribution of Genetic Background, Traditional Risk Factors, and HIV-Related Factors to Coronary Artery Disease Events in HIV-Positive Persons. Clin Infect Dis. 2013; 57(1):112-21
34. Ayday E, Raisaro JL, McLaren PJ, Fellay J, Hubaux JP. Privacy-Preserving Computation of Disease Risk by Using Genomic, Clinical, and Environmental Data. USENIX Security Workshop on Health Information Technologies (HealthTech '13), Aug. 2013
35. Clark PJ, Thompson AJ, Zhu Q, Vock DM, Zhu M, Patel K, Harrison SA, Naggie S, Ge D, Tillmann HL, Urban TJ, Shianna K, Fellay J, Goodman Z, Noviello S, Pedicone LD, Afdhal N, Sulkowski M, Albrecht JK, Goldstein DB, McHutchison JG, Muir AJ. The association of genetic variants with hepatic steatosis in patients with genotype 1 chronic hepatitis C infection. Dig Dis Sci. 2012; 57(8):2213-21
36. Stern M, Czaja K, Rauch A, Rickenbach M, Günthard HF, Battegay M, Fellay J, Hirschel B, Hess C; Swiss HIV Cohort Study Group. HLA-Bw4 identifies a population of HIV-infected patients with an increased capacity to control viral replication after structured treatment interruption. HIV Medicine 2012; 13(10):589-95
37. Clark PJ, Thompson AJ, Zhu M, Vock DM, Zhu Q, Ge D, Patel K, Harrison SA, Urban TJ, Naggie S, Fellay J, Tillmann HL, Shianna K, Noviello S, Pedicone LD, Esteban R, Kwo P, Sulkowski MS, Afdhal N, Albrecht JK, Goldstein DB, McHutchison JG, Muir AJ. Interleukin 28B polymorphisms are the only common genetic variants associated with low-density lipoprotein cholesterol (LDL-C) in genotype-1 chronic hepatitis C and determine the association between LDL-C and treatment response. J Viral Hepat. 2012; 19(5):332-40
38. Pillai SK, Abdel-Mohsen M, Guatelli J, Skasko M, Monto A, Fujimoto K, Yukl S, Greene WC, Kovari H, Rauch A, Fellay J, Battegay M, Hirschel B, Witteck A, Bernasconi E, Ledergerber B, Günthard HF, Wong JK. Role of retroviral restriction factors in the interferon-α-mediated suppression of HIV-1 in vivo. Proc Natl Acad Sci U S A 2012; 109(8):3035-40
39. Thompson AJ, Clark PJ, Singh A, Ge D, Fellay J, Zhu M, Zhu Q, Urban TJ, Patel K, Tillmann HL, Naggie S, Afdhal NH, Jacobson IM, Esteban R, Poordad F, Lawitz EJ, McCone J, Shiffman ML, Galler GW, King JW, Kwo PY, Shianna KV, Noviello S, Pedicone LD, Brass CA, Albrecht JK, Sulkowski MS, Goldstein DB, McHutchison JG, Muir AJ. Genome-wide association study of interferon-related cytopenia in chronic hepatitis C patients identifies an association between ITPA variants and thrombocytopenia. Journal of Hepatology 2012; 56(2):313-9
40. Snoeck J, Fellay J, Bartha I, Douek DC, Telenti A. Mapping of positive selection sites in the HIV genome in the context of RNA and protein structural constraints. Retrovirology 2011; 8(1):87
41. Lingappa JR, Petrovski S, Kahle E, Fellay J, Shianna K, McElrath MJ, Thomas KK, Baeten JM, Celum C, Wald A, de Bruyn G, Mullins JI, Nakku-Joloba E, Farquhar C, Essex M, Donnell D, Kiarie J, Haynes B, Goldstein D. Genomewide Association Study for Determinants of HIV-1 Acquisition and Viral Set Point in HIV-1 Serodiscordant Couples with Quantified Virus Exposure. PLoS One 2011; 6(12):e28632
42. Vanasse GJ, Jeong J, Tate J, Bathulapalli H, Anderson D, Steen H, Fleming M, Fultz S, Telenti A, Fellay J, Justice AC, Berliner N. A polymorphism in the leptin gene promoter is associated with anemia in patients with HIV disease. Blood 2011; 118(20):5401-8
43. Haas DW, Kuritzkes D, Ritchie MD, Amur S, Gage B, Maartens G, Masys D, Fellay J, Phillips E, Ribaudo HJ, Freedberg K, Petropoulos C, Manolio T, Gulick RM, Haubrich R, Kim P, Dehlinger M, Abebe R, Telenti A. Pharmacogenomics of HIV Therapy: Summary of a Workshop Sponsored by the National Institute of Allergy and Infectious Diseases. HIV Clinical Trials 2011; 12(5):277-85
44. Rotger M, Dalmau J, Rauch A, McLaren P, Bosinger S, Battegay M, Bernasconi E, Descombes P, Erkizia I, Fellay J, Hirschel B, Miró JP, Palou E, Vernazza P, Woods M, Günthard HF, de Bakker P, Silvestri G, Martinez-Picado J, Telenti A. Comparative analysis of genomic features of human HIV-1 infection and primate models of SIV infection. Journal of Clinical Investigation 2011; 121(6):2391-400
45. di Iulio J, Ciuffi A, Fitzmaurice K, Kelleher D, Rotger M, Fellay J, Pulit S, Furrer H, Günthard HF, Battegay M, Bernasconi E, Vernazza P, Hirschel B, Klenerman P, Telenti A, Rauch A. Estimating the net contribution of IL28B variation to spontaneous Hepatitis C virus clearance. Hepatology 2011; 53(5):1446-54
46. Evangelou E, Fellay J, Colombo S, Martinez-Picado J, Obel N, Goldstein DB, Ioannidis JPA, Telenti A. Impact of phenotype definition on genome-wide association signals: empirical evaluation in HIV-1 infection. American Journal of Epidemiology 2011; 173(11):1336-42
47. Bol SM, Moerland PD, Limou S, van Remmerden Y, Coulonges C, van Manen D, Herbeck JT, Fellay J, Sieberer M, Sietzema JG, van’t Slot R, Zagury J, Schuitemaker H, van’t Wout AB. Genome-wide association study identifies single nucleotide polymorphism in DYRK1A associated with replication of HIV-1 in monocyte-derived macrophages. PLoS One 2011; 6(2):e17190
48. Hitomi Y, Cirulli ET, Fellay J, McHutchison JG, Thompson AJ, Gumbs CE, Shianna KV, Urban TJ, Goldstein DB. Inosine triphosphate protects against ribavirin-induced ATP loss by restoring adenylosuccinate synthase function. Gastroenterology 2011; 140(4):1314-21
49. Urban TJ, Thompson AJ, Bradrick S, Fellay J, Schuppan D, Cronin KD, Hong L, McKenzie A, Shianna KV, McHutchison JG, Goldstein DB, Afdhal N. IL28B genotype predicts expression of innate immune response genes in chronic HCV infected liver. Hepatology 2010; 52(6):1888-96

50. Pelak K, Shianna KV, Ge D, Maia JM, Smith JP, Cirulli ET, Singh A, Zhu M, Fellay J, Dickson S, Gumbs C, Campbell CR, Hong LK, Lornsen KA, McKenzie AM, Sobreira NLM, Hoover-Fong J, Milner JD, Ottoman R, Haynes BF, Goedert JJ, Goldstein DB. The characterization of twenty sequenced human genomes. PLoS Genetics 2010; 6(9):e1001111
51. Thompson AJ, Fellay J, McHutchison JG. How the human genome can predict response to Hepatitis C therapy. Current Hepatitis Reports 2010; 9:1-8
52. Sabado RL, O’Brien M, Subedi A, Qin L, Hu N, Taylor E, Dibben O, Stacey A, Fellay J, Shianna KV, Siegal F, Shodell M, Shah K, Larsson M, Lifson J, Nadas A, Marmor M, Hutt R, Margolis D, Garmon D, Markowitz M, Valentine F, Borrow P, Bhardwaj N. Evidence of dysregulation of dendritic cells during primary HIV infection. Blood 2010; 116(19):3839-52
53. Thompson AJ, Fellay J, Patel K, Tillmann H, Naggie S, Ge D, Urban TJ, Shianna KV, Muir AJ, Fried M, Afdhal N, Goldstein DB, McHutchison JG. Variants in the ITPA gene protect against ribavirin-induced hemolytic anemia and need for ribavirin dose reduction. Gastroenterology 2010; 139(4):1181-9
54. Specht A, Telenti A, Martinez R, Fellay J, Bailes E, Evans DT, Carrington M, Hahn BH, Goldstein DB, Kirchhoff F. Counteraction of HLA-C-mediated immune control of HIV-1 by Nef. Journal of Virology 2010; 84(14):7300-11
55. Thompson AJ, Muir AJ, Sulkowski MS, Ge D, Fellay J, Shianna KV, Urban TJ, Afdhal NH, Jacobson IM, Esteban R, Poordad F, Lawitz EJ, McCone J, Shiffman ML, Galler GW, Lee WM, Reindollar R, King JW, Kwo PY, Ghalib RH, Freilich B, Nyberg LM, Zeuzem S, Poynard T, Vock DM, Pieper KS, Patel K, Tillmann HL, Noviello S, Koury K, Pedicone LD, Brass CA, Albrecht JK, Goldstein DB, McHutchison JG. IL28B polymorphism improves viral kinetics and is the strongest pre-treatment predictor of SVR in HCV-1 patients. Gastroenterology 2010; 139(1):120-9.e18
56. Pelak K, Goldstein DB, Walley NM, Fellay J, Ge D, Shianna KV, Gumbs C, Gao X, Maia JM, Cronin KD, Hussain SK, Carrington M, Michael NL, Weintrob AC. Host determinants of HIV-1 control in African Americans. Journal of Infectious Diseases 2010; 201:1141-9
57. Gheuens S, Fellay J, Goldstein DB, Koralnik IJ. Role of HLA class I alleles in Progressive Multifocal Leukoencephalopathy. Journal of Neurovirology 2010; 16(1):41-7
58. Siddiqui RA, Sauermann U, Altmüller J, Fritzer E, Nothnagel M, Dalibor N, Fellay J, Kaup F, Stahl-Hennig C, Nürnberg P, Krawczak M, Platzer M. X-chromosomal variation is associated with slow progression to AIDS in HIV-1 infected women. American Journal of Human Genetics 2009; 85(2):228-39
59. Thomas R, Apps R, Qi Y, Gao X, Male V, O'HUigin C, O'Connor G, Ge D, Fellay J, Martin JN, Margolick J, Goedert JJ, Buchbinder S, Kirk GD, Martin MP, Telenti A, Deeks SG, Walker BD, Goldstein DB, McVicar DW, Moffett A, Carrington M. HLA-C cell surface expression and control of HIV/AIDS correlate with a variant upstream of HLA-C. Nature Genetics 2009; 41(12):1290-4
60. Walley NM, Julg B, Dickson SP, Fellay J, Ge D, Walker BD, Carrington M, Cohen MS, de Bakker PI, Goldstein DB, Shianna KV, Haynes BF, Letvin NL, McMichael AJ, Michael NL, Weintrob AC. The Duffy antigen receptor for chemokines null promoter variant does not influence HIV-1 acquisition or disease progression. Cell Host Microbe 2009; 5:408-10

61. Ge D, Zhang K, Need AC, Martin O, Fellay J, Telenti A, Goldstein DB. WGAViewer: A software for genomic annotation of whole genome association studies. Genome Research 2008; 18:640-3
62. Colombo S, Rauch A, Rotger M, Fellay J, Martinez R, Fux C, Thurnheer C, Günthard HF, Goldstein DB, Furrer H, Telenti A; Swiss HIV Cohort Study. The HCP5-Single Nucleotide Polymorphism: A Simple Screening Tool for Predicting Abacavir Hypersensitivity Reactions. Journal of Infectious Diseases 2008; 198:864-7
63. Manuel O, Venetz JP, Fellay J, Sturzenegger N, Fontana M, Matter M, Meylan PR, Pascual M. Efficacy and safety of valganciclovir prophylaxis combined with a tacrolimus / mycophenolate regimen in kidney transplantation. Swiss Medical Weekly 2007; 137:669-76
64. Keiser O, Fellay J, Opravil M, Hirsch HH, Hirschel B, Bernasconi E, Vernazza PL, Rickenbach M, Telenti A, Furrer H. Adverse events to antiretrovirals in the Swiss HIV Cohort Study: Impact on mortality and treatment modification. Antiviral Therapy 2007; 12:1157-64
65. Müller NJ, Furrer H, Kaiser L, Cavassini M, Fellay J, Chave JP, Weber M, Müllhaupt B, Demartines N, Giostra E, Mentha G, Hirsch HH, Weber R, Swiss HIV Cohort Study. HIV and solid organ transplantation: The Swiss experience. Swiss Medical Weekly 2006; 136(11-12):194-6
66. Eap CB, Fellay J, Buclin T, Bleiber G, Golay KP, Brocard M, Baumann P, Telenti A. Cyp3A activity measured by the midazolam test is not related to 3435 C>T polymorphism in the multiple drug resistance transporter gene. Pharmacogenetics 2004; 14(4):255-60
67. Csajka C, Marzolini C, Fattinger K, Decosterd LA, Fellay J, Telenti A, Biollaz J, Buclin T. Population pharmacokinetics and effects of efavirenz in HIV patients. Clinical Pharmacology & Therapeutics 2003; 73(1):20-30
68. Greub G, Fellay J, Telenti A. HIV lipoatrophy and mosquito bites. Clinical Infectious Diseases 2002; 34(2):288-9