The Pharmacogenomics Journal (2004) 4, 88–90 © 2004 Nature Publishing Group All rights reserved 1470-269X/04 $25.00 www.nature.com/tpj PERSPECTIVE

The HapMap project and its application to genetic studies of drug response

P Deloukas and D Bentley

Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK

The Pharmacogenomics Journal (2004) 4, 88–90. doi:10.1038/sj.tpj.6500226
Published online 16 December 2003

A central goal in the study of human biology is to understand the molecular basis of common disease and of variable sensitivity to drugs and other environmental factors. Adverse drug effects are a major cause of hospitalisation.1 The development of more effective, safer medicines requires an understanding of the genetic factors that govern variable drug response in different individuals. Recent advances in genetics and genomics are paving the way to diagnostic tests that will enable the administration of drugs to be tailored to groups of individuals, and may in future help to define appropriate individual dosages and drug combinations in pharmacological treatment. These hopes are reflected in the growth of pharmacogenetic and pharmacogenomic research. Here we review the potential impact of current research in human genetic variation on our understanding and management of variable drug responses.

In the past 10 years, there has been great success in identifying the genetic basis of rare Mendelian disorders through linkage studies in large affected families. The genes and underlying mutations that cause over 1400 such conditions are reported in OMIM (http://www.ncbi.nlm.nih.gov). However, similar approaches have had much more modest success when applied to common, polygenic diseases and other phenotypes in which multiple genetic and environmental factors contribute to an individual's risk of developing disease. It is now widely accepted that association studies offer greater statistical power than linkage studies for detecting genetic effects underlying complex traits2 when the causative variant is common in the population. This power is determined by the allele frequency and risk ratio of the causative variant in relation to the sample size. A key element in undertaking such studies is the establishment of a comprehensive catalogue of common variants in the human population.

Detecting the clinically important genetic factors that determine variable drug response poses a similar problem. Variability in drug response is governed both by genetic variants and by nongenetic factors such as diet, age and gender. The contribution of genetic factors may be classified into two groups. Pharmacokinetic variants affect uptake, absorption, metabolism or clearance of a drug, and occur, for example, in genes that encode liver enzymes or drug transporters. The same genetic variants may affect the response to a range of drugs, because multiple drugs are metabolised via the same route (as in the well-known case of the cytochrome P450 enzyme CYP2D6, which metabolises a quarter of all commonly used drugs). Pharmacodynamic variants affect the biological function of the drug at the site of action, for example, by altering the binding of a drug to a neurotransmitter receptor (e.g. variants in the GABA-A receptor subunits alter responsiveness to the antiepileptic benzodiazepines).

Making effective use of genetic and genomic information in pharmacological applications depends on being able to discover the biologically and medically important genetic variants for each phenotype, and to apply them to improve the use of drugs in healthcare. Studies to date have mainly focused on candidate genes, each chosen on the basis of a prior hypothesis that it encodes a protein involved in a particular drug response. Such studies are now greatly enhanced by the wealth of information on new genes and variants that is available in the public domain as a result of the Human Genome Project and associated research. A more ambitious approach would be to scan the entire genome for important new variants, an approach that is not limited by any prior hypothesis but that requires effective resources and technology for genome-wide analysis. But what is the best way forward, how do we make the best use of the available information, and what resources do we need to develop?

HUMAN VARIATION

Modern humans are a young species in evolutionary terms, with a limited amount of sequence variation. It has been estimated that there are 11–15 million variants with a minor allele frequency >1%. Over 90% of these variants are single-nucleotide polymorphisms (SNPs). Through large-scale efforts, over 4.5 million SNPs are currently available in the public domain (dbSNP; build 116). However, only a small fraction of these SNPs has been characterised in terms of allele frequency and ethnic distribution. This year has seen the completion of a highly accurate and contiguous reference sequence of the euchromatic (gene-coding) part of the human genome.

A comprehensive set of annotated human genes is available, and comparative analysis of the human genome sequence with those of other species is highlighting additional evolutionarily conserved regions, which are likely to be enriched in functional elements such as transcriptional regulators. As a result, we can begin to explore systematically, in the human population, the sequence variation content of the functional part of the genome, which recent studies estimate not to exceed 5%.

An association study can be undertaken using either a direct or an indirect approach. The former relies on the availability of a complete list of functional variants, which are then tested for association to the trait of interest in a number of phenotypically matched cases and controls. The latter involves testing a series of genomic variants across a region (or all) of the genome for association, and relies on the assumption that the causative variant will be in linkage disequilibrium (LD) with one of the variants tested. Thus, characterising and understanding the dynamics of LD across the genome is necessary for enabling whole-genome association studies.

Recent studies conducted at various levels of resolution have all shown that the extent of LD in different parts of the genome is highly variable, averaging 5–20 kb but extending up to hundreds of kilobases.3–5 Variability in average LD has also been observed between ethnic groups, owing to differences in their demographic history, for example, population bottlenecks, admixture, genetic drift and natural selection. The key observation is that present-day chromosomes consist mainly of short segments that have undergone very limited or no historic recombination, and that for each of these segments a few common haplotypes represent the majority in a population.3,6 For each region of high LD and low haplotypic diversity (also termed a haplotype block), only a few variants are then needed to tag the common haplotypes within it. These are referred to as haplotype tag (ht) SNPs.7 Based on this rationale, an international effort, the HapMap project, was launched in October 2002, with the aim of constructing a genome-wide map of LD and common haplotypes in four populations from Africa, East Asia and Europe.

THE HAPMAP PROJECT

The study design of the HapMap project includes four population samples, namely 30 trios from CEPH/Utah families (North European descent), 30 Yoruban (Nigeria) trios, 48 unrelated Japanese and 48 unrelated Han Chinese. The International Consortium (members listed at http://www.hapmap.org) adopted a hierarchical mapping strategy. In phase 1 of the project, a map of evenly spaced SNPs (1 per 5 kb) with minor allele frequency greater than or equal to 0.05 is being generated. Analysis of local LD patterns in each population will identify ‘haplotype blocks’, and also identify the intervening regions that require data from additional SNPs to try to detect LD. Such regions will be the main focus of phase 2, which may include multiple rounds of SNP selection and genotyping. Some additional SNPs will also be typed within the emerging ‘haplotype blocks’ to corroborate the findings. Work is in progress to develop an optimal statistical approach that accurately describes the highly variable nature of LD, and to provide precise parameters to assess completion of the project in each region of the genome. It is estimated that over 1.6 million SNPs will be tested in the course of the project. Data will be released regularly into the public domain, initially via the consortium’s Data Coordination Center (DCC) (http://www.hapmap.org), and from there to dbSNP and other public databases.

What are the HapMap deliverables, and how may they contribute to the advancement of human genetics in general and pharmacogenomics in particular?

At the outset of the HapMap project, it was recognised that the collection of 2.4 million publicly available SNPs (October 2002) was distributed unevenly throughout the human genome, resulting in sparse SNP coverage in many regions. The first deliverable of the HapMap project is therefore the discovery of new SNPs. New shotgun sequence data have been generated across the whole genome, using libraries made from DNA samples of a range of individuals. The sequence data are being aligned to the finished human sequence, and candidate SNPs are being detected using the program SSAHA-SNP. The new data, coupled with improvements to the SNP-calling algorithms, have already made a significant contribution to the current total of 4.5 million SNPs; it is anticipated that this figure will exceed 5.5 million by the end of 2003. Furthermore, the accumulation of sequence data through these efforts has made it possible to devise a filter for selecting SNPs that are likely to be common (minor allele frequency >0.05). Over 1.3 million of the currently available SNPs have each allele ascertained by two individual sequence reads. Empirical data confirm that these ‘double-hit’ SNPs convert to working assays at a much higher rate than random SNPs.

The second deliverable of the HapMap project will be a genome-wide set of SNP assays validated in four populations. SNP assay development and validation remain a costly and labour-intensive exercise; currently, such work needs to be carried out upfront for almost every genetic study undertaken. Like other large-scale genomic projects, the HapMap consortium has set up continuous quality control and assessment of the production pipeline. High-throughput genotyping is carried out on five different platforms: MassExtend (Sequenom), Invader (Third Wave), AcycloPrime-FP (Perkin-Elmer), Golden Gate-BeadArray (Illumina) and ParAllele. Cross-checking has enabled independent assessment of the performance of the different platforms. Raw genotype data conform to a consistent high standard (>99.5% accuracy), based on reproducibility of results in duplicate samples, concordance using independent assays for the same SNP and checks for consistency of allelic
inheritance patterns in pedigrees. The HapMap effort to assess genotyping platforms will be of direct relevance to the use of large-scale genotyping for genetic studies in the future.

The third deliverable of the project will be an LD map of the human genome and the determination of the underlying common haplotypes in regions of strong LD. Characterisation of local patterns of LD at high resolution across the genome is a challenging task, owing to both the highly variable nature of LD and the lack of knowledge of the true demographic history of population samples. Current statistical methods, such as various haplotype block definitions, plots of average LD vs physical distance, LDU maps and statistical estimates of recombination rates, all generate different views of LD, which often do not fully overlap. Appropriate training sets for evaluating these methods are being developed, for example, deep re-sequencing of genomic regions and genotyping of all identified variants. However, new tools are likely to be needed and their development, although a priority for the consortium, will be open to the entire field with the prompt release of all raw genotype data. Tools to determine the underlying common haplotypes in each defined region of strong LD are already available. The final product, the haplotype map, will be a valuable tool for choosing optimal marker sets in any study undertaking LD mapping of common complex traits. A fourth deliverable of the HapMap project will thus be a minimal set of reference markers that tag each of the common haplotypes (hence ‘haplotype tag SNPs’, or htSNPs).

FUTURE BENEFITS/APPLICATIONS

In the course of the next 2 years, the HapMap project will result in a validated set of SNP-based markers that describe local patterns of LD and capture common haplotypes across most of the genome in four populations. Clearly, the applicability of the HapMap in studying variation in populations other than those included in the study will need to be thoroughly tested. As a tool, the HapMap is designed to enable the study of common variants for association to disease and drug response, but it should not be seen in isolation from other resources, for example, a catalogue of all functional variants. Overlaid on the annotated genome sequence, a haplotype map that integrates functional variants will become a very powerful tool for pharmacogenetic and pharmacogenomic studies. Both candidate gene and whole-genome approaches will make use of the HapMap resource to screen common haplotypes for association to a complex genetic trait. The candidate gene approach relies on our knowledge of biochemical pathways and gene interactions, and can be complemented by targeted sequencing to discover rarer variants. As the picture for the major candidates is unravelled, the whole-genome approaches will come into their own to find new targets of clinical importance that do not rely on prior hypotheses of gene or protein function. As a result of the concerted use of both approaches, the identification of human sequence variants that alter gene expression and protein function will become a driving force for the development of suitable genotype-based screening methods to support evaluation of new drugs during development and in clinical trials. The importance of concerted international efforts to produce public resources is underlined by the HapMap project, which follows the principles adopted for the human genome sequence itself.

DUALITY OF INTEREST

The authors declare that they have no competing financial interests; both are members of the International HapMap consortium.

Correspondence should be sent to:
P Deloukas, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire CB10 1SA, UK.
Tel: +44 0 1223 834 244
Fax: +44 01223 494919
E-mail: [email protected]

REFERENCES

1 Lazarou J et al. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies. JAMA 1998; 279: 1200–1205.
2 Risch NJ. Searching for genetic determinants in the new millennium. Nature 2000; 405: 847–856.
3 Patil N et al. Blocks of limited haplotype diversity revealed by high-resolution scanning of human chromosome 21. Science 2001; 294: 1719–1723.
4 Gabriel SB et al. The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
5 Dawson E et al. A first generation linkage disequilibrium map of human chromosome 22. Nature 2002; 418: 544–548.
6 Daly MJ et al. High-resolution haplotype structure in the human genome. Nat Genet 2001; 29: 229–232.
7 Johnson GC et al. Haplotype tagging for the identification of common disease genes. Nat Genet 2001; 29: 233–237.
