Admixture Mapping
Total Page:16
File Type:pdf, Size:1020Kb
GG11CH04-Winkler ARI 4 August 2010 16:2 ANNUAL REVIEWS Further Admixture Mapping Click here for quick links to ∗ Annual Reviews content online, including: Comes of Age • Other articles in this volume • Top cited articles 1 1 • Top downloaded articles Cheryl A. Winkler, George W. Nelson, • Our comprehensive search and Michael W. Smith2 1Basic Science Program, SAIC-Frederick, Inc., Laboratory of Genomic Diversity, NCI Frederick, Frederick, MD 21702 2Advanced Technology Program, SAIC-Frederick, Inc., National Cancer Institute-Frederick, Frederick, MD 21702; email: [email protected] Annu. Rev. Genomics Hum. Genet. 2010. 11:65–89 Key Words First published online as a Review in Advance on mapping by admixture linkage disequilibrium, risk loci, genetic July 1, 2010 mapping, hidden Markov chain, health disparities The Annual Review of Genomics and Human Genetics is online at genom.annualreviews.org Abstract This article’s doi: Admixture mapping is based on the hypothesis that differences in disease 10.1146/annurev-genom-082509-141523 by Fordham University on 01/13/13. For personal use only. rates between populations are due in part to frequency differences in Copyright c 2010 by Annual Reviews. disease-causing genetic variants. In admixed populations, these genetic All rights reserved variants occur more often on chromosome segments inherited from 1527-8204/10/0922-0065$20.00 the ancestral population with the higher disease variant frequency. A ∗This is a work of the U.S. Government and is not genome scan for disease association requires only enough markers to Annu. Rev. Genom. Human Genet. 2010.11:65-89. Downloaded from www.annualreviews.org subject to copyright protection in the United identify the ancestral chromosome segments; for recently admixed pop- States. ulations, such as African Americans, 1,500–2,500 ancestry-informative markers (AIMs) are sufficient. The method was proposed over 50 years ago, but the AIM panels and statistical methods required have only recently become available. Since the first admixture scan in 2005, the genetic bases for a range of diseases/traits have been identified by admix- ture mapping. Here, we provide a historical perspective, review AIM panels and software packages, and discuss recent successes and unex- pected insights into human diseases that exhibit disparate rates across human populations. 65 GG11CH04-Winkler ARI 4 August 2010 16:2 INTRODUCTION HISTORICAL PERSPECTIVE A major goal of human genetics is to identify The human diaspora that has occurred in the genetic variation predisposing to complex, last 400–600 years has resulted in gene flow be- common human diseases. Genome-wide- tween previously separated human subpopula- association scans (GWAS) have led to the tions (Figure 1). Meiotic crossover in admixed discovery of thousands of alleles associated populations leads to a mosaic of chromosomal with human diseases and traits (31), but GWAS segments derived from one or the other are costly and, because of the large number ancestral (parental) subpopulation (Figure of single nucleotide polymorphisms (SNPs) 2). The duration, direction, and rate of gene analyzed, incur a stiff statistical penalty (61). flow between two populations influence the This is problematic because of the growing proportion of admixture and the length of chro- realization that most common-risk alleles have mosomal segments derived from the ancestral small effect sizes while alleles with large effect populations, which will vary among individuals. sizes tend to be much less frequent (37). To In admixed populations where gene flow com- have the power to identify small to moderate menced within the last several hundred years effect requires thousands of cases and controls (e.g., African Americans or Hispanics), linked (Table 1). Admixture mapping provides an alleles will show extended linkage disequilib- attractive and more powered alternative to rium (LD) relative to the ancestral populations. GWAS for gene discovery in admixed popula- Using the admixture linkage disequilibrium tions for a subset of diseases/traits that are dif- (ALD) generated in recently admixed human ferentially distributed in the ancestral (parental) populations to assign the traits to linkage populations (49). The idea is straightforward— groups was first proposed by Rife in 1954 (59), the genetic variation causing the disease/trait of but it took nearly 4 decades for the approach interest will be more frequent on chromosome to gain serious attention. In 1988, Chakroborty segments derived from the parental population & Weiss (13) renewed interest in ALD to map with the higher disease/trait incidence. genes by positing a classical linkage approach In this review, we provide only a brief analogous to family or hybrid animal studies overview of the theoretical basis and statistic that exploits long range LD to limit the number methods for admixture mapping as these have of markers required for genome-wide coverage. been expertly treated by others (41, 43, 69, 72, This was followed by a flurry of theoretical pa- 85). Instead, we provide a comprehensive re- pers in the 1990s (11, 41, 60, 72); these early by Fordham University on 01/13/13. For personal use only. view of admixture mapping software programs, pioneers advanced various statistical strategies the development of admixture panels, applica- and methods, all of which were based on the tions, and successes in the 5-year span mark- association between a marker allele and trait ing the passage from an elegant theoretical ap- to assign genes to a linkage group—a method proach to a powerful gene mapping application Stephens et al. (72) termed mapping by admix- Annu. Rev. Genom. Human Genet. 2010.11:65-89. Downloaded from www.annualreviews.org with several notable successes. ture linkage disequilibrium (MALD). Table 1 Whole-genome-wide mapping methods for gene discovery Linkage Admixture Association Reference Statistical power Low High Medium to high 49, 61 Number of SNPs Low < 1,000 Low 1,500–3,000 Very high >500,000 50 Resolution Low 20–30 cM Intermediate 1–10 cM High 1–10 cM 35, 36 Sensitivity to genetic heterogeneity Low Low to moderate High 81 66 Winkler · Nelson · Smith GG11CH04-Winkler ARI 4 August 2010 16:2 1526 1492 1499 1450 1500s Figure 1 Major migrations and diasporas, 1400–1800, that are sources of important admixed populations for admixture mapping. In 1998, McKeigue (42) proposed an alter- cases and controls are available, most inves- native approach to disease gene localization that tigators elect to calculate both a case-control tested for the linkage of the disease or trait with statistic quantifying the difference in ancestry parental ancestry at each locus, defined as 0, 1, between cases and controls, and a case-only or 2 allele copies inherited from the ancestral statistic comparing the extent of ancestry populations. McKeigue named this approach at each locus to the genome-wide average. admixture mapping because it is based on the Like hybrid or family linkage studies, a association of local chromosomal ancestry moderate number (≈1,500–2,500) of ancestry- with the disease rather than on LD between informative markers is needed for the initial by Fordham University on 01/13/13. For personal use only. the marker and phenotype (Figure 3). It was genome-wide scan, followed by fine mapping quickly appreciated that admixture mapping with additional markers to identify the causal could be applied to case-control studies by com- allele (49). Further, Patterson has shown that paring locus-specific ancestry at each ancestry- admixture mapping has the power to detect informative marker (AIM) between cases and association with relatively modest odds ratios Annu. Rev. Genom. Human Genet. 2010.11:65-89. Downloaded from www.annualreviews.org controls. Hoggart et al. (32) and Montana & with 2,500 or fewer cases (Figure 4); this offers Pritchard (45) showed through simulations that an advantage over standard GWAS results the extent of ancestry at each locus could also because it uses far fewer markers and thus has be compared to genome-wide average ancestry a much more modest correction for multiple for a case-only study design (32, 45). They comparisons. demonstrated that for rare diseases, an affected- The development of the theoretical ap- case-only design is highly efficient and better proach to using ancestry to map genes by powered than a case-control design. A statis- McKeigue in 1998 was followed rapidly by tical deviation in local ancestry away from the the development of statistical methods, pan- genome-wide average could be used to identify els of AIMs for admixture typing, and software a peak region of interest. In studies where both programs, many of which became available by www.annualreviews.org • Admixture Mapping Comes of Age 67 GG11CH04-Winkler ARI 4 August 2010 16:2 Continental rates across populations are due to population- isolation of populations specific frequency differences of the causal variant(s). THE PROCESS OF ADMIXTURE Gene flow between reproductively isolated populations results in chromosomal admixture with contributions from each contributing an- cestral population. The gene flow can be a sin- gle event in time or continuous over many gen- erations. The gene flow results in the tem- porary generation of long haplotype blocks, Generations which includes polymorphic variants, derived of admixture from one or the other ancestral population (Figure 2). These blocks of alleles in ALD are extremely extended in the first few generations following introgression, but the LD segments become progressively shorter by recombination with increasing generations. The length of hap- lotype segments derived from each of the ances- tral parent populations is a function of both the number of generations since the initial admix- ture event and the duration of gene flow. In recently admixed populations, the ex- tent of ALD is intermediate between ances- tral populations in which recombination has Today occurred over hundreds of thousands of gen- erations and families in which recombination Figure 2 has occurred in only a few generations.