<<

3292

Hypothesis/Commentary

Perils in the Use of Linkage Disequilibrium for Fine Gene Mapping: Simple Insights from

Prakash Gorroochurn Division of Statistical Genetics, Department of , Mailman School of Public Health, Columbia University, New York, New York

Abstract

It is generally believed that genome-wide association tary explains why this strategy can be misleading.It (GWA) studies stand a good chance for finding argues that there is an intrinsic problem in the way susceptibility genes for common complex diseases. LD is currently used for fine-mapping.This is Although the results thus far have been somewhat because most of the metrics that are currently used promising, there are still many inherent difficulties to measure LD are inadequate, as they do not take and many initial associations do not get replicated. into account evolutionary variables that shape the LD The common strategy in GWA studies has been that structure of the human genome.Recent research on of selecting the most statistically significant single another metric, based on Male´cot’s model for isola- nucleotide polymorphisms with the hope that these tion by distance, holds considerable promise for will be very physically close to causal variants GWA studies and merits more serious consideration because of strong linkage disequilibrium (LD).Using by geneticists. (Cancer Epidemiol Biomarkers Prev simple ideas from population genetics, this commen- 2008;17(12):3292–7)

Introduction

Genome-wide association (GWA) studies are well under difficulties inherent in the conventional use of LD. By way and, in their first couple of years, have identified using simple population genetics ideas, this commentary variants for several complex diseases, including at least aims to enumerate specific instances where the results of four types of cancer (1). This seems promising, although LD fine-mapping can be misinterpreted. it remains to be seen how many of these will be Of the several theoretical studies on LD, an important consistently replicated in the future. Once a GWA study simulation-based workis that of Kruglyak(7). Kruglyak is done, there are usually two types of subsequent argued that LD between the common variant and other association studies (2): those aimed at replicating markers was essentially short-ranged and was unlikely the original significant markers (the so-called ‘‘exact’’ to extend beyond a genetic distance of 3 kb. A more approach) and those aimed at fine-mapping other loci recent study (8) found that LD could extend to genetic in the surrounding region in addition to the original distances of 1 Mb. Empirical studies have shown that LD significant loci (the so-called ‘‘local’’ approach). Both the is highly ‘‘patchy’’ in nature and does not decrease original GWA and any follow-up association study monotonically with physical distance. Chromosomes depend crucially on the concept of linkage disequili- tend to contain blocks of long-range LD (which is brium (LD; also known as gametic disequilibrium, allelic different from Kruglyak’s predictions) separated by association, or two-locus Hardy-Weinberg disequili- small region of recombination hotspots (9-11). LD is brium). It is unsurprising therefore that, well before high within blocks but low between blocks. GWA studies were under way, a vital taskwas to elucidate the LD structure of the human genome. This structure could shed considerable light on the prospects Insights from Population Genetics and limitations of LD mapping, especially for GWA. The indiscriminate use of LD can, however, lead to Insight 1: Population History Is a Key Determinant inaccurate results, especially when doing fine-mapping. of LD in Small DNA Regions and There Is Usually No Association studies are usually criticized on design and Correlation between LD and Physical Distance when methodologic grounds (3-6) while underestimating the Doing Fine-Mapping. The cystic fibrosis gene was found by climbing up a gradient until a peakwas reached. However, when the same principles are applied to the

Received 8/4/08; revised 8/29/08; accepted 9/15/08. Huntington’s disease (HD) gene, something unexpected Grant support: National Institute of Mental Health grant MH-48858. happens (see Fig. 1). Requests for reprints: Prakash Gorroochurn, Division of Statistical Genetics, The HD gene shows weakLD to some physically close Department of Biostatistics, Mailman School of Public Health, Columbia University, markers (e.g., D4S180) but strong LD to a more Room 620, 722 West 168th Street, New York, NY 10032. Phone: 212-342-1263; Fax: 212-342-0484. E-mail: [email protected] physically distant marker (D4S98). Thus, if both markers Copyright D 2008 American Association for Cancer Research. were tested for association with disease, one (D4S98) doi:10.1158/1055-9965.EPI-08-0717 would show strong association through LD with the HD

Cancer Epidemiol Biomarkers Prev 2008;17(12). December 2008

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research. Cancer Epidemiology,Biomarkers & Prevention 3293

gene, whereas the other (D4S180) would show weak large value of Dt (or Dt¶) if mutant at L originated association. This could mislead many to believe that the very recently backin time or if L is located in a HD gene was physically much closer to the D4S98 locus recombination cold spot (14). Another locus could be than to the D4S180 locus when in fact the opposite is true. relatively close to the given locus and result in small Is such erratic LD behavior surprising? Not when values of Dt (or Dt¶). looked from a population genetics perspective. Indeed, Insight 2: Generates LD between the noncorrelation between LD and physical distance is Disease and Marker Alleles in a Random Fashion, so the norm rather than the exception when dealing with a Particular Disease-Marker Association Found in One small DNA regions (e.g., <100 kb; ref. 12). To understand Population Need Not Exist in Another Population or why, consider the simple equation for LD: Might Even Be in the Opposite Direction. Templeton t (12) provides an example for two linked loci. Both the Dt ¼ D0ð1 À Þ ; ð1Þ human populations living in southern Italy west of the Apennine mountain range and on the nearby island of D t D where t is the LD after generations, 0 is the initial Sardinia were formed from a founder effect. In the Italian u D LD, and is the recombination fraction. ( t is actually population, all Med1 G6PD-deficient males had red/ t the deviation at generation from random association of green color blindness, whereas in the Sardinian popula- u two-locus frequencies.) When is small (e.g., tion almost none of the Med1 G6PD-deficient males had u À3 D  D <10 ), t 0, the value of LD is primarily red/green color blindness. Thus, in both cases, Med1 determined by its initial value (which is a function of G6PD-deficient males are strongly associated with red/ haplotype and frequencies) and does not decay green color blindness but in complete opposite direc- with either physical distance or time. Different markers tions. Furthermore, if we had selected cases with red/ at various short distances from a given markers will have green color blindness by sampling from both Italian and D different values of 0 depending on the evolutionary Sardinian populations, it is very likely that no association history of the population. whatsoever would be found. For example, consider three biallelic loci A, B, and C All of this makes sense because populations with small across a small DNA region (e.g., <100 kb), with B closer effective sizes undergo extensive amounts of genetic drift. A a than C to A (see Fig. 2). Suppose A has both and Genetic drift is the random fluctuation of allele frequen- B/B C/C alleles, but B and C have homozygotes and , cies across generations because of finite population size. respectively. We shall consider the normalized LD Let the mutant allele at the G6PD locus be denoted by a D¶ D measure (which equals divided by its theoretical and the allele for color blindness by b, with both alleles maximum value), because it is more indicative of the having relatively small frequencies. Let the wild-type relative strength of LD. Because there is no allelic alleles at each locus be A and B, respectively. Now, for D¶ variation at B and C, = 0 for alleles at A and B and these two loci, we have four , with frequencies for alleles at A and C. Suppose now an initial LD is even smaller than those of the corresponding alleles. The created between alleles A and B through gene flow from combination of small effective population size and rare several founders carrying copies of b. At about the same haplotypes means that each one is subject to considerable time in the past, an initial LD is created between alleles A random fluctuation, so that any one of them could and C through a single mutation of a C allele to c. Now, increase in frequency. In our example, the frequency of several copies of b are introduced with some on the same the ab haplotype increased in the Italian population, chromosome as A and others on the same chromosome whereas the frequency of the Ab haplotype increased in as a, whereas the single copy of c will always be the Sardinian population. Therefore, because genetic drift associated with A (or a depending on which chromo- is a stochastic phenomenon, it can create an association some the mutation occurred). Therefore, the initial LD between the alleles at a disease locus and a marker locus D ¶ AC AB 0 will be larger in magnitude for than for , in a completely random fashion. This explains the although locus C is further away from A than B is. Thus, completely opposite associations in the two cases. in general, Dt¶ is no indicator of the relative proximity of Furthermore, when the two populations are pooled, the a locus. When doing fine-mapping in small DNA LD between the alleles at the loci is destroyed because regions, there is usually no correlation between the now, in the total population, the b allele can occur on the magnitude of LD and physical distance (or recombina- same chromosome as either an a or an A allele. tion fraction): the fact that a particular marker has a very The haphazard effect of genetic drift on LD raises high disease association does not guarantee it must be questions on the legitimacy of replicating a significant physically close to the causative locus. Strachan and association found in one population in a different Read (ref. 13, p. 448) provide an accurate description of population. Emphasis has often been placed on replicat- the issue at hand: ‘‘...the patchy nature of LD, with some ing initial disease-marker associations. However, if we long-range correlations coexisting with short-range lack are willing to accept that LD is generally population- of correlation means that despite the theoretical high specific, then failure to replicate using different popula- resolution of association studies, in reality one can tions should be no reason for alarm. Thus, if an initial seldom be confident that one is looking in exactly the association is found in one population but no association right place for the susceptibility determinant.’’ in a different population, then it is not necessary for the Even for moderate to large DNA regions (f1-10 Mb), first association to be a false-positive. Both results can be the use of Dt (or Dt¶) can still be problematic. In that equally valid because they are specific for the popula- case, the value of u in Eq. (1) is no longer inconsequen- tions in which they were done. Whereas this point D u t D tial, and both 0 and (1 - ) influence the value of t.A about the specificity of LD has been well taken by some locus L relatively far from a given locus could give a (3, 15-17), it is perhaps not universally appreciated.

Cancer Epidemiol Biomarkers Prev 2008;17(12). December 2008

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research. 3294 Use of Linkage Disequilibrium for Fine Gene Mapping

Insight 3: The Act of Combining Two or More Genetically Distinct Subpopulations Creates LD between Any Two Alleles (at Different Loci) with Different Allele Frequencies Even when the Loci Are Unlinked. A well-known example is reported in Figure 2. At locus A, alleles are both of the A and a types. At Knowler et al. (18), whereby a strong negative association locus B, all alleles are of the B type and several b alleles are was found between non-insulin-dependent diabetes and introduced by gene flow. At locus C, all alleles are of the C type a haplotype at the IgG locus in the Pima and Papago and a single C allele mutates to c. tribes. When the study was repeated by stratifying on reported ancestry, the initial association vanished. This a classic case of confounding by population each of, say, two genetically distinct subpopulations. Let stratification, which arises when cases and controls are the frequency of M in subpopulation i(i = 1,2) be ri and N S r 6¼ r s 6¼ s sampled from genetically distinct subpopulations or that of be i, where 1 2 and 1 2. Then, when the when they are sampled from a single population made subpopulations are combined, LD is created between the up of genetically distinct subgroups (19). In fact, genetic two alleles with variable (12) drift is one of the major evolutionary forces that can create differentiation among subpopulations. There have Dcomb ¼ ð1 À Þðr1 À r2Þðs1 À s2Þ; ð3Þ been several recent methods to combat PS in association studies (20-23), with the principal components method of where p is the size of subpopulation 1 relative to Price et al. (22) being especially suited for GWA studies. subpopulation 2. Because of this (negative) disequili- PS gives rise to a spurious disease-marker association brium in our example, the cases will be overrepresented even if a true causal gene never existed (even if the in terms of the N allele, with the reverse being true in disease was nongenetic). In the example above, the the controls, thus implying a disease-marker association. association is an artifact of the controls having a higher However, such an association provides no indication as proportion of European ancestry than the cases. In to the physical proximity of the true causal locus, which Gorroochurn et al. (24), we gave the necessary and could potentially be f100 Mb away or even on different sufficient condition for such false disease-marker asso- chromosomes. ciations (due to PS) to arise. The LD in Eq. (3) is an example of spurious LD, XK XK XK because the disequilibrium created is not the result of idiri À idi iri ¼ 0; ð2Þ linkage (and physical proximity), but other population i¼1 i¼1 i¼1 artifacts, in this case gene flow between two genetically distinct subpopulations. Indeed, when any of the con- p ,d r where i i ,and i are, respectively, the relative ditions (random mating, infinite population size, no subpopulation size, the disease prevalence, and the allele selection, no mutation, and no gene flow) necessary for any i K frequency (for allele) in subpopulation , and is the one-locus HWE to hold are violated, two-locus HWE is number of discrete subpopulations. From Eq. (2), consid- also distorted and spurious LD can be created (unless the er for example two subpopulations of the same size. PS LD is due to close physical proximity). would cause the null hypothesis of no disease-marker association to be always false, for any marker allele as Insight 4: When Two or More Genetically Distinct long its frequency is different in the two subpopulations, Subpopulations Are Admixed, Assortative Mating and the disease prevalences are also different. There is no Helps Maintain Spurious LD in the Total Population. necessity for a true causal gene to exist. A typical example is the U.S. population. Early coloni- We can also consider the situation when a true causal zation of North America resulted in the admixture of marker exists. Let M be any marker allele at a locus and European and sub-Saharan African populations that had N be an unassociated disease allele at a different locus in allele frequency differences at numerous loci. Along with admixture between these two subpopulations for centu- ries, there also exists strong assortative mating based on skin color (12). A key observation here is that, in spite of thelonghistoryofadmixturebetweenthesetwo subpopulations, significant allele frequency differences are maintained at several loci both for loci influencing skin color and for other loci having no effect on skin color. This is because when genetically distinct populations are combined, assortative mating ensures that initial allelic differences are maintained for genes influencing A the genetic trait. Suppose allele at locus L1 influences a trait on which assortative mating occurs and suppose B A allele at locus L2 has no such influence. If alleles and B each initially had different frequencies in the African and European subpopulations, Eq. (3) predicts that LD is created between A and B. Because of this LD, assortative mating on the trait influenced by A causes the differences Figure 1. HD locus (total DNA stretch is 2,500 kb). Adapted in B to be also maintained, although B has no effect on from Krawczak and Schmidtke (47). the trait whatsoever. For example, the African and

Cancer Epidemiol Biomarkers Prev 2008;17(12). December 2008

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research. Cancer Epidemiology,Biomarkers & Prevention 3295

are functions of r and vanish when r = 0. These metrics do not explicitly take into account the various - ary forces that shape the current LD structure. Instead, they are functions of only the current haplotype and allele frequencies. Because they ignore population history, these LD metrics are unable to capture the extensive variations of the actual LD across the human genome, making them prone to the very problems we have outlined in Insights from Population Genetics. The following concluding remarks from Lewontin (26) are quite sobering: ‘‘...we may be able to find a measure of association that is preserved under particular conditions, but the search for a ‘pure’ measure of gametic disequili- brium is doomed to failure.’’ However, Collins, Morton, and others (27-34) have recently put forward a LD metric (known as the association probability, qx) based on population genetics theory, more specifically on Male´cot’s model for isolation by distance (35, 36): Top, Figure 3. LDU map for a 216-kb segment of class II of  ¼ð1 À LÞMeÀx þ L ð4Þ MHC. Bottom, corresponding recombination hotspots. From x Jeffreys et al. (9). In the above, the variable L is spurious LD (any LD that is not due to linkage), e (>0) is a constant for each European populations are characterized by differences at interval between two loci, x is the physical distance several blood group loci, although there is no tendency between the two loci, and M is the association probability for nonrandom mating to occur based on blood group. at x = 0. Morton et al. (28) have shown that qx is the most The differences are due to initial differences being efficient for modeling LD as a function of distance maintained because of LD between blood group genes compared with other commonly used metrics. The and genes for skin color. The LD created between alleles Male´cot’s model has a major advantage in that it A and B above has little chance of being broken due to incorporates the various evolutionary forces that shape assortative mating. This explains the potential for the LD structure in a particular population. More extensive amounts of LD to be maintained in such a specifically, the variable M depends on whether the population. susceptibility gene has a monophyletic or polyphyletic The interaction of admixture with assortative mating origin; it is equal to 1 if there is a unique susceptibility adds further wrinkles to LD fine-mapping. The sustained gene (as in Mendelian diseases) and it is <1 otherwise allele frequency differences at several loci, in spite of (as in complex diseases). The variable e is a function of generations of admixture, implies that a case-control the number of generations for the equilibrium probabil- association study done from the total population could ity in Eq. (4) to be achieved and of various evolutionary be considerably prone to PS, thus leading to many false- forces such as mutation, selection, and recombination positives. Moreover, the extensive LD created because of (27). By using composite likelihoods (37), the Male´cot’s allele frequency differences is again a case of spurious model is fitted to each marker of interest in a particular LD. We also note that, even for homogenous populations, DNA region. Each marker has its own e estimate, but Redden and Allison (25) have shown that assortative there are single values of L and M for the given region mating can lead to spurious associations. (38). A LD map is then built with additive distances in terms of LD units (LDU). One LDU is defined as one LD Mapping Using Population Genetics Theory: swept radius and corresponds to ex = 1; it is the physical The Male´cot’s Model distance over which ‘‘useful LD’’ [the first term in the right in Eq. (4)] decays to 1/e  37% of its starting value. Various evolutionary forces can create strong LD Figure 3 gives an example of a LDU map for a 216-kb between alleles at loci that can be physically far away segment of class II region of MHC, with corresponding from each other, and the use of such LD for fine gene hot spots of recombination (9). Because they are defined mapping can be misleading. Selecting a candidate in terms of ex, the LDUs are negatively correlated with marker that is highly associated with a disease usually LD and positively correlated with recombination. More implies that the disease marker is in strong LD with the technical details on the construction of LD maps can be candidate marker, but this fact in itself gives no found in Maniatis et al. (39) and a discussion of optimal indication as to the physical whereabouts of the disease statistical properties of qx can be found in Shete (40). locus and even as to the latter’s very existence! Eq. (4) indicates that the LD between markers at two The problem arises because of our conceptualization of loci is decomposed into two components: the ‘‘true’’ LD the association (or LD) between alleles at two loci, that is, (due to linkage) and the spurious LD (due to population as the correlation r between the alleles or a function of artifacts). However, only the ‘‘true’’ LD contributes to the the correlation. Almost all pairwise metrics (such as D, LD map. Thus, two markers physically far apart could be 2 D¶, and r ) that are conventionally used to measure LD in high LD but only a few LDUs from each other if the

Cancer Epidemiol Biomarkers Prev 2008;17(12). December 2008

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research. 3296 Use of Linkage Disequilibrium for Fine Gene Mapping

association was due to population artifacts. If we were to 3. Hirschhorn JN, Lohmueller K, Byrne E, Hirschhorn K. A compre- D D¶ r2 hensive review of studies. Genet Med 2002;4: rely on (or , , etc.), these would be reasonably large 45 – 61. and wrongly suggest that the markers were physically 4. Ott J. Association of genetic loci: replication or not, that is the close; however, the low value for LDU would correctly question. Neurology 2004;63:955 – 8. indicate that the two markers were indeed physically far 5. Cardon LR, Bell JI. Association study designs for complex diseases. Nat Rev Genet 2001;2:91 – 9. apart in spite of being strongly associated! This gives an 6. Gorroochurn P, Hodge SE, Heiman GA, Durner M, Greenberg DA. example of how the use of LD maps can directly address Non-replication of association studies: ‘‘pseudo-failures’’ to repli- the problems associated with the conventional methods cate? Genet Med 2007;9:325 – 31. used for LD fine-mapping as explained in Insights from 7. KruglyakL. Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nat Genet 1999;22:139 – 44. Population Genetics. 8. Pritchard JK, Przeworski M. Linkage disequilibrium in humans: LD maps are thus better able to physically locate genes models and data. Am J Hum Genet 2001;69:1 – 14. of interest and thus less prone to most of the problems 9. Jeffreys AJ, Kauppi L, Neumann R. Intensely punctate meiotic mentioned in Insights from Population Genetics. The recombination in the class II region of the major histocompatibility complex. Nat Genet 2001;29:217 – 22. LD maps built by Maniatis et al. (33) and others (e.g., 10. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES. High- refs. 41-43) support this statement. For example, by using resolution haplotype structure in the human genome. Nat Genet standard LD mapping techniques, Hosking et al. (44) 2001;29:229 – 32. located the CYP2D6 gene (which is associated with poor 11. Petes TD. Meiotic recombination hot spots and cold spots. Nat Rev Genet 2001;2:360 – 9. drug-metabolizing activity) to within a DNA interval of 12. Templeton AR. Population genetics and microevolutionary theory. 390 kb. Using LDU locations, Maniatis et al. (33) were John Wiley & Sons; 2006. able to predict the location of the gene only 14.9 kb from 13. Strachan T, Read A. Human molecular genetics. KY: Garland its true location, surrounded by a 95% confidence Science/Taylor & Francis Group; 2004. 14. Jobling MA, Hurles ME, Tyler-Smith C. Human evolutionary interval of 172 kb. LD maps can also be constructed on genetics. New York: Garland Science; 2003. genome-wide scales for GWA studies through selected 15. Vieland VJ. The replication requirement. Nat Genet 2001;29:244 – 5. map assembly from DNA segments (39). Kuo et al. point 16. Ioannidis JP, Ntzani EE, Trikalinos TA, Contopoulos-Ioannidis DG. out that LD maps are achievable even at the highest Replication validity of genetic association studies. Nat Genet 2001;29: 306–9. marker densities, including the HapMap data with 17. Palmer LJ, Cardon LR. Shaking the tree: mapping complex disease >3 million single nucleotide polymorphisms. genes with linkage disequilibrium. Lancet 2005;366:1223 – 34. 18. Knowler WC, Williams RC, Pettitt DJ, Steinberg AG. Gm3;5,13,14 and type 2 diabetes mellitus: an association in American Indians with Conclusion genetic admixture. Am J Hum Genet 1988;43:520 – 6. 19. Gorroochurn P, Hodge SE, Heiman GA, Greenberg DA. A unified Understanding the major limitations of the current use of approach for quantifying, testing and correcting population strati- fication in case-control association studies. Hum Hered 2007;64: LD for fine-mapping is essential lest the technique will be 149 – 59. regarded with skepticism and distrust by many. More- 20. Devlin B, Roeder K. Genomic control for association studies. over, recent research on metrics based on population Biometrics 1999;55:997 – 1004. genetics theory represents a positive step toward 21. Pritchard JK, Stephens M, Rosenberg NA, Donnelly P. Association mapping in structured populations. Am J Hum Genet 2000;67: alleviating some of the current disenchantment with 170 – 81. association studies. In spite of the advantages offered by 22. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, ShadickNA, the Male´cot’s model and LD maps, the more popular Reich D. Principal components analysis corrects for stratification in strategy in GWA studies has been simply the selection of genome-wide association studies. Nat Genet 2006;38:904 – 9. 23. Gorroochurn P, Heiman GA, Hodge SE, Greenberg DA. Centralizing the statistically most significant single nucleotide poly- the non-central chi-square: a new method to correct for population morphisms (45, 46). However, selecting significant single stratification in genetic case-control association studies. Genet nucleotide polymorphisms and then trying to find the Epidemiol 2006;30:277 – 89. causal gene in a neighboring region based on strong LD 24. Gorroochurn P, Hodge SE, Heiman G, Greenberg DA. Effect of population stratification on case-control association studies. II. False- will likely lead to many mapping problems in the future. positive rates and their limiting behavior as number of subpopula- tions increases. Hum Hered 2004;58:40 – 8. 25. Redden DT, Allison DB. The effect of assortative mating upon genetic Disclosure of Potential Conflicts of Interest association studies: spurious associations and population substruc- ture in the absence of admixture. Behav Genet 2006;36:678 – 86. No potential conflicts of interest were disclosed. 26. Lewontin RC. On measures of gametic disequilibrium. Genetics 1988; 120:849 – 52. 27. Collins A, Morton NE. Mapping a disease locus by allelic association. Acknowledgments Proc Natl Acad Sci U S A 1998;95:1741 – 5. The costs of publication of this article were defrayed in part by 28. Morton NE, Zhang W, Taillon-Miller P, Ennis S, KwokPY, Collins A. the payment of page charges. This article must therefore be The optimal measure of allelic association. Proc Natl Acad Sci U S A hereby marked advertisement in accordance with 18 U.S.C. 2001;98:5217 – 21. Section 1734 solely to indicate this fact. 29. Lonjou C, Collins A, Morton NE. Allelic association between marker loci. Proc Natl Acad Sci U S A 1999;96:1621 – 6. We thankDrs. S.E. Hodge, G.A. Heiman, R. Fan, M. Durner, 30. Zhang W, Collins A, Maniatis N, Tapper W, Morton NE. Properties and W.J. Ewens for helpful comments and the two anonymous of linkage disequilibrium (LD) maps. Proc Natl Acad Sci U S A 2002; reviewers for useful suggestions. 99:17004 – 7. 31. Maniatis N, Collins A, Gibson J, Zhang W, Tapper W, Morton NE. Positional cloning by linkage disequilibrium. Am J Hum Genet 2004; 74:846 – 55. References 32. Collins A, Lau W, De La Vega F. Mapping genes for common 1. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights diseases: the case for genetic (LD) maps. Hum Hered 2004;58:2 – 9. into the genetics of common disease. J Clin Invest 2008;118:1590 – 605. 33. Maniatis N, Morton NE, Gibson J, Xu CF, Hosking LK, Collins A. 2. Clarke GM, Carter KW, Palmer LJ, Morris AP, Cardon LR. Fine The optimal measure of linkage disequilibrium reduces error in mapping versus replication in whole-genome association studies. association mapping of affection status. Hum Mol Genet 2005;14: Am J Hum Genet 2007;81:995 – 1005. 145 – 53.

Cancer Epidemiol Biomarkers Prev 2008;17(12). December 2008

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research. Cancer Epidemiology,Biomarkers & Prevention 3297

34. Morton NE. Linkage disequilibrium maps and association mapping. 42. Tapper W, Collins A, Gibson J, Maniatis N, Ennis S, Morton NE. A J Clin Invest 2005;115:1425 – 30. map of the human genome in linkage disequilibrium units. Proc Natl 35. Male´cot G. Les Mathe´matiques de l’He´re´dite´. Paris: Masson et Cie; Acad Sci U S A 2005;102:11835 – 9. 1948. 43. De La Vega F, Isaac H, Collins A, et al. The linkage disequilibrium 36. Male´cot G. The mathematics of heredity. San Francisco: maps of three human chromosomes across four populations reflect Freeman; 1969. their demographic history and a common underlying recombination 37. Devlin B, Risch N, Roeder K. Disequilibrium mapping: composite pattern. Genome Res 2005;15:454 – 62. likelihood for pairwise disequilibrium. Genomics 1996;36:1 – 16. 44. Hosking LK, Boyd PR, Xu CF, et al. Linkage disequilibrium 38. Tapper W. Linkage disequilibrium maps and location databases. mapping identifies a 390 kb region associated with CYP2D6 In: Collins AR(ed). Linkage Disequilibrium and Association Map- poor drug metabolising activity. Pharmacogenomics J 2002;2: ping, pp 23 – 45. New Jersey: Humana Press; 2007. 165 – 75. 39. Kuo T-Y, Lau W, Collins AR. LDMAP. In: Collins AR(ed). Linkage 45. Klein RJ, Zeiss C, Chew EY, et al. Complement factor H poly- Disequilibrium and Association Mapping, pp 47 – 57. New Jersey: morphism in age-related macular degeneration. Science 2005;308: Humana Press; 2007. 385 – 9. 40. Shete S. A note on the optimal measure of allelic association. Ann 46. The Wellcome Trust Case Control Consortium. Genome-wide Hum Genet 2003;67:189 – 91. association study of 14,000 cases of seven common diseases and 41. Tapper WJ, Maniatis N, Morton NE, Collins A. A metric linkage 3,000 shared controls. Nature 2007;447:661 – 78. disequilibrium map of a human chromosome. Ann Hum Genet 2003; 47. KrawczakM, SchmidtkeJ. DNA fingerprinting. Oxford: BIOS 67:487 – 94. Scientific Publishers; 1998.

Cancer Epidemiol Biomarkers Prev 2008;17(12). December 2008

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research. Perils in the Use of Linkage Disequilibrium for Fine Gene Mapping: Simple Insights from Population Genetics

Prakash Gorroochurn

Cancer Epidemiol Biomarkers Prev 2008;17:3292-3297.

Updated version Access the most recent version of this article at: http://cebp.aacrjournals.org/content/17/12/3292

Cited articles This article cites 39 articles, 9 of which you can access for free at: http://cebp.aacrjournals.org/content/17/12/3292.full#ref-list-1

E-mail alerts Sign up to receive free email-alerts related to this article or journal.

Reprints and To order reprints of this article or to subscribe to the journal, contact the AACR Publications Subscriptions Department at [email protected].

Permissions To request permission to re-use all or part of this article, use this link http://cebp.aacrjournals.org/content/17/12/3292. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC) Rightslink site.

Downloaded from cebp.aacrjournals.org on September 28, 2021. © 2008 American Association for Cancer Research.