
Copyright 2004 by the Genetics Society of America
DOI: 10.1534/genetics.103.021584

Application of Coalescent Methods to Reveal Fine-Scale Rate Variation and Recombination Hotspots

Paul Fearnhead,* Rosalind M. Harding,†,1 Julie A. Schneider,†,2 Simon Myers‡ and Peter Donnelly‡,3

*Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, United Kingdom, †Medical Research Council Molecular Hematology Unit, Institute of Molecular Medicine, University of Oxford, Oxford OX3 9DS, United Kingdom and ‡Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom

Manuscript received August 22, 2003
Accepted for publication April 29, 2004

ABSTRACT

There has been considerable recent interest in understanding the way in which recombination rates vary over small physical distances, and the extent of recombination hotspots, in various genomes. Here we adapt, apply, and assess the power of recently developed coalescent-based approaches to estimating recombination rates from sequence polymorphism data. We apply full-likelihood estimation to study rate variation in and around a well-characterized recombination hotspot in humans, in the β-globin gene cluster, and show that it provides similar estimates, consistent with those from sperm studies, from two populations deliberately chosen to have different demographic and selectional histories. We also demonstrate how approximate-likelihood methods can be used to detect local recombination hotspots from genomic-scale SNP data. In a simulation study based on 80 100-kb regions, these methods detect 43 out of 60 hotspots (ranging from 1 to 2 kb in size), with only two false positives out of 2000 subregions that were tested for the presence of a hotspot. Our study suggests that new computational tools for sophisticated analysis of population diversity data are valuable for hotspot detection and fine-scale mapping of local recombination rates.
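The simulation counts quoted in the abstract can be turned into proportions with a few lines of arithmetic. The sketch below is only illustrative: the variable names are assumptions, not notation from the paper. Note also that the abstract does not say how many of the 2000 tested subregions were free of hotspots, so the second ratio is false positives per tested subregion and not a formal false-positive rate.

```python
# Counts taken from the abstract's simulation study.
# Variable names are hypothetical and chosen for illustration only.
hotspots_simulated = 60
hotspots_detected = 43
subregions_tested = 2000
false_positives = 2

# Fraction of simulated hotspots that the method detected.
detection_rate = hotspots_detected / hotspots_simulated

# False positives per tested subregion (not a strict specificity,
# because some tested subregions may themselves contain hotspots).
false_positive_fraction = false_positives / subregions_tested

print(f"detection rate: {detection_rate:.3f}")
print(f"false positives per tested subregion: {false_positive_fraction:.4f}")
```

On these counts, roughly 72% of the simulated hotspots were detected, at a cost of about one false positive per 1000 subregions tested.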
RECOMBINATION plays a central role in shaping patterns of molecular genetic diversity in natural populations. There is growing evidence of extensive variation in recombination rates over scales as small as kilobases, although neither the mechanism nor the pattern of this variation is well understood (Daly et al. 2001; Jeffreys et al. 2001; Petes 2001; Nachman 2002; Schneider et al. 2002). In humans, the traditional approach to estimating recombination rates has been through pedigree studies, but these have resolution only over scales of centimorgans (or megabases in physical distance at the human genome-wide average recombination rate). Even in organisms where breeding experiments are more straightforward, the landscape of fine-scale variation in the recombination rate remains relatively uncharted.

Recently, laboratory-based studies have mapped recombination hotspots within several human loci from analysis of crossovers detected in sperm, for example, to intervals of 1–2 kb within the major histocompatibility (MHC) class II region (Jeffreys et al. 2001; Jeffreys and Neumann 2002). However, achieving high resolution using laboratory-based methods is technically difficult and costly, and the recombination rate estimates are specific to males. As there are substantial differences in sex-specific recombination rates at the centimorgan scale (Kong et al. 2002), we should not expect fine-scale recombination rate estimates from sperm studies to be fully informative of the evolutionary process of recombination, which averages across males and females, and over long spans of time.

Another source of information about local recombination rates is population genetics data. These data do not simply enumerate a direct count of the recombination events, but have the advantage of reflecting the evolutionary process. While qualitative information about underlying recombination rates can sometimes be obtained from standard pairwise measures of linkage disequilibrium (LD), quantitative estimation of recombination rates from polymorphism data is a challenging statistical problem that has only recently begun to attract considerable attention (e.g., Griffiths and Marjoram 1996a; Kuhner et al. 2000; Wall 2000; Fearnhead and Donnelly 2001, 2002; Hudson 2001; McVean et al. 2002). At one end of a spectrum of statistical approaches are simpler methods, which ignore much of the information in the sample and base inference on one or a few summary statistics (Wall 2000). At the other extreme, so-called full-likelihood methods utilize all of the information in the data, but at the expense of being highly computationally intensive (Griffiths and Marjoram 1996a; Kuhner et al. 2000; Fearnhead and Donnelly 2001). An intermediate class of computational methods approximates the true likelihood in various ways (Hudson 2001; Fearnhead and Donnelly 2002; McVean et al. 2002; Li and Stephens 2003). To date, virtually all statistical methods have assumed that the recombination rate is constant over the interval being studied.

With the advent of surveys of molecular genetic variation on genomic scales, particularly but not exclusively in humans, computational population-based methods should enable fine-scale recombination rate variation to be characterized across the human and other genomes, providing information that may be crucial not only for disentangling the molecular, demographic, and selective factors generating linkage disequilibrium, but also for association mapping of complex diseases (Kruglyak 1999; Jorde 2000; Ott 2000; Pritchard and Przeworski 2001; Reich et al. 2001).

Here we adapt, apply, and assess two methods to study different aspects of fine-scale variation in recombination rates from population data. We first apply full-likelihood estimation (Fearnhead and Donnelly 2001) to study fine-scale rate variation around and inside one of the most well-characterized recombination hotspots in the human genome. The β-globin gene complex on human chromosome 11p (Figure 1) includes a recombination hotspot that was originally identified by Chakravarti et al. (1984). In that study, a pattern of random association between 5′ and 3′ haplotypes for the β-globin complex, based on restriction fragment length polymorphisms (RFLPs), was replicated in four populations, suggesting a recombination hotspot 5′ to the β-globin gene within a 9.1-kb interval marked by a TaqI RFLP 5′ to the δ-globin gene (Figure 1). Chakravarti et al. (1984) estimated recombination rates to be elevated 3–30 times above the genome-wide average. Recent single-sperm typing of ~5000 informative meioses gave an estimate for the male recombination fraction across the hotspot of 80 × 10⁻⁵ kb⁻¹ [95% confidence interval (C.I.): 9 × 10⁻⁵–160 × 10⁻⁵] and detected no recombination events over the adjacent 5′ 90-kb region (Schneider et al. 2002). (The human genome-wide average recombination [...] × 10⁻⁵ kb⁻¹.) There have also been [...] recombination hotspot to a 2-kb region just 5′ of the β-globin gene (Wall et al. 2003).

The availability of molecularly phased population data from a worldwide sample of populations, and of direct estimates of recombination rates across the hotspot from sperm studies, is ideal for examining both the accuracy of full-likelihood estimation and its sensitivity to different population demographic histories. We analyze samples from two populations, The Gambia and the United Kingdom, deliberately chosen on the basis of their different demographic histories and selective pressures. Another valuable feature of these data for the present analyses is that there are no additional uncertainties associated with allelic (haplotype) inference by statistical methods.

There is growing evidence that recombination hotspots may be widespread across the human genome, although to date relatively few have been characterized. It is thus of considerable interest to be able to detect hotspots (defined here as small regions where the recombination rate is increased considerably relative to the local background rate) on genomic scales, and such information would also be invaluable for the design and analysis of disease studies. Again, one potential source of information is population data, with the increasing availability of population surveys of genetic diversity over genomic scales. Because of the computational burden involved, it does not seem practicable to adapt full-likelihood methods to this problem. Here we present and study a new method for detecting hotspots on the basis of an approximate-likelihood approach (Fearnhead and Donnelly 2002), which can closely mimic full-likelihood inference. Informally, the approach is to separately analyze small subregions of the genome, and then to detect hotspots by comparing the likelihood curves for each subregion against that for an underlying background rate.

1 Present address: Department of Statistics, University of Oxford, Oxford OX1 3TG, United Kingdom.
2 Present address: National Cancer Institute, 31 Center Dr., 31/10A52 Bethesda, MD 20892-2590.
3 Corresponding author: Department of Statistics, University of Oxford, 1 S. Parks Rd., Oxford OX1 3TG, United Kingdom. E-mail: [email protected]

Genetics 167: 2067–2081 (August 2004)

MATERIALS AND METHODS

β-Globin DNA sequences: The haplotype sequences were reported in Harding et al. (1997) and consist of data from 31 chromosomes from The Gambia and 4b chromosomes from Oxfordshire, United Kingdom. The chromosomes were sequenced in a 3-kb region that encompasses the β-globin gene (see Figure 1). Haplotypes were determined experimentally. Polymorphisms