24 Recombination Hotspots in Nonallelic Homologous Recombination
Chapter 24 / NAHR Hotspots

Matthew E. Hurles, PhD and James R. Lupski, MD, PhD

CONTENTS
    INTRODUCTION
    IDENTIFICATION OF RECOMBINATION HOTSPOTS
    FEATURES OF AHR AND NAHR HOTSPOTS
    MECHANISTIC BASIS OF RECOMBINATION HOTSPOTS
    EVOLUTIONARY ORIGINS OF HOTSPOTS
    CONCLUSIONS AND FUTURE WORK
    SUMMARY
    REFERENCES

INTRODUCTION

Rearrangement breakpoints resulting from nonallelic homologous recombination (NAHR) are typically clustered within small, well-defined portions of the segmental duplications that promote the rearrangement. These NAHR “hotspots” have been identified in every NAHR-promoted rearrangement in which breakpoint junctions have been sequenced in sufficient numbers. Recombinatorial activity in NAHR hotspots is enhanced from 3- to 237-fold relative to the surrounding “cold” duplicated sequence. NAHR hotspots share many features with allelic homologous recombination (AHR) hotspots. Both AHR and NAHR hotspots appear to be relatively small (<2 kb) and to be initiated by double-strand breaks. Gene conversion events as well as crossovers are enhanced at NAHR hotspots. Recent work has improved our understanding of the origins of NAHR and AHR hotspots, with both appearing to be relatively short-lived phenomena. Our present understanding of NAHR hotspots comes from a limited number of locus-specific studies. In the future, we can expect genome-wide analyses to provide many further insights.

During meiosis, AHR occurs between homologous chromosomes, generating haplotypic diversity in the succeeding generation. The distribution of these recombination events throughout the genome has been known to be nonrandom for decades, but in the past 5 years ever-finer spatial resolutions have revealed dramatic heterogeneity in AHR rates at the DNA sequence level.
These studies have demonstrated the existence of recombination hotspots, where AHR rates can be several orders of magnitude higher than in surrounding “cold” regions. In parallel to these developments, sufficient numbers of breakpoints of selected NAHR rearrangements have been characterized at the DNA sequence level to resolve the distribution of crossovers in these cases. This has similarly led to the identification of NAHR hotspots within paralogous sequences. This chapter profiles AHR and NAHR hotspots and discusses their similarities with a view to understanding the molecular mechanisms underpinning pathogenic NAHR.

From: Genomic Disorders: The Genomic Basis of Disease. Edited by: J. R. Lupski and P. Stankiewicz © Humana Press, Totowa, NJ
Part V / Functional Aspects of Genome Structure

It is necessary to be clear when defining what constitutes a recombination hotspot. In the most general sense, a hotspot is a region of elevated activity relative to its surroundings. However, activity in the “surroundings” can be quantified in a number of ways. We consider a recombination hotspot to be an interval of DNA, defined at the sequence level, which manifests elevated recombinatorial activity relative to its immediate flanking sequences. In principle, a recombination hotspot could alternatively be defined as exhibiting elevated recombinatorial activity relative to the genome average. It is worth noting that the DNA1 hotspot identified experimentally in the major histocompatibility complex (MHC) region (1) using the former criterion actually exhibits lower recombinatorial activity than the genome average. Likewise, crossover in the NAHR hotspot in the Charcot-Marie-Tooth disease type 1A (CMT1A)-REP (2) does not appear to be significantly more frequent than the average genome-wide recombination rate and has been referred to as a positional specificity for strand exchange (3).
Nonetheless, studies of both regions showed that not all homologous sequences are equal and that a “punctate” pattern of crossovers is evident for both AHR and NAHR. In addition, homologous recombination is a process that can result in one of two outcomes: a crossover event or a gene conversion. Consequently, we consider that a recombination hotspot may become apparent because of elevated levels of either outcome.

The Importance of Recombination Hotspots

There are a handful of pathogenic mutation processes that operate in the human genome (e.g., base substitution, replication slippage, NAHR). For some of these mutational mechanisms we have a good understanding of the rates and locations at which the mutations arise (e.g., base substitution and replication slippage), whereas for NAHR we do not. Examining the distribution of recombination events at the sequence level provides perhaps one of the most important clues for understanding the basis of pathogenic NAHR. In a more general sense, the fine-mapping of all homologous recombination (HR) processes can only help our admittedly basic comprehension of what is a fundamental cellular process.

In recent years there has been a growing desire to use patterns of linkage disequilibrium (LD) throughout the genome to design more efficient association studies for the detection of genes predisposing to complex diseases (4). These patterns of LD have been investigated at high resolution at several sites within the genome, and a common finding is that the physical distance over which LD persists in the genome is highly variable (5). This variability leads to a “block-like” haplotypic structure in which regions of high LD are separated by shorter sequences across which LD is minimal (6). One of the main causal mechanisms by which this block-like structure might arise is the existence of extreme AHR rate heterogeneity.
The idea is that recombination rates are low or absent within haplotype blocks but that hotspots of recombinatorial activity map between blocks. Other possible causal mechanisms of these haplotypic structures exist; however, these alternative processes are dependent on population-specific demographic factors. Given the immense effort it is taking to map haplotypic structures in four human populations in the HapMap project (7), it is important to know to what degree it is possible to predict haplotypic structures in other populations from the data on these four. If the distribution of LD is determined mainly by population demography, then haplotype structures are likely to differ markedly between populations. If, however, recombination hotspots, a molecular rather than population-based process, account for patterns of LD and these hotspots are shared between populations, then haplotype structures are much more likely to be shared between populations.

In addition, accurate estimates of recombination rates are required when drawing evolutionary inferences from autosomal variation, for example, the estimation of the very recent age of the HIV-resistant CCR5-Δ32 mutation (8). The apparent recent origin of this variant has led to further work examining the possible role of the CCR5-Δ32 mutation in conferring resistance to recent plagues within the past 1000 years (9,10). This dating estimation is almost entirely predicated on an assumption of recombination rate homogeneity across a 46-Mb portion of chromosome 3.

IDENTIFICATION OF RECOMBINATION HOTSPOTS

The most obvious means to examine recombination rate heterogeneity is to identify a sufficient number of individual recombination events within a set of defined physical intervals, such that the frequency of recombination in one interval can be compared with that in another. This can be achieved at a variety of different spatial resolutions.
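
As a rough illustration of the interval comparison just described, the following sketch counts mapped events per interval, normalizes by interval length, and expresses one interval's rate relative to its immediate flanks. It is a minimal sketch under stated assumptions, not a method taken from the studies cited here: the function names, the interval coordinates, and the breakpoint positions are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical data, for illustration only (not taken from any study).
# Intervals are half-open [start, end) coordinates in base pairs.
INTERVALS: List[Tuple[int, int]] = [(0, 2000), (2000, 3000), (3000, 5000)]
BREAKPOINTS: List[int] = [500, 2100, 2200, 2300, 2400, 2500, 2600, 2700, 2800, 4500]


def events_per_kb(breakpoints: List[int],
                  intervals: List[Tuple[int, int]]) -> List[float]:
    """Count events falling in each interval and normalize by length in kb."""
    rates = []
    for start, end in intervals:
        count = sum(1 for pos in breakpoints if start <= pos < end)
        rates.append(count / ((end - start) / 1000.0))
    return rates


def fold_enrichment(rates: List[float], index: int) -> float:
    """Rate of one interval divided by the mean rate of its immediate flanks."""
    flanks = [rates[i] for i in (index - 1, index + 1) if 0 <= i < len(rates)]
    if not flanks:
        raise ValueError("at least one flanking interval is required")
    flank_mean = sum(flanks) / len(flanks)
    if flank_mean == 0:
        return float("inf")  # no events observed in the flanks
    return rates[index] / flank_mean


rates = events_per_kb(BREAKPOINTS, INTERVALS)
print(rates)                      # [0.5, 8.0, 0.5]
print(fold_enrichment(rates, 1))  # 16.0
```

This sketch follows the flank-relative definition of a hotspot. Comparing against a genome-wide average instead would mean replacing the flank mean with that average, which can change whether a given interval qualifies.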
Despite this conceptual similarity, the methods used to identify AHR and NAHR hotspots have traditionally been very different.

Finding NAHR Hotspots

NAHR events are generally ascertained as a result of their deleterious phenotypic outcomes, usually conveyed by gene dosage effects secondary to DNA rearrangements (e.g., deletion or duplication). In this sense they are similar to phenotype-based screens in model organisms. De novo NAHR events are subsequently identified by comparing the genomes of patients and their parents. A collection of de novo NAHR events can then be fine-mapped to identify and ultimately sequence the rearrangement breakpoints. Not all recurrent rearrangements result from NAHR (11), and so the preliminary identification of similarly sized rearrangements must be followed by the mapping of breakpoints to blocks of duplicated sequence.

Most NAHR breakpoints have been mapped to a pair of duplicated sequences (also referred to as low-copy repeats, segmental duplications, or duplicons), but not to a specific interval within these duplicated sequences. Two major complicating factors hinder the fine-mapping of NAHR breakpoints within duplicated sequences: the presence of the duplicated sequences in multiple copies, and the complex sequence variation within duplicated sequences. Somatic cell hybrids that retain only the rearranged copy of a chromosome and not its unrearranged homolog have been found to be extremely useful in disentangling the signals from the multiple copies of the duplicated sequences that have driven the NAHR events. Reducing the copy number of the sequences flanking the breakpoint interval enables the isolation of a junction-specific breakpoint that can be subsequently sequenced. The complex pattern of sequence variation within duplicated sequences means that variants apparent within the reference sequence that distinguish between proximal and distal copies of a duplicated sequence (known