Segmental Duplications 73 5 Segmental Duplications
Total Page:16
File Type:pdf, Size:1020Kb
Chapter 5 / Segmental Duplications 73 5 Segmental Duplications Andrew J. Sharp, PhD and Evan E. Eichler, PhD CONTENTS INTRODUCTION FEATURES OF SEGMENTAL DUPLICATIONS DISTRIBUTION OF SEGMENTAL DUPLICATIONS SEGMENTAL DUPLICATIONS AND EVOLUTION SEGMENTAL DUPLICATIONS AND GENOMIC VARIATION SUMMARY REFERENCES INTRODUCTION Until recent times, the identification and characterization of segmental duplications has often been based purely on anecdotal reports. However, the completion of the Human Genome Project has now made possible the systematic analysis of the extent and distribution of dupli- cated sequences in humans. Both in situ hybridization and in silico approaches have shown that approx 5% of our genome is composed of highly homologous duplicated sequence (1,2), with enrichments of six- to sevenfold in pericentromeric (3), and two- to threefold in subtelomeric regions (4), respectively. Not only does the presence of these paralogous segments represent a significant challenge to the correct assembly of the human genome, but there is also an increasing awareness of their role in human evolution, variation, and disease. We present a review of segmental duplications in the mammalian genome. We describe their basic characteristics, distribution, and dynamic nature during recent evolutionary history. Based on these features, we discuss models to account for the proliferation of these sequences in the mammalian lineage, and also their contribution towards karyotypic evolution and phe- notypic differences between primates. Finally, we highlight the role of segmental duplications as mediators of human variation at the genomic level. FEATURES OF SEGMENTAL DUPLICATIONS Human segmental duplications (also termed low-copy repeats) are blocks of DNA ranging from 1 to 200 kb in length that occur at more than one site within the genome and share a high level (often >90%) of sequence identity. They can include both genic sequences and high-copy repeats, such as long interspersed elements and short interspersed elements, and, unlike tan- dem duplications, are interspersed throughout the genome. This unique distribution pattern From: Genomic Disorders: The Genomic Basis of Disease Edited by: J. R. Lupski and P. Stankiewicz © Humana Press, Totowa, NJ 73 74 Part II / Genomic Structure suggests that segmental duplications arise through a process termed duplicative transposition, whereby whole blocks of sequence are first duplicated and then transposed from one genomic location to another. Although they have been identified on every human chromosome, their distribution is nonuniform with some chromosomes and chromosomal regions showing pecu- liar enrichment. Segmental duplications tend to cluster within pericentromeric and, to a lesser extent, subtelomeric regions. Many human pericentromeric regions in particular contain high concentrations of duplicated sequence, often arranged in large blocks composed of complex modular structures of individual duplications (3). Segmental duplications may be further classified based on their chromosomal distribution. Although some blocks of sequence may be duplicated to multiple locations within a single chromosome (termed intrachromosomal duplication), others may be located on non-homologous chromosomes (inter- or transchromosomal duplication). Intriguingly, the distribution of these two types of duplication appears to be largely exclusive of one another (2), suggesting mecha- nistic differences in their mode of propagation. Intrachromosomal Duplications Initial identification of chromosome-specific duplications often came about through the study of common microdeletion/micoduplication syndromes, such as Prader-Willi/Angelman syndromes, Williams-Beuren syndrome, Smith-Magenis syndrome, Charcot-Marie-Tooth disease type 1A, and DiGeorge/Velocardiofacial syndrome (5). As more of these syndromes were analyzed, it became apparent that the presence of large and highly homologous segmental duplications flanking these sites of rearrangement was a recurring theme. Available evidence suggests that homology between these duplicated sequences acts as a substrate for unequal meiotic recombination, leading to the deletion, duplication, or inversion of the intervening sequence (Fig. 1). Review of known genomic disorders caused by chromosome-specific duplications shows that these usually involve duplications that are more than 95% similar and 10–500 kb in length, separated by 50 kb to 4 Mb of DNA (5). Although intrachromosomal duplications are found throughout the euchromatic portions of virtually every human chromosome, some chromosomes, such as 1, 9, 16, 17, and 22, show particularly high concentrations within their most proximal regions (2). In these cases the density of duplication can be such that little or no unique sequence occurs across relatively large stretches (300 kb to 4 Mb) of DNA. The organization of these regions can be complex, with large duplication blocks often composed of smaller modules, which have been derived from different genomic locations. This feature has often led to difficulties in characterizing these regions, necessitating the development of specialized methods to allow these regions to be successfully mapped and sequenced. Many intrachromosomal duplications also share very high levels of sequence identity, with the majority having more than 95% identity between paralogous copies. In extreme cases, this level of nucleotide identity approaches the frequency of allelic variation found in the genome as a whole (approx 1 base per kilobase) (6). This property further confounds the mapping and identification of individual duplicated segments within the genome, and may also represent a significant impediment in the ability to distinguish true allelic polymorphism from paralogous genomic copies (7–9). Indeed, both gene and single nucleotide polymorphism annotation show significant improvements in accuracy when duplicated sequences are correctly defined within a genome assembly (10). Thus, the correct definition of segmental duplications is an important aspect of achieving high-quality genomic sequence (11). Chapter 5 / Segmental Duplications 75 Fig. 1. Segmental duplications mediate structural rearrangement. Misalignment of paralogous blocks of sequence during meiosis leads to unequal recombination and the deletion or duplication of the interven- ing sequence. These events may create structural polymorphisms or, if the genes (A,B,C) flanked by the duplications are dosage sensitive, genomic disease. ᭺, centromere; TEL, telomere. (Reproduced with permission from ref. 95.) Based on the neutral mutation rate in primates of approx 1.5 × 10–9 substitutions per site per year (12), the high levels of homology observed between many chromosome-specific dupli- cations suggests they have emerged only recently during evolutionary history. Indeed, com- parative analysis of different primate lineages has demonstrated that some are species-specific, confirming their dynamic nature (13). In other cases, analysis of the levels of sequence diver- gence between pairs of duplication in different primates, such as those flanking the common Williams-Beuren syndrome deletion region in 7q11.23 and the large palindromic repeats on long arm of the Y chromosome, has provided evidence that they may also act as substrates for gene conversion (14–16). Although fluorescence in situ hybridization analysis shows the presence of these duplications in multiple primate species, thus, suggesting they originated before the separation of these lin- eages, estimates of their evolutionary age based on the rate of nucleotide divergence because of random mutation suggest a much more recent origin (17,18). One explanation for this apparent contradiction is that the homology between pairs of repeats acts as a substrate for gene conversion events, thus, homogenizing their sequence and maintaining unexpectedly high levels of identity. Although less likely, an alternative explanation is that some segments of DNA have duplicated to the same genomic location independently in different primate lineages. 76 Part II / Genomic Structure Interchromosomal Duplications The most striking property of duplications that map to nonhomologous chromosomes is their propensity to accumulate adjacent to certain regions of the genome, particularly in pericentromeric and subtelomeric regions and in the short arms of the acrocentric chromo- somes. In silico analysis has shown a six- to sevenfold enrichment of duplicated segments within 100 kb and 1 Mb of telomeres and centromeres, respectively (3), and more than one- third of all interchromosomal duplications have a pericentromeric localization. Indeed, inter- chromosomal duplication has been one of the major forces involved in remodeling these chromosomal regions in recent primate evolution (3). As with chromosome-specific duplications, many of which have been implicated as media- tors of recurrent microdeletion/microduplication syndromes, certain interchromosomal dupli- cations have also been associated with recurrent chromosomal rearrangement (19–21), albeit at much lower frequencies. In addition, certain pericentromeric and subtelomeric regions have been found to exhibit polymorphic variation within the human population, indicating that the process of interchromosomal duplication is ongoing (22,23). DISTRIBUTION OF SEGMENTAL DUPLICATIONS One of the most remarkable features of segmental duplications is their tendency to often cluster in close proximity to one another, particularly within peri- and subtelomeric regions