Microsatellites: Simple Sequences with Complex Evolution

Hans Ellegren

Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden. e-mail: [email protected]
Nature Reviews Genetics, Volume 5, June 2004, p. 435. doi:10.1038/nrg1348

Few genetic markers, if any, have found such widespread use as microsatellites, or simple/short tandem repeats. Features such as hypervariability and ubiquitous occurrence explain their usefulness, but these features also pose several questions. For example, why are microsatellites so abundant, why are they so polymorphic and by what mechanism do they mutate? Most importantly, what governs the intricate balance between the frequent genesis and expansion of simple repetitive arrays, and the fact that microsatellite repeats rarely reach appreciable lengths? In other words, how do microsatellites evolve?

Assuming chance association of nucleotides, the probability of finding the sequence CACACACACACACACACACA more than once in the human genome is negligible. However, perfect or near-perfect tandem iterations of short sequence motifs of this kind are extremely common in eukaryotic genomes and, in the case of the human genome, they are found at hundreds of thousands of places along chromosomes¹. This particular genomic feature is not restricted to (CA)n repeats: every possible motif of mono-, di-, tri- and tetranucleotide repeats is vastly overrepresented in the genome.

Ever since their discovery in the early 1980s, the ubiquitous occurrence of microsatellites, also referred to as short tandem repeats (STRs) or simple sequence repeats (SSRs), has puzzled geneticists. Why are they so common? Do they fulfill some function or are they simply junk DNA sequences that should perhaps be viewed as 'selfish DNA'²,³? Addressing these questions is important if we wish to understand how genomes are organized and why most genomes are filled with sequences other than genes.

Microsatellites are among the most variable types of DNA sequence in the genome⁴. In contrast to unique DNA, microsatellite polymorphisms derive mainly from variability in length rather than in the primary sequence. Moreover, genetic variation at many microsatellite loci is characterized by high HETEROZYGOSITY and the presence of multiple alleles, which is in sharp contrast to unique DNA. With the advent of PCR in the late 1980s, the analysis and genotyping of microsatellite polymorphisms became straightforward (see TIMELINE). Microsatellites quickly became the marker of choice in genome mapping, and subsequently also in population genetics studies and related areas.

For a neutral marker, the degree of polymorphism is proportional to the underlying rate of mutation. Given the extensive polymorphism of microsatellites, it follows that mutations must occur frequently, an assumption that is supported by direct observations⁵. The rate and direction of mutations constitute two basic factors in the estimation of genetic distance on the basis of microsatellite data. By applying theoretical models of microsatellite evolution to empirical data, population geneticists attempt to, for example, determine how long ago two populations diverged, or measure the amount of GENE FLOW between populations. However, despite the extensive use of microsatellite markers over the past 15 years, it is clear that many theoretical models fail to accurately explain allele frequency distributions in natural populations. Importantly, it seems that microsatellite evolution is a far more complex process than was previously thought. A deeper understanding of the evolutionary and mutational properties of microsatellites is therefore needed, not only to understand how the genome is organized, but also to correctly interpret and use microsatellite data in population genetics studies.

Recent new information provides clues to the mystery of microsatellite repeats. First, whole-genome sequence data provide an unbiased picture of the occurrence and genomic distribution of repetitive elements. Second, large-scale pedigree analysis in different organisms gives direct insight into the characteristics of de novo mutation events. Third, molecular studies of the DNA replication machinery show what might go wrong during microsatellite replication. Here, I review these new findings and summarize our current knowledge about microsatellite evolution. Emerging from the new data is the picture of a heterogeneous mutation process, showing distinct differences in rates and patterns of mutation among loci and species.

Timeline | The early history of microsatellites

1981: Sequence analysis of alleles at the globin locus found a varying number of short sequence motifs¹²²,¹²³ *.
1982: Identification of a novel repeated element (alternating pyrimidine–purine polymers) in eukaryotic genomes, with Z-DNA-forming potential¹²⁴.
1985: Demonstration of extensive length variability of tandem repetitive DNA as revealed by DNA fingerprinting of minisatellites¹²⁵.
1986: Regions of 'cryptic simplicity' identified as an important source of genetic variation¹²⁶.
1989: Development of PCR-based microsatellite genotyping¹²⁷⁻¹²⁹.
1991: Microsatellites introduced for studies of natural populations⁸⁴,¹³⁰.
1992: Microsatellites used to derive the first detailed genetic map of the human genome¹³¹.
1993: Large numbers of microsatellite mutations identified from pedigree analysis in humans⁵.
1994: Fine-scale analysis of the relationships among human populations made possible by microsatellites¹³².

*The importance of these findings seemed to be insignificant, as laborious cloning and sequencing procedures were required to analyse the polymorphisms¹²²,¹²³.

The genome biology of microsatellites

What is a microsatellite? Genomes are scattered with simple repeats. Tandem repeats occur in the form of iterations of repeat units of almost anything from a single base pair to thousands of base pairs. Mono-, di-, tri- and tetranucleotide repeats are the main types of microsatellite, but repeats of five (penta-) or six (hexa-) nucleotides are usually classified as microsatellites as well. Repeats of longer units form minisatellites or, in the extreme case, satellite DNA. The term satellite DNA originates from the observation in the 1960s of a fraction of sheared DNA that showed a distinct buoyant density, detectable as a 'satellite peak' in DENSITY GRADIENT CENTRIFUGATION, and that was subsequently identified as large centromeric tandem repeats. When shorter (10–30-bp) tandem repeats were later identified, they came to be known as minisatellites. Finally, with the discovery of tandem iterations of simple sequence motifs, the term microsatellites was coined. The difference between the terms micro- and minisatellites might not be obvious per se, but it is motivated by the difference in the mutational mechanisms of repeats of just a few nucleotides and of ten or more (see below).

It is more complicated to define the minimum number of iterations needed for a repetitive sequence to be referred to as a microsatellite. For instance, the sequence CACA occurs frequently in the human genome: should it be seen as (CA)2 microsatellites or just as unique sequence? In practice, the threshold that is used when describing the occurrence of a microsatellite in a genomic sample data set must be specified. Unfortunately, no real consensus has been reached on this matter; whereas some use a minimum number of base pairs, others use a minimum number of repeat units, and in both cases, the numbers have varied. The issue is further complicated by the lack of agreement on how much degeneracy should be accepted for characterizing a slightly imperfect tandem repetitive sequence as a microsatellite. Mismatch considerations are particularly important when using algorithms (such as RepeatMasker, Sputnik and Tandem Repeats Finder; see online links box and BOX 1) to search large genomic sequences for repeats.

It is appropriate to further classify microsatellites according to their association with coding sequence as this is related to the mutational and selective forces that operate on different types of repeat. The bulk of simple repeats are embedded in non-coding DNA, either in the intergenic sequence or in the introns. Microsatellites that are used as genetic markers are usually of this type and are generally assumed to evolve neutrally. Their frequency and distribution should therefore reflect the underlying mutation process. In coding DNA, selection against frameshift mutations effectively hinders the expansion of everything other than trinucleotide repeats⁶, for which there might be further length constraints related to protein function⁷. Trinucleotide repeats associated with human disease comprise a special class of microsatellites in coding DNA. These loci undergo extensive repeat expansions, the mutational mechanism of which is thought to differ from that of most microsatellites in the genome. For instance, the establishment of hairpin structures with a relatively high amount of base-pair complementarities might stabilize the loops that are generated during replication slippage.

Glossary

HETEROZYGOSITY: The proportion of individuals in a population that carry two different alleles at a locus.
GENE FLOW: The transfer of alleles within and between populations that arises from migration and dispersal.
DENSITY GRADIENT CENTRIFUGATION: Separation of biomolecules on the basis of their density.
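The threshold question raised in the text (is CACA a (CA)2 microsatellite or unique sequence?) can be made concrete with a small sketch. The code below is an illustration only, not the method used by RepeatMasker, Sputnik or Tandem Repeats Finder: it finds perfect tandem repeats with a regular expression and tolerates no mismatches at all. The function name `find_perfect_repeats`, the `Repeat` record and the default values of `max_motif_len` and `min_units` are assumptions chosen for this example; the text stresses that no consensus threshold exists.

```python
import re
from typing import List, NamedTuple


class Repeat(NamedTuple):
    start: int   # 0-based position of the first base of the repeat
    motif: str   # repeat unit, for example "CA"
    units: int   # number of perfect tandem copies of the motif


def find_perfect_repeats(seq: str, max_motif_len: int = 6,
                         min_units: int = 5) -> List[Repeat]:
    """Return perfect tandem repeats with at least `min_units` copies.

    Both parameters are illustrative assumptions: motifs of 1-6 bp follow
    the mono- to hexanucleotide range in the text, and `min_units` is the
    arbitrary threshold the caller has to choose and report.
    """
    if min_units < 1:
        raise ValueError("min_units must be at least 1")
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif_len + 1):
        # One captured motif of length k followed by min_units - 1 or more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are copies of a shorter unit, such as "CACA"
            # (two copies of "CA"); the shorter motif reports that locus.
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append(Repeat(match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)


if __name__ == "__main__":
    example = "GG" + "CA" * 6 + "TT"
    for hit in find_perfect_repeats(example):
        print(hit)
```

With the default threshold of five units, the example sequence yields a single (CA)6 hit starting at position 2, whereas a short sequence such as TTCACATT yields nothing. Lowering `min_units` to 2 makes CACA count as a repeat, but also reports trivial runs such as TT, which is one reason the chosen threshold needs to be stated alongside any repeat counts.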
