Copyright © 1992 by the Genetics Society of America

The Evolution of Tandemly Repetitive DNA: Recombination Rules

Rosalind M. Harding,* A. J. Boyce† and J. B. Clegg*

*MRC Molecular Haematology Unit, Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Headington, Oxford OX3 9DU, England, and †Department of Biological Anthropology, University of Oxford, Oxford OX2 6QS, England

Manuscript received April 3, 1992
Accepted for publication August 3, 1992

ABSTRACT

Variable numbers of tandem repeats (VNTRs), which include hypervariable regions, minisatellites and microsatellites, can be assigned together with satellite DNA to define a class of noncoding tandemly repetitive DNA (TR-DNA). The evolution of TR-DNA is assumed to be driven by an unbiased recombinational process. A simulation model of unequal exchange is presented and used to investigate the evolutionary persistence of single TR-DNA lineages. Three different recombination rules are specified to govern the expansion and contraction of a TR-DNA lineage from an initial array of two repeats to, finally, a single repeat allele, which cannot participate in a misalignment and exchange process. In the absence of amplification or selection acting to bias array evolution toward expansion, the probability of attaining a target array size is a function only of the initial number of repeats. We show that the proportions of lineages attaining a targeted array size are the same irrespective of recombination rule and rate, demonstrating that our simulation model is well behaved. The time taken to attain a target array size, the persistence of the target array, and the total persistence time of repetitive array structure, are functions of the initial number of repeats, the rate of recombination, and the rules of misalignment preceding recombinational exchange. These relationships are investigated using our simulation model. While misalignment constraint is probably greatest for satellite DNA it also seems important in accounting for the evolution of VNTR loci including minisatellites. This conclusion is consistent with the observed nonrandom distributions of VNTRs and other TR-DNAs in the human genome.

NONCODING DNA sequences with “variable numbers of tandem repeats,” termed VNTRs, include those loci called hypervariable regions (HVRs), minisatellites and microsatellites. Grouping VNTRs together with satellite DNA creates a class of noncoding tandemly repetitive DNA, hereafter denoted as TR-DNA. Satellite TR-DNA regions may also have variable numbers of tandem repeats, but this is difficult to determine. Because VNTRs are extensively polymorphic they can be used to address a wide range of problems in forensic science (JEFFREYS et al. 1991), the determination of family relationships (JEFFREYS, TURNER and DEBENHAM 1991), human gene mapping (NAKAMURA et al. 1987), and population genetics (BAIRD et al. 1986; BALAZS et al. 1989; CHAKRABORTY et al. 1991; DEKA, CHAKRABORTY and FERRELL 1991; FLINT et al. 1989). The many applications of VNTR loci rest on assumptions about their evolutionary dynamics. Interestingly, a VNTR model presented by GRAY and JEFFREYS (1991) seems to sit at odds with general models for TR-DNA, developed before the explosion of interest in VNTRs. Are VNTRs, minisatellites in particular, qualitatively different from other TR-DNAs? The aim of the study reported here is to review the biology of TR-DNA, in particular the VNTR loci, and on this foundation, build a simulation model which enables VNTRs to be examined within a context of TR-DNA evolution.

The evolution of TR-DNA can be viewed within the even broader context of the evolution of repetitive DNA, for which there are a large number of analytical and simulation models. Much attention has been given to the evolutionary forces acting on multigene families (HARDISON 1991; HUGHES 1991; LOOMIS and GILPIN 1986; MAEDA and SMITHIES 1986) and on interspersed repeats such as transposable elements (CHARLESWORTH and LANGLEY 1991; MARUYAMA and HARTL 1991), SINES and LINES (BUCHETON 1990; SINGER 1982; ZUCKERKANDL, LATTER and JURKA 1989). Questions about the evolution of repetitive DNA have addressed two broad issues. First, how does recombinational exchange, resolving as gene conversion, promote the evolutionary persistence and spreading of particular repeat lineages across the genome of a species (NAGYLAKI and PETES 1982; NAGYLAKI 1984a,b, 1990; OHTA 1978, 1989; OHTA and DOVER 1983)? Second, what processes of unequal exchange promote or control the variation in numbers of repeats within lineages (KRUGER and VOGEL 1975; PERELSON and BELL 1977; SMITH 1976; TAKAHATA 1981)? The studies addressing the second question are pertinent to the development of a TR-DNA model which can account for VNTR dynamics.

Genetics 132: 847-859 (November, 1992)

A major aim of these earlier theoretical studies was to determine the balance of evolutionary forces permitting the accumulation and stabilization of large arrays of repeats. The dual parameters of genetic drift and unequal exchange, unbiased toward array expansion by amplification or selection, cannot account for equilibrium distributions of large numbers of repeats (WALSH 1987). Amplification is a generalized term for any mutational process that expands DNA length. We mean by unequal exchange a mutational process of recombination either between homologous chromosomes or within chromosomes between sister chromatids. To counteract the loss of repeats by drift and unequal exchange it is necessary to posit a balancing rate of amplification or positive selection. Unless these evolutionary forces are ongoing, the probability of finding a large tandem array at a potential TR-DNA locus is low. Selection is important in studies applicable to multigene families (TAKAHATA 1981), as is amplification in those of transposable elements and [...] (CHARLESWORTH and LANGLEY 1991). Either selection or amplification, or both, have also been incorporated into evolutionary models of TR-DNA lineages to account for the accumulation of large stable tandem arrays of satellite DNA (STEPHAN 1986, 1987, 1989; WALSH 1987). An important conclusion of these analyses is that the evolutionary persistence of large arrays critically depends on a balance between a moderate, or at least equivalent, rate of amplification relative to a low rate of unequal exchange.

GRAY and JEFFREYS (1991) have alternatively emphasized that sufficient rates of amplification may be low, even when rates of unequal exchange are high. In their model for the evolutionary dynamics of a minisatellite, MS32, amplification is simulated as a duplication of unique sequence creating an initial array of two tandem repeats. The dynamics subsequent to the single amplification event are modeled as a random walk via a path of unbiased array expansion and contraction to a single repeat evolutionary dead end. Rates of unequal exchange are proportional to allele size and orders of magnitude higher than the low rates determined for stable, persistent TR-DNA by analytical modeling (WALSH 1987). Since there is a moderate probability (in the range of 0.001-0.05) that a long sequence of unequal exchange events is initiated by a single duplication, it seems that a low rate of amplification is sufficient. In this model, short arrays of tandem repeats are generally stable but a tandem-repeat lineage has a reasonable likelihood of amplifying explosively into hypervariability and, as rapidly, decaying. Instead of specifying equilibrium conditions for infinite persistence time, a rate and rule of recombination are defined to generate a short phase of expansion to hypervariability within a much longer period for the existence of at least two repeats. This model is shown to be compatible with the expansion times of two minisatellite VNTRs in humans from homologous TR-DNA loci in a primate ancestor. The justification for this model is that, while the probability of generating a large and hypervariable tandem array at any particular locus is small, given the vast number of potential TR-DNA loci in the genome, many arrays should be large.

BIOLOGICAL BACKGROUND TO A TR-DNA MODEL

Structure: Minisatellite DNA, HVRs and other VNTRs, including simple-sequence or microsatellite DNA (TAUTZ 1989; WEBER and MAY 1989) share common structural features with highly repeated DNA such as telomeric (BLACKBURN 1990; 1991) and satellite (WILLARD and WAYE 1987) sequences. Their similarities suggest that these DNAs are subject to the same general evolutionary processes of recombinational [...]. Within this class of polymorphic noncoding TR-DNA loci, however, there is a range of repeat unit size and number. The repeat unit sequences of several to tens of base pairs (JEFFREYS, WILSON and THEIN 1985; WONG et al. 1987) in minisatellite loci are shorter than the typical motifs of satellite DNA (SINGER 1982), but larger than the di- and trinucleotide repeats of microsatellite loci, such as CA repeats (TAUTZ 1989). Also, the average copy numbers per minisatellite allele in the tens to hundreds are intermediate between the thousandfold copies of satellite sequences and the few copies comprising a microsatellite allele. This structural variation among TR-DNAs points to differences in the recombinational rules acting on them.

Rules and rates of recombination: Recombination of minisatellite alleles probably occurs during, or after replication, as DNA slippage or unequal sister chromatid exchange (USCE). The evidence against a role for homologous recombination within minisatellite arrays derives from the nonrandom association of minisatellite alleles on different [...]. This has been observed for the insulin 5′HVR (ROTWEIN et al. 1986), the VNTR locus, YNZ22 (WOLFF, NAKAMURA and WHITE 1988), the minisatellite (MS1) locus, D1S7 (WOLFF et al. 1989), the HRAS1 3′VNTR (KASPERCZYK, DIMARTINO and KRONTIRIS 1990) and α-globin 3′HVR alleles (MARTINSON 1991). On the other hand, new evidence from internally mapped minisatellite VNTR alleles suggests that localized recombination between nonidentical homologous alleles does occur, generating both sequence conversion and new length variation (JEFFREYS et al. 1991). However, most new length mutant alleles probably result from misalignment or slippage between sister chromatids.

There is no evidence that minisatellite mutation is biased towards the production of either expansion or contraction mutants by intrastrand deletion or amplification (ARMOUR et al. 1989). Also, length changes corresponding to a mean change in allele repeat copy number of 5%, and of up to 200 repeat units, described for the minisatellite MS1 locus (JEFFREYS et al. 1988), indicate that recombinational mutation is probably a consequence of USCE. An assumption of unbiased USCE is also applicable to the evolutionary dynamics of satellite DNA. DURFY and WILLARD (1989) inferred mutation by unequal exchange from patterns of sequence variation in alpha satellite DNA, a family of centromeric TR-DNAs, sharing a fundamental monomer repeat unit of about 171 bp. Notably, the majority of misalignments were found to be of the order of a few copies of the higher order repeat unit and they concluded that large misalignments were uncommon, occurring at least no more frequently than observed by JEFFREYS et al. (1988) and ARMOUR et al. (1989) for minisatellite DNA. A prevalence of small misalignments, measured in numbers of repeats, can be produced by mechanisms other than unbiased USCE, mechanisms which may be biased toward either contraction or amplification such as slipped strand mispairing (LEVINSON and GUTMAN 1987) at the replication fork or during DNA strand repair. However, while biased replication or repair slippage is a likely amplification mechanism for generating the small motifs of microsatellite DNA, satellite repeats are probably too large to be subject to this process.

Rates of mutation at TR-DNA loci vary. They are very high for some minisatellite VNTRs, in the range of 10^-4 to 10^-[...] events per allele (JEFFREYS et al. 1988). Rates of mutation also appear to vary for differently sized alleles at the same locus, with larger VNTR arrays being less stable than smaller arrays (JARMAN and WELLS 1989). Also, the most variable and unstable minisatellite loci do seem to be those with the greatest average array length measured in numbers of repeats (JARMAN et al. 1986; JARMAN and WELLS 1989; JEFFREYS, WILSON and THEIN 1985).

Function: Allelic length variation for most noncoding TR-DNA is assumed to be neutral, although this may be contained within functional limits (ZUCKERKANDL, LATTER and JURKA 1989). An architectural role in providing binding sites for scaffolding during replication or [...] has been suggested for TR-DNA such as satellite DNA (WELCH et al. 1989), and this may apply to VNTR arrays. An alternative, and much debated, function for minisatellite VNTRs as a recombination signal, was first proposed by JEFFREYS, WILSON and THEIN (1985). However, a sequence-specific signaling function shared by all VNTRs seems unlikely given the findings of VERGNAUD (1989), who reported that random short oligonucleotide motifs can detect polymorphic VNTR loci, and of VASSART et al. (1987) who found that the insert-free, wild-type M13 bacteriophage DNA detects hypervariable TR-DNA loci in the DNA of humans and other animals. Reviews by DOVER (1989) and JARMAN and WELLS (1989) concluded that minisatellite VNTRs are most probably found in regions of high homologous recombination as products, rather than as enhancers of recombination. Nonetheless, detection of a novel minisatellite-specific DNA-binding protein (COLLICK and JEFFREYS 1990) and the new evidence for exchange between homologous minisatellite alleles (JEFFREYS et al. 1991) revitalizes the hypothesis that minisatellite VNTRs have a functional role in recombination. This raises the question of selective maintenance of array size. However, if selection is driving the evolution of minisatellite VNTRs they would persist, along with genes and other functional sequences, in different species lineages after divergence from a common ancestor.

VNTR evolutionary persistence: GRAY and JEFFREYS (1991) investigated several species of primates for the presence of loci homologous with the human MS32 minisatellite locus D1S8 and the MS1 minisatellite locus D1S7. These loci have large unstable tandem arrays in humans, but cross-hybridize to loci with relatively short arrays of several diverged repeats in great apes and Old World monkeys. MS32, but not MS1, also cross-hybridizes to repeats in New World monkeys. Both minisatellites failed to cross-hybridize with prosimian DNA. The prevalence of short arrays indicates that this was the likely ancestral state and that the MS1 and MS32 minisatellites began to expand after the hominid lineage diverged from great apes (GRAY and JEFFREYS 1991). Interestingly, at the MS1 homologous locus in the Colobus monkey, an Old World monkey species, there has been a presumably independent array expansion (GRAY and JEFFREYS 1991).

Other VNTRs, for which there is no indication of a functional role in recombination, show long evolutionary persistence. TR-DNA sequences homologous to the human ζ-globin IVS1 HVR have been found in chimpanzee, goat, horse and mouse, but not in chicken or duck (FLINT, TAYLOR and CLEGG 1988). This suggests that this TR-DNA locus probably originated in the mammalian ancestor after divergence from the avian lineage. Further evidence for a long persistence time derives from the evident homology between the human ζ-globin IVS1 HVR and the tandem-repeat locus found in the IVS1 (first intron) of the pseudo-ζ-globin gene. Also, four 14-bp repeats in IVS1 are identical between horse ζ and pseudo-ζ and a subsequent four repeats are similar. The duplication of the ζ-globin gene is presumed to predate the mammalian radiation (FLINT, TAYLOR and CLEGG 1988). The homology of the ζ-globin IVS1 repeat motif between human and goat, and also between the functional gene and the pseudogene in humans implies that, although array expansions may be independent and recent, a precursor array of some several repeats probably existed in a common mammalian ancestor. Whatever the functional significance of TR-DNA arrays, their variable persistence times and array sizes are not consistent with evolution by strong directional or optimizing selection.

TR-DNA chromosomal location: While TR-DNA loci are found on all human chromosomes, minisatellite VNTRs in humans appear to have a biased chromosomal distribution toward telomeric as against interstitial locations (NAKAMURA et al. 1988; ROYLE et al. 1988). However, there are other VNTR families which are widely distributed, such as the M13 family (CHRISTMANN, LAGODA and ZANG 1991) and that of the simple repeat (CAC)5 (NURNBERG et al. 1989). The patterns of “minisatellite” chromosomal distribution in mice (KELLY et al. 1989) and cattle (GEORGES et al. 1991) also show random genomic distributions. These distributions are comparable with those for microsatellite polymorphisms, the class of TR-DNA most greatly dispersed across the genome. A third pattern of distribution is seen for satellite DNA which is mainly found in centromeric heterochromatin (WILLARD 1990).

A SIMULATION MODEL FOR TR-DNA

The [...] is investigated here using a Monte Carlo simulation model coded in FORTRAN and run on an IBM PS/2 386 computer. The evolutionary process is simulated as a random walk. This model is not formulated numerically, with a predefined probability transition matrix and Markov recurrence relations, but rather iterates through a decision-making process using random numbers. This is because it is easier to specify recombination rules at a fundamental level than to define the probability of mutation between each and every pair of possible allelic states, for each different set of recombination rules. Although a model can only represent an abstracted and simplified biological reality, the important advantage of this Monte Carlo simulation model compared with a numerical simulation or analytical model is that its limitations are due less to assumptions introduced for tractability of implementation than to those introduced because the complex mechanics of recombinational mutation operating at the molecular level are not fully understood.

While primarily used in this study to investigate evolution by unbiased recombinational exchange in single chromosomal lineages, the simulation model is not limited to this purpose. It is a more general tool for the exploratory study of the evolution of TR-DNA and incorporates other features including the flexibility to bias recombination by amplification or intrastrand deletion, and the capacity to follow evolutionary change in populations. The evolutionary dynamics of TR-DNA subject to population processes will be presented elsewhere.

Our simulation model has the constraint that the number of misaligned repeats can never be greater than the length of the progenitor array, n repeats, minus a minimal length necessary for alignment, set at 1 repeat. As the initial arrays are set at 2 repeats and duplication of unique sequence to create repetitive structure is not incorporated in the simulation runs for this study, recombinational mutation ceases in lineages reduced to a single repeat. The three recombination rules which delimit possible misalignments are also further regulated by a probability function for recombinational exchange given an allele array of n repeats and a misalignment of k repeats.

The first rule limits misalignment, k, to a single repeat and is referred to as “single-repeat misalignment,” abbreviated as SR-M. The probability of recombinational exchange equals 1 for misalignments of 1, and 0 for misalignments 2 ≤ k ≤ n − 1, for all allele arrays, n ≥ 2. The second rule limits maximal misalignment to a constant number of repeats, the “target,” t, set at the beginning of a simulation, and is referred to as “target-maximum misalignment” (TM-M). The probabilities of exchange are a step function, specifying a uniform probability for misalignments 1 ≤ k ≤ t, or 1 ≤ k ≤ n − 1 when n ≤ t, for all allele arrays, n ≥ 2. The third rule allows maximal misalignment of up to the total allele length, n, in array size minus 1 repeat and is referred to as “allele-maximum misalignment” (AM-M). This rule was used by GRAY and JEFFREYS (1991) in their study of the evolutionary dynamics of the minisatellite MS32. They derived from MS32 data a specific probability function for recombinational exchange, i, dependent on the misalignment k where 1 ≤ k ≤ n − 1:

\[ \Pr(i) = \frac{(n - i)^{3.4}}{\sum_{k=1}^{n-1} (n - k)^{3.4}} \]

This MS32 probability function makes small misalignments more likely than large ones. To enable comparison between the results of this study and those of GRAY and JEFFREYS (1991), the MS32 probability function is used here with the AM-M rule.

For each recombination rule, tandem-repeat lineages are followed as they expand from initial arrays of 2 repeats toward a preset target array size and contract until a single repeat allele is generated. Persistence time statistics were calculated on all lineages attaining a target of 20 in a set of 1,500 simulation runs, on all lineages attaining a target of 50 in a set of 3,000 runs, on all lineages attaining a target of 200 in 15,000 runs and on all lineages attaining a target of 500 in 30,000 runs. The total numbers of simulation runs were chosen to ensure approximately 50 target-attaining lineages. These targets represent typical average array sizes for VNTR loci (JEFFREYS et al. 1988). For the target-attaining lineages, the maximum array size attained is recorded. Lag, gain, dwell and decay phases of persistence time are as described by GRAY and JEFFREYS (1991). The lag is the first phase of expansion to the target, from the start to the last generation at which the array is equal to two repeats, and the gain is the second phase of rapidly increasing array size. The duration of the dwell phase is the number of generations between when an array first equals a target size and the last time in its evolutionary history it is ever equally as large. Contraction of an array is described by the decay from the target to the first time it returns to a size of two repeats and the end phase between when an array of two repeats is finally reduced to a single repeat and becomes extinct. The end phase is the difference between the decay and extinction phases described by GRAY and JEFFREYS (1991). We refer to the “gain-dwell-decay” period as a “dynamic” phase for brevity in the context of describing the simulations.

The simulation model controls the frequency of USCE deterministically, but uses random numbers, first, to generate a misalignment of some variable number of repeats between the replicated sister chromatid arrays, and secondly, to choose the outcome of a sister chromatid exchange with equal probability as an expansion or contraction. The size of the array is updated at each recombination event. The allelic recombination rate, p, is modelled as a linear function of allele array size by accumulating the rate set per repeat, λ, across repeats and generations until it is equal to 1. Although it is possible for a tandem array to become very large, generating a recombination rate per allele greater than 1, recombination events are constrained to single crossovers per lineage per generation to maintain consistency with WALSH'S (1987) analytical TR-DNA model.

The simulations run for this study are based on the high recombination rate per repeat of λ = 9 × 10^[...] because the random number generators are called in every generation, and a run executes faster if, for the same number of recombination events, they occur over fewer generations. Since persistence times in generations are proportional to 1/λ, they can easily be rescaled for lower mutation rates. Multiplying by 1,000 rescales the persistence times to time spans comparable with those computed from the minisatellite VNTR model (GRAY and JEFFREYS 1991). An extra set of simulations was run with the AM-M rule for a recombination rate per repeat twice as high (i.e., λ = 1.8 × 10^[...] per generation) to check that persistence times were half as long.

Some results based on multiple lineages are initially presented to demonstrate that the persistence time estimates from our simulation model are consistent with those of the analytical TR-DNA model developed by WALSH (1987). In these multiple lineage simulations, as with the single lineage simulations, there are no population dynamics due to stochastic fluctuation in mutation rate or reproductive success (genetic drift). In WALSH'S (1987) analytical TR-DNA model, persistence times are finite, not because of drift, but because recombinational exchange is biased toward contraction by intrastrand deletion, requiring these same conditions in our simulation model. Persistence time is computed for three ratios of a rate of unequal exchange per repeat, λ = 9 × 10^[...], to rates of intrastrand deletion per repeat, ε = 9 × 10^[...], ε = 9 × 10^[...] and ε = 9 × 10^[...]. The analytical TR-DNA model gives expected persistence times as numbers of recombination events. We too compute persistence times as numbers of recombination events to enable comparison, but more generally report persistence times in numbers of generations to permit comparison with GRAY and JEFFREYS' (1991) minisatellite VNTR model.

RESULTS

Comparison of our simulation model with WALSH's (1987) analytical TR-DNA model: Figure 1 demonstrates the concordance between persistence times observed for our simulation model and those expected from WALSH'S (1987) analytical TR-DNA model. Persistence times are measured in recombination events for lineages evolving subject to SR-M and exchange biased by different rates of intrastrand deletion. Persistence time decreases with increased rates of intrastrand deletion. (For all subsequent results, there is no intrastrand deletion and USCE is simulated as an unbiased process.)

FIGURE 1.-Persistence time computed in numbers of recombination events, z, for three ratios of USCE to intrastrand deletion. Solid shaded bars: expected value from analytical TR-DNA model of WALSH (1987). Stippled bars: average over 100 simulation runs using the SR-M rule.

Array expansion: Figure 2 shows that percentages of lineages attaining preset targets do not vary substantially between different recombination rules, in accordance with the expectations of an unbiased random walk model (KARLIN and TAYLOR 1975). Approximately 4-5% of tandem-repeat lineages starting from 2 repeats attain preset targets of at least 20, and an order of magnitude fewer lineages (0.4-0.5%) attain targets of at least 200 repeats. The same proportionality is evident for targets of 50 and 500 repeats with 1-2% of lineages attaining 50 and 0.1-0.2% attaining 500 repeats. The numbers of simulation attempts taken to attain each target and the percentages that the target-attaining lineages comprise out of the total attempts are given in Table 1.

FIGURE 2.-Lineages which attain targets of 20, 50, 200 and 500 as percentages of total simulation attempts and compared for recombination rule.

Persistence time with different recombination rates: Mean persistence time in numbers of recombination events, and means, standard deviations and medians of lag, gain, dwell and end phases in generations are reported in Table 1. Note that the distributions of phase duration are positively skewed. For the AM-M rule, statistics are given for the set recombination rate doubled (AM-M′) as well as for the set rate (AM-M*). Comparing persistence times T(z) for the set and doubled rates indicates that they are slightly reduced by decreasing the mutation rate, consistent with expected rounding error. However, the effect is minimal and the phase estimates are comparable to those reported by GRAY and JEFFREYS (1991).

Assuming AM-M, the average persistence time T(z) for lineages attaining 500 repeats is less than 300 recombination (USCE) events. Since this lineage expansion occurs with a probability of approximately 0.2% (Table 1), for one such lineage expansion there have also been 500 single-repeat duplications. This is consistent with the findings of WALSH (1987) that rates per repeat of amplification must be greater than, or at least equal to, rates per repeat of USCE.

Persistence time with different recombination rules: With a recombination rate of λ = 9 × 10^[...] the AM-M rule generates a persistence time of approximately 1,470 generations and there is not much variation between targets (Figure 3). A trade-off that allows the number of recombination events to a target to be less if the number of repeats per misalignment step is greater, equalizes the expansion and contraction times for differently targeted lineages. Consequently, persistence time does not appear to greatly differ between targets (Figure 3A). However, dwell times at a targeted array size are proportionately smaller for larger targeted TR-DNA lineages (Figure 3B). This is because for bigger arrays, larger misalignments in a single recombination event are possible, increasing the probability of contraction to a single repeat. The brief duration of the dwell phase in the evolutionary history of a TR-DNA lineage attaining a target of 200 repeats, assuming AM-M, is pictured in Figure 3C. The dwell phase is the number of generations between the first and last times that the array is larger than 200 repeats.

Persistence times assuming TM-M vary greatly between targets. Whereas for the targets, 20 and 200, TR-DNA persistence times are similar to each other, for the targets 50 and 500, respectively, a very much longer, and a very much shorter, average persistence time has resulted (Table 1). Figure 4, comparing A and B, shows that the considerable variation is mainly in the dwell phase at the target. It can be explained as a consequence of the equal probability weighting of TM-M. Allowing greater misalignment with uniform probabilities for misalignment length, TR-DNA array persistence is highly unpredictable, and may be short, as for the set of lineages subject to constraint at 500 repeats, but may be very long, as occurred in the set of lineages constrained at 200 repeats. The duration of the dwell phase in the evolutionary history for one of the short TR-DNA lineages attaining the target of 200 repeats by TM-M is shown in Figure 4C.

Figure 5A shows that for SR-M, unlike the AM-M and TM-M rules, lineages attaining larger target sizes generally take longer to do so. This is because there is no trade-off for the number of recombination events to a target with misalignments constrained to one repeat. Persistence times are one to two orders of magnitude longer for SR-M compared with AM-M and TM-M. In comparison with AM-M, dwell time assuming maximal constraint is longest for the largest target (Figure 5B). Figure 5C shows a typical evolutionary history of array expansion to a target of 200 repeats with SR-M.

As duplicate arrays are vulnerable to contraction to one repeat with consequent extinction of the TR-DNA lineage, their evolutionary life span is short unless they expand. Of lineages that do expand to targets, the greatest part of their evolutionary history, regardless of recombination rule, is dynamic with

TABLE 1 [table body not recoverable]. Numbers of simulation attempts and percentages of target-attaining lineages, mean persistence time T(z) in numbers of recombination events, and means, standard deviations and medians of lag, gain, dwell and end phases in generations, by recombination rule and target array size.

varying numbers of three or more repeats (Figure 6A). The duration of evolutionary time that a lineage can dwell at a large target size is, however, dependent on recombination rule, and is consistently greatest with SR-M. Dwell times may, however, also dominate with TM-M (Figure 6B).

DISCUSSION

In this study we have presented a simulation model for the evolution of TR-DNA making variables of both recombination rule and rate. The aim of the study was to investigate different recombination rules and we have used a computer simulation model because of the flexibility allowed to explore alternative molecular constraints on misalignment. However, simulations provide results less amenable to critical inspection and replication than analytical models, and so we have built upon a foundation established by others. As well as reporting results to show concordance with preceding studies, we have also presented some results regarding distributions of array size, although they could be more elegantly demonstrated by an analytical approach. Since we have reached the same conclusions by simulation as can be done analytically, credence is given to our conclusions on the consequences of recombination rule for persistence time. The latter remain unconfirmed by analytical

FIGURE 3.-Persistence times in generations averaged over sets of lineages attaining each target, t, with recombinational mutation subject to AM-M. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases. (C) Array size evolution of a typical lineage in the set attaining a target of 200 repeats.

FIGURE 4.-Persistence times in generations averaged over sets of lineages attaining each target, t, with recombinational mutation subject to TM-M. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases. (C) Array size evolution of a typical lineage in the set attaining a target of 200 repeats.
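The concordance between simulation and analysis can be illustrated with a minimal sketch. It is not the authors' program. It assumes that every exchange gains or loses exactly one repeat with equal probability, a stand-in for maximally constrained misalignment, and it ignores the recombination rate, so it addresses only the proportion of lineages that attain a target and says nothing about persistence times. The function names are hypothetical. The closed form used for comparison is the expression [x(0) - 1]/[target - 1] given later in the Discussion.

```python
import random


def attainment_probability(x0: int, target: int) -> float:
    """Closed form for an unbiased walk: chance of reaching `target` before 1."""
    return (x0 - 1) / (target - 1)


def lineage_attains_target(x0: int, target: int, rng: random.Random) -> bool:
    """Follow one lineage until it contracts to 1 repeat or reaches the target.

    Assumption (not from the paper): each exchange adds or removes exactly
    one repeat, with probability 1/2 each.
    """
    size = x0
    while 1 < size < target:
        size += 1 if rng.random() < 0.5 else -1
    return size == target


def simulated_proportion(x0: int, target: int, n_lineages: int, seed: int = 0) -> float:
    """Fraction of simulated lineages that attain the target array size."""
    rng = random.Random(seed)
    hits = sum(lineage_attains_target(x0, target, rng) for _ in range(n_lineages))
    return hits / n_lineages


if __name__ == "__main__":
    # Initial array of two repeats, as in the paper; targets as in the paper.
    for target in (20, 50, 200, 500):
        expected = attainment_probability(2, target)
        observed = simulated_proportion(2, target, n_lineages=10_000, seed=1)
        print(f"target {target}: expected {expected:.4f}, simulated {observed:.4f}")
```

With only a few thousand lineages the simulated proportions for the larger targets are noisy, because very few lineages reach 200 or 500 repeats. Under these assumptions, larger step sizes would be expected to change how quickly a lineage resolves rather than the proportion that attains the target.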


FIGURE 5.-Persistence times in generations averaged over sets of lineages attaining each target, t, with recombinational mutation subject to SR-M. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases. (C) Array size evolution of a typical lineage in the set attaining a target of 200 repeats.

FIGURE 6.-Proportional persistence times comparing recombination rules. (A) Length of dynamic (gain + dwell + decay) phase highlighted against shaded lag and end phases. (B) Length of dwell phase at target array size highlighted against shaded expansion (lag + gain) and contraction (decay + end) phases.

modeling.

For the record, comparable results of both our TR-DNA simulation model and GRAY and JEFFREYS' (1991) simulation model for the minisatellite MS32, show concordance with WALSH'S (1987) analytical TR-DNA model. GRAY and JEFFREYS' (1991) assumption of a duplication event to initiate VNTR evolution is consistent with the conclusions of WALSH'S (1987) analytical TR-DNA model that amplification, possibly as duplication of unique sequence, is a critical process in TR-DNA evolution. Despite GRAY and JEFFREYS' (1991) conclusion that under their stochastic model of unbiased USCE it is not necessary to invoke additional processes such as saltatory amplification in accordance with WALSH (1987) and others, it is necessary, and they did. The important conclusion to be made from GRAY and JEFFREYS' (1991) model is that using generations as units of evolutionary time, instead of numbers of recombination events, clearly demonstrates that the evolutionary persistence of a random walk process can trace back to distant ancestors in the phylogenetic history of an extant species. The evolutionary persistence of accumulated TR-DNA, even though greater in duration than the evolutionary span of a species, may nonetheless be transient, and does not have to be modelled by an equilibrium balance between recombination and either amplification or selection.

Distributions of array size suggest a test for biased array expansion: Varying the rules on misalignment constraint alters only the persistence times over which TR-DNA arrays accumulate and decay and not the array sizes that TR-DNA lineages may attain. That the probability of an array exceeding some size is independent of the misalignment step is a result demonstrable by the theory of stochastic processes using Wald's Identity (KARLIN and TAYLOR 1975). Let the change of array size in time, x(t), be modeled by a nonsymmetrical random walk beginning at x(0) with an absorbing boundary at one. In the absence of selection or amplification, we expect the probabilities of expansion and contraction to be equal, P = 1/2, which allows us to represent x(t) by its mean value, x(0). If at time 0, x(0) = 2, the initial array size, we can calculate the probability that the array at some future time will attain the target before contracting to 1 as: [x(0) - 1]/[target - 1]. The probabilities of attaining the targets, 20, 50, 200 and 500 are 5.2%, 2%, 0.5% and 0.2%, respectively. Both our simulation probabilities and those calculated by GRAY and JEFFREYS (1991) are biased downwards by rounding error. Nonetheless, an unbiased stochastic model, however formulated, gives an expected ratio of small to large TR-DNA arrays in the genome. This ratio has the form of 2,000 2-repeat TR-DNA loci : 100 20- VNTRs : 10 200-repeat VNTRs : 1 2,000-repeat VNTR locus.

PERELSON and BELL (1977) also used an unbiased random-walk model to calculate the probabilities of attaining array sizes greater than an initial size. They assumed an absorbing boundary at zero rather than one and formulated the probability of finding an expanded array as a function of time. This probability was shown to be maximized soon after beginning the random walk. At this time point the probability of finding an array of at least 10, 50, 100 or 500 repeats is 3.9%, 0.74%, 0.37% or 0.074%, respectively.

The observed proportionality for unbiased stochastic models could be used as a null expectation in tests for amplification or selection bias. For instance, an experimental approach that enabled observation of array expansion of the same small initial TR-DNA locus in multiple clonal lineages of yeast or some other model organism, would enable an estimate of the proportion of lineages that expand. An alternative test, which seems reasonable in view of the nondependence of array expansion probabilities on recombination rule, would be to survey TR-DNA loci across a genome. A much higher proportionality of large and intermediate VNTR arrays to duplicate TR-DNA loci than expected for a stochastic model would indicate the importance of amplification or selection bias in tandem array expansion.

Although array amplification has been modelled by the duplication of unique motifs in this and other (WALSH 1987; GRAY and JEFFREYS 1991) studies, it is expected that the arrays themselves are subject to duplication and perhaps also to multiplication (possibly by rolling-circle replication). These processes may permit amplification of a large amount of TR-DNA by sporadic and widely dispersed expansion events in the genome. Low rates of saltatory amplification could inject the same amount of TR-DNA as a moderate rate of ongoing duplication. A model that incorporates an ongoing rate of duplication requires two boundaries, the first reflecting to amplify single repeats and the second, either reflecting or absorbing, to prevent arrays from expanding infinitely. This barrier can be accounted for by selective constraint on array expansion. But, before we build upon this kind of equilibrium model, it would be appropriate to confirm by observation the hypothesis that amplification is a continuous process of duplication occurring at rates similar to rates of recombination. Molecular biology has not yet provided any such evidence.

Persistence times are a function of recombination rule: While array size is not dependent on recombination rule, evolutionary persistence time is. Regardless of the array size, our simulations suggest that tandem arrays having long persistence times are under much greater misalignment constraint than arrays with short persistence times.

AM-M allows misalignments to be very large when tandem arrays are large. If large arrays can greatly misalign, they can expand to large sizes in short times, but they can also go extinct very easily, and large array sizes are transient. These evolutionary dynamics were observed despite modification of the AM-M rule by a probability function which decreases recombinational exchange for increasing misalignment length. This was the rule used by GRAY and JEFFREYS (1991) to represent the transience of hypervariable minisatellites using the example of MS32. They showed that if a small array of tandem repeats at the MS32 locus began to expand after divergence of the Homo lineage from the other great apes, the 700,000 generation time span since is long enough for a large array of 200 repeats or more to have been generated, assuming a rate of exchange per repeat of λ = 9 ×. However, since a 200 repeat array mutating by AM-M is not very stable, if this rule is operating, MS32 arrays are probably now contracting.

The TM-M rule uses a step function for describing the probabilities of different misalignment lengths. Only arrays smaller than the preset target are vulnerable to one-step extinction. As with AM-M short arrays are unstable and may either go extinct or expand quickly. However, arrays that expand to sizes greater than the target are protected from one-step extinction and may remain large for longer dwell phases compared with the AM-M rule. There has been time since the divergence of humans and great apes for a minisatellite locus to have expanded by TM-M, and if so still to be in a dwell phase. TM-M works as an alternative and simpler rule to model the transience of hypervariable minisatellites. The occurrence of misalignment constraint would be consistent with observations of asymmetry in the location of deletion events in minisatellite arrays. JEFFREYS, NEUMANN and WILSON (1990) observed that the internal repeat structures of MS32 minisatellite alleles indicated greater stability of the 5' ends and a gradient of increasing variability toward the 3' ends. However, more data on the size and nature of in minisatellite alleles are needed to evaluate whether a step function is a reasonable approximation of misalignment distributions.

With the SR-M rule, misalignment is maximally constrained. Consequently TR-DNA expansion to, and contraction from, a large size takes a long time. The gain plus dwell time consequently far exceeds the expected persistence time for the MS32 minisatellite, unless the mutation rate per repeat is three orders of magnitude greater than that suggested by GRAY and JEFFREYS (1991). Assuming recombination is constrained to SR-M, an average generation span of 10 years, and a rate of exchange per repeat of λ = 9 ×, persistence times would be of the order of 50 million years. This result indicates recombination constrained to SR-M may account for the evolutionary persistence of the TR-DNA loci in of the ζ-globin gene and pseudogene. With moderate rates of recombination but maximal constraint on misalignment, a TR-DNA locus may not only have a long persistence time, but also become a polymorphic VNTR.

The persistence of satellite DNA (WALSH 1987) is also consistent with great constraint on misalignment. Our TR-DNA model predicts that satellite arrays would mutate in steps of single or few repeats. Also, the USCE rate, while not as high as that at minisatellite VNTRs, may be orders of magnitude higher than nucleotide substitution rates. These rates may indeed be high enough to generate VNTR polymorphism in satellite arrays. Our model applied to satellite DNA contrasts with that of WALSH (1987) which predicts a very low rate of unequal exchange consistent with observations that satellite DNA is found in regions of low homologous recombination. Both our model and that of WALSH (1987) assumes amplification as a duplication process. Arguably, a general TR-DNA model would better account for satellite DNA if the amplification rule was also a variable and could alternatively occur as a saltatory burst generating a large initial array. However, given that large satellite arrays exist, choosing the most appropriate rule for their current dynamics may be resolved as follows. If satellite arrays are experiencing very little recombination they should show relatively uniform sequence divergence between any two member repeats of the array. On the other hand, if satellite DNA is evolving by a moderate rate of unequal exchange, assuming SR-M, then repeats close to each other should show greater similarity than repeats that are far apart. No doubt, molecular data pertaining to these expectations will soon be available to test between them.

Consequences of chromosomal location for TR-DNA evolution: There are different structural levels at which to look for the key factors in the instability of a short TR-DNA locus and its potential to expand quickly as a VNTR. At a local level, the sequence of the repeat motif may be important in conferring instability and increasing the rate of recombinational mutation (MITANI, TAKAHASHI and KOMINAMI 1990). Alternatively, the general chromosomal location of a TR-DNA locus may account for differences in misalignment constraint. Chromosomal location has been shown to be critical for the activation of a recombinational hot spot in the fission yeast Schizosaccharomyces pombe (PONTICELLI and SMITH 1992). The importance of misalignment constraints for the evolution of satellite and minisatellite DNAs, as shown by simulation modelling, and the differential patterns of genomic distributions of satellite DNA at and minisatellite DNA near telomeres, suggest that misalignment constraint varies with chromosomal location. In fact, chromosomal neighbourhood may be at least as important as motif sequence for the expansion of some TR-DNA loci as hypervariable minisatellite VNTRs, while other TR-DNA is much more stable.

Assuming a stochastic model for TR-DNA evolution with an absorbing boundary at arrays reduced to one repeat, implies that there must be large numbers of potential TR-DNA sites in the genome, and that those which expand as VNTRs will be unrelated by sequence similarity. However, if there are source TR-DNA sequences with long persistence times due to SR-M constraint, many descendant VNTRs may show sequence relationships. The motif similarities shared by families of minisatellite VNTRs suggest their common descent from an old and persisting ancestral TR-DNA locus. A related VNTR occurring within an intron and constrained to SR-M by its location would be a candidate for the ancestral locus. The dispersal of the minisatellite-related VNTRs across chromosomes suggests the action of DNA-mediated transposition within and between chromosomes, particularly near chromosome telomeres (WONG, ROYLE and JEFFREYS 1990). A strategy for the detection of multilocus minisatellites for DNA fingerprinting may be to start with a TR-DNA probe which is known to have had a long persistence time in the genome of the species of interest, rather than a minisatellite probe from a different species.

Conclusions: Our simulation model places minisatellite VNTR evolution within the general context of TR-DNA evolution. This has been achieved by incorporating recombination rule as an equally important variable as the rate of recombinational exchange. By subjecting misalignment to varying degrees of constraint the evolution of all classes of TR-DNA can be accounted for. It is suggested that satellite DNA evolves under greatest misalignment constraint but at a moderate rate with most mutation steps consisting of single repeats. Hypervariable minisatellite VNTRs may be characterized by their release from misalignment constraint, yet even so, the region of exchange and mutability within the array is probably small. Unconstrained USCE as modelled by the AM-M rule may be less likely than a constrained TM-M rule. We emphasize the concordance of results achieved by both analytical and simulation modelling of TR-DNA evolution. The abstraction of rule as well as rate suggests new ways of understanding the chromosomal distribution of, and relationships between, TR-DNA loci in the genome.

We thank J. FLINT, J. J. MARTINSON and T. E. A. PETO for discussion throughout the study and our anonymous reviewers for their criticisms and suggestions.

LITERATURE CITED

ARMOUR, J. A. L., I. PATEL, S. L. THEIN, M. F. FEY and A. J. JEFFREYS, 1989a Analysis of somatic mutations at human minisatellite loci in tumours and cell lines. Genomics 4: 328-334.
BAIRD, M., I. BALAZS, A. GIUSTI, L. MIYAZAKI, L. NICHOLAS, K. WEXLER, E. KANTER, J. GLASSBERG, F. ALLEN, P. RUBINSTEIN and L. SUSSMAN, 1986 Allele frequency distribution of two highly polymorphic DNA sequences in three ethnic groups and its application to the determination of paternity. Am. J. Hum. Genet. 39: 489-501.
BALAZS, I., M. BAIRD, M. CLYNE and E. MEADE, 1989 Human population genetic studies of five hypervariable DNA loci. Am. J. Hum. Genet. 44: 182-190.
BLACKBURN, E. H., 1990 Telomeres and their synthesis. Science 249: 489-490.
BLACKBURN, E. H., 1991 Structure and function of telomeres. Nature 350: 569-573.
BUCHETON, A., 1990 I transposable elements and I-R hybrid dysgenesis in Drosophila. Trends Genet. 6: 16-21.
CHAKRABORTY, R., M. FORNAGE, R. GUEGUEN and E. BOERWINKLE, 1991 Population genetics of hypervariable loci: analysis of PCR based VNTR polymorphism within a population, pp. 127-143 in DNA Fingerprinting: Approaches and Applications, edited by T. BURKE, G. DOLF, A. J. JEFFREYS and R. WOLFF. Birkhauser Verlag, Basel, Switzerland.
CHARLESWORTH, B., and C. H. LANGLEY, 1991 Population genetics of transposable elements in Drosophila, pp. 150-176 in Evolution at the Molecular Level, edited by R. K. SELANDER, A. G. CLARK and T. S. WHITTAM. Sinauer Associates, Sunderland, Mass.
CHRISTMANN, A., P. J. L. LAGODA and K. D. ZANG, 1991 Non-radioactive in situ hybridization pattern of the M13 minisatellite sequences on human metaphase chromosomes. Hum. Genet. 86: 487-490.
COLLICK, A., and A. J. JEFFREYS, 1990 Detection of a novel minisatellite-specific DNA-binding protein. Nucleic Acids Res. 18: 625-629.
DEKA, R., R. CHAKRABORTY and R. E. FERRELL, 1991 A population genetic study of six VNTR loci in three ethnically defined populations. Genomics 11: 83-92.
DOVER, G. A., 1989 DNA fingerprints: victims or perpetrators of DNA turnover? Nature 342: 347-348.
DURFY, S. J., and H. F. WILLARD, 1989 Patterns of intra- and interarray sequence variation in alpha satellite from the human X chromosome: evidence for short range homogenization of tandemly repeated DNA sequences. 5: 810-821.
FLINT, J., A. M. TAYLOR and J. B. CLEGG, 1988 Structure and evolution of the horse zeta globin locus. J. Mol. Biol. 199: 427-437.
FLINT, J., A. J. BOYCE, J. J. MARTINSON and J. B. CLEGG, 1989 Population bottleneck in Polynesia revealed by minisatellites. Hum. Genet. 83: 257-263.
GEORGES, M., A. GUNAWARDANA, D. W. THREADGILL, M. LATHROP, I. OLSAKER, A. MISHRA, L. L. SARGEANT, A. SCHOEBERLEIN, M. R. STEELE, C. TERRY, D. S. THREADGILL, X. ZHAO, T. HOLM, R. FRIES and J. E. WOMACK, 1991 Characterization of a set of variable number of tandem repeat markers conserved in Bovidae. Genomics 11: 24-32.
GRAY, I. C., and A. J. JEFFREYS, 1991 Evolutionary transience of hypervariable minisatellites in man and the primates. Proc. R. Soc. Lond. B 243: 241-253.
HARDISON, R. C., 1991 Evolution of globin gene families, pp. 272-289 in Evolution at the Molecular Level, edited by R. K. SELANDER, A. G. CLARK and T. S. WHITTAM. Sinauer Associates, Sunderland, Mass.
HUGHES, A. L., 1991 Evolutionary origin and diversification of the mammalian CD1 antigen genes. Mol. Biol. Evol. 8: 185-201.
JARMAN, A. P., and R. A. WELLS, 1989 Hypervariable minisatellites: recombinators or innocent bystanders? Trends Genet. 5: 367-371.
JARMAN, A. P., R. D. NICHOLLS, D. J. WEATHERALL, J. B. CLEGG and D. R. HIGGS, 1986 Molecular characterisation of a hypervariable region downstream of the human α-globin gene cluster. EMBO J. 5: 1857-1863.
JEFFREYS, A. J., R. NEUMANN and V. WILSON, 1990 Repeat unit sequence variation in minisatellites: a novel source of DNA for studying allelic variation and mutation by single molecule analysis. Cell 60: 473-485.
JEFFREYS, A. J., M. TURNER and P. DEBENHAM, 1991 The efficiency of multilocus DNA fingerprint probes for individualization and establishment of family relationships, determined from extensive casework. Am. J. Hum. Genet. 48: 824-840.
JEFFREYS, A. J., V. WILSON and S. L. THEIN, 1985 Hypervariable 'minisatellite' regions in human DNA. Nature 314: 67-73.
JEFFREYS, A. J., N. J. ROYLE, V. WILSON and Z. WONG, 1988 Spontaneous mutation rates to new length alleles at tandem-repetitive hypervariable loci in human DNA. Nature 332: 278-281.
JEFFREYS, A. J., A. MACLEOD, K. TAMAKI, D. L. NEIL and D. G. MONCKTON, 1991 Minisatellite repeat coding as a digital approach to DNA typing. Nature 354: 204-209.
KARLIN, S., and H. M. TAYLOR, 1975 A First Course in Stochastic Processes, Ed. 2. Academic Press, San Diego.
KASPERCZYK, A., N. A. DIMARTINO and T. G. KRONTIRIS, 1990 Minisatellite allele diversification: the origin of rare alleles at the HRAS1 locus. Am. J. Hum. Genet. 47: 854-859.
KELLY, R., G. BULFIELD, A. COLLICK, M. GIBBS and A. J. JEFFREYS, 1989 Characterization of a highly unstable mouse minisatellite locus: evidence for somatic mutation during early development. Genomics 5: 844-856.
KRUGER, J., and F. VOGEL, 1975 of unequal crossing over. J. Mol. Evol. 4: 201-247.
LEVINSON, G., and G. A. GUTMAN, 1987 Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4: 203-221.
LOOMIS, W. F., and M. E. GILPIN, 1986 Multigene families and vestigial sequences. Proc. Natl. Acad. Sci. USA 83: 2143-2147.
MAEDA, N., and O. SMITHIES, 1986 The evolution of multigene families: human haptoglobin genes. Annu. Rev. Genet. 20: 81-108.
MARTINSON, J. J., 1991 Genetic variation in South Pacific Islanders, Ph.D. Thesis, University of Oxford, Oxford, U.K.

MARUYAMA, K., and D. L. HARTL, 1991 Evolution of the transposable element mariner in Drosophila species. Genetics 128: 319-329.
MITANI, K., Y. TAKAHASHI and R. KOMINAMI, 1990 A GGCAGG motif in minisatellites affecting their germline stability. J. Biol. Chem. 265: 15203-15210.
NAGYLAKI, T., 1984a Evolution of multigene families under interchromosomal . Proc. Natl. Acad. Sci. USA 81: 3796-3800.
NAGYLAKI, T., 1984b The evolution of multigene families under intrachromosomal gene conversion. Genetics 106: 529-548.
NAGYLAKI, T., 1990 Gene conversion, linkage, and the evolution of repeated genes dispersed among multiple chromosomes. Genetics 126: 261-276.
NAGYLAKI, T., and T. D. PETES, 1982 Intrachromosomal gene conversion and the maintenance of sequence homogeneity among repeated genes. Genetics 100: 315-337.
NAKAMURA, Y., M. LEPPERT, P. O'CONNELL, R. WOLFF, T. HOLM, M. CULVER, C. MARTIN, E. FUJIMOTO, M. HOFF, E. KUMLIN and R. WHITE, 1987 Variable number of tandem repeat (VNTR) markers for human gene mapping. Science 235: 1616-1622.
NAKAMURA, Y., M. CARLSON, K. KRAPCHO, M. KANAMORI and R. WHITE, 1988 New approach for isolation of VNTR markers. Am. J. Hum. Genet. 43: 854-859.
NURNBERG, P., L. ROEWER, H. NEITZEL, K. SPERLING, A. POPERL, J. HUNDRIESER, H. PÖCHE, C. EPPELEN, H. ZISCHLER and J. T. EPPLEN, 1989 DNA fingerprinting with the oligonucleotide probe (CAC)5/(GTG)5: somatic stability and germline mutations. Hum. Genet. 84: 75-78.
OHTA, T., 1978 Theoretical population genetics of repeated genes forming a multigene family. Genetics 88: 845-861.
OHTA, T., 1989 Time for spreading of compensatory mutations under . Genetics 123: 579-584.
OHTA, T., and G. A. DOVER, 1983 Population genetics of multigene families that are dispersed into two or more chromosomes. Proc. Natl. Acad. Sci. USA 80: 4079-4083.
PERELSON, A. S., and G. I. BELL, 1977 Mathematical models for the evolution of multigene families by unequal crossing over. Nature 265: 304-310.
PONTICELLI, A. S., and G. R. SMITH, 1992 Chromosomal context dependence of a eukaryotic recombinational hot spot. Proc. Natl. Acad. Sci. USA 89: 227-231.
ROTWEIN, P., S. YOKOYAMA, D. K. DIDIER and J. M. CHIRGWIN, 1986 Genetic analysis of the hypervariable region flanking the human insulin gene. Am. J. Hum. Genet. 39: 291-299.
ROYLE, N. J., R. E. CLARKSON, Z. WONG and A. J. JEFFREYS, 1988 Clustering of hypervariable minisatellites in the proterminal regions of human autosomes. Genomics 3: 352-360.
SINGER, M. F., 1982 Highly repeated sequences in mammalian . Internat. Rev. Cytol. 76: 63-112.
SMITH, G. P., 1976 Evolution of repeated DNA sequences by unequal crossover. Science 191: 528-535.
STEPHAN, W., 1986 Recombination and the evolution of satellite DNA. Genet. Res. 47: 167-174.
STEPHAN, W., 1987 Quantitative variation and chromosomal location of satellite DNAs. Genet. Res. 50: 41-52.
STEPHAN, W., 1989 Tandem-repetitive noncoding DNA: forms and forces. Mol. Biol. Evol. 6: 198-212.
TAKAHATA, N., 1981 A mathematical study on the distribution of the number of repeated genes per chromosome. Genet. Res. 38: 97-102.
TAUTZ, D., 1989 Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17: 6463-6472.
VASSART, G., M. GEORGES, R. MONSIEUR, H. BROCAS, A. S. LEQUARRE and D. CHRISTOPHE, 1987 A sequence in M13 phage detects hypervariable minisatellites in human and animal DNA. Science 235: 683-684.
VERGNAUD, G., 1989 Polymers of random short detect polymorphic loci in the human genome. Nucleic Acids Res. 17: 7623-7630.
WALSH, J. B., 1987 Persistence of tandem arrays: implications for satellite and simple-sequence DNAs. Genetics 115: 553-567.
WEBER, J. L., and P. E. MAY, 1989 Abundant class of human DNA polymorphisms which can be typed using the polymerase chain reaction. Am. J. Hum. Genet. 44: 388-396.
WELCH, H. M., J. K. DARBY, A. J. PILL, C. M. KO and B. CARRITT, 1989 Transposition, amplification, and divergence in the origin of the DNF28 loci, a polymorphic repetitive sequence family on chromosomes 1 and 3. Genomics 5: 423-430.
WILLARD, H. F., 1990 Centromeres of mammalian chromosomes. Trends Genet. 6: 410-416.
WILLARD, H. F., and J. S. WAYE, 1987 Hierarchical order in chromosome-specific human alpha satellite DNA. Trends Genet. 3: 192-198.
WOLFF, R. K., Y. NAKAMURA and R. WHITE, 1988 Molecular characterization of a spontaneously generated new allele at a VNTR locus: no exchange of flanking DNA sequence. Genomics 3: 347-351.
WOLFF, R. K., R. PLAETKE, A. J. JEFFREYS and R. WHITE, 1989 Unequal crossingover between homologous chromosomes is not the major mechanism involved in the generation of new alleles at VNTR loci. Genomics 5: 382-384.
WONG, Z., N. J. ROYLE and A. J. JEFFREYS, 1990 A novel human DNA polymorphism resulting from transfer of DNA from chromosome 6 to chromosome 16. Genomics 7: 222-234.
WONG, Z., V. WILSON, I. PATEL, S. POVEY and A. J. JEFFREYS, 1987 Characterization of a panel of highly variable minisatellites cloned from human DNA. Ann. Hum. Genet. 51: 269-288.
ZUCKERKANDL, E., G. LATTER and J. JURKA, 1989 Maintenance of function without selection: Alu sequences as cheap genes. J. Mol. Evol. 29: 504-512.

Communicating editor: W-H. LI