Multiple Substitutions Create Biased Estimates of Divergence Times and Small Increases in the Variance to Mean Ratio
Heredity 58 (1987) 331-339
The Genetical Society of Great Britain
Received 30 May 1986

G. Brian Golding
Department of Biology, 4700 Keele St., York University, North York, Ontario, Canada M3J 1P3.

Analysis of mutational processes has demonstrated that mutations usually occur as non-random events, with many factors influencing the fidelity of DNA replication. One such unusual pattern of mutation is that some mutational events create more than one sequence alteration. This possibility is not generally considered in estimates of sequence divergence, yet it affects both the mean and the variance of these estimates. Theoretical and simulation results are presented to examine how large the effects on sequence divergence of multiple alterations resulting from single mutational events may be. It is shown that estimates of divergence times are biased, but that this bias is not large unless the number of sequence alterations per event is unrealistically large. The number of alterations per event required to achieve a given bias is determined. Multiple alterations increase the variance above that expected for the same mean number of single alterations, but not up to the levels observed in nature. The resulting increase in the variance to mean ratio changes with the amount of divergence: the ratio is initially high and then declines slowly to one.

INTRODUCTION

Sequence comparisons among homologous genes allow differences between species to be measured and permit the reconstruction of phylogenies. Using these measures, the divergence time between species can be estimated if a molecular clock (Zuckerkandl and Pauling, 1965) is assumed. Methods to estimate sequence divergence were first proposed by Jukes and Cantor (1969). They noted that the amount of divergence between two protein sequences is a simple function of the product of the rate of substitution and the time since the two proteins diverged. Kimura and Ohta (1972) modified and applied this formula to measure the divergence between two DNA sequences. These formulae strictly apply only when substitutions behave as random stochastic events occurring at a constant rate.

Recently, this work has been further modified to include biases in the ways in which mutations occur (Kimura, 1980). For example, transitions are known to occur more frequently than transversions (e.g., Vogel and Kopun, 1977). The modified sequence divergence formulae are applicable to more than simple transitional biases and can include diverse patterns of mutation between the nucleotides (Kimura, 1981; Aoki et al., 1980; Takahata and Kimura, 1981; Kaplan and Risko, 1982; Aquadro et al., 1984). Other mutational biases will also affect estimates of divergence. Mutations are well known to have hotspots (e.g., Benzer, 1961; Coulondre et al., 1978), and Holmquist has demonstrated that hotspots of substitution increase the number of unobserved substitutions at a single site (Holmquist and Pearl, 1980; Holmquist et al., 1982).

These formulae and their underlying hypotheses also imply a predictable variance of sequence divergence. Observations generally suggest a variance that is consistently larger than predicted (Ohta and Kimura, 1971; Langley and Fitch, 1973, 1974; Fitch and Langley, 1976). Usually, the observed variance is two to three times that expected, but Gillespie (1986) has recently observed variances up to 35 times that expected. This and more extensive examinations of sequence evolution (Hudson, 1983) have been used as evidence that the neutral theory is an insufficient explanation for the observed substitutions.

Current theories of, and experiments on, mutagenesis indicate that other types of biases may occur during mutational events and that several substitutions may occur as a result of a single event. Here, the effects of multiple events on estimates of mean sequence divergence and on the variance of these estimates are investigated. It is shown that when such effects occur, the mean rates of sequence divergence will be consistently, but only slightly, biased. Multiple substitutions per event will also increase the variance to mean ratio. However, the rates at which multiple events are expected to occur are probably too low to explain the high ratios observed in nature, although they may provide part of the answer. The evidence that multiple mutations are a necessary factor to consider will first be reviewed. Having established that multiple events can potentially occur, some theoretical results pertaining to their effects will be derived and, finally, simulation results are presented which demonstrate some of these effects.

EVIDENCE OF MULTIPLE SUBSTITUTIONS

Expected to occur mechanistically

There are several mechanisms of DNA replication that might lead naturally to a clustering of substitutions and to multiple changes. One of these is the DNA repair pathway that has been termed bypass repair or SOS repair (Kornberg, 1980). This repair pathway involves synthesis under a relaxed state of proofreading in order to permit replication past otherwise unrepairable lesions in the DNA. With such synthesis, every base inserted has a high probability of being a mutant until normal synthesis begins again.

Another mechanism with this potential is indicated by experiments showing that repair tracts in DNA sequences can be very long. Long-type repair tracts in human cells have been found to be induced following some mutagenic treatments, and these tracts may be up to 40 or 50 nucleotides long (Synder and Regan, 1982). Again, if errors are more likely to be made by repair enzymes, or if this tract is replicated by an error-prone polymerase, multiple substitutions may result from a single lesion. Note that these erroneous base substitutions need not be immediately adjacent.

Gene conversion (Slightom et al., 1980) is another mechanism that leads immediately to many sequence alterations. Consider two homologous genes that have recently diverged within an organism. When gene conversion occurs, it will correct one copy according to the sequence of the other copy (conversion in multigene families is a well-known event, e.g., Arnheim, 1983). If only a single copy is being observed, then several sequence alterations (that had slowly accumulated in the other gene) will suddenly appear as the result of a single conversion event. This mechanism is not limited to conversion between complete genes; smaller segments within genes may also be converted.

Observed to occur in prokaryotes

A model of mutagenesis that produces frameshifts via direct repeats was first proposed by Streisinger et al. (1966). In this model, a small number of nucleotides are inserted or deleted by a misalignment of the polymerase onto an incorrect template. This template is then duplicated or its complement is deleted. Recently, Ripley (1982) and Ripley and Glickman (1983) have suggested that palindromic sequences (inverted repeats) may also template deletions and frameshift mutations. That this type of mechanism is acting in nature is supported by the widespread observation of excessive numbers of "runs" and repeats, shown most recently in viruses by Grantham et al. (1985) and in prokaryotes and eukaryotes by Tautz et al. (1986).

There is, however, no reason to assume that only frameshifts and deletions can be created by such a process; base substitutions have also been found to be templated via such a mechanism (deBoer and Ripley, 1984). Mutations that are easily explained by misalignment mechanisms have been observed in phage T4 (deBoer and Ripley, 1984), in E. coli, in Salmonella typhimurium and in yeast (Ripley and Glickman, 1983). Complementary to these studies, exciting in vitro experiments are showing a necessary role for direct repeats in the production of base substitutions and deletions (Kunkel and Alexander, 1985). Misalignment mutagenesis has recently been reviewed by Drake et al. (1983), who conclude that this mechanism is the most logical way to explain the observed occurrence of multiple substitutions and that complex mutations would be an immediate consequence. Furthermore, the resulting mutations would be highly nonrandom because nearby sequence is used as the template.

Many mutations in prokaryotes have been observed to occur in runs of identical base pairs. In T4, a frameshift hotspot consists of 6 adjacent adenines (Pribnow et al., 1981). Milkman and Crawford (1983) have also observed clustered base substitutions in the evolution of E. coli trp genes that implicate events affecting runs of base pairs. Again, these would result from a single event.

Thought to occur in eukaryotes

Possible indications of multiple mutations were observed in one of the first sequence studies of mouse globin genes. Konkel et al. (1979) invented the term "block mutations" to describe the clusters of substitutions they found between the β-minor and β-major globin genes. Within the second intron of

the set of covarions is slightly altered, for example due to climatic factors, a whole set of substitutions may no longer be selectively deleterious. A model of mutation incorporating multiple events could provide a rough approximation to substitutions that are under this kind of selective pressure.

THEORY OF MULTIPLE EVENTS

In the absence of reverse mutation

Consider just