
Copyright © 2007 by the Genetics Society of America
DOI: 10.1534/genetics.107.076067

An Unusually Low Microsatellite Mutation Rate in Dictyostelium discoideum, an Organism With Unusually Abundant Microsatellites

Ryan McConnell, Sara Middlemist, Clea Scala, Joan E. Strassmann and David C. Queller¹

Department of Ecology and Evolutionary Biology, Rice University, Houston, Texas 77005

Manuscript received May 18, 2007
Accepted for publication September 4, 2007

ABSTRACT

The genome of the social amoeba Dictyostelium discoideum is known to have a very high density of microsatellite repeats, including thousands of triplet microsatellite repeats in coding regions that apparently code for long runs of single amino acids. We used a mutation accumulation study to see if unusually high microsatellite mutation rates contribute to this pattern. There was a modest bias toward mutations that increase repeat number, but because upward mutations were smaller than downward ones, this did not lead to a net average increase in size. Longer microsatellites had higher mutation rates than shorter ones, but did not show greater directional bias. The most striking finding is that the overall mutation rate is the lowest reported for microsatellites: 1 × 10⁻⁶ for 10 dinucleotide loci and 6 × 10⁻⁶ for 52 trinucleotide loci (which were longer). High microsatellite mutation rates therefore do not explain the high incidence of microsatellites. The causal relation may in fact be reversed, with low mutation rates evolving to protect against deleterious fitness effects of mutation at the numerous microsatellites.

¹Corresponding author: Department of Ecology and Evolutionary Biology, MS-170, Rice University, 6100 Main St., Houston, TX 77005. E-mail: [email protected]

Genetics 177: 1499–1507 (November 2007)

MICROSATELLITES, also known as simple sequence repeats, are long stretches of a short (1–6 bp), tandemly repeated DNA unit, such as the motif CAA repeated 20 times. Microsatellites are common throughout eukaryotic genomes and their lengths are often highly polymorphic, making them powerful markers for use in genetic mapping (Weber 1990; Dietrich et al. 1994; Dib et al. 1996; Roder et al. 1998), population genetics (Jarne and Lagoda 1996; Di Rienzo et al. 1998; Goldstein and Schlötterer 1999; Thuillet et al. 2002), and determination of kinship (Queller et al. 1993).

The social amoeba Dictyostelium discoideum has the highest density of microsatellite repeats of any sequenced organism, making up >11% of its genome (Eichinger et al. 2005). As is usual (Ellegren 2004), the noncoding regions are richest in microsatellites, because they are less functional. However, there is also an exceptional number of long triplet repeat loci within genes, resulting in large numbers of homopolymer amino acid strings. The most common are polyasparagine and polyglutamine; 2091 of the 13,541 predicted genes have tracts of ≥20 consecutive repeats, and some of these have multiple tracts (Eichinger et al. 2005). Microsatellites occur on average every 724 bp in exons and encode 3.3% of all amino acids (Eichinger et al. 2005). Other eukaryotic genomes also have amino acid repeats, although at a much lower density (Marcotte et al. 1998; Li et al. 2004).

In humans, certain triplet repeats that occur in or near coding regions are subject to expansions that directly cause genetic diseases (Ashley and Warren 1995; Cummings and Zoghbi 2000). Whether D. discoideum experiences such deleterious effects from its many coding-region repeats is unknown. However, unpublished work shows that these exonic microsatellites are highly variable (C. Scala, N. Mehdiabadi, J. Strassmann and D. Queller, unpublished results), suggesting that they are not tightly controlled by selection. However, selection ought to be potent in D. discoideum. It has a large geographic range [eastern North America and part of eastern Asia (Swanson et al. 1999)] and therefore should have a large effective population size. Molecular evidence suggests that it is typical of unicellular eukaryotes to have a population size (estimated as Neμ) large enough to make selection a very potent force relative to drift (Lynch and Conery 2003). This makes it harder to explain the persistence of large numbers of apparently functionless, or even deleterious, microsatellites.

Mutational changes in the number of repeats occur during DNA replication when the two DNA strands temporarily dissociate and then realign out of register, creating an unpaired repeat loop on one of the strands (Streisinger et al. 1966; Levinson and Gutman 1987; Schlötterer and Tautz 1992; Strand et al. 1993). Primary replication slippage occurring on the template strand deletes repeat units, while slippage on the nascent strand creates additional repeats. The alternative hypothesis of unequal crossing over (Smith 1973; Sia et al. 1997) is not supported by research that experimentally restricted most forms of recombination in Escherichia coli (Levinson and Gutman 1987) and yeast (Henderson and Petes 1992) without lowering microsatellite instability.

Kruglyak et al. (1998) proposed a mutation model suggesting that higher mutation rates result in more microsatellites and a shift toward longer microsatellites. High mutation rates could also account for the maintenance of high variability. So one possible explanation for the high number, long length, and variability of microsatellites in D. discoideum is that this species could have an unusually high mutation rate for microsatellites. It is this hypothesis that we test in this report.

Microsatellites mutate at rates much higher than the usual base-pair substitution rate of 10⁻⁹/locus/generation (Ellegren 2000b; Buschiazzo and Gemmell 2006). Drosophila microsatellites have the lowest reported mutation rates: in the 10⁻⁶–10⁻⁴ range (Schlötterer et al. 1998; Schug et al. 1998; Harr and Schlötterer 2000; Vazquez et al. 2000). Mammalian mutation rates, including that of humans, fall between 10⁻⁵ and 10⁻² (Serikawa et al. 1992; Weber and Wong 1993; Dietrich et al. 1994; Brinkmann et al. 1998; Sajantila et al. 1999; Xu et al. 2000), as do rates reported for plants (Udupa and Baum 2001; Thuillet et al. 2002; Vigouroux et al. 2002).

Slippage rates in vitro are 100- to 1000-fold higher than in vivo rates (Strand et al. 1993) because functional mismatch repair systems maintain drastically lower rates in the latter. Only those slippage mutations overlooked by the mismatch repair system are propagated in successive replication events. Mutations in the mismatch repair system destabilize microsatellite DNA in E. coli (Levinson and Gutman 1987), yeast (Strand et al. 1993; Wierdl et al. 1997), and humans (Kolodner 1996). Observed microsatellite mutation rates thus reflect a balance between primary replication slippage and mismatch repair efficiency.

Mutation rates are not uniform even within a genome. Most strikingly, the rate of slippage increases with microsatellite length (Weber and Wong 1993; Kroutil et al. 1996; Wierdl et al. 1997; Brinkmann et al. 1998; Schlötterer 2000; Ellegren 2004), as there are more sites where slippage can occur and the conformational entropy of slippage is 2 kcal/mol more destabilizing for long direct repeats than for shorter repetitive runs (Harvey 1997).

A simple stepwise mutation model is not stable; it leads to continual growth of microsatellites (Kruglyak et al. 1998). The size of microsatellites might be limited if large microsatellites have a downward slippage bias (Wierdl et al. 1997; Harr and Schlötterer 2000) or if large microsatellites tend to have large deletion mutations (Ellegren 2000a). Another such factor is point mutations that interrupt the repeat sequence (Petes et al. 1997; Kruglyak et al. 1998), but we did not examine this factor.

We estimated D. discoideum microsatellite mutation rates using a mutation accumulation experiment. In such experiments, lines are repeatedly passed through single-cell bottlenecks to fix mutations randomly. The cell divisions between the single-cell bottlenecks provide some opportunity for strong selection to have effects, but weakly selected mutations will be represented nearly randomly.

MATERIALS AND METHODS

Mutation accumulation: We started each of 90 mutation accumulation lines from a common ancestor, the lab-maintained axenic D. discoideum AX4 clone. The lines grew on SM agarose plates (10 g glucose, 10 g bactopeptone, 1 g yeast, 1 g MgSO4, 1.9 g KH2PO4, 0.6 g K2HPO4, 20 g agar, 1 liter H2O) with the bacterium Klebsiella aerogenes used as a food source for the amoebas. To obtain our mutation accumulation lines, we plated out the ancestral clone and selected a single plaque to serve as the ancestor (perfectly circular clearings or plaques in the bacterial lawn derive from a single cell). This single cell line was plated out clonally, 10 single plaques were selected, and the process was repeated to obtain 10 plaques from each of these.

From the resulting 100 lines, 90 were used as mutation accumulation lines and the remaining 10 were control lines that are not part of this report. Each mutation accumulation line was put through a series of 70 single-cell bottlenecks, separated by 48-hr episodes of growth on plates as described above. The single-cell bottlenecks were accomplished by randomly selecting a clonal plaque at the end of 48 hr and transferring cells from that plaque to the next plate.

We estimate that the 48-hr growth periods encompassed an average of 14.18 cell generations. This figure is the unweighted average of estimates for the ancestral clone (14.12 ± SD 0.62, an average of eight estimates) and the 90 mutation accumulation clones at the end of the experiment (14.24 ± SD 0.71, n = 90). Each estimate was obtained by collecting and counting the cells from a single clonal plaque after 48 hr and taking the base 2 logarithm. Thus, each line went through 14.18 × 71 = 1007 cell generations (71 growth periods of 14.18 generations each).

We extracted DNA from all 90 lines at the completion of the 10th and the 70th bottleneck. D. discoideum has a multicellular fruiting stage and we extracted DNA from the spore masses with 150 µl of a 5% chelex solution. The thousand generations of the experiment were all in the single-cell vegetative stage.

Microsatellite selection, amplification, and genotyping: We downloaded the genomic DNA sequence of all six D. discoideum chromosomes from the online Dictyostelium database (http://www.dictybase.org). A modified version of the program Sputnik (http://espressosoftware.com/pages/sputnik.jsp) was used to compile a list of all microsatellites containing at least five perfect repeat units. We designed three sets of primers. First, we designed primers for 27 of the longest trinucleotide repeat microsatellite loci occurring within coding regions (exons) of genes (Table 1). The selected microsatellites comprised five different repeat motifs: (CAA)n, (AAT)n, (AGT)n, (TCA)n, and (GAA)n. Each motif can be read multiple ways, and 10 codons were included in the study (e.g., the CAA motif also includes ACA, AAC, TTG, TGT, and GTT codons). These microsatellites ranged in length from 33 to 76 repeat units, although 17 of the 24 were at least 50 repeats long.

To more systematically explore the role of different repeat motifs, we selected an additional 28 microsatellites from within exonic regions of genes (Table 2), 7 from each of the

TABLE 1 The 24 long trinucleotide microsatellites, categorized by codon and length

Primary codon | No. of repeats | Encoded amino acid homopolymer | Dictybase ID (gene containing microsatellite) | Other repeats in amplified region (>10 bp) | Primer sequences (forward, reverse)
GAA | 33 | Glu (E) | DDB0202150 | None | 5′-CAGATGATGATTTTGAGGAAGAA, 5′-TTCTCCCATTTCTTGATCGTC
CAA | 48 | Gln (Q) | DDB0188933 | (CAA)5 | 5′-TTGATCAAAAGATACATCATTATTTGG, 5′-TGATCAACAGCAACAACAACAA
CAA | 50 | Gln (Q) | DDB0206397 | (CAA)5 | 5′-TGGACAACAACCAATTCAACA, 5′-TGTGGCTGAAAATTAGGGTCA
CAA | 50 | Gln (Q) | DDB0217463 | (AAT)3 | 5′-CAATGATTGATGGATTGTTTGATT, 5′-TGGAGTGAATTGGGTAACAAAA
CAA | 52 | Gln (Q) | DDB0218043 | None | 5′-CTCCATTTTGTAAGGCAATTTT, 5′-TCTGAAGATGATGGTGGTGA
CAA | 54 | Gln (Q) | DDB0204229 | None | 5′-CCAAATTTCATTTCAATCTCTCA, 5′-TCTAAAATTGGGGTAGTTGGTGA
TCA | 45 | Ser (S) | DDB0218643 | None | 5′-TTAATCAAAGTCAAATTGGTTTACAA, 5′-ATTATTGTTATTTGATGATGATGATGT
TCA | 46 | Ser (S) | DDB0219853 | None | 5′-CAATACCACCACCACCACAG, 5′-GGTGGCGATGATGATGTAGTT
TCA | 47 | Ser (S) | DDB0185301 | None | 5′-AAACCGACAGAAACTGATCTTT, 5′-TTTTAAAATTGATAAAGATGATGACGA
TCA | 49 | Ser (S) | DDB0204624 | None | 5′-CACAAACCTCAACTTCAACAACA, 5′-TTGGTTTTGTTGATGACTCAAT
TCA | 51 | Ser (S) | DDB0205463 | (TCA)7 | 5′-TCTCCAACATCATCATTAACAGA, 5′-CAATTATTAATCTCCATTTCATTTTCA
TCA | 54 | Ser (S) | DDB0217532 | None | 5′-CCTGAACAAACACATTCCTCAA, 5′-GGGGTTATTGTTGGTGCTGA
TCA | 55 | Ser (S) | DDB0169144 | None | 5′-AGGATAGCTCTCAGCCATCAA, 5′-AATGTTGGTTGGGATGATGA
TCA | 70 | Ser (S) | DDB0168484 | None | 5′-TCAAGCTCAAGCTCTTCATCA, 5′-GCATTTGCACACCTGGATTA
TCT | 41 | Ser (S) | DDB0219539 | None | 5′-AAAATTTTGATTGGTCAAAAATAGA, 5′-CCATATCTTTAATTGATTTCGTTGA
AAT | 62 | Asn (N) | DDB0190379 | (AAT)7; (AAT)4 | 5′-TTGGAAAAAGCCAACAACCT, 5′-TCAAAGTCCATGGTACAAAACC
AAT | 66 | Asn (N) | DDB0187962 | None | 5′-TGGAAATTAGAAAACTAATCAATGACA, 5′-GACATAATTGTCGTCTTCTAGAGGTTT
AAT | 67 | Asn (N) | DDB0218176 | (AAT)5 | 5′-AACAGTTTTAAAAATCCAAATAGAGG, 5′-CCCCATCCCGTTTCACTAAT
AAT | 68 | Asn (N) | DDB0219227 | None | 5′-GAGTCGATGTAATCAACCATCAG, 5′-AAAACTGGTACTGCAACCACAA
AAT | 69 | Asn (N) | DDB0205468 | None | 5′-GAAAAAGAAAAAGAAGAAAAAGAAGA, 5′-TCAGAAGAAATCTTTGTTGAAGAGA
AAT | 71 | Asn (N) | DDB0220097 | (AAT)9 | 5′-CCTTTGCCACCACACAATAG, 5′-TTATCATTTGTTGGCGTTGA
AAT | 71 | Asn (N) | DDB0217187 | (AAT)9; (AAT)4 | 5′-CCTTTGCCACCACACAATAG, 5′-TTATCATTTGTTGGCGTTGA
AAT | 74 | Asn (N) | DDB0206433 | None | 5′-TGATATGAATCCTTCAGCTGTCA, 5′-TGATGTTGCTGAACTTGTTGTTT
AAT | 76 | Asn (N) | DDB0217693 | (AAT)5 | 5′-GTTGCCGATGATGAAGATGA, 5′-ATGGAAATAACACACAACCAAAGA
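Table 1 groups loci by codon, and the Methods note that one repeat motif can be read as several codons (the CAA motif also covers ACA, AAC, TTG, TGT, and GTT). A minimal sketch of that bookkeeping follows, assuming that "read multiple ways" means the three reading frames on each of the two strands. The function name `equivalent_codons` is hypothetical and is not taken from the study.

```python
def equivalent_codons(motif):
    """Return every codon that tiles the same double-stranded repeat as `motif`.

    Assumption: equivalence means any cyclic shift (reading frame) of the
    motif or of its reverse complement.
    """
    complement = str.maketrans("ACGT", "TGCA")
    reverse_complement = motif.translate(complement)[::-1]
    codons = set()
    for strand in (motif, reverse_complement):
        for shift in range(len(strand)):
            # Rotate the motif to mimic starting the read one base later.
            codons.add(strand[shift:] + strand[:shift])
    return codons


if __name__ == "__main__":
    for motif in ("CAA", "AAT", "AGT", "TCA", "GAA"):
        print(motif, sorted(equivalent_codons(motif)))
```

Under this assumption the five motifs map onto the codon groups used in the tables: GAT falls in the TCA class, ACT in the AGT class, and TCT in the GAA class.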

motifs (TCA)n, (AGT)n, (AAT)n, and (CAA)n, again with multiple codons represented. For each motif, the 7 microsatellites had lengths of 10, 15, 20, 25, 30, 35, and 40 repeat units (with the exception of three microsatellites that differed from these stated lengths by 1 repeat unit and one case that differed by 2 repeat units; see Table 2). Due to the high density of repeated DNA within genes, the polymerase chain reaction (PCR)-amplified region sometimes contained small perfect repeats in addition to the desired microsatellite. Such additional repeats >10 bp are also listed in Tables 1 and 2.

To explore whether our results also applied to what is usually the most common class of microsatellites in most other

TABLE 2 The 28 trinucleotide microsatellites chosen for seven specific lengths and grouped by motif

Primary codon | No. of repeats | Encoded amino acid homopolymer | Dictybase ID (gene containing microsatellite) | Other repeats in amplified region (>10 bp) | Primer sequences (forward, reverse)
GAT | 10 | Asp (D) | DDB0217027 | (ATC)6 | 5′-AAAATTTCTTCGTCATCGTCA, 5′-CAAACAACACCAATTTCCAAAA
TCA | 15 | Ser (S) | DDB0187888 | None | 5′-CAACAACAACAACAACCAATG, 5′-CCTGATGATGCTGATGATAGTGA
TCA | 20 | Ser (S) | DDB0185770 | None | 5′-TGGTGGTAATGTTTTTGATGGA, 5′-AACTCCAATTTTGCCACCTAA
TCA | 25 | Ser (S) | DDB0205654 | None | 5′-CAATTTCACGGCTGTACAACAT, 5′-TCACCAGCATCTTCTTCTTCTTC
TCA | 30 | Ser (S) | DDB0189523 | None | 5′-TGATTTGTTGTTGCTGTTGTTG, 5′-TGTGATGGTCCAAAAGGTGA
GAT | 34 | Asp (D) | DDB0185762 | None | 5′-ATTTGGCCATCGATTTCATC, 5′-GATATATTCCTAGTTCCACCACCA
TCA | 40 | Ser (S) | DDB0168851 | None | 5′-TGTTAAGCCACAACCACCAA, 5′-TTGATGCTGAGGTCATAGTCG
ACT | 10 | Thr (T) | DDB0186078 | None | 5′-TCAGCAGTTGTGGTGGTTGT, 5′-GCTCCTCTGATGAAGCAAATG
ACT | 15 | Thr (T) | DDB0217198 | None | 5′-CAATTCATGCAAGACAAATTTCA, 5′-TGGTGGTTGTGTTGTTGTTG
AGT | 20 | Ser (S) | DDB0217992 | None | 5′-TTTCAAATGAAATTATTGGTTTCG, 5′-TTGCTATCATCATCACCACCA
AGT | 24 | Ser (S) | DDB0218479 | (TAG)5; (AAT)4 | 5′-CCACCACTATTATCATCATCGCTAC, 5′-CCACCAGTAGCAGTAGCAACA
AGT | 29 | Ser (S) | DDB0205678 | None | 5′-TCCAACCTAATTCGTAGATTGGTAA, 5′-CCAATAACACCAATTTCAGCAA
AGT | 35 | Ser (S) | DDB0204798 | None | 5′-TTTCAAAACCATTTACAAAAATTGA, 5′-CACTACTACTACCACCACCTTCACC
AGT | 38 | Ser (S) | DDB0217727 | (AAT)22; (AAT)15; (AAT)6 | 5′-GGAAGTTGGTTCAAATCAAAGA, 5′-TGGTGCACAAAATTCACTACTTT
AAT | 10 | Asn (N) | DDB0217474 | None | 5′-AGAGCTCCGAAACAACCAAC, 5′-TGATCACCTCCACCACCAC
AAT | 15 | Asn (N) | DDB0186493 | (AAT)12; (AAT)4 | 5′-TTCCTTGAAAGTAATGGTTTATCA, 5′-CTTCATCAATTTGCGTATCCA
AAT | 20 | Asn (N) | DDB0229990 | None | 5′-TCACCTTCCAAACCTCAACC, 5′-TGTTTTTCGTGTAAAGTTTTCTCA
AAT | 25 | Asn (N) | DDB0191683 | None | 5′-TGGTGTCATTTGATATCTAACATTTTC, 5′-TTACTTCACATTCAACATTCAAAAA
AAT | 30 | Asn (N) | DDB0218590 | None | 5′-CAATTGGTGAATTTGCTCTAATTTT, 5′-AAAGAAGAAGAGATTGGTAATCAAGA
AAT | 35 | Asn (N) | DDB0167286 | (AAT)9 | 5′-TTCAATTAAACGAAATCAATTGGTAA, 5′-TTCAGAAATTTTAGAAGCATTCTTTG
AAT | 40 | Asn (N) | DDB0190388 | (AAT)4 | 5′-TGTTAATGACGGTAGTGTTAGTGG, 5′-CCATTTTCACCACTTCCAAA
AAC | 10 | Asn (N) | DDB0205452 | (AAC)4; (TAA)4 | 5′-TCATCATCATCATCACAACAACA, 5′-ACCTTGGCTTTCACCTCTCA
ACA | 15 | Thr (T) | DDB0184155 | None | 5′-ACACCAACTGCATCACCATT, 5′-TTTGTTGAGGTGTTTGTGGTG
CAA | 20 | Gln (Q) | DDB0169070 | (AAG)6 | 5′-TTCAGCTTCCTCAGCTTCTTC, 5′-AACCACCTGGTTTAGATAATGG
CAA | 25 | Gln (Q) | DDB0229350 | (CAA)4 | 5′-TCCAAAGATGCTGAATGTGG, 5′-GGTGCAAAGAAACCACCACT
CAA | 30 | Gln (Q) | DDB0188273 | (CAA)4 | 5′-GAATTTGTGGTGGGACCTGT, 5′-TCAAACATTCCTTTGCTTCAA
CAA | 35 | Gln (Q) | DDB0190587 | (AAT)4 | 5′-TGCAAAGAAAATGTCAGCAA, 5′-TTTTATAATCCATGGTTTTTCTTCA
CAA | 40 | Gln (Q) | DDB0189776 | (CAA)9; (CAA)6 | 5′-TCAAATCAACTTTGGGAGCA, 5′-TTTGTTGGTTGTTGTTGTTGC
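The loci in Tables 1 and 2 were drawn from a genome-wide list of microsatellites with at least five perfect repeat units. The sketch below shows one simple way to scan a sequence for such perfect repeats with a regular expression. It is an illustration only and is not the modified Sputnik program used in the study. The function name `find_perfect_repeats`, its parameters, and the test sequence are all hypothetical.

```python
import re


def find_perfect_repeats(seq, unit_len=3, min_units=5):
    """Yield (start, unit, n_units) for perfect tandem repeats in `seq`.

    A repeat is reported when a unit of `unit_len` bases occurs at least
    `min_units` times in a row with no interruptions.
    """
    # ([ACGT]{k}) captures one unit; \1{m-1,} demands m-1 or more exact copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        # Skip units that are themselves repeats of a shorter unit
        # (e.g. AAA), so homopolymer runs are not counted as triplets.
        if unit in (unit + unit)[1:-1]:
            continue
        yield match.start(), unit, len(match.group(0)) // unit_len


if __name__ == "__main__":
    # Made-up sequence: a (CAA)7 tract followed by an (AT)6 tract.
    example = "GGTCT" + "CAA" * 7 + "GTCC" + "AT" * 6 + "G"
    print(list(find_perfect_repeats(example, unit_len=3)))
    print(list(find_perfect_repeats(example, unit_len=2)))
```

The reported unit depends on where the match starts, so a (CAA)n tract preceded by an A would be reported as (ACA)n. Grouping units into motif classes would resolve that ambiguity.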

TABLE 3 The 10 dinucleotide repeat loci

Primary motif | No. of repeats | Dictybase ID (gene containing microsatellite in intron) | Other repeats in amplified region (>10 bp) | Primer sequences (forward, reverse)
AT | 14 | DDB0235342 | T18; T13 | 5′-AATATTCCATTACCCATCCACTT-3′, 5′-TGGTGATACATTTGAAACCAAAA-3′
AT | 15 | DDB0189392 | T9; A11 | 5′-TGACATCATTTTAGAAAGCCAAGA-3′, 5′-TCAACAGTGTGTGCGATTTTT-3′
AT | 15 | DDB0233815 | None | 5′-TGTTGTTCATTGTTCATTGTTTAGG-3′, 5′-TAATTGGACCTGCACCCTCT-3′
AT | 16 | DDB0216449 | None | 5′-GAAGGGGAAAATGATGGTTG-3′, 5′-TCGAAATTTTTACTTTTCTTTCG-3′
AT | 17 | DDB0189593 | A13 | 5′-CCCAAAATGGAAACAAAACC-3′, 5′-TGTTTTATATTAAATAACTGCGCAAA-3′
AT | 18 | DDB0205827 | None | 5′-TTCAAACTTTATCAGAAAGAATAAAGG-3′, 5′-GAATGGTTGGTTGTTCACGA-3′
AT | 20 | DDB0233972 | None | 5′-TGCTTACCACACGCTTTCAC-3′, 5′-TTTTTATTTGGGTAAAAATTGGTT-3′
TC | 20 | DDB0204051 | None | 5′-CAAATTCAAAATCAAATGTCATCA-3′, 5′-AAAATTGGTTCAACCTTTTCCTC-3′
AT | 24 | DDB0189519 | T24 | 5′-TTTAAACCTTTTTGGTAAAGTTGG-3′, 5′-TTTCAATTGATAATTTCTGTTTAAAGG-3′
AT | 34 | DDB0234212 | A27; A18 | 5′-AGGGTTTTCCCAGTCACGACGTTTTCAGGTAGTTCTGATTCACCAAA-3′, 5′-TGATTTTGGTCTATGACTTAATCTTTT-3′

species, we also designed primers for 10 dinucleotide repeat loci (Table 3). These ranged from 14 to 34 repeats in the genome sequence and all were from predicted introns. Nine had AT repeats and 1 had CT repeats. Some additional repeats occurred inside the amplified regions of both dinucleotides and trinucleotides and these are also listed in Tables 1–3.

We tagged one member of each primer pair with a fluorescent dye molecule at the 5′-end. The PCR amplified the microsatellite loci in all 90 mutation accumulation lines using the DNA extracted after the 70th bottleneck. Each 20-µl PCR reaction contained 2 µl of DNA extract, 8 µl of a primer mixture containing forward and reverse primers, 6 µl water, 2 µl 10× buffer, 1.2 µl magnesium chloride, 0.6 µl dNTPs, 0.25 µl BSA, and 0.3 µl Taq polymerase. A touchdown PCR procedure cycled 20 times through a range of annealing temperatures between 60° and 50°, dropping by 0.5° with each new cycle before holding at 50° for 10 additional cycles. A 10-min polymerase extension period at 72° concluded the reaction. The amplified products were cleaned and precipitated with ethanol. We determined PCR product length on an Applied Biosystems 3100 model automated genetic sequencer running the programs GeneScan 3.7 and GENOTYPER. Length differences of multiples of the repeat unit (usually 3 bp) between the PCR product and the ancestor indicated repeat mutations. We ignored differences of significantly less than the repeat unit, as they are either measurement error or real changes not attributable to misalignment and slippage in the triplet repeat microsatellite.

Microsatellites that had mutated were then amplified using DNA extracted from the 10th bottleneck. Three trinucleotide loci were discarded (and are therefore not listed in Table 1) because they had apparently non-independent mutations. This was indicated by multiple (10 or more) identical mutations, present by generation 10, that suggested that they had mutated and replicated during the grow-up phase in the establishment of the lines.

The experiment included 90 lines that had gone through 71 bottlenecks of 14.18 generations each, for a total of 90,610 mitoses/locus. For a given set of loci, the mutation rate was therefore calculated as number of mutations/(number of loci × 90,610). Confidence intervals on mutation rates were calculated by first obtaining 95% confidence intervals (C.I.'s) on the number of mutations using the 0.025 and 0.975 points of the cumulative Poisson distribution and then dividing by the appropriate number of mitoses (Casella and Berger 1990).

RESULTS

Trinucleotide mutation rate: We observed 33 changes in repeat number for the trinucleotide loci that we assayed. Each was rechecked from the DNA extracted after bottleneck 10 to see if it occurred prior to that time. Eleven of the mutations had occurred by bottleneck 10 and 22 occurred in the 60 following bottlenecks. This is significantly more mutations than expected in the period before generation 10 (χ² = 8.02, d.f. = 1, P < 0.005). Closer examination showed that 3 of the 11 early mutations were duplicates (the same mutational change occurring twice in the same line), which might indicate that the mutation actually occurred at one single time during the grow-up generation that began the lines. One of these mutations added 1 repeat, one subtracted 1 repeat, and one added 11 repeats. It seems possible that the changes of 1 repeat unit each occurred

twice, but at least the 11-repeat change seems likely to have occurred only once as such large changes are quite rare (see below). If these three mutations are deleted from the data set, the difference before and after bottleneck 10 is no longer significant (χ² = 2.86, d.f. = 1, P > 0.05). We therefore eliminated one copy of each of these three mutations from the data set.

Dividing the number of observed mutations by the number of mitoses experienced in the mutation accumulation lineages yields an estimated mutation rate of 6.37 × 10⁻⁶ (95% C.I. 4.30 × 10⁻⁶–9.09 × 10⁻⁶). This value is quite low for a microsatellite mutation rate, contrary to the hypothesis that high slippage rates are the cause of the high density and variability of microsatellites in D. discoideum. This conclusion is unaffected by our decision to eliminate the three duplicate copies, as including them raises the mutation rate by only 10%.

Size and direction of trinucleotide mutations: Figure 1 shows the repeat number changes observed in our 30 trinucleotide mutations. Most of the mutations (20 of 30) were changes of a single repeat. Of the remaining 10, 5 changed by 2 repeats and the remainder ranged up to a loss of 32 repeats. There was a bias toward increases in repeat numbers, with 20 increases and 10 decreases. However, largely because of the one mutation that lost 32 repeats, the average change in repeat number was not significantly >0 (0.23 ± SE 1.27). The average of the increases was 2.43 (±0.79) and the average loss of the decreases was 4.89 (±3.22). Excluding the 32-repeat loss, there is a significant upward bias (1.35 ± SE 0.63).

Figure 1. Number of mutations observed for different changes in repeat number. The zero class (no mutation) is not shown.

Length dependence of trinucleotide mutations: The mutation rate increased with the number of repeats in the locus (Figure 2A). For the 26 trinucleotide loci with <40 repeats, the mutation rate was 8.49 × 10⁻⁷ (95% C.I. 1.03 × 10⁻⁷–3.07 × 10⁻⁶), while for the 26 trinucleotide loci with >40 repeats, it was 1.18 × 10⁻⁵ (95% C.I. 7.90 × 10⁻⁶–1.72 × 10⁻⁵). Because of the tendency of microsatellite mutation rates to increase with length, rates are sometimes calculated on a per-repeat basis. The loci had a total of 2064 repeats, and the mutation rate per repeat was 1.60 × 10⁻⁷.

The absolute sizes of mutational changes also increased with the original repeat number of the locus (Figure 2B). The regression is not significant but this is largely due to the outlier that lost 32 repeats. This point ought to support an increasing trend because it is a large change at a locus with many repeats, but it adds so much variance that it prevents significance from being obtained. Removing that point results in a significant positive slope (y = 0.067x − 1.52; P = 0.049). The loci having >60 repeats showed most of the mutations of >1 repeat and all 10 of the mutations changing >2 repeats. The largest change, a loss of 32 repeats, was seen at a 67-repeat locus, which is consistent with control of microsatellites by large deletions. However, there was no distinct tendency for longer microsatellites to contract instead of expand (Figure 2C).

Figure 2. Mutational changes as a function of original repeat numbers. The size of solid circles indicates the number of overlapping data points from 1 to 4. (A) Number of mutations observed (y = 0.024x − 0.38; P = 0.003). (B) Absolute value of the change in number of repeats of mutations observed (y = 0.11x − 2.94; P = 0.12, but is significant if the outlier at 32 is removed; see text). (C) Signed change in number of repeats of mutations observed (y = 0.044x − 2.20; P = 0.6).

Motif dependence of trinucleotide mutations: We assayed five different trinucleotide motifs, comprising

TABLE 4 Trinucleotide mutation rates by motif

Codons | Loci | Mutations | Mutation rate | 95% C.I.
AAT | 16 | 8 | 5.52 × 10⁻⁶ | 6.31 × 10⁻⁶–1.79 × 10⁻⁵
AAC, CAA, ACA | 12 | 7 | 6.44 × 10⁻⁶ | 5.70 × 10⁻⁶–1.93 × 10⁻⁵
AGT, ACT | 7 | 1 | 1.58 × 10⁻⁶ | 7.56 × 10⁻⁶–2.90 × 10⁻⁵
TCA, GAT | 15 | 9 | 6.62 × 10⁻⁶ | 6.18 × 10⁻⁶–1.82 × 10⁻⁵
GAA, TCT | 2 | 5 | 2.76 × 10⁻⁵ | 1.34 × 10⁻⁶–3.99 × 10⁻⁵
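The point estimates in Table 4 follow the formula given in Materials and Methods, number of mutations/(number of loci × 90,610). As a check, the short sketch below recomputes them from the Loci and Mutations columns. The variable and function names are illustrative, and the confidence intervals are deliberately not recomputed here.

```python
# Cell divisions per locus summed over all lines, as stated in the Methods.
DIVISIONS_PER_LOCUS = 90_610

# (number of loci, number of mutations) per motif class, copied from Table 4.
TABLE_4 = {
    "AAT": (16, 8),
    "AAC, CAA, ACA": (12, 7),
    "AGT, ACT": (7, 1),
    "TCA, GAT": (15, 9),
    "GAA, TCT": (2, 5),
}


def mutation_rate(n_mutations, n_loci, divisions=DIVISIONS_PER_LOCUS):
    """Mutations per locus per cell division."""
    return n_mutations / (n_loci * divisions)


if __name__ == "__main__":
    for codons, (loci, mutations) in TABLE_4.items():
        print(f"{codons:15s} {mutation_rate(mutations, loci):.2e}")

    # Pooling all motif classes gives the overall trinucleotide estimate.
    total_loci = sum(loci for loci, _ in TABLE_4.values())
    total_mutations = sum(mutations for _, mutations in TABLE_4.values())
    print(f"{'all motifs':15s} {mutation_rate(total_mutations, total_loci):.2e}")
```

The pooled counts are 52 loci and 30 mutations, which matches the totals reported in the text.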

10 different codons. Table 4 shows the estimated mutation rates for each. The three motifs for which we sampled at least 10 loci had remarkably similar estimates. The other two had point estimates that were a little higher and lower, but the confidence intervals all overlap and do not indicate any true differences.

Dinucleotide mutation rate: Only a single mutation was observed for the 10 dinucleotide loci. It was a change from 34 to 36 repeat units in the longest repeat studied (locus DDB0234212). This gives an average mutation rate of 1.10 × 10⁻⁶ (95% C.I. 2.79 × 10⁻⁸–6.15 × 10⁻⁶), lower than that observed for the trinucleotides. The dinucleotide mutation rate could be even lower than this estimate because the mutated locus included two long mononucleotide repeats (Table 3), but a change of 4 bases is more easily explained by a mutation in the long dinucleotide locus. The mutation rate per repeat was 5.72 × 10⁻⁸.

DISCUSSION

Low mutation rate: We initiated this study with the idea that a high mutation rate might help explain the extraordinary abundance of microsatellites in the D. discoideum genome. High abundance and also high variability could be explained if D. discoideum had an unusually high mutation rate for microsatellites, leading to a higher rate of neutral evolution at these loci. In fact, we found the opposite: D. discoideum has an unusually low microsatellite mutation rate, 6.37 × 10⁻⁶/locus/generation for the trinucleotides tested and 1.10 × 10⁻⁶ for the dinucleotides. In other species, mutation rates are usually higher, in the range of 10⁻²–10⁻⁵/generation (Ellegren 2000b; Buschiazzo and Gemmell 2006), although many of these estimates are for multicellular organisms, which can have many mitoses per generation.

The D. discoideum mutation rates are most similar to those of Drosophila melanogaster and Saccharomyces cerevisiae, which are exceptional among previously studied organisms for their low microsatellite mutation rates. Two mutation accumulation studies in D. melanogaster yielded low mutation rates. One estimate from 24 D. melanogaster loci (10 dinucleotide, 6 trinucleotide, and 8 tetranucleotide repeat loci) was 6.3 × 10⁻⁶ (Schug et al. 1997), and the later addition of 39 dinucleotide loci brought the dinucleotide mutation rate up to 9.3 × 10⁻⁶ (Schug et al. 1997). Another estimate from 16 dinucleotide and 12 trinucleotide loci yielded a similar rate of 5.1 × 10⁻⁶ (Vazquez et al. 2000). Although these rates are very similar to those of D. discoideum, the Drosophila loci studied had fewer repeats. Thus, when calculated on a per repeat basis (see Kruglyak et al. 2000), the D. discoideum rates (dinucleotide 5.7 × 10⁻⁸; trinucleotide 1.6 × 10⁻⁷) are somewhat lower than the Drosophila ones (dinucleotide 7.7 × 10⁻⁷; trinucleotide 2.7 × 10⁻⁷) and also lower than those estimated for yeast (dinucleotide 9.2 × 10⁻⁷; trinucleotide 5.0 × 10⁻⁷).

Thus, the D. discoideum point estimates appear to be similar to, or even lower than, the lowest yet recorded. The main point, however, is not which species has the lowest rate, but that we can definitively exclude the hypothesis that the high abundance and length of microsatellites in D. discoideum derives from a higher-than-normal mutation rate. D. discoideum has accumulated numerous long microsatellites for some other reason, in spite of its low mutation rate.

D. discoideum generally has low mutation rates in the presence of various mutagenizing agents, perhaps selected for because of high exposure to chemical mutagens in the soil (Deering et al. 1996). The low microsatellite mutation rates that we found may simply be one manifestation of a generally low mutation rate.

Properties of trinucleotide repeat mutations: In addition to showing an overall low mutation rate, the data also provide information on the properties of mutations that may provide further insight into why D. discoideum has so many microsatellites relative to other species. One feature that would increase microsatellite accumulation is an upward bias in mutations, as has been observed in some species (Amos et al. 1996; Primmer et al. 1996; Vigouroux et al. 2002). At first, there appears to be an upward bias in D. discoideum trinucleotides, with 20 of 30 mutations leading to increases in repeat number. However, because decreases tended to be larger than increases, there is no significant gain of sequence unless the outlier loss of 32 repeats is excluded. Thus, this factor does not explain why D. discoideum has so many long repeat loci.
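The per-locus and per-repeat rates quoted above, and their confidence intervals, can be reproduced from the mutation counts. The sketch below assumes that "the 0.025 and 0.975 points of the cumulative Poisson distribution" refers to the standard exact two-sided Poisson interval. That reading is an interpretation of the Methods rather than the authors' own code, although it reproduces the published bounds. All function names are illustrative, and the dinucleotide repeat counts are copied from Table 3.

```python
import math

DIVISIONS_PER_LOCUS = 90_610  # cell divisions per locus, as stated in the Methods


def poisson_cdf(k, lam):
    """Return P(X <= k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def solve_decreasing(f, target, lo, hi, iterations=200):
    """Bisection for f(x) = target, where f decreases as x grows."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def poisson_ci(k, level=0.95):
    """Exact two-sided interval for a Poisson mean, given an observed count k."""
    alpha = 1.0 - level
    bracket = 2.0 * k + 50.0  # generous upper limit for the search
    lower = 0.0
    if k > 0:
        # Lower bound: the mean at which P(X >= k) equals alpha/2.
        lower = solve_decreasing(
            lambda lam: poisson_cdf(k - 1, lam), 1.0 - alpha / 2.0, 0.0, bracket
        )
    # Upper bound: the mean at which P(X <= k) equals alpha/2.
    upper = solve_decreasing(lambda lam: poisson_cdf(k, lam), alpha / 2.0, 0.0, bracket)
    return lower, upper


def rate_with_ci(n_mutations, n_units):
    """Rate and 95% bounds per unit (locus or repeat) per cell division."""
    lower, upper = poisson_ci(n_mutations)
    denom = n_units * DIVISIONS_PER_LOCUS
    return n_mutations / denom, lower / denom, upper / denom


tri = rate_with_ci(30, 52)  # 30 mutations at 52 trinucleotide loci
di = rate_with_ci(1, 10)    # 1 mutation at 10 dinucleotide loci

# Per-repeat rates: divide by the total number of repeats instead of loci.
tri_per_repeat = 30 / (2064 * DIVISIONS_PER_LOCUS)
di_repeats = sum([14, 15, 15, 16, 17, 18, 20, 20, 24, 34])  # Table 3
di_per_repeat = 1 / (di_repeats * DIVISIONS_PER_LOCUS)

if __name__ == "__main__":
    print("trinucleotide:", ["%.2e" % x for x in tri])
    print("dinucleotide: ", ["%.2e" % x for x in di])
    print("per repeat:    %.2e (tri), %.2e (di)" % (tri_per_repeat, di_per_repeat))
```

Bisection on the cumulative distribution keeps the example free of external dependencies. A statistics library with chi-square quantiles would give the same bounds more directly.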

It has been suggested that repeat numbers may be regulated by a change in the direction of mutations with length: shorter loci might have an upward mutational bias while longer loci have a downward mutational bias (Garza et al. 1995). Some evidence has been found for this pattern (Lai et al. 1994; Harr and Schlötterer 1998; Xu et al. 2000). However, this model is not supported by our data; mutations in loci with many repeats are not more likely to result in losses (Figure 2C). Even including the 32-repeat-loss mutation, which occurred in a large microsatellite, there was no trend toward larger average losses with high repeat number.

We found no dependence of mutation rate on the triplet motif. This provides further evidence against the hypothesis that mutation rates are driving microsatellite abundance. Among triplet motifs, AAT is by far the most common in D. discoideum, both inside and outside of coding sequences (Eichinger et al. 2005). Apparently this does not result from unusually poor replication or repair of AAT tracts, because AAT mutation rates were typical among those that we measured. However, we did not test motifs that rarely appeared in long repeats, so it remains possible that their low abundance is due to still lower rates of mutation.

Selection: The impact of repeats of amino acids on D. discoideum is unknown. Variation in triplet repeats in coding regions sometimes has some functional significance (Fondon and Garner 2004; Li et al. 2004; Hammock and Young 2005). A number of human genetic diseases arise from expansion of triplet repeats in coding regions (Ashley and Warren 1995; Cummings and Zoghbi 2000), showing that long repeats can sometimes be detrimental.

One possible explanation for the large number of long repeats in coding regions is that there is some unknown splicing mechanism that removes these repeats either from mRNA or from protein. Such repeats could become common and long because splicing out renders them harmless. However, these triplet repeats do appear in cDNA sequences and are therefore present in mRNA. A mechanism for splicing amino acid repeats out of proteins seems unlikely, and one piece of evidence argues against it. A Western blot of D. discoideum protein stained with an antibody that recognizes stretches of 25 or more glutamines shows a very broad smear, suggesting that proteins of all sizes have these repeats (W. F. Loomis, personal communication).

Future studies are needed to determine if long repeats are detrimental to fitness in D. discoideum. If they are detrimental, it could explain why D. discoideum shows microsatellite mutation rates that are so low, even lower than in the previous standard for low rates, D. melanogaster. It is possible that there is a causal relationship in the opposite direction from what we first hypothesized. We initially supposed that high mutation rates might drive the evolution of long repeats. But it is also possible that if some other factor generates many long repeats, particularly in genes, it could select for highly efficient mismatch repair mechanisms.

We conclude with a new hypothesis for what that other factor driving microsatellite abundance might be. At >77%, D. discoideum has one of the most AT-rich genomes known (Eichinger et al. 2005). This could have the effect of increasing the supply of proto-microsatellites. Microsatellites can start from duplication of nonrepeat sequences (Zhu et al. 2000; Nishizawa and Nishizawa 2002) or they can start from chance point mutations generating enough repeats to increase the chance of slippage to higher repeat numbers, perhaps with some critical threshold (Levinson and Gutman 1987; Messier et al. 1996). A natural extension of the second hypothesis is that an AT-biased genome (or a CG-biased one) would tend to accumulate more small repeat sequences by point substitution than an unbiased genome and therefore have more sequences passing the threshold where the slippage process takes over. This idea was incorporated in a null model by Dieringer and Schlötterer (2003; see Figure 1) but AT bias was not proposed as a primary explanation for differences between species. DePristo et al. (2006) proposed that AT bias accounts for variation in abundance of low-complexity regions in proteins. We suggest that the explanation will apply most strongly to microsatellites (as the units of lowest complexity and highest slippage) and that it should apply even more strongly to nonprotein sequences than to constrained coding ones.

We thank Amanda Cruess for the computer search for microsatellite sequences, Bill Loomis for sharing his unpublished Western blot, and two anonymous referees for comments. This article is based upon work supported by the National Science Foundation grant EF-0328455.

LITERATURE CITED

Amos, W., S. J. Sawcer, R. W. Feakes and D. C. Rubinsztein, 1996 Microsatellites show mutational bias and heterozygote instability. Nat. Genet. 13: 390–391.
Ashley, C. T., and S. T. Warren, 1995 Trinucleotide repeat expansion and human disease. Annu. Rev. Genet. 29: 703–728.
Brinkmann, B., M. Klintschar, F. Neuhuber, J. Huhne and B. Rolf, 1998 Mutation rate in human microsatellites: influence of the structure and length of the tandem repeat. Am. J. Hum. Genet. 62: 1408–1415.
Buschiazzo, E., and N. J. Gemmell, 2006 The rise, fall and renaissance of microsatellites in eukaryotic genomes. BioEssays 28: 1040–1050.
Casella, G., and R. L. Berger, 1990 Statistical Inference. Wadsworth, Belmont, CA.
Cummings, C. J., and H. Y. Zoghbi, 2000 Trinucleotide repeats: mechanisms and pathophysiology. Annu. Rev. Genomics Hum. Genet. 1: 281–328.
Deering, R. A., R. B. Guyer, L. Stevens and T. E. Watson-Thais, 1996 Some repair-deficient mutants of Dictyostelium discoideum display enhanced susceptibilities to bleomycin. Antimicrob. Agents Chemother. 40: 464–467.
DePristo, M., M. Zilversmit and D. Hartl, 2006 On the abundance, amino acid composition, and evolutionary dynamics of low-complexity regions in proteins. Gene 378: 19–30.
Dib, C., S. Faure, C. Fizames, D. Samson, N. Drouot et al., 1996 A comprehensive genetic map of the human genome based on 5,264 microsatellites. Nature 380: 152–154.

Dieringer, D., and C. Schlötterer, 2003 Two distinct modes of microsatellite mutation processes: evidence from the complete genome sequences of nine species. Genome Res. 13: 2242–2251.
Dietrich, W. F., J. C. Miller, R. G. Steen, M. Merchant, D. Damron et al., 1994 A genetic map of the mouse with 4,006 simple sequence length polymorphisms. Nat. Genet. 7: 220–245.
DiRienzo, A., P. Donnelly, C. Toomajian, B. Sisk, A. Hill et al., 1998 Heterogeneity of microsatellite mutations within and between loci, and implications for human demographic histories. Genetics 148: 1269–1284.
Eichinger, L., J. A. Pachebat, G. Glöckner, M.-A. Rajandream, R. Sucgang et al., 2005 The genome of the social amoeba Dictyostelium discoideum. Nature 435: 43–57.
Ellegren, H., 2000a Heterogeneous mutation processes in human microsatellite DNA sequences. Nat. Genet. 24: 400–402.
Ellegren, H., 2000b Microsatellite mutation in the germline: implications for evolutionary inference. Trends Genet. 16: 551–558.
Ellegren, H., 2004 Microsatellites: simple sequences with complex evolution. Nat. Rev. Genet. 5: 435–445.
Fondon, J. W., and H. R. Garner, 2004 Molecular origins of rapid and continuous morphological evolution. Proc. Natl. Acad. Sci. USA 101: 18058–18063.
Garza, J. C., M. Slatkin and N. B. Freimer, 1995 Microsatellite allele frequencies in humans and chimpanzees, with implication for constraints on allele size. Mol. Biol. Evol. 12: 594–603.
Goldstein, D. B., and C. Schlötterer (editors), 1999 Microsatellites: Evolution and Applications. Oxford University Press, Oxford.
Hammock, A. D., and L. J. Young, 2005 Microsatellite instability generates diversity in brain and sociobehavioral traits. Science 308: 1630–1634.
Harr, B., and C. Schlötterer, 2000 Long microsatellite alleles in Drosophila melanogaster have a downward mutation bias and short persistence times, which cause their genome-wide underrepresentation. Genetics 155: 1213–1220.
Harvey, S. C., 1997 Slipped structures in DNA triplet repeat sequences: entropic contributions to genetic instabilities. Biochemistry 36: 3047–3049.
Henderson, S. T., and T. D. Petes, 1992 Instability of simple sequence DNA in Saccharomyces cerevisiae. Mol. Cell. Biol. 12: 2749–2757.
Jarne, P., and P. Lagoda, 1996 Microsatellites: from molecules to populations and back. Trends Ecol. Evol. 11: 424–429.
Kolodner, R., 1996 Biochemistry and genetics of eukaryotic mismatch repair. Genes Dev. 10: 1433–1442.
Kroutil, L. C., K. Register, K. Bebenek and T. A. Kunkel, 1996 Exonucleolytic proofreading during replication of repetitive DNA. Biochemistry 35: 1046–1053.
Kruglyak, S., R. T. Durrett, M. D. Schug and C. F. Aquadro, 1998 Equilibrium distributions of microsatellite repeat length resulting from a balance between slippage events and point mutations. Proc. Natl. Acad. Sci. USA 95: 10774–10778.
Kruglyak, S., R. T. Durrett, M. D. Schug and C. F. Aquadro, 2000 Distribution and abundance of microsatellites in the yeast genome can be explained by a balance between slippage events and point mutations. Mol. Biol. Evol. 17: 1210–1219.
Lai, C., R. F. Lyman, A. D. Long, C. H. Langley and T. F. C. Mackay, 1994 Naturally occurring variation in bristle number and DNA polymorphisms at the scabrous locus of Drosophila melanogaster. Science 266: 1697–1702.
Levinson, G., and G. Gutman, 1987 Slipped-strand mispairing: a major mechanism for DNA sequence evolution. Mol. Biol. Evol. 4: 203–221.
Li, Y.-C., A. B. Korol, T. Fahima and E. Nevo, 2004 Microsatellites within genes: structure, function, and evolution. Mol. Biol. Evol. 21: 991–1007.
Lynch, M., and J. S. Conery, 2003 The origins of genome complexity. Science 302: 1401–1404.
Marcotte, E. M., M. Pellegrini, T. O. Yeates and D. Eisenberg, 1998 A census of protein repeats. J. Mol. Biol. 293: 151–160.
Messier, W., S. H. Li and C. B. Stewart, 1996 The birth of microsatellites. Nature 381: 483.
Nishizawa, M., and K. Nishizawa, 2002 A DNA sequence evolution analysis generated by simulation and the Markov chain Monte Carlo method implicates strand slippage in a majority of insertions and deletions. J. Mol. Evol. 55: 706–717.
Petes, T. D., P. W. Greenwell and M. Dominska, 1997 Stabilization of microsatellite sequences by variant repeats in the yeast Saccharomyces cerevisiae. Genetics 146: 491–498.
Primmer, C. R., H. Ellegren, N. Saino and A. P. Møller, 1996 Directional evolution in microsatellite mutations. Nat. Genet. 13: 391–393.
Queller, D. C., J. E. Strassmann and C. R. Hughes, 1993 Microsatellites and kinship. Trends Ecol. Evol. 8: 285–288.
Roder, M. S., K. Korzun, J. P. Katja Wendehake, M.-H. Tixier, P. Leroy et al., 1998 A microsatellite map of wheat. Genetics 149: 2007–2023.
Sajantila, A., M. Lukka and A.-C. Syvanen, 1999 Experimentally observed germline mutations at human micro- and mini-satellite loci. Eur. J. Hum. Genet. 7: 263–266.
Schlötterer, C., 2000 Evolutionary dynamics of microsatellite DNA. Chromosoma 109: 365–371.
Schlötterer, C., and D. Tautz, 1992 Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 20: 211–215.
Schlötterer, C., R. Ritter, B. Harr and G. Brem, 1998 High mutation rates of a long microsatellite allele in Drosophila melanogaster provides evidence for allele specific mutation rates. Mol. Biol. Evol. 15: 1269–1274.
Schug, M. D., T. F. C. Mackay and C. F. Aquadro, 1997 Low mutation rates of microsatellites in Drosophila melanogaster. Nat. Genet. 15: 99–102.
Schug, M. D., C. M. Hutter, K. A. Wetterstrand, M. S. Gaudette, T. F. C. Mackay et al., 1998 The mutation rates of di-, tri-, and tetranucleotide repeats in Drosophila melanogaster. Mol. Biol. Evol. 15: 1751–1760.
Serikawa, T., T. Kuramoto, P. Hilbert, M. Mori, J. Yamada et al., 1992 Rat gene mapping using PCR-analyzed microsatellites. Genetics 131: 701–721.
Sia, E., R. Kokoska, M. Dominska, P. Greenwell and T. D. Petes, 1997 Microsatellite instability in yeast: dependence on repeat unit size and DNA mismatch repair genes. Mol. Cell. Biol. 17: 2851–2858.
Smith, G. P., 1973 Unequal crossover and the evolution of multigene families. Cold Spring Harb. Symp. Quant. Biol. 38: 507–513.
Strand, M., T. A. Prolla, R. M. Liskay and T. D. Petes, 1993 Destabilization of tracts of simple repetitive DNA in yeast by mutations affecting DNA mismatch repair. Nature 365: 274–276.
Streisinger, G., Y. Okada, J. Emrich, J. Newton, A. Tsugita et al., 1966 Frameshift mutations and the genetic code. Cold Spring Harb. Symp. Quant. Biol. 31: 77–84.
Swanson, A. R., E. Vadell and J. C. Cavender, 1999 Global distribution of forest soil dictyostelids. J. Biogeogr. 26: 133–148.
Thuillet, A. C., D. Bru, J. David, P. Roumet, S. Santoni et al., 2002 Direct estimation of mutation rate for 10 microsatellite loci in durum wheat, Triticum turgidum (L.) Thell. ssp durum desf. Mol. Biol. Evol. 19: 122–125.
Udupa, S. M., and M. Baum, 2001 High mutation rate and mutational bias at (TAA)n microsatellite loci in chickpea (Cicer arietinum L.). Mol. Genet. Genomics 265: 1097–1103.
Vazquez, F., T. Perez, J. Albornoz and A. Domínguez, 2000 Estimation of the mutation rates in Drosophila melanogaster. Genet. Res. 76: 323–326.
Vigouroux, Y., J. S. Jaqueth, Y. Matsuoka, O. S. Smith, W. D. Beavis et al., 2002 Rate and pattern of mutation at microsatellite loci in maize. Mol. Biol. Evol. 19: 1251–1260.
Weber, J., 1990 Informativeness of human (dC-dA)n.(dG-dT)n polymorphisms. Genomics 7: 524–530.
Weber, J., and C. Wong, 1993 Mutation of human short tandem repeats. Hum. Mol. Genet. 2: 524–530.
Wierdl, M., M. Dominska and T. D. Petes, 1997 Microsatellite instability in yeast: dependence on the length of the microsatellite. Genetics 146: 769–779.
Xu, X., M. Peng, Z. Fang and X. Xu, 2000 The direction of microsatellite mutations is dependent upon allele length. Nat. Genet. 24: 396–399.
Zhu, Y., J. E. Strassmann and D. C. Queller, 2000 Insertions, substitutions, and the origin of microsatellites. Genet. Res. 76: 227–236.

Communicating editor: P. Phillips