Extraordinary genome stability in the tetraurelia

Way Sunga,1, Abraham E. Tuckera, Thomas G. Doaka, Eunjin Choia, W. Kelley Thomasb, and Michael Lyncha

aDepartment of Biology, Indiana University, Bloomington, IN 47405; and bDepartment of Molecular Cellular and Biomedical Sciences, University of New Hampshire, Durham, NH 03824

Edited by Detlef Weigel, Max Planck Institute for Developmental Biology, Tübingen, Germany, and approved October 10, 2012 (received for review June 21, 2012) plays a central role in all evolutionary processes and is also from this experiment), or ∼30–50 fissions when starved (9), P. the basis of genetic disorders. Established base-substitution muta- tetraurelia undergoes a self-fertilization process known as au- tion rates in eukaryotes range between ∼5 × 10−10 and 5 × 10−8 per togamy (9), at which time the old macronucleus is destroyed and site per generation, but here we report a genome-wide estimate for replaced by a processed version of the new micronuclear genome Paramecium tetraurelia that is more than an order of magnitude (10). When in contact with compatible mating types, P. tetraurelia lower than any previous eukaryotic estimate. Nevertheless, when can also undergo conjugation (10), although, this can be pre- the mutation rate per cell division is extrapolated to the length of vented in the laboratory by using a stock consisting of only one the sexual cycle for this , the measure obtained is comparable mating type. Under conditions of exclusive autogamy, all muta- to that for multicellular species with similar genome sizes. Because tions arising in the micronucleus during clonal propagation Paramecium has a transcriptionally silent germ-line nucleus, these should accumulate in a completely neutral fashion, with the fit- results are consistent with the hypothesis that natural selection ness effects only being realized after sexual . operates on the cumulative germ-line replication fidelity per epi- The that accumulate in the micronucleus during sode of somatic gene expression, with the germ-line mutation rate P. tetraurelia clonal division are analogous to mutations that per cell division evolving downward to the lower barrier imposed accumulate during germ-line cell divisions of metazoan, as the by random genetic drift. We observe ciliate-specificmodifications of effects of both types of mutations are not realized until expres- EVOLUTION widely conserved amino acid sites in DNA polymerases as one po- sion after the subsequent sexual generation. Although metazoans tential explanation for unusually high levels of replication fidelity. have some of the highest per-generation mutation rates known, the mutation rate per germ-line cell division remains quite low in mutation accumulation | drift-barrier such species (6). This observation suggests that although selec- tion operates to drive down mutation rates (6, 11–14), because it utation is the ultimate source of genetic variation, which can only operate on the visible effects of mutations, the efficiency Mnot only drives adaptive processes but also contributes to of selection to reduce the germ-line mutation rate is constrained genetic disorders, and in some cases, extinction. Understanding by the amount of time during which mutations experience qui- the mutation rate is critical in determining rates of molecular escent germ-line sequestration (15). If this hypothesis is correct, evolution, estimating effective population sizes, understanding the mutation rate per cell division in P. tetraurelia (and in the impact of mutations on organismal fitness, and evaluating the general) is expected to be lower than that in other unicellular power of drift, selection, and recombination in shaping genomes species. Thus, application of the MA procedure to P. tetraurelia (1). However, because of the extraordinarily high degree of provides a unique opportunity to shed light on the mechanisms replication fidelity in most organisms, procuring accurate meas- driving the evolution of the mutation rate. ures of mutation rate has been fraught with difficulties. The re- Results cent application of high-throughput sequencing technology to mutation-accumulation (MA) lines has now revolutionized this Mutation Rate. To estimate the mutation rate in P. tetraurelia,we area of inquiry, yielding essentially unbiased estimates of the analyzed the macronuclear genomes of seven P. tetraurelia MA genome-wide spontaneous mutation rate and spectrum in several lines that had been propagated through single-cell bottlenecks for eukaryotes (2–5). Under the MA process, repeated population ∼3,300 cell divisions starting from an isogenic state and sub- bottlenecks minimize the efficiency of selection, enabling even sequently restricted to periodic autogamy. At the time of the final highly deleterious mutations to accumulate in an effectively assay, essentially all detected mutations will have resided in the fl neutral fashion. At various points in the MA process, the accu- germ line at the last autogamy, and hence will re ect true germ- mulated pool of mutations in the individual lines can be assayed line mutations. A previous estimate of the scaling of the mutation ∼ × using high-throughput sequencing. So far, MA experiments in rate and genome size predicts a base-substitution rate of 1.5 −9 ∼ eukaryotes have shown that per-generation mutation rates gen- 10 per site per generation given the 72-Mb genome size of P. tetraurelia erally correlate with genome size and effective population size (6, 7, 16), which would have generated at least a few (6, 7). However, the available data involving whole-genome se- hundred base substitutions in each MA line. However, across all quencing of MA experiments include only a limited number of unicellular eukaryotes (2, 8), and the factors driving mutation- rate evolution across the eukaryotic domain remain unclear. Author contributions: W.S., W.K.T., and M.L. designed research; W.S., A.T., T.G.D., and E.C. To further understand mutation-rate evolution in eukaryotes, performed research; W.S. analyzed data; and W.S., W.K.T., and M.L. wrote the paper. fl we have applied the MA process to the ubiquitous unicellular The authors declare no con ict of interest. freshwater ciliate Paramecium tetraurelia. This species, and cil- This article is a PNAS Direct Submission. iates in general, have a distinct reproductive biology that may Data deposition: The sequences reported in this paper have been deposited in the P. tetraurelia National Center for Biotechnology Information Short Read Archive (SRA), www.ncbi. constrain mutation-rate evolution. exhibits nuclear nlm.nih.gov/sra, [accession nos. SRP013857 (study), SRS346546 (sample), and dimorphism, harboring a transcriptionally silent (germ-line) mi- SRX155532 (experiment)]. cronucleus and a transcriptionally active (somatic) macronu- 1To whom correspondence should be addressed. E-mail: [email protected]. fi cleus. The species normally reproduces by mitotic binary ssion, This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. but after ∼75 cell divisions under high-growth conditions (data 1073/pnas.1210663109/-/DCSupplemental.

www.pnas.org/cgi/doi/10.1073/pnas.1210663109 PNAS Early Edition | 1of6 Downloaded by guest on September 28, 2021 seven lines (an average of 63.2 Mb analyzable sequence per line), (71%) and for silent sites alone (76%) (Table S2) are sub- we identified only 29 mutations, yielding an overall base-sub- stantially below this expectation, implying that the A/T mutation − stitution mutation rate of 1.94 (SE = 0.36) × 10 11 per site per bias has historically been opposed by other forces. Although we cell division (Table 1). A separate maximum-likelihood approach cannot rule out the possibility that the P. tetraurelia genome is − (17) yielded a slightly higher rate of 2.64 (SE = 0.22) × 10 11 per currently moving toward mutation equilibrium, the structure of site per cell division. At a level ∼75× lower than the expectation the P. tetraurelia genome may facilitate G/C maintenance by bi- for eukaryotes with a similar genome size, and even 10× lower ased gene conversion (23). The inverse correlation between than most prokaryotic estimates, the P. tetraurelia mutation rate is chromosome size and G/C content in the P. tetraurelia genome by far the lowest ever observed. (10) is consistent with an increased density of crossover events Our analysis also revealed a total of five insertions (but no (and associated conversion events) in smaller chromosomes, es- deletions) across the seven lines, with only one involving a simple pecially when one considers the G/C bias of gene conversion sequence repeat (Table S1), yielding an insertion/deletion rate of found in most organisms (24). − 3.87 (2.10) × 10 12 per site per cell division. The low proportion of all mutations that are insertion/deletion events (0.15) is con- Mitochondrial Mutations. The same selective pressure that is sta- sistent with observations in humans (0.06%) (18) and Arabidopsis bilizing the mutation rate in the silent P. tetraurelia germ line thaliana (0.10) (4). should not apply to the mitochondrial genome in this species, as Using the annotation of the P. tetraurelia genome (19), we mitochondrial genes are expressed every generation, although attempted to identify the functional context of each base sub- mutations are expected to be found in a heteroplasmic state for stitution (Table 1). Twenty-three of the 29 substitutions (79.3%) a number of cell divisions after first appearance. Using methods were in coding regions, with the remaining six found in intergenic identical to those for detecting nuclear mutations, with an av- sites, consistent with the random expectation based upon overall erage of 39,422 analyzed sites per line, we discovered no fixed genome composition (χ2 test, P = 0.48, 2 df) (Table 2). If the mitochondrial mutations across the seven lines, but were able to substitution mutations in coding regions are randomly distrib- estimate the base-substitution mutation rate by using the fraction uted across P. tetraurelia codons (Table S2), 29.1% are expected of mutations attaining significantly higher frequencies (>3 SDs) to result in silent changes. The observed ratio (8 of 23 = 34.8%) than expected from random sequencing error (Fig. S1). Sum- does not differ significantly from this expectation (χ2 test, P = ming the allele frequencies for all heteroplasmic MA-derived df mitochondrial base substitutions, we obtain an estimate of 6.96 0.55, 1 ), suggesting that purifying selection on coding-region − mutations was not a significant force in the MA lines. (2.44) × 10 8 base-substitution mutations per site per cell di- vision (Table S3), which is quite high compared with most other − Mutational Bias. There are 5.2× more G/C→A/T base sub- eukaryotes (∼1 × 10 8) (2, 6, 25, 26) and ∼2,600× the stitutions than A/T→G/C mutations in this species, and taking P. tetraurelia nuclear rate, but consistent with the observation into consideration the A/T genome composition (71%), this that Paramecium mitochondrial silent-site diversity estimates are implies that G/C→A/T base substitutions arise ∼12.9-times more among the highest ever reported (27). frequently per target site than do A/T→G/C base substitutions (Fig. 1). This mutation bias toward A/T, which is consistent with Discussion observations made in all other species to date (2–5, 20, 21), may During periods of vegetative propagation in P. tetraurelia, the be a consequence of the spontaneous deamination of cytosine germ-line micronucleus is replicated but not expressed (10, 24), and the conversion of guanine to 8-oxo-guanine (22). Over the and hence germ-line mutation accumulation is unrestrained by course of the experiment, zero A:T→G:C transitions were ob- natural selection until expression by the new macronucleus fol- served (Fig. 1). If we assume that mutations are Poisson dis- lowing an episode of sexual reproduction. If the P. tetraurelia tributed, there is a >0.95 probability of not seeing any A:T→G:C micronucleus had a typical base-substitution mutation rate for its − base-substitution mutations across seven lines if the rate of origin genome size [∼1.5 × 10 9 per site per generation with a genome − of such mutations is ≤2.51 × 10 12. size of 72 Mb (6)], each genome would be expected to accu- If the P. tetraurelia genome has reached a nucleotide-content mulate ∼7.9 mutations over the 75 asexual generations, and the equilibrium from mutation pressure alone, a genome composition subsequent exposure following autogamy would likely impose of 93% A/T would be expected based on the observed mutation a very high burden for a genome with 78% coding density (24). spectrum. However, the A/T contents for both the entire genome Thus, we propose that lengthy periods of germ-line sequestration

Table 1. P. tetraurelia summary statistics, base-substitution distribution, and mutation rate MA line

Base substitutions 15 25 30 40 55 60 70 Pooled

Intergenic 1020111 6 Intron 0000000 0 Synonymous 3110201 8 Nonsynonymous 342203115 Ts/Tv ratio 0.40 0.25 0.67 – 2.00 0.33 – 0.73 Read depth 31.55 86.54 59.36 28.54 45.36 42.79 73.78 52.56 Analyzed sites (×107) 6.59 7.02 7.00 4.47 6.97 5.45 7.01 6.36 Generations (×103) 3.31 3.32 3.31 3.31 3.31 3.31 3.31 3.31 − Mutation rate (×10 11) 3.21 2.14 2.16 1.36 1.30 1.73 1.67 1.94 SE (×10−11) 1.21 0.96 0.97 0.96 0.75 0.87 0.96 0.36

Base-substitution distribution, transition/transversion (Ts/Tv) ratios, number of analyzable sites per line, gener- ations at time of sequencing, mutation rate per site per generation, and the SE of the base-substitution mutation rate per site per generation across seven P. tetraurelia MA lines. Pooled column is the total sum for summary statistics, and average for mutation rates.

2of6 | www.pnas.org/cgi/doi/10.1073/pnas.1210663109 Sung et al. Downloaded by guest on September 28, 2021 Table 2. P. tetraurelia mutation accumulation derived

mutations ) 2.4 -11 Line Scaff Position Cons Mut Type Orig AA New AA

15 110 113028 C A EX H N 2.0 15 125 27587 A T EX P P 15 150 6745 T G EX F V 1.6 15 184 1534 A T EX H L 15 560 447 G A EX I I 15 63 208 A T IG 1.2 15 72 326857 C T EX I I 25 11 241977 C G EX F L 0.8 25 157 128026 A T EX L L 25 32 259470 C G EX R P 25 6 647359 A T EX L Q 0.4 25 89 192525 C T EX T I 30 103 52586 C T EX P S

30 145 29743 G T IG Conditional Base Substitution0.0 Rates (x 10 30 175 100327 A T IG AT > GC GC > AT AT > TA GC > TA AT > CG GC > CG 30 65 330649 C A EX H N Transitions Transversions 30 88 68666 G A EX F F 40 1 547782 T G EX K Q Fig. 1. Conditional base-substitution rates for P. tetraurelia MA lines. Rates 40 8 591378 C G EX S * for each base-substitution type normalized by genome base composition; error bars indicate one SE. Gray horizontal line indicates the average mu- 55 515 1496 G A EX I I tation rate per site across all lines, with gray shading showing the SE. 55 62 62583 G A IG 55 73 359233 T A EX S S 60 102 226059 G T EX L I

Viewed in this way, the mutation rate per sexual episode in EVOLUTION 60 16 307226 G A EX S L P. tetraurelia is equal to 75× the rate per germ-line cell division, 60 57 454181 T G IG which is essentially the same as the per-generation expectation − 60 162 73827 C A EX C F for other eukaryotes with similar genome sizes, ∼1.5 × 10 9 per 70 146 108424 G T EX P P site (Fig. 2). Thus, the extraordinarily low mutation rate in this 70 60 187448 A T EX K N species appears to be a consequence of the high efficiency of 70 79 28592 C A IG selection operating on this globally abundant species combined P. tetraurelia mutations identified using the consensus approach. Cons, with the added constraint of germ-line sequestration. The un- consensus nucleotide; Line, mutation accumulation line; Mut, mutation nu- usual level of mutation-rate depression observed in Paramecium cleotide; New AA, new amino acid; Orig AA, original amino acid; Scaff, scaf- is not expected in unicellular species that lack a protected germ fold; Type, coding context (EX, exon; IG, intergenic). line (e.g., bacteria and yeast), as such cells immediately experi- ence the effects of mutations. in P. tetraurelia promote unusually strong selection for low germ- line mutation rates (on a per-cell division basis).

Because the majority of mutations with fitness effects are Mus musculus slightly deleterious (1, 28), natural selection will generally operate

to promote mechanisms that minimize the production of replica- ) -9 tion errors and maximize the efficiency of repair of nonreplicating 10 Homo sapiens DNA (e.g., higher fidelity polymerases, and improved repair enzymes) (6, 11, 14). However, the point at which the advantage of Drosophila melanogaster

a further reduction in the mutation rate equals the power of Caenorhabditis elegans fi random genetic drift (resulting from nite population size) rep- Paramecium tetraurelia (per sexual episode) resents the lower bound to which the mutation rate can be driven 1 by natural selection (15). Because selection can only operate on

expressed mutations, the unit of selection on the mutation rate in Saccharomyces cerevisiae a species with a sequestered germ line is the pool of deleterious mutations accumulated over all germ-line cell divisions, which 0.1 generally have no phenotypic effects until emerging into a soma in the following generation. Thus, in a metazoan, selection operates to minimize the per-generation mutation rate, and this is accom- Paramecium tetraurelia plished by reducing the mutation rate per cell division by a factor Base-Substitution Mutation Rate (x10 equal to the number of germ-line cell divisions per generation. As 0.01 a consequence, although mammals have the highest known per- 0.01 0.1 1 generation mutation rates, the rates per cell division are very low GenomeSize(Mb) in the germ line (6). Not all mutations are deleterious, and the fraction of mutations that are beneficial may vary with environ- Fig. 2. Base-substitution rate scaling for P. tetraurelia per sexual episode. – Filled squares represent base-substitution mutation rates derived from hu- mental context (29 32). However, for sexually reproducing man disease alleles (18) and MA projects (2–5). P. tetraurelia open circle organisms like P. tetraurelia, the indirect selection for a higher fi represents the base-substitution mutation rate per cell division from this mutation rate associated with bene cial mutations is expected experiment, and the P. tetraurelia filled circle indicates the base-substitution to be small relative to the downward pressure associated with mutation rate per sexual episode. Linear regression for filled icons only (r2 = background deleterious mutations (14, 33). 0.85, P = 0.002).

Sung et al. PNAS Early Edition | 3of6 Downloaded by guest on September 28, 2021 The high level of replication fidelity in P. tetraurelia may be lineage (Table 3). Nevertheless, it remains unclear why residues achieved via modifications in replication enzymes, as alterations strongly conserved in other eukaryotic lineages and presumably of critical domains in B-family replicative polymerases are known essential for high replication fidelity would be specifically altered to have direct effects on mutation rates (34–36). B-family repli- in ciliates to achieve even higher accuracy. cative polymerases α, δ, ε,andζ are primary enzymes involved in Despite the global distribution and high local abundances of initiating replication, synthesizing the leading and lagging strand, the study species, conservative estimates of standing levels of and allowing for synthesis of DNA across lesions (35, 37, 38). In silent-site heterozygosity within species of the P. aurelia complex addition to being involved in the synthesis of the leading and (πs = 0.0096) are not extraordinarily high (27, 40), and for other lagging strands, δ and ε polymerases also contain 3′ proofreading ciliates may be even lower [e.g., Tetrahymena thermophila, πs = exonuclease domains critical to excision of incorrectly polymer- 0.0030 (41)], an observation that lead the latter authors to sug- ized nucleotides (39). Using BLAST (Materials and Methods), we gest that effective population sizes (Ne) may be low in ciliates identified all P. tetraurelia B-family DNA polymerases (α, δ, ε,and (41). A re-evaluation of this interpretation appears to be in or- ζ homologs) and compared these to other eukaryotes. Remark- der, as it is derived under the assumption of a substantially ably, ciliate-specific amino acid changes in the primary catalytic higher mutation rate than observed herein. Although low per- sites of α-region III and ζ-region III result in a switches of amino generation mutation rates may not be common to all ciliates, acid polarity (T→V; T→V/I/M) (Table 3) in active sites that are assuming standing levels of silent-site diversity reflect the neutral otherwise highly conserved across eukaryotes. Moreover, the expectation (4Neu), by factoring out the term 4u, we estimate ε 8 7 proofreading exonuclease of DNA polymerase exhibits ciliate- that Ne is on the order of 10 for P. tetraurelia and perhaps 10 specific changes in the highly conserved region II to highly for T. thermophila. Because there is substantial evidence of se- charged amino acids (F→R/K) (Table 3). Although such changes lection on silent sites (42), especially in species with an elevated in amino acid sequence might not necessarily translate into an Ne, these estimates are likely to be downwardly biased. improvement in DNA fidelity and lower mutation rates, the Finally, it is worth noting that like P. tetraurelia, the distantly unique sequences of ciliate DNA polymerases suggest a funda- related ciliate T. thermophila shares strikingly similar mod- mental change in the mechanisms of replication fidelity in this ifications to active sites in B-family DNA polymerases (Table 3),

Table 3. Differences in α and ε B-family DNA polymerases of P. tetraurelia Polymerase catalytic sites Proofreading 3′→5′ exonuclease

Species Region III Region I Exo II Exo III α AA change * AA position 839 885 H. sapiens KLTANSMYGC I YGDTDS I M. musculus KLTANSMYGC I YGDTDS I D. melanogaster KLTANSMYGCVYGDTDS L C. elegans KLTANSMYGCVYGDTDS I S. cerevisiae KLTANSMYGCVYGDTDS V A. thaliana KLTANSMYGC I YGDTDS I T. pseudonana KLTANSMYGC I YGDTDS I T. cruzi KLTANSMYGC I YGDTDS V T. thermophila KLVANSMYGC I YGDTDS L P. tetraurelia KLVANS I YGC I YGDTDSM T. gondii KLTANS I YGCVYGDTDS I G. lamblia KLCANAMYGS I YGDTDS I ε AA change * + * AA position 783 825 314 410 H. sapiens KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA M. musculus KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA D. melanogaster KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA C. elegans KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA S. cerevisiae KV I LNS F YGY L E LDTDG I NGDFFDYSVSDA A. thaliana KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA T. pseudonana KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA T. cruzi KC I LNS F YGY L E LDTDG I NGDYFDYSVSDA T. thermophila K I I LNS F YGY L E LDTDG I NGDKFDYSVSDA P. tetraurelia KI I LNS FYGYL ELDTDG I NGDRFDYSISDS T. gondii KC I LNS F YGYME LDTDG I NGDTFDYSVSDA G. lamblia KC I LNS F YGY L E LDTDG I NGDFFDYSVSDA

Subset of conserved residues from DNA polymerases α and ε (46), α exonucleases are not involved in proofreading. AA change: “*” designates ciliate- specific amino acid change in polarity; “+” designates ciliate-specific amino acid change in charge. AA position indicates amino acid position in each gene. Species and gene identifier number (α/ε) or genome flat file (GFF) number from top to bottom: Homo sapiens (106507301/3192938), Mus musculus (6679409/ 195947387), Drosophila melanogaster (217344/23172053), Caenorhabditis elegans (257146921/17507143), Arabidopsis thaliana (332010917/3885342), Saccha- romyces cerevisiae (929851/171409), Trypanosoma cruzi (71661998/70882804), Thalassiosira pseudonana (220967910/220976143), Tetrahymena thermophila (47.m00199/3812.m02363, GFF), Paramecium tetraurelia (GSPATP00014027001/GSPATP00033819001, GFF), Toxoplasma gondii (4808579/237841111), and Giardia lamblia (159108384/159115199). Dark gray boxes indicate residues that are specifictoP. tetraurelia and T. thermophila.

4of6 | www.pnas.org/cgi/doi/10.1073/pnas.1210663109 Sung et al. Downloaded by guest on September 28, 2021 and also undergoes multiple rounds of vegetative reproduction positives using traditional sequencing. For all cases, the wild-type nucleotide before a sexual cycle. Although further biochemical assays are was also confirmed at the mutation site in at least 3 other lines without needed to confirm whether modifications in B-family poly- the mutation. merases are truly responsible for the reduction in mutation rate, Mutation Rate Calculations. To calculate the base-substitution mutation rate per the nuclear dimorphism unique to ciliated protozoa suggests that = ∕ this lineage is a logical target in the search for commercially site per cell division for each line, we used the following equation ubs m nT, fi where ubs is the base-substitution mutation rate (per haploid nucleotide site useful, high- delity DNA polymerases. per generation), m is the number of observed base substitutions, n is the Materials and Methods number of haploid nucleotide sites analyzed and T is the number of gen- erations that occurred in the mutationp accumulationffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi line. The SE for an in- Mutation Accumulation Process and DNA Extraction. For ∼3,300 generations, = ∕ dividual line is calculated using ubs nT,p andffiffiffiffi the total SE of base- 100 independent P. tetraurelia MA lines (reference strain d4-2) were passed substitution mutation rate is given by SEpooled = s ∕ N,where(s) is the average through single-cell bottlenecks every day to a new tube with fresh wheat SE of the mutation rates across all lines and (N) is the number of lines analyzed. grass medium (yeast extract/cerophyl/Na HPO /stigmasterol/H O) seeded 2 4 2 The same calculation was used to calculate insertion/deletion (indel) mutation with Klebsiella pneumonia. rate, with u replaced with u . Self-fertilization (autogamy) was stimulated by starvation, which was bs indel The maximum-likelihood estimates of the base-substitution mutation (u) prevented during daily single-cell transfers to fresh medium. In the absence of and error rate (∈) across all lines were obtained from the maximum of the autogamy, Paramecium cells senesce. Therefore, every 20–25 d (∼75 gen- log-likelihood function (17), as previously described (4). The initial values of erations), cells from each P. tetraurelia MA line were transferred into a single × −11 ∈ ∼ well. After approximately 3 d in the well, the cells reach saturation and un- u (1.94 10 per site per generation) and ( 0.005 per site) used to ini- dergo autogamy. Autogamy was detected with an Aceto-Carmine nuclear tialize the optimization process were those obtained by the consensus stain that shows fragmentation of the old macronucleus. Single cells were method. A minimum coverage cutoff of 10 and maximum coverage cutoff fi picked when the majority of cells in a well are undergoing autogamy, and of 100 were used to obtain the nal maximum-likelihood estimate. they and their descendants were then propagated using daily single-cell transfers until the next scheduled autogamy. Because macronuclear frag- Mitochondrial Analysis. To determine mitochondrial base-substitution rates, it ments were retained for several generations in the progeny, we were able to is necessary to first set a minimum allele-frequency cutoff for calling a het- ensure that the MA lines went through autogamy by continually monitoring eroplasmic mitochondrial base substitution. Prior studies involving Illumina descendants of each line. reads have shown low false-positive rates in calling heteroplasmic mito- Over the course of the 4-y experiment, 52 lines died out. It is impossible to chondrial base substitutions when requiring a minimum of three forward and say in each case if this was a result of mutational processes, or to handling. three reverse reads with a minimum allele-frequency cutoff greater than 0.10 EVOLUTION Attempts were made in all cases to revert to an earlier generation, but failed (43). We applied the same criteria across an average of 39,422 mitochondrial for the 52 lines. Paramecium cells are difficult to cryogenically preserve, sites in each P. tetraurelia MA line. The MA process does not enforce limiting how far back we could go to revive lines. Eight of the 48 remaining a complete organelle bottleneck, so unlike the nuclear genome, selection lines were randomly selected for DNA extraction. may be still operating at a low efficiency, and mutations will remain in Cells of each selected line were filtered using 10-μm Nitex filter cloth, and a heteroplasmic state. To determine mitochondrial base-substitution rates, it washed several times in Drys Buffer (sodium citrate/NaH2PO4/Na2HPO4/ is necessary to assume that the ultimate fixation of a mitochondrial base CaCl2). Homogenization and lysis was done using 0.5% Nonidet P-40 sus- substitution is determined solely by genetic drift. Under this assumption, pended in MgCl2/TrisCl/sucrose, which leaves the macronuclei intact. After neutral theory dictates that the fixation rate is equal to the mutation rate centrifugation of the lysate, the supernatant containing most of the bac- (44), and the probability of fixation (di) is dependent on the current fre- teria, mitochondria, and micronuclei was discarded. DNA extraction of the quency within an individual (i) (26). Sequencing error can contribute to allele pellet containing relatively pure macronuclei was done using CTAB buffer, frequencies at a site, so we subtracted the MA-specific error rates (Fig. S2) and was further purified to Illumina library standards using phenol chloro- for the reference nucleotide type nucerr at the heteroplasmic site. The form and ethanol precipitation. The P. tetraurelia lines were sequenced final estimate for mitochondrial base-substitution rate for each line is using the Illumina GAIIx platform and mapped to the reference genome calculated by: (SI Materials and Methods). XÀ Á À Á ubs = di − nucerr ∕ nT : Mutation Identification Procedure. To identify putative mutations (putations), i a consensus approach was used, comparing each individual line (focal line) against the consensus of all of the remaining lines. This methodology has previously been used in multiple MA experiments, and is robust against B-Family DNA Polymerases. The B-family DNA polymerases α, δ, ε, and ζ of 12 sequencing or alignment errors that may have occurred in the reference eukaryotic species were obtained from the National Center for Bio- genome (2–4). With the consensus approach, we identified a total of 111 technology Information (NCBI) data repository (Table 3). The sequence of putations before the filtering process. After filtering for paralogy and se- each individual DNA polymerase was used to search for homology in the – quencing errors (SI Materials and Methods and Tables S4 S7), we reduced gene database for P. tetraurelia (19, 24) and T. thermophila (45) using − the number of putations to 29. The 29 remaining mutations (Table 2) are BLASTP (minimum e-value cutoff of 10 10 to identify homology). To detect ∼ 26.8% of the original putations (29 of 108), which is comparable to the differences in each polymerase, the resulting BLAST matches were aligned fi fi nal ratio of mutations to putations observed after ltering in other MA with the original DNA polymerases from the NCBI. experiments [e.g., A. thaliana MA experiment 99 of 538 or ∼18.4% after filtering (3)]. A subset of randomly selected filtered putations was directly fl ACKNOWLEDGMENTS. Support for this study was provided by National Insti- sequenced in both directions using standard uorescent sequencing tech- tutes of Health Grant R01 GM036827 (to M.L. and W.K.T.); National Science nology on an ABI3730. Across the seven P. tetraurelia MA lines, 12 of the Foundation Grant MCB-1050161 (to M.L.), and US Department of Defense 12 randomly selected putations were verified as MA-derived mutations, and Multidisciplinary University Research Initiative Award W911NF-09-1-0444 (to 10 of the 10 randomly selected filtered putations were verified as false- M.L., P. Foster, H. Tang, and S. Finkel).

1. Baer CF, Miyamoto MM, Denver DR (2007) Mutation rate variation in multicellular 8. Sung W, Ackerman MS, Miller SF, Doak TG, Lynch M (2012) The drift-barrier hy- eukaryotes: Causes and consequences. Nat Rev Genet 8(8):619–631. pothesis and mutation-rate evolution. Proc Natl Acad Sci USA. 2. Lynch M, et al. (2008) A genome-wide view of the spectrum of spontaneous muta- 9. Berger JD (1986) Autogamy in Paramecium. Cell cycle stage-specific commitment to tions in yeast. Proc Natl Acad Sci USA 105(27):9272–9277. . Exp Cell Res 166(2):475–485. 3. Denver DR, et al. (2009) A genome-wide view of Caenorhabditis elegans base-sub- 10. Duret L, et al. (2008) Analysis of sequence variability in the macronuclear DNA of stitution mutation processes. Proc Natl Acad Sci USA 106(38):16310–16314. Paramecium tetraurelia: A somatic view of the . Genome Res 18(4):585–596. 4. Ossowski S, et al. (2010) The rate and molecular spectrum of spontaneous mutations 11. Kimura M (1967) On the evolutionary adjustment of spontaneous mutation rates. in Arabidopsis thaliana. Science 327(5961):92–94. Genet Res 9(1):23–24. 5. Keightley PD, et al. (2009) Analysis of the genome sequences of three Drosophila mel- 12. Dawson KJ (1999) The dynamics of infinitesimally rare alleles, applied to the evolution of anogaster spontaneous mutation accumulation lines. Genome Res 19(7):1195–1201. mutation rates and the expression of deleterious mutations. Theor Popul Biol 55(1):1–22. 6. Lynch M (2010) Evolution of the mutation rate. Trends Genet 26(8):345–352. 13. Johnson T (1999) The approach to mutation-selection balance in an infinite asexual 7. Lynch M, Conery JS (2003) The origins of genome complexity. Science 302(5649):1401–1404. population, and the evolution of mutation rates. Proc Biol Sci 266(1436):2389–2397.

Sung et al. PNAS Early Edition | 5of6 Downloaded by guest on September 28, 2021 14. Sniegowski PD, Gerrish PJ, Johnson T, Shaver A (2000) The evolution of mutation 31. Hall DW, Mahmoudizad R, Hurd AW, Joseph SB (2008) Spontaneous mutations in rates: Separating causes from consequences. Bioessays 22(12):1057–1066. diploid Saccharomyces cerevisiae: Another thousand cell generations. Genet Res 15. Lynch M (2011) The lower bound to the evolution of mutation rates. Genome Biol (Camb) 90(3):229–241. – Evol 3:1107 1118. 32. Joseph SB, Hall DW (2004) Spontaneous mutations in diploid Saccharomyces cer- – 16. Lynch M (2006) The origins of eukaryotic gene structure. Mol Biol Evol 23(2):450 468. evisiae: More beneficial than expected. Genetics 168(4):1817–1825. fi 17. Lynch M (2008) Estimation of nucleotide diversity, disequilibrium coef cients, and 33. Johnson T (1999) Beneficial mutations, hitchhiking and the evolution of mutation mutation rates from high-coverage genome-sequencing projects. Mol Biol Evol 25 rates in sexual populations. Genetics 151(4):1621–1631. (11):2409–2419. 34. Pavlov YI, Shcherbakova PV, Kunkel TA (2001) In vivo consequences of putative active 18. Lynch M (2010) Rate, molecular spectrum, and consequences of human mutation. site mutations in yeast DNA polymerases alpha, epsilon, delta, and zeta. Genetics 159 Proc Natl Acad Sci USA 107(3):961–968. – 19. Arnaiz O, Sperling L (2011) ParameciumDB in 2011: New tools and new data for (1):47 64. fi functional and comparative genomics of the model ciliate Paramecium tetraurelia. 35. Kunkel TA (2009) Evolving views of DNA replication (in) delity. Cold Spring Harb – Nucleic Acids Res 39(Database issue):D632–D636. Symp Quant Biol 74:91 101. 20. Hershberg R, Petrov DA (2010) Evidence that mutation is universally biased towards 36. Loh E, Salk JJ, Loeb LA (2010) Optimization of DNA polymerase mutation rates during AT in bacteria. PLoS Genet 6(9):e1001115. bacterial evolution. Proc Natl Acad Sci USA 107(3):1154–1159. 21. Hildebrand F, Meyer A, Eyre-Walker A (2010) Evidence of selection upon genomic GC- 37. Hubscher U, Maga G, Spadari S (2002) Eukaryotic DNA polymerases. Annu Rev content in bacteria. PLoS Genet 6(9):e1001107. Biochem 71:133–163. 22. Duncan BK, Miller JH (1980) Mutagenic deamination of cytosine residues in DNA. 38. Wang TS (1991) Eukaryotic DNA polymerases. Annu Rev Biochem 60:513–552. Nature 287(5782):560–561. 39. Shevelev IV, Hübscher U (2002) The 3′ 5′ exonucleases. Nat Rev Mol Cell Biol 3(5): 23. Duret L, Galtier N (2009) Biased gene conversion and the evolution of mammalian 364–376. genomic landscapes. Annu Rev Genomics Hum Genet 10:285–311. 40. Snoke MS, Berendonk TU, Barth D, Lynch M (2006) Large global effective population 24. Aury JM, et al. (2006) Global trends of whole-genome duplications revealed by the sizes in Paramecium. Mol Biol Evol 23(12):2474–2479. – ciliate Paramecium tetraurelia. Nature 444(7116):171 178. 41. Katz LA, Snoeyenbos-West O, Doerder FP (2006) Patterns of protein evolution in 25. Denver DR, Morris K, Lynch M, Vassilieva LL, Thomas WK (2000) High direct estimate Tetrahymena thermophila: Implications for estimates of effective population size. of the mutation rate in the mitochondrial genome of Caenorhabditis elegans. Science Mol Biol Evol 23(3):608–614. 289(5488):2342–2344. 42. Lynch M (2007) The Origins of Genome Architecture (Sinauer Associates, Sunderland, 26. Haag-Liautard C, et al. (2008) Direct estimation of the mitochondrial DNA mutation MA.). rate in Drosophila melanogaster. PLoS Biol 6(8):e204. 43. Li M, et al. (2010) Detecting heteroplasmy from high-throughput sequencing of 27. Catania F, Wurmser F, Potekhin AA, Przybos E, Lynch M (2009) Genetic diversity in the – Paramecium aurelia species complex. Mol Biol Evol 26(2):421–431. complete human mitochondrial DNA genomes. Am J Hum Genet 87(2):237 249. 28. Eyre-Walker A, Keightley PD (2007) The distribution of fitness effects of new muta- 44. Kimura M (1983) The Neutral Theory of Molecular Evolution (Cambridge Univ Press, tions. Nat Rev Genet 8(8):610–618. Cambridge, UK). 29. Rutter MT, et al. (2012) Fitness of Arabidopsis thaliana mutation accumulation lines 45. Eisen JA, et al. (2006) Macronuclear genome sequence of the ciliate Tetrahymena whose spontaneous mutations are known. Evolution 66(7):2335–2339. thermophila, a model eukaryote. PLoS Biol 4(9):e286. 30. Hall DW, Joseph SB (2010) A high frequency of beneficial mutations across multiple 46. Pavlov YI, Shcherbakova PV, Rogozin IB (2006) Roles of DNA polymerases in replica- fitness components in Saccharomyces cerevisiae. Genetics 185(4):1397–1409. tion, repair, and recombination in eukaryotes. Int Rev Cytol 255:41–132.

6of6 | www.pnas.org/cgi/doi/10.1073/pnas.1210663109 Sung et al. Downloaded by guest on September 28, 2021