
Determinants of spontaneous mutation in the bacterium Escherichia coli as revealed by whole-genome sequencing

Patricia L. Foster (a,1), Heewook Lee (b), Ellen Popodi (a), Jesse P. Townes (a), and Haixu Tang (b)

(a) Department of Biology, Indiana University, Bloomington, IN 47405; and (b) School of Informatics and Computing, Indiana University, Bloomington, IN 47405

Edited by Graham C. Walker, Massachusetts Institute of Technology, Cambridge, MA, and approved September 16, 2015 (received for review June 21, 2015)

A complete understanding of evolutionary processes requires that factors determining spontaneous mutation rates and spectra be identified and characterized. Using mutation accumulation followed by whole-genome sequencing, we found that the mutation rates of three widely diverged commensal Escherichia coli strains differ only by about 50%, suggesting that a rate of 1–2 × 10^−3 mutations per generation per genome is common for this bacterium. Four major forces are postulated to contribute to spontaneous mutations: intrinsic DNA polymerase errors, endogenously induced DNA damage, DNA damage caused by exogenous agents, and the activities of error-prone polymerases. To determine the relative importance of these factors, we studied 11 strains, each defective for a major DNA repair pathway. The striking result was that only loss of the ability to prevent or repair oxidative DNA damage significantly impacted mutation rates or spectra. These results suggest that, with the exception of oxidative damage, endogenously induced DNA damage does not perturb the overall accuracy of DNA replication in normally growing cells and that repair pathways may exist primarily to defend against exogenously induced DNA damage. The thousands of mutations caused by oxidative damage recovered across the entire genome revealed strong local-sequence biases of these mutations. Specifically, we found that the identity of the 3′ base can affect the mutability of a purine by oxidative damage by as much as eightfold.

mutation rate | mutation accumulation | evolution | DNA repair | oxidative DNA damage

Significance

Because genetic variation underlies evolution, a complete understanding of evolutionary processes requires identifying and characterizing the forces determining the stability of the genome. Using mutation accumulation and whole-genome sequencing, we found that spontaneous mutation rates in three widely diverged Escherichia coli strains are nearly identical. To determine the importance of DNA damage in driving mutation rates, we investigated 11 strains, each defective for a major DNA repair pathway. The striking result was that only loss of the ability to repair or prevent oxidative DNA damage significantly impacted mutation rates and spectra. These results suggest that, with the exception of those that defend against oxidative damage, DNA repair pathways may exist primarily to defend against DNA damage induced by exogenous agents.

Author contributions: P.L.F., E.P., and H.T. designed research; E.P. and J.P.T. performed research; H.L. and H.T. contributed new reagents/analytic tools; P.L.F., H.L., E.P., and J.P.T. analyzed data; and P.L.F. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
Data deposition: The sequences reported in this paper have been deposited in the National Center for Biotechnology Information Sequence Read Archive (BioProject accession no. SRP013707) and in IUScholarWorks Repository (URI hdl.handle.net/2022/20340).
(1) To whom correspondence should be addressed. Email: [email protected].
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1512136112/-/DCSupplemental.

A complete understanding of the evolution and stability of the genome requires that the determinants of spontaneous mutation be identified and characterized. Among the variety of mistakes that can occur during DNA transactions, four sources of sequence variation appear to dominate in prokaryotes: intrinsic DNA polymerase errors, endogenously induced DNA damage, DNA damage induced by exogenous agents, and the activities of error-prone polymerases. This conclusion is based on changes in the rates and spectra of mutations that occur when genes affecting these processes are deleted or amplified. In particular, loss of a DNA repair pathway often gives a mutator phenotype, indicating that the pathway of interest exerts an important limitation on spontaneous mutation (1). However, investigations of the mutagenic impact of various DNA repair pathways have relied almost exclusively on reporter genes, leaving open the possibility that the results are biased by the particular features of the selected loci. This concern can be avoided by allowing mutations to accumulate nonselectively in DNA repair-defective strains and identifying the resulting sequence changes by whole-genome sequencing (WGS). Although this approach may miss rare but interesting mutational processes, it can reveal the overall threats to genomic stability and identify features, such as local sequence context, that influence mutational frequencies. Surprisingly, this technique has been used with the eukaryote Caenorhabditis elegans (2) but has not been extensively applied to prokaryotes.

The mutation accumulation (MA) protocol involves establishing multiple clonal populations from a single founder and then repeatedly passing the lines through single-individual bottlenecks (3, 4), which in bacteria is achieved easily by streaking for single colonies on agar medium. This procedure allows mutations to accumulate in an unbiased manner with a minimum of selective pressure. After a sufficient number of generations have occurred, the genomes are sequenced, and mutations are identified. Using this technique, we recently determined the intrinsic mutation rates and mutational spectra of repair-proficient strains of Escherichia coli and Bacillus subtilis and documented the mutational impact of the loss of the major error-correcting system, mismatch repair (MMR) (5–7). In the studies reported here we concentrate on E. coli, first asking if other commensal strains of E. coli have the same mutation rate and spectrum as our K12 strain and whether changing the growth medium influences mutation. Then we determined the mutational effects of the loss of several important DNA repair pathways. Our major conclusion is that, under the conditions of our experiments, mutation rates and spectra are nearly impervious to the loss of DNA repair functions except for those that deal with oxidative DNA damage. We also show that the mutagenicity of a major oxidative lesion, 7,8-dihydro-8-oxoguanine (8-oxoG), is highly dependent on the local sequence context.

Results and Discussion

Mutation Rates, Spectra, and Biases of Wild-Type E. coli Strains. Although E. coli has been a model organism for decades, estimates of its spontaneous mutation rate vary by more than an order of magnitude (ref. 8 and the references therein). In a previous study

www.pnas.org/cgi/doi/10.1073/pnas.1512136112 PNAS Early Edition | 1 of 10 Downloaded by guest on October 6, 2021

Table 1. Parameters of the MA/WGS experiments

| Strain | Description | No. of BPSs | No. of lines | Generations per line | BPS rate per genome (× 10^3) | 95% CL* | BPS rate per nucleotide (× 10^10)† | 95% CL* | Rate compared with PFM2 |
|---|---|---|---|---|---|---|---|---|---|
| PFM2‡ | WT | 246 | 61 | 4,230 | 0.95 | 0.17 | 2.05 | 0.36 | 1.00 |
| PFM2m | WT on minimal medium | 277 | 50 | 6,166 | 0.90 | 0.13 | 1.94 | 0.29 | 0.94 |
| ED1a | WT | 407 | 41 | 6,114 | 1.62 | 0.19 | 3.12 | 0.36 | 1.52 |
| IAI1 | WT | 422 | 47 | 6,342 | 1.42 | 0.15 | 3.01 | 0.32 | 1.47 |
| PFM101 | umuDC dinB | 269 | 43 | 6,078 | 1.03 | 0.15 | 2.22 | 0.33 | 1.08 |
| PFM133 | umuDC dinB polB | 252 | 45 | 6,204 | 0.90 | 0.15 | 1.95 | 0.32 | 0.95 |
| PFM35 | uvrA | 316 | 47 | 6,350 | 1.06 | 0.18 | 2.28 | 0.39 | 1.11 |
| PFM40 | alkA tagA | 265 | 47 | 6,225 | 0.91 | 0.16 | 1.95 | 0.35 | 0.95 |
| PFM88 | ada ogt | 250 | 49 | 6,269 | 0.81 | 0.12 | 1.75 | 0.25 | 0.85 |
| PFM22 | nth nei | 209 | 50 | 920 | 4.54 | 0.79 | 9.79 | 1.71 | 4.77 |
| PFM180 | xthA nfo | 339 | 44 | 3,155 | 2.44 | 0.38 | 5.26 | 0.81 | 2.56 |
| PFM91 | nfi | 335 | 47 | 6,308 | 1.13 | 0.16 | 2.44 | 0.35 | 1.19 |
| PFM61 | mutT | 2,234 | 25 | 599 | 149 | 9 | 321 | 19 | 156 |
| PFM6 | mutY | 485 | 24 | 1,972 | 10.2 | 1.2 | 22.1 | 2.6 | 11 |
| PFM94 | mutM mutY | 4,282 | 24 | 1,916 | 93 | 8 | 201 | 18 | 98 |

*The SEM of the BPSs per line for each strain was multiplied by the critical value of the t distribution to give the 95% CL.
†Nucleotides per genome: K12 = 4,639,675; ED1a = 5,209,548; IAI1 = 4,700,560; the plasmids in ED1a and IAI1 are not included.
‡Data for PFM2 on LB are from ref. 5; two lines and 13 mutations are included that were not included in ref. 5 because they were shared among MA lines (see SI Materials and Methods). The mutation rate is the average of two PFM2 datasets; the number of generations per line is a weighted average of the two datasets; the 95% CL was derived from the sum of the variances of the BPS per line for the two datasets.
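As a rough check on Table 1, the per-genome and per-nucleotide BPS rates can be recomputed from the counts in the table. The sketch below assumes that the rate is simply the number of BPSs divided by the total generations (lines × generations per line), a formula this section does not state explicitly. The function and variable names are illustrative only.

```python
# Recompute BPS rates from Table 1 counts (illustrative sketch, not the authors' pipeline).
# Assumption: rate per genome = BPSs / (number of lines * generations per line).

# Nucleotides per genome, from the Table 1 footnote.
GENOME_SIZE = {"K12": 4_639_675, "ED1a": 5_209_548, "IAI1": 4_700_560}


def bps_rates(n_bps, n_lines, gens_per_line, genome_nt):
    """Return (rate per genome, rate per nucleotide), both per generation."""
    per_genome = n_bps / (n_lines * gens_per_line)
    return per_genome, per_genome / genome_nt


# (BPSs, lines, generations per line, genome size) for the three wild-type strains on LB.
strains = {
    "PFM2": (246, 61, 4230, GENOME_SIZE["K12"]),
    "ED1a": (407, 41, 6114, GENOME_SIZE["ED1a"]),
    "IAI1": (422, 47, 6342, GENOME_SIZE["IAI1"]),
}

rates = {name: bps_rates(*args) for name, args in strains.items()}
for name, (per_genome, per_nt) in rates.items():
    print(f"{name}: {per_genome * 1e3:.2f} x 10^-3 per genome, "
          f"{per_nt * 1e10:.2f} x 10^-10 per nucleotide")
```

Under this assumption the three wild-type rows of Table 1 are recovered to two decimal places. The 95% CLs are not recomputed here because they require the per-line mutation counts, which are not listed in this section.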

using MA/WGS, we found that the rate of spontaneous base-pair substitutions (BPSs) of E. coli K12 was 2 × 10^−10 mutations per nucleotide per generation (5), about a third of the rate expected based on results with reporter loci (9). One goal of the current study was to determine if this mutation rate pertains only to K12 or is characteristic of other E. coli strains. Thus, we performed MA/WGS with two relatively recently isolated human commensal E. coli strains, ED1a and IAI1, that are diverged from E. coli K12 (10).

The results for BPSs from all the MA/WGS experiments reported here are summarized in Table 1. The BPS rates in ED1a and IAI1 did not differ significantly from each other but were each ∼50% higher than that of our wild-type K12 strain, PFM2. Although these differences are significant (t = 4.6, 4.4, df = 22, 20, P = 1 × 10^−4 and 3 × 10^−4 for ED1a and IAI1, respectively), the rates nonetheless are within the same order of magnitude. We also determined that the mutation rate of PFM2 when cultured on minimal glucose medium was nearly the same as when it was cultured on LB medium, which supplies mainly peptides as carbon and nitrogen sources (Table 1).

The spectra of the types of BPSs were comparable among the three strains (Tables 2 and 3). Fig. 1A shows the conditional mutation rates, i.e., the rates of each type of BPS relative to the numbers of A:T or G:C base pairs in each genome. Because the mutation rates differ among the strains, it is more informative to look at the fractions of each type of BPS, again normalized to the number of A:T or G:C base pairs (Fig. 1B). The BPS spectra of all three strains were dominated by transitions, particularly G:C to A:T transitions, which is the expected spectrum based upon previous locus-specific and genome-wide experiments (8).

In all three strains the ratio of BPSs in coding versus noncoding DNA was almost half of that expected from the ratio of coding versus noncoding DNA in the genomes (Table S1). Indeed, 1,000 Monte Carlo simulations for each strain did not yield a ratio of BPSs in coding versus noncoding DNA as small as the observed ratio. This result does not reflect selection against mutations in coding DNA; for each strain the ratio of nonsynonymous to synonymous mutations did not differ significantly from the ratio obtained from simulations (Table S1). In our previous study we showed that the bias against coding DNA disappeared when MMR was defective, indicating that MMR preferentially repairs coding sequences (5).

The frequency of mutations in PFM2 grown on LB is biased by neighboring nucleotides; in particular, transitions at A:T base pairs in the context 5′ApC3′/3′TpG5′ occur twice as often as would be expected from the frequency of this dimer in the genome (5). (Here and throughout this paper the mutated base is underlined and in bold-face type.) Both ED1a and IAI1 also showed this bias, but for IAI1 the bias was not significant at the P ≤ 0.05 level (Table S2). All three strains growing on LB also showed a strong DNA strand bias: G:C to A:T transitions were twice as likely to occur with C templating the lagging strand and G templating the leading strand rather than vice-versa (Table S3) (5). Interestingly, culturing PFM2 on minimal glucose medium eliminated the sequence bias and diminished the strand bias (Tables S2 and S3).

DNA cytosine methyltransferase (Dcm) transfers a methyl group to the 5 position of the internal Cs in CCWGG sequences; because 5-meC is prone to deamination, creating T, CCWGG sequences are hotspots for G:C to A:T transitions (11). Our results reproduce this finding for PFM2 (cultured in LB and in minimal medium) and ED1a but not IAI1. In all three genomes 2% of the total G:C base pairs occur at the internal position of CCWGG sites, but these base pairs accounted for 14% (12/86) of the G:C transitions when PFM2 was grown on LB, 6% (8/128) when PFM2 was grown on minimal medium, and 6% (12/194) in ED1a; these values are significantly greater than expected (χ2 = 53, P ∼ 0; χ2 = 9, P = 0.003; and χ2 = 15, P = 0.0001, respectively). However, in IAI1 only 4% (6/162) of the G:C transitions occurred at CCWGG sites, a value not significantly different from that expected (χ2 = 1.4, P = 0.24).

DNA adenine methyltransferase (Dam) transfers a methyl group to the 6 position of As in GATC sequences; GATC sequences are hotspots for A:T transversions (5), presumably because 6-meA depurinates, and depurinated sites produce transversions (12). In all three genomes 3% of the total A:T base pairs occur in GATC sites, but these base pairs accounted for 24% (12/62) of the A:T transversions when PFM2 was grown on LB, 10% (5/53) when PFM2 was cultured on minimal medium, and 16% (14/87) in IAI1; these values are significantly greater than expected (χ2 = 43, P ∼ 0; χ2 = 4, P = 0.05; and χ2 = 39, P ∼ 0, respectively). However, there was no preference for A:T transversions

Table 2. BPS spectra in the experimental strains

| Mutational changes | PFM2 WT* | PFM2m WT on min | ED1a WT | IAI1 WT | PFM101 umuDC dinB | PFM133 umuDC dinB polB | PFM35 uvrA | PFM40 alkA tagA | PFM88 ada ogt | PFM22 nth nei | PFM180 xthA nfo | PFM91 nfi | PFM61 mutT | PFM6 mutY | PFM94 mutM mutY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Types of substitutions | | | | | | | | | | | | | | | |
| Total | 246 | 277 | 407 | 422 | 269 | 252 | 316 | 265 | 250 | 209 | 339 | 335 | 2,234 | 485 | 4,282 |
| Transitions | 136 | 167 | 256 | 218 | 136 | 142 | 167 | 147 | 142 | 186 | 170 | 182 | 6 | 25 | 24 |
| A:T > G:C | 50 | 39 | 62 | 56 | 45 | 46 | 74 | 57 | 42 | 5 | 76 | 74 | 3 | 9 | 8 |
| G:C > A:T | 86 | 128 | 194 | 162 | 91 | 96 | 93 | 90 | 100 | 181 | 94 | 108 | 3 | 16 | 16 |
| Transversions | 110 | 110 | 151 | 204 | 133 | 110 | 149 | 118 | 108 | 23 | 169 | 153 | 2,228 | 460 | 4,258 |
| A:T > T:A | 19 | 24 | 17 | 26 | 18 | 26 | 22 | 27 | 19 | 3 | 33 | 31 | 1 | 1 | 4 |
| G:C > T:A | 31 | 44 | 58 | 87 | 58 | 34 | 60 | 35 | 39 | 10 | 47 | 45 | 2 | 443 | 4,223 |
| A:T > C:G | 43 | 30 | 54 | 61 | 41 | 37 | 47 | 40 | 39 | 2 | 56 | 60 | 2,224 | 12 | 10 |
| G:C > C:G | 17 | 12 | 22 | 30 | 16 | 13 | 20 | 16 | 11 | 8 | 33 | 17 | 1 | 4 | 21 |
| A:T sites | 112 | 93 | 133 | 143 | 104 | 109 | 143 | 124 | 100 | 10 | 165 | 165 | 2,228 | 22 | 22 |
| G:C sites | 134 | 184 | 274 | 279 | 165 | 143 | 173 | 141 | 150 | 199 | 174 | 170 | 6 | 463 | 4,260 |
| Consequences of substitutions | | | | | | | | | | | | | | | |
| Position | | | | | | | | | | | | | | | |
| N-Cd | 57 | 80 | 92 | 80 | 73 | 50 | 73 | 66 | 48 | 24 | 72 | 81 | 350 | 80 | 550 |
| Cd | 189 | 197 | 315 | 342 | 196 | 202 | 243 | 199 | 202 | 185 | 267 | 254 | 1,884 | 405 | 3,732 |
| Within coding sequences | | | | | | | | | | | | | | | |
| Syn | 57 | 56 | 89 | 101 | 60 | 48 | 51 | 47 | 54 | 58 | 77 | 78 | 245 | 107 | 948 |
| N-Syn | 132 | 141 | 226 | 241 | 136 | 154 | 192 | 152 | 148 | 127 | 190 | 176 | 1,639 | 298 | 2,784 |
| Amino acid changes | | | | | | | | | | | | | | | |
| Csv | 59 | 78 | 99 | 113 | 71 | 71 | 92 | 75 | 75 | 61 | 93 | 92 | 716 | 112 | 1,039 |
| N-Csv | 73 | 63 | 127 | 128 | 65 | 83 | 100 | 77 | 73 | 66 | 97 | 84 | 923 | 186 | 1,745 |

Cd, coding; Csv, conservative; N-Cd, noncoding; N-Csv, nonconservative; N-Syn, nonsynonymous; PFM2m WT on min, PFM2 on minimal glucose medium (all other experiments were done on LB medium); Syn, synonymous; WT, wild type.
*Data from ref. 5; see footnotes in Table 1 for modifications.
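The coding versus noncoding comparison described in the text relies on 1,000 Monte Carlo simulations per strain. The sketch below is one minimal way to set up such a comparison, assuming that each BPS falls in coding DNA independently with probability equal to the coding fraction of the genome. The coding fraction used here (269/316) is inferred from the expected counts quoted in the text for the uvrA strain and is an assumption, as are the function names and the random seed. The authors' actual procedure is described in SI Materials and Methods and may differ.

```python
import random

# Assumption: K12 coding fraction inferred from the expected uvrA split
# quoted in the text (269 coding / 47 noncoding out of 316 BPSs).
CODING_FRACTION = 269 / 316


def simulate_coding_counts(n_bps, coding_fraction, n_sims=1000, seed=0):
    """Simulate how many of n_bps mutations land in coding DNA under a uniform random model."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < coding_fraction for _ in range(n_bps))
        for _ in range(n_sims)
    ]


# PFM2 on LB (Table 2): 189 coding and 57 noncoding BPSs.
observed_coding, observed_noncoding = 189, 57
sims = simulate_coding_counts(observed_coding + observed_noncoding, CODING_FRACTION)

# One-sided comparison: fraction of simulations with as few coding BPSs as observed.
fraction_as_extreme = sum(c <= observed_coding for c in sims) / len(sims)
print(f"simulations with <= {observed_coding} coding BPSs: {fraction_as_extreme:.3f}")
```

A small fraction here would be in line with the statement that the simulations did not yield a coding-to-noncoding ratio as small as the observed one. The exact value depends on the assumed coding fraction and on the seed.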

to occur at GATC sites in ED1a (3/71 observed, 2/71 expected, χ2 = 0.02, P = 0.88).

That G:C base pairs in CCWGG sites were not significant hotspots in IAI1, and A:T base pairs in GATC sites were not significant hotspots in ED1a, but both were hotspots in PFM2, suggests that methylase activities among these strains differ even though both Dcm and Dam are encoded in all three genomes (www.kegg.jp/kegg/).

In almost all the experimental strains in this report small indels (≤4 bp) occurred at about 1/10th the rate of BPSs and more frequently were losses rather than gains of base pairs (Tables 4 and 5). As discovered many years ago (14), indels occurred predominantly in mononucleotide runs. Growth on minimal glucose medium appeared to decrease the mutation rate, particularly of −1-bp events; however, because there were so few indels, few statistically significant conclusions can be drawn.

Transcription can impact mutation rates in several ways. Because ssDNA is exposed during transcription and is more vulnerable to damage than dsDNA, high rates of transcription can be mutagenic (15). On the other hand, transcription-coupled repair (TCR), which is targeted to the transcribed DNA strand, should lower mutation rates in transcribed genes (16). Head-on collisions between replication and transcription destabilize the replication machinery and can lead to mutations (17). In our previous study of E. coli K12 strain PFM2, we found no evidence that transcription affected the frequency of mutations (5). This negative result was reproduced here with IAI1, ED1a, and PFM2 growing in minimal medium and with all the experimental strains. There was no statistically significant bias for mutations to occur in highly transcribed genes or in genes oriented so that transcription is in the direction opposite the direction of replication (even among highly transcribed genes), nor was there a bias for any specific base change to occur on the transcribed versus the nontranscribed strand (Tables S4–S6). Using different methods to analyze our published data, Chen et al. (18) found a small (17–30%) increase in mutation frequencies in highly expressed genes, but this increase was significant only in an MMR-defective strain.

Impact of DNA Repair Pathways on Spontaneous Mutations. A second goal of this study was to determine if, in normally growing cells, loss of a given DNA repair pathway would change the mutation rate or the mutational spectrum. If so, it could be concluded that the repair pathway in question is important in preventing the mutations that arise from DNA damage induced by endogenous activities or by ubiquitous exogenous agents. Our choice of which DNA repair pathways to investigate was based on previous reports that their loss had mutational consequences (discussed below).

Note that in this report we have not included two major mutation-avoidance pathways, proofreading during DNA replication and removal of uracil from the DNA. We previously reported the effects of inactivating the DNA MMR pathway (5, 6).

DNA polymerases. E. coli possesses three specialized DNA polymerases that are induced by DNA damage and other stresses. Two of these, DNA polymerase V (Pol V, encoded by the umuDC genes) and DNA polymerase IV (Pol IV, encoded by the dinB gene), are intrinsically error prone, whereas the third, DNA polymerase II (Pol II, encoded by the polB gene), is accurate (reviewed in ref. 19). Although there are few (≤15) molecules of Pol V in uninduced cells, transient induction of Pol V during normal growth could be mutagenic (20). Indeed, some reports have credited Pol V with a substantial role in spontaneous mutation (21–23), especially in stationary-phase cells (24). In contrast to Pol V, Pol IV is abundant even in normally growing cells (∼250 molecules) and can be induced 10- to 30-fold by DNA damage and other stresses (25, 26). Pol IV is required for stationary-phase or adaptive mutation (27, 28) and has a substantial role in producing mutations on the F′ episome (29, 30). However, in normally growing cells Pol IV contributes only 10%, at most, to spontaneous mutations on the chromosome (20, 29, 31). Loss of polymerase II (Pol II) results in a mutator phenotype, but this phenotype is entirely dependent on Pol IV (27, 32, 33),

Table 3. Conditional BPS rates × 10^10

| Mutational changes | PFM2 WT* | PFM2m WT | ED1a WT | IAI1 WT | PFM101 umuDC dinB | PFM133 umuDC dinB polB | PFM35 uvrA | PFM40 alkA tagA | PFM88 ada ogt | PFM22 nth nei | PFM180 xthA nfo | PFM91 nfi | PFM61 mutT | PFM6 mutY | PFM94 mutM mutY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Types of substitutions | | | | | | | | | | | | | | | |
| Total | 2.05 | 1.94 | 3.12 | 3.01 | 2.22 | 1.95 | 2.28 | 1.84 | 1.75 | 9.79 | 5.26 | 2.44 | 321 | 23.4 | 207 |
| Transitions | 1.14 | 1.17 | 1.96 | 1.56 | 1.12 | 1.10 | 1.21 | 1.02 | 1.00 | 8.72 | 2.64 | 1.32 | 0.86 | 2.49 | 3.46 |
| A:T > G:C | 0.85 | 0.55 | 0.96 | 0.81 | 0.75 | 0.72 | 1.09 | 0.80 | 0.60 | 0.48 | 2.40 | 1.09 | 0.88 | 2.93 | 4.57 |
| G:C > A:T | 1.41 | 1.76 | 2.93 | 2.28 | 1.48 | 1.46 | 1.32 | 1.23 | 1.38 | 16.7 | 2.87 | 1.55 | 0.85 | 2.07 | 2.39 |
| Transversions | 0.92 | 0.77 | 1.16 | 1.46 | 1.10 | 0.85 | 1.08 | 0.82 | 0.76 | 1.08 | 2.62 | 1.11 | 321 | 20.9 | 203 |
| A:T > T:A | 0.32 | 0.34 | 0.26 | 0.38 | 0.30 | 0.41 | 0.32 | 0.38 | 0.27 | 0.29 | 1.04 | 0.46 | 0.29 | 0.18 | 0.46 |
| G:C > T:A | 0.51 | 0.61 | 0.88 | 1.22 | 0.94 | 0.52 | 0.85 | 0.48 | 0.54 | 0.92 | 1.44 | 0.64 | 0.57 | 39.5 | 397 |
| A:T > C:G | 0.73 | 0.43 | 0.84 | 0.88 | 0.69 | 0.58 | 0.69 | 0.56 | 0.56 | 0.19 | 1.77 | 0.89 | 650 | 1.15 | 1.01 |
| G:C > C:G | 0.28 | 0.17 | 0.33 | 0.42 | 0.26 | 0.20 | 0.28 | 0.22 | 0.15 | 0.74 | 1.01 | 0.24 | 0.28 | 0.34 | 1.95 |
| A:T sites | 1.90 | 1.32 | 2.07 | 2.07 | 1.74 | 1.71 | 2.10 | 1.74 | 1.43 | 0.95 | 5.21 | 2.44 | 651 | 4.26 | 6.03 |
| G:C sites | 2.20 | 2.53 | 4.14 | 3.92 | 2.68 | 2.17 | 2.46 | 1.92 | 2.07 | 18.4 | 5.32 | 2.43 | 1.70 | 41.9 | 401 |
| Consequences of substitutions | | | | | | | | | | | | | | | |
| Position | | | | | | | | | | | | | | | |
| N-Cd | 3.21 | 3.77 | 5.12 | 4.68 | 4.06 | 2.60 | 3.55 | 3.08 | 2.27 | 7.58 | 7.54 | 3.97 | 339 | 26.5 | 181 |
| Cd | 1.85 | 1.62 | 2.80 | 2.78 | 1.90 | 1.83 | 2.06 | 1.62 | 1.66 | 10.2 | 4.87 | 2.17 | 318 | 22.8 | 211 |
| Within coding sequences | | | | | | | | | | | | | | | |
| Syn | 0.79 | 0.65 | 1.12 | 1.16 | 0.82 | 0.62 | 0.61 | 0.54 | 0.63 | 4.52 | 1.99 | 0.94 | 58.6 | 9.30 | 75.8 |
| N-Syn | 0.56 | 0.50 | 0.87 | 0.85 | 0.57 | 0.61 | 0.71 | 0.54 | 0.53 | 3.05 | 1.51 | 0.66 | 121 | 7.09 | 68.7 |
| Amino acid changes | | | | | | | | | | | | | | | |
| Csv | 0.54 | 0.59 | 0.82 | 0.85 | 0.64 | 0.60 | 0.72 | 0.56 | 0.57 | 3.10 | 1.57 | 0.73 | 112 | 5.79 | 55.5 |
| N-Csv | 0.59 | 0.43 | 0.93 | 0.86 | 0.52 | 0.62 | 0.70 | 0.52 | 0.50 | 3.00 | 1.46 | 0.59 | 129 | 8.26 | 80.6 |

Conditional mutation rates are the numbers of mutations per generation divided by the relevant parameter for each genome. Cd, coding; Csv, conservative; N-Cd, noncoding; N-Csv, nonconservative; N-Syn, nonsynonymous; PFM2m WT, PFM2 WT on minimal glucose medium (all other experiments were done on LB medium); Syn, synonymous; WT, wild type.
*Data from ref. 5; see footnotes in Table 1 for modifications.
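To make the definition of a conditional rate concrete, the sketch below recomputes two PFM2 entries of Table 3 from the raw counts in Tables 1 and 2. It assumes an E. coli K12 G+C content of about 50.8%, a value that does not appear in this section, and it takes the "relevant parameter" to be the number of G:C or A:T base pairs in the genome. All names are illustrative.

```python
# Conditional BPS rates for PFM2 (illustrative sketch).
# Assumption: E. coli K12 G+C content of about 50.8%; this value is not given in the text.
GENOME_NT = 4_639_675            # K12 nucleotides per genome (Table 1 footnote)
GC_FRACTION = 0.508              # assumed G+C content
LINES, GENS_PER_LINE = 61, 4230  # PFM2 (Table 1)


def conditional_rate(n_mutations, n_target_bp):
    """Mutations per generation per base pair of the relevant type."""
    return n_mutations / (LINES * GENS_PER_LINE) / n_target_bp


gc_bp = GENOME_NT * GC_FRACTION
at_bp = GENOME_NT - gc_bp

gc_to_at = conditional_rate(86, gc_bp)  # 86 G:C > A:T transitions in PFM2 (Table 2)
at_to_gc = conditional_rate(50, at_bp)  # 50 A:T > G:C transitions in PFM2 (Table 2)

print(f"G:C > A:T: {gc_to_at * 1e10:.2f} x 10^-10")
print(f"A:T > G:C: {at_to_gc * 1e10:.2f} x 10^-10")
```

With these assumptions the two values agree with the PFM2 column of Table 3 (1.41 and 0.85). Other strains would need their own genome sizes and base compositions.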

probably because Pol IV is overexpressed if Pol II is deleted (26, 34). The three specialized polymerases also have distinct mutational biases (20, 31, 35). Thus, we expected that loss of these polymerases would result in a decrease in mutation rates and/or a shift in mutational spectra.

As shown in Fig. 2 and Table 3, the loss of Pol IV and Pol V or of all three specialized polymerases did not significantly affect the rate or the spectra of BPSs with the exception of an 85% increase in G:C to T:A transversions in the umuDC dinB mutant strain (P = 0.03 based on simulations). Loss of the polymerases also did not significantly affect the rate or spectra of small indels (Table 5). We conclude that in normally growing cells the specialized DNA polymerases do not contribute detectably to genomic mutations.

Nucleotide excision repair and TCR. The nucleotide excision repair (NER) proteins recognize and initiate the repair of a variety of DNA lesions produced by exogenous and endogenous agents (36) as well as misincorporated ribonucleotides (37). NER also can act gratuitously on undamaged DNA (38). TCR preferentially targets NER to the transcribed strand during transcription (16). Loss of NER has been reported to increase mutation rates by as much as sixfold, presumably because of error-prone translesion synthesis (22, 39), and also to decrease mutation rates by about the same amount, presumably by eliminating errors made by DNA polymerase I (Pol I) during gap repair (40).

The UvrA protein initiates the repair process by recognizing DNA damage and recruiting the other required Uvr proteins; thus, a deletion of the uvrA gene inactivates both NER and TCR. As shown in Fig. 2 and Table 3, the rate of BPSs in the uvrA mutant strain did not differ significantly from that in the wild-type strain. The spectrum of BPSs in the uvrA mutant strain also was nearly the same as that in the wild-type strain, differing only by a 67% increase in G:C to T:A transversions and a 30% increase in A:T to G:C transitions (P = 0.03 and 0.05, respectively, based on simulations). As in the wild-type strain, the ratio of BPSs in coding versus noncoding DNA was nearly half that expected (243/73 observed, 269/47 expected; χ2 = 7, P = 0.008) (Table 2); thus, TCR does not contribute to the bias against mutations in coding DNA. Loss of UvrA did not result in a significant change either in the frequency at which any type of base change occurred on the transcribed strand (Table S4) or in the fraction of mutations in highly transcribed genes (Table S5). Thus, we conclude that neither NER nor TCR contributes to the spontaneous BPS rate in normally growing cells.

Alkylation damage repair. Two major pathways prevent the lethal and mutagenic consequences of the DNA damage produced by DNA alkylating agents (41). The first pathway is initiated by E. coli's two 3-alkyladenine glycosylases, AlkA and Tag, which remove 3-alkylpurines from the DNA; AlkA also removes several additional alkylated bases. The resulting abasic sites are repaired by enzymes of the base-excision repair (BER) pathway (see below). The second pathway involves two alkyltransferases, Ada and Ogt, each of which removes the alkyl groups from O6-alkylguanine and O4-alkylthymine, restoring the bases to normality. Ada also is the transcriptional activator of the genes that encode itself, AlkA and AlkB; AlkB repairs alkylated bases in ssDNA. E. coli strains defective for Ada and Ogt have increased spontaneous mutation rates under starvation conditions (42–44), mostly because of DNA alkylating agents produced by nitrosation of amino acids, peptides, and/or polyamines (42).

In our MA/WGS experiment, deleting both alkA and tagA had little effect on the spontaneous mutation rate (Table 1). Deleting both ada and ogt reduced the mutation rate by a nonsignificant 15% relative to the wild-type strain (t = 1.5, df = 18, P = 0.14) (Table 1). The BPS spectra of both double-mutant strains were the same as that of the wild-type strain (Fig. 2 and Table 3). Thus, under the conditions of our experiment during which cells do not experience prolonged starvation, endogenous DNA alkylation is, at most, a minor contributor to spontaneous mutation. However, DNA Pol IV may bypass endogenous alkyl lesions

accurately, possibly mitigating the mutagenic consequence of loss of the alkylation repair enzymes (45).

Fig. 1. The mutation rate and spectra of BPSs in wild-type E. coli strains. (A) The conditional mutation rates of each of the six BPSs. The bars for the totals show the mutations per generation per nucleotide; error bars show 95% CLs calculated from the number of mutations per MA line. The bars for specific BPSs show the mutations per generation of each type of BPS divided by the number of A:T or G:C base pairs in the genome; error bars show 95% CLs calculated from 1,000 Monte Carlo simulations of a random distribution with the mutational spectra observed for each dataset (SI Materials and Methods). (B) Normalized fraction of each of the six BPSs. The bars show the number of each of the six BPSs divided by the total number of BPSs, normalized to the fraction of A:T or G:C base pairs in the genome. The error bars show 95% CLs from 1,000 Monte Carlo simulations of a random distribution with the mutational spectrum observed for each dataset (SI Materials and Methods). PFM2m, PFM2 on minimal medium (all other experiments were on rich medium).

BER. BER is a major defense against the lethal and mutagenic consequences of a variety of DNA lesions. It typically is initiated by a glycosylase that removes a specific type of damaged base, leaving an apurinic/apyrimidinic (AP) site. The activities of endo- and/or exonucleases then create a small gap that can be filled in by DNA Pol I. The mutagenic consequences of loss of BER have been documented by numerous previous reports; we chose to investigate the following three pathways that appear prominently in such reports.

Endonuclease III and endonuclease VIII. Endonuclease III (Nth) and endonuclease VIII (Nei) are glycosylases whose main targets are oxidatively damaged pyrimidines (46, 47), although Nei can remove some oxidatively damaged purines, including 8-oxoG (48). Both Nth and Nei possess lyase activity, cleaving the sugar-phosphate backbone 3′ to the AP site; the sugar then must be removed by other enzymes to create a substrate for DNA Pol I (see below). Previous studies found that loss of either enzyme alone had only modest mutagenic effects, but loss of both was synergistic, particularly for G:C to A:T mutations, which were enhanced 40-fold in an nth nei double-mutant strain. These results established damaged cytosines as the major mutagenic lesions repaired by this pathway (49).

The results of our MA/WGS experiment with a strain deleted for both nth and nei confirmed these results, although the mutational effects were more modest. Relative to the wild-type strain the overall BPS rate in the double-mutant strain was elevated fivefold (t = 10, df = 16, P = 2 × 10^−8), and the rate of G:C to A:T transitions was elevated 12-fold (t = 11, df = 12, P = 1 × 10^−7) (Fig. 3 and Table 3). The rate of G:C transversions also was increased twofold (P = 0.04, based on simulations), probably because of the occasional insertion of Gs and Ts opposite damaged cytosines by DNA polymerases, as previously observed (50). G:C transitions were significantly biased by the local sequence context, occurring in the context 5′CpG3′/3′GpC5′ nearly twice as frequently as expected from the number of such dimer pairs in the genome (98 observed versus 53 expected; χ2 = 23, P = 2 × 10^−6). In contrast, the context 5′CpT3′/3′GpA5′ was strongly disfavored (5 observed versus 36 expected; χ2 = 26, P = 3 × 10^−7). These biases suggest that a 3′ G potentiates a C to oxidative damage, whereas a 3′ T is protective. Alternatively, DNA polymerase may be more likely to insert an A opposite a damaged C or to extend from the A:C mispair if it has just made a strong G:C base pair rather than a weak T:A base pair. However, base-stacking must play a role, because 5′CpC3′/3′GpG5′ sequences were not hotspots, and 5′CpA3′/3′GpT5′ sequences were not coldspots. As with other strains with high levels of oxidation-induced mutations (see below), mutations in the nth nei double-mutant strain were not DNA-strand biased (Table S3).

Exonuclease III and endonuclease IV. AP sites arise from the action of glycosylases, from spontaneous breakage of the glycosylic bond, and by the direct action of some DNA-damaging agents. AP endonucleases cleave the phosphodiester bond 5′ to an AP site or to the damaged sugar left by the lyase activities of DNA glycosylases, leaving a 3′ OH group that is the substrate for DNA Pol I. Exonuclease III (XthA) and endonuclease IV (Nfo) comprise >90% of the AP endonuclease activity in E. coli (reviewed in ref. 51). Based on previous reports (39, 52, 53), we expected a two- to 10-fold increase in the mutation rate in the xthA nfo double-mutant strain. Our results were on the lower end of this range: the BPS rate in the xthA nfo double-mutant strain was elevated 2.6-fold relative to the wild-type strain (t = 7.6, df = 22, P = 1 × 10^−7) (Fig. 3 and Table 1). As might be expected from the nonspecific activities of these enzymes, all classes of BPS were equally elevated by the loss of XthA and Nfo (Fig. 3 and Table 3). The rate of indels also was increased threefold (Table 5), but because the actual number of indels was small (36 in the xthA nfo double-mutant strain and 24 in the wild-type strain), this result may not be significant (t = 3.4, df = 2, P = 0.075). Overall, these results indicate that a low level of endogenous DNA damage resulting in base loss either spontaneously or from the action of glycosylases is a constant threat to genomic integrity.

Endonuclease V. Endogenous and exogenous nitrosating agents oxidatively deaminate A, C, and G, yielding the potentially mutagenic bases hypoxanthine, uracil, xanthine, and oxanine. Repair of these lesions can be initiated by enzymes of the uracil DNA glycosylase family, but a major pathway for the repair of oxidatively deaminated purine bases is initiated by endonuclease V (Nfi). Nfi recognizes the base and nicks the damage-bearing DNA strand; the damaged base then is removed via the exonuclease activity of Nfi or of DNA Pol I (reviewed in ref. 54). In previous studies, loss of Nfi resulted in a twofold increase in the spontaneous mutation rate (55). However, in our MA/WGS experiment the loss of Nfi increased the mutation rate by only 19% relative to the

Table 4. Spectra of indels in the experimental strains

Mutational   PFM2    PFM2m   ED1a    IAI1    PFM101  PFM133  PFM35   PFM40   PFM88   PFM22   PFM180  PFM91   PFM61   PFM6    PFM94
changes      WT*     WT      WT      WT      umuDC   umuDC   uvrA    alkA    ada     nth     xthA    nfi     mutT    mutY    mutM
                                             dinB    dinB            tagA    ogt     nei     nfo                             mutY
                                                     polB

Total        24      13      37      31      23      27      40      30      24      6       36      25      1       6       6
+1 bp        6       6       6       5       6       11      9       5       4       2       14      8       0       2       1
−1 bp        16      7       27      25      16      15      29      22      18      4       20      16      1       3       5
+>1 bp       0       0       0       0       1       0       0       0       1       0       0       0       0       0       0
−>1 bp       2       0       4       1       0       1       2       3       1       0       2       1       0       1       0
+1 A:T       5       2       5       2       4       8       6       3       3       1       7       5       0       1       0
−1 A:T       5       4       15      19      12      8       18      16      17      4       12      12      0       1       3
+1 G:C       1       4       1       3       2       3       3       2       1       1       7       3       0       1       1
−1 G:C       12      3       12      6       4       7       11      6       1       0       8       4       1       2       2
In run       22      13      30      31      19      26      37      24      20      6       32      22      1       5       6
Not in run   2       0       7       0       4       1       3       6       4       0       5       3       0       1       0

bp, base pair; PFM2m, PFM2 on minimal glucose medium (all other experiments were done on LB medium). A run is two or more of the same base pair or repeats of a short base-pair sequence. *Data are from ref. 5; two indels identified in ref. 13 and one mistaken call have been added (see SI Materials and Methods).
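
The "In run" and "Not in run" rows depend on the definition of a run given in the footnote above. As a minimal sketch of that classification, assuming each indel is described by a 0-based position in a reference sequence, the check for the simplest case (two or more of the same base) might look like the following. The function names are hypothetical, and the sketch does not handle the second part of the definition (repeats of a short base-pair sequence); it is not the authors' script.

```python
def in_mononucleotide_run(seq: str, pos: int, min_len: int = 2) -> bool:
    """Return True if seq[pos] lies in a run of at least min_len identical bases.

    Assumption: only single-base (mononucleotide) runs are considered here;
    repeats of short multi-base sequences are not detected.
    """
    base = seq[pos]
    # Walk left to the first base of the run containing pos.
    start = pos
    while start > 0 and seq[start - 1] == base:
        start -= 1
    # Walk right to the last base of the run containing pos.
    end = pos
    while end < len(seq) - 1 and seq[end + 1] == base:
        end += 1
    return (end - start + 1) >= min_len


def count_in_run(seq: str, indel_positions: list[int]) -> tuple[int, int]:
    """Return (number of indels in a run, number not in a run)."""
    in_run = sum(in_mononucleotide_run(seq, p) for p in indel_positions)
    return in_run, len(indel_positions) - in_run


if __name__ == "__main__":
    # Toy sequence, purely illustrative: the GGG at positions 2-4 is a run.
    toy = "ACGGGTAC"
    print(count_in_run(toy, [0, 3, 5]))
```

Treating the sequence as linear is a simplification; for a circular chromosome a run spanning the origin of the coordinate system would need wrap-around handling.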

wild-type strain (Table 1), an increase that may not be significant (t = 1.7, df = 19, P = 0.11), and had no effect on the spectrum of BPSs (Fig. 3 and Table 3). These results suggest that under the conditions of our experiments endogenous nitrosating agents are not major contributors to spontaneous mutations.

8-oxoG mutagenesis. One of the most mutagenic DNA lesions produced by oxidation is 8-oxoG, which base pairs efficiently with A when in the syn instead of the normal anti orientation (56). E. coli has three enzymes that prevent 8-oxoG mutagenesis: MutT, which hydrolyzes 8-oxodGTP, preventing its incorporation into the DNA; MutY, which removes As that are mispaired with 8-oxoG or G; and MutM, which removes 8-oxoG and formamidopyrimidine (Fapy) lesions from the DNA (reviewed in ref. 57). Loss of each of these three enzymes has distinct mutagenic consequences.

MutT. In the absence of MutT, 8-oxoG is incorporated efficiently into the DNA opposite template A; thus, the loss of MutT yields a high frequency of A:T to C:G transversions (58, 59). The results of our MA/WGS experiment reproduce this old finding; A:T to C:G transversions occurred in the mutT mutant strain at a rate 900-fold greater than in the wild-type strain (t = 33, df = 23, P ∼ 0) (Fig. 4 and Table 3).

Because 8-oxoG inserted opposite template C could pair with A during the next round of replication, loss of MutT might be expected to increase the rate of G:C to T:A transversions. However, in confirmation of previous results (60), we found that the rate of G:C to T:A transversions was not increased in the mutT mutant strain (Table 3), suggesting either that 8-oxoG is rarely inserted opposite template C or that the combined activities of MutY, MutM, and MMR are sufficient to prevent such mutations.

MutY. Loss of MutY results in G:C to T:A transversions because of the persistence of 8-oxoG:A and G:A mispairs (57). As expected, in our MA/WGS experiment the mutY mutant strain had an 80-fold increase in G:C to T:A transversions relative to the wild-type strain (t = 19, df = 13, P ∼ 0) (Fig. 4 and Table 3). MutY has a fairly wide range of substrates: in addition to removing A mispaired with 8-oxoG and G, MutY can remove A mispaired with 8-oxoA or C (61). If A is the correct base in these mispairs, MutY activity will produce mutations instead of preventing them. Although previous studies have found that loss of MutY reduced the frequency of both A:T to C:G transversions (60) and A:T to G:C transitions (62), our data did not confirm these results (Table 3).

MutY also can remove Gs paired with 8-oxoG, preventing G:C to C:G transversions (63). However, in the MA/WGS experiment the rate of G:C to C:G transversions was elevated in the mutY

Table 5. Conditional mutation rates of indels × 10¹¹

Mutational   PFM2    PFM2m   ED1a    IAI1    PFM101  PFM133  PFM35   PFM40   PFM88   PFM22   PFM180  PFM91   PFM61   PFM6    PFM94
changes      WT*     WT      WT      WT      umuDC   umuDC   uvrA    alkA    ada     nth     xthA    nfi     mutT    mutY    mutM
                     on min                  dinB    dinB            tagA    ogt     nei     nfo                             mutY
                                                     polB

Total        2.02    0.91    2.83    2.21    1.90    2.08    2.89    2.08    1.68    2.81    5.59    1.82    1.44    2.73    2.81
+1 bp        0.50    0.42    0.46    0.36    0.49    0.85    0.65    0.35    0.28    0.94    2.17    0.58    <1.4    0.91    0.47
−1 bp        1.35    0.49    2.07    1.78    1.32    1.16    2.09    1.52    1.26    1.87    3.11    1.16    1.44    1.37    2.34
+>1 bp       <0.08   <0.07   <0.08   <0.07   0.08    <0.08   <0.07   <0.07   0.07    <0.5    <0.16   <0.07   <1.4    <0.5    <0.5
−>1 bp       0.17    <0.07   0.31    0.07    <0.08   0.08    0.14    0.21    0.07    <0.5    0.31    0.07    <1.4    0.5     <0.5
+1 A:T       0.85    0.28    0.78    0.29    0.67    1.26    0.88    0.42    0.43    0.95    2.21    0.74    <2.9    0.93    <1
−1 A:T       0.85    0.57    2.33    2.76    2.01    1.26    2.64    2.25    2.42    3.81    3.79    1.77    <2.9    0.93    2.86
+1 G:C       0.17    0.55    0.15    0.42    0.32    1.06    0.43    0.27    0.14    0.92    2.14    0.43    <2.8    0.90    0.92
−1 G:C       1.82    0.41    1.81    0.84    0.65    1.85    1.56    0.82    0.14    <0.92   2.45    0.57    2.83    1.79    1.85
In run       8.62    4.66    11.0    11.0    7.61    9.50    11.5    7.45    6.47    14.4    21.5    7.83    7.38    2.28    2.81
Not in run   0.17    <0.07   0.54    <0.07   0.33    0.08    0.22    0.42    0.28    <0.5    0.78    0.22    <1.4    0.46    <0.5

Conditional mutation rates are the numbers of mutations per generation divided by the relevant parameter for each genome (not corrected for the few indels that did not occur at sites of the same base pair). The conditional mutation rates in runs are the number of indels that occurred in runs divided by the number of bases in runs in the genome (not corrected for the few indels that occurred in runs of more than one base pair). A run is two or more of the same base pair or repeats of a short base-pair sequence. bp, base pair; PFM2m, PFM2 on minimal glucose medium (all other experiments were done on LB medium). *Data are from ref. 5; two indels identified in ref. 13 and one mistaken call have been added (see SI Material and Methods).
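
The arithmetic behind a conditional rate, as defined in the footnote above, can be sketched as follows. This is a minimal illustration and not the authors' script: the function names are hypothetical, the inputs in the usage example are made-up round numbers rather than values from this study, and the handling of zero counts (reporting "<" followed by the rate that a single event would give) is an inference from the pattern of entries in Table 5 rather than a documented rule.

```python
def conditional_rate(n_mutations: int, total_generations: float, n_sites: int) -> float:
    """Mutations per generation per relevant site.

    total_generations: generations summed over all MA lines of a strain.
    n_sites: the 'relevant parameter' for the genome, e.g. the number of
             A:T base pairs, or the number of bases that lie in runs.
    """
    if total_generations <= 0 or n_sites <= 0:
        raise ValueError("total_generations and n_sites must be positive")
    return n_mutations / (total_generations * n_sites)


def format_rate(n_mutations: int, total_generations: float, n_sites: int,
                scale: float = 1e11) -> str:
    """Format a rate in the table's units (rate x 10^11).

    Assumption: when no mutation was observed, report an upper bound equal
    to the rate that one observed mutation would have produced.
    """
    if n_mutations == 0:
        bound = conditional_rate(1, total_generations, n_sites) * scale
        return f"<{bound:.2f}"
    return f"{conditional_rate(n_mutations, total_generations, n_sites) * scale:.2f}"


if __name__ == "__main__":
    # Hypothetical inputs: 10 indels, 1e6 total generations, 2e6 relevant sites.
    print(format_rate(10, 1_000_000, 2_000_000))
    print(format_rate(0, 1_000_000, 2_000_000))
```

Passing the total number of generations as a single argument avoids assuming how generations were distributed across individual MA lines.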

mutant strain by only a nonsignificant 30% (t = 0.35, df = 3, P = 0.75) (but see below). This difference from the 100-fold increase in the frequency of G:C to C:G transversions found by Zhang et al. (63) probably is explained by the fact that their results were obtained in cells after prolonged incubation on selective medium.

Fig. 2. The spectra of BPSs of DNA repair-defective E. coli strains. See the legend of Fig. 1A for the explanations of bars and error bars. ada ogt, PFM88; alkA tagA, PFM40; umuDC dinB, PFM101; umuDC dinB polB, PFM133; uvrA, PFM35; wild-type, PFM2. [Bar chart; x-axis categories: Total, A:T>G:C, G:C>A:T, A:T>T:A, G:C>T:A, A:T>C:G, G:C>C:G.]

MutM MutY. Although loss of MutM has been shown to increase mutation frequencies as much as 10-fold (64), in preliminary experiments we failed to detect increases in mutation rates to resistance to rifampicin or to nalidixic acid in a mutM-deletion strain. Because loss of both MutM and MutY is synergistic for G:C to T:A transversions (57), we used a mutM mutY double-deletion strain in our MA/WGS experiment. In confirmation of previous results, the rate of G:C to T:A transversions in the mutM mutY double-mutant strain was 800-fold higher than in the wild-type strain (t = 23, df = 22, P ∼ 0) and 10-fold higher than in the mutY mutant strain (t = 21, df = 31, P ∼ 0) (Fig. 4 and Table 3).

The rate of G:C to C:G transversions in the mutM mutY double-mutant strain was increased sevenfold relative to the wild-type strain (t = 4.2, df = 3, P = 0.02) and fivefold relative to the mutY mutant strain (t = 3.6, df = 2, P = 0.07) (Table 3), suggesting that G:C to C:G mutations are caused by a substrate of MutM that persists in the absence of MutY. An obvious candidate is the 8-oxoG:G mispair, a substrate of both MutM (65) and MutY (66), that, if uncorrected, could create a G:C to C:G transversion in the next round of replication. Alternatively, further oxidation products of 8-oxoG as well as other modified purines are also MutM substrates (67, 68) and may be responsible for these mutations.

Sequence Context of 8-OxoG Mutagenesis. Our experiments revealed that the local sequence context strongly influences the frequency of mutations caused by 8-oxoG (Table S7). In the mutT mutant strain, 50% of the A:T to C:G mutations occurred in the sequence 5′ApA3′/3′TpT5′, although, based on the frequency of that dimer pair in the genome, the expected percentage was only 30%. In contrast, only 10% of the A:T to C:G mutations occurred in the context 5′ApC3′/3′TpG5′, although the expected percentage was 22%. There was an eightfold difference in mutagenic potential between the triplet that was most permissive, 5′GpApA3′/3′TpTpC5′, and the triplet that was most restrictive, 5′CpApC3′/3′GpTpG5′ (Table S7). However, A:T to C:G transversions in the mutT mutant strain were not DNA-strand biased, occurring equally often with A templating the leading strand as with A templating the lagging strand (Table S3).

Fig. 3. The spectra of the six BPSs of DNA repair-defective E. coli strains. See the legend of Fig. 1A for the explanations of bars and error bars. nfi, PFM91; nth nei, PFM22; wild-type, PFM2; xthA nfo, PFM180. [Bar chart; x-axis categories: Total, A:T>G:C, G:C>A:T, A:T>T:A, G:C>T:A, A:T>C:G, G:C>C:G.]

In both the mutY and mutM mutY mutant strains 45% of the G:C to T:A mutations occurred in the context 5′GpC3′/3′CpG5′, although the expected percentage was only 33% (Table S7). The context 5′GpT3′/3′CpA5′ was inhibitory; only 9% (mutY) and 5% (mutM mutY) of the G:C to T:A transversions occurred in this context, although the expected percentage was 22% (Table S7). The most permissive triplet was 5′ApGpC3′/3′TpCpG5′, and the most stringent triplets were 5′TpGpT3′/3′ApCpA5′ (mutY) and 5′ApGpT3′/3′TpCpA5′ (mutM mutY), again accounting for an eightfold difference in mutagenic potential in the mutM mutY strain. In contrast to G:C to T:A transversions, G:C to C:G transversions in the mutM mutY mutant strain were biased by the 5′ base: of the 21 G:C to C:G transversions, 16 occurred in the context 5′GpG3′/3′CpC5′, although only five were expected (χ² = 12, P = 7 × 10⁻⁴). Interestingly, 5′GpG3′/3′CpC5′ was identified in a mutM mutant strain as a hotspot for G:C to C:G transversions induced by the oxidizing agent menadione (68). G:C to T:A transversions in mutY and mutM mutY mutant strains were not strand biased (Table S3), but G:C to C:G transversions in the mutM mutY mutant strain were ∼30% more likely to occur with G templating the leading strand than with G templating the lagging strand (16/21 observed vs. 10/21 expected; χ² = 3.4, P = 0.06).

The local sequence context could affect the probability of a base being damaged or repaired, of errors occurring during replication, of mismatches being corrected, and/or of polymerase extending DNA synthesis from a mismatched base pair. We briefly discuss the influence of sequence context on replication errors below; a more extensive discussion that includes other factors is given in SI Discussion.

In the mutT mutant strain A:T to C:G mutations occur when 8-oxoG is inserted opposite A in the template. Our results suggest that DNA polymerase makes this insertion most readily if it has just inserted a T opposite a template A and least readily if it has just inserted a G opposite a template C. A simple explanation for this bias is that the weaker T:A base pair permits the incoming 8-oxoG to approach in the syn orientation, allowing it to base pair with A (56), whereas the stronger G:C base pair inhibits this orientation. In the mutY and mutM mutY mutant strains, G:C to T:A transversions occur when A is inserted opposite 8-oxoG in the template. The mutational bias observed suggests that DNA polymerase is more likely to insert an A opposite 8-oxoG if it has just inserted a G opposite a template C than if it has just inserted an A opposite a template T. A simple explanation is that template 8-oxoG may be more likely to swing into the syn orientation, or to be stably maintained in that orientation, if the 3′ base pair is the

strong C:G rather than the weak T:A. However, our data also indicate that base-stacking must play a role in potentiating or preventing mutations, because the 3′ base had little influence when it was the complement of those indicated above (Table S7).

Fig. 4. The spectra of the six BPSs of E. coli strains defective in the prevention of mutations caused by 8-oxoG. See the legend of Fig. 1A for the explanations of bars and error bars. mutM mutY, PFM94; mutT, PFM61; mutY, PFM6; wild-type, PFM2. [Bar chart; x-axis categories: Total, A:T>G:C, G:C>A:T, A:T>T:A, G:C>T:A, A:T>C:G, G:C>C:G.]

Comparison of the Results from MA/WGS Analyses and Previous Studies. As mentioned above, the DNA repair pathways investigated in this report were chosen on the basis of published reports that their loss resulted in changes in spontaneous mutation rates or spectra. In general, with the exception of DNA repair activities dealing with oxidative damage, our results did not reproduce these previous findings. In retrospect, this result is not surprising. Most investigations of mutational processes have relied on reporter genes for mutational readout, and these reporter genes may not be representative of the genome as a whole. Indeed, reporter genes may enter into use because they are highly mutable or responsive to particular types of DNA damage. Although special sequences can reveal relatively rare but important mutational processes, such as hotspots, amplifications, complex multiple mutations, and aberrant recombination, such events at these loci may not contribute detectably to the mutational load when the whole genome is the target. Thus, although the MA/WGS method may be blind to some mutational events, our results confirm previous conclusions (1) that the most significant threat to genomic integrity overall is DNA damage by reactive oxygen species.

Conclusions

Our studies of mutational accumulation in E. coli have resulted in the following four conclusions about the determinants of spontaneous mutation:

i) The rates and spectra of spontaneous mutations in three diverged commensal E. coli strains are nearly the same. The rate of BPSs is 2–3 × 10⁻¹⁰ mutations per nucleotide per generation, and the rate of small indels (≤4 nt) is 1/10th of that. BPSs are dominated by G:C to A:T transitions, and the majority of indels occur in mononucleotide runs.

ii) Several major pathways for repairing or tolerating DNA damage have little impact on spontaneous mutation under the conditions of our experiments. These pathways include translesion DNA synthesis, NER, TCR, and repair of alkylation damage. We conclude either that these pathways have not evolved to repair spontaneous DNA damage or that there are sufficient backup or redundant repair activities to compensate for the loss of any one pathway. However, the loss of the major AP endonucleases Xth and Nfo causes a modest two- to threefold increase in all types of mutations, possibly including small indels, suggesting that there is a low level of endogenous DNA damage that results in base loss either spontaneously or from the action of glycosylases.

iii) Oxidative DNA damage is pervasive and highly mutagenic. Loss of Nth and Nei, the glycosylases that remove oxidatively damaged pyrimidines, results in a 12-fold increase in G:C transitions. Loss of the enzymes MutT, MutY, and MutM, all involved in preventing mutations caused by oxidatively damaged purines, increases transversion mutations by nearly 1,000-fold.

iv) Mutations are strongly biased by the local sequence context. This bias was evident in the wild-type bacteria, in which A:T transitions occurred in the context 5′ApC3′/3′TpG5′ twice as frequently as expected, in the nth nei mutant strain, in which G:C transitions occurred in the context 5′CpG3′/3′GpC5′ twice as frequently as expected, and in strains defective in preventing mutagenesis caused by 8-oxoG, in which 5′ApA3′/3′TpT5′ and 5′GpC3′/3′CpG5′ were favored for A:T to C:G and G:C to T:A transversions, respectively.

Materials and Methods

Bacterial Strains. The bacterial strains used in this study and their methods of construction are given in Table S8. The oligonucleotides used to perform and confirm genetic manipulations are listed in Table S9. Further details are given in SI Materials and Methods.

MA Procedure. MA lines originated from single colonies of a founder and were propagated through single-colony bottlenecks daily as described (5). Further details are given in SI Materials and Methods.

Estimation of Generations. The number of generations that each MA line experienced was estimated from colony size as previously described (5). The mean of these values for each strain is reported in Table 1 and was used to calculate mutation rates. Further details are given in SI Materials and Methods.

Genomic DNA Purification, Library Construction, and WGS. Genomic DNA was purified using the PureLink Genomic DNA purification kit (Invitrogen Corp.). Libraries were constructed either by the Beijing Genome Institute (BGI) or by the Indiana University Center for Genomics and Bioinformatics (CGB). Sequencing was performed either by BGI using the Illumina HiSeq 2000 platform or by the University of New Hampshire Hubbard Center for Genome Studies (HCGS) using the Illumina HiSeq 2500 platform. Further details are given in SI Materials and Methods.

SNP and Short Indel Calling. Procedures for SNP and indel calling were as described in ref. 5. E. coli K12 strain MG1655 [National Center for Biotechnology Information (NCBI) reference sequence NC_000913.2] was used as the reference genome sequence. Further details are given in SI Materials and Methods. The sequences and mutations reported in this paper have been deposited in the NCBI Sequence Read Archive (BioProject accession no. SRP013707) and in the IUScholarWorks Repository (URI hdl.handle.net/2022/20340).

Mutation Annotation. Variants were annotated using custom scripts. Protein-coding gene coordinates were obtained from the GenBank page of reference sequence NC_000913.2. BPSs in coding sequences were determined to be synonymous or nonsynonymous based on the genetic code. Nonsynonymous BPSs were designated conservative or nonconservative based on the Blosum62 matrix (69) with a value ≥0 considered conservative.

Monte Carlo Simulations. For each strain a random distribution of BPSs corresponding to the observed mutational spectra was simulated using a custom script; the number of mutations was fixed at the observed numbers, and 1,000 trials were simulated (5).

Statistical Analyses. Standard statistical analyses were used (70, 71). Confidence limits (CLs) on the overall rates of BPS and indels were calculated from the mean and variance of the mutations per MA line for each strain. CLs for the different types of mutations were calculated from the mean and variance of 1,000 Monte Carlo simulations for each strain; P values based on these calculations are indicated in the text. Further details are given in SI Materials and Methods.
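
The Monte Carlo procedure described above is implemented in a custom script that is not reproduced in the text. As a rough sketch only, one way to simulate a random distribution with the observed spectrum, keeping the number of mutations fixed and running 1,000 trials, is shown below. The function name, the example counts, and the use of mean ± 1.96 SD as the 95% CL are all assumptions made for illustration; the authors' script may differ in each of these respects.

```python
import random
import statistics


def simulate_spectrum_cls(observed: dict[str, int], n_trials: int = 1000,
                          seed: int = 0) -> dict[str, tuple[float, float, float]]:
    """Simulate mutational spectra and return (mean, lower CL, upper CL) per class.

    Each trial redraws the same total number of mutations, assigning every
    mutation to a class with probability equal to its observed proportion.
    Assumption: 95% CLs are taken as mean +/- 1.96 standard deviations.
    """
    total = sum(observed.values())
    if total == 0:
        raise ValueError("at least one observed mutation is required")
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2 to estimate a variance")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    classes = list(observed)
    weights = [observed[c] for c in classes]
    simulated: dict[str, list[int]] = {c: [] for c in classes}
    for _ in range(n_trials):
        draws = rng.choices(classes, weights=weights, k=total)
        for c in classes:
            simulated[c].append(draws.count(c))
    limits = {}
    for c in classes:
        mean = statistics.fmean(simulated[c])
        sd = statistics.stdev(simulated[c])
        limits[c] = (mean, max(0.0, mean - 1.96 * sd), mean + 1.96 * sd)
    return limits


if __name__ == "__main__":
    # Hypothetical counts, not data from this study.
    example = {"G:C>A:T": 60, "A:T>G:C": 30, "G:C>T:A": 10}
    for bps_class, (mean, low, high) in simulate_spectrum_cls(example).items():
        print(f"{bps_class}: mean {mean:.1f}, 95% CL {low:.1f} to {high:.1f}")
```

Dividing the simulated counts by the number of generations and the number of relevant base pairs would convert these limits into limits on conditional rates.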

ACKNOWLEDGMENTS. We thank the following current and past members of the P.L.F. laboratory for technical assistance: H. Bedwell, C. Coplen, J. Eagan, J. Ferlman, N. Gruenhagen, J. Healy, N. Ivers, R. Newlon, D. Osiecki, I. Rameses, S. Riffert, D. Simon, K. Smith, B. Souders, K. Storvik, L. Whitson, B. Wojcik, N. Yahaya, and A. Ying Yi Tan; members of the P.L.F., Lynch, and Finkel laboratories for useful discussions; the anonymous reviewers of this paper for valuable suggestions; E. Denamur, R. Gerlach, B. Wanner, and the National BioResource Project-E. coli at the National Institute of Genetics for providing bacterial strains and plasmids. This research was supported by US Army Research Office Multidisciplinary University Research Initiative Award W911NF-09-1-0444 (to P.L.F., H.T., M. Lynch, and S. E. Finkel).

1. Miller JH (1996) Spontaneous mutators in bacteria: Insights into pathways of mutagenesis and repair. Annu Rev Microbiol 50:625–643.
2. Meier B, et al. (2014) C. elegans whole-genome sequencing reveals mutational signatures related to carcinogens and DNA repair deficiency. Genome Res 24(10):1624–1636.
3. Mukai T (1964) The genetic structure of natural populations of Drosophila melanogaster. I. Spontaneous mutation rate of polygenes controlling viability. Genetics 50:1–19.
4. Halligan DL, Keightley PD (2009) Spontaneous mutation accumulation studies in evolutionary genetics. Annu Rev Ecol Evol Syst 40:151–172.
5. Lee H, Popodi E, Tang H, Foster PL (2012) Rate and molecular spectrum of spontaneous mutations in the bacterium Escherichia coli as determined by whole-genome sequencing. Proc Natl Acad Sci USA 109(41):E2774–E2783.
6. Foster PL, Hanson AJ, Lee H, Popodi EM, Tang H (2013) On the mutational topology of the bacterial genome. G3 (Bethesda) 3(3):399–407.
7. Sung W, et al. (2015) Asymmetric context-dependent mutation patterns revealed through mutation-accumulation experiments. Mol Biol Evol 32(7):1672–1683.
8. Williams AB (2014) Spontaneous mutation rates come into focus in Escherichia coli. DNA Repair (Amst) 24:73–79.
9. Drake JW (1991) A constant rate of spontaneous mutation in DNA-based microbes. Proc Natl Acad Sci USA 88(16):7160–7164.
10. Touchon M, et al. (2009) Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5(1):e1000344.
11. Coulondre C, Miller JH, Farabaugh PJ, Gilbert W (1978) Molecular basis of base substitution hotspots in Escherichia coli. Nature 274(5673):775–780.
12. Loeb LA, Preston BD (1986) Mutagenesis by apurinic/apyrimidinic sites. Annu Rev Genet 20:201–230.
13. Barrick JE, et al. (2014) Identifying structural variation in haploid microbial genomes from short-read resequencing data using breseq. BMC Genomics 15:1039.
14. Streisinger G, Owen J (1985) Mechanisms of spontaneous and induced frameshift mutation in bacteriophage T4. Genetics 109(4):633–659.
15. Jinks-Robertson S, Bhagwat AS (2014) Transcription-associated mutagenesis. Annu Rev Genet 48:341–359.
16. Ganesan A, Spivak G, Hanawalt PC (2012) Transcription-coupled DNA repair in prokaryotes. Prog Mol Biol Transl Sci 110:25–40.
17. Merrikh H, Zhang Y, Grossman AD, Wang JD (2012) Replication-transcription conflicts in bacteria. Nat Rev Microbiol 10(7):449–458.
18. Chen X, Zhang J (2013) No gene-specific optimization of mutation rate in Escherichia coli. Mol Biol Evol 30(7):1559–1562.
19. Fuchs RP, Fujii S (2013) Translesion DNA synthesis and mutagenesis in prokaryotes. Cold Spring Harb Perspect Biol 5(12):a012682.
20. Curti E, McDonald JP, Mead S, Woodgate R (2009) DNA polymerase switching: Effects on spontaneous mutagenesis in Escherichia coli. Mol Microbiol 71(2):315–331.
21. Kato T, Shinoura Y (1977) Isolation and characterization of mutants of Escherichia coli deficient in induction of mutations by ultraviolet light. Mol Gen Genet 156(2):121–131.
22. Sargentini NJ, Smith KC (1981) Much of spontaneous mutagenesis in Escherichia coli is due to error-prone DNA repair: Implications for spontaneous carcinogenesis. Carcinogenesis 2(9):863–872.
23. Bhamre S, Gadea BB, Koyama CA, White SJ, Fowler RG (2001) An aerobic recA-, umuC-dependent pathway of spontaneous base-pair substitution mutagenesis in Escherichia coli. Mutat Res 473(2):229–247.
24. Timms AR, Muriel W, Bridges BA (1999) A UmuD,C-dependent pathway for spontaneous G:C to C:G transversions in stationary phase Escherichia coli mut Y. Mutat Res 435(1):77–80.
25. Kim SR, Matsui K, Yamada M, Gruz P, Nohmi T (2001) Roles of chromosomal and episomal dinB genes encoding DNA pol IV in targeted and untargeted mutagenesis in Escherichia coli. Mol Genet Genomics 266(2):207–215.
26. Layton JC, Foster PL (2003) Error-prone DNA polymerase IV is controlled by the stress-response sigma factor, RpoS, in Escherichia coli. Mol Microbiol 50(2):549–561.
27. Foster PL (2000) Adaptive mutation in Escherichia coli. Cold Spring Harb Symp Quant Biol 65:21–29.
28. McKenzie GJ, Lee PL, Lombardo M-J, Hastings PJ, Rosenberg SM (2001) SOS mutator DNA polymerase IV functions in adaptive mutation and not adaptive amplification. Mol Cell 7(3):571–579.
29. Kuban W, et al. (2004) Role of Escherichia coli DNA polymerase IV in in vivo replication fidelity. J Bacteriol 186(14):4802–4807.
30. Strauss BS, Roberts R, Francis L, Pouryazdanparast P (2000) Role of the dinB gene product in spontaneous mutation in Escherichia coli with an impaired replicative polymerase. J Bacteriol 182(23):6742–6750.
31. Wolff E, Kim M, Hu K, Yang H, Miller JH (2004) Polymerases leave fingerprints: Analysis of the mutational spectrum in Escherichia coli rpoB to assess the role of polymerase IV in spontaneous mutation. J Bacteriol 186(9):2900–2905.
32. Foster PL (1999) Are adaptive mutations due to a decline in mismatch repair? The evidence is lacking. Mutat Res 436(2):179–184.
33. Banach-Orlowska M, Fijalkowska IJ, Schaaper RM, Jonczyk P (2005) DNA polymerase II as a fidelity factor in chromosomal DNA synthesis in Escherichia coli. Mol Microbiol 58(1):61–70.
34. Foster PL (2007) Stress-induced mutagenesis in bacteria. Crit Rev Biochem Mol Biol 42(5):373–397.
35. Nowosielska A, Janion C, Grzesiuk E (2004) Effect of deletion of SOS-induced polymerases, pol II, IV, and V, on spontaneous mutagenesis in Escherichia coli mutD5. Environ Mol Mutagen 43(4):226–234.
36. Truglio JJ, Croteau DL, Van Houten B, Kisker C (2006) Prokaryotic nucleotide excision repair: The UvrABC system. Chem Rev 106(2):233–252.
37. Vaisman A, et al. (2014) Investigating the mechanisms of ribonucleotide excision repair in Escherichia coli. Mutat Res 761:21–33.
38. Branum ME, Reardon JT, Sancar A (2001) DNA repair excision nuclease attacks undamaged DNA. A potential source of spontaneous mutations. J Biol Chem 276(27):25421–25426.
39. Foster PL (1990) Escherichia coli strains with multiple DNA repair defects are hyperinduced for the SOS response. J Bacteriol 172(8):4719–4720.
40. Hasegawa K, Yoshiyama K, Maki H (2008) Spontaneous mutagenesis associated with nucleotide excision repair in Escherichia coli. Genes Cells 13(5):459–469.
41. Sedgwick B (2004) Repairing DNA-methylation damage. Nat Rev Mol Cell Biol 5(2):148–157.
42. Taverna P, Sedgwick B (1996) Generation of an endogenous DNA-methylating agent by nitrosation in Escherichia coli. J Bacteriol 178(17):5105–5111.
43. Mackay WJ, Han S, Samson LD (1994) DNA alkylation repair limits spontaneous base substitution mutations in Escherichia coli. J Bacteriol 176(11):3224–3230.
44. Foster PL, Cairns J (1992) Mechanisms of directed mutation. Genetics 131(4):783–789.
45. Bjedov I, et al. (2003) Stress-induced mutagenesis in bacteria. Science 300(5624):1404–1409.
46. Jiang D, Hatahet Z, Blaisdell JO, Melamede RJ, Wallace SS (1997) Escherichia coli endonuclease VIII: Cloning, sequencing, and overexpression of the nei structural gene and characterization of nei and nei nth mutants. J Bacteriol 179(11):3773–3782.
47. Saito Y, et al. (1997) Characterization of endonuclease III (nth) and endonuclease VIII (nei) mutants of Escherichia coli K-12. J Bacteriol 179(11):3783–3785.
48. Hazra TK, et al. (2000) Characterization of a novel 8-oxoguanine-DNA glycosylase activity in Escherichia coli and identification of the enzyme as endonuclease VIII. J Biol Chem 275(36):27762–27767.
49. Blaisdell JO, Hatahet Z, Wallace SS (1999) A novel role for Escherichia coli endonuclease VIII in prevention of spontaneous G→T transversions. J Bacteriol 181(20):6396–6402.
50. Wang D, Kreutzer DA, Essigmann JM (1998) Mutagenicity and repair of oxidative DNA damage: Insights from studies using defined lesions. Mutat Res 400(1-2):99–115.
51. Daley JM, Zakaria C, Ramotar D (2010) The endonuclease IV family of apurinic/apyrimidinic endonucleases. Mutat Res 705(3):217–227.
52. Cunningham RP, Saporito SM, Spitzer SG, Weiss B (1986) Endonuclease IV (nfo) mutant of Escherichia coli. J Bacteriol 168(3):1120–1127.
53. Wallace SS (1998) Enzymatic processing of radiation-induced free radical damage in DNA. Radiat Res 150(5, Suppl):S60–S79.
54. Cao W (2013) Endonuclease V: An unusual enzyme for repair of DNA deamination. Cell Mol Life Sci 70(17):3145–3156.
55. Guo G, Weiss B (1998) Endonuclease V (nfi) mutant of Escherichia coli K-12. J Bacteriol 180(1):46–51.
56. Kouchakdjian M, et al. (1991) NMR structural studies of the ionizing radiation adduct 7-hydro-8-oxodeoxyguanosine (8-oxo-7H-dG) opposite deoxyadenosine in a DNA duplex. 8-Oxo-7H-dG(syn).dA(anti) alignment at lesion site. Biochemistry 30(5):1403–1412.
57. Michaels ML, Miller JH (1992) The GO system protects organisms from the mutagenic effect of the spontaneous lesion 8-hydroxyguanine (7,8-dihydro-8-oxoguanine). J Bacteriol 174(20):6321–6325.
58. Yanofsky C, Cox EC, Horn V (1966) The unusual mutagenic specificity of an E. coli mutator gene. Proc Natl Acad Sci USA 55(2):274–281.
59. Maki H, Sekiguchi M (1992) MutT protein specifically hydrolyses a potent mutagenic substrate for DNA synthesis. Nature 355(6357):273–275.
60. Fowler RG, et al. (2003) Interactions among the Escherichia coli mutT, mutM, and mutY damage prevention pathways. DNA Repair (Amst) 2(2):159–173.
61. de Oliveira AH, da Silva AE, de Oliveira IM, Henriques JA, Agnez-Lima LF (2014) MutY-glycosylase: An overview on mutagenesis and activities beyond the GO system. Mutat Res 769:119–131.
62. Kim M, Huang T, Miller JH (2003) Competition between MutY and mismatch repair at A·C mispairs In vivo. J Bacteriol 185(15):4626–4629.
63. Zhang QM, Ishikawa N, Nakahara T, Yonei S (1998) Escherichia coli MutY protein has a guanine-DNA glycosylase that acts on 7,8-dihydro-8-oxoguanine:guanine mispair to prevent spontaneous G:C→C:G transversions. Nucleic Acids Res 26(20):4669–4675.
64. Michaels ML, Cruz C, Grollman AP, Miller JH (1992) Evidence that MutY and MutM combine to prevent mutations by an oxidatively damaged form of guanine in DNA. Proc Natl Acad Sci USA 89(15):7022–7025.
65. Tchou J, et al. (1991) 8-oxoguanine (8-hydroxyguanine) DNA glycosylase and its substrate specificity. Proc Natl Acad Sci USA 88(11):4690–4694.
66. Michaels ML, Tchou J, Grollman AP, Miller JH (1992) A repair system for 8-oxo-7,8-dihydrodeoxyguanine. Biochemistry 31(45):10964–10968.

67. Prakash A, Doublié S, Wallace SS (2012) The Fpg/Nei family of DNA glycosylases: Substrates, structures, and search for damage. Prog Mol Biol Transl Sci 110:71–91.
68. Ono T, Negishi K, Hayatsu H (1995) Spectra of superoxide-induced mutations in the lacI gene of a wild-type and a mutM strain of Escherichia coli K-12. Mutat Res 326(2):175–183.
69. Henikoff S, Henikoff JG (1994) Protein family classification based on searching a database of blocks. Genomics 19(1):97–107.
70. Zar JH (1984) Biostatistical Analysis (Prentice Hall, Englewood Cliffs, NJ) 2nd Ed.
71. Rice JA (1995) Mathematical Statistics and Data Analysis (Wadsworth Publishing Company, Belmont, CA) 2nd Ed.
72. Miller JH (1992) A Short Course in Bacterial Genetics: A Laboratory Manual and Handbook for Escherichia coli and Related Bacteria (Cold Spring Harbor Lab Press, Cold Spring Harbor, NY).
73. Blattner FR, et al. (1997) The complete genome sequence of Escherichia coli K-12. Science 277(5331):1453–1462.
74. Riley M, et al. (2006) Escherichia coli K-12: A cooperatively developed annotation snapshot–2005. Nucleic Acids Res 34(1):1–9.
75. Bachmann BJ (1996) Derivations and genotypes of some mutant derivatives of Escherichia coli K-12. Escherichia coli and Salmonella: Cellular and Molecular Biology, eds Neidhardt FC, et al. (American Society of Microbiology, Washington, D.C.), 2nd Ed.
76. Slater SC, Lifsics MR, O’Donnell M, Maurer R (1994) holE, the gene coding for the theta subunit of DNA polymerase III of Escherichia coli: Characterization of a holE mutant and comparison with a dnaQ (epsilon-subunit) mutant. J Bacteriol 176(3):815–821.
77. Freddolino PL, Amini S, Tavazoie S (2012) Newly identified genetic variations in common Escherichia coli MG1655 stock cultures. J Bacteriol 194(2):303–306.
78. Barker CS, Prüss BM, Matsumura P (2004) Increased motility of Escherichia coli by insertion sequence element integration into the regulatory region of the flhD operon. J Bacteriol 186(22):7529–7537.
79. Baba T, et al. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: The Keio collection. Mol Syst Biol 2:0008.
80. Datsenko KA, Wanner BL (2000) One-step inactivation of chromosomal genes in Escherichia coli K-12 using PCR products. Proc Natl Acad Sci USA 97(12):6640–6645.
81. Blank K, Hensel M, Gerlach RG (2011) Rapid and highly efficient method for scarless mutagenesis within the Salmonella enterica chromosome. PLoS One 6(1):e15763.
82. Sawitzke JA, et al. (2007) Recombineering: In vivo genetic engineering in E. coli, S. enterica, and beyond. Methods Enzymol 421:171–199.
83. Heckman KL, Pease LR (2007) Gene splicing and mutagenesis by PCR-driven overlap extension. Nat Protoc 2(4):924–932.
84. Foster PL (2006) Methods for determining spontaneous mutation rates. Methods Enzymol 409:195–213.
85. Sarkar S, Ma WT, Sandri GH (1992) On fluctuation analysis: A new, simple and efficient method for computing the expected number of mutants. Genetica 85(2):173–179.
86. Hall BM, Ma CX, Liang P, Singh KK (2009) Fluctuation analysis CalculatOR: A web tool for the determination of mutation rate using Luria-Delbruck fluctuation analysis. Bioinformatics 25(12):1564–1565.
87. Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754–1760.
88. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate - a practical and powerful approach to multiple testing. J R Statist Soc B 57(1):289–300.
89. Ming X, et al. (2014) Mapping structurally defined guanine oxidation products along DNA duplexes: Influence of local sequence context and endogenous cytosine methylation. J Am Chem Soc 136(11):4223–4235.
90. Margolin Y, Shafirovich V, Geacintov NE, DeMott MS, Dedon PC (2008) DNA sequence context as a determinant of the quantity and chemistry of guanine oxidation produced by hydroxyl radicals and one-electron oxidants. J Biol Chem 283(51):35569–35578.
91. Rodriguez H, Valentine MR, Holmquist GP, Akman SA, Termini J (1999) Mapping of peroxyl radical induced damage on genomic DNA. Biochemistry 38(50):16578–16588.
92. Margolin Y, Cloutier JF, Shafirovich V, Geacintov NE, Dedon PC (2006) Paradoxical hotspots for guanine oxidation by a chemical mediator of inflammation. Nat Chem Biol 2(7):365–366.
93. Hatahet Z, Zhou M, Reha-Krantz LJ, Morrical SW, Wallace SS (1998) In search of a mutational hotspot. Proc Natl Acad Sci USA 95(15):8556–8561.
94. Sung RJ, Zhang M, Qi Y, Verdine GL (2012) Sequence-dependent structural variation in DNA undergoing intrahelical inspection by the DNA glycosylase MutM. J Biol Chem 287(22):18044–18054.
95. Krahn JM, Beard WA, Miller H, Grollman AP, Wilson SH (2003) Structure of DNA polymerase beta with the mutagenic DNA lesion 8-oxodeoxyguanine reveals structural insights into its coding potential. Structure 11(1):121–127.
96. Yamada M, et al. (2012) Escherichia coli DNA polymerase III is responsible for the high level of spontaneous mutations in mutT strains. Mol Microbiol 86(6):1364–1375.
