Supporting Information

Traverse and Ochman 10.1073/pnas.1525329113

SI Methods

Transcription error rates were calculated by recovering all errors in the output file processed by CircSeq_v2. This pipeline is described in detail elsewhere (22). Briefly, CircSeq_v2 identified the repeats within each read and aligned them to obtain a consensus sequence if the read contained at least three full repeats of 100 bp or less. Any read that failed to meet this criterion was discarded. Because each base within each repeat is assigned a different quality score, a single quality score representative of the consensus sequence at each base was calculated as the average of the quality scores of the three bases (one from each repeat) at that position. Reads were then mapped to their respective reference genome using bowtie2, and errors were identified as those bases within reads that did not match the reference genome. Only bases with an average quality score of 20 or higher (Fig. S5 and below) were used. Overall per-base coverage was calculated as the sum of the total coverage of each base, and overall error rates were calculated by dividing the number of errors by the overall per-base coverage. The error rate for each type of nucleotide substitution, with A→C as an example, was calculated as above except that the error rate was adjusted for the base composition of the sequenced RNA such that

$$ r^{\mathrm{adj}}_{A\to C} = r_{A\to C} \times f_A , $$

where $r_{A\to C}$ is the error rate for A→C errors, $r^{\mathrm{adj}}_{A\to C}$ is the base-composition-adjusted error rate, and $f_A$ is the adjustment coefficient for base composition, calculated as

$$ f_A = \frac{0.25}{A} , $$

where $A$ is the fraction of overall adenosine nucleotides sequenced in the transcriptome. This calculation normalizes the A→C error rate for any base-compositional bias in the transcriptome. This error rate is presented in the context of the entire transcriptome (i.e., not within the context of all sequenced adenosine positions).

To ensure that sequencing errors did not influence our results, we first analyzed the original sequence data including all bases with an average quality score of 10 or higher and then sequentially increased the stringency of the analysis by analyzing nucleotides at different quality-score cut-offs (Fig. S5). This allowed us to determine the influence of sequencing errors at each quality score. Transcription error rates reach an asymptote in the quality-score range of 18–20 (Fig. S5), which reflects the point at which sequencing errors are removed from the analysis. We therefore selected a quality-score cut-off of 20 for all analyses, a value that maximizes the number of actual errors retained and provides accurate measures of transcription error rates.
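The consensus-quality averaging, the overall error rate, and the base-composition adjustment $f = 0.25/\text{fraction}$ described in the SI Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the CircSeq_v2 code: the function names (`consensus_quality`, `overall_error_rate`, `adjustment_coefficient`, `substitution_rates`), the `"A>C"` key format, and the toy inputs are hypothetical, and error counts and coverage are assumed to have already been tallied from the bowtie2 alignments.

```python
from statistics import mean


def consensus_quality(repeat_qualities):
    """Average quality of one consensus position across the read's repeats.

    Reads with fewer than three full repeats are assumed to have been
    discarded already; this check only guards against misuse.
    """
    if len(repeat_qualities) < 3:
        raise ValueError("need at least three repeats per consensus position")
    return mean(repeat_qualities)


def overall_error_rate(n_errors, per_base_coverage):
    """Overall rate: number of errors divided by summed per-base coverage."""
    return n_errors / per_base_coverage


def adjustment_coefficient(base_fraction):
    """f = 0.25 / fraction; equals 1 for a uniform composition (0.25)."""
    return 0.25 / base_fraction


def substitution_rates(error_counts, per_base_coverage, base_fractions):
    """Composition-adjusted rate for each substitution type.

    error_counts: e.g. {"A>C": 12}, keyed "<reference base>><read base>"
        (hypothetical key format).
    per_base_coverage: summed coverage over the whole transcriptome, so each
        rate is relative to all sequenced bases, not only one base type.
    base_fractions: fraction of sequenced bases of each type, e.g. {"A": 0.30}.
    """
    return {
        sub: overall_error_rate(n, per_base_coverage)
        * adjustment_coefficient(base_fractions[sub[0]])
        for sub, n in error_counts.items()
    }
```

With hypothetical counts of 12 A→C errors over 1,000,000 sequenced bases and an adenosine fraction of 0.30, the raw rate of 1.2 × 10^-5 is scaled by 0.25/0.30 to about 1.0 × 10^-5; a base present at exactly 25% is left unchanged.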

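The quality-score stringency analysis (recomputing error rates while raising the cut-off from 10 upward) can be sketched as a simple sweep. This assumes a hypothetical input of one `(average_quality, is_error)` pair per aligned consensus base, with each kept base counting as one unit of coverage; the helper names are illustrative only.

```python
def error_rate_at_cutoff(bases, cutoff):
    """Error rate using only bases whose average quality is >= cutoff.

    bases: iterable of (average_quality, is_error) pairs, one per aligned
    consensus base (hypothetical input format).
    Returns None when no base reaches the cutoff.
    """
    kept = [is_error for quality, is_error in bases if quality >= cutoff]
    if not kept:
        return None
    return sum(kept) / len(kept)  # True counts as 1, False as 0


def cutoff_sweep(bases, cutoffs=range(10, 41)):
    """Error rate at each quality-score cut-off, lowest to highest."""
    return {c: error_rate_at_cutoff(bases, c) for c in cutoffs}
```

The sweep only tabulates rates; picking the cut-off (20 in the SI Methods) is a judgment made by inspecting where the rates level off, which this sketch does not attempt to automate. Returning `None` at high cut-offs mirrors the note that too few bases exceeded a quality score of 38.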
Traverse and Ochman www.pnas.org/cgi/content/short/1525329113

[Fig. S1 gene panel; the retention/loss symbols for E. coli, B. aphidicola, and C. ruddii are not recoverable from the text.]

Recombination Repair
radA: Stabilizes ssDNA
recA: Pairing of homologous DNA strands
recB/recC: Chi site recognition
recD: Helicase
recF: Single-strand break repair
recJ: ssDNA nuclease
recN: DNA binding
recO: Single-strand break repair
recQ: Helicase
recR: Single-strand break repair
ruvA: Crossover junction migration
ruvB: Crossover junction migration
ruvC: Crossover junction nuclease

Mismatch Repair
exoX: Exonuclease
mutH: Endonuclease at Chi site
mutL: Stabilizes MutS
mutS: Binds mismatches

Base-Excision Repair
alkA: DNA glycosylase
mug: DNA glycosylase
mutM: DNA glycosylase
mutY: DNA glycosylase
nei: DNA glycosylase
nfo: DNA glycosylase
nth: DNA glycosylase
sbcB: Exonuclease
tag: DNA glycosylase
ung: DNA glycosylase
xthA: DNA glycosylase

Nucleotide-Excision Repair
cho: 3' excinuclease
mfd: Transcription-repair coupling factor
uvrA: DNA lesion recognition
uvrB: DNA lesion recognition
uvrC: 5' and 3' excinuclease
uvrD: Helicase
ybaZ: DNA base-flipping protein

Miscellaneous DNA Repair
ssb: Stabilizes single-stranded DNA for repair
dinB: Error-prone DNA polymerase IV
umuC: Error-prone DNA polymerase subunit
umuD: Error-prone DNA polymerase subunit
mutT: 8-oxo-dGTP degradation

RNAP
rpoA: RNAP alpha subunit
rpoB: RNAP beta subunit
rpoC: RNAP beta' subunit
rpoZ: RNAP omega subunit

Transcription Fidelity
greA: Transcription error correction
greB: Transcription error correction
dksA: Global gene regulator; increases transcription fidelity

RNA Degradation
rbn: tRNA maturation
rna: RNase cleaving at any dinucleotide pair
rnb: mRNA degradation
rnc: rRNA/tRNA maturation; mRNA processing
rnd: tRNA maturation
rne: rRNA/tRNA maturation; mRNA degradation
rng: rRNA maturation
rnhA: Okazaki fragment degradation
rnhB: RNA degradation from RNA:DNA hybrids
rnpA: tRNA maturation
rnr: rRNA/tRNA/tmRNA maturation
rnt: tRNA turnover
ybeY: rRNA maturation

Fig. S1. Nucleic acid information processing genes that are present in E. coli compared with their retention or loss in B. aphidicola and C. ruddii. Colored circles indicate retention of the corresponding gene; white circles indicate loss of the corresponding gene from the specified genome.

[Fig. S2 plot: Error Frequency per 50 kb (10^-5 to 10^-4) vs. Genome Position (bp, 0 to 4,000,000), with replication origin and terminus marked.]

Fig. S2. Frequency of transcription errors along the E. coli genome. Shaded rectangles represent the transcription error rates, pooling all errors from the eight E. coli samples, in nonoverlapping 50-kb windows. Horizontal lines represent the genome-wide mean transcription error rate (black) and 2 SDs from the mean (red). Positions of the replication origin and terminus are shown.
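The windowed rates in Fig. S2 can be sketched as errors per window divided by summed coverage per window, followed by flagging windows more than 2 SD from the mean. This is an illustrative sketch, not the authors' script: the helper names, the 0-based per-position `coverage` list, and the choice to take the mean and SD over window rates (rather than using the pooled overall rate as the mean) are all assumptions.

```python
from statistics import mean, stdev

WINDOW = 50_000  # nonoverlapping 50-kb windows


def window_error_rates(error_positions, coverage, window=WINDOW):
    """Per-window error frequency: errors in window / summed coverage in window.

    error_positions: 0-based genome coordinates of observed errors.
    coverage: coverage[i] is the read depth at genome position i.
    Windows with zero coverage get None.
    """
    n_windows = (len(coverage) + window - 1) // window
    errors = [0] * n_windows
    for pos in error_positions:
        errors[pos // window] += 1
    rates = []
    for w in range(n_windows):
        cov = sum(coverage[w * window:(w + 1) * window])
        rates.append(errors[w] / cov if cov else None)
    return rates


def flag_outlier_windows(rates):
    """Indices of windows whose rate is more than 2 SD from the mean rate."""
    observed = [r for r in rates if r is not None]
    m, sd = mean(observed), stdev(observed)
    return [i for i, r in enumerate(rates)
            if r is not None and abs(r - m) > 2 * sd]
```

Normalizing by coverage rather than reporting raw counts matters here because error counts track sequencing depth closely (Fig. S3).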

[Fig. S3 plot: Number of Errors vs. Fold Coverage per 50-kb Window; r² = 0.9349, P < 0.0001.]

Fig. S3. Association between numbers of transcription errors and sequence coverage. Error numbers were computed for nonoverlapping 50-kb windows across the E. coli genome in all eight samples.
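The r² reported in Fig. S3 measures how much of the variation in per-window error counts a linear fit on coverage explains. A minimal sketch, assuming an ordinary least-squares line of errors on coverage (for which r² equals the squared Pearson correlation); the function name is hypothetical and the P value is not computed here.

```python
def r_squared(x, y):
    """r^2 of the least-squares line of y on x (squared Pearson correlation).

    x: fold coverage per window; y: number of errors per window.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

A value near 1, as in Fig. S3, indicates that windows with more errors are mostly those with deeper coverage, which is why per-window error frequencies are normalized by coverage.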

[Fig. S4 plot: Error Frequency per Nucleotide (10^-5 to 10^-4) for Leading Strand Genes and Lagging Strand Genes; growth conditions: Minimal Media Midlog Phase, Complex Media Midlog Phase, Minimal Media Stationary Phase, Complex Media Stationary Phase.]

Fig. S4. Transcription error frequencies in E. coli genes transcribed on the leading or lagging strands. Points are color-coded according to growth condition, and horizontal bars represent means of each column. There is no significant difference between the transcription error frequencies for genes encoded on the two strands (Wilcoxon test, P > 0.90; n = 8).

[Fig. S5 plot: (A) overall Error Frequency vs. Average Quality Score (0 to 40); (B) Error Frequency for each substitution type (A→C, C→A, G→A, T→A, A→G, C→G, G→C, T→C, A→T, C→T, G→T, T→G) vs. Average Quality Score.]

Fig. S5. Effect of sequencing errors and data quality on the estimation of transcription error frequencies. Transcription error frequencies for the combined E. coli replicates were calculated at increasing average base quality scores between 10 and 40 to demonstrate the effect of sequencing errors and low-quality bases on error frequencies. Overall transcription error frequency (A) and the transcription error frequency for each nucleotide substitution (B) level off in the quality-score range of 18–20, indicating that using data in this range and above excludes sequencing artifacts from estimates of transcription error rates. Too few bases in the transcriptome attained average quality scores >38 to be included in this analysis.
