Detection of chromosomal rearrangements 29 March 2011. 1.

Supporting Online Material for:

Genome-wide detection of chromosomal rearrangements, indels and mutations in circular chromosomes by short read sequencing.

Ole Skovgaard (1,3), Mads Bak (2), Anders Løbner-Olesen (1), Niels Tommerup (2)

1 Department of Science, Systems and Models, Roskilde University, Denmark.

2 Wilhelm Johannsen Centre for Functional Genome Research, Department of Cellular and Molecular Medicine, University of Copenhagen, Denmark.

3 Corresponding author, Email: [email protected]

This PDF file includes:

Supplemental Table S1. Read alignment statistics

Supplemental Table S2. Detected mutations

Supplemental Table S3. PCR primers


Supplemental Table S1. Read alignment statistics

Sample                        hsm-1       hsm-2       hsm-3       hsm-4       hsm-5       hsm-6       hsm-7       hsm-8       MG1655      MG1655
Strain                        ALO1917     ALO2415     ALO2416     ALO2417     ALO2418     ALO2419     ALO2420     ALO2421
Growth status                 stationary  exponential stationary  stationary  stationary  stationary  exponential exponential exponential exponential
Doubling time τ (min)         na          34          na          na          na          na          50          45          47          43
Total reads                   12085372    13187790    5845568     6903081     5973710     7490823     6360766     8793402     6764655     5567779
Perfectly matching reads (a)  6897112     6938427     4274678     4910682     4058162     3661412     3419797     4035418     3268120     3468234
Low-quality reads (b)         2346095     3256605     533700      943716      962986      3090791     2028364     4589439     2599609     493038
Single mutation reads (c)     448944      457668      153551      149664      139841      80456       109761      22390       102729      219301
Double mutation reads (c)     280658      271909      79613       73197       71884       38571       57803       4776        54252       130191
InDel reads (d)               382095      357827      118609      111219      95096       54303       81781       8898        67249       230436
Orphan reads (e)              1730468     1905354     685417      714603      645741      565290      663260      132481      672696      1026579


a All reads were 35 nucleotides long and were aligned with the reference sequence of MG1655. Reads that matched perfectly were then subtracted from the pool of reads.

b The remaining reads were quality filtered. The rule was that the first 30 nucleotides must be called with an average quality of at least 35, and reads failing this rule are counted as low-quality reads. We used the raw Illumina quality score, which is a Phred-like value with a maximum of 40 per nucleotide.

c The remaining reads were aligned with the reference sequence allowing one or two mismatches (misfits). At least one of the mismatched bases must have a quality score of at least 35; otherwise the read is discarded. This reduces background noise and spurious mutation calls at positions that are difficult to sequence.

d The remaining reads were aligned so that 15 or more nucleotides at either end matched the reference sequence perfectly. These reads mark putative insertions, deletions or chromosomal break points.

e Orphan reads failed all of the above attempts at alignment.
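The sequential classification described in footnotes a to e can be sketched in Python as follows. This is a toy illustration under our own assumptions, not the authors' pipeline:

- Reads are strings and qualities are lists of integer Phred-like scores.
- Only the forward strand of the reference is searched.
- Alignment is a naive scan of every offset.
- The mismatch threshold in step c is read as "at least 35".

Because the chromosome is circular, the reference is extended with its first k - 1 bases so that reads spanning the origin can still match. The function names, category labels and the random test reference are hypothetical.

```python
import random

READ_LEN = 35     # footnote a: all reads were 35 nt long
QUAL_WINDOW = 30  # footnote b: average quality is taken over the first 30 nt
MIN_QUAL = 35     # footnotes b and c: quality threshold (Phred-like, max 40)
ANCHOR = 15       # footnote d: >= 15 nt at one end must match perfectly


def circularize(ref, k):
    """Append the first k - 1 bases so k-mers spanning the origin are present."""
    return ref + ref[: k - 1]


def mismatches(read, genome, offset):
    """Return read positions that differ from genome[offset:offset + len(read)]."""
    return [i for i, base in enumerate(read) if genome[offset + i] != base]


def classify(read, quals, ref):
    """Assign one read to a Table S1 category, trying steps a to e in order."""
    genome = circularize(ref, len(read))
    # Step a: exact match anywhere on the (circularized) forward strand.
    if read in genome:
        return "perfect"
    # Step b: mean quality of the first QUAL_WINDOW calls must reach MIN_QUAL.
    window = quals[:QUAL_WINDOW]
    if sum(window) / len(window) < MIN_QUAL:
        return "low_quality"
    # Step c: best hit with 1 or 2 mismatches, at least one of them high quality.
    best = None
    for offset in range(len(genome) - len(read) + 1):
        diffs = mismatches(read, genome, offset)
        if 1 <= len(diffs) <= 2 and any(quals[i] >= MIN_QUAL for i in diffs):
            if best is None or len(diffs) < len(best):
                best = diffs
    if best is not None:
        return "single" if len(best) == 1 else "double"
    # Step d: a perfectly matching 15-nt anchor at either end of the read.
    if read[:ANCHOR] in genome or read[-ANCHOR:] in genome:
        return "indel"
    # Step e: everything else is an orphan read.
    return "orphan"


# Hypothetical 2 kb reference built from A/C/G only, so a "T" in a read is
# always a mismatch; the single seeded generator keeps the demo reproducible.
_rng = random.Random(1)
REF = "".join(_rng.choice("ACG") for _ in range(2000))

if __name__ == "__main__":
    good = [40] * READ_LEN
    demo = {
        "perfect": REF[100:135],
        "spans origin": REF[-10:] + REF[:25],
        "one substitution": REF[200:210] + "T" + REF[211:235],
        "junction": REF[400:417] + REF[900:918],
        "foreign": "T" * READ_LEN,
    }
    for name, read in demo.items():
        print(f"{name:18s} -> {classify(read, good, REF)}")
```

Each read ends up in exactly one category. That is why the six counts in every column of Table S1 add up to the total number of reads. For hsm-1, 6897112 + 2346095 + 448944 + 280658 + 382095 + 1730468 = 12085372. A realistic implementation would instead use an indexed aligner and search both strands.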

Supplemental Table S2. Detected mutations

Position (a) | Strains affected | Gene or region | Nucleotide | Protein
299.443 (b) | hsm-2 to hsm-8 | yagR | C -> T point mutation | Gly -> Asp mutation
547.694 | All | ylbE pseudogene | A -> G point mutation | Silent mutation (Leu -> Leu)
547.832 | All | ylbE pseudogene | Insertion of one extra G in a "GGGG" run | Frame shift
712.036 | hsm-1 | ybfF-seqA intergenic | Deletion of 2 T's in a 7 T run (h) |
1.397.470 (c) | All | fnr | Duplication of 18 bp between two 5-mer CTGGC repeats | Duplication of amino acids 21-26 of FNR
1.976.527 – 1.977.303 (d) | All | insAB | Deletion of an IS1 element |
2.616.100 – 2.616.850 | hsm-1 to hsm-8 | hda::cat | Insertion; the cat gene (chloramphenicol acetyltransferase) is used for the hda knock-out | Intended deletion of hda
2.658.123 | hsm-5 | iscU | C -> A point mutation | Cys -> Phe mutation
2.796.315 | hsm-3 | stpA | 10 bp duplication (TCAGCTTTCA) between two TCAGCT repeats | Frame shift
~3.129.000 – ~3.650.000 | hsm-7 | | Inversion between two IS5 elements |
3.360.045 – 3.360.049 (e) | wt | gltF – yhcA intergenic | Presumably insertion of an IS5 element |
3.880.708 (f) | hsm-2 | dnaA | A -> C point mutation (h) | Phe -> Val mutation
3.881.400 – 3.881.474 (g) | hsm-4 | dnaA | Deletion of 75 bp (h) | In-frame deletion of 25 amino acids
3.957.957 | All | Upstream of ppiC | C -> T point mutation |
4.025.046 – 4.025.438 | hsm-6 | fre | Deletion | Deletion of the C-terminal part of the flavin reductase
~4.034.000 – ~4.207.000 | hsm-8 | rrnA – rrnE | Duplication |

a With reference to the first strand in the reference sequence.

b The yagR mutation at position 299.443 is present only in hsm-2 to hsm-8. It must have arisen in the host strain before these hsm mutants were isolated, since hsm-2 to hsm-8 were isolated in an experiment separate from the one that yielded hsm-1.

c The 18 bp duplication in the fnr gene at base 1.397.493 causes an in-frame duplication of amino acids 21 through 26, near the amino-terminal end of the FNR protein. The transcription factor FNR senses O2 and controls the switch between aerobic and anaerobic metabolism (Green and Paget 2004). The duplicated amino acids include Cys-23, important for ligating a [4Fe-4S]2+ cluster. They also include Ser-24, which controls the O2 sensitivity of FNR by modulating the kinetics of conversion of this [4Fe-4S]2+ cluster to [2Fe-2S]2+ (Jervis et al. 2009). This duplication is not related to the deletion in the fnr gene observed in some isolates of MG1655 from the Coli Genetic Stock Center (Soupene et al. 2003). FNR remains functional with this duplication, since our isolate of MG1655 grows anaerobically (data not shown).

d Deletion of an IS1 element upstream of flhD. One copy of the flanking duplicated sequence CATAAATG is present. This deletion was also reported by Barker et al. (2004).

e A sequence beginning with GGAAGGTGCGAATAAGCGGG and ending with TGTTCGCACCTTC is inserted, with duplication of the target sequence CTTAA. The beginning and ending sequences match the ends of the IS5 element. This insertion is unique to our MG1655 isolate and must have arisen during propagation of this culture.

f In hsm-2 we detected the point mutation in the dnaA gene reported earlier (Riber et al. 2006).

g In hsm-4 we detected a 75 bp deletion in the variable region of the dnaA gene. This deletion occurs between two 14 bp GC-rich repeats that differ by one mismatch, and it results in an in-frame deletion of 25 amino acids of the DnaA protein. The same deletion was independently identified in a recent study as a suppressor of defects caused by deleting the seqA gene (Molt et al. 2009). The deleted segment lies in the variable domain II of DnaA, which is believed to form a hinge between domain I and domains III-IV. Several insertions and deletions in this region that have little or no effect on DnaA activity have been characterized previously (Messer et al. 1999; Nozaki and Ogawa 2008).

h The hsm-1, hsm-2 and hsm-4 mutants were characterized further by Charbon et al. (2011).
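Several entries in Table S2 involve short direct repeats:

- the IS1 deletion that leaves one CATAAATG copy
- the fnr and stpA duplications
- the dnaA deletion

Whether an indel disrupts the reading frame follows from its length modulo 3. The Python sketch below treats a deletion or tandem duplication between two exact copies of a repeat as a simple string operation. It is our illustration only, and the text does not state the molecular mechanism. The function names are hypothetical. The example sequences are reduced to the repeats given in the table, and the 20 N's are a stand-in for the IS1 element. The dnaA repeats are imperfect, so only the frame arithmetic is applied to that case.

```python
def repeat_pair(seq, repeat):
    """Return start positions of the first two non-overlapping copies of repeat."""
    first = seq.find(repeat)
    if first < 0:
        raise ValueError("repeat not found")
    second = seq.find(repeat, first + len(repeat))
    if second < 0:
        raise ValueError("sequence must contain two copies of the repeat")
    return first, second


def delete_between_repeats(seq, repeat):
    """Lose one repeat copy plus everything between the two copies."""
    first, second = repeat_pair(seq, repeat)
    return seq[:first] + seq[second:]


def duplicate_between_repeats(seq, repeat):
    """Tandemly duplicate one repeat copy plus the spacer between the copies."""
    first, second = repeat_pair(seq, repeat)
    return seq[:second] + seq[first:]


def frame_effect(indel_len):
    """Classify an indel in a coding sequence by its length modulo 3."""
    if indel_len % 3:
        return "frame shift"
    return f"in frame, {indel_len // 3} codons"


if __name__ == "__main__":
    # stpA (hsm-3): simplest local context consistent with a 10 bp duplication
    # of TCAGCTTTCA between two TCAGCT repeats; flanking bases are omitted.
    wt = "TCAGCTTTCATCAGCT"
    mut = duplicate_between_repeats(wt, "TCAGCT")
    print(mut, len(mut) - len(wt), frame_effect(len(mut) - len(wt)))
    # IS1 (insAB): "N" * 20 is a hypothetical stand-in for the element itself.
    print(delete_between_repeats("CATAAATG" + "N" * 20 + "CATAAATG", "CATAAATG"))
    # fnr (18 bp) and dnaA (positions 3.881.400 to 3.881.474) indel lengths.
    print(frame_effect(18), "|", frame_effect(3881474 - 3881400 + 1))
```

For the stpA example the sketch yields a 10 bp insertion and a frame shift. The fnr and dnaA indels come out in frame (6 and 25 codons), in agreement with Table S2.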

Supplemental Table S3. PCR primers

Name (a)  Position (b)         Strand  Sequence
yghO      3128060 - 3128081    +       CAGACAAATGCTCGTTGCGTTC
yghQ      3129642 - 3129663    -       CTCGTGCAGCACAGTGTTGGTG
yhiS      3649963 - 3649984    +       ATATGACAGGACAGAACCTGTC
slp       3651902 - 3651924    -       CTGATTCTGAGGTGATATGCCTG
A-fw      4033044 - 4033069    +       GATCAAGCTGATTATGAAGATGTCAG
A-bw      4035148 - 4035171    -       ACAAGCCTGTAGAGGTTTTACTGC
E-fw      4205755 - 4205778    +       CATTGATGATTGTGGATAACTCTGT
E-bw      4207761 - 4207784    -       TTCGCAAGACGCCTTGCTTTTCAC

a Names as shown in Figure 2 and Figure 3.

b Positions refer to GenBank accession no. U00096.
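The expected product sizes on the unrearranged reference follow directly from the coordinates in Table S3. The Python sketch below pairs yghO with yghQ, yhiS with slp, A-fw with A-bw, and E-fw with E-bw. These pairings are our assumption, inferred from primer names, strands and positions, since Figures 2 and 3 are not reproduced here. The function name is hypothetical.

```python
# Primer coordinates and strands from Supplemental Table S3 (U00096 numbering).
PRIMERS = {
    "yghO": (3128060, 3128081, "+"),
    "yghQ": (3129642, 3129663, "-"),
    "yhiS": (3649963, 3649984, "+"),
    "slp": (3651902, 3651924, "-"),
    "A-fw": (4033044, 4033069, "+"),
    "A-bw": (4035148, 4035171, "-"),
    "E-fw": (4205755, 4205778, "+"),
    "E-bw": (4207761, 4207784, "-"),
}

# Assumed pairings (ours): each forward (+) primer with the nearest reverse (-) one.
PAIRS = [("yghO", "yghQ"), ("yhiS", "slp"), ("A-fw", "A-bw"), ("E-fw", "E-bw")]


def amplicon_length(fwd, rev, primers=PRIMERS):
    """Size in bp of the product a facing +/- primer pair gives on the reference."""
    f_start, _f_end, f_strand = primers[fwd]
    _r_start, r_end, r_strand = primers[rev]
    if (f_strand, r_strand) != ("+", "-") or r_end <= f_start:
        raise ValueError(f"{fwd} and {rev} do not face each other on the reference")
    # Inclusive coordinates: from the 5' end of the forward primer (lowest
    # coordinate) to the 5' end of the reverse primer (highest coordinate).
    return r_end - f_start + 1


if __name__ == "__main__":
    for fwd, rev in PAIRS:
        print(f"{fwd} + {rev}: {amplicon_length(fwd, rev)} bp")
```

On the reference these pairs give 1604, 1962, 2128 and 2030 bp products. A same-strand pair such as yghO with yhiS raises ValueError here, because two primers on the same strand cannot amplify the reference. Such a pair can give a product only if a rearrangement, for example an inversion, reverses the orientation of one primer site relative to the other.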


References

Barker CS, Pruss BM, Matsumura P. 2004. Increased motility of Escherichia coli by insertion sequence element integration into the regulatory region of the flhD operon. J Bacteriol 186: 7529-7537.

Charbon G, Riber L, Cohen M, Skovgaard O, Fujimitsu K, Katayama T, Løbner-Olesen A. 2011. Suppressors of DnaA(ATP) imposed overinitiation in Escherichia coli. Mol Microbiol 79: 914-928.

Green J, Paget MS. 2004. Bacterial redox sensors. Nat Rev Microbiol 2: 954-966.

Jervis AJ, Crack JC, White G, Artymiuk PJ, Cheesman MR, Thomson AJ, Le Brun NE, Green J. 2009. The O2 sensitivity of the transcription factor FNR is controlled by Ser24 modulating the kinetics of [4Fe-4S] to [2Fe-2S] conversion. Proc Natl Acad Sci U S A 106: 4659-4664.

Messer W, Blaesing F, Majka J, Nardmann J, Schaper S, Schmidt A, Seitz H, Speck C, Tungler D, Wegrzyn G, Weigel C, Welzeck M, Zakrzewska-Czerwinska J. 1999. Functional domains of DnaA proteins. Biochimie 81: 819-825.

Molt KL, Sutera VA, Jr., Moore KK, Lovett ST. 2009. A role for nonessential domain II of initiator protein, DnaA, in replication control. Genetics 183: 39-49.

Nozaki S, Ogawa T. 2008. Determination of the minimum domain II size of Escherichia coli DnaA protein essential for cell viability. Microbiology 154: 3379-3384.

Riber L, Olsson JA, Jensen RB, Skovgaard O, Dasgupta S, Marinus MG, Løbner-Olesen A. 2006. Hda-mediated inactivation of the DnaA protein and dnaA gene autoregulation act in concert to ensure homeostatic maintenance of the Escherichia coli chromosome. Genes Dev 20: 2121-2134.

Soupene E, van Heeswijk WC, Plumbridge J, Stewart V, Bertenthal D, Lee H, Prasad G, Paliy O, Charernnoppakul P, Kustu S. 2003. Physiological studies of Escherichia coli strain MG1655: growth defects and apparent cross-regulation of gene expression. J Bacteriol 185: 5611-5626.