Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1947

Bacterial DNA repair and molecular search

ARVID H. GYNNÅ

ACTA UNIVERSITATIS UPSALIENSIS ISSN 1651-6214 ISBN 978-91-513-0966-8 UPPSALA urn:nbn:se:uu:diva-410039 2020 Dissertation presented at Uppsala University to be publicly examined in A1:111a, BMC, Husargatan 3, Uppsala, Friday, 28 August 2020 at 09:00 for the degree of Doctor of Philosophy. The examination will be conducted in English. Faculty examiner: Professor Stephen Kowalczykowski (University of California, Davis).

Abstract Gynnå, A. H. 2020. Bacterial DNA repair and molecular search. Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1947. 77 pp. Uppsala: Acta Universitatis Upsaliensis. ISBN 978-91-513-0966-8.

Surveillance and repair of DNA damage is necessary in all kinds of life. Different types of DNA damage require different repair mechanisms, but these mechanisms are often similar in all domains of life. The most serious type of damage, double stranded DNA breaks, is for example repaired in conceptually similar ways in both bacteria and eukaryotes. When this kind of break is repaired by homologous recombination, a homology to the site of the break must be found. Sometimes, this homology can be located far away from the break, necessitating a search. Considering the large amount of heterologous DNA present, the complexity of this search is enormous. If and how this search can proceed has been unclear even in simple and well characterized organisms such as E. coli. In this thesis, microscopy together with microfluidics is used to show that DNA repair by homologous recombination occurs even between homologies separated by several micrometers. We also see that it finishes well within the time of a cell generation, with the enigmatic search phase being as quick as eight or possibly even three minutes. Since this is much faster than expected, we present a physical model demonstrating that homology search on this time scale is indeed plausible. Based on these results, we conclude that homologous repair using distantly located templates is likely to be a physiologically relevant mechanism of DNA repair. Microscopy together with image analysis by deep learning also provides a new method of detecting DNA damage in real time. Combined with tracking of cell lineages, it reveals that DNA damage in E. coli is repaired efficiently enough that the resulting fitness cost is close to none. With the same methods we also study the effect of deletions of several DNA repair enzymes, and largely confirm their previous characterizations. Among these, we confirm that the intriguing RecN protein is important but not absolutely necessary in DSB repair, that it acts early, and that it possibly aids in physically shaping the structure mediating the search. In addition to this, it is shown how DNA transcription and translation modulate the shape of the E. coli nucleoid. We observe how strong transcription of a gene moves the gene, within a few minutes, towards the periphery of the cell where the concentration of ribosomes is higher, a movement possibly also aided by protein translation. We also present MINFLUX, a microscope for both nanometer scale localization of single fluorophores and in vivo single particle tracking with unprecedented trace length and resolution. Using this, the E. coli small ribosomal subunit could be observed to quickly shift between fast and slow diffusion states, which might represent probing and discarding of RNAs suitable for translation.

Keywords: DNA repair, DNA damage, homologous recombination, homologous repair, recombination, RecA, I-SceI

Arvid H. Gynnå, Department of Cell and Molecular Biology, Molecular Systems Biology, Box 596, Uppsala University, SE-751 24 Uppsala, Sweden.

© Arvid H. Gynnå 2020

ISSN 1651-6214 ISBN 978-91-513-0966-8 urn:nbn:se:uu:diva-410039 (http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-410039)

A breathtaking fact is that it is largely the same DNA molecule that gives rise to different organisms, whether they are bacteria, plants, animals or humans.

Nationalencyklopedin, 1991

List of Papers

This thesis is based on the following papers, which are referred to in the text by their Roman numerals.

I Arvid H. Gynnå*, Jakub Wiktor*, Prune Leroy, Johan Elf. (2020) RecA mediated homology search finds segregated sister locus in minutes after a double stranded break. bioRxiv 2020.02.13.946996; Manuscript.

II Sora Yang*, Seunghyeon Kim*, Dong-Kyun Kim, Hyeong Jeon An, Jung Bae Son, Arvid H. Gynnå, and Nam Ki Lee. (2019) Transcription and Translation Contribute to Gene Locus Relocation to the Nucleoid Periphery in E. Coli. Nature Communications 10 (1): 1–12.

III Francisco Balzarotti*, Yvan Eilers*, Klaus C. Gwosch*, Arvid H. Gynnå, Volker Westphal, Fernando D. Stefani, Johan Elf, and Stefan W. Hell. (2017). Nanometer Resolution Imaging and Tracking of Fluorescent Molecules with Minimal Photon Fluxes. Science 355 (6325): 606–12.

Reprints were made with permission from the respective publishers.

Other works not included in the thesis:

iv A. Sanamrad, F. Persson, E. G. Lundius, D. Fange, A. H. Gynnå and J. Elf. (2014). Single-Particle Tracking Reveals That Free Ribosomal Subunits Are Not Excluded from the Escherichia Coli Nucleoid. PNAS 111 (31): 11413–18.

v J. Liljeruhm, ..., A. C. Forster. (2018) Engineering a Palette of Eukaryotic Chromoproteins for Bacterial Synthetic Biology. Journal of Biological Engineering 12 (May): 8.

vi C. Vogel, A. H. Gynnå, J. Yuan, L. Bao, J. Liljeruhm, A. C. Forster. (2020) Rationally designed Spot 42 RNAs with an inhibition/toxicity profile advantageous for engineering E. coli. Engineering Reports. 2020;e12126.

vii M. Stårsta, D. L. Hammarlöf, M. Wäneskog, S. Schlegel, F. Xu, A. H. Gynnå, M. Borg, S. Herschend, and S. Koskiniemi. (2020). RHS-Elements Function as Type II Toxin-Antitoxin Modules That Regulate Intra-Macrophage Replication of Salmonella Typhimurium. PLoS Genetics 16 (2): e1008607.

Contents

Introduction
Types of DNA damage and repair
Causes and types of double stranded breaks
Repair of double stranded breaks
Homologous recombination repair
The actors of homologous recombination repair
Endonuclease RecBCD
Recombinase RecA
Double stranded breaks and molecular search
Creating double stranded breaks
Detection of DSBs
Inducing double stranded breaks in microfluidics
RecA activity after a double stranded break
Superresolution imaging
Double stranded breaks and the cell cycle
Detecting DNA damage by a neural net classifier
Effects of spontaneous DNA damage
Early and late participants in homologous repair
Dynamics of chromosomal loci
Active single particle tracking
Tracking results
Discussion
Search time of homologous recombination
How long is the search time actually?
After homology search
The penalty of DNA damage
Conclusion and outlook
Sammanfattning på svenska (Summary in Swedish)
Acknowledgements
References

Abbreviations

ATC Anhydrotetracycline, a tetracycline analogue
BER Base excision repair
DNA Deoxyribonucleic acid
DSB Double stranded break
DSE Double stranded end
dsDNA Double stranded DNA
FWHM Full width at half maximum
HR Homologous recombination
HRR Homologous recombination repair
HJ Holliday junction
IPTG Isopropyl β-D-1-thiogalactopyranoside, a lactose analogue
NER Nucleotide excision repair
NHEJ Non-homologous end joining
PSF Point spread function
RESOLFT Reversible saturable optical fluorescence transitions
RNA Ribonucleic acid
ssDNA Single stranded DNA
SD Standard deviation

Introduction

Having a genome is necessary for life as we know it. However, having a genome also comes with several practical problems. The DNA must be stored in a compact form to fit in the cell, be available for retrieving genetic information, be copied to supply genomes to offspring and finally be partitioned to the offspring. All these problems have been solved in organisms from bacteria to vertebrates, and often in similar ways. The similarities have arisen either through convergent evolution or, more often, through homologies, often shared between all domains of life.

The relay of a functional genome to one’s progeny is dependent on either protection of the DNA against damage, or correct repair if such damage occurs. Protection against damage may appear to be the most straightforward strategy, but DNA damage is to some degree inevitable. Ionizing cosmic rays penetrate every living thing in the atmosphere and reactive oxygen species are unavoidable in aerobic conditions. Other damage is caused by the cell itself, by reactive metabolic by-products, by deficiencies in DNA replication or, in eukaryotes, even on purpose to promote recombination.

These forces give rise to a variety of DNA lesions, ranging from minor chemical modifications at single bases to simultaneous breaks in both DNA backbones. What these have in common is that they prevent correct replication of their chromosome, and thus, in the case of most single-cell organisms, replication of the entire cell. Since DNA damage is inevitable, virtually all organisms have repair mechanisms, and usually a variety of them adapted to the different things that can go wrong with the chemical substance DNA.

This thesis investigates the most severe of these types of damage: double stranded breaks, or DSBs. A DSB occurs when covalent bonds in both backbones of a double stranded DNA are broken opposite each other or only a few base pairs apart. DSBs are uniquely toxic to a cell, as they not only block the passage of RNA and DNA polymerases, preventing transcription and replication respectively, but also disconnect the broken ends from each other in space. During repair, these ends must be reconnected with each other. If the ends are connected incorrectly, the result is a genome rearrangement which may have large effects on gene expression and chromosome segregation.

This means that the repair of a DSB requires not only a sequence of chemical reactions at the damage site, but also a search throughout the space of a cell. In fact, the type of repair studied here, homologous recombination repair, also requires a third party: an undamaged copy of the same locus. This makes the search uniquely challenging, and has prompted doubt as to whether such a search is even possible [1,2].

By combining microscopy with microfluidics, novel genomic constructs and automatic image analysis, I hope to shed some light on the timeline and participants in the most difficult type of DNA repair. I have also investigated the consequences of DNA damage on a larger scale to find out how DNA damage affects cells in the long run. Hopefully, the conclusions drawn here from the E. coli bacterium will be applicable to a wider set of organisms, or at least inspire further research on related repair mechanisms in other branches of life.

First, however, we will take a look at what DNA damage is and what mechanisms there are to handle it.

Types of DNA damage and repair

Any type of chemical modification that prevents replication of DNA can be considered damage, but how onerous the repair is depends on the type of modification. Some examples of DNA damage and how they can be repaired are shown in Figure 1. For example, a DNA alkylation, the addition of a methyl or ethyl group to a nucleotide, can be repaired simply by removal of the added group. Other types of damage require several repair steps performed by several enzymes. Modifications to single bases can be repaired through BER, base excision repair, by removal of the base, cleavage of the phosphodiester backbone, fill-in DNA synthesis and ligation. A larger variety of damage, for example damage involving several nucleotides, can be repaired by NER, nucleotide excision repair. Here a stretch of a dozen nucleotides on one strand is removed and resynthesized using the complementary strand as a template [3,4].

Figure 1. Examples of DNA damage types and how they can be repaired. Base modifications include a large variety of oxidations, alkylations and deaminations. Multi-base modifications are here illustrated as a pyrimidine dimer, but could also be several consecutive modified bases. Single-stranded gaps vary in length and may be up to a kilobase. DSBs can be either opposite or staggered, both of which are possible substrates for NHEJ and HR repair.

If these methods fail to repair a lesion before the replication fork arrives, this will still not be the end of the cell lineage. Instead the DNA polymerase ceases replication at the lesion and reinitiates afterwards, leaving a gap of a few hundred nucleotides in the newly synthesized complementary strand. This gap can be bridged by the RecFOR pathway, which integrates the complementary strand from the correctly replicated sister chromatid in the gap instead. The missing strand in the sister can then be synthesized from the undamaged template [5].

The most drastic type of damage is however a double stranded break, or a DSB. This type of lesion can itself be subdivided into several types which pose different problems for repair.

Causes and types of double stranded breaks

DSBs may be caused by many factors, including ionizing radiation, chemicals or spontaneous endogenous damage. In the case of radiation, the damage can be a direct physical effect, if a particle hits the DNA backbone or if free radicals formed from water molecules break a backbone bond. In these cases, both strands must be separately affected within a short distance, and more quickly than the single strand repair mechanisms can act. Since one single particle often causes several ionizations in close vicinity (known as clusters, or locally multiply damaged sites), DSBs are a relatively common outcome [6,7]. Alternatively, and more commonly, ionizing radiation and UV light can cause a DSB indirectly. If radiation causes two closely located base oxidations (closely, but at least one base pair apart), BER may cause a DSB enzymatically when DNA glycosylases cleave the backbone at the site of the damage [8–10].

However, the main cause of DSBs is DNA replication itself. When the DNA polymerase encounters a single stranded gap or another single-stranded DNA lesion, it may run off the end of the broken strand or stall. If the polymerase runs off the broken strand, a single-ended DSB is created which needs to be reattached to the parent chromosome (Figure 2). If the DNA replication complex instead stalls and backtracks slightly, the newly synthesized leading and lagging strands can anneal into a helix of the two new strands (as opposed to the normal semi-conservatively replicated dsDNA). The four-way intersection between the ancestral dsDNA, the two semiconservative dsDNA helices and the short entirely new helix is a Holliday junction which can be cleaved by specific endonucleases, also generating a single-ended DSB [11–13].

Spontaneous DSBs are a common occurrence in life, but estimates of their frequency in E. coli vary. Since the majority of DSBs are created at the replication fork, the number of DSBs per cell and generation is proportional to the number of forks. In other words, fast growing cells with several concurrent rounds of ongoing replication will have more spontaneous DSBs than slower growing cells. Initial estimates put the frequency of DSBs at at least 15 % per cell and generation in slow-growing cells. This was based on the proportion of cells having to resolve dimeric chromosomes, i.e. when two chromosomes are covalently linked through their DNA backbones. Chromosome dimers are believed to result mainly from HR repair, but they may not form after each repair as there are several pathways for the chromosomal crossover intermediate to be resolved [14,15]. Later, a spontaneous DSB frequency of 18 % was found by studying inheritance of incomplete chromosomes that occur as a result of DSBs at the replication fork [16]. Contrary to this, two other methods of estimating the frequency have found significantly lower numbers. Coupling GFP expression to SOS induction and measuring GFP by flow cytometry has given a frequency of circa 1 % [17]. This order of magnitude has been reinforced by imaging of the phage protein Gam fused to GFP, which binds to DSB ends and forms fluorescent foci. Here, DSBs were calculated to occur in 2 % of slow-growing cells [18].

It is not clear how these numbers could be reconciled. If the former, higher estimates are accurate, then we have to assume that the SOS induction (as measured from the sulA promoter) can only be observed after a fraction of the breaks, and likewise that Gam binding to DSB ends is also a rare event. If the latter number is more accurate, one explanation could be that the absence of HR related enzymes not only inhibits DSB repair but also other functions necessary for successful DNA replication and segregation. Alternatively, DSBs might be caused more often in the mutants for some unknown reason.

Figure 2. Classification of DSBs, with conceptual views of the break in the chromo- some, and of the chromosome in the cell.

As shown in Figure 2, DSBs can be divided into two types. Those caused by replication runout of a nicked template are single-ended DSBs, also known as double-stranded ends (DSEs). Here there is no corresponding opposite end to join the DSE to. When a DSB is caused at another location, by cleavage of both DNA strands, it will be a double-ended DSB. This kind of DSB can be repaired by joining the two ends.

Another way to classify DSBs depends on whether another copy of the same locus exists and whether it is located nearby. We will later see why this is important. In the case of a single-ended DSB, an identical locus is always in the vicinity. When a double-ended DSB occurs, the loci may not yet have segregated after replication, meaning that another copy will be closely located. If the DSB happens after segregation of the locus, however, the replicated copy could be anywhere in the cell. In slow growing or dormant cells, a part or all of the chromosome is unreplicated and then no other copy will be present.

Repair of double stranded breaks

In contrast to other types of DNA damage, repair of a DSB cannot be performed as a purely local activity at the broken site, but must act through space to bring the separated ends together. Despite this challenge, the majority of DSBs are successfully repaired, if we define successful as allowing the cell to continue DNA replication and subsequent cell divisions [19].

Life has several different methods to repair DSBs. In the case of a two-ended DSB, the conceptually simplest method is to ligate the ends together again. This process is called non-homologous end joining, NHEJ, and can be performed by several enzymatic pathways. If the DSB is clean, that is, cleaved at the phosphodiester bond of the DNA backbone and without chemical modifications at the end nucleotides, NHEJ mechanisms can join the broken ends correctly. However, frequently this is not the case and then NHEJ will either join damaged ends or use enzymes to remodel the ends until they can be ligated. This will often induce mutations, deletions or insertions [20].

Eukaryotes frequently employ NHEJ, especially in cell cycle phases where the DNA is unreplicated and only present in a single copy [21]. Here, there are multiple pathways available: classic NHEJ (c-NHEJ) and several alternative end joining pathways (a-EJ, or a-NHEJ). In c-NHEJ, the most central component is the protein Ku which binds the DSB end. Ku then recruits ligases, and if necessary nucleases and polymerases, to edit the ends until they can be ligated together. The ligation is often guided by a short homology of less than 5 nucleotides, although blunt end ligation can also occur [22]. The less common a-EJ responses are employed in cases when c-NHEJ cannot ligate the ends, and all involve longer resection of the ends to find a homology to join the ends by. A-EJ thus always results in a deletion [23].

In bacteria, NHEJ is less common. In fact, this mechanism was unknown until the era of genome sequencing, when homologs of the protein Ku were found in bacteria [24,25]. Such homologs are now identified in almost a quarter of the sequenced bacterial genomes [26]. Which components other than Ku are involved in the NHEJ pathway varies widely between bacteria, but LigD is typically present as the main ligase [27]. NHEJ is more often present in bacterial species which sporulate or otherwise have an extended dormant stage where the cell has only a single chromosome. When no intact copy of the chromosome is present, NHEJ is the only method that can repair a DSB. Accordingly, stationary phase cells and spores, which typically are haploid, have been found to be much more sensitive to ionizing radiation when NHEJ genes are mutated [28,29].

Neither Ku nor LigD has been found in E. coli. Because of this, the popular model species was long thought to be devoid of NHEJ mechanisms [30]. However, an a-EJ-like pathway that repairs breaks via microhomologies has now been reported [31]. Instead of Ku and LigD, E. coli a-EJ uses RecBCD for end resection and depends for ligation on LigA, an essential ligase which otherwise joins Okazaki fragments during DNA replication. E. coli a-EJ was found to ligate both compatible and non-compatible ends, but in 90 % of cases involves deletions, which often span several kilobases.

Homologous recombination repair

The other category of repair mechanisms is homologous recombination repair, or HR repair, which will be the main theme of this thesis. HR repair is conceptually more complex than NHEJ but generally repairs breaks correctly. Like NHEJ, HR repair is widespread and well conserved throughout bacteria, archaea and eukaryotes. HR repair can be used for both single-ended and two-ended DSBs. However, as the name implies, HR repair requires a homology to the break site, i.e. another copy of the chromosome, or at least of the broken locus.

In HR repair, the ends of the break are first bound by a protein complex that recognizes a loose double-stranded DNA end. In E. coli and many other gram-negatives this is called RecBCD, while in Bacillus subtilis and a wide range of other bacteria the structurally different but functionally similar AddAB enzyme is found. In eukaryotes the break is first recognized by the MRN (in humans) or MRX (in yeast) complex, which recruits nucleases [32,33]. Here, E. coli terminology will be used although the process is conceptually similar between species.

In short, HR repair of a DSB proceeds as shown in Figure 3. After binding, the RecBCD complex resects the DSB ends and creates a 3’ single stranded DNA end, which is loaded with many copies of the protein RecA. RecA can bind both a single DNA strand and a double stranded DNA helix simultaneously. Bound to the ssDNA end, RecA will now probe dsDNA sequences with its other binding site. If the sequences match, RecA will perform a strand exchange. That is, it exchanges the two identical strands so that the previous ssDNA is now base paired with its reverse complement, while the previous dsDNA strand identical to the original ssDNA is now single stranded. Now, the broken end is base paired with a continuous chromosome and a DNA polymerase can use the reverse complement as a template to replace the resected part. In the case of a two-ended DSB, the strand exchange and DNA polymerization are done with both ends until the polymerase meets the opposite 5’ end [32,34]. A ligase can then join the sequences and restore the backbone continuity. At this stage however, the two chromosomes are in heteroduplexes with each other and cannot segregate. Each of the two four-way DNA structures where the strands cross over is called a Holliday junction, and to separate the chromosomes the junctions must be resolved by nicking two strands and religating them with each other. The cuts can be “vertical” or “horizontal”, referring to the way they are drawn in Figure 3. These cuts, in E. coli performed by RuvC, can lead to a crossover between the two chromosomes [35]. In circular chromosomes, a single crossover results in a chromosome dimer, i.e. two complete chromosomes covalently joined into a large circle. This is resolved at the end of the cell cycle, just before septum formation, when the dimeric chromosome is again crossed over by the protein complex XerCD. In eukaryotes, there is also a closely related synthesis-dependent strand annealing (SDSA) pathway for two-ended DSBs, which is left out here but is instead summarized in the Discussion.

Figure 3. Steps of HR repair. A. A double-ended DSB. RecBCD processes both ends, which are bound by RecA. The ends find and invade the corresponding locus on the sister chromatid. DNA polymerisation and ligation produce a correctly repaired sequence before the chromatids are separated again. B. A single-ended DSB occurs at the replication fork. HR repair is used to reattach the lost strand to the parental chromosome and restart replication. Here the DSB is created by an SSB in the leading strand, but a stalled replication fork can also lead to a similar situation through replication fork reversal and cleavage of the resulting Holliday junction.

A single-ended DSB is processed similarly, but here the goal is instead only to reattach the loose DSB end to the sister chromatid. After that, a reaction cascade started by protein PriA allows reassembly of the replisome and continuation of DNA replication.

The actions of the individual enzymes will be described in more detail below. One step that was skimmed over here, however, was the probing of dsDNA before strand exchange. This step is not trivial; instead it is a complex search where the RecA-bound ssDNA must probe every position of the sister chromosome before the single correct position is found. How this complex process can finish in a reasonable time has been a mystery for decades.

Despite this, homologous recombination works and resolves spontaneous DSBs in many E. coli cells during every generation, and several times per generation in eukaryotes. In fact, eukaryotes create DSBs on purpose during meiosis and depend on HR to recombine the chromosomes afterwards. Eukaryotes use this to achieve genetic variation between their descendants, which is a prerequisite for evolutionary selection of several traits simultaneously. It is thus clear that life puts a lot of faith in efficient homologous recombination.

The actors of homologous recombination repair

HR repair in E. coli involves at least half a dozen proteins and protein complexes, the most important of which will be described here: RecBCD and RecA. Other bacteria generally have direct homologues of these, while the situation in eukaryotes can involve a much larger number of enzymes.

Endonuclease RecBCD

E. coli RecBCD, previously also known as exonuclease V, is a large protein complex with very potent nuclease and helicase activities. RecBCD is extremely processive and can degrade tens of kilobases of dsDNA while displacing any DNA-bound proteins in its way. Surprisingly, it is responsible for both efficient destruction of invasive foreign DNA and accurate repair of the endogenous DNA. To combine these two roles, it is controlled by sequence specificity. The components are expressed from two different promoters: RecC, a protein of 1120 amino acids, from one, and RecB and RecD, of 1180 and 608 amino acids respectively, from another. Although RecBCD is present in only about 10 copies per cell, inactivation of either the recB or the recC gene reduces cell viability by three quarters even without exogenous DNA damage [36–38].

Bioinformatics and structure determination have shown that all three subunits have helicase homology. RecB also has a C-terminal nuclease domain [32,39–41]. In agreement with this, in vitro experiments have demonstrated that RecB is an ATP-dependent DNA helicase, with a preference for ssDNA [37,42]. RecB and RecC together degrade both linear ssDNA and dsDNA in the presence of ATP. RecD, which was discovered later, has a weak ATPase activity in the presence of ssDNA [43]. When combined into the entire RecBCD complex, the DNA degradation remains, but the ATPase activity gains a dsDNA preference and the helicase activity is much more potent [44]. RecBCD binds blunt double-stranded ends with high affinity (Kd ≈ 3 nM), and even more tightly when there is a short overhang on either strand [45].

As indicated above, the primary activity of RecBCD is to bind dsDNA ends, then unwind and degrade them with high processivity. How, then, can this enzyme be useful in DNA repair? The answer is that this destructive first activity is exchanged for a different mode of activity after recognition of a specific DNA sequence named chi. After chi, the second mode of the enzyme instead only degrades the 5’ strand and leaves a 3’ ssDNA end which is used for HR as we saw above. The chi¹ sequence was first discovered as a sequence that rescued recombination-deficient phage lambda by allowing RecBCD to step in instead [46,47]. In E. coli, chi is traditionally given as the eight-nucleotide sequence 5’-GCTGGTGG-3’, but this is not an absolute recognition sequence for RecBCD. Chi is only recognized by RecBCD in 20-40 % of passages [48], and single point mutations of chi lower but do not abolish recognition [49,50]. Recently it was found that additional nucleotides in the 3’ direction outside of the traditional chi strongly affect recognition [51]. Based on this, the optimal chi site might be 5’-TTGCTGGTGGCCNAAAA-3’, or a very similar sequence.
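As a minimal illustration of what locating the classical chi site in a sequence involves, the sketch below scans a DNA string for 5’-GCTGGTGG-3’ on both strands. The function names and the toy sequence are hypothetical and chosen only for this example, not taken from the work described here. Only exact matches to the eight-nucleotide site are reported, so the partial recognition of mutated sites and the extended sequence context discussed above are deliberately ignored.

```python
# Hypothetical helper names; only the chi sequence itself comes from the text.
CHI = "GCTGGTGG"  # classical eight-nucleotide chi site, written 5'->3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_all(seq: str, motif: str) -> list[int]:
    """Return the 0-based start index of every (possibly overlapping) match."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def find_chi_sites(seq: str, chi: str = CHI) -> list[tuple[int, str]]:
    """List exact chi matches as (start index on the given strand, strand).

    '+' means chi reads 5'->3' on the given strand; '-' means it reads
    5'->3' on the complementary strand, so the given strand shows the
    reverse complement of chi at that index.
    """
    seq = seq.upper()
    forward = [(i, "+") for i in find_all(seq, chi)]
    reverse = [(i, "-") for i in find_all(seq, reverse_complement(chi))]
    return sorted(forward + reverse)


# Toy sequence (assumed, for illustration): one chi site on each strand.
toy = "AAGCTGGTGGTTCCACCAGCAA"
for position, strand in find_chi_sites(toy):
    print(position, strand)
```

Counting the hits per strand along a real genome, relative to the direction of replication, would be one way to look at the orientation of chi sites, but that requires sequence and annotation data not included in this sketch.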

Figure 4. Structure of RecBCD unwinding DNA. A. Front view, with dsDNA entering from the right-hand side. B. Back view, rotated around the horizontal axis from A. a) Helicase domain of RecB. b) Nuclease domain of RecB. C. Front view cartoon showing dsDNA unwinding and chi recognition. From PDB id 6SJB [54].

¹ The chi site is sometimes written as the Greek letter χ. But chi is also an abbreviation of crossover hotspot instigator, the role it has in the phage lambda mutant mentioned above.

The mechanism of the entire RecBCD complex has been elegantly shown by DNA-bound structures and electron micrographs. The three subunits of RecBCD bind each other in a tight conformation, with RecB and RecC wrapping around each other. The helicase and nuclease domains of RecB are far apart and only connected through a peptide linker [41]. The dsDNA is fed into the complex between RecB and RecC. Here the double helix is split, after which the 3’ strand is passed on to the RecB helicase domain, and then to RecC where recognition of chi takes place [52–54]. Then the strand reaches the nuclease domain of RecB and can be cleaved. The 5’ strand takes a different path directly after the split. It passes through RecC before it is handed to RecD, and then passes close to the RecB nuclease domain, also exposing itself to possible cleavage. The loose conformation of the linker to the nuclease domain likely allows it to access both DNA strands on their way out of the complex [41].

Mutants where either helicase is inactivated still translocate along the DNA, but the RecD 5’-to-3’ activity is on average three times as fast as the RecB 3’-to-5’ activity. Due to the difference in rate, an excess of 3’ strand would be expected to pile up in front of RecB. Exactly this is observed in electron microscopy, where a large loop of the 3’ strand is seen protruding out of the complex [36,55,56]. In vivo the loop is 100-500 nucleotides [57].

The fact that separate helicases act on each strand in the enzyme might explain the extreme processivity of RecBCD. The complex processes on average 30 kilobases in vitro, and possibly even more in vivo, before falling off the DNA, and it does so at a rate of more than 1 000 base pairs per second [58,59]. The active translocation on both strands may also help the enzyme bypass ssDNA gaps and other DNA lesions, which would allow it to process DNA where a DSB is part of a locally multiply damaged site.

When a chi sequence reaches the recognition site², it binds and it is believed that the RecD helicase activity ends [60]. The binding causes RecBCD to extrude a new loop through a latch between RecB and RecC [61]. RecBCD now enters the second mode, where the slower RecB provides the force driving the complex along dsDNA. Now, RecBCD will also load the 3’ tail with RecA through a RecA binding site on the RecB nuclease [62,63].
As the last step before leaving the complex, both strands pass by the endo- nuclease domain. Before chi binding, it is normally oriented towards the 3’ strand and cleaves it much more often than the 5’ strand. After chi, it changes conformation and now only cleaves the 5’ strand but much more often [54,64]. How often cleavage occurs at all in vivo is unclear, as the frequency varies based on the relative concentrations of Mg2+ and ATP. With high Mg2+ and low ATP, cleavage is frequent; with low Mg2+ and high ATP, cleavage hap- pens rarely or never [65]. As the intracellular concentration of free Mg2+ is hard to determine, this has led to two competing views on what is actually going on in vivo. The first view is that intracellular Mg2+ and ATP concentra- tions are similar, and that RecBCD in vivo digests both strands during before chi. The 3’ end is cut into pieces of oligos tens of nucleotides while the 5’ strand becomes kilobase fragments [32]. After chi, cleavage of the 5’ strand becomes more frequent. The other view is that Mg2+ is lower than ATP, and that cutting in practice only occurs once. That is close to chi on the 3’ strand

2 Note that RecBCD might be positioned far beyond chi on the dsDNA already, since chi must pass through any extruded loop before reaching the site.

when the enzyme pauses there [66]. This model is referred to as “nick-at-chi”. There is recent evidence for either model. For example, qPCR shows approximately half the normal DNA concentration around a DSB, which is compatible with 5’ strand degradation [59]. On the other hand, the observed chi site context dependence, which appears to be used by E. coli to increase recognition, is only effective at Mg²⁺ concentrations where nick-at-chi is expected [51]. In this work the former model is assumed and used in illustrations. HR repair would however work in both cases, although some intermediates differ from what is illustrated here.

The eight-bp classical chi site is highly overrepresented in the E. coli genome, and especially so in the direction towards the origin of replication where it is three times more common than towards the terminus. The directional bias aids the repair of the more common single-ended DSBs. While it is debatable which is the chicken and which is the egg here, i.e. whether these biases have evolved to aid HR repair or whether the sequence exists due to other factors (certain common codon sequences, CG bias etc.) and RecC has evolved to recognize it [67], they provide a means for RecBCD to distinguish the E. coli chromosome from other DNA. A phage is unlikely to contain a chi site, and even if it does, RecBCD will in most cases miss it. In the chromosome there will sooner or later be another chi, but a phage will in most cases be entirely degraded³ and the degradation products may be turned into CRISPR spacers, giving future immunity against that phage [68].

In summary, RecBCD is a very intricate dsDNA end processing machine which fulfills several cellular functions by changing between enzymatic activities. This explains how the enzyme can both be destructive against phages and aid DNA repair, and also why so much undamaged DNA is seemingly needlessly degraded after a DSB. Apparently, the cost of polymerizing DNA again after a break is small compared to the advantage of efficiently degrading phages.

Similar complexes exist in other bacteria. The AddAB variant is widespread and is present among others in Bacillus subtilis. It has structural and mechanistic differences, but fulfills the same function. In mycobacteria, the simpler AdnAB appears to have a similar role [69]. In eukaryotes however, no protein homologous to RecBCD is known [32].

3 At least if the former view on RecBCD nuclease activity is true. With the nick-at-chi model, it is possible that the helicase activity alone would disable phages, especially if the ssDNA is bound by SSB before it can anneal again. Here the 3’ strand loop could play a role in offsetting the output strands to prevent reannealing.

Recombinase RecA

E. coli RecA is a small but very versatile protein at 354 amino acid residues, whose homologues are ubiquitous among both prokaryotes and eukaryotes [70]. RecA was first discovered in 1965 as the cause of a recombination-deficient phenotype in E. coli [71]. RecA and its homologues have since been found to take part in a diverse range of processes all involving recombination, including DNA repair, bacterial conjugation and meiotic recombination in eukaryotes. RecA null mutations have serious consequences for the cell: they give extreme UV sensitivity, reduce cell viability to around 50 % even under normal conditions and lead to chromosome degradation, resulting in anucleate cells [38,72].

Figure 5. DNA strand exchange. A circular ssDNA with bound RecA can take the complementary strand from a homologous dsDNA, leaving the homologous strand as ssDNA.

The most important reaction catalyzed by RecA is DNA strand exchange. Strand exchange involves a homologous ssDNA and dsDNA, with RecA switching the two homologous strands for each other, or in other words moving the complementary dsDNA strand to the ssDNA (Figure 5). As mentioned above, RecA achieves this through two DNA binding sites: one ssDNA binding site and one dsDNA binding site [73,74].

RecA binds ssDNA in a multimeric filament with one RecA monomer per three nucleotides and the monomers in contact with each other. Initial RecA binding to ssDNA, i.e. nucleation, is slow. In vivo, nucleation is aided by RecBCD or RecFOR, which load RecA on newly created ssDNA. Once a nucleus of two monomers is formed, the filament can grow at both ends with additional monomers, although faster in the DNA 3’ direction [75–78].

The final RecA-ssDNA filament assumes a helical shape with the DNA strand embedded in the middle of the filament. This filament, with a pitch of 94 Å and an average rise of 5.08 Å per nucleotide, is stretched ca 1.5 times

compared to normal B-form dsDNA⁴. Notably however, the bases are unevenly spaced. They are held together in groups of three, each belonging to one RecA monomer, with a 7.1 Å distance between each group. The stretching within each triplet group is thus only 1.2 times compared to the B-form [79]. This state is called the presynaptic filament.

When the RecA-ssDNA filament meets a dsDNA, the dsDNA is first caught by a binding patch on the RecA C-terminal domain before being led into the dsDNA binding site. Simulations suggest that binding here promotes flipping out of two bases in the dsDNA, allowing them to pair with two of the bases in a ssDNA triplet. If those two match, two bases in the next triplet are tested similarly. If three consecutive triplets pair two bases each, two of the third, unflipped, bases are flipped as well. If they match, i.e. eight bases are paired, a metastable recombination intermediate has been reached [80]. These simulations are corroborated by experiments showing DNA pairing only with homologies of at least 8 nucleotides [81]. Single molecule experiments have also shown that a homology of 8 nucleotides is a critical limit in achieving more than transient binding (>10 s in vitro) between dsDNA and several ssDNA-bound RecA homologues [82].

When the initial 8 base contact has been established, pairing continues in three-base steps in the 3’ direction of the ssDNA [82]. A stable conformation is reached after 15-20 base pairs. After finishing strand exchange, RecA dissociates at the 5’ end, leaving a rolling synapsis region through which the strand exchange progresses. While strand exchange can progress for several kilobases, the active synapsis region has been measured in vitro to cover only ca 80 base pairs [83].

RecA can also initially bind dsDNA, although with a very slow nucleation rate. This reaction pathway is much less studied, but it has been shown that it can also lead to binding of homologous ssDNA and subsequent strand exchange [84].

Additionally, RecA has an inhibitor, RecX, which is expressed from the same operon but at a fraction of the RecA level. RecX prevents formation of filaments and promotes their disassembly, possibly by capping the 3’ filament ends. The physiological significance of the inhibitor is unknown [85,86].

Intriguingly, while RecA is a facilitator in HR repair it is actually also a bottleneck. Directed evolution experiments have yielded single position mutants of RecA which increase E. coli radiation resistance by three orders of magnitude [87,88]. This suggests that efficient DNA repair is not the only purpose of RecA, and that E. coli RecA may have evolved as a compromise between several functions.

4 B-form DNA has a pitch of 33.2 Å and a rise of 3.32 Å per base pair.
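The filament dimensions quoted above (94 Å pitch, 5.08 Å average rise, 7.1 Å between triplets) can be cross-checked against the B-form rise in footnote 4 with a few lines of arithmetic. The sketch below is only a consistency check; the function and constant names are illustrative, and the reading of the 7.1 Å figure as one axial step between neighbouring triplets is an assumption made here.

```python
# Numbers quoted in the text; nothing here is newly measured.
FILAMENT_PITCH_A = 94.0   # Å per helical turn of the RecA-ssDNA filament
FILAMENT_RISE_A = 5.08    # Å average axial rise per nucleotide
B_FORM_RISE_A = 3.32      # Å rise per base pair in B-form DNA (footnote 4)
TRIPLET_GAP_A = 7.1       # Å between neighbouring base triplets
NT_PER_MONOMER = 3        # nucleotides bound by one RecA monomer


def filament_geometry():
    """Derive per-turn counts and stretching factors from the quoted values."""
    nt_per_turn = FILAMENT_PITCH_A / FILAMENT_RISE_A
    monomers_per_turn = nt_per_turn / NT_PER_MONOMER
    overall_stretch = FILAMENT_RISE_A / B_FORM_RISE_A

    # Assumption: each monomer covers three axial steps, two inside the
    # triplet and one 7.1 Å step to the next triplet.
    span_per_monomer = NT_PER_MONOMER * FILAMENT_RISE_A
    step_within_triplet = (span_per_monomer - TRIPLET_GAP_A) / (NT_PER_MONOMER - 1)
    triplet_stretch = step_within_triplet / B_FORM_RISE_A

    return {
        "nt_per_turn": nt_per_turn,
        "monomers_per_turn": monomers_per_turn,
        "overall_stretch": overall_stretch,
        "step_within_triplet": step_within_triplet,
        "triplet_stretch": triplet_stretch,
    }


if __name__ == "__main__":
    for name, value in filament_geometry().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the sketch gives about 18.5 nucleotides (roughly six monomers) per turn, an overall stretch of about 1.5 and a within-triplet stretch of about 1.2, in line with the values quoted in the text.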

LexA proteolysis & SOS response

Apart from DNA binding and strand exchange, RecA-ssDNA catalyzes the proteolytic autocleavage of LexA. LexA is a transcription factor that normally represses a large number of genes involved in several DNA repair pathways, including recA, recN, uvrD, ruvAB, sulA and ssb. When RecA creates a filament on ssDNA, LexA binds in the groove of the filament, causing it to proteolyse itself into two peptides. Proteolysed LexA cannot bind to its sequence motif, which means that the effective concentration of LexA decreases in the cell whenever RecA-ssDNA is present. This derepresses the genes mentioned above and many others, which is called the SOS response [89,90].

As different LexA-repressed genes have LexA binding boxes of different affinity, transcription of the genes will be activated in a sequence depending on how fast and for how long LexA is degraded. In the first wave are genes for “simple” repair methods such as excision repair and DNA translesion synthesis (uvrAB, polB), then recombination repair genes (recA, recN) and a cell division inhibitor (sulA), and last a mutagenic repair pathway (umuCD) [5,91]. In short, RecA-ssDNA acts as a signal of distress, which sequentially activates more and more desperate rescue mechanisms depending on the severity and duration of the emergency.

LexA cleavage is irreversible, but the protein also represses its own gene. It will thus be expressed during the SOS response, and is restored to the normal concentration when all RecA-ssDNA is gone. RecA-ssDNA is also exploited by several bacteriophages whose repressors are autocleaved in the same way as LexA, causing phages to go into a lytic cycle whenever there is DNA damage.

ATP hydrolysis in RecA

RecA is a DNA-dependent ATPase, but the purpose of the ATP hydrolysis is not obvious. For the presynaptic filament to be active, ATP must be bound, but the filament can bind ssDNA, search for homology, perform strand exchange and proteolyse LexA without hydrolysing ATP [92]. In other words, ATP hydrolysis is unnecessary for all key enzymatic activities. Burning ATP is of course costly for the cell. So why does it happen?

RecA has actually been shown to take advantage of ATP hydrolysis in several ways. When RecA nucleates on ssDNA, it does not necessarily do so “in phase”, so when the protofilaments grow there might be 1-2 nucleotide gaps between them. These gaps can be filled through ATP hydrolysis, apparently through some mechanism that shifts the protofilaments along the ssDNA [93]. ATP hydrolysis will also speed up homology search by allowing RecA-ssDNA to dissociate four times faster from non-homologous dsDNA once the initial 8-base interaction has completed [94]. After stable pairing has been achieved, ATP hydrolysis allows strand exchange to proceed through short non-homologous regions as well [95]. Finally, ATP hydrolysis promotes RecA dissociation after the strand exchange has finished [96].

It is not clear how these outcomes of ATP hydrolysis relate to each other structurally and mechanistically, but the end result is that hydrolysis-deficient RecA, despite its proficiency in the basic biochemical reactions, is completely unable to support recombination in vivo [97].

The challenge of RecA homology search

The most interesting and elusive property of RecA is its ability to find homologies within vast amounts of heterologous DNA within a biologically relevant timescale. The difficulty of finding exactly the correct genomic position was conveniently skipped over in the description of pairing above, but this is a monumental challenge for the RecA-ssDNA filament. Even though the filament and its homology are both long, the actual target is to bind exactly on the right base pair on the genome, a target which is of a size of only about 3 Å. Before that point is reached, every ssDNA-dsDNA interaction is meaningless and the time spent there should be minimized.

The task can be compared to that of Cas9 or a Cascade complex, which in its defense against invading phages must find phage DNA matching the guide RNA before the phage can hijack the cell. Two studies have shown that a single enzyme can find a genomic position in 1.5 to 6 hours in E. coli [98,99]. This timescale is long, but not entirely implausible for DNA repair considering that the E. coli generation time can range up to more than an hour in poor growth conditions. However, this time is not reasonable to assume for RecA-mediated homologous search. The search rate is, if we assume random collisions, ultimately limited by the diffusion rates of the searcher and the target [100]. In RecA-mediated search both of those will be very slow: the chromosomal locus is the same as in the CRISPR search, but now the other party will be a RecA-ssDNA filament instead of a small molecule. The RecA-ssDNA filament is long and stiff. Its persistence length in vitro has been estimated to ca 950 nm, which can be compared to ca 50 nm in dsDNA [101,102]. The persistence length and diffusivity of such a filament in vivo are not easy to estimate, but we can safely assume that it diffuses orders of magnitude slower than Cas9 and very likely slower than a chromosomal locus.
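As a rough illustration of this diffusion limit, the classical Smoluchowski expression for the encounter rate of two diffusing spheres can be written as below. The symbols are introduced here for illustration and do not appear in the thesis: $D_s$ and $D_t$ are the diffusion coefficients of searcher and target, $a$ is the sum of their reaction radii, $V$ is the cell volume and $\tau$ is the mean search time for one searcher and one target. The spherical, well-mixed geometry is a simplifying assumption.

```latex
k = 4\pi \,(D_s + D_t)\, a ,
\qquad
\tau \approx \frac{V}{k} = \frac{V}{4\pi \,(D_s + D_t)\, a}
```

Because $\tau$ scales inversely with $D_s + D_t$ and with $a$, a slowly diffusing filament searching for a slowly diffusing locus implies a long search time unless the effective target size is increased.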
However, it is possible to find a chromosomal target faster than simple diffusion and collision would allow. The lac repressor in E. coli is one example; in vivo it finds its operator in just 3-5 minutes [103,104]. Even though the lac repressor senses sequence specificity in the DNA major groove and thus does not have to denature the dsDNA helix, this search rate is made possible only by a combination of hopping (microscopic dissociation-reassociation), intersegment transfer (simultaneous binding of several uncorrelated chromosomal loci) and sliding on nonspecific DNA [104,105].

The RecA-ssDNA filament also has several features that accelerate search. First, the filament is long, can interrogate dsDNA at several points simultaneously,

and these points may interrogate loci that are separated far along the chromosome. This has been termed intersegment contact sampling and has been shown in vitro [106].

Second, the RecA C-terminal patch where dsDNA first binds is flexibly hinged, and the dsDNA can remain bound there while being fitted in the dsDNA binding site in up to four different base pair alignments according to simulations. This allows interrogation of several positions without macroscopic dissociation [80].

Third, short-range RecA-ssDNA filament sliding on dsDNA has been observed in vitro. This means that each initial binding with dsDNA interrogates several positions, effectively increasing the size of the search target. Sliding in vitro was estimated to take place over 60-300 bp [107]. Sliding over longer distances than ca 800 bp has been excluded by studies using other methods [82,108]. In vivo sliding over these distances appears unlikely, since that would require the dsDNA to wrap around the RecA-ssDNA filament. However, even a more plausible short sliding would aid search by increasing the target size several fold. The mechanism of sliding is not clear but it is possibly related to the flexible C-terminal binding mentioned above.

Fourth, the ca 50 % stretching of RecA-bound ssDNA means that RecA-ssDNA and dsDNA aligned parallel to each other will make contacts in different sequence registers and can probe at different sequence offsets simultaneously. If the two helices had the same rise per nucleotide, all contacts would effectively probe the same relation between the ssDNA and dsDNA.

It is not clear which of these mechanisms are relevant in vivo and whether there are additional mechanisms that might speed up search further. It has actually been doubted whether HR repair between segregated sites occurs at all, considering the complexity of HR search [1,2]. Some mechanisms might have a different impact in the cell than in the test tube. For example, several in vitro studies have found that the concentration of heterologous dsDNA does not affect the time needed to find the homologous dsDNA, implying that heterologous DNA is entirely ignored by RecA [109,110]. However, in both cases the highest concentration of heterologous DNA used was much lower than in the E. coli cell (300 μM and 9 μM vs 7 mM⁵ incorrect sites, respectively). With that little heterologous DNA, the RecA-ssDNA might have spent so little time nonspecifically bound that no effect on the final binding rate could be measured.

5 The concentration of incorrect sites in an E. coli cell is approximately 1.5 chromosomes ∙ 4.6 ∙ 10⁶ nucleotides / 1 fl ≈ 7 mM, and significantly higher inside the nucleoid.
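The estimate in this footnote is a unit conversion from a number of sites per cell to a molar concentration, sketched below. The function name and arguments are illustrative, and Avogadro's constant is the only value not taken from the text. Taking 1.5 chromosomes and 1 fl literally gives roughly 11 mM, while one chromosome per femtolitre gives roughly 7.6 mM, so the result is best read as an order-of-magnitude figure that depends on the assumed cell volume and chromosome content.

```python
AVOGADRO = 6.02214076e23  # particles per mole


def site_concentration_mM(chromosomes, sites_per_chromosome, volume_fl):
    """Molar concentration (in mM) of potential binding sites in a cell.

    chromosomes          -- average chromosome copies per cell
    sites_per_chromosome -- one incorrect site per nucleotide position
    volume_fl            -- cell volume in femtolitres (1 fl = 1e-15 l)
    """
    sites = chromosomes * sites_per_chromosome
    moles = sites / AVOGADRO
    litres = volume_fl * 1e-15
    return moles / litres * 1e3  # mol/l -> mmol/l


if __name__ == "__main__":
    print(f"{site_concentration_mM(1.5, 4.6e6, 1.0):.1f} mM")
    print(f"{site_concentration_mM(1.0, 4.6e6, 1.0):.1f} mM")
```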

Double stranded breaks and molecular search

When specific DSBs are generated in Caulobacter crescentus, it has been observed that the DSB site travels towards a homologous sister locus which is tethered to the opposite cell pole [111]. In E. coli, where there is no chromosomal tethering⁶, there are also indications that one locus travels further than the other, but it is not known whether this is the cut locus or its sister [112]. There are also questions about how to interpret the results, since chromosomal labels such as those used in that study were later shown to be ejected during RecBCD end processing [59]. However, since asymmetric migration between DSB loci and their sisters could potentially be very informative on the search for homology, we devised a system where a DSB can be induced in a specific position, detected and then tracked by fluorescence microscopy.

Creating double stranded breaks

The endonuclease I-SceI was used to create locus-specific DSBs. This endonuclease, originally cloned from a yeast mitochondrial intron, recognizes an 18 bp sequence which does not occur in the wild type E. coli genome [113]. It has been used previously to generate DSBs, but expression of the enzyme often causes DSBs in all chromosomes present [59]. This leads to a lack of homologous repair templates, prevents repair and leads to cell death. To modulate the cutting in time, we overlapped the restriction site with a lacO1 lac operator, which was expected to prevent I-SceI binding (Figure 6A). To decrease the search time of the lac repressor (LacI), a secondary stronger operator (lacOsym) was inserted a full DNA helix turn from the first [104,114]. We theorized that this system could be used together with fast media switching in microfluidics to turn the restriction site on or off. First, media inducing endonuclease expression could be given, and then switched for media with the lac inducer IPTG to allow cutting. By adjusting the time when cutting is allowed, an arbitrary proportion of restriction sites would be cut. When tested with wild type LacI expression, the LacI protection was very effective although some restriction could still be detected (Paper I figure 1C). We also attempted additional LacI

6 Disregarding cotranscriptional membrane insertion, or transertion, which is temporary.

expression from a low copy pSC101 plasmid, but this largely prevented restriction cutting even when IPTG was present. In the following experiments only wild type expression was used.

Figure 6. A. Ends created by I-SceI restriction, and the overlap with lacO1. B. The mutated restriction site CS3mut. C. Formation of RecA4155-GFP structures after expressing I-SceI in a strain without restriction sites. The enzyme was expressed for 30 minutes from the pSN1 plasmid using 0.4 % arabinose, at 30 °C.

The I-SceI enzyme itself was expressed from a p15a plasmid with an ampicillin resistance gene. The I-SceI gene was transcribed from a tetracycline induced promoter and C-terminally fused to an AANDENYALAA tag, which marks a protein for degradation similarly to tmRNA tagging. The efficiency of the degradation in this specific case has however not been evaluated.

Detection of DSBs

To detect DSBs in real time, we used the ejection of DNA-bound proteins by RecBCD end processing. A parS-pMt1 site was integrated close to an endonuclease restriction site (Figure 7A). The parS-pMt1 site is a binding site for the ParB-pMt1 plasmid segregation protein from a Yersinia pestis plasmid, and has the advantage that it does not delay chromosome segregation as some other parS/parB systems do [116,117]. Hereafter these will be referred to simply as parS and ParB. Three variants of this construct were developed, with parS located 250 bp in the terminus direction of the restriction site (CS1,

Figure 7A), directly adjacent in the terminus direction (CS2) and 250 bp in the origin direction (CS3). To visualize the loci, ParB fused N-terminally to mCherry was expressed constitutively from the chromosome.

Since it is not entirely clear how ParB binds to parS, it was interesting to note that placing parS directly adjacent to the lac operators inhibited ParB binding (Figure 7B). Adding IPTG restored binding, showing that it is LacI that prevents ParB from binding, presumably either by acting as a roadblock during binding or sterically through the LacI-induced loop. Only the CS3 construct was used in the experiments described here.

Figure 7. A. Three variants of the genetic construct designed for creating and detecting specific DSBs: with the parS site in the terminal direction of the DSB with spacing in between, in the origin direction of the DSB without spacing, and in the origin direction of the DSB but with spacing. B. mCherry-ParB fluorescence in the three constructs, without and with IPTG. C. The CS3 restriction site as it appears in vivo, with a malO array towards the origin and LacI looping.

In addition, an array of 12 malO binding sites was integrated ca 25 kb in the origin direction. These sites are bound by the maltose inhibitor, MalI, which was constitutively expressed from the chromosome and fused to the yellow fluorescent protein mVenus. MalI is not well characterized but it has strong homology to LacI. If we assume it to bind as a dimer, as LacI-mVenus does, up to 24 fluorophores could bind each malO array [118].

We also integrated a reporter for the SOS response to detect DNA damage independently from the mCherry-ParB focus. Here we used sCFP3a [119] expressed from the sulA promoter, similarly to what has been done previously with GFP [17]. A cyan reporter is required to avoid interference with either

mCherry or mVenus, but due to its slow maturation and poor brightness we had to optimize the expression. This involved changing the promoter -35 box to the E. coli consensus sequence and manually adjusting the codon usage to avoid secondary structure at the ribosomal binding site, which resulted in a useful level of expression. The resulting promoter was named pSulA’.

Inducing double stranded breaks in microfluidics

Inducing DSBs in cells in a microfluidic system has several advantages. The growth media can be exchanged within seconds, allowing precise control of the time window where restriction cuts are allowed. It is also possible to induce breaks while microscopic observation is ongoing, and catch any fast dynamics that happen shortly after a break. Lastly, it is possible to track cell lineages over time to ensure that the cells return to normal cell division, indicating that DSB repair was successful.

Cells were loaded in microfluidic traps, and grown in minimal media chosen so that the vast majority of cells (> 97 %) had at most two foci of either color. DSBs were generated by temporarily changing to a medium containing IPTG and the tet inducer ATC. An induction with three minutes of IPTG (1 mM) and ATC (20 ng/ml) followed by three minutes of only IPTG was found to provide an appropriate level of DSB induction at 37 °C.

Since DSB generation is stochastic, three types of outcomes were observed in cells that initially had two mCherry-ParB foci (Figure 8A, Paper I figure 1D). Some cells were unaffected and divided normally. Others lost both mCherry foci, elongated without division and expressed high levels of the SOS reporter CFP. Most interesting to us, however, were those which only had one single DSB. These typically survived and divided, but later and with a longer final cell length than the unaffected cells (Paper I figure 1G). The CFP level was also on average increased in their daughters (Paper I figure 1F). The DSB induction protocol was adjusted to optimize the proportion of cells having a single DSB. Although the induction efficiency varied between experiments, typically around 30 % of the loci were lost and about 10 % of cells lost both loci, suggesting the cuts are independent.

The cell images were segmented and cell lineages were built. From these, cells that divided twice (into granddaughters of those present at induction time) were selected. These were further filtered on segmentation quality criteria and an average increase of CFP fluorescence of at least 5 camera counts between time zero and 90 minutes. Since the cells divided in the meantime, the CFP fluorescence was measured in the descendants of the initial cells. The cells selected automatically were visually inspected and cells with probable single DSB and subsequent repair were identified based on mCherry-ParB spot disappearance, merging of MalI-mVenus spots and finally splitting into

Figure 8. A. Phase contrast image of one mother machine cell trap over time, with CFP insert at 90 minutes. Cell 1 is without DSBs, cell 2 has two DSBs and cell 3 has one single DSB. B. Examples of cells with single DSBs and their daughters, in mCherry and mVenus fluorescence channels. C. Number of mCherry-ParB foci per cell after induction of DSBs, in cells with active endonuclease and cut site and controls with uncuttable site and inactive enzyme. D. Number of DSBs detected in the chromosomes segregating towards the old and to the new poles, respectively.

two spots again (Figure 8B). These cells showed a rest period following the mCherry dot loss, with no visible events, then largely symmetric pairing between the MalI-mVenus foci, colocalization in the cell center and finally resegregation before cell division. Notably, the remaining mCherry focus (at the sister locus) stays colocalized with the 25 kb distant mVenus focus throughout the repair. To ensure that these events actually represent double stranded breaks, strains with a mutated, uncuttable restriction site or an inactive endonuclease were induced alongside the strain with a cuttable site. Here, the number of mCherry foci drops notably after 20 minutes only in the cuttable strain, followed by a recovery and then a temporary increase of cells with 3 and 4 foci (Figure 8C). This pattern was observed neither in cells lacking a functional restriction site nor in cells lacking an active endonuclease. The data was then processed identically, and images from the cutting strain and a control strain were presented in random order to a human observer. Cells with DSBs were selected

Figure 9. A. Examples of cells with single DSBs and their daughters, with the MalI-mVenus label on the left chromosomal arm. B. Examples of cells with single DSBs and their daughters, with the MalI-mVenus label 75 kb upstream of the cut site. C. Examples of cells with single DSBs and their daughters, with HU-yPet. D. Long-axis distributions of MalI-mVenus, 25 kb from CS3, and mCherry-ParB during 4 min time windows before the DSB, after the DSB and before mCherry resegregation. Cells have been reoriented with the DSB on the right hand side. (n=77, 77 & 73 cells) E. As D, but with MalI-mVenus on the opposite chromosomal arm to CS3. (n=42, 42 & 40 cells) F. Long-axis mean HU-yPet density, at the time of a DSB, the estimated time of pairing and at the time of mCherry resegregation. (n=4 cells)

and the timings of mCherry foci loss, mVenus foci merge and foci resegregation were annotated. While 35 DSBs were identified in the cuttable strain, none were found in the control strains.

Notably, dot loss was observed on average 16.6 ± 6.4 min (n = 77) after the start of DSB induction, which is more than 10 min after IPTG has been removed and when the restriction site would again be protected by LacI. The delay is explained by the unusual property of I-SceI to first bind the recognition site, but not execute the double-stranded cut until after a resting period of

several minutes [120]. There was no bias for DSBs being generated in the chromatid segregating towards either the old or the new cell pole (Figure 8D).

What was immediately evident is that the repair process was significantly faster than what has previously been reported in E. coli [112]. The initial phase, from the loss of a DSB focus until MalI-mVenus foci pairing, lasted 8.3 ± 4.4 min (mean ± SD, n = 55). The second phase, from the pairing until resegregation of mCherry-ParB, lasted 8.4 ± 6.4 min (n = 51). We termed the first phase repair phase I and the second phase repair phase II. The timescale of repair is surprisingly fast, but also makes DSB repair using a segregated sister as repair template more plausible as a mechanism relevant in vivo.

From here, the next step is to identify which mechanistic steps correspond to each phase. The specific search for the sister locus could be performed either in phase I with the chromosomes in their normal positions, or in phase II when the chromosomes may be compacted in the cell center. To investigate this, the malO array was moved to a position on the other, left, chromosomal arm, at a similar distance to the origin of replication (ygaY). When DSBs were now induced, the MalI-mVenus foci did not merge in the cell center (Figure 9A, E), which is notably different from when the labels were proximally located (Figure 9D). In a few cases, the locus on the left arm migrated a few hundred nanometers towards the cell center, but most often the loci were not visibly affected. Additionally, placing the malO array on the same arm as the cut site, but circa 75 kb in the origin direction, yielded MalI-mVenus pairing in a majority of cells but with a minority migrating towards the middle without merging (Figure 9B). To visualize the entire nucleoid during HR repair, the chromosomal binding protein HU was fused with mVenus, which showed that while a single DSB reverses the chromosomal segregation temporarily it does not cause general chromosomal condensation (Figure 9C, F).

These results show that only a region around the sister locus pairs during the repair process, indicating that the search for a specific homology finishes during the initial 8.3 minute phase. This equals about a fifth of a generation time under these conditions, and is several times faster than the circa 50 minutes that were reported previously in E. coli [112]. There is also no evidence of short- or long-traveling loci. Instead, pairing occurs symmetrically, showing only that the homologous locus has to be located before it is moved to mid-cell.
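Two numerical statements in this section can be checked with simple arithmetic: that losing about 30 % of loci and about 10 % of cells losing both loci is what independent cuts would give, and how the 8.3 and 8.4 minute phases compare with a generation time and with the previously reported circa 50 minutes. The sketch below is only such a check, with illustrative names; it assumes two loci per cell and treats each cut as an independent event with the same probability.

```python
from math import comb


def focus_loss_distribution(p_loss, n_loci=2):
    """Binomial probabilities of losing 0..n_loci foci if cuts are independent."""
    return [
        comb(n_loci, k) * p_loss**k * (1.0 - p_loss) ** (n_loci - k)
        for k in range(n_loci + 1)
    ]


# About 30 % of loci were lost in a typical experiment (value from the text).
p_none, p_single, p_both = focus_loss_distribution(0.30)

# Durations reported in the text, in minutes.
PHASE_I_MIN = 8.3         # focus loss until MalI-mVenus pairing
PHASE_II_MIN = 8.4        # pairing until mCherry-ParB resegregation
PREVIOUS_REPORT_MIN = 50  # circa value reported previously in E. coli

total_repair_min = PHASE_I_MIN + PHASE_II_MIN
# "About a fifth of a generation time" implies this generation time.
implied_generation_min = PHASE_I_MIN * 5
fold_faster = PREVIOUS_REPORT_MIN / PHASE_I_MIN

if __name__ == "__main__":
    print(f"P(no loss)={p_none:.2f}, P(single)={p_single:.2f}, P(both)={p_both:.2f}")
    print(f"total repair time: {total_repair_min:.1f} min")
    print(f"implied generation time: {implied_generation_min:.1f} min")
    print(f"phase I vs previous report: {fold_faster:.1f}x faster")
```

With independent cuts at a 30 % loss rate, 9 % of cells are expected to lose both foci, close to the roughly 10 % observed, and about 42 % are expected to carry a single DSB.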

34 RecA activity after a double stranded break The short time frame needed for homology search raises questions about the mechanism that achieves this. Although the search is mediated by RecA, most of our knowledge about this enzyme comes from in vitro studies and we can only guess what activities that are relevant in the cell and on what time scales they act. To investigate the behavior of RecA in vivo, we replaced the MalI- mVenus label by a chromosomally expressed RecA-mVenus fusion. As fluo- rescent RecA fusions typically partly inhibit the enzymatic activity [115], we coexpressed RecA-mVenus with wild type RecA from the native recA locus similar to what was done by [121]. When expressed like this, no persistent foci or other structures that suggested protein aggregation were seen. As be- fore, DSBs were found by a combination of automatic filters and manual in- spection. When a control strain with uncuttable restriction site was induced alongside the cuttable strain, no DSBs were found in the former while 21 DSBs were found in the latter strain. When DSBs were induced in this strain, a strong focus involving the almost all RecA-mVenus in the cell developed quickly at same place where a mCherry-ParB focus was lost. Actually, on average a RecA foci was observed 43 ± 82 s (n = 89) before the mCherry focus was lost. Assuming that the RecA focus is formed by RecA being loaded on ssDNA by RecBCD, and since there are no chi sites before the parS, this could be due to two reasons. It either suggests that the two DSB ends are independently processed by RecBCD, pos- sibly with a preference for the terminus end, or it could be due to ParB clus- tering at parS which could conceivably last for some time after the parS site is degraded. About 4.9 ± 2.1 (n = 89) min after the initial RecA focus was observed, a different type of structure could be observed. Often first seen as a protrusion from the initial focus, a more diffuse structure was observed. 
Typically, the second type of RecA structure was elongated along the long cell axis, although it often was bent and had several nodes with more dense fluorescence (Figure 10A, Paper I figure 3A). This structure lasted for several minutes, with a gradual demise that is often hard to pinpoint in time. In many cases the second structure starts well-defined and filament-like, but after 3-4 minutes transforms into a loosely shaped cloud of RecA-mVenus that rapidly shifts position between the cell lobes. Notably, the lifetime of the second RecA structure is similar to the time observed until MalI-mVenus pairing, suggesting that this RecA conformation is indeed active in mediating the pairing. When filament-like, the second structure can easily be interpreted as identical to the RecA-ssDNA filaments that are known to perform homology search and strand exchange in vitro.

Figure 10. A. Cells with single DSBs and their daughters, with RecA-mVenus. B. Magnification of RecA-mVenus structures. C. RecA-rsEGFP2 in wide-field fluorescence, a z plane max projection of the MoNaLISA image and colored by z plane. Rectangles indicate elongated structures. D. Examples of elongated structures, with FWHM measurements at the arrows.

Superresolution imaging

To resolve the elongated structures better, they were imaged in superresolution. We used the recently developed RESOLFT method MoNaLISA [122]. For this purpose, mVenus was exchanged for the reversibly switchable fluorescent protein rsEGFP2 [123]. The superresolution imaging was performed in microfluidics as before, except that now a device without subdivisions between the individual cell channels was used. Instead the cells grow in a monolayer in a rectangular cell trap, which increases the number of cells that can be observed within the region of interest of a camera several fold. The superresolution imaging was performed at room temperature since the microscope is hard to align properly when the objective is heated.

Similarly to RecA-mVenus, the RecA-rsEGFP2 fusion did not form stable structures or aggregates during normal exponential growth. When DSBs were induced, dynamic structures could be observed already by widefield microscopy. Since MoNaLISA is a technique with thin horizontal slicing, we acquired up to four z slices over a span of 540 nm. In those images we observed long structures which were generally longitudinally oriented in the cells. Since the structures often transcend multiple z planes, they are best visualized in a z projection (Figure 10C). Although the label density of the structures is relatively low, compared for example to the eukaryotic Vimentin-rsEGFP2 used for demonstration [122], the width of the structures could still be estimated (Figure 10D). Measures of the full-width at half maximum gave an estimated structure width of ca 80 nm, which may however be an upper bound limited by the resolution of the microscope. The extension of the structures shows that they are stiff over a micrometer length scale. This can be compared to the ca 950 nm persistence length of ssDNA-RecA filaments.
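The width estimate above can be illustrated with a minimal sketch of a full-width at half maximum measurement on a one-dimensional intensity profile drawn across a structure. The function name, the baseline subtraction, the linear interpolation between pixels and the pixel size parameter are all assumptions made for illustration; the text does not state how the FWHM values in Figure 10D were computed.

```python
def fwhm(profile, pixel_size_nm=1.0):
    """Full width at half maximum of a single-peaked 1D intensity profile.

    Hypothetical helper: the minimum of the profile is subtracted as a
    baseline, and the two half-maximum crossings are located by linear
    interpolation between neighbouring samples.
    """
    base = min(profile)
    values = [v - base for v in profile]
    peak = max(values)
    if peak <= 0:
        raise ValueError("profile has no peak")
    half = peak / 2.0
    i_peak = values.index(peak)

    # Walk left from the peak until the signal drops below half maximum.
    i = i_peak
    while i > 0 and values[i] >= half:
        i -= 1
    if values[i] >= half:
        raise ValueError("profile never drops below half maximum on the left")
    left = i + (half - values[i]) / (values[i + 1] - values[i])

    # Walk right from the peak in the same way.
    j = i_peak
    while j < len(values) - 1 and values[j] >= half:
        j += 1
    if values[j] >= half:
        raise ValueError("profile never drops below half maximum on the right")
    right = j - (half - values[j]) / (values[j - 1] - values[j])

    return (right - left) * pixel_size_nm
```

A width measured this way includes the blur of the microscope itself, which is why such a value is better read as an upper bound on the true structure width.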
As a contrast, cells that had failed repair and filamented eventually exhibited very bright and stable elongated RecA structures together with large bright clusters (Figure 11). The elongated structures appeared to be nucleoid excluded and did not reorganize or dissolve within a time scale of up to ten minutes. The clusters extended several hundred nm in all directions and were consistent with protein aggregates.

We attempted to acquire time lapses in superresolution with MoNaLISA, but fluorophore fatigue strongly limited the number of usable frames. In addition to this, the number of RecA-rsEGFP2 foci increased after imaging, suggesting that the 405 and 488 nm laser exposure from MoNaLISA caused DNA damage.

Figure 11. Thick, immobile RecA-rsEGFP2 structures in irreparable cells, imaged by MoNaLISA. Maximum projection (left) and colored by z plane (right).

Based on the observation of RecA after a DSB, we can refine the two phases of DSB repair previously described. Phase I, before MalI-mVenus pairing, can be divided into Ia, when RecA is assembled in a single strong focus, and Ib, when RecA creates an elongated filament. When DSBs are caused by SbcCD just after the replication fork, a single strong RecA-mCherry focus with a lifetime similar to that observed here has been reported [121]. This focus however dissolves without creating an elongated filament. The SbcCD-mediated DSB occurs between unsegregated chromosomes, so only minimal homology search is needed. Since the elongated structure only forms after a DSB in segregated loci, we believe these structures are necessary for search and that the search occurs in phase Ib. The shape, as observed in superresolution, and the function of the elongated structure together strongly suggest that this structure is identical to the ssDNA-RecA structures observed in vitro to mediate homology search and strand exchange.

Phase II, when MalI-mVenus is paired, involves less well-defined RecA behavior. The RecA filament often dissolves gradually. Sometimes, RecA-mVenus is observed to oscillate between the cell lobes on a minute time scale.

This oscillation is as yet unexplained. In a few cases, weaker RecA-mVenus foci are observed at the cell poles (Paper I figure 3A). Those foci last for a few minutes and might be caused by nucleoid exclusion or be short-lived aggregates caused by high RecA expression during the SOS response.

Double stranded breaks and the cell cycle

While RecA-mVenus did not produce persistent structures in exponentially growing cells, we did see foci, and less often elongated structures, lasting a few minutes also without DSB induction (Figure 12A). These are similar to the RecA-mCherry foci detected by [121] when single-ended DSBs were generated using SbcCD-mediated chromosome cleavage. We hypothesized that these foci indicate spontaneous DNA damage, repaired by either the RecBCD or the RecFOR pathways. If this hypothesis is correct, we have the opportunity to measure the effects of DNA damage in individual cells on a larger scale by using microfluidics.

Figure 12. A. Example of a microfluidic cell trap with cells carrying the RecA-Venus gene during steady state growth. RecA-Venus structures are annotated. B. Number of cells and the proportion of cells with spontaneous RecA-mVenus structures, with a 20-minute average, during a microfluidic experiment without DSB induction. C. Proportion of cells with spontaneous RecA-mVenus structures of different duration. D. Percent of daughter cells that have structures of different duration, sorted by the duration of structures in the mother cells.

Detecting DNA damage by a neural net classifier

To allow for automatic detection of RecA structures, a neural network was trained to recognize cells with structured RecA. A strain expressing RecA-mVenus but with inactive I-SceI was grown in non-inducing media in microfluidics for up to 16 hours, while being imaged in phase contrast and YFP fluorescence every minute. The images were segmented and fluorescence cell images were extracted. First, a small dataset of 4 359 fluorescent cell images (79 of which had RecA structures) was classified manually and input into AlexNet [124]. The features from layer fc7 were used to train a support vector machine. This classifier was then used to automatically classify a larger dataset, which was then manually corrected and used to train neural networks with six different architectures. Since the classes are heavily unbalanced, with only 2 % of the images having a structure, accuracy is not very useful as a performance metric. Instead, the sum of specificity and sensitivity on the validation set was used to rank the architectures. The best performing architecture (Table 1) was again used to classify an even larger set of cell images. After manual correction this resulted in a dataset of 7 069 unstructured and 1 220 structured cell images, of which 90 % were used for training and 10 % for validation. The structured cells in the training set were augmented by duplicating each image 4 times with 1 % normally distributed noise added. This dataset was used to train the final neural net classifier.
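The reason accuracy fails on unbalanced classes, and why the sum of sensitivity and specificity is a more useful ranking score, can be shown with a minimal sketch. The function names and the example counts are hypothetical and chosen only to mimic a dataset with about 2 % structured cells; they are not results from the thesis.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn count structured cells classified correctly/incorrectly,
    tn/fp count unstructured cells classified correctly/incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def ranking_score(tp, fn, tn, fp):
    """Sum of sensitivity and specificity (1.0 = uninformative, 2.0 = perfect)."""
    sensitivity, specificity = sensitivity_specificity(tp, fn, tn, fp)
    return sensitivity + specificity


def accuracy(tp, fn, tn, fp):
    """Fraction of all images classified correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


# Hypothetical example: 1 000 images, 20 of them structured.
# A classifier that labels everything "unstructured" looks excellent by accuracy...
trivial = dict(tp=0, fn=20, tn=980, fp=0)
# ...while a classifier that actually finds most structures scores lower on accuracy.
useful = dict(tp=16, fn=4, tn=931, fp=49)

if __name__ == "__main__":
    for name, counts in (("trivial", trivial), ("useful", useful)):
        print(name, accuracy(**counts), ranking_score(**counts))
```

With these made-up counts the trivial classifier wins on accuracy but scores only 1.0 on the summed metric, whereas the useful classifier scores well above it.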

Table 1. The network architecture used for the final classifier

Layers
1   Input layer of size 13x42 px, normalization around zero
2   Convolution 8 5x5x1, stride 1, padding to original size
3   ReLU
4   Batch normalization
5   Convolution 8 5x5x8, stride 1, no padding
6   ReLU
7   Batch normalization
8   Convolution 16 3x3x8, stride 1, no padding
9   ReLU
10  Batch normalization
11  Average pooling 3x3, stride 1
12  Fully connected layer 2720x2
13  Softmax layer
14  Weighted cross entropy classification layer
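The size of the fully connected layer in Table 1 follows from the layer sizes above it, which can be checked with a short sketch. The helper names are hypothetical, and the sketch assumes that "padding to original size" means two pixels of padding for the 5x5 kernel and that the average pooling layer is unpadded.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Output length along one axis of a convolution or pooling window."""
    return (size + 2 * pad - kernel) // stride + 1


def final_feature_shape(height=13, width=42):
    """Trace the spatial size through the layers listed in Table 1."""
    h, w = height, width
    # Layer 2: 8 filters of 5x5, padded to keep the original size (pad = 2).
    h, w = conv_out(h, 5, pad=2), conv_out(w, 5, pad=2)
    # Layer 5: 8 filters of 5x5, no padding.
    h, w = conv_out(h, 5), conv_out(w, 5)
    # Layer 8: 16 filters of 3x3, no padding.
    h, w = conv_out(h, 3), conv_out(w, 3)
    # Layer 11: 3x3 average pooling, stride 1 (assumed unpadded).
    h, w = conv_out(h, 3), conv_out(w, 3)
    channels = 16  # number of filters in the last convolution
    return h, w, channels


if __name__ == "__main__":
    h, w, c = final_feature_shape()
    print(h, w, c, h * w * c)  # 5 34 16 2720
```

Under these assumptions the flattened feature vector has 5 x 34 x 16 = 2720 elements, matching the 2720x2 fully connected layer.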

For final training, the extracted cell images were cropped to a height of 13 pixels by removing the darkest rows at top and bottom, and scaled to a length of 42 pixels. As training input, we used either the entire bounding box of the cell or applied the segmentation mask and replaced the outside pixels with either the mean of the cell or zeros. Six randomly initialized networks with the architecture in Table 1 were trained for up to 20 epochs on each of the three input variants. The results achieved were very similar but marginally better with masked cells surrounded by zeros. When this classifier was applied to an independent test set of 11 454 unstructured and 223 structured cell images, which corresponds to the actual class proportions, specificity was found to be 0.80 and sensitivity 0.75. Manual inspection revealed that many of the misclassified cells were borderline cases where the class could not be assigned unambiguously even by a human observer, thus putting an upper limit on the classification performance achievable. We were thus content with the accuracy and applied the classifier to all the cell images.

Effects of spontaneous DNA damage

The proportion of cells with RecA-mVenus structure was stable at circa 2 % during exponential growth while imaging every minute (Figure 12B). We found that 14.4 % of the cells have RecA structures lasting for one minute or more during their lifetime (Figure 12C). This number is similar to the 18 % of cells that have been found to have a DSB per generation [16]. Since the microfluidic system allows recording of cell lineages, it is possible to compare the proportion of cells with RecA structures from one generation to the next (Figure 12D). Here there is no apparent inheritance of RecA structures, except after the longest lasting structures.

More interestingly, the cells that were found to have RecA structures also induced the SOS response and produced more RecA, which corroborates the interpretation of these structures as a sign of DNA damage (Figure 13A, Paper I figure S3C). Longer lasting structures give a stronger SOS response, which agrees with SOS induction lasting as long as there is RecA-bound ssDNA. RecA structures also give a prolonged generation time, and a correspondingly increased cell elongation before division (Figure 13C), similar to the cell elongation caused by a chronic DSB [125]. However, this delay in division is accompanied by a decrease in the generation time of the daughters, showing that the extra cell elongation during a repair helps the eventual daughters to complete their cell cycle faster. When combining the generation times of mothers and daughters, the faster daughter cell cycles actually entirely offset the division delay in the cell where the DSB occurred (Figure 13D). Thus, the detected DNA damage has virtually no fitness cost when measured over two cell cycles.


Figure 13. Cells with RecA-mVenus structures of different durations, and their daughters. A. SOS induction, measured by increase of RecA-mVenus fluorescence over the cell cycles. B. Generation times of cells with RecA-mVenus structures of different durations. C. Length extension, as ratio between last and first observed lengths. D. Combined generation times of mothers and daughters. The circle and vertical lines in the violin plots indicate median and quantiles. The horizontal dotted line is the median of the leftmost distribution. Outliers have been omitted for clarity. ✱ indicates p < 0.01 with Welch's t-test. (mothers n = 15 773, 4 000, 1 369, 655, 398, 265, 644, daughters n = 13 786, 3 712, 1 215, 579, 365, 243, 576)
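The caption above refers to Welch's t-test, which compares two group means without assuming equal variances. A minimal sketch of the test statistic and its approximate degrees of freedom (the Welch-Satterthwaite formula) is given here for orientation. The function name and the toy data are hypothetical, and the p-value lookup from the t distribution is left out to keep the sketch free of external libraries.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two samples.

    Uses the sample variances (n - 1 denominator) of each group, so both
    groups need at least two observations.
    """
    n1, n2 = len(a), len(b)
    v1 = variance(a) / n1  # squared standard error of the first mean
    v2 = variance(b) / n2  # squared standard error of the second mean
    t = (mean(a) - mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


if __name__ == "__main__":
    # Toy generation times in minutes; not data from the thesis.
    without_structure = [40.0, 42.0, 38.0, 41.0, 39.0]
    with_structure = [45.0, 52.0, 48.0, 60.0, 44.0]
    print(welch_t(with_structure, without_structure))
```

Because the group sizes in the figure differ by more than an order of magnitude, a test that does not pool the variances is a natural choice.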

Early and late participants in homologous recombination repair

Since we can now recognize DNA damage from RecA behavior, we wanted to use this to investigate other enzymes involved in HR repair. While a number of players are mentioned in the introduction, there are many other proteins that are known to either interact with the known HR proteins or to be involved in chromosomal topology activities necessary after HR repair. Several have been shown to have moderate effects on HR efficiency, which may be interpreted as if some DSBs can still be repaired but others not. Since we can now differentiate between DSBs between segregated and non-segregated sites, we can investigate whether a deficiency to perform the extensive search needed between segregated sites is what results in a partly HR deficient phenotype.

To test this, we separately deleted a selection of candidate genes and performed DSB inductions in microfluidics as before. The genes chosen were:

RecC, which is part of RecBCD and is indispensable for HR repair.
RecF, a protein similar to structural maintenance of chromosomes (SMC) proteins, which is part of the single strand gap repair pathway RecFOR. Since RecF is known to mediate RecA filament growth through RecX, it is conceivable that it is active also on filaments created through RecBCD [126].
RecG, a helicase which mediates branch migration at Holliday junctions [127]. This leads to resolution of the D-loop created during HR repair.
RecN, an SMC-like protein induced during the SOS response and whose mutants have increased sensitivity to DNA damage [128,129].
RecO, part of the RecFOR pathway, which helps RecA displace SSB from ssDNA [130].
RuvA, part of the RuvABC resolvasome which resolves Holliday junctions created during HR repair [35]. RuvAB mediates branch migration. RuvABC is an alternative resolution pathway to RecG.
RuvC, which cleaves Holliday junctions to resolve the D-loop [35].
UvrD, a helicase involved in NER which has also been implicated to interact with RecA [131].
Mutant alleles from the Keio collection, except for ΔrecN which was derived from [132], were transduced into the strain used for visualizing DSBs with RecA-mVenus. DSBs were induced as before during imaging in a microfluidic chip.

Unsurprisingly, the RecC mutant was severely deficient in DSB repair (Figure 14A). RecA does not assemble at the location of lost mCherry-ParB foci and the SOS response was rarely induced. Instead, the growth of most cells stalled (Figure 14B, C). No DSB repair between segregated foci was observed in ΔrecC.

Figure 14. A. ΔrecC cells with induced DSBs. B. Growth rate, as a 20 minute moving average, in ΔrecC and wild-type cells following a DSB induction. C. Cell lengths in ΔrecC and wild-type cells following a DSB induction. Shown are, from top to bottom, 90th percentile, 75th percentile, average, 25th percentile and 10th percentile.

Aside from ΔrecC, successful DSB repair between segregated foci was observed in all the mutants listed above. In all these, successful HR repair was superficially similar to how it was observed to proceed in the wild type, including a rapid RecA-mVenus focus, an elongated RecA-mVenus structure, migration to the cell center and eventual resegregation (Figures 16 to 18). However, some of the mutants exhibited notable phenotypes. Several of them failed to repair DSBs in some cases, leading to strong SOS induction and filamentous cells. The proportion of cells that failed to repair a DSB was estimated by classifying the cells present during DSB induction based on RecA-mVenus structures. In all strains, cells with no or only one frame with RecA-mVenus structures almost universally divided within 90 minutes after birth (Figure 15A). In the cells with structures lasting for two minutes or more, assumed to be mostly I-SceI-induced DSBs, the proportion of cells dividing within 90 minutes was always lower and ranged from 95 % in the wild type to 61 % in ΔrecN.

Figure 15. A. Proportion of deletion mutants that divide within 90 minutes of birth, sorted by normal cells (no more than one frame with RecA structure during cell lifetime) and cells with RecA structures (at least three consecutive frames with RecA structures). (normal cells n = 97, 94, 224, 32, 117, 97, 59, 33, structured cells n = 61, 65, 86, 98, 108, 132, 110, 145) B. Ratio between generation times in dividing cells with RecA structures and dividing normal cells. Error bar is standard error of the mean. (normal cells n as in A, structured cells n = 58, 61, 79, 88, 86, 103, 79, 88)

This number cannot be considered as an absolute measure of the proportion of cells which can successfully repair a DSB, since there are several confounding factors that are unaccounted for. First, while the majority of RecA-mVenus structures are likely caused by I-SceI mediated DSBs, some may also be spontaneous single-ended DSBs or RecF-mediated SSB repair. Second, since I-SceI restriction is stochastic, some cells will cut both chromosomes or their single yet unreplicated chromosome. In these cases, no repair is possible. The 5 % which failed repair in the wild type strain likely represent those. Third, the detection of cell divisions is imperfect, and especially very long filamentous cells are sometimes incorrectly split by the cell segmentation algorithm. Despite this, the proportion of cells with RecA structures that manage cell division is likely a useful measure when comparing the importance of various genes in HR repair. While ΔrecF, ΔrecO and ΔuvrD had approximately 90 % successful divisions, the ΔrecG, ΔruvA, ΔruvC and ΔrecN mutants stood out with less than 80 % of RecA-structured cells dividing. When the time until division was compared between the cells with RecA structures and those without, including only dividing cells, ΔruvA and ΔruvC were found to also have prolonged generation times after DSBs, while ΔrecN and ΔrecG were less affected. A significant increase in the generation time after DNA damage was also seen in ΔrecF cells.

RecN

Inspection of the cell images confirmed that the ΔrecN mutant failed a large proportion of repairs, and when it succeeded, the Ib phase, where an elongated RecA structure is visible, was often prolonged and could last up to 20 minutes (Figure 17A). In these cases the RecA filament did not appear to develop into a thin filament, was not as dynamic as those in the wild type and often did not extend throughout the opposite cell lobe. This suggests a function of the RecN SMC-like protein in the HR search process, possibly as an organizer of RecA-mediated HR search.

RuvC

The ΔruvC mutant also failed a large proportion of HR repairs but in a different way (Figure 18B). Here, phases Ia-Ib proceeded normally, but a proportion of the cells stayed unusually long in phase II with one single centrally located mCherry-ParB focus. Sometimes, this central focus resegregated into two foci, which rapidly segregated to different cell lobes followed by cell division. In other cases, the central focus persisted and no further cell divisions were observed, resulting in failed DSB repair.

RuvA

A similar phenotype was observed in ΔruvA (Figure 18A). As with ΔruvC, some cells repaired as normal while others were stuck in phase II. Even short-lived (3-4 min) RecA-mVenus foci at the single mCherry-ParB focus in cells with as yet unsegregated parS loci, i.e. where the search-related RecA filament is never observed, could lead to a stuck central mCherry-ParB focus and prevent cell division. While some cells did recover by segregating their chromosomes and dividing, others never divided. Interestingly, filamentation was rarely observed in these cells. Instead they typically grew little even over several hours. This result confirms that the RuvABC complex is involved in the late phase of HR repair, as was already believed based on biochemical studies [35].

Figure 16. Examples of cells with DSBs in segregated loci, which repair and resume cell division. A. Wild type (as in Figure 8A) B. ΔrecG C. ΔrecF

Figure 17. Examples of cells with DSBs in segregated loci, which repair and resume cell division. A. ΔrecN B. ΔrecO C. ΔuvrD.


Figure 18. Examples of A. ΔruvA and B. ΔruvC cells with DSBs in segregated loci. Cells in the two leftmost panels repair and resume cell division. Cells in the rightmost panel have an mCherry-ParB focus stuck in a central position until they are pushed out of the microfluidic trap, and are not observed to resume division.

RecG

DSBs in ΔrecG were typically repaired, leading to successful division. However, in a minority of cells repair led to anomalous chromosome segregation with several mCherry-ParB foci which did not segregate well into the cell halves (Figure 16B). In some cases, these anomalous chromosome segregations resolved after a delay of 20 minutes or more and the cell eventually divided into two normal-looking daughter cells. In other cases, the cells did not divide and instead filamented. Similar non-resolvable anomalous chromosome segregation was observed in some cells also without DSB induction, indicating that spontaneous DNA damage also sometimes requires RecG for successful repair. These results agree with the view that RecG and RuvABC provide complementary pathways for resolving Holliday junctions after HR repair [5].

RecF

No clear phenotype was seen in the images of ΔrecF cells (Figure 16C), even though these show an increased generation time after DNA damage in Figure 15B. RecF acts in a separate pathway from RecBCD and is thus not expected to affect the repair of DSBs. A possibility is that the lack of RecF leaves single strand gaps unrepaired, which are then converted to single-ended DSBs by the replication fork, effectively increasing the rate of spontaneous DSBs. This could lead to several DSBs in a single cell, resulting in delayed cell division. In this case, however, it is unclear why a RecO deletion does not have the same effect.

RecQ and RuvB

In addition to those HR-related genes mentioned above, deletion mutants of RecQ, a helicase known to suppress incorrect recombination during RecFOR recombination [133], and RuvB, which catalyzes branch migration together with RuvA, were constructed and tested as those above. Automatic analysis could not be performed on these strains due to imaging data quality issues. However, successful repair between segregated loci was observed in both these strains.

Dynamics of chromosomal loci

When a DSB occurs at a segregated locus and the sister locus must be located, the question arises to what degree chromosome organization and mobility aids the search. E. coli, and many other bacteria, have genomes with a contour length about one thousand times longer than the cell length. This genome must be packed in the cell in such a way that it can still be transcribed, replicated and segregated before cell division. Although the prokaryotic genome typically is small compared to eukaryotes, the organization is in some ways more complex since their cell cycles are not divided into clear separate phases. Instead chromosome segregation can proceed simultaneously with not only one but several ongoing rounds of chromosome replication.

The chromosome of many rod-shaped bacteria is confined to a central region in the cell termed the nucleoid [134,135]. Despite the lack of a nuclear membrane, the bulk of the chromosome is still held in a dense central region. The organization of the chromosome is solved differently in different bacterial species. In slow growing E. coli, the two sister chromatids are segregated into the respective cell halves, with the origin and terminus regions at the quarter positions and left and right chromosome arms on each side [136,137]. Exactly what forces hold the nucleoid together is not clear, and is likely a combination of nucleoid-associated proteins, negative supercoiling and entropic forces. Recent research suggests that the SMC protein MukBEF organizes the chromosome by binding two DNA positions as a dimer and extruding a DNA loop between them. This might create a MukBEF-bound chromatid backbone with extruded loops of circa 50 kb between them [138]. These loops may then be organized by other nucleoid-associated proteins such as H-NS, HU and Fis.

In Paper II, we show that the chromosome structure is modulated by transcription and possibly translation.
After the lac promoter has been derepressed, the lac locus moves within three minutes out of the nucleoid towards the cell periphery (Paper II figure 3BC). This movement, which was 48 ± 7 nm (mean ± standard deviation), brings the transcribed locus towards the region between the nucleoid and the cell membrane where actively translating ribosomes reside [139]. This movement was shown to be dependent on translation, as the distance decreased to 18 ± 11 nm without a ribosomal binding site on the mRNA and disappeared entirely when the start codon was also removed (Paper II figure 4). However, cotranscriptional membrane insertion of proteins, transertion, was not found to affect the locus movement (Paper II figure S7). This does not necessarily mean that translation provides the main driving force for the movement. The translated mRNAs also had a severalfold higher transcription rate, leaving the possibility that high transcriptional activity alone induces a movement of a genomic locus towards the cell periphery.

The change in position of induced loci largely agrees with what I found previously with a different but similar experimental system [140]. It is also in agreement with the localization of transcribing RNA polymerases in general, which are located farther from the cell axis than non-transcribing RNA polymerases [165]. We can now say that this is not only due to genes at the periphery being transcribed more; there are also individual genes which move outwards when expressed. In an earlier study, however, a movement towards the cell membrane was only observed with transertion and not with non-membrane proteins [141]. There the distance was not measured from the cell axis and outward but from the membrane and inward. Hence, these methods are not entirely equivalent. It is possible that both results are correct, only that the latter method is more accurate in determining association with the membrane while ours is more reliable at determining the locus position relative to the nucleoid.

Active single particle tracking

To determine single particle diffusion coefficients at higher resolution and at various timescales, we have developed a novel method for particle tracking. Since the resolution of a conventional optical microscope is fundamentally limited by physics, there has been much interest in methods to increase the effective resolution for different types of imaging problems. One class of fluorescence super-resolution methods relies on keeping the fluorophores sparse enough that their point spread functions (PSFs) do not overlap [142–144]. They can then be imaged by widefield microscopy and each fluorophore can be localized individually by fitting the observed fluorescence to the expected PSF. The localization accuracy of this method depends on the number of photons that can be detected from the fluorophore and the background level. An accuracy down to 30 nm can be reached in practical in vivo applications (Paper III table S2). To determine diffusion coefficients, time lapse images are acquired, the fluorophores are localized in each frame and the diffusion coefficient is estimated from the mean squared displacement of the particles over time.

The second class of superresolution methods are scanning methods that rely on manipulation of the fluorophores. These methods, together called RESOLFT, involve scanning over the sample with an excitation beam while using another beam to turn all fluorophores but those in the excitation center off [145–147]. Since only fluorophores in the center are allowed to emit, any collected photons can be attributed to the position of the beam center. These methods are not very useful for diffusion estimation, since the scanning beams bleach the fluorophores too quickly.

Instead MINFLUX was developed as a tracking microscope that minimizes bleaching while estimating the particle position over time. The idea of MINFLUX is to avoid exposing the particle to the light beam, i.e. to direct the excitation beam only where the particle probably is not. If we do not observe emitted photons from this point, then we know that the fluorophore was not there. If we have some a priori information about the position of the fluorophore, we can use this cleverly to narrow down the location by probing positions around the most likely position. If we miss, i.e. do excite the fluorophore at one of these positions, we can use this information to improve our position estimate. Here we can take advantage of the intensity gradient of diffraction limited beams and use the number of photons collected as input for our position update. In Paper III, this concept was realized using “doughnut” shaped beams. In this case, the fluorophore was kept in the middle and probed with a pattern of one central and three offset doughnut beams (Paper III figure 2B).

For a stationary fluorophore, the pattern can be repeated and the total number of detected emission photons from all cycles can be used to localize the fluorophore with precision down to the nanometer scale (Paper III figure 4). If the fluorophore is moving while imaging, a new position estimate must be made after each cycle based on the photons collected there. If the excitation beam pattern is recentered on the new estimated position after each cycle, the beams will follow the particle and we now have a single particle tracking microscope (Paper III figure 5).
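The camera-based approach described earlier in this section estimates the diffusion coefficient from the mean squared displacement (MSD) of a localized particle. A minimal sketch of that estimate is given here, assuming free two-dimensional Brownian motion, for which MSD(t) = 4Dt plus a constant offset from localization error. The function names, the choice of four lag times and the straight-line fit are illustrative assumptions, not the analysis used in Paper III.

```python
def mean_squared_displacement(track, max_lag):
    """MSD for lags 1..max_lag of a track of (x, y) positions.

    The positions are assumed to be sampled at equal time intervals.
    """
    if max_lag >= len(track):
        raise ValueError("max_lag must be shorter than the track")
    msd = []
    for lag in range(1, max_lag + 1):
        squared = [
            (track[i + lag][0] - track[i][0]) ** 2
            + (track[i + lag][1] - track[i][1]) ** 2
            for i in range(len(track) - lag)
        ]
        msd.append(sum(squared) / len(squared))
    return msd


def diffusion_coefficient(track, dt, max_lag=4):
    """Estimate D from the slope of MSD versus lag time (2D: slope = 4 D).

    A least-squares line with a free intercept is fitted, so that a constant
    offset from localization error does not bias the slope.
    """
    msd = mean_squared_displacement(track, max_lag)
    lags = [dt * k for k in range(1, max_lag + 1)]
    mean_lag = sum(lags) / len(lags)
    mean_msd = sum(msd) / len(msd)
    slope = sum((x - mean_lag) * (y - mean_msd) for x, y in zip(lags, msd)) / sum(
        (x - mean_lag) ** 2 for x in lags
    )
    return slope / 4.0
```

For anomalous or confined diffusion the MSD is no longer linear in time, so a fit like this one would only describe the short-time behavior.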

Tracking results For the tracking of ribosomal subunits in Paper III, a cycling rate of 8 kHz was used and the excitation beams were moved using electro-optical deflectors and a piezoelectric mirror on the small (~1 μm) and the large (~20 μm) length scales, respectively. A field-programmable gate array (FPGA) was employed to allow calculation of a new position estimate with an 8 kHz cycling rate. While tracking the small E. coli ribosomal subunit, labeled by the switchable fluorescent protein mEos2 [139], less than a fifth of the number of photons were needed to achieve the same localization precisions with camera tracking. Particle trajectories of up to a half second could be acquired, with the trajec- tory length mainly limited by fluorophore blinking (Paper III figure S12A). When mEos2 blinks, it temporarily (average ~0.6 ms) goes into a dark state and can then diffuse out of center of the excitation pattern and will be excited with much higher light intensity and risk bleaching. Blinking is a common phenomenon in fluorescent proteins and is rarely characterized on a millisec- ond time scale. The result of the ribosomal subunit tracking shows a similar overall diffu- sivity as seen before [139], but now periods of faster and slower diffusion can be distinguished within the same trajectory (Paper III figure 5D). These peri- ods are on the order of 100 ms, and could represent small ribosomal subunits probing RNA for a ribosomal binding site. Short-lived interactions between the small ribosomal subunit and unspecific RNA is expected to exist, but has not been observed previously in vivo. The MINFLUX concept has been developed further to track particles on both fast and very slow timescales and in three dimensions, now using Gauss- ian excitation beams instead of doughnut beams. One important aim has been to track the diffusion of chromosomal loci, for example using MalI-mVenus bound at the codA-cynR locus (Figure 7A). 
If chromosomal loci experience anomalous diffusion, MINFLUX could give the opportunity to explore behavior at different length scales, especially at longer distances which are most relevant to cell-wide molecular search. Attempts to characterize the diffusion behavior of chromosome-bound MalI-mVenus have already been made. However, at the time of writing there are still challenges, including sample drift, that preclude drawing reliable conclusions.

Discussion

The results here confirm that homologous recombination repair does occur between segregated sites in E. coli, and that it is efficient, fast and has no measurable fitness cost. First, after a DSB, RecA is seen to aggregate at the site of the break. This has been seen both in this study and with RecA-mCherry after SbcCD-induced breaks at non-segregated locations [121]. This focus very likely represents the loading and binding of RecA to ssDNA created by RecBCD. The initial RecA focus involves virtually all labeled RecA in the cell within, on average, less than a minute. If we assume that wild type RecA behaves similarly, about 900 RecA are bound to DNA within a minute. RecBCD has been shown to degrade dsDNA in vivo at a rate of 1 600 bp/s, and assuming half the speed after a chi encounter, it will every second produce ssDNA for 270 RecA monomers to bind. With two RecBCD machines processing one DNA end each, this is with a good margin enough to bind all the initially available RecA within seconds. Actually, it raises the opposite question: why is all available RecA not consumed by one of the DSB ends? Since nucleation of RecA filaments is very slow, the sequestration of RecA by one DSB end would prevent the ssDNA of the other end from forming an ssDNA-RecA filament. The current model of two-ended HR repair requires two RecA-ssDNA ends. This problem could be solved by coordinating the RecBCDs, making sure that post-chi degradation happens simultaneously so that one of the ends is not left without RecA. There is however no known mechanism to do so, and our finding that RecA foci can form either before or after the mCherry-ParB focus has dissolved suggests that the end processing is not synchronized.
Alternatively, if RecA is only loaded sparsely on the ssDNA by RecBCD, the initially available RecA pool may last long enough to populate both newly created ssDNA ends with enough nucleation sites even if resection starts at slightly different time points. The SOS response will then ensure production of new RecA molecules which will bind and fill in the gaps between the nucleation sites.

It should be mentioned that there is a possible alternative HR repair pathway that only requires RecA to bind one of the ssDNA ends. Synthesis-dependent strand annealing (SDSA) is well documented in eukaryotes and involves strand invasion by just one of the break ends, extensive DNA synthesis to bridge the RecBCD-degraded sequence, reversal of the strand invasion and then annealing of the now extended ssDNA end to the other [148]. SDSA has not been described in bacteria, although there are helicases that could theoretically mediate the strand invasion reversal [149]. UvrD and RecQ have been proposed among those [148], and while a UvrD deletion did not give any clear phenotype in cells with DSBs in this study, RecQ remains to be investigated. Notably, we do not have any clear observation of more than one elongated RecA-mVenus structure at a time in any cell.

Phase Ia, with a single RecA focus, lasts circa 5 minutes before an elongated filament extends. Since the elongated filament is only observed when the DSB takes place in a segregated locus, the filament is likely to be the structure that performs the HR search. However, there is no biochemically known step that would require such a wait period before HR search could commence. Instead this time may be needed for enough new RecA to be produced, which then binds the ssDNA stretches already present and together with them forms a filament. This filament may not straighten until it is sufficiently RecA-covered, and if this threshold is close to full coverage it could be expected to extend in the rapid and sudden fashion observed in many cells here.

When the ssDNA-RecA filament has extended, it will be ready to pair with dsDNA to probe for homology. Here the length and stiffness of the filament help it to reach out of the local area into the other cell lobe where an intact chromosome with a sister locus can be found. A deletion of the recN gene is strongly deleterious for this step in the process, resulting in many cells failing HR repair. However, contrary to what was originally reported [150], some cells can repair DSBs although they take a longer time to achieve pairing of the sister loci. Additionally, the structure and the dynamics of the ssDNA-RecA filament appear different, with RecA structures failing to extend, suggesting that RecN is involved in shaping the ssDNA-RecA filament.
How that occurs is not known, but since we never observe two separate filaments extending throughout the cell, it can be imagined that RecN also binds the two ssDNA-RecA filaments together during search. However, we do not observe multiple structures even when RecN is deleted, making it more likely that RecN has another function in organizing the searching filaments. A recent in vitro study found that RecN preferentially first binds a ssDNA strand and then captures a dsDNA strand, suggesting it can have a role in stabilizing the interaction preceding strand exchange [151]. This would however not explain why the elongated structures are different. In other organisms, RecN has been demonstrated to be first to bind severed dsDNA ends [152] and to stimulate DNA ligation by ligases [128]. We are still at the level of speculation about the in vivo function of RecN, beyond that it is critical for efficient homology search in E. coli.

With a typical length of circa 2 μm, a fully extended filament would involve 400 bp and 130 RecA molecules. If ssDNA from both ends is in this structure, it would include 260 RecA, which is only a fraction of the circa 900 molecules that are present in E. coli already before SOS induction [153]. After SOS induction, the RecA concentration increases severalfold. If wild type RecA is as involved in the filaments as labeled RecA, and almost all RecA takes part in the visible structures, then most of the RecA molecules are not directly bound to DNA and may fulfill some yet unknown function. An alternative hypothesis is that the ssDNA is folded somehow to include much more DNA than suggested by the observed length of the structure. The same conclusion was previously reached with RecA4155-GFP bundles [112].
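The RecA stoichiometry estimates in this discussion can be reproduced with a few lines of arithmetic. The sketch assumes a footprint of three nucleotides per RecA monomer. That value is not stated explicitly in the text, but it is the commonly cited one and is consistent with the numbers quoted here. The variable names are illustrative only.

```python
NT_PER_RECA = 3            # assumption: nucleotides of ssDNA covered by one RecA monomer
RECBCD_RATE_BP_S = 1600    # in vivo dsDNA degradation rate quoted in the text
POST_CHI_FACTOR = 0.5      # "half the speed after a chi encounter"
RECA_POOL = 900            # approximate RecA copies before SOS induction
FILAMENT_NT = 400          # ssDNA length attributed to one extended filament

# RecA monomers that the ssDNA from one RecBCD can accommodate per second
reca_per_s_one_end = RECBCD_RATE_BP_S * POST_CHI_FACTOR / NT_PER_RECA

# Time for two RecBCD machines to create binding sites for the whole RecA pool
seconds_to_bind_pool = RECA_POOL / (2 * reca_per_s_one_end)

# RecA needed to cover one filament, and the share of the pool used by two filaments
reca_per_filament = FILAMENT_NT / NT_PER_RECA
pool_fraction_two_ends = 2 * reca_per_filament / RECA_POOL

print(f"{reca_per_s_one_end:.0f} RecA/s per end")
print(f"{seconds_to_bind_pool:.1f} s to accommodate the pool")
print(f"{reca_per_filament:.0f} RecA per filament")
print(f"{pool_fraction_two_ends:.2f} of the pool in two filaments")
```

Under these assumptions one end accommodates about 267 RecA per second (rounded to 270 in the text), the pre-SOS pool could be bound in under two seconds, and two 400-nucleotide filaments would hold less than a third of the pool.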

Search time of homologous recombination

From the experiments with two fluorescent chromosomal markers, we could conclude that 8.3 minutes is an upper bound for the average cell-wide HR search time. This time is several times shorter than what was previously described in E. coli [112], but is similar to the time from a DSB until sister pairing in C. crescentus [111]. The time measured here is necessarily an upper bound for the search time, since it also includes the time of RecBCD end resection and RecA binding before the search as well as the transportation towards the cell center after search. Since the MalI-mVenus label is placed 25 kb away from the DSB, we also expect it to pair later than the DSB site itself.

The pairing of fluorescent chromosomal markers close to an I-SceI cutting site has been reported previously, but on a much slower time scale, by Lesterlin et al. [112]. There, pairing occurred 103 minutes after DSB induction, 51 minutes after formation of RecA-GFP “bundles”, and lasted for 50 minutes. While those experiments, performed at 30 °C, are expected to be slower, the repair time was still longer than one generation time. In this work, the entire process finishes in less than half a generation. What complicates the interpretation of the experiments described in Lesterlin et al. is that the chromosomal labels were located close to the break and that such labels have been shown to be ejected by RecBCD processing [59]. Ejection of such labels is also observed here, and is the basis of DSB detection in this work. One possible way to reconcile the results is to consider the unspecific I-SceI activity reported in Paper I. Although this activity was hard to detect in the presence of a fully functional HR repair system, at 30 °C unspecific breaks were caused in a large proportion of cells, as shown in a ΔrecA strain. Since unspecific nuclease activity is not controlled for in Lesterlin et al., it is possible that DSBs at other loci contributed to those results. Unspecific DSBs could explain why chromosomal markers close to the restriction site can still be observed while they migrate towards each other. Chromosomes with multiple lesions have been observed to compact in the cell center [154], which could cause general colocalization of any chromosomal markers.

The elongated RecA structures imaged here are also markedly different from the bundles observed with the labeled mutant used by Lesterlin et al., RecA4155-GFP. While the RecA-mVenus structures are highly dynamic and short-lived, the bundles were typically stable. Instead, the bundles are similar to the RecA-rsEGFP2 structures observed in non-repairing cells with MoNaLISA (Figure 11). Both of these are similarly bent in the cell poles, do not show fast reorganization and are nucleoid excluded. RecA-rsEGFP2 bundles did however never lead to repair, suggesting that they are not active in HR search. It is currently not easy to fully explain the differences between the results, and further work will be needed, for example characterization of the different labeled RecA variants used.

How long is the search time actually?

By considering the behavior of RecA-mVenus, we may attempt to further narrow down the timespan of HR search. When comparing the results from the two-marker experiments with the results with RecA-mVenus, we find that phase Ib, with the elongated RecA structures, appears circa 3 minutes before MalI-mVenus pairing (Paper I table 1). If these structures truly represent the ssDNA-RecA filament described in vitro, then they likely perform the search during this short time period. If this is the search time, it is as fast as the search time of the lac repressor, and more than an order of magnitude faster than that of Cas9.

Whether the search time is 8 or 3 minutes, such a short time span may seem physically improbable considering the slow diffusion of the ssDNA-RecA filament and the sister locus. However, in Paper I, we describe a physical model that allows homology search to proceed on a similar time scale. We have chosen to reverse the view of the two sister loci, the searcher and the target. From a physical perspective, these are of course the same, but since the ssDNA-RecA filament, due to its stiffness, in all likelihood diffuses more slowly than the sought-after dsDNA locus, it is more intuitive to consider the former as the target and the latter as the searcher.

When the ssDNA-RecA filament extends throughout the cell, including through the sister chromatid, any point along the filament will provide one possible correct position for the sister locus to find the target. As the filament extends approximately along the cell long axis, it spans all planes perpendicular to the long axis. At every such plane, there is one ssDNA nucleotide which is the correct pairing for exactly one nucleotide in the sister nucleoid. While there are several thousands of homologous positions between the filament and the sister locus, we can reduce the problem to considering only one. That is, since the first binding to the correct ssDNA nucleotide among many independently searching dsDNA base pairs has the same rate as that of one dsDNA searcher that can bind anywhere on the ssDNA, we only have to model the search of one single dsDNA position for any position on the filament. Since the filament transcends all planes, we may restrict ourselves to one plane and transform the 3D search problem to a less complex 2D search.

The rate of reaching a central target in a plane through 2D diffusion has been described as $k = \frac{2 \pi D}{\ln(R/r)}$, where D is the diffusion constant and R and r are the radii of the plane (nucleoid) and the target (ssDNA-RecA), respectively [155]. Since there is one searcher that matches each target, i.e. one that is relevant in each plane, the searcher concentration is $c = \frac{1}{A} = \frac{1}{\pi R^2}$, with A the area of the plane, and the time to find the target is $\tau = \frac{1}{kc} = \frac{R^2}{2D} \cdot \ln\frac{R}{r}$. With the assumptions in Paper I, radii R = 200 nm and r = 1 nm, and ssDNA diffusion rate D = 0.001 μm²/s, we arrive at a time of ca 3.5 minutes. Depending on the time spent probing incorrect sequences and the accuracy of the D estimate, this reversed-sampling, reduced-dimensionality model shows that search on the time scale observed is physically plausible. Our model assumes intersegment contact sampling, i.e. that several parts of the RecA-ssDNA filament can make different dsDNA contacts simultaneously, but does not include RecA sliding on dsDNA. Sliding does not affect the search rate with this model, since the target in each plane is still only one nucleotide.

From an evolutionary perspective, a short search time would also be expected. If a DSB takes hours to be repaired, this repair would only be of use to an E. coli under starvation conditions where growth is slow or non-existent. Otherwise the cell would have been long outcompeted by its peers before the repair is finished. However, under such conditions E. coli would also be largely haploid, preventing repair. The intricate system of HR repair machinery would be of little use if repair could not finish in less than one generation.
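The order of magnitude of the 2D search time can be checked numerically. The sketch below assumes the rate k = 2πD / ln(R/r) for reaching a central target in a plane, one searcher per nucleoid cross-section (c = 1/(πR²)), and a mean search time of 1/(kc). The function and argument names are hypothetical, and the exact prefactors used in Paper I are not reproduced here.

```python
import math

def search_time_2d(R_um, r_um, D_um2_s):
    """Mean 2D search time in seconds, tau = 1 / (k * c).

    k = 2*pi*D / ln(R/r): assumed rate of reaching a central target of radius r
    c = 1 / (pi * R**2):  assumed concentration of one searcher per plane of radius R
    """
    k = 2.0 * math.pi * D_um2_s / math.log(R_um / r_um)
    c = 1.0 / (math.pi * R_um**2)
    return 1.0 / (k * c)

# Parameter values quoted in the text: R = 200 nm, r = 1 nm, D = 0.001 um^2/s
tau = search_time_2d(R_um=0.2, r_um=0.001, D_um2_s=0.001)
print(f"{tau:.0f} s = {tau / 60:.1f} min")
```

Evaluated as written, this expression gives roughly 106 s, i.e. about 1.8 minutes. The ca 3.5 minutes stated in the text corresponds to the same expression with a prefactor or diffusion constant that differs by a factor of two, so the sketch should be read as an order-of-magnitude check and not as a reproduction of the Paper I calculation. Either way, the estimate lands on the scale of minutes. Because of the logarithm, it is also much less sensitive to the target radius r than to D or R.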

After homology search

The steps after synapsis, i.e. strand exchange, HR DNA polymerization and ligation, have not yet been visualized in vivo. However, we find here that at some point the DSB locus and a region of 30-75 kb, together with the homologous sister, are transported to the cell center. The mechanism and purpose of this transport are unknown. Immediately after synapsis the chromosomes are joined by Holliday junctions, and after their resolution they are either catenated or dimerized. Possibly, the transport is not necessary for the repair per se, but might be a consequence of the normal chromosomal segregation forces acting on what is now two chromosomes joined at the DSB position.

After the transport, in phase II, the DSB region rests at the cell center for about 8.4 minutes, for a purpose that is also yet unknown. For HR repair to complete after pairing, the DNA corresponding to the resected DNA ends must be resynthesized. As these RecBCD-generated gaps could span several tens of kilobases, this time may be significant depending on the polymerase used. While the main replicative DNA polymerase of E. coli, polymerase III, is as fast as 1000 nucleotides per second, the alternative DNA polymerases I and II only add 15-30 nucleotides per second [156,157]. If the latter polymerases perform the bulk of the DNA resynthesis, this step would last up to a minute per kilobase and could use up a large part of the 8.4 minutes spent in phase II. The large cell-to-cell variation in this timespan, larger than in phase I, could then also be explained by different RecBCD processing lengths. However, there is no obvious reason why polymerase III could not perform the replication, at least not with the otherwise undamaged DNA ends of our experiments. Future experiments with polymerase I and II mutants using our system could be elucidating here.

Some of the time in phase II may also be spent resolving the Holliday junctions that result from HR repair, and the following dimerisation or catenation. Experiments on SbcCD-generated DSBs suggest that uncoupling of the chromosomes delays cell division also when the two sisters are initially unsegregated [125]. Mutations in both Holliday junction resolution pathways, RecG and RuvABC, prolong the time spent in phase II and sometimes prevent cells from leaving it, leading to the conclusion that the resolution takes place here.

The helicase RecG has an enigmatic function in HR repair, and has been suggested to participate in migration of Holliday junctions, positioning of the primosome or removal of incorrect primosome substrates [127,158,159]. In this study, the increase in mCherry-ParB foci seen in ΔrecG agrees with the anomalous replication that has been shown in such a mutant, and was suggested to result from incorrect primosome positioning [160].

The partial HR deficiency phenotypes of ΔruvA, ΔruvC and ΔrecG also agree with previous experiments where DSBs were induced in bulk cultures [161,162]. In addition to this, these bulk experiments show that ΔrecG combined with either ΔruvA or ΔruvC inhibited all detectable HR repair. While the RuvABC resolution pathway appears to be the most important based on the results here, there is also a considerable proportion of HR intermediates which are not resolved by RuvABC, as shown by the ΔrecG phenotype. What is not clear is whether these two pathways are necessary to resolve two separate classes of recombination intermediates or if they can handle an overlapping set of substrates. If the ratios of successful versus failed repair could be measured absolutely (as opposed to the current relative numbers given in Figure 15A), it could be concluded whether the activities are truly complementary or overlapping, and open up for speculation on what the difference between these two types of HR intermediates could be.
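The resynthesis argument in this section rests on simple rate arithmetic, which can be made explicit. The sketch below uses only the polymerase rates and the phase II duration quoted in the text and assumes a constant synthesis rate without pauses. The helper names are illustrative.

```python
PHASE_II_SECONDS = 8.4 * 60  # average time spent in phase II, from the text

# Synthesis rates in nucleotides per second, as quoted in the text
RATES_NT_PER_S = {
    "Pol III": 1000,
    "Pol I/II, upper rate": 30,
    "Pol I/II, lower rate": 15,
}

def seconds_per_kilobase(rate_nt_per_s):
    """Time needed to synthesize 1000 nucleotides at a constant rate."""
    return 1000.0 / rate_nt_per_s

def kilobases_within(seconds, rate_nt_per_s):
    """Length in kb that can be synthesized within `seconds` at a constant rate."""
    return seconds * rate_nt_per_s / 1000.0

for name, rate in RATES_NT_PER_S.items():
    print(f"{name}: {seconds_per_kilobase(rate):.1f} s/kb, "
          f"{kilobases_within(PHASE_II_SECONDS, rate):.1f} kb within phase II")
```

At 15 nucleotides per second a kilobase takes about 67 s, in line with the "up to a minute per kilobase" estimate, and only about 7.6 kb could be filled in during 8.4 minutes. Polymerase III would need about one second per kilobase.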

The DNA damage penalty

A cell with DNA damage requiring RecA-mediated repair experiences a significant generation time penalty, depending on the duration of the repair, as seen in Figure 13. However, the cell does not pause its growth to repair damage. Instead, elongation and cell metabolism continue during the repair, even if septation is inhibited by the SOS response. This results in daughter cells that are born larger than average and will divide again faster than average, effectively compensating for the division delay in the mother generation. This means that the cost of a DSB over two generations is almost none, which is a testament to the great efficiency of the DNA repair pathways. This agrees with the earlier bulk culture results showing that a double-ended DSB at an unsegregated location gives a less than one percent growth deficit over several generations [163], and explains how a significant delay in the division of a cell can be reconciled with a very small or non-existent cost observed over several generations. It is not surprising that the DSB repair mechanism has evolved to minimize the cost on a multi-generation time scale, since that is what matters from an evolutionary perspective.

For the future, it would be enlightening if different types of RecA-repairable DNA damage could be distinguished, including SSBs, one- and two-ended DSBs and DSBs at cohesive or segregated loci. Then the repair time and cost of each of those lesion types could be determined. This could be achieved by either labeling proteins involved only in each of these pathways, or deleting those genes to exclude certain pathways.
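The compensation argument can be illustrated with a toy calculation. This is not a model from the thesis. It assumes exponential growth at an unchanged rate during repair and a daughter that divides as soon as it reaches a fixed size threshold, which is a deliberately idealized rule for division control. The doubling time and delay are hypothetical values.

```python
import math

def two_generation_time(delay_min, doubling_min=40.0, birth_size=1.0):
    """Toy model: mother's division is delayed, daughter divides at a fixed size.

    Returns (mother generation time, daughter generation time, their sum) in minutes.
    Only meaningful for delays shorter than the doubling time.
    """
    growth_rate = math.log(2.0) / doubling_min
    division_size = 2.0 * birth_size  # threshold reached by an undamaged cell

    # Mother keeps elongating while septation is postponed by `delay_min`
    mother_time = doubling_min + delay_min
    mother_final_size = birth_size * math.exp(growth_rate * mother_time)

    # Daughter is born larger than average and reaches the threshold sooner
    daughter_birth_size = mother_final_size / 2.0
    daughter_time = math.log(division_size / daughter_birth_size) / growth_rate
    return mother_time, daughter_time, mother_time + daughter_time

mother, daughter, total = two_generation_time(delay_min=10.0)
print(mother, daughter, total)
```

In this idealized setting a 10 minute delay in the mother is offset by a daughter generation that is 10 minutes shorter, so the time for two generations is unchanged. With other division rules the compensation would be only partial, so the sketch shows the principle and does not give a quantitative prediction.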

Conclusion and outlook

From the results in this work, it is clear that DSB repair is a rapid and efficient process even in the difficult case of a break in an already segregated locus. Although this type of DSB cannot be distinguished among the spontaneous breaks, there is also no sign that having a single DNA lesion confers any disadvantage for the cell in the long term. Since this type of DNA damage is thought to be rare compared to single-ended DSBs and single-stranded lesions, it was not clear beforehand that cells would actually have such an efficient method to repair them. We did however observe a mechanism that appears to be well-tuned to work within the time frame given by the normal cell functions. The good adaptation to such DSBs makes us believe that HR repair in segregated loci is a physiologically relevant mechanism that is active in vivo during the normal growth conditions of E. coli. It also suggests that E. coli regularly experiences this kind of DNA damage although its natural habitat is protected from both light and oxygen.

While only E. coli has been studied here, these findings are likely to extend also to other bacteria. Pairing and RecA structures acting on similar timescales have been reported in C. crescentus and B. subtilis [111,164]. Since we now know that HR search and repair is physically possible to execute in circa 20 minutes in cells with the volume and genome size of E. coli, it would be surprising if it was less perfected in other types of cells with similar physical constraints. Exceptions could arise either if other species are less exposed to DSB-generating forces, or if the ability to repair DSBs efficiently requires a compromise which prevents the cell from simultaneously performing some other function that is more important in the natural environment of that species. Actually, the fact that E. coli can easily be evolved to be far more DNA damage resistant than wild type hints that some compromise between different functions may indeed be the case [87,88].

There are, however, still many unanswered questions. How are the searching ssDNA-RecA filaments assembled, and why is much more RecA expressed than what would be needed to create the visible filaments? Furthermore, defining the function of the enigmatic RecN protein is paramount for understanding how the searching structure is organized. Currently RecN is the only protein known to have a strong, but confusingly enough non-critical, influence on the HR search for which there is no accepted model explaining its role. There are also questions in the late stage of HR repair. Why can RuvABC and RecG not compensate for the loss of each other? If they differ in substrate specificity, these two types of substrates should be results of different upstream processes which we cannot distinguish by current methods.

A different question is to what degree the findings here can be transferred to eukaryotes. Since eukaryotic genomes are often orders of magnitude larger than that of E. coli, the search problem is also more complex. On the other hand, the more sophisticated eukaryotic genome organization means that a homologous locus can often be found nearby. The NHEJ pathway also offers eukaryotes an alternative when HR repair is difficult or impossible, but HR search is still critical for crossing over of the eukaryotic chromosomes to produce genetic variation. The model for HR search that we describe does not offer obvious advantages if the chromosomes are well organized and closely located, but Rad51 is despite this biochemically very similar to RecA. It would be exciting if the experiments described here could be repeated in eukaryotes to build a similar timeline for yeast or human cells.

Swedish summary (Sammanfattning på svenska)

All living things carry their genetic material in the form of DNA. This DNA is necessary for all kinds of life that we know of. Having DNA does however also entail a number of practical problems. The DNA of the cell must be stored compactly to fit inside the cell, it must be accessible so that it can be read, it must be possible to copy it for the offspring of the cell, and finally it must be distributed between them.

For the genetic material of the cell to be passed on correctly to its offspring, it must be protected against damage, and any damage that does arise must be repaired. That damage to the genetic material arises is to some degree unavoidable, since harmful influence from radiation, reactive oxygen species and the cell's own processes is constantly present. In eukaryotes, DNA damage is even created deliberately during meiotic cell division.

The damage that arises is of different kinds. It ranges from chemical changes in single DNA letters (or bases), to damage to several bases, cross-linking of neighboring bases and, in the very worst case, DNA double-strand breaks. Common to all kinds of damage is that they prevent correct copying of the genetic material, which most often prevents the cell from reproducing at all. Because of this, virtually all organisms have several repair mechanisms for different kinds of DNA damage.

This thesis explores the repair of DNA double-strand breaks in the bacterium Escherichia coli. A double-strand break is when both strands of the DNA double helix break close to each other at the same time. This form of damage is the most dangerous for the cell, since it not only prevents copying across the break itself but also lets the parts of the chromosome drift apart in space. To repair a double-strand break it is not enough that a series of chemical reactions is carried out at the site of damage; first, the two broken ends must be brought together again. If the wrong ends are joined, rearrangements of the genetic material result, which can seriously affect gene expression and the distribution of genetic material between the offspring. In higher organisms, such rearrangements can lead to cancer.

The repair mechanism for double-strand breaks found in E. coli, repair by homologous recombination (HR repair), also exists in eukaryotic cells, including our own. HR repair uses another, undamaged, copy of the chromosomal site where the break occurred as a template to pair the broken ends correctly. This means that three different parts of the cell's DNA must be found and brought together before the repair can be carried out. Considering how much DNA the cell contains, the search for the right parts, and especially for the undamaged copy, is tremendously difficult. Some researchers have doubted that it is even possible to find the copy within reasonable time, except in certain special cases where it lies next to the break from the start.

By combining advanced microscopy and microfluidics, we can here show that double-strand breaks are repaired quickly and efficiently in E. coli, even when the copy is located several micrometers away. We can see that the repair takes about 20 minutes, which is less than half of the generation time of the cells. In particular, the search for the repair copy, which was feared to be difficult, is much faster than expected. On average it is completed in at most eight minutes, and perhaps as quickly as three minutes.

It is not easy to understand how the search can be carried out so fast. To be paired, the parts must randomly bump into each other, since all three are large and diffuse very sluggishly through the cell. We do however present a new physical model in which, instead of imagining that the break searches for the copy, we imagine that the copy searches for the DNA around the break site, which has been stretched out along the cell. An extended form of the protein RecA can also be observed in the cells after a double-strand break has occurred. Under these conditions, the model shows that it actually appears possible for the components to find each other within the time frame indicated by the experiments.

By recognizing DNA-damaged cells with the help of a neural network, we can also show that the repair of DNA damage is so efficient that damage does not cost the cells any time. Even if the division of a DNA-damaged cell is delayed somewhat, its daughters will divide faster than usual, so that already one cell generation later they have caught up with the undamaged cells. This shows how refined HR repair is, and how well it interplays with the normal processes of the cell.

By mutating different enzymes thought to be involved in the repair process, we have confirmed that the enzymes RecN, RuvABC and RecG are important for repair of double-strand breaks, but that none of them is entirely indispensable. As previously suspected, the first of these enzymes is important early during the repair, while the search for a copy is ongoing, whereas the latter are active at the end, when the repaired chromosome is to be separated from the copy again so that the chromosomes can be distributed between the offspring of the cell.

These results show that repair of double-strand breaks using homologous copies located far away from the break is efficient enough to be a relevant and functioning mechanism when E. coli grows in its natural environment. Since the repair mechanisms are similar among most bacteria, this probably also holds for many other kinds of bacteria, and perhaps also for eukaryotes such as ourselves.

Acknowledgements

I am immensely thankful to everyone who has been on my side and supported me throughout these years, in science and in life. These years have not always been easy. Instead, there has been ample opportunity to “learn the hard way” both inside and outside of work, but also many moments of joy. Many of you deserve special mention, and among you are:

My main advisor Johan Elf, for letting me in, giving me the freedom to do the experiments I wished for myself (with variable results) and for the deep knowledge and tremendously insightful theoretical support whenever results went somewhere we would not have predicted. Thank you for these years!

Magnus Johansson, I am sorry we failed to tame the ribosomes. But I am grateful for your enthusiasm in trying and for the support over the years.

Tony Forster, thank you for persisting with the small RNAs. I actually never thought they would end up published at all.

Prune, thank you for teaching me all the techniques. I mean all of them. And also all the shortcuts. But also for ski trips in heavy winds, club nights and house parties.

Kuba, thank you for being willing to come up to the dark cold north, always being critical of methods and results, and eventually turning this into a successful research project!

Francisco Balzarotti and Yvan Eilers for putting immeasurable time into constructing, reconstructing and aligning the first MINFLUX microscope, as well as Stefan Hell for repeatedly inviting me there for measuring. And also to Elias and Bo for doing all of this again in Uppsala!

Nam Ki Lee and coworkers for the interesting collaboration formed from a conference coincidence. I am happy my old master's degree thesis once became the center of interest!

Franchesca Pennacchietti and Ilaria Testa for access to the experimental MoNaLISA microscope, and Franchesca for impeccable patience in operating it.

All dear ICM coworkers! Dan, for great company in the lab of course, but I will remember you more for fording wild rivers and journeys on frozen lakes! Irmeli, who I know I can always rely on to keep things in order. David, for countless calibrations and alignments, as well as insightful comments on science and everything else. Jimmy, thank you for keeping everything in shape, always! Emil, thank you for lending lunch money and buying beer. Spartak, who can always suggest a solution to any problem. Camsund, who is always ready to be helpful however much in a hurry you may be. Anna, good luck to you and all of the family! Petter, for getting me started at this place once upon a time. Mike, for being a big loud American friend. Martin, for tirelessly explaining the many theories about counting photons. Javier, for being such an upbeat guy. Praneeth, for solving difficult problems. Misha, I hope the ribosomes will start to listen at some point. Ivan, Konrad, please keep the courses going! Oscar and Malin, good luck with your projects and don’t forget to live as well! The ICM runners, including Disa, Christoffer and Christian. It’s important to keep in shape! See you on Lidingö in fall?

I also would like to extend my thanks to the cleaning and support staff, whose names I don’t even know despite these years. I really appreciate your work!

Thank you Juvenalerna for the joy. Thanks to all Eskatologer for silliness and cleverness, Chewie, H#, F*, DtB, FreddieP, Au, Ɐ@ & many more! Special thanks to ^* for being there when things were at their hardest.

Mom and Dad, thank you for all the support from the eighties until today! Dear siblings, I hope we will see more of each other from now on! Although you are too many to fit here, I would like to mention all teachers and encouraging classmates.

Dear Sara, thank you for letting me be difficult and thank you for nudging me a bit further so many times! Everything would have been much harder without you, and I hope I can make it easier for you too when you one day find yourself in the same position. I am so glad I have you.


Acta Universitatis Upsaliensis Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology 1947 Editor: The Dean of the Faculty of Science and Technology

A doctoral dissertation from the Faculty of Science and Technology, Uppsala University, is usually a summary of a number of papers. A few copies of the complete dissertation are kept at major Swedish research libraries, while the summary alone is distributed internationally through the series Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology. (Prior to January, 2005, the series was published under the title “Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology”.)

ACTA UNIVERSITATIS UPSALIENSIS Distribution: publications.uu.se UPPSALA urn:nbn:se:uu:diva-410039 2020