DNA Ligase C and Prim­PolC participate in base excision repair in mycobacteria

Article (Accepted Version)

Plocinski, Przemyslaw, Brissett, Nigel C, Bianchi, Julie, Brzostek, Anna, Korycka-Machała, Małgorzata, Dziembowski, Andrzej, Dziadek, Jaroslaw and Doherty, Aidan J (2017) DNA Ligase C and Prim-PolC participate in base excision repair in mycobacteria. Nature Communications, 8 (1). a1251 1-12. ISSN 2041-1723

This version is available from Sussex Research Online: http://sro.sussex.ac.uk/id/eprint/70317/

This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the URL above for details on accessing the published version.

Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University.

Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available.

Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way.

http://sro.sussex.ac.uk 1

2 DNA Ligase C and Prim-PolC participate

3 in base excision repair in mycobacteria

4

5 Przemysáaw PáociĔski1,2§, Nigel C. Brissett1§, Julie Bianchi1,4, Anna Brzostek2,

6 Maágorzata Korycka-Machaáa2 , Andrzej Dziembowski3, Jarosáaw Dziadek2 and

7 Aidan J. Doherty1*

8

9 1 Genome Damage and Stability Centre, School of Life Sciences, University of 10 Sussex, Brighton BN1 9RQ, UK.

11 2 Institute of Medical Biology, Polish Academy of Sciences, Lodowa 106, 93-232 12 Lodz, Poland.

13 3 Institute of Biochemistry and Biophysics, Polish Academy of Sciences, PawiĔskiego 14 5A, 02-106 Warsaw, Poland.

15 4 Present address: Department of Oncology-Pathology, Cancer Center Karolinska, 16 Karolinska Institutet, R8:04, Karolinska Universitetssjukhuset Solna 171 76 17 Stockholm, Sweden.

18 § These authors contributed equally to this work

19

20 * Corresponding author: Aidan Doherty, email: [email protected]

21

22 Keywords: LigC, LigD, BER, , , repair, ligase, PrimPol, AEP, 23 NHEJ, SSB, RNA, mycobacteria

24 25

26 Abstract

27 Prokaryotic Ligase D is a conserved DNA repair apparatus processing DNA double-

28 strand breaks in stationary phase. An orthologous Ligase C (LigC) complex also co-

29 exists in many bacterial species but its function is unknown. Here, we show that the

30 LigC complex interacts with core BER in vivo and demonstrate that

31 together these factors constitute an excision repair apparatus capable of repairing

32 damaged bases and abasic sites. The polymerase component, which contains a

33 conserved C-terminal structural loop, preferentially binds to and fills-in short gapped

34 DNA intermediates with RNA and LigC ligates the resulting nicks to complete repair.

35 Components of the LigC complex, like LigD, are expressed upon entry into stationary

36 phase and cells lacking either of these pathways exhibit increased sensitivity to

37 oxidising genotoxins. Together, these findings establish that the LigC complex is

38 directly involved in an excision repair pathway(s) that repairs DNA damage with

39 ribonucleotides during stationary phase.

40

1

41 Introduction

42 In bacteria, canonical primer synthesis during DNA replication is carried out by

43 enzymes from the DnaG superfamily 1,2. In contrast, priming of replication in archaea

44 and eukaryotes is performed by members of the archaeo-eukaryotic primase (AEP)

45 superfamily 3,4. However, AEPs are also widely distributed in most bacterial species

46 4, where they have evolved to fulfill divergent roles and have recently been

47 reclassified as a family of called primase-polymerases (Prim-Pols) to

48 better reflect their evolutionary origins and more diverse roles in DNA metabolism 3,4.

49 The best characterized bacterial AEP is Prim-PolD (PolDom) that forms part of a

50 multifunctional non homologous end-joining (NHEJ) DNA break repair complex

51 called Ligase D (LigD). In mycobacterial LigD, an AEP is fused to phosphoesterase

52 and ATP-dependent DNA ligase domains that, together with the Ku repair factor,

53 coordinate the sequential synapsis, processing and repair of double-strand breaks

54 (DSBs) in stationary phase 5-9. However, in many other species these domains are

55 encoded by separate operonically-associated 5,10.

56 Many bacterial species, including Actinobacteria, encode multiple ATP-dependent

57 DNA ligases and Prim-Pols similar to those found in mycobacterial LigD. Exemplary

58 classes of Prim-Pols and DNA ligases in selected gram-positive bacteria are shown

59 in Supplementary Fig. 1. Mycobacterium smegmatis encodes four distinct primase-

60 polymerases. Although it is known that the Prim-PolD subunit of LigD is involved in

61 the NHEJ repair complex, the roles of the other stand-alone Prim-Pols remain

62 unknown. One orthologue, MSMEG_6301 (Prim-PolC / LigC Pol / PolD) is encoded

63 in the genomic proximity of two DNA ligase genes (LigC1: msmeg_6302 and LigC2:

64 msmeg_6304). Although LigD’s role in NHEJ-mediated repair is firmly established

65 7,9,11, the pathways in which these other ligases and Prim-Pols operate in remains 2

66 unclear. Due to their similarities with the NHEJ complex, it was proposed that DNA

67 ligase C and operonically-associated Prim-PolC (LigC-Pol or PolD1) function as a

68 “back-up” complex for the LigD pathway 12,13, although this has not been proven.

69 While the basic biochemical characteristics of Prim-PolC and PolD2, another closely

70 related AEP, were partially described in a previous study 12, Prim-PolC failed to act

71 as a “back-up” of Prim-PolD during DSB repair. Another study reported that LigC

72 was responsible for ligation of ~20% of DSBs in a LigD mutant strain containing a

73 mutation disrupting the ligase function but possessing functional primase and

74 exonuclease domains that can still participate in the NHEJ 14.

75

76 Here, we sought to elucidate the role(s) of LigC, and its associated polymerase

77 (Prim-PolC), in mycobacterial DNA metabolism. We show that components of the

78 LigC complex act together to preferentially fill in short DNA gaps with ribonucleotides

79 (rNTPs), followed by sealing of the resulting nicks. We identify that LigC operon-

80 encoded proteins associate with components of the base excision repair (BER)

81 apparatus in vivo and, together with the excision repair machinery, form repair

82 complexes that remove damaged DNA and abasic sites and fill in resulting gaps with

83 rNTPs. Structural studies reveal that a conserved C-terminal element of Prim-PolC

84 forms an extended surface surrounding the active site, which likely supports gap

85 recognition and synthesis. Components of the LigC complex are expressed upon

86 entry into stationary phase, when the intracellular pool of dNTPs is naturally

87 depleted, necessitating the preferential incorporation of rNTPs after lesion removal.

88 Abrogation of the LigC complex, results in an increased sensitivity of mycobacterial

89 cell to oxidative damaging agents. LigD-deficient cells are also sensitive to oxidative

90 damaging agents, uncovering an unexpected duel role in both excision and DSB

3

91 repair. Together, these findings establish that the LigC complex operates in excision

92 repair pathways that specifically mends lesions and single-strand breaks in

93 stationary phase.

94

95

4

96 Results

97 DNA ligase C complex interacts with BER enzymes in vivo

98 Although LigD-associated Prim-Pols and ligases facilitate NHEJ repair of DSBs in

99 bacteria, the functions of closely related orthologues remains unclear. To investigate

100 the cellular roles of Prim-PolC, and its operonically-associated DNA ligases (LigC1

101 and LigC2), we first sought to define their physical associations with other proteins /

102 complexes in mycobacterial cells. We employed an eGFP-tag (enhanced green

103 fluorescent protein) facilitated co-purification approach, in which each of the proteins

104 (Prim-PolC, LigC1 and LigC2) were fused to eGFP to serve as baits for the

105 purification and identification of cellular partners. Purified complexes were cross-

106 linked with a reversible cross-linker, 3,3'-dithiobis (sulfosuccinimidyl propionate)

107 (DSSP), washed and subjected to mass spectrometry (MS) analysis 15. Complexes

108 were purified from cells grown to late stationary phase. As expected, tagged LigC

109 and Prim-PolC were among most abundant “hits” in the MS samples, supporting the

110 specificity of the purification process. A list of polypeptides that co-purified with each

111 bait, known to be association with DNA repair processes, are shown in Figure 1a. A

112 full list of all co-purified proteins is provided in Supplementary Data 1. Significantly,

113 all three bait proteins co-purified with multiple repair enzymes, including the major

114 DNA glycosylases and nucleases involved in base excision repair (BER) repair

115 pathways. Reverse experiments using mycobacterial eGFP-tagged endonuclease IV

116 as bait in M. bovis BCG resulted in co-purification of LigC, confirming that these two

117 proteins form a complex in vivo (Supplementary Data 2). Together, these in vivo

118 studies reveal that Prim-PolC and LigC interact with BER components, suggesting

119 that they function primarily in the repair of damaged bases, abasic sites and single-

120 strand breaks (SSBs). 5

121

122 DNA ligase C complex interacts with BER enzymes in vitro

123 To validate the interactions of LigC1 and Prim-PolC with components of the BER

124 machinery identified in our pull-down studies, we expressed and purified

125 recombinant forms of each of these BER enzymes (Supplementary Fig. 2). Taking

126 advantage of Prim-PolC and LigC1-specific antibodies, a slot blotting-based

127 methodology was employed to authenticate the interactions between the identified

128 proteins. We also used this approach to address if Prim-PolC or LigC1 interacted

129 with Ku, to determine if they also function in NHEJ repair. Mono-functional DNA

130 glycosylase (MPG) was purified alongside bifunctional glycosylases (FPG and Nth),

131 that possess abasic site lyase activity, as well as several of the key end-processing

132 nucleases including: exodeoxyribonuclease VIIB (ExoVIIB), endonuclease IV

133 (EndoIV) and both exonuclease III paralogues (ExoIII, XthA) from M. smegmatis.

134 PolD2, the Prim-Pol with closest similarity to Prim-PolC, as well as the major BER

135 repair polymerase PolA, were also purified for these in vitro interaction studies.

136 Recombinant proteins were spotted onto nitrocellulose membrane and incubated

137 with either Prim-PolC or LigC1, respectively, followed by extensive washing and

138 subsequent detection by western blotting using Prim-PolC / LigC1-specific

139 antibodies. We observed significant signals with many of the BER proteins incubated

140 with LigC1 (Figure 1b). MPG and FPG (MutM) glycosylases exhibited the strongest

141 interactions with LigC1. LigC1 also interacted with EndoIV, ExoIII and XthA end-

142 processing enzymes and both Prim-PolC and PolD2. A summary of the major

143 interactions identified in these studies is shown in Figure 1c. In contrast, Prim-PolC

144 only interacts with LigC1 but not with either Ku or ExoVIIB, which therefore served as

145 control proteins for these experiments. Together, these data establish that LigC

6

146 appears to act as a key scaffolding protein, akin to ligase III in eukaryotes 16,17,

147 required for sequential recruitment of BER repair enzymes and therefore potentially

148 coordinating the processing and repair of lesions and SSBs in vivo 18-20. To further

149 validate the interactions with the glycosylase, we performed pull-down assays of

150 LigC1 with the two potentially interacting glycosylases, FPG and MPG and observed

151 a strong association of tag-free LigC1 with FPG coated His-trap beads and a weaker

152 interaction with MPG (Supplementary Fig. 3).

153

154 Previously, comparative sequence database mining analysis of bacterial operons

155 uncovered a genetic link between LigD and Ku, which led to the identification of the

156 NHEJ pathway in prokaryotes 21,22. Similar in silico comparative operon analysis of

157 many mycobacterial genomes revealed that the LigC and Prim-PolC genes are co-

158 transcribed with the major base excision repair bifunctional glycosylase, FPG, in

159 operons present in M. avium, M. intracellulare and Mycobacterium JS623, a close

160 relative of M. smegmatis. Notably, these operons are also located within close

161 genomic proximity of other DNA excision repair enzymes (PolA and UvrB;

162 Supplementary Fig. 3). Together, these findings further strengthen a functional

163 connection between LigC-related proteins and BER processes in mycobacteria.

164

165 Prim-PolC preferentially binds to short gapped DNA

166 Following validation of the association of Prim-PolC and LigC1 with components of

167 the BER machinery, we proceeded to biochemically characterize the capacity of

168 these enzymes to process DNA repair intermediates that arise during the processing

169 of BER substrates, e.g. abasic sites. We first performed electrophoretic mobility shift 7

170 assays (EMSAs) to determine the optimal DNA substrates for binding and

171 processing by Prim-PolC. The most common BER intermediates are, short DNA

172 gaps containing a phosphate moiety on the 5’ terminus, that arise from base excision

173 and subsequent abasic site removal. We therefore tested Prim-PolC’s capacity to

174 bind to an array of phosphorylated and non-phosphorylated DNA substrates,

175 containing various gap lengths. Notably, Prim-PolC’s binding affinity was significantly

176 stimulated by the presence of a 5’-phosphate group within a gap and it bound most

177 avidly to short gaps of 1-3 nucleotides (Figure 2a; Supplementary Fig. 4a). In

178 contrast with its NHEJ-specific orthologue, Prim-PolC bound much more weakly to

179 DNA substrates containing overhangs (Figure 2b).

180

181 NHEJ Prim-Pols bind strongly to 5’ phosphorylated DNA overhangs 23 and can

182 directly induce break synapsis 23-26. To determine if Prim-PolC also possesses end-

183 synapsis activity, we employed a microhomology-mediated end-joining (MMEJ)

184 assay that utilises substrates that mimic DSBs to measure this activity in vitro 27. As

185 expected, Prim-PolD efficiently promoted break synapsis and subsequent in-trans

186 extension of MMEJ intermediates (Figures 2c & d). In contrast, Prim-PolC did not

187 exhibit significant MMEJ activity, indicating that DSBs are not preferred substrates

188 for this . This identifies another significant difference in substrate specificities

189 between these two classes of Prim-Pols.

190

191 To cross-validate the observed DNA substrate binding affinities, we carried out

192 additional DNA binding studies using polyA-tailed DNA substrates coated onto

193 magnetic particles. Beads pre-coated with different DNA substrates were incubated

8

194 with whole cell mycobacterial lysates, washed extensively and bound preys were

195 identified by mass spectrometry. As expected, LigD and Ku were both found to co-

196 purify on DSB-like substrates with free ends. However, neither Ku or LigD bound to

197 this substrate when the terminal nucleotides were blocked with S-bond. Notably,

198 Prim-PolC and LigD, but not Ku, co-purified on a substrate containing a single

199 nucleotide gap, suggesting that both may contribute to the repair of such

200 intermediates in vivo (Supplementary Fig. 4b, Supplementary Data 3).

201

202 LigC complex repairs short gapped DNA intermediates

203 To establish the functional importance of Prim-PolC’s preferential gap binding

204 activity, we next performed primer-template extension assays using gapped

205 substrates (1,2,3 or 5 nt gap) in the presence of intracellular levels of divalent

206 cations. Consistent with the DNA binding results, Prim-PolC most efficiently

207 processed short DNA gaps (Figure 3a), with its polymerization activity significantly

208 stimulated by the presence of a phosphate group at the 5’ terminus of the gap

209 (Figure 3a). In common with Prim-PolD, Prim-PolC also preferentially inserted rNTPs

210 during gap-filling synthesis (Figure 3a). This intrinsic preference for NTPs over

211 dNTPs was quantified using a nucleotide competition assay (Supplementary Fig. 5a)

212 23. This assay revealed that its nucleotide preference factor (F) was ~190 fold in

213 favour of rNTP incorporation, which compares to an F value of ~70 fold for Mt Prim-

214 PolD (PolDom) 23.

215

216 The implication of this finding is that gap filling synthesis by Prim-PolC would

217 preferentially place RNA, or at least a single rNTP, on the 3’ side of the nick that 9

218 must then be ligated to DNA on the opposite 5’ side. To test this prediction, we

219 evaluated the ligation specificity of LigC1 using nicked DNA substrates containing

220 either DNA or RNA on the 3’ side and a phosphorylated DNA strand the 5’ side.

221 LigC1 preferentially ligated hybrids of RNA and DNA within the repaired gap, rather

222 than more typical DNA:DNA gaps (Supplementary Fig. 5b), consistent with the

223 previously reported ligation specificity of LigD 10,14. As repair synthesis by Prim-PolC

224 would result in the production of such hybrid RNA:DNA nicks in vitro, we next tested

225 whether the two enzymes co-operatively repair short DNA gaps. Together, Prim-

226 PolC and LigC1 efficiently filled in and ligated short (1-3nt) gaps with RNA but were

227 much less efficient at repairing longer DNA gaps (Figure 3b), consistent with the

228 reduced DNA binding and extension activities of Prim-PolC observed previously on

229 these substrates (Figure 3a and Supplementary Fig. 4a).

230

231 Reconstituting a LigC-dependent BER repair pathway in vitro

232 To establish if the LigC apparatus operates as part of a functional BER complex, we

233 sought to reconstitute a complete excision repair pathway in vitro using Prim-PolC,

234 LigC1 and the interacting BER proteins. We employed oligonucleotide substrates

235 containing either deoxyuracil (dU) or 8-oxoGuanine (8-oxoG) incorporated in a

236 central position of the labelled DNA strand to enable us to follow the consecutive

237 enzymatic processing steps of BER (Figure 4a). dU-containing substrates were

238 initially treated with UDG glycosylase and the resulting abasic site substrate was

239 then tested with the same enzyme combinations also assayed on 8-oxoG containing

240 substrates. We observed that FPG removed the oxidatively damaged DNA base and

241 processed the resulting abasic site with its lyase activity (Figure 4b, c). The 3’ end of

242 the gap was subsequently processed by either one of the 3’-phosphatase and / or 3’- 10

243 5’ exonuclease activities of EndoIV, ExoIII or XthA. The resulting DNA gap was

244 subsequently filled-in with RNA by Prim-PolC and, finally, the resulting nick was

245 ligated by LigC1. We observed that EndoIV was the most beneficial nuclease for

246 ensuring proper repair in vitro as it lacked strong 3’-5’ exonuclease activity.

247 Together, these findings establish that BER end-processing enzymes, shown to

248 interact with LigC1 in vivo, facilitate LigC complex-mediated repair of lesions arising

249 from abasic site formation in vitro.

250

251 Structure of a mycobacterial LigC-associated Prim-Pol

252 Although Prim-PolC shares many common features with its NHEJ orthologue, such

253 as 5’ phosphate recognition and gap filling synthesis, this enzyme is likely to have

254 evolved distinct structural features to facilitate its bespoke role in excision repair. To

255 define molecular differences between these distinct Prim-Pols, we elucidated the

256 crystal structure of Ms Prim-PolC (Table 1) and compared it to a NHEJ-specific Prim-

257 Pol. Prim-PolC retains the same overall fold of the mycobacterial Prim-PolD

258 (PolDom) structure23-26, which can be superposed with an RMSD of 1.712 Å (over

259 270 aligned positions, Figure 5a; Supplementary Fig. 6a). Shared structural features

260 include the phosphate recognition pocket containing the conserved basic residues

261 (K23, K35 & N20) that facilitate this key DNA interaction. Prim-PolC also retains

262 Loops 1, which is critically required to orient the template / 3’ overhanging strand,

263 and Loop 2 that accepts and positions the incoming primer strand from the adjacent

264 side of the break and also facilitates the binding / activation of the second metal ion

265 in the active site 23-26. These loops are critical for Prim-PolD’s engagement with the

266 break termini, promoting end-synapsis / MMEJ and are required for activation of the

267 catalytic centre to allow DNA synthesis to commence25,26. 11

268

269 Despite these similarities, Prim-PolC orthologues possess a strictly conserved C-

270 terminal extension whose function has not been ascribed (Figure 5b). In the Prim-

271 PolC structure, we observe that this additional C-terminal region (aa 294-336)

272 consists of a conserved alpha helix, (residues G299-R311), which lies across alpha

273 helix 4 after which the peptide chain becomes an unstructured loop (Figure 5a,b).

274 Notably, this structural element, termed Loop 3 (residues G313-P333), is positioned

275 between the two prominent surface loops (1 and 2) that together form a continuous

276 surface on one side of the active site pocket (Figures 5c, Supplementary Fig. 6b).

277 Apical Loop 1 residues, normally involved in template strand orientation 23-26, are not

278 conserved and are instead repurposed for locking Loop 3 into position

279 (Supplementary Fig. 6b & 7). Residue P94 packs against a pocket formed of

280 residues Y321, P322, K323 and M324 from Loop 3. This also results in Loop 1

281 adopting a position more proximal to the active site. Loop 2 residue W219 interacts

282 with Loop 3 residues P316, Y317, P333 and K335 via a mixture of hydrophobic and

283 polar interactions (Supplementary Fig. 6b & 7). Again, this has the effect of locking

284 Loop 3 into its current orientation, and also reorients Loop 2 into a more distal

285 position to the active centre, compared to the existing Prim-PolD MMEJ structure25.

286 A consequence of Loop 2’s interaction with Loop 3 is the fixing of Loop 2 into its

287 current orientation. The effects of limiting the flexibility of Loop 2, and therefore the

288 conserved arginine 224 (equivalent to R220 in Mt Prim-PolD), may have a significant

289 effect on Prim-PolC’s catalytic activity.

290

291 Superposition of Prim-PolC onto the structure of a DNA-bound Prim-PolD MMEJ

292 complex revealed that the 5’ phosphate fits into the conserved binding pocket and 12

293 the dsDNA is also well positioned into the structure (Figure 5d and Supplementary

294 Fig. 6c & 6d). The first three bases of the single-stranded template strand / 3’

295 overhang are within contacting distance of Loop 1. However, the 3’ terminal bases of

296 this strand clash with Loop 3 in the Prim-PolC structure (Figure 5d, Supplementary

297 Fig. 6c). This clash suggests that a possible key role of the insertion is to

298 accommodate a continuous template strand that is reoriented towards the active site

299 thus positioning gapped substrates, as opposed to NHEJ intermediates, in a more

300 optimal position. This repositioning ensures, together with Loops 1 and 2, that the

301 primer strand is correctly oriented into the active site in preparation for gap-filling

302 synthesis although this prediction remains to be proven.

303

304 LigC or LigD mutants are sensitive to oxidative DNA damage

305 Collectively, our interaction and in vitro reconstitution studies strongly implicate Prim-

306 PolC and LigC, and their associated factors, in the repair of lesions and SSBs

307 processed via abasic site intermediates. As such lesions are formed in large

308 quantities during all stages of the cell cycle, we next sought to determine if this repair

309 pathway functions throughout the cell cycle or, like LigD-dependent repair, is specific

310 to a particular stage. To address this, we examined the expression profiles of Prim-

311 PolC and LigC1 at different phases of the growth cycle (early, mid, stationary) using

312 specific antibodies. This revealed that the levels of both proteins peaks during early

313 stationary phase (Figure 6a), indicating that this pathway preferentially operates as

314 cells begin to enter stationary phase.

315

13

316 To evaluate their roles in excision repair processes in vivo, we deleted prim-polC (

317 prim-polC) and ligC1, ligC2 (ligC1, C2) in M. smegmatis. To test whether deletion

318 of ligD and prim-polC genes have a cumulative effect or the proteins can fully

319 compensate for one another’s activities, we generated single (LigD) and double (

320 ligD, prim-polC) deletion mutants. We also created a polA mutant complemented

321 with truncated PolA, lacking functional DNA polymerase domain but maintaining its

322 essential 3’ exo activity28, (polA :: PolA1-755 aa) along with prim-polC and ligD

323 mutants in a polA deficient background (prim-polC, polA :: polA1-755 aa; prim-

324 polC, ligD, polA :: polA1-755 aa). Cells were grown to stationary phase before

325 exposing them to genotoxic agents, to assess the oxidative damage sensitivity of

326 each strain. We tested several oxidizing agents and observed decreased viabilities

327 for prim-polC and ligC1,C2 strains exposed to oxidizing agents (Figure 6b &

328 Supplementary Fig. 8), particularly cumene hydroperoxide and tert-butyl

329 hydroperoxide. We observed that the kinetics of damage sensitivity induced by

330 organic hydroperoxides compared to hydrogen peroxide differs over time. Peroxide

331 produced an initial burst of bacterial killing that then remained at similar levels after

332 longer incubation times, whereas organic hydroperoxides continued to induce

333 increased damage sensitivity over longer time periods (Supplementary Fig. 8),

334 probably reflecting their stability in cultures.

335

336 Prim-PolC and LigC1 were previously reported to contribute to DNA repair during

337 stationary phase in M. smegmatis 12,14. Loss of Prim-Pol (prim-polC) or ligase (

338 ligC1, C2) components of the LigC complex caused a similar level of impaired

339 survival when treated with either of these oxidizing agents (Figure 6b &

14

340 Supplementary Fig. 8). As expected, PolA deficient strains were the most sensitive

341 to these genotoxins. Cumene hydroperoxide is among the best known inducers of

342 PolA expression in mycobacteria, suggesting that it is one of the key enzymes

343 necessary for the repair of DNA lesions caused by organic hydroperoxides.

344 Unexpectedly, LigD-deficient strains were even more sensitive to these oxidizing

345 agents and simultaneous loss of LigD and Prim-PolC (ligD, prim-polC) had an

346 additive impact on cell survival (Figure 6b and Supplementary Fig. 8), indicating that

347 both pathways are important for repairing oxidative lesions in stationary phase.

348

15

349 Discussion

350 Prior to this study, it was proposed that the LigC complex functions as an alternative

351 “back-up” pathway 8,14 required for repairing DSBs in stationary phase, following the

352 loss of the Ligase D complex 6,29. However, this hypothesis has not been

353 experimentally proven and therefore the cellular function(s) of this pathway remained

354 unclear. A major issue hampering the assignment of a biological role for the LigC

355 complex, distinct from LigD, has been the almost indistinguishable biochemical

356 activities shared between these closely related repair complexes. These include an

357 affinity towards 5’-P moieties on their respective DNA substrates and a preference to

358 incorporate short patches of ribonucleotides to fill in gapped intermediates, followed

359 by requisite resealing by a specific RNA / DNA ligase activity 24,25. These similarities

360 also extend to the structures of Prim-PolC and D, which are highly superposable and

361 share similar prominent surface loops (Loop 1 & 2) and a 5’ phosphate binding

362 pocket that together dictate the preferential substrate binding specificity of Prim-PolD

363 for DNA overhangs and break-annealed gaps 23-26.

364

365 Despite these similarities, this study reveals that Prim-PolC possesses a number of

366 distinguishing biochemical and structural features that enable us to define its

367 preferred DNA substrates. Prim-PolC preferentially binds to and fills-in short DNA

368 gaps. Unlike Prim-PolD, it shows weak preference for overhangs and is unable to

369 mediate break synapsis / synthesis. It also contains a conserved C-terminal

370 structural element (Loop 3), which is absent from Prim-PolD orthologues. This loop is

371 positioned between loops 1 & 2, where it forms a continuous surface that is probably

372 involved in directing the template strand towards the active site to optimally position,

373 with Loops 1 and 2, the 3’ hydroxyl moiety of the primer strand into the active site in

16

374 preparation for extension. The absence of Loop 3 in Prim-PolD orthologues, leaves a

375 space between the remaining loops (1 and 2), which allows these NHEJ

376 polymerases to promote break annealing / MMEJ and processing of subsequently

377 annealed DSBs 23-26. Validation of this proposed role for Loop 3 awaits the

378 elucidation of additional structures of Prim-PolC bound to DNA gap intermediates.

379

380 In addition to these molecular analyses, our in vivo studies establish that the LigC

381 complex is a central nexus for a distinctive excision repair pathway required to

382 remove and replace damaged or modified bases in mycobacterial genomes during

383 stationary phase (Figure 6c). BER reconstitution studies in vitro demonstrated that

384 LigC-dependent excision repair complexes can repair oxidatively damaged bases

385 and probably other lesions, e.g. methylated / nitrosative damaged bases and SSBs,

386 that are all processed via an abasic site intermediate. A key feature of this repair

387 pathway is the preferential incorporation of ribonuclotides, consistent with their

388 abundance in stationary phase. We observed the most efficient repair in vitro when

389 endonuclease IV was employed as the gap processing enzyme, dephosphorylating

390 the 3’-phosphate end of the lesion resulting from ȕ,Ȗ-elimination of the abasic site,

391 catalyzed by FPG. EndoIV lacks a strong 3’-5’ exonuclease activity, present in

392 alternative end-processing enzymes (ExoIII and XthA), and therefore this appears to

393 limit the gap resection to an optimal size for LigC-mediated repair. Consistent with

394 this, it was previously reported that EndoIV, rather than ExoIII, is responsible for vast

395 majority of BER gaps processing in mycobacteria, in contrast to eubacteria where

396 ExoIII plays a more dominant role 30.

397 In mycobacteria, the expression of the LigC complex peaks during early stationary

398 phase. This coincides with a significant reduction in intercellular pools of dNTPs in 17

399 the absence of replication 31,32, probably explaining the necessity to incorporate

400 ribonucleotides during DNA repair processes in stationary phase. Another notable

401 feature of the LigC pathway is the specific incorporation of short ribonucleotide

402 patches that are likely resistant to RNase H1 activities 33-35. Apart from Prim-PolC /

403 D, DinB2 also incorporates ribonucleotides into DNA during stationary phase repair

404 processes in mycobacteria 36. However, this Y-family polymerase has a tendency to

405 incorporate longer RNA patches and the resulting DNA:RNA hybrids would be

406 sensitive to RNaseH1 digestion. In gap filling assays, Prim-PolC efficiently filled in

407 short gaps (1-3nt) but failed to process longer gaps (5nt). Notably, stretches of >4

408 ribonucleotides incorporated into dsDNA are known to exhibit high instability due to

409 their sensitivity to RNase H1 digestion in vivo 33-35. It is therefore likely that short

410 RNA patches incorporated by Prim-Pols during BER and NHEJ repair processing are

411 later recognized and replaced by PolA, rather than RNase H, which undertakes a

412 similar role during RNA primer removal, once intracellular pools of dNTPs are

413 restored 37,38.

414 Prim-Pols have relatively low fidelity during synthesis and therefore another reason

415 to favor the incorporation of ribonucleotides during repair is to “mark” these regions

416 of repair synthesis for subsequent replacement in a post-repair manner by more

417 accurate polymerases 5,10, reminiscent of RNA primer removal during replication.

418 PolA removes RNA primers and is also known to participate in several DNA repair

419 pathways in bacteria 37,39-41. PolA-driven DNA synthesis occurs slowly but accurately

420 and it has proofreading and bidirectional nuclease activities beneficial for correcting

421 errors introduced by Prim-Pols 39,42. Notably, in this study we also detected a weak

422 association between PolA and the LigC complex, suggesting that this replicase may

18

423 remove and replace RNA inserted by this pathway, and likely others e.g. NHEJ, with

424 more accurately synthesized DNA.

425

426 Bothprim-polC andligC1,C2 strains were similarly sensitive to oxidizing agents,

427 consistent with their proposed joint roles in a distinctive excision repair pathway in

428 mycobacteria. Surprisingly, a LigD deficient strain was even more sensitive to these

429 agents, although this could potentially be attributed to a deficit in DSB repair.

430 However, this sensitivity was evident even with low levels of peroxide (~5mM),

431 reported to produce predominantly SSBs and < 0.1% DSBs 43, thus uncovering an

432 unexpected and additional role for LigD in excision repair. Consistent with this

433 proposal, co-deletion of LigD and Prim-PolC had an additive impact on cell survival

434 in the presence of these genotoxins, also indicating that they are not epistatic in their

435 function. Consistent with the mutant phenotypes, both LigD and Prim-PolC co-

436 purified on a 5’-phosphorylated single nucleotide gap DNA substrate in pull-down

437 assays. This substrate mimics a naturally occurring BER DNA intermediate that

438 would arise from the enzymatic processing of an oxidative lesion by FPG and

439 EndoIV. Together, these findings establish that both LigC and LigD-dependent

440 complexes contribute significantly to the repair of lesions produced by oxidative

441 damage during stationary phase in mycobacteria. Notably, Ory et al. reported

442 recently that Bacillus subtilis LigD possesses dRP-ase activity in vitro supporting this

443 proposed role in excision repair44. Further studies are required to define the exact

444 nature of the oxidative lesions that these Prim-Pol pathways process in vivo and

445 uncover the overlap and “cross-talk” that occurs between these, and other, stationary

446 phase DNA repair pathways in mycobacteria.

19

447

448

449

20

450

451 Methods

452 Bacterial strains and growth conditions

453 Laboratory stock of M. smegmatis mc2 155 and its derivatives were cultured in 7H9

454 liquid broth or 7H10 solid medium (Beckton Dickinson) with ADC supplement

455 (Albumin-Dextrose-Catalase) and hygromycin B (50ȝg ml-1) or kanamycin (30ȝg ml-

456 1), if antibiotic selection was required. E. coli DH5Į strain (Invitrogen) was used for

457 cloning purposes and OrigamiTM 2(DE3)pLysS (Novagen) for protein purification,

458 respectively. E. coli strains were cultured in standard Luria broth medium or terrific

459 broth (Formedium), where large amounts of protein production was required with

460 addition of kanamycin (50ȝg ml-1) for selection.

461

462 Protein complex purification and mass spectrometry analysis

463 Genes encoding Prim-PolC, LigC1 and LigC2 were PCR amplified and sub-cloned

464 into pKW08-gfp vector using a sequence and ligation independent protocol to

465 produce C-terminal eGFP fusion proteins used as baits for protein complex

466 purification 15. Purification and data analysis was performed essentially as described

467 previously 15. Briefly, mycobacterial cells were collected by centrifugation (15 min,

468 4800 × g, 4°C) and resuspended in a cold sonication buffer containing 50 mM Tris

469 (pH 8.0), 100 mM NaCl, 1 mM dithiotreitol, 2 mM phenylmethylsulfonyl fluoride, 25

470 U/ml benzonase, 0.5% Triton X-100 and protease inhibitor coctail. The cells were

471 lysed by sonication and lysates were pre-cleared by centrifugation (20 min, 4800 × g,

472 4°C). Lysates were then mixed with 50 ȝls of the anti-GFP Sepharose beads

473 (Chromotek), incubated for 2 h at 4°C to capture protein complexes, washed 2 times

21

474 with 10 ml of wash buffer [10 mM Tris (pH 8.0), 150 mM NaCl and 0,1% Triton X-

475 100], followed by 2 washes with TEV buffer [10 mM Tris (pH 8.0), 150 mM NaCl, 0.5

476 mM EDTA and 1 mM DTT] and digested with TEV protease (Thermo Fisher

477 Scientific) to elute protein complexes.

478 For DNA-protein pull-downs, each DNA substrate at a final concentration of 50nM

479 was annealed to 1mg of the poly-dT magnetic particles (NEB) in buffer containing 20

480 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 0.1 M KCl, for three hours at room temperaturĊ

481 and washed twice with the wash buffer [50mM HEPES pH 7.5, 150mM NaCl, 0.5%

482 TritonX, 1mM EDTA]. Whole cell lysates were prepared as described above,

483 however, in 50mM HEPES pH 7.5 as buffering agent and without the addition of

484 nucleases. The pre-cleared lysates were filtered through 0.22µm filters and

485 incubated with the DNA coated beads for three hours at 4°C. For the last 10

486 minutes 2µls of RNaseA (10mg/ml) were added to the 1ml reaction. Beads were

487 collected on the magnetic stand and washed 5 times with the wash buffer. DNA

488 bound protein complexes were then eluted by DNAseI digestion (NEB) for 1hour at

489 37°C in dedicated buffer.

490 The resultant protein mixture was precipitated with pyrogallol red–molybdate (PRM;

491 0.05 mM pyrogallol red, 0.16 mM sodium molybdate, 1 mM sodium oxalate, 50 mM

492 succinic acid, pH 2.5), collected by centrifugation and subjected to a standard trypsin

493 digestion. The resulting peptide mixtures were applied to a nano-HPLC RP-18

494 column (Waters) using an acetonitryle gradient in the presence of trifluoroacetic acid.

495 The column was directly linked to the ion source and the Orbitrap (Thermo Scientific)

496 was operated in a data-dependent mode.

22

497 The MaxQuant (v1.3.0.5) computational proteomics platform was used to process

498 raw MS files. Integrated Andromeda search engine and apropriate M. smegmatis or

499 M. bovis BCG protein databases were used to search against the fragmentation

500 spectra. Carbamidomethylation of cysteines, N-terminal acetylation and oxidations

501 were set as possible peptide modifications. One percent false discovery rate was

502 applied to all protein and peptide identifications. Human derived contaminants and

503 random protein identifications were excluded from the results files. Proteins

504 considered as valid identifications were identified by at least two peptides.

505

506 Purification of multiple DNA repair proteins

507 Genes encoding Prim-PolC, PolD2, LigC1, LigC2, MPG, FPG, NTH, EndoIV, ExoIII,

508 XthA, XseVIIB and PolA were PCR amplified using primers flanked with restriction

509 digestion sites required for in frame cloning into pET28 vector (Supplementary Table

510 1). All the proteins were designed to contain a N-terminal His-tag, except for FPG,

511 which was rendered inactive unless tagged at its C terminus. Proteins were purified

512 according to routine laboratory procedures using ÄKTA purifier and compatible

513 columns purchased from GE healthcare. Briefly, pre-cleared cell lysates obtained

514 after overexpression of recombinant proteins in E. coli Origami B pLysS strain were

515 loaded onto Ni Sepharose (Qiagen) column in Tris buffers [50mM Tris pH8.0],

516 washed extensively and eluted in a gradient of imidazole. Proteins were next loaded

517 onto an ion exchange column (SP or Q (GE Healthcare), depending on the

518 isoelectric point (pI) ) eluted in a gradient of NaCl and further purified on preparative

519 gel filtration columns (S200 or S75 (GE Healthcare), depending on the molecular

520 mass of purified protein). Quality of proteins after each and every purification step

23

521 was checked using sodium dodecyl sulfate (SDS) polyacrylamide gel electrophoresis

522 (PAGE).

523

524 EMSA, Polymerization and DNA repair assays

525 Assays were carried out essentially as described previously 10,25. Briefly, EMSAs

526 were employed to determine optimal substrates for Prim-PolC activity and were set

527 up in 50 mM Tris-HCl (pH 7.5), 0.1 mg ml-1 of BSA (NEB Biolabs), 1 mM DTT, 5%

528 glycerol, with indicated 30nM 6-carboxyfluorescein (6-FAM) labelled DNA substrates

529 and different concentrations of Prim-PolC in a 15 ȝl reaction volume. Incubation

530 mixtures were kept on ice for 30 minutes and were resolved by native gel

531 electrophoresis on a 5% polyacrylamide gel (80:1 (w/w) acrylamide / bisacrylamide).

532 DNA extension reaction mixture contained 50mM Tris-HCl (pH 7.5), 5mM MgCl2,

533 100ȝM MnCl2, 30nM 6-FAM labelled DNA substrate, 250ȝM NTPs and indicated

534 DNA repair proteins, in a total volume of 20ȝl. Reactions were further supplemented

535 with 1mM ATP, 1mM DTT and 0.1 mg ml-1 bovine serum albumin (BSA) for DNA

536 repair reconstitution reactions. After a set incubation time at 37°C, reactions were

537 terminated by adding stop buffer solution (95% (v/v) formamide, 0.09% (w/v)

538 bromophenol blue, 20mM EDTA and 600nM unlabelled oligonucleotide, identical to

539 labelled strand). Resulting DNA extension or repair products were resolved for 2

540 hours at constant wattage of 20W, on TBE-buffered 15% polyacrylamide gels

541 containing 8M urea. Detection of fluorescently labelled oligonucleotide products was

542 carried out using Fujifilm FLA-5100 fluorescent image scanner and EMSA data was

543 quantified using Quantity One (Bio-Rad).

544

24

545 Antibody production, purification and western blotting

546 Antibodies were generated against M. smegmatis proteins (Ku, LigC1 or Prim-PolC)

547 using a 28 day rabbit immunisation program (Eurogentec), using ~1 mg full-length

548 recombinant protein (2 applications), with serum collected at different time points.

549 Recombinant proteins (5 µg / lane) were resolved by 10% SDS-PAGE and proteins

550 transfer onto polyvinylidene fluoride (PVDF) membrane, which was subsequently

551 blocked with Tris-buffered saline and Tween 20 (TBST) and 5% (w/v) non-fat dried

552 milk. Membrane bands containing the recombinant protein were excised, washed

553 twice in TBST and incubated overnight at 4°C with 500 µl of serum. Serum was next

554 removed and membranes washed (x4) with 500 µl of TBST. Purified

555 immunoglobulins bound to the membranes were eluted with 150 µl of glycine buffer

556 (500 mM NaCl, 50 mM glycine, 0.5% Tween 20, 0.1% BSA, pH 2-3) and the blot

557 vortex pulsed before adding 25 µl of 1M Tris pH 8.8.

558 For western blotting analysis, the primary antibodies (anti- Ku, Prim-PolC and LigC1)

559 were used at a dilution of 1/1000 and the secondary antibody (HRP-conjugated Anti-

560 Rabbit IgG; Abcam ab6721) at a dilution of 1:5000 dilution. Uncropped versions of all

561 Western blots and gels can be found in Supplementary Fig. 9.

562

563

564 Far-western analysis

565 This method was used to test for protein-protein interactions in vitro. Recombinant

566 proteins were first slot-blotted using a slot blot manifold (GE Healthcare) onto a

567 methanol-wetted PVDF membrane according to the manufacturer’s instructions.

25

568 Typically, a fixed concentration of recombinant protein (50 ng for the probed protein

569 and 3 ȝg for the possible interactors) was blotted onto the membrane. The

570 membrane was first blocked with Tris-buffered saline (TBS: 280 mM NaCl, 20 mM

571 Tris) containing 0.05 % (v/v) Tween 20 and 5 % (w/v) non-fat dried milk. For far-

572 Western analysis, the membrane was then incubated with blocking buffer containing

573 5 ȝg ml-1 of the candidate interacting recombinant protein. The membrane was then

574 washed, probed with primary (1/1000 dilution) and secondary (1/5000 dilution)

575 antibodies, and subjected to chemiluminescent detection. The primary antibody used

576 was specific for the candidate interacting recombinant protein, and

577 chemiluminescence detected if interaction occurred between the two proteins.

578

579 Pull-down assays

580 Pull-down assays were used to confirm chosen protein-protein interactions.

581 Equimolar amounts of all proteins were used at 5 nM per pull-down. Recombinant

582 FPG-6xHis or 6xHis-MPG bait proteins were first bound to nickel-nitrilotriacetic acid

583 (NTA) coupled resin in a Tris-based buffer [50mM, pH8.0] containing 150 mM NaCl,

584 for 1 hour at 4°C. The protein-coated beads were washed and interacted with tag-

585 free LigC (6xHis-LigC subjected to a standard thrombin digestion) or BSA, for further

586 2 hours. The beads were next washed extensively and protein-complexes were

587 eluted with buffer containing 300 nM imidazole. The protein complexes were

588 visualized by SDS-PAGE followed by Coomassie staining.

589

590 Crystallisation and X-ray structure determination

26

591 Crystals of the apo Prim-PolC were grown at 285 K by vapour diffusion as sitting

592 drops. The protein was screened at a concentration of 8 mg ml-1 and 0.5 ȝL of

593 protein solution was mixed with 0.5 ȝL of crystallisation buffer (0.05 M Tris HCl (pH

594 8.0), 0.2 M ammonium chloride, 0.01M calcium chloride, 30% (w/v) PEG 4000). Prior

595 to data collection, crystals were soaked in mother liquor containing 20% ethylene

596 glycol. 0.9795 Å X-ray diffraction data was collected at 100 K using a synchrotron

597 source at station I02 Diamond Light Source, Didcot, UK. The diffraction data were

598 processed with xia2 45 with additional processing by programs from the CCP4 suite

599 (Collaborative Computational Project, Number 4, 1994). The statistics for data

600 processing are summarized in Table 1. Initial phases were obtained by molecular

601 replacement with PHASER 46 using Prim-PolD (2IRY) as a search model 23. Iterative

602 cycles of model building and refinement were performed using Coot 47 and Phenix 48.

603 A final refined model at 1.84 Å resolution, with an Rfactor of 18.58% and Rfree of

604 20.92% was obtained. 99.1% of residues are in preferred regions with the remainder

605 (0.9%) in allowed regions according to Ramachandran statistics. Structural images

606 were prepared with CCP4mg 49. The structure of apo Prim-PolC is deposited in the

607 Protein Data Bank under accession code 5OP0.

608

609 Construction of mutant strains lacking DNA repair genes Targeted

610 replacement strategy was employed to generate unmarked ǻprim-polC, ǻligC1,

611 ǻligC2, ǻligD, and ǻpolA::polA1-755aa mutants and strains with simultaneous

612 removal of multiple of these genes as previously published 50,51. Briefly, short

613 fragments corresponding to 3’ and 5’ end of each genes, flanked with surrounding

614 genomic region of about 1 kb at each side (primers listed in Supplementary Table 1),

615 were ligated together out of frame and sub-cloned into a suicidal p2NIL delivery

27

616 vector, merged with crossing over event selection cassette from pGOAL17. Multistep

617 screening led to double crossing over strains selection and resulted in gene

618 inactivation, which was confirmed by Southern blotting and PCR, for each

619 recombination event.

620

621 Phenotypic analysis of M. smegmatis strains

622 M. smegmatis MC2 155 wild type (ATCC 700084) and mutant strains were grown to

623 late stationary phase in 7H9 liquid media, with supplements, typically 4 days after

624 reaching optical density (OD600nm) of 3.0. The cells were harvested and suspended

625 in PBS containing Tween 80 (0.05% (v/v)), to prevent clumping, at OD600 of 0.1.

626 They were next treated with 10mM cumene hydroperoxide (from 1M stock in 50%

627 ethanol), 50mM tert-butyl hydroperoxide (from 5M stock in water), or 5mM hydrogen

628 peroxide (from 0.5M stock in water), respectively, for various amount of time with

629 incubation at 37°C. Untreated cells suspension was used to control overall viability of

630 each strain. At indicated time points 10ȝls of each cell suspension was transferred

631 onto 7H10 solid agar and plates were incubated for 72 hours before read out of cell

632 survival was recorded.

633

634 Data Availability. All data are provided in full in the results section and the

635 supplementary information accompanying this paper. The crystal structure has been

636 deposited in the Protein Data Bank and can be accessed using accession code

637 5OP0.

638

28

639 References

640 1. Bouche, J.P., Zechel, K. & Kornberg, A. dnaG gene product, a rifampicin- 641 resistant RNA polymerase, initiates the conversion of a single-stranded 642 coliphage DNA to its duplex replicative form. Journal of Biological Chemistry 643 250, 5995-6001 (1975).

644 2. Keck, J.L., Roche, D.D., Lynch, A.S. & Berger, J.M. Structure of the RNA 645 polymerase domain of E. coli primase. Science 287, 2482-6 (2000).

646 3. Frick, D.N. & Richardson, C.C. DNA . Annual Review of 647 Biochemistry 70, 39-80 (2001).

648 4. Guilliam, T.A., Keen, B.A., Brissett, N.C. & Doherty, A.J. Primase- 649 polymerases are a functionally diverse superfamily of replication and repair 650 enzymes. Nucleic Acids Research 43, 6651-64 (2015).

651 5. Pitcher, R.S. et al. NHEJ protects mycobacteria in stationary phase against 652 the harmful effects of desiccation. DNA Repair (Amst) 6, 1271-6 (2007).

653 6. Weller, G.R. et al. Identification of a DNA nonhomologous end-joining 654 complex in bacteria. Science 297, 1686-9 (2002).

655 7. Della, M. et al. Mycobacterial Ku and ligase proteins constitute a two- 656 component NHEJ repair machine. Science 306, 683-5 (2004).

657 8. Gong, C. et al. Mechanism of nonhomologous end-joining in mycobacteria: a 658 low-fidelity repair system driven by Ku, ligase D and ligase C. Nature 659 Structural & Molecular Biology 12, 304-12 (2005).

660 9. Pitcher, R.S., Brissett, N.C. & Doherty, A.J. Nonhomologous end-joining in 661 bacteria: a microbial perspective. Annual Review of Microbiology 61, 259-82 662 (2007).

663 10. Bartlett, E.J., Brissett, N.C. & Doherty, A.J. Ribonucleolytic resection is 664 required for repair of strand displaced nonhomologous end-joining 665 intermediates. Proceedings of the National Academy of Sciences USA 110, 666 E1984-91 (2013).

667 11. Bowater, R. & Doherty, A.J. Making ends meet: repairing breaks in bacterial 668 DNA by non-homologous end-joining. PLoS Genetics 2, e8 (2006).

669 12. Zhu, H., Bhattarai, H., Yan, H.G., Shuman, S. & Glickman, M.S. 670 Characterization of Mycobacterium smegmatis PolD2 and PolD1 as 671 RNA/DNA Polymerases Homologous to the POL Domain of Bacterial DNA 672 Ligase D. Biochemistry 51, 10147-10158 (2012).

673 13. Matthews, L.A. & Simmons, L.A. Bacterial Nonhomologous End Joining 674 Requires Teamwork. Journal of Bacteriology 196, 3363-3365 (2014).

29

675 14. Bhattarai, H., Gupta, R. & Glickman, M.S. DNA ligase C1 mediates the LigD- 676 independent nonhomologous end-joining pathway of Mycobacterium 677 smegmatis. Journal of Bacteriology 196, 3366-76 (2014).

678 15. Plocinski, P. et al. Identification of protein partners in mycobacteria using a 679 single-step affinity purification method. PLoS One 9, e91380 (2014).

680 16. Caldecott, K.W., Tucker, J.D., Stanker, L.H. & Thompson, L.H. 681 Characterization of the XRCC1-DNA ligase III complex in vitro and its 682 absence from mutant hamster cells. Nucleic Acids Research 23, 4836-43 683 (1995).

684 17. Leppard, J.B., Dong, Z., Mackey, Z.B. & Tomkinson, A.E. Physical and 685 functional interaction between DNA ligase IIIalpha and poly(ADP-Ribose) 686 polymerase 1 in DNA single-strand break repair. Molecular and Cellular 687 Biology 23, 5919-27 (2003).

688 18. De, A. & Campbell, C. A novel interaction between DNA ligase III and DNA 689 polymerase gamma plays an essential role in mitochondrial DNA stability. 690 Biochemical Journal 402, 175-86 (2007).

691 19. Tomkinson, A.E. & Sallmyr, A. Structure and function of the DNA ligases 692 encoded by the mammalian LIG3 gene. Gene 531, 150-7 (2013).

693 20. Fan, J. & Wilson, D.M., 3rd. Protein-protein interactions and posttranslational 694 modifications in mammalian base excision repair. Free Radical Biology and 695 Medicine 38, 1121-38 (2005).

696 21. Doherty, A.J., Jackson, S.P. & Weller, G.R. Identification of bacterial 697 homologues of the Ku DNA repair proteins. FEBS Letters 500, 186-8 (2001).

698 22. Aravind, L. & Koonin, E.V. Prokaryotic homologs of the eukaryotic DNA-end- 699 binding protein Ku, novel domains in the Ku protein and prediction of a 700 prokaryotic double-strand break repair system. Genome Research 11, 1365- 701 74 (2001).

702 23. Pitcher, R.S. et al. Structure and function of a mycobacterial NHEJ DNA 703 repair polymerase. Journal of Molecular Biology 366, 391-405 (2007).

704 24. Brissett, N.C. et al. Structure of a Preternary Complex Involving a Prokaryotic 705 NHEJ DNA Polymerase. Molecular Cell 41, 221-231 (2011).

706 25. Brissett, N.C. et al. Molecular Basis for DNA Double-Strand Break Annealing 707 and Primer Extension by an NHEJ DNA Polymerase. Cell Reports 5, 1108- 708 1120 (2013).

709 26. Brissett, N.C. et al. Structure of a NHEJ polymerase-mediated DNA synaptic 710 complex. Science 318, 456-459 (2007).

30

711 27. Kent, T., Chandramouly, G., Mcdevitt, S. M., Ozdemir, A. Y. & Pomerantz, R. 712 T. . Mechanism of microhomology-mediated end-joining promoted by human 713 DNA polymerase ș. Nature Structural & Molecular Biology 22, 230-237 714 (2015).

715 28. Gordhan, B.G., Andersen, S.J., De Meyer, A.R. & Mizrahi, V. Construction by 716 homologous recombination and phenotypic characterization of a DNA 717 polymerase domain polA mutant of Mycobacterium smegmatis. Gene 178, 718 125-30 (1996).

719 29. Wilson, T.E., Topper, L.M. & Palmbos, P.L. Non-homologous end-joining: 720 bacteria join the breakdance. Trends in Biochemical Sciences 721 28, 62-6 (2003).

722 30. Puri, R.V., Singh, N., Gupta, R.K. & Tyagi, A.K. Endonuclease IV Is the major 723 apurinic/apyrimidinic endonuclease in Mycobacterium tuberculosis and is 724 important for protection against oxidative damage. PLoS One 8, e71535 725 (2013).

726 31. Buckstein, M.H., He, J. & Rubin, H. Characterization of nucleotide pools as a 727 function of physiological state in Escherichia coli. Journal of Bacteriology 190, 728 718-26 (2008).

729 32. Jacewicz, A. & Shuman, S. Biochemical Characterization of Mycobacterium 730 smegmatis RnhC (MSMEG_4305), a Bifunctional Enzyme Composed of 731 Autonomous N-Terminal Type I RNase H and C-Terminal Acid Phosphatase 732 Domains. Journal of Bacteriology 197, 2489-98 (2015).

733 33. Minias, A.E. et al. RNase HI Is Essential for Survival of Mycobacterium 734 smegmatis. PLoS One 10, e0126260 (2015).

735 34. Nowotny, M., Gaidamakov, S.A., Crouch, R.J. & Yang, W. Crystal structures 736 of RNase H bound to an RNA/DNA hybrid: substrate specificity and metal- 737 dependent catalysis. Cell 121, 1005-16 (2005).

738 35. Jacewicz, A. & Shuman, S. Biochemical Characterization of Mycobacterium 739 smegmatis RnhC (MSMEG_4305), a Bifunctional Enzyme Composed of 740 Autonomous N-Terminal Type I RNase H and C-Terminal Acid Phosphatase 741 Domains. J Bacteriol 197, 2489-98 (2015).

742 36. Ordonez, H., Uson, M.L. & Shuman, S. Characterization of three 743 mycobacterial DinB (DNA polymerase IV) paralogs highlights DinB2 as 744 naturally adept at ribonucleotide incorporation. Nucleic Acids Research 42, 745 11056-70 (2014).

746 37. Ogawa, T. & Okazaki, T. Function of RNase H in DNA replication revealed by 747 RNase H defective mutants of Escherichia coli. Molecular Genetics and 748 Genomics 193, 231-7 (1984).

31

749 38. Minias, A.E., Brzostek, A.M., Minias, P. & Dziadek, J. The deletion of rnhB in 750 Mycobacterium smegmatis does not affect the level of RNase HII substrates 751 or influence genome stability. PLoS One 10, e0115521 (2015).

752 39. Heyneker, H.L. & Klenow, H. Involvement of Escherichia coli DNA 753 polymerase-I-associated 5' in equilibrium 3' exonuclease in excision-repair of 754 UV-damaged DNA. Basic Life Sciences 5A, 219-23 (1975).

755 40. Friedberg, E.C. The eureka enzyme: the discovery of DNA polymerase. 756 Nature Reviews Molecular Cell Biology 7, 143-7 (2006).

757 41. Tait, R.C., Harris, A.L. & Smith, D.W. DNA repair in Escherichia coli mutants 758 deficient in DNA polymerases I, II and-or 3. Proceedings of the National 759 Academy of Sciences USA 71, 675-9 (1974).

760 42. Bailly, V. & Verly, W.G. The excision of AP sites by the 3'-5' exonuclease 761 activity of the of Escherichia coli DNA polymerase I. FEBS 762 Letters 178, 223-7 (1984).

763 43. Ismail, I.H., Nyström, S., Nygren, J. & Hammarsten, O. Activation of ataxia 764 telangiectasia mutated by DNA strand break-inducing agents correlates 765 closely with the number of DNA double strand breaks. Journal of Biological 766 Chemistry 280, 4649-55 (2005).

767 44. de Ory, A. et al. Identification of a conserved 5'-dRP lyase activity in bacterial 768 DNA repair ligase D and its potential role in base excision repair. Nucleic 769 Acids Research 44, 1833-44 (2016).

770 45. Winter, G. xia2: an expert system for macromolecular crystallography data 771 reduction. Journal of Applied Crystallography 43, 186-190 (2010).

772 46. Mccoy, A.J. et al. Phaser crystallographic software. Journal of Applied 773 Crystallography 40, 658-674 (2007).

774 47. Emsley, P., Lohkamp, B., Scott, W.G. & Cowtan, K. Features and 775 development of Coot. Acta Crystallographica Section D: Biological 776 Crystallography 66, 486-501 (2010).

777 48. Adams, P.D. et al. PHENIX: a comprehensive Python-based system for 778 macromolecular structure solution. Acta Crystallographica Section D: 779 Biological Crystallography 66, 213-21 (2010).

780 49. McNicholas, S., Potterton, E., Wilson, K.S. & Noble, M.E. Presenting your 781 structures: the CCP4mg molecular-graphics software. Acta Crystallographica 782 Section D: Biological Crystallography 67, 386-94 (2011).

783 50. Parish, T. & Stoker, N.G. Use of a flexible cassette method to generate a 784 double unmarked Mycobacterium tuberculosis tlyA plcABC mutant by gene 785 replacement. Microbiology 146 ( Pt 8), 1969-75 (2000).

32

786 51. Plocinski, P. et al. Mycobacterium tuberculosis CwsA interacts with CrgA and 787 Wag31, and the CrgA-CwsA complex is involved in peptidoglycan synthesis 788 and cell shape determination. Journal of Bacteriology 194, 6398-409 (2012).

789

790

791

792

33

793 Accession codes. Atomic coordinates and structure factors have been deposited in 794 the Protein Data Bank under accession code 5OP0.

795

796 Acknowledgements

797 We declare that none of the authors have a financial interest related to this work. We

798 thank Dr M. Roe for assistance with X-ray data collection. AJD’s laboratory was

799 supported by grants from Biotechnology and Biological Sciences Research Council

800 (BB/F013795/1, BB/J018643/1 and BB/M004236/1). PP was supported by a mobility

801 grant from the Polish Ministry of Science and Higher Education (1073/MOB/2013/0:

802 Mobilnosc Plus). Funding for open access charge: Research Councils UK (RCUK).

803

804 Author contributions. A.J.D. designed the project, directed the experimental work,

805 performed database analysis and wrote the manuscript. P.P. and N.B. contributed to

806 the project design, performed and designed experiments, performed computational

807 analyses and co-wrote the manuscript. P.P. performed mass spectrometry analysis.

808 N.B. performed X-ray data processing, model building and analysis. J.B. generated

809 LigC and Prim-PolC expression plasmids, purified proteins, measured protein

810 expression levels in cells and performed the initial biochemical and phenotype

811 studies. A.B. and M.K-M. constructed mutant strains in M. smegmatis and assisted

812 P.P. with CFU experiments. A.D. and J.D. helped with study design and directed

813 parts of the experimental work (mass spectrometry analysis of protein-complexes

814 and generation of mutant strains and phenotypic analysis, respectively).

815

816 Competing interests: The authors declare no competing financial interests. 34

817

818

35

819 Figure Legends

820

821 Figure 1. Interactions between Prim-PolC, LigC proteins and base excision

822 repair elements. (a) A table showing the DNA repair-associated preys that co-

823 purified in an eGFP-facilitated affinity purification experiment using LigC and Prim-

824 PolC as baits. (b) Base excision repair enzymes were purified as recombinant

825 proteins and interactions with Prim-PolC and LigC1 were confirmed by slot-blot

826 analysis, where positive interactions are marked with red, possible weak

827 associations with green and negative interactions with black font, respectively. (c)

828 Verified interactions are summarized in a schematic diagram showing that LigC is

829 the major scaffolding protein involved in multiple protein complex formation.

830

831 Figure 2. DNA binding activity of Prim-PolC. (a) Prim-PolC prefers to bind to 5’

832 phosphorylated gaps. In EMSAs, mixtures of a single nucleotide gap containing

833 substrate (30nM), with or without phosphorylation of the 5' end of the lesion, and 0-

834 10µM Prim-PolC were incubated for 20 minutes and resolved on a native

835 polyacrylamide gel. A filled triangle indicates unshifted probe, whereas the black

836 arrow probably indicates a binary complex of Prim-PolC and DNA. The grey arrow

837 indicates multiple protein monomers bound to the probe. Quantification of the EMSA

838 data is presented. For each Prim-PolC concentration the percentage of DNA bound

839 (in relation to the total DNA) was calculated and compared for EMSAs containing

840 Prim-PolC binding a 1 nucleotide gap with or without 5’-phosphorylation.

841 (b) Prim-PolC prefers to bind to gaps rather than overhangs unlike Prim-PolD. This

842 EMSA uses 30nM of 5'-fluorescein labelled 36-mer containing a 5' phosphorylated

843 single nucleotide gap or 30nM of 5'-fluorescein labelled 36-mer containing a single- 36

844 stranded 3’ overhang given by phosphorylation of the 5'-end of the gap and no

845 primer strand. The triangles and arrow are as in (a). Using the same conditions as

846 (a), this time comparing Prim-PolC to Mtu Prim-PolD (25, 50, 75 & 100 nM).

847 Quantification of the EMSA data is presented. For each enzyme concentration the

848 percentage of DNA bound (in relation to the total DNA) was calculated and

849 compared for EMSAs containing Prim-PolC or Prim-PolD binding a 5’-

850 phosphorylated 1 nucleotide gap or a 3’-overhang with a 5’-phosphorylated

851 downstream strand. (c) Prim-PolC is not an NHEJ polymerase. A schematic of 5'-

852 fluorescein-labelled substrates used in a MMEJ activity assay. The pss- number

853 refers to the number of bases at the end of the 3’overhang that can with

854 itself. A primer extension assay, 30nM of the denoted substrate extended by 300nM

855 enzyme (PPC-Prim-PolC, PPD-Mtu Prim-PolD) in the presence of 250 µM dNTPs

856 mix for 30 min at 37°C. The products are resolved on a 15% denaturing gel. (d) The

857 same reaction products from (c), after treatment with Proteinase K and run on a 12%

858 non-denaturing gel.

859

860 Figure 3. Short gap-filling activity of Prim-PolC in complex with LigC1. (a) Prim-

861 PolC prefers to insert rNTPs into short gaps. In primer extension assays, 30nM of 5'-

862 fluorescein labelled 36-mer containing a nick (n), a single-stranded gap of various

863 length (indicated as numbers), with or without phosphorylation of the 5'-end or

864 lacking the 5'-end strand thus forming an overhang (ov), was extended by Prim-PolC

865 (300nM) in the presence of a 250 µM dNTPs or rNTPs mix for 5 min at 37°C. (b) The

866 preferred products from Prim-PolC gap-filling are ligatable. For gap filling and

867 ligation, LigC1 (300nM) was added to the primer extension reactions (same as a)

868 and were incubated for an extra 45 min at 37°C to produce a repair product (P).

37

869

870 Figure 4. Reconstitution of a Prim-PolC - LigC1-dependent BER repair complex

871 in vitro. (a) Schematic representation of BER repair reactions showing repair

872 intermediates generated by subsequent DNA repair activities of glycosylases, end-

873 processing enzymes, gap-filling and ligation. (b & c) In vitro repair of deoxyuracil- (b)

874 or 8-oxoguanine- (c) containing DNA substrates. 30nM of 5'-fluorescein labelled ds

875 36-mer containing a central lesion was processed by uracil glycosylase or directly by

876 recombinant mycobacterial FPG (for 8-oxoguanine), to produce a resulting abasic

877 site intermediate processed into a 3',5'- biphosphorylated gap by the abasic site

878 lyase activity of FPG. Reactions were carried out with addition of a 250 µM dNTPs or

879 rNTPs mix for 30 min at 37°C. Gap ends were next processed, including removal of

880 the 3’ phosphate, by addition of various amounts of either XthA, ExoIII or EndoIV for

881 5 min at 37°C. Subsequently, a mixture of Prim-PolC and LigC1, 300nM of each,

882 was added to provide gap filling and ligation activities for 45 min at 37°C.

883

884 Figure 5. Crystal structure of Prim-PolC. (a) A ribbon diagram representation of

885 the crystal structure of Prim-PolC (PDBID: 5OP0). The previously characterized

886 NHEJ polymerase core is coloured sky blue (see also Figure S6a), with the

887 conserved loop structures from this family coloured dark blue and cyan for Loops 1

888 and 2, respectively. The catalytic aspartate side-chains are represented with the

889 carboxylic oxygens shown in red. Side-chains involved in forming the 5’ phosphate

890 binding pocket are depicted in dark blue. The Prim-PolC specific C-terminal

891 extension is shown in pink, comprising α8 and Loop 3. (b) Schematic representation

892 comparing the primary structures of Prim-PolD and Prim-PolC. Conservation of the

38

893 C-terminal extension of Prim-PolC amongst selected organisms is highlighted by a

894 sequence alignment of this region (c) Ribbon diagram representation of the Prim-

895 PolC crystal structure highlighting the molecular contacts formed between the

896 conserved loops (1 and 2) and the additional Loop 3. The loops are further

897 represented as translucent solvent accessible surfaces colour matched to the

898 specific loop element. (d) Superposition of DNA from the NHEJ Prim-PolD MMEJ

899 complex (PDBID: 4MKY) onto the Prim-PolC structure. The path of the templating

900 DNA from the in-trans structure clashes with Loop 3.

901

902 Figure 6. Expression of Prim-PolC and LigC1 in vivo and phenotypes

903 associated with their deletion. (a) Expression profiling using western blotting.

904 Specific antibodies risen against LigC1, Prim-PolC and Ku proteins were used to

905 probe for corresponding proteins in whole cell lysates obtained from M. smegmatis

906 cells grown to various growth phases including, exponential (E”), early stationary

907 (E/S) and stationary (S) phase. (b) A plot of colony forming units (CFU) data showing

908 the survival of wild type (MC 155) or mutant cells, lacking Prim-Pols and / or ligases,

909 treated with 10mM cumene hydroperoxide. Cells were plated at 1, 15, 30 or 60 min

910 after treatment and the percentage survival is calculated against untreated control

911 cells, treated with diluted buffer instead of peroxide. Data are representative of the

912 mean of five individual experiments and error bars show the standard deviation. (c)

913 A model of the repair of a damaged base by the LigC-dependent excision repair

914 complex. The damage is initially recognised by a specific mono or bifunctional

915 glycosylases, whose activity results in the formation of an abasic site intermediate.

916 LigC is recruited to the lesion early to coordinate later repair steps and mediate

917 multi-protein repair complex formation. Short gaps produced by processing of the

39

918 lesion by the BER machinery are then filled in with RNA by Prim-PolC and the

919 resulting nick is ligated by LigC. RNA is probably later excised and replaced with

920 DNA catalyzed by PolA and the nicked DNA intermediate sealed by LigA.

921

40

922 Table 1: Data collection and refinement statistics (molecular replacement)

923

Msm Prim-PolC

Data collection

Space group P1

Cell dimensions

a, b, c (Å) 51.55, 56.43, 64.45

α, β, γ (°) 97.17, 100.2, 90.64

Resolution (Å) 44.77 (1.84) *

Rsym or Rmerge 0.09 (0.60)

Ι / σΙ 6.8 (1.7)

Completeness (%) 96.9 (95.4)

Redundancy 2.6 (2.6)

Refinement

Resolution (Å) 44.77 (1.84)

No. reflections 59594 (5894)

Rwork / Rfree 0.1858 / 0.2092

No. atoms 5824

Protein 5254

Water 570

B-factors

Protein 28.81

Water 37.11

R.M.S. deviations

Bond lengths (Å) 0.005

Bond angles (°) 0.743

924 *From 1 crystal. *Values in parentheses are for highest resolution shell.

925

926

927

41

a! c! Prim-PolC! PolD2! Pept M.W. Intensity Intensity Intensity Protein Identifier! No.! [kDa]! LigC1! LigC2! PrimPolC! PolA! MS_6302 LigC1! 16 39,8 3,26E+09 Primases / ! MS_0001 DNA Pol III, beta subunit (dnaN) [2.7.7.7] 14 41,3 9,56E+084,98E+08 5,70E+09 MS_2731 ATPase involved in DNA repair 13 48,5 3,61E+083,56E+08 2,10E+09 Polymerases! MS_3839 DNA polymerase I [2.7.7.7] 31 101 2,78E+084,08E+08 5,06E+09 Nucleases! MS_6896 Single-stranded DNA-binding protein 4 17,4 2,48E+083,43E+08 2,11E+09 MS_5156 Base excision DNA repair protein 3 20,9 3,85E+071,44E+07 4,46E+08 MS_1383 Endonuclease IV 4 26,6 2,18E+07 6,14E+08 DNA ! MS_2399 Uracil-DNA glycosylase (ung) [3.2.2.-] 2 26,6 1,46E+07 1,00E+08 EndoIV! Glycosylases! MS_3816 Excinuclease ABC, B subunit (uvrB) 14 80,8 1,33E+078,44E+07 1,16E+09 MS_5423 Transcription-repair coupling factor (mfd) 9 131 3,63E+064,17E+06 4,00E+08 LigC1! MS_3883 5ʼ-3 exonuclease 3 33,9 2,71E+07 NTH! MS_5440 Deoxyribonuclease 4 33,4 2,59E+07 4,07E+08 ExoIII! MS_6304 LigC2! 4 39,3 2,90E+08 MS_6285 DNA Pol III gamma/tau subunit [2.7.7.7] 5 69,9 4,22E+08 FPG! MS_6301 Prim-PolC! 15 41,6 1,13E+10 XthA! MS_2362 DNA ligase (ligA) [6.5.1.2] 16 76,3 1,69E+081,43E+08 2,35E+09 MPG! b! + Interactor + Anti- LigC1! + Interactor + Anti- Prim-PolC! !

XseB BSA PolD2! Ku Nth ! XseB BSA PolD2! Ku Nth !

EndoIV Prim-PolC ! LigC1 FPG! EndoIV LigC1 ! Prim-PolC FPG!

XthA ExoIII LigC1! MPG PolA ! XthA ExoIII Prim-PolC! MPG PolA ! - Interactor + Anti- LigC1! - Interactor ! + Anti- Prim-PolC!

Ku Nth ! XseB BSA PolD2! Ku Nth ! XseB BSA PolD2!

EndoIV LigC1 ! EndoIV Prim-PolC ! LigC1 FPG! Prim-PolC FPG!

ExoIIIA ExoIII LigC1! MPG PolA ! ExoIIIA ExoIII Prim-PolC! MPG PolA ! a 1nt Gap 1nt Gap/5ʼ-℗ '!!" '*+",-." &!" Prim-PolC Prim-PolC '*+",-./)012" %!" $!" #!" &'$#()*+,#-./# !" !(!'#)" !(!#)" !('" !(#)" !()" '" )" '!" 012340)5!#-µ6/# b '!!" 1nt Gap/5ʼ-℗ 3ʼ Ov/5ʼ-℗ 445"'*+",-./)012" &!" 446"'*+",-./)012" Ø Prim-PolC Prim-PolD Ø Prim-PolC Prim-PolD %!" 445"78"9:/)012" 446"78"9:/)012" $!" #!" &'$#()*+,#-./# !" !(!#)" !(!)" !(!3)" !('" 01)782+#-µ6/# c d pss-0 pss-4 pss-6 pss-0 pss-4 pss-6 26nt 5ʼ $%$!#3ʼ pss-0 3ʼ 5ʼ Ø Ø Ø Ø Ø Ø 14nt

50 nt 26nt 5ʼ !!""#3ʼ pss-4 3ʼ 5ʼ 14nt 50 nt

26nt 5ʼ !!!"""#3ʼ pss-6 3ʼ 5ʼ 14nt a 0-5 nt %#$$ "#$ !"#$ %#$ %#$ "#$ rNTPs rNTPs/5ʼ-℗ dNTPs dNTPs/5ʼ-℗ Ø n 1 2 3 5 ov n 1 2 3 5 Ø n 1 2 3 5 ov n 1 2 3 5 +4 +3 +2 +1 0 b

rNTPs rNTPs/5ʼ-℗ dNTPs dNTPs/5ʼ-℗ Ø n 1 2 3 5 ov n 1 2 3 5 n 1 2 3 5 ov n 1 2 3 5

P

+3 +2 +1 0 a! dU! !" !" +UNG! Abasic site 3ʼ-℗, 5ʼ-℗-gap 3ʼ-OH, 5ʼ-℗-gap Fill-in product Ligated DNA 3’ 5’ 5’ DNA Lesion! !" !" !" !" !" !" !" !" !" !" +FPG! +XthA! +Prim-PolC! +LigC! 8OxoG! /ExoIII! !" !" +FPG! /EndoIV! b! LigC + PPolC! LigC + PPolC! LigC + PPolC!

XthA! ExoIII! EndoIV! rNTP! 5ʼ 5ʼ

!" 5ʼ 5ʼ 5ʼ !!" " c! 5ʼ LigC + PPolC! LigC + PPolC! LigC + PPolC! XthA! ExoIII! EndoIV! rNTP! 5ʼ 5ʼ

!" 5ʼ 5ʼ 5ʼ !!" " 5ʼ a! Loop 3 b! Loop 1 5’ PB L1 L2 C Prim-PolD N-! -C! !8! L3! Prim-PolC 5’ PO4 binding

R63 Loop 2 !8! Loop 3! K23

N20 Msmeg 6301 SVQQSLGPLLDLVAADE-ERGLGDLPYPPNYPKMPGEPPRVQPSKK P65 Mt Rv3730c DVAQSIAPLLDLAAADE-ERGLGDMPYPPNYPKMPGEPKRVQPSRD N Mav 0362 DVAQSIGPLLRMAEADE-ERGLGDMPYPPNYPKMPGEPKRVQPSRD Rop 51690 DVAQSLDVLLEMAERDT-ANGLGDLPYPPNYPKMPGEPKRVQPSRD Gbro 0416 KHRKGIERLLEMVQADD-ANGLGDMPYPPSYPKMPGEPPRVQPSKK K35 Srot 2335 --PGSLDKLLE------QDTPLLPYPPQYPKMPDEPKRVQPSKD Sav 1696 DHAFSLEALLELADRDEHDHGLGDLPYPPEYPKMPGEPKRVQPSRA .: ** . :****.***** ** *****: c! Loop 3 DNA clash 8! d! ! Loop 3 Loop 1 !8!

Loop 2 Loop 2 R63 Loop 1 K23

P65 N20 N

K35 a! b! 100! mc2 155!

E E/S S 10! Δprim-polC ΔligC1,C2

anti-Ku ! 1! ΔligD anti-Prim-PolC ΔligD,prim-polC C 0.1! ΔpolA %Survival anti-LigC1 ΔpolA,prim-polC 0.01! ΔpolA,ligD,prim-polC

0.001! 0! 1! 2! 3! Time of exposure [hr]! c! !"#$%&'

Damage detection & base removal !"#$ Recruitment of repair enzymes

Processing by FPG! PolA & LigA ? Ap 5ʼ 3ʼ

%"#$ ExoIII/ EndoIV! LigC!

Gap filling Abasic site & ligation Prim-PolC! removal P 5ʼ P P

End processing !"##$%&%'()*+,-).$%,/0,123(,45,678,#*2&%*3,"3%9,2',(:23,3("9+0,

!"#$%"&'(&%)'*+"',"%,-"-.#+/'+*',"+.%#/'%0,"%&&#+/'1+/&."(1.&,

Amplicon Forward primer 5'->3' Reverse primer 5'->3' Nth GGAATTCCATATGAGTGCGGGTGCCGCCCGG GCCCAAGCTTCACAAACCGGCCAGCGCCA FPG (SLIC) TTAACTTTAAGAAGGAGATATACCATGCCTGAGC TGGTGGTGCTCGAGTGCGGCCGCAAGCTGACCCGAG TTCCCGAGGT GCACCCGCGGAC MPG GGAATTCCATATGAGCGTCGACCTGCTGAC CGCGGATCCTCAGTCACTGCTGCCGGGCGC ExoIII GGAATTCCATATGGCTTCCCCGCAGACTCT CGCGGATCCTCATGCGCTGAACTCGACCG XthA GGAATTCCATATGCGTGTGGCCACCTGGAA CGCGGATCCTCAGTTCAGTTCGACGAGCACG EndoIV GGAATTCCATATGCTCATTGGCTCGCATGTAG CGCGGATCCCTAACCCGCGTGTTCGCGCA XseB GGAATTCCATATGAAGCCCATTAGTGAACTGG GCCGCTCGAGGTCCTCGTCGGCCTCGCGG PolA GGAATTCCATATGAGCCCCGCCAAGACCGC GCCCAAGCTTCAGTGAGCCGCGGCGTCCCA PolD1 GGAATTCCATATGACAACCATCGCAGTTGC CGCGGCCGCTCAGCGGCGCCACTGTGCGCCCG Prim-PolC GGAATTCCATATGTGATGGCCAGTGCGGCAACC GCCCAAGCTTTCACTGCTTGCGGTTGCCCTGCT LigC1 GGAATTCCATATGGACTTGCCGGTGCAGCC GCCCAAGCTTTCACTGTTCCTCCAGCACGTCGT Prim-PolC GGAGGATCCGAACGATGGCCAGTGCGGCAACCG CCTTAAGCTTCTGCTTGCGGTTGCCCTGCT GFP (SLIC) AA LigC1 GFP TCGCTACTCTCATCGTGGAATCCTGACAGGATCG TGTACAGCGAGGTGATGTCGGCGGCCTTAAGCTTCT (SLIC) CCAGGGAGGATCCGAACGATGGACTTGCCGGTG GTTCCTCCAGCACGTCGT CAGCC '

!"#$%"&'(&%)'*+"'2%/%"-.#+/'+*'2%/%'"%,3-1%$%/.',3-&$#)&'

Amplicon Forward primer 5'->3' Reverse primer 5'->3' !PolD2 (MSMEG_0597) LF AACTGCAGCGACATCGCCGGACGACATC CAAGCTTGAGGATCATCGGCCTGCCG RF GCTGATCGAGATCGCCCGC GAAGATCTAGGCGCCATCACGGTCGG !Prim-PolC (MSMEG_6301) LF TGGAGACCCGCACCCTGATG GGCGAGCGGTTGTTCATCG RF GTCCGGGTTGGTGAACCGG TCATACGACATCCCCGGCGG !LigC2 (MSMEG_6304) LF AGACCAGACAGCCCGAGATCGC TCATACGACATCCCCGGCGG RF CGACGTTCCACCTGACGCC CGTGGGCCTGACCGATCAG !LigC1 (MSMEG_6302),LigC2 AGACCAGACAGCCCGAGATCGC CGAGGCCGTCGACAACAAACG LF RF CGACGTTCCACCTGACGCC CGTGGGCCTGACCGATCAG !LigC1,LigC2,Prim-PolC LF TGGAGACCCGCACCCTGATG GGCGAGCGGTTGTTCATCG RF CGATCTGCTGGCACTCGGC CGTGGGCCTGACCGATCAG ' M. smegmatis! LigD! LigC!

0 1 2 3 5 6 6.99!

LigD! KU! Prim-PolC! LigC1!LigC2!

M. tuberculosis! LigD! LigC!

0 1 2 3 4 4.41 [Mbp]!

M. sp. JSP623! LigD! LigC!

0 1 2 3 4 5 6 7 7.2! pMYCSM02 pMYCSM03!

S. Coelicolor! LigD! LigC! LigD!LigD!

0 1 2 5 6 7 8 8.67!

ATP-Ligase! AEP / Prim-Pol! LigD! Ku! LigD_N!

Supplementary Figure 1. Diversity and genomic localization of Prim-Pol and ATP-dependent DNA ligase genes in Actinobacteria. Approximate distribution of genes (depicted as arrows) encoding Prim-Pols (blue), ATP- dependent DNA ligases (green) and non-homologous end joining elements: multidomain LigD (red), Ku protein (orange) and LigD associated phosphoesterase (yellow); are shown on respective actinobacterial genomes and mobile elements (black nodes). The three separate components of the

LigD complex in Streptomyces are denoted.

kDa! !1 2 3 4 5 6 7 8 9 10 11 12! 250!

150!

100!

75! 50! 37! 25!

20!

15! Supplementary Figure 2. Recombinant proteins purified for this study.

Approximately 2µg of each purified recombinant protein was loaded on a 4-

12% gradient SDS-polyacrylamide gel and resolved in 1 x TGS buffer under routine conditions. Gel was stained with colloidal Coomassie stain. Proteins loaded as follows:! 1. Protein molecular weight marker, 2. LigC1 39,5kDa

+HIS, 3. PrimPolC 39,4kDa +HIS, 4. PolD1 47,6kDa +HIS, 5. PolA 99,9kDa

+HIS, 6. FPG 31,6kDa +C-HIS, 7. MPG 21,5kDa +HIS, 8. Nth 28,8kDa +HIS,

9. Endo4 26,6kDa +HIS, 10. XthA (MSMEG_0829) 29,1kDa +HIS, 11. ExoIII

(MSMEG_1656) 30,1kDa +HIS, 12. XseB 7,8kDa.

a! M. intracellulare! ! !"#$% &'()% *!+%$,!% -.+% ! Mycobacterium sp. JSP623! !"#$% &'()% ! *!+%$,!% M. avium!

! !"#$% &'()% *!+% -.+% b! strong interaction weak/no interaction

[kDa]

50 40 LigC no-tag LigC no-tag

30 FPG-HIS MPG-HIS LigC digestion with thrombin 20 !""""#$""""%"""""&"" !""""#$""""%"""""&"" [kDa]

50 40 LigC-HIS LigC no-tag no interaction no interaction

BSA BSA

FPG-HIS MPG-HIS

!""""#$""""%"""""&"" !""""#$""""%"""""&""

!'()*+""""#$',)-"./0)12/""""%'-*3/4"""""&'5(16)7"899:;"<:<+*=)(5" Supplementary Figure 3. Operonic and functional association of

bifunctional glycosylase FPG with Prim-PolC and LigC homologues in

mycobacteria.

(a) For chosen mycobacteria, genomic regions encoding FPG co-transcribed

with either individual or both Prim-PolC and LigC orthologues, in the vicinity of

base excision repair genes polA and uvrB, are presented. (b) Pull-down assay confirmation of the stable protein-protein interaction between recombinant tag- free LigC (prey protein) and FPG-6xHis (bait protein) from M. smegmatis, with respective controls.

! a

Non-phosphorylated! 5ʼ-℗ gap ! n! 1! 2! 3! 5! 10! ov! ss! n! 1! 2! 3! 5! 10! ov! ss!

b RecD SRAP """"""""""""""""! $$$$$$$$$$$$$$$$$$$!

Prim-PolC LigD RecBCD PolA En4 SRAP #! """"""""""""""""! $$$$$$$$$$$$$$$$$$$!

PolA LigD RecBCD """"""""""""""""! $$$$$$$$$$$$$$$$$$$! En4 Ku

- DNA end blocked with S-bond Supplementary Figure 4. Substrate specificities of Prim-PolC and LigC1.

(a) An EMSA showing the optimal substrate binding specificity of Prim-PolC.

Reaction mixtures composed of 30nM of substrate, with a nick or single stranded gap of various lengths, with or without phosphorylation of the 5' end of the lesion and 300nM Prim-PolC were incubated on ice for 20 minutes and resolved on a 5% native polyacrylamide gel. A filled triangle indicates the unshifted probe, whereas the arrow indicates a discreet bound complex of

PrimPolC and DNA. (b) A schematic representation of proteins isolated from mycobacterial extracts that bound to a variety of different DNA substrates coated onto magnetic beads (black filled circles). Proteins were isolated by pulling-down these beads, washing them extensively and subsequently identified by mass spectrometry analysis.

a!

X=T! X=G! X=C! X=A!

Ø! A! B! Ø! A! B! Ø! A! B! Ø! A! B!

P+rNTP! P+dNTP! P!

dA+A! dC+C! dG+G! dT+U!

!"#$ b! X! )*# )*# !"%+$)*# %&'($ !"#$ //$(01## !"!# !"$%# !"&%# !"'%# !"(%# # !"! !"% )'&*$$!+$$,-./01$ !"!# !"#$ !"$%# !"#$ !"&%# !"#$ !"'%# /# !"#$ 2# /# !"(%# !"#$ !"!)*# ,&-.# !"#$ !"%)*# ,&-.# !"#$ ,&-.# /# !"%+$)*# Supplementary Figure 5.

(a) Competition assay for deoxy- versus ribo- nucleotides. The ratio of deoxy : ribo is 100:1 with 50nM : 500pM in A and 5nM : 50pM in B. The schematic of the substrate is depicted below the gel. The templating base is defined with an X and the complementary deoxy : ribo mix is used as the cofactor in a gap- filling reaction. The activity was assayed as described in the methods with the following exceptions; 30°C incubation for 30 minutes with the products resolved on a 25% denaturing gel. The preference factor, F, is given by F = P

+ rNMP / P + dNMP as determined by densitometry. F was determined for a minimum of 5 experiments at two concentrations and averaged for the four base pairings. (b) LigC1 ligation assays. Ligation reactions were carried out in the Tris-pH 7.5 based buffer containing 300nM of LigC1, 0.1mg/ml BSA, 1mM

DTT with addition of 1mM ATP, 100µM manganese and 5mM magnesium as cofactors for 45 min at 37°C or up to 2 hours for LigD substrate (D:R+1bl).

Substrates used for ligation are depicted as a cartoon on the right side of panel b. Phosphorylation of DNA ends is marked with a red circle and green and blue diamonds represent ribo- and deoxyribonucleotides, respectively.

a Loop 3 b Loop 1 8! C ! !8! C P334 K324

Loop 2 W220 Y318 P317 M325 P323 W219

N Y322 S95 P94 R224 F93 K221

c Loop 3 Loop 3 Loop 1 d Loop 1 C C !8! !8!

Loop 2 Loop 2 R63 R63

K23 K23

P65 N20 P65 N20 N N

K35 K35 Supplementary Figure S6. Structural differences between Prim-PolC and Prim-PolD. (a) Superposition of Msm Prim-PolC (sky blue) and Mtu Prim-

PolD (lemon, PDBID: 3PKY). Loop 1 and Loop 2 are shown in dark blue and cyan, respectively. The loops occupy differing positions, reinforcing the idea that the functions of the conserved structural elements have developed to stabilize the Loop 3 element (pink). (b) Loop 1 and Loop2 residues involved in stabilising Loop 3 are highlighted along with the conserved residues of Loop

2. (c) Superposed DNA as in Figure 5D. In this instance, the full path of the template DNA is shown with the DNA (red) that clashes with Loop 3 in translucent form. The conserved residues involved in phosphate binding and

DNA interaction are highlighted. (d) As C, but the superposed DNA (red), UTP

(green) and catalytic metal ions (magenta) are from the Mtu Prim-PolD pre- ternary structure (PDBID: 3PKY). This representation illustrates the catalytic core is structurally conserved.

!" Supplementary Figure S7. Stereo view of electron density for Loop 3 of

Prim-PolC. A stick representation of the C-terminal residues G313-P333 that make up Loop 3 (pink). Density from a weighted 2Fo-Fc map scaled at 1.1σ is also depicted in grey. Loop 1 and Loop 2 residues are coloured blue and cyan, respectively.

5mM H2O2! 50mM tert-Butyl hydroperoxide! 1.00E+08! 1.00E+08!

1.00E+07! 1.00E+07! mc2 155 1.00E+06! !primPolC 1.00E+06! !"#$%& !"#$%& !ligC1C2 1.00E+05! 1.00E+05! !ligD 1.00E+04! !ligD,primPolC 1.00E+04! !polA 1.00E+03! 1.00E+03! 0! 1! 2! 3! 0! 1! 2! 3! time of exposure [h]! time of exposure [h]!

Supplementary Figure 8.

Plots of CFU data for phenotypes of mutants lacking Prim-Pols and ATP- dependent DNA ligases treated with inorganic and organic hydroperoxides:

5mM H2O2 – hydrogen peroxide and 50mM tert-butyl hydroperoxide. Cells were plated at 1, 15, 30 or 60 min after treatment.

! Supplementary Figure 9. Uncropped versions of the western blots and gels used in this study are shown in the panels below.!

Figure 1b

+ Interactor + Anti- LigC1 + Interactor + Anti- Prim-PolC

XseB BSA PolD2 Ku Nth XseB BSA PolD2 Ku Nth

EndoIV Prim-PolC LigC1 FPG EndoIV LigC1 Prim-PolC FPG

XthA ExoIII LigC1 MPG PolA XthA ExoIII Prim-PolC MPG PolA - Interactor + Anti- LigC1 - Interactor ! + Anti- Prim-PolC

Ku Nth XseB BSA PolD2 Ku Nth XseB BSA PolD2

EndoIV LigC1 EndoIV Prim-PolC LigC1 FPG Prim-PolC FPG

ExoIIIA ExoIII LigC1 MPG PolA ExoIIIA ExoIII Prim-PolC MPG PolA Figure 2a Figure 2b

1nt Gap/5ʼ-℗ 3ʼ Ov/5ʼ-℗ 1nt Gap 1nt Gap/5ʼ-℗ Prim-PolC Prim-PolC Ø Prim-PolC Prim-PolD Ø Prim-PolC Prim-PolD

Figure 2c Figure 2d

pss-0 pss-4 pss-6 pss-0 pss-4 pss-6

Ø Ø Ø Ø Ø Ø

50 nt

50 nt Figure 3a

rNTPs rNTPs/5ʼ-℗ dNTPs dNTPs/5ʼ-℗ Ø n 1 2 3 5 ov n 1 2 3 5 Ø n 1 2 3 5 ov n 1 2 3 5 +4 +3 +2 +1 0

Figure 3b

rNTPs rNTPs/5ʼ-℗ dNTPs dNTPs/5ʼ-℗ Ø n 1 2 3 5 ov n 1 2 3 5 n 1 2 3 5 ov n 1 2 3 5

P

+3 +2 +1 0 Figure 4b

Figure 4c Figure 6a

E E/S S! Anti-Ku!

! Anti-Prim-PolC!

C! Anti-LigC1! Supplementary Figure 2

kDa! !1 2 3 4 5 6 7 8 9 10 11 12! 250!

150!

100!

75! 50! 37! 25!

20!

15! Supplementary Figure 3b

strong interaction weak/no interaction

[kDa]

50 40 LigC no-tag LigC no-tag

30 FPG-HIS MPG-HIS LigC digestion with thrombin 20 "!!!!#$!!!!%!!!!!&!! "!!!!#$!!!!%!!!!!&!! [kDa]

50 40 LigC-HIS LigC no-tag no interaction no interaction

BSA BSA

FPG-HIS MPG-HIS

"!!!!#$!!!!%!!!!!&!! "!!!!#$!!!!%!!!!!&!!

L-load FT-flow through W-wash6 E-elution 300mM imidazole! Supplementary Figure 4a

Non-phosphorylated! 5ʼ-℗ gap ! n! 1! 2! 3! 5! 10! ov! ss! n! 1! 2! 3! 5! 10! ov! ss! Supplementary Figure 5a

X=T X=G X=C X=A

Ø A B Ø A B Ø A B Ø A B

P+rNTP P+dNTP P

dA+A! dC+C! dG+G! dT+U!

Supplementary Figure 5b ! ,- ,-! '()*+,-! "#$%! ..+012!! '('! '(+)! '(3)! '(4)! '(0)! '(' '() &$#'!!()!!*+,-./!

.! /! .!

.!