bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1 A nucleotide resolution map of Top2-linked

2 DNA breaks in the yeast and 3 4 William Gittens, Dominic J. Johnson, Rachal M. Allison, Tim J. Cooper, Holly Thomas, and 5 Matthew J Neale 6 7 Genome Damage and Stability Centre, University of Sussex, UK, BN1 9RQ 8 9 Corresponding author: [email protected] 10 11

12 Abstract

13 DNA topoisomerases are required to resolve DNA topological stress. Despite this essential

14 role, abortive topoisomerase activity generates aberrant -linked DNA breaks,

15 jeopardising genome stability. Here, to understand the genomic distribution and

16 mechanisms underpinning topoisomerase-induced DNA breaks, we map Top2 DNA

17 cleavage with strand-specific nucleotide resolution across the S. cerevisiae and human

18 genomes—and use the meiotic Spo11 protein to validate the broad applicability of this

19 method to explore the role of diverse topoisomerase family members. Our data

20 characterises Mre11-dependent repair in yeast, and defines two strikingly different fractions

21 of Top2 activity in humans: tightly localised CTCF-proximal and broadly distributed

22 transcription-proximal. Moreover, the nucleotide resolution accuracy of our assay reveals

23 the influence primary DNA sequence has upon Top2 cleavage—for the first time

24 distinguishing canonical DNA double-strand breaks (DSBs) from a major population of DNA

25 single-strand breaks (SSBs) induced by etoposide (VP16) in vivo.

26

27

28 Introduction

29 DNA topoisomerases are a broad and ubiquitous family of enzymes that tackle topological

30 constraints to replication, transcription, the maintenance of genome structure and

31 segregation in mitosis and meiosis (Pommier et al., 2016). Although the

32 specific mechanisms by which this is accomplished vary considerably across the family,

33 key aspects are shared: including single or double-strand DNA cleavage to form a transient

34 covalent complex (CC), which allows alteration of the topology of the nucleic acid substrate

35 prior to relegation (Wang, 2009). These processes are essential but carry with them a

36 significant risk to genome stability because the CC may be stabilised as a permanent bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

37 protein-linked DNA break by several physiological factors; such as the proximity of other

38 DNA lesions (Sabourin and Osheroff, 2000; Kingma et al., 1995; Khan et al., 2003; Vélez-

39 Cruz et al., 2005; Deweese and Osheroff, 2009; Wang et al., 2000; Wang et al., 1999), the

40 collision of transcription and replication complexes (Wu and Liu, 1997; Hsiang et al., 1989),

41 denaturation of the topoisomerase (Liu et al., 1983), or by the binding of small molecules

42 that inhibit religation (Reviewed in (Nitiss, 2009)). Topoisomerase-induced DNA strand

43 breaks have been proposed to constitute a significant fraction of the total damage to

44 genomic DNA per day, and have been linked to the genesis and development of various

45 cancers (Lin et al., 2009; Haffner et al., 2010; Sasanuma et al., 2018), including a subset of

46 therapy-related acute myeloid leukaemias (t-AML) caused by the use of Topoisomerase 2

47 (Top2) poisons in the chemotherapeutic treatment of primary cancers (Rowley and Olney,

48 2002; Cowell et al., 2012).

49

50 In S. cerevisiae, either Top1 or Top2 activity is sufficient to support transcription (Brill et al.,

51 1987; Lotito et al., 2008; Bermejo et al., 2009). However, whilst top1∆ cells are viable, Top2

52 is essential for sister chromatid segregation (Baxter and Diffley, 2008; DiNardo et al., 1984;

53 Holm et al., 1985; Uemura and Yanagida, 1984). In contrast to S. cerevisiae, all known

54 vertebrate species encode two Top2 (TOP2α and TOP2β). Interestingly, whilst

55 TOP2α is essential for cellular proliferation—cells arrest in mitosis in its absence (Akimitsu

56 et al., 2003)—TOP2β is not. Rather, TOP2β apparently plays an important role in promoting

57 transcriptional programmes associated with neuronal development (Tiwari et al., 2012;

58 Gómez-Herreros et al., 2014), a function that cannot be supported by TOP2α.

59

60 Accurate maps of the positions of topoisomerase-DNA covalent complexes (Top2 CCs)

61 genome-wide and throughout the cell cycle provide insights into topological genome

62 structural organisation, as well as providing the tools to study their repair should they

63 become permanent DNA lesions. Whilst the dual catalytic sites present within the

64 homodimeric Top2 enzyme suggest a primary role in DNA double-strand break (DSB)

65 formation driven by a biological need for decatenation (Pommier et al., 2016), previous

66 research has suggested that etoposide and other Top2 poisons may induce a population of

67 DNA single-strand breaks (SSBs), due to independent inhibition of each active site

68 (Wozniak and Ross, 1983; Long et al., 1986; Zwelling et al., 1981; Pommier et al., 1984;

69 Bromberg et al., 2003). Yet, despite the use of etoposide in chemotherapy, and the different

70 toxicity of these two classes of DNA lesion, the prevalence of Top2-SSBs remains unclear.

71 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

72 More recently, general DSB mapping techniques have been applied to map etoposide-

73 induced lesions (Canela et al., 2017; Crosetto et al., 2013; Yan et al., 2017). However, such

74 methods are only able to map DSB ends, which therefore excludes and obscures

75 etoposide-induced SSBs. Moreover, these methodologies lack specificity for protein-linked

76 DNA ends, and require nucleolytic processing to blunt 5’ DNA termini as part of sample

77 preparation (Canela et al., 2017). When combined, these factors lead to loss of nucleotide

78 resolution at the site of Top2 cleavage.

79

80 To generate a more complete, and direct, picture of topoisomerase action across the

81 genome, we present here a technique for nucleotide resolution mapping of protein-linked

82 SSBs and DSBs (referred to collectively here as “Covalent Complexes”; CC), which we

83 demonstrate to be generally applicable in both yeast and human cell systems. We establish

84 with high confidence that the technique is able to map both the positions of Top2 CCs and

85 also CCs of DNA linked to the meiotic recombination protein Spo11, which is related to the

86 archaeal Topoisomerase VI (Bergerat et al., 1997). Furthermore, we provide insights into the

87 spatial distribution of human TOP2 CCs, including comparative analysis of their local

88 enrichment around transcription start sites (TSSs) and CTCF-binding motifs. We find that

89 TSS-proximal TOP2 CC levels are strongly correlated with transcription, a question that has

90 come under scrutiny recently (Canela et al., 2017). Finally, we present compelling evidence

91 that etoposide induces a majority of Top2-linked SSBs in vivo, corroborating previous

92 research and revealing, for the first time, that this is governed by primary DNA sequence at

93 the cleavage site.

94

95 Results

96 CC-seq enriches Spo11-linked DNA fragments

97 To elucidate the in vivo functions of topoisomerase-like enzymes, we set out to establish a

98 direct method (termed ‘CC-seq’) to enrich and map protein-DNA covalent complexes (CC)

99 genome-wide with nucleotide resolution. Silica fibre-based enrichment of CCs has been

100 used to study the adenovirus DNA terminal protein complex (Coombs and Pearson, 1978),

101 and in the original identification of the meiotic DSB-inducer Spo11 (Keeney and Kleckner,

102 1995; Keeney et al., 1997), an orthologue of archaeal Topoisomerase VI (Bergerat et al.,

103 1997). We first verified this enrichment principle using meiotic sae2Δ cells, in which Spo11-

104 linked DSBs are known to accumulate at defined loci due to abrogation of the nucleolytic

105 pathway that releases Spo11 (Keeney and Kleckner, 1995; Neale et al., 2005). To

106 demonstrate specific enrichment of protein-linked molecules, total genomic DNA from bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

107 meiotic S. cerevisiae cells was isolated in the absence of proteolysis, digested with PstI

108 restriction enzyme, and isolated on glass-fibre spin columns (Figure 1A, Methods). We

109 used eluted material to assay a known Spo11-DSB hotspot by Southern blotting (Figure

110 1B). While DSB fragments are a minor fraction of input material (~10% of total), and were

111 absent in wash fractions, DSBs accounted for >99% of total eluted material, indicating

112 ~1000-fold enrichment relative to non-protein-linked DNA (Figure 1B).

113

114 CC-seq maps known Spo11-DSB hotspots genome-wide with high reproducibility

115 To generate a genome-wide map of Spo11-DSBs, genomic DNA from meiotic sae2∆ cells

116 was sonicated to <400 bp in length, enriched upon the silica column, eluted, and ligated to

117 DNA adapters in a two-step procedure that utilised the known phosphotyrosyl-unlinking

118 activity of mammalian TDP2 to uncap the Spo11-bound end ('CC-seq'; Figure 1A /

119 Methods) (Cortes Ledesma et al., 2009; Johnson et al., 2019). Libraries were paired-end

120 sequenced and mapped to the S. cerevisiae reference genome (Table S1) alongside reads

121 from a previous mapping technique ('Spo11-oligo-seq') that relies on the isolation of

122 Spo11-linked oligonucleotides generated in wild-type cells during DSB repair (Pan et al.,

123 2011). CC-seq revealed sharp, localised peaks ('hotspots') in SPO11+ cells (Figure 1C,

124 middle) that visually (Figure 1C) and quantitatively (Figure 1S1A, r=0.82) correlate with

125 Spo11-oligo seq, and that are absent in the spo11-Y135F control strain (Figure 1C,

126 bottom) in which Spo11-DSBs do not form (Bergerat et al., 1997), demonstrating that

127 signal detected by CC-seq accurately reflects the distribution of Spo11 cleavages.

128 Moreover, CC-seq maps generated from independent biological samples were highly

129 correlated (Figure S1B; r = 0.98), demonstrating very high reproducibility. Finally, when

130 analysed at nucleotide resolution, CC-seq revealed high correlation (r = 0.85) between the

131 frequency of Watson-mapping and Crick-mapping cleavage sites, when offset by a single

132 bp (Figure 1E and S1C). This correlation is expected because of the 2 bp 5' overhang

133 generated at Spo11-DSBs in vivo (Fig1D; e.g. (Liu et al., 1995))—demonstrating the

134 nucleotide resolution accuracy of CC-seq, which is further supported by our observation of

135 nucleotide composition preference at the site of cleavage (Figure 1E).

136

137 CC-seq enriches and maps Top2-linked DNA fragments from S. cerevisiae

138 Confident in the specificity of CC-seq to map meiotic protein-linked DSBs with high

139 resolution and dynamic range, we next employed CC-seq to map within S. cerevisiae the

140 covalent complexes generated naturally by Topoisomerase 2 (Top2) that become stabilised

141 upon exposure to etoposide, and are thus a proxy for sites of Top2 catalytic activity bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

142 (Minocha and Long, 1984; Chen et al., 1984; Wu et al., 2011). Because S. cerevisiae is

143 relatively insensitive to etoposide, we utilised a strain background ('pdr1∆') in which the

144 action of the major drug export pathways are downregulated due to modulation of PDR1

145 activity (Stepanov et al., 2008). As expected, the pdr1∆ background dramatically increased

146 cellular sensitivity to etoposide, which was further enhanced by mutation of SAE2 and

147 MRE11 (Figure 2A), factors involved in the repair of covalent protein-linked DNA breaks

148 (Neale et al., 2005; Garcia et al., 2011; Anand et al., 2016; Cannavo and Cejka, 2014;

149 Cannavo et al., 2018; Hoa et al., 2016; Hartsuiker et al., 2009).

150

151 Next, we prepared sequencing libraries from mitotically growing etoposide-treated control

152 (pdr1∆) S. cerevisiae cells, or cells that also harboured sae2∆ and mre11∆ mutations (Table

153 S2). A human DNA spike-in was included to enable calibration of relative signal between

154 strains (Methods). Libraries displayed high replicate-to-replicate reproducibility (r = 0.89,

155 0.95 and 0.97 for control, sae2∆ and mre11∆, respectively; Figures S2A-C), therefore

156 replicates were pooled. In untreated control cells, Top2 CC signal was distributed

157 homogeneously across the genome, with few strong peaks (Figure 2B). Upon etoposide

158 treatment, sharp single-nucleotide peaks arose at similar locations across the genome in all

159 strains (Figure 2B and S2D-E), but with increased amplitude in the repair mutants—thereby

160 directly linking the MRX pathway to repair of Top2 CCs in S. cerevisiae, and corroborating

161 prior research in other organisms (Hartsuiker et al., 2009; Hoa et al., 2016; Nakamura et al.,

162 2010).

163

164 Global analyses revealed Top2 activity in S. cerevisiae to be relatively enriched in divergent

165 and tandem intergenic regions (IGRs) and depleted in intragenic and convergent IGRs

166 (Figure 2B and C), similar to, but less pronounced than, the patterns of Spo11 (Figure 2B

167 and C). Such connections to global organisation are likely driven by the need for

168 Spo11 and Top2 to interact with the DNA helix—an interpretation underpinned by an

169 anticorrelation with nucleosome occupancy (Figure S1E and S3A). Nevertheless, the fact

170 that S. cerevisiae Top2 CC signal is not correlated with proximal gene expression—neither

171 in untreated cells, nor following etoposide exposure (Figure S3B)—suggests that in the

172 gene-dense S. cerevisiae genome topological stress may be dealt with by Top2 at sites

173 dislocated from where it is generated. By contrast, Spo11 activity is positively correlated

174 with gene expression (Figure S1D), perhaps due to the influence transcription has upon the

175 generation of higher order chromatin loop structures (Schalbetter et al., 2018), which in turn

176 are known to pattern Spo11-DSBs (Sun et al., 2015). bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

177

178 Interestingly, Top2 CC signal remained weakly correlated over greater distances than

179 Spo11 (Figure 2D), suggesting localised 5-10 kb wide regions of enriched activity—much

180 larger than Spo11-DSB hotspots (generally <500 bp in size; Figure 1C; (Pan et al., 2011)).

181 Top2 activity was strongly correlated between Watson and Crick strands when offset by 3

182 bp (Figure 2E and S2F-K)—as expected for the 4 bp overhang generated by Top2-DSBs—

183 further demonstrating the nucleotide resolution accuracy of CC-seq. Together these data

184 describe how CC-seq reveals a strand-specific nucleotide resolution map of etoposide-

185 induced Top2 CCs.

186

187 CC-seq maps TOP2 CCs in the Human genome

188 Having demonstrated that CC-seq is applicable for mapping two divergent types of

189 topoisomerase-like covalent complexes in S. cerevisiae, we applied CC-seq to map TOP2

190 activity in a human cell system. Asynchronous, sub-confluent RPE-1 cells were treated with

191 etoposide (VP16) in the presence of MG132 proteasome inhibitor, in order to limit potential

192 proteolytic degradation of TOP2 CCs that might otherwise hamper enrichment (Figure 1A).

193 Slot-blotting and immunodetection with anti-TOP2β antibody demonstrated complete

194 recovery of input TOP2β-linked CCs within the column elution, with no TOP2β remaining in

195 the flow through or being removed by the washes (Figure 3A). Eluted material was used to

196 generate sequencing libraries from three replicate control (–VP16) and four replicate

197 etoposide-treated samples (+VP16), and high-depth paired-end sequencing reads (Table

198 S2) were aligned to the human genome (hg19; Methods).

199

200 Replicates displayed high correlation in the distribution of TOP2 CC signal at broad scale (r

201 values ≥ 0.79; Figure S4A-B), and so were pooled. Visual inspection revealed that -VP16

202 and +VP16 signals are spatially-correlated (Figure 3B), but that +VP16 signal intensity is

203 less uniform, with more dynamic range (Figure S4C-D). These differences are likely due to

204 higher signal-to-noise enabled by enrichment of TOP2 CC following VP16 treatment. Like

205 yeast Top2 libraries (Figure 2E), nucleotide resolution analysis of human CC-seq libraries

206 displayed a skew towards Watson-Crick read-pairs that are offset by 3 bp, as expected for

207 the 4 bp 5’ overhang generated by TOP2 (Figure S4E). This skew is much greater in VP16-

208 treated samples (Figure S4E), supporting the view that +VP16 libraries have greater Top2

209 CC enrichment.

210 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

211 The human genome encodes two type-IIA topoisomerases, TOP2α and TOP2β

212 (Introduction). To demonstrate the specificity of CC-seq to map human TOP2 activity we

213 used CRISPR-Cas9 to generate a human RPE-1-derived cell line with homozygous

214 knockout mutations in the non-essential TOP2B gene, and arrested cells in G1 (Figure S5A)

215 when TOP2α is not expressed (Heck et al., 1988; Woessner et al., 1991; Negri et al., 1992).

216 TOP2α and TOP2β expression was undetectable in such G1-arrested cells (Figure S5B),

217 and importantly, VP16-induced ɣ-H2AX foci were reduced ~7-fold relative to wild type

218 control lines, indicating that most of the signal is TOP2β-dependent (Figure S5C-D).

219

-/- 220 Next, we applied CC-seq to asynchronous and G1-arrested wild type and TOP2B cells. In

221 G1, VP16 exposure induced localised regions of enriched TOP2 CC formation in wild type

-/- 222 cells that were largely diminished or absent in TOP2B cells (Figure S5E). By comparison,

223 TOP2β deletion did not prevent VP16-induced signal in asynchronous cells (Figure S5E), in

224 which TOP2α is still present. Together, these results indicate that CC-seq detects a mixture

225 of both TOP2α and TOP2β covalent complexes in wild type human cells depending on the

226 cell cycle phase.

227

228 Spatial distribution of Top2 activity in the human genome

229 The human genome is divided into a relatively gene-rich ‘A’ compartment and a relatively

230 gene-poor ‘B’ compartment, the 3D spatial segregation of which can be determined by HiC

231 (Lieberman-Aiden et al., 2009). In both untreated and VP16-treated asynchronous wild type

232 cells, TOP2 CC are enriched in the active A compartment (Figure 3B and C), consistent

233 with the role of TOP2 in facilitating transcription. TOP2 activity also correlated with regions

234 of negative DNA supercoiling (Naughton et al., 2013) (Figure 3B). These observations

235 provide a functional link between TOP2 activity, DNA topological stress, and large-scale

236 chromatin compartments.

237

238 We next focused our attention in more detail on the genome-wide pattern of TOP2 activity

239 enriched by VP16 treatment. Simple visual inspection (Figure 3D) suggested that TOP2 CC

240 signals are enriched around genome features previously identified as regions of high TOP2-

241 binding and etoposide-induced DNA strand breakage, including sites of CTCF-binding

242 (Uusküla-Reimand et al., 2016; Canela et al., 2017) and active transcription start sites

243 (TSSs) marked by H3K4Me3 and H3K27Ac (Baranello et al., 2014; Yang et al., 2015).

244 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

245 Genomic maps of etoposide-induced DNA breaks were recently generated using a related

246 methodology, ‘END-seq’ (Canela et al., 2017), that is not specific for protein-linked TOP2

247 CC. Aggregating END-seq data around strong CC-seq positions revealed concordant peak

248 signals at broad scale (Figure 3E), demonstrating a general agreement in the positions

249 reported by the two techniques. However, at finer scale, END-seq positions are offset by 1-

250 15 bp in the 3’ direction relative to CC-seq positions on each strand (Figure 3F)—

251 something not observed when aggregating CC-seq data around the same loci (Figure 3F).

252 This important difference is likely to be explained by the different populations of DNA

253 breaks mapped by each technique: specific TOP2 cleavage positions (CC-seq) versus DNA

254 breaks that have undergone limited nucleolytic resection (END-seq).

255

256 To investigate the relationship between CTCF and TOP2 activity, we filtered CTCF motifs to

257 include only those that overlapped an RPE-1 CTCF ChIP-seq peak (Methods; (Akimitsu et

258 al., 2003)). After aligning Watson and Crick motifs in the same orientation, we frequently

259 observed a strong TOP2 CC peak upstream of the motif centre and accompanied by more

260 distal, weaker peaks on both sides of the motif (Figure 4A upper panel, and S6A). Loci

261 containing multiple CTCF-binding motifs display more complex TOP2 CC patterns (Figure

262 4A upper panel, and S6B) with enrichment on both sides of the double motif, as would be

263 expected from the aggregation of a heterogeneous population of CTCF and TOP2 activity

264 present across different cells.

265

266 We next stratified CTCF motifs into quantiles based on proximal CTCF binding intensity,

267 and aggregated CC-seq signals centred on these loci (Figure 4A, middle and bottom

268 panels). Most Top2 CC signal is concentrated in two peaks located ±~54 bp relative to the

269 centre of the CTCF motif, with the upstream peak ~2-fold more intense (Fig 4A, middle

270 and bottom panels). Remaining Top2 CC signal is focused in a flanking array of weaker

271 peaks. This signal pattern has been reported recently for VP16-induced DSBs (Canela et

272 al., 2017; Gothe et al., 2018), and has been suggested to result from a role for TOP2 in

273 facilitating genome organisation by cohesin-dependent loop extrusion operating in the

274 context of CTCF and other boundary elements. In support of this interpretation, total

275 aggregated TOP2 CC signal across these regions is positively correlated with CTCF

276 occupancy as measured by ChIP-seq (Figure 4A, middle and bottom); interestingly

277 however, TOP2 CCs were not correlated with interaction strength of associated loops

278 (Figure S6C), annotated previously by Hi-C (Darrow et al., 2016). The positions of TOP2 CC

279 surrounding CTCF motifs anticorrelate with the well-positioned nucleosomes at these sites bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

280 (Figure 4B). However, such anticorrelation with nucleosome positioning is not a unique

281 feature of CTCF loci, but is instead a general property of all TOP2 CC sites (Figure 4C),

282 consistent with maps of both Top2 CC (Figure S3B) and Spo11 (Figure S1E; (Pan et al.,

283 2011)) in S. cerevisiae.

284

285 In human cells, etoposide-induced TOP2 CCs are also enriched around gene TSSs (Figure

286 5), typically concentrated in broad peaks ~1-2 Kbp downstream and ~0.5-1 Kbp upstream

287 of the TSS (Figure 5A; (Yang et al., 2015)). Similar to elsewhere in the genome, in the region

288 immediately downstream of the TSS (0-400 bp), TOP2 CC signal anticorrelated with

289 nucleosome occupancy (Figure S6D). Stratifying TSSs based on gene expression

290 microarray data (Ganem et al., 2014) revealed a strong positive correlation between TSS-

291 proximal TOP2 CC enrichment and gene expression (Figure 5A, middle and bottom)—

292 something not identified when prior etoposide-induced DSB maps were compared to

293 nascent transcription as measured by GRO-Seq (Canela et al., 2017). Thus, while human

294 TOP2 activity is correlated with total transcription of the gene, it is not proportional to local

295 RNA polymerase activity when assayed at finer scale—suggesting a model where, as

296 postulated for S. cerevisiae (Figure S3A), superhelical stress is resolved at sites dislocated

297 from the process that generates them. Indeed, when analysed with an identical pipeline,

298 both TSS-proximal TOP2 CCs (Figure 5A) and etoposide-induced DSBs (Canela et al.,

299 2017) positively correlate with gene expression (Figure S6E), indicating that this correlation

300 is not due to technical differences in mapping methodologies. Based on the twin-domain

301 model of DNA supercoiling, topological stress is expected to increase not just with gene

302 expression level, but also with increasing gene length (Liu and Wang, 1999). In support of

303 this idea, we observe a strong positive correlation between human TOP2 CC enrichment

304 and gene length (Figure S7A, top) independent from gene expression (Figure S7A,

305 bottom). Interestingly, no such correlation with gene length is present in S. cerevisiae

306 (Figure S7B), further supporting a model whereby, in this compact gene-dense genome,

307 Top2 activity in the IGR is neither directly linked to expression nor length of the proximal

308 gene.

309

310 Notably, the increased density of etoposide-induced human TOP2 CCs at strongly

311 expressed is 2.7-fold greater than at strongly bound CTCF motifs (Figure 5B

312 and C). Thus, whilst strongly-bound CTCF motifs (14,924) greatly outnumber active genes

313 (4,194)—leading, globally, to more TOP2 activity in total at CTCF sites (Figure 5D)—our

314 data reinforce the important role of topoisomerase activity during transcription. bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

315

316 Local DNA sequence and methylation status direct the formation of Top2 CCs

317 Previous reports suggest that etoposide induces a majority of SSBs, rather than DSBs, due

318 to independent poisoning of each active site in the Top2 homodimer (Bromberg et al.,

319 2003). In our yeast and human datasets, we similarly observe more single-strand Top2 CCs

320 (90.1% and 94.2%, respectively) than double-strand Top2 CCs (9.9% and 5.8%,

321 respectively) following etoposide exposure (Figure 6A). Interestingly, MRE11 deletion

322 increased the proportion of double-strand Top2 CCs (Figure S3C 12.9%, z-test p-value <

-16 323 2.2 x 10 ), consistent with the role of MRN/X in protein-linked DNA repair (Neale et al.,

324 2005; Garcia et al., 2011; Anand et al., 2016; Cannavo and Cejka, 2014; Cannavo et al.,

325 2018; Hoa et al., 2016; Hartsuiker et al., 2009).

326

327 Prior, fine-scale mapping of Top2 cleavage at specific substrates indicated that cleavage

328 patterns are influenced by local DNA sequence composition (Pommier et al., 1991;

329 Borgnetto et al., 1996; Strumberg et al., 1999; Wu et al., 2011). To investigate the generality

330 of such findings, and to precisely determine how DNA sequence influences Top2 site

331 preference across the yeast and human genome, we computed the nucleotide resolution

332 average base composition around Top2 CC SSB and DSB sites (Fig 6B)—something made

333 possible by the positional accuracy of our mapping technique. For DSBs, average DNA

334 sequence patterns are rotationally symmetrical around the dyad axis of cleavage (Figure

335 6B, left), consistent with the known homodimeric nature of eukaryotic Top2 (Pommier et

336 al., 2016). Base positions with notable skews are restricted to the region ±11 bp from the

-4 337 dyad axis (p<10 , Chi-squared test), consistent with the DNA contacts made by Top2 (Lee

338 et al., 1989; Thomsen et al., 1990; Wu et al., 2011), yet with minor, species-specific

339 differences at a few positions (±2, ±7, ±9; Figure 6C, top), suggesting subtle differences in

340 the ScTop2 and HsTOP2 DNA binding surface. Positions -3 and +3 are the bases

341 immediately 5’ to the scissile phosphodiester bonds cleaved by Top2 (Figure 6B). In the

342 averaged DSB sequence motif, strong preference for cytosine at these positions (69.0% of

343 DSBs) on the scissile strand (Figure 6B, left) agrees with in vitro studies of Top2

344 mechanism—attributed to favourable base-stacking interactions of etoposide with

345 guanosine at the -1 position on the non-scissile strand (Pommier et al., 1991; Strumberg et

346 al., 1999; Wu et al., 2011).

347

348 Importantly, whilst SSBs display similar overall symmetrical sequence features as DSBs,

349 SSBs differed substantially at position +3, where the strong preference for guanine on the bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

350 scissile strand (and paucity of cytosine on the non-scissile strand) is absent (Figure 6B,

351 right, and Figure 6C, bottom). Furthermore, the preference for cytosine at the -3 position

352 on the scissile strand increases to 88.1% for SSBs. This clear difference in sequence bias

353 provides strong evidence that such positions truly reflect sites of preferred VP16-induced

354 SSBs vs DSBs in vivo.

355

356 The human genome is subject to methylation of cytosine (meC) at CpG sites. Dramatically,

357 meC completely abolishes the formation of VP16-induced TOP2 CCs (Figure 6D),

358 suggesting this bulky group may interfere sterically with the VP16-Top2-DNA ternary

359 complex, modifying TOP2 catalysis. Collectively, these observations demonstrate that local

360 DNA sequence composition—including epigenetic marks—are major determinants of both

361 TOP2 CC abundance and the differential generation of SSBs and DSBs by VP16 in vivo.

362

363

364 Discussion

365 Topoisomerases are ubiquitous gatekeepers to the genetic material—facilitating changes to

366 DNA topology, nuclear organisation, and chromosome segregation that are essential for the

367 generation and propagation of all life on Earth. Yet, topoisomerase dysfunction can also

368 prove harmful, generating difficult to repair protein-linked DNA breaks that create risks to

369 genome stability. Thus, understanding how, where, and when topoisomerase activity takes

370 place is of great importance. Here, we demonstrate a robust, technically simple and

371 sensitive method to map sites of preferred type IIA topoisomerase activity across the

372 eukaryotic genomes of yeast and human cells at strand-specific nucleotide resolution. Our

373 methodology joins an extensive toolkit of complementary NGS solutions (Canela et al.,

374 2017; Crosetto et al., 2013; Yan et al., 2017), each with its own set of unique advantages.

375 However, many of these techniques do not achieve nucleotide resolution, either due to the

376 use of nucleolytic processing to blunt 5’ DNA termini (Canela et al., 2017), and/or inability to

377 directly map DNA breaks covalently linked to protein (Canela et al., 2017; Crosetto et al.,

378 2013; Yan et al., 2017). Here, by utilising the tyrosyl phosphodiesterase activity of

379 recombinant Tdp2 to unblock 5’-DNA termini non-nucleolytically, we are able to accurately

380 map the position of not just Top2 CCs, but also related classes of topoisomerase-like

381 enzymes such as Spo11, which acts uniquely in meiotic cells. Importantly, such specificity

382 in our cleavage maps enable us to corroborate and extend—with completely independent

383 methodology—prior research findings from meiosis (Pan et al., 2011), the impact that VP16

384 has on SSB versus DSB formation—and for the first time demonstrate the influence that bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

385 primary DNA sequence composition and methylation status has on the process of Top2

386 catalysis in vivo in both yeast and human cells.

387

388 Top2 activity has long been associated with transcriptional activity (Yang et al., 2015; Brill

389 et al., 1987; Baranello et al., 2014; Yu et al., 2017). In addition, recent research has

390 identified etoposide-induced DSBs that colocalise with a subpopulation of chromatin-

391 bound TOP2 on the loop-external side of CTCF- and Rad21-bound CTCF motifs (Canela et

392 al., 2017; Uusküla-Reimand et al., 2016). TOP2 activity at these sites has been proposed to

393 facilitate genome organisation driven by chromatin loop extrusion, and thereby to constitute

394 a major fraction of the total nuclear TOP2 activity (Canela et al., 2017). Moreover, contrary

395 to expectations, the frequency of etoposide-induced DSBs was found to be locally

396 uncorrelated with nascent RNA levels (Canela et al., 2017)—generating some confusion in

397 the understanding of where TOP2 activity is most prevalent and most important. Here, our

398 independent CC-seq method and analyses permit us to revisit this question in explicit

399 detail, and clarify the role of TOP2 in these processes. Importantly, whilst Top2 activity is

400 certainly highly enriched at CTCF-bound genomic loci, the total amount of activity at any

401 given CTCF locus is many times lower than observed at transcriptionally active genes.

402 Nevertheless, it is critical to emphasise that because of the great abundance of CTCF-

403 bound sites compared to active genes, total aggregated TOP2 activity is globally similar for

404 CTCF-associated versus transcriptionally active regions of the genome.

405

406 Our data also permit us to investigate how other genomic features influence Top2 activity.

407 For example, at both TSS and CTCF-proximal regions—and indeed in the wider genome of

408 both yeast and human—Top2 cleavages are highly enriched in nucleosome-free DNA, in

409 support of prior findings made at a subset of genomic loci (Capranico et al., 1990; Udvardy

410 and Schedl, 1991; Käs and Laemmli, 1992). Notably, whilst yeast Top2 activity is also

411 enriched near TSSs, it is concentrated immediately (120-140 bp) upstream, unlike human

412 TOP2 activity, which is primarily concentrated in a broader peak 1-2 kb downstream, within

413 the gene body. Furthermore, unlike human TOP2, yeast Top2 activity is uncorrelated with

414 downstream gene expression. These differences may stem from the very different scale

415 and transcriptional organisation of the yeast and human genomes, and the strong

416 patterning of Top2 activity driven by accessibility to nucleosome-free intergenic promoter

417 regions in S. cerevisiae. In support of this interpretation, similar nucleosomal constraints

418 appear to substantially govern the activity of the related topoisomerase-like enzyme,

419 Spo11, during meiosis (Pan et al., 2011) (Figure S1 herein). bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

420

421 Despite these differences in distribution of Top2 activity at this larger scale, the DNA

422 sequence preferences for yeast and human Top2 are similar, implying substantial

423 conservation of the DNA binding interface, which is supported by analysis of the published

424 crystal structure of VP16-TOP2-DNA (Wu et al., 2011). A nucleotide resolution genome-

425 wide map of E. coli gyrase activity was recently reported (Sutormin et al., 2018)), revealing a

426 sequence motif composed of a central 35 bp region containing the cleavage site, plus two

427 flanking 47 bp regions with ~10.5 bp periodicity centred ~±40 bp from the site of cleavage,

428 which are attributed to the known wrapping of proximal DNA around the gyrase C-terminal

429 domains (Reece and Maxwell, 1991). Strikingly, whilst we observe a core central motif for

430 Top2, there are no such flanking motifs over these ranges, in agreement with its lack of

431 DNA wrapping domains (Figure S3D). We do, however, observe two more-proximal

432 flanking regions centred ~±20 bp from the site of cleavage, each with ~10.5 bp periodicity

433 (Figure S3E), suggesting that Top2 also bends its DNA substrate—in agreement with a

434 recent biophysical study (Huang et al., 2017).

435

436 We have used CC-seq to probe MRX-dependent repair of Top2 CCs in S. cerevisiae,

437 corroborating and extending prior genetic experiments in this organism (Hamilton and

438 Maizels, 2010; Stepanov et al., 2008; Garcia et al., 2011), and others (Nakamura et al.,

439 2010; Hoa et al., 2016; Hartsuiker et al., 2009). Importantly, calibration of our signal using

440 an internal human standard enables the relative quantification of Top2 CCs, and thus

441 directly demonstrates for the first time that loss of Mre11 activity increases cellular levels of

442 Top2 CCs in S. cerevisiae. The advantage of the antigen-independent enrichment step is

443 three-fold. Firstly, the spike-in standard contains human TOP2 CCs, which are enriched to

444 a similar extent as the yeast Top2 CCs, and should therefore be less susceptible to

445 variation than background DNA. Secondly, the spike-in standard requires no genetic

446 epitope-tagging or reconstitution of recombinant protein-DNA complexes, unlike ChIP-seq

447 calibration methods (Hu et al., 2015; Grzybowski et al., 2015). Finally, we believe that this

448 calibration step could be performed using any other organism with a suitably divergent

449 genome. Thus, the capacity of CC-seq to quantify as well as map CCs between samples

450 makes this technique ideally suited to the future study of repair dynamics.

451

452 We have demonstrated that CC-seq maps CCs of both human TOP2α and TOP2β—

453 proteins that are conserved amongst all vertebrates and whose functional differences

454 remain a subject of interest within the field (Austin et al., 2018). Given the essential bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

455 requirement for TOP2α during mitosis (Akimitsu et al., 2003), and the apparent inability of

456 TOP2β to compensate for its loss (Grue et al., 1998), it will be particularly interesting to

457 determine the genomic distribution of TOP2α/TOP2β activities throughout the cell cycle,

458 and to relate this to 3D chromatin conformation measured by Hi-C. Whilst not essential for

459 cell proliferation, TOP2β plays an important role in promoting transcriptional programmes

460 associated with neuronal development (Tiwari et al., 2012), and this function cannot be

461 supported by TOP2α. Furthermore, loss of TDP2 function inhibits TOP2-dependent gene

462 transcription and leads to neurological symptoms including intellectual disability, seizures

463 and ataxia (Gómez-Herreros et al., 2014). Thus, accurate genomic maps of TOP2β CC

464 formation in these tissues will help to identify those genes most pertinent to the

465 development of neurodegeneration. Moreover, as demonstrated by our Spo11 maps, the

466 antigen-independent, high-efficiency protein-DNA enrichment process makes CC-seq

467 generally applicable for mapping not just Top2 activity in diverse organisms, but also for

468 mapping similar types of topoisomerase-DNA covalent complexes, or even those less

469 distinct DNA-protein crosslinks that arise upon ablation of the DNA-dependent protease

470 SPRTN/Wss1 (Stingele et al., 2015).

471

472 Acknowledgements

473 We thank Keith Caldecott, Antony Oliver and Peter Hornyak for sharing recombinant TDP2,

474 and Jon Nitiss for sharing the pdr1∆ strains.

475

476 Author Contributions

477 W.G., D.J. and M.J.N. conceived the project and developed the CC-seq methodology.

478 W.G., D.J., H.T. and R.A. generated whole genome CC-seq libraries. W.G. performed all

479 data processing and analysis, and all human cell work. D.J., H.T. and R.A. performed yeast

480 sensitivity and meiotic DSB assays. T.J.C. developed the mapping pipeline and associated

481 tools. W.G. and M.J.N. interpreted the observations and wrote the manuscript.

482

483 Competing interests

484 The authors declare no competing financial interests.

485

486 Funding statement

487 W.G., D.J., R.A., T.J.C., and M.J.N. were supported by an ERC Consolidator Grant

488 (#311336), the BBSRC (#BB/M010279/1) and the Wellcome Trust (#200843/Z/16/Z). bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

489

490 Data availability

491 All strains and cell lines listed in Table S3 and S4 are available on request. Sequencing

492 reads are available via the NCBI Sequence Read Archive (accession numbers pending).

493

494 Methods

495 Yeast strains, culture methods and treatment

496 The Saccharomyces cerevisiae yeast strains used in this study are described in Table 2 and

497 were derived using standard genetic techniques. Strains used for Spo11 mapping are

498 isogenic to the SK1 subtype and carry the sae2∆::kanMX gene disruption allele. Strains

499 used for Top2 mapping are isogenic to the BY4741 subtype and carry the pdr1DBD-CYC8

500 drug sensitivity cassette (Stepanov et al., 2008). For Spo11 DSB mapping, cells were

501 induced to undergo synchronous meiosis as follows: Cells were grown overnight to

502 saturation in 4 ml YPD medium (1% yeast extract, 2% peptone, 2% glucose supplemented

503 with 0.5 mM adenine and 0.4 mM uracil) at 30 ˚C, then diluted to OD600 0.2 in 200 ml YPA

504 medium (1% yeast extract, 2% peptone, 1% potassium acetate) and grown vigorously for

505 15 h at 30 ˚C. Cells were then washed with water, resuspended in 200 ml sporulation

506 medium (2% potassium acetate supplemented with diluted amino acids), incubated

507 vigorously for 6 h at 30 ˚C and harvested by centrifugation. For etoposide treatment, cells

508 were grown overnight to saturation in 4 ml YPD medium at 30 ˚C, then diluted to OD600 0.5

509 in 100 ml YPD and grown until OD600 2. 50 ml cultures were then incubated for a further 4 h

510 in the presence of either 1 mM etoposide or 2% DMSO, before harvesting by

511 centrifugation.

512

513 Human cell lines, culture methods and treatment

514 The human cell lines used in this study are described (Supplementary Table 3). Human

515 hTERT RPE-1 cells were obtained from ATCC and cultured at 37 ˚C, 5% CO2 and 3% O2 in

516 Dulbecco’s Modified Eagle’s Medium DMEM/F-12 (Sigma), supplemented with 10% Fetal

517 Calf Serum (FCS) and 90 Units/ml Penicillin-Streptomycin. For all experiments (CC-seq,

518 Slot Blot, WB, IF, FACs) with asynchronous cell populations, RPE-1 cells were seeded at a

3 2 519 density of (3.5x10 cells/cm l) and incubated for 72 h at 37 ˚C, to ensure subconfluent log-

520 phase growth (~70% confluency) at the time of the experiment. For G1 cell populations, WT

-/- 3 2 521 and TOP2B RPE-1 cells were seeded at a density of 5x10 cells/cm and incubated for 48

522 h at 37 ˚C in DMEM/F12 containing 10% FCS, prior to a further 24 h incubation in bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

523 DMEM/F12 medium containing no FCS to ensure complete G1-phase arrest (verified by

524 FACs, see below). For experiments with proteasome inhibitor and etoposide (CC-seq, Slot

525 Blot), cells were preincubated with 5 µM MG132 (Sigma) for 90 min, trypsinised and

526 incubated in suspension with 5 µM MG132 and 100 µM etoposide (Sigma) for 20 min at 37

527 ˚C. For experiments with etoposide alone (IF), adherent cells were treated with 100 µM

528 etoposide for 20 min at 37 ˚C.

529

-/- 530 Generation of TOP2B RPE-1 cells

531 The oligonucleotides 5’-CACCGCCGCAGCCACCCGACT and 5’-

532 AAACAGTCGGGTGGCTGCGGC (identified using Benchling; https://benchling.com) were

533 annealed and cloned into pX330 following BbsI restriction, as described previously (Hsu et

534 al., 2013). This SpCas9/ trugRNA co-expression plasmid was transiently expressed in RPE-

535 1 cells to target the 17 bp target sequence GCCGCAGCCACCCGACT (TGG) within exon 1

536 of TOP2B. Single clones were trypsinised and passaged to isolated culture vessels prior to

-/- 537 screening for absent protein by Western Blot (WB). All experiments involving TOP2B were

538 conducted using the RPE-1 clone T2B/6, in which no TOP2β is detectable by WB or IF.

539

540 Spot tests of chronic etoposide sensitivity

541 Single colonies of each S. cerevisiae strain were incubated overnight in 4 ml YPD at 30 ˚C

542 with shaking. 0.1 ml of this starter was used to inoculate 4 ml YPD, prior to incubation for 5

543 hours at 30 ˚C. Cultures were diluted to make a stock with an OD600 of 2.0, then this was

544 10-fold serially diluted five times. Each dilution was spotted onto plates containing 0, 0.1,

545 0.3 or 1.0 mM VP16, prior to incubation for three days at 30 ˚C. Plates were imaged on a

546 2400 Photo scanner (Epson).

547

548 Southern blotting of meiotic Spo11 DSBs

549 Approximately 2 μg of genomic DNA (isolated by non-proteolysing Phenol-Chloroform

550 extraction, as described below) was digested at 37 °C overnight using PstI restriction

551 enzyme (NEB) in NEBuffer 3.1 (100 mM NaCl, 50 mM Tris Base·HCl pH 7.9, 10 mM MgCl2,

552 100 μg ml-1 BSA). Additional PstI was added for 4 hours before the addition of NEB purple

553 loading dye to 1×. Digested samples were proteolysed using 1 mg ml-1 Proteinase K

554 (Sigma) at 60 ˚C for 30 minutes, left to reach room temperature before 10 μg was loaded on

555 a 0.7% 1× TAE agarose gel (40 mM Tris Base·HCl, 20 mM glacial acetic acid, 1 mM EDTA

556 pH 8.0) containing 50 μg ml-1 ethidium bromide. DNA was separated in 1× TAE at 60 V for

557 18 hours. The gel was imaged using InGenius (Syngene) bioimaging system to check bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

2 558 migration and then exposed to 180 mJ/m UV in the Stratalinker (Stratagene). The gel was

559 then soaked in three times its volume of denaturation solution (0.5 M NaOH, 1.5 M NaCl) for

560 30 minutes and then transferred to Zetaprobe (Bio-Rad) membrane by means of a vacuum

561 at 55 mBar for 2 hours. After transfer the membrane was washed in water ten times and

2 562 then cross-linked by exposing the membrane to 120 mJ/m UV in the Stratalinker. The

563 membrane was incubated in 30 ml of hybridisation buffer (0.5 M NaHPO4 buffer pH 7.5, 7%

564 SDS, 1 mM EDTA, 1% BSA) at 65 ˚C for 1 hour. The MXR2 probe for looking at the

565 HIS4::LEU2 locus was created from 50 ng of template DNA, 0.1 ng of Lambda DNA (NEB)

566 digested with BstEII (NEB), and water. The mix was denatured at 100 °C for 5 minutes then

32 567 put on ice. High Prime (Roche) was added in addition to 0.5-3 mBq of α- P dCTP and

568 incubated at 37 °C for 15 minutes. 30 μl 1× TE was added and the probe spun through a G-

569 50 spin column (GE Healthcare) at 400 × g for 2 minutes. The probe was then denatured by

570 incubating at 100 °C for 5 minutes and then put on ice before being added to 20 ml

571 hybridisation mixture. The original 30 ml hybridisation buffer was discarded and the 20 ml

572 containing the probe was added to the membrane and incubated overnight at 65 ˚C. After

573 incubation, the membrane was washed five times with 100 ml pre-warmed Southern wash

574 buffer (1% SDS, 40 mM NaHPO4 buffer pH 7.5, 1 mM EDTA) and exposed to phosphor

575 screen overnight.

576

577 Western blotting

578 Whole human cell extracts (WCE) were harvested by direct lysis in 1x Laemmli loading

579 buffer, denatured for 10 min at 95 ˚C and sonicated for 30 s using Bioruptor® Pico.

580 Samples were subjected to SDS-PAGE (7% or gradient gel) and transferred to

581 nitrocellulose membrane. Primary immunodetection was with antibodies targeting TOP2β

582 (Clone 40, BD Biosciences), TOP2α (ab52934, Abcam), or Ku80 (ab80592, Abcam).

583 Secondary immunodetection was with HRP-conjugated Rabbit anti-Mouse IgG

584 (ThermoFisher), prior to detection of peroxidase activity using ECL reagent and X-Ray film

585 (Scientific Laboratory Supplies Ltd).

586

587 Slot blotting

588 Samples were diluted 4-fold (500 uL total volume) in NaPO4 buffer (25 mM, pH 6.5), and slot

589 blotted onto 0.2 µM nitrocellulose membrane (Amersham), using the Minifold I (Whatman)

590 manifold. The wells were washed twice with 750 µL NaPO4 buffer. The membrane was then

591 blocked with 10% milk-TBST for 1 h at RT, prior to incubation overnight with anti-TOP2β

592 antibody (Clone 40, BD Biosciences) at 4 ˚C. The membrane was washed 4 times with bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

593 TBST, incubated with HRP-conjugated Rabbit anti-Mouse IgG (ThermoFisher) for 1 h at RT,

594 washed 4 times with TBST, and incubated with ECL detection reagent for 1 min. X-Ray film

595 was used for detection.

596

597 Fluorescence-Assisted Cell Sorting (FACS)

598 Approximately 10 million RPE-1 cells were trypsinised, washed once in PBS and

599 resuspended in 1.5 ml PBS. 3.5 ml ethanol was added dropwise to pellet, with vortexing.

600 Cells were fixed for 1 hr at 4 ˚C, prior to centrifugation and aspiration of the supernatant.

601 Cells were washed twice with PBS, prior to resuspension in 0.5 ml 0.25% Triton-X100-PBS

602 for 15 min on ice. Cells were pelleted by centrifugation, supernatant was aspirated, and the

603 pellet was resuspended in 0.5 ml TBS containing 10 ug/ml RNase A (Sigma) and 167 nM

604 Sytox Green (ThermoFisher). After 30 min incubation in the dark at RT, the suspension was

605 filtered through fine mesh into test tubes. DNA content in 50,000 cells was analysed using

606 the Accuri C6 (BD Biosciences), with gating to exclude doublets and cell debris.

607

608 Immunofluorescence

609 Cells were seeded onto glass coverslips (Agar Scientific), incubated and treated according

610 to the protocol outlined above. Cells were then fixed with 4% paraformaldehyde PBS for 10

611 min, washed three times with PBS, permeabilised with 0.2% Triton-X100-PBS for 10 min,

612 blocked for 1 h with 10% FCS-PBS, incubated with primary antibodies targeting Phospho-

613 Histone H2AX (S139) (JBW-301, Merck Millipore) and TOP2α (ab52934, Abcam), washed

614 three times with PBS, incubated with secondary antibodies Alexa 488-conjugated Goat

615 anti-mouse IgG (Fisher) and Alexa 647-conjugated Goat anti-Rabbit (Fisher), washed three

616 times with PBS, washed once with distilled water, and mounted with VECTASHIELD

617 containing DAPI (Vector Laboratories).

618

619 High-content microscopy

620 Automated wide-field microscopy was performed on an Olympus ScanR system (motorised

621 IX83 microscope) with ScanR Image Acquisition and Analysis Software, 40x/0.6

622 (LUCPLFLN 40x PH) dry objectives and Hamamatsu ORCA-R2 digital CCD camera

623 C10600. Numbers of anti-phospho-Histone H2AX (S139) foci (Alexa 488; FITC filter) were

624 quantified in the nuclear region colocalising with DAPI, using Olympus ScanR Analysis

625 software. TOP2α signal (Alexa 647; Cy5 filter) was also quantified in this region.

626 CC-seq: Enrichment of protein-linked DNA bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

7 627 1.5x10 RPE-1 cells were treated as described above, pelleted by centrifugation, washed

628 once with 15 ml ice-cold PBS, and resuspended in three aliquots of 400 µL ice-cold PBS.

9 629 1x10 yeast cells were treated as described above, then spheroplasted in 1.5 ml

630 spheroplasting buffer (1 M sorbitol, 50 mM NaHPO4 buffer pH 7.2, 10 mM EDTA) containing

631 200 μg/ml Zymolyase 100T (AMS Biotech) and 1% β-mercaptoethanol (Sigma) for 20 min at

632 37 ˚C. 2 µL Protease Inhibitor Cocktail and 2 µL Pefabloc (Sigma) were added before

633 splitting into five aliquots of 400 µL. All subsequent steps of the protocol are the same for

634 yeast and human samples. 1 ml ice-cold ethanol was added to 400 µL cell suspensions in

635 microcentrifuge tubes, mixed, incubated for 10 min on ice, and pelleted by centrifugation.

636 Supernatant was thoroughly removed by aspiration, prior to addition of 200 µL 1x STE

637 buffer (2% SDS, 0.5M Tris pH 8.1, 10 mM EDTA, 0.05% bromophenol blue), cell disruption

638 using a pestle (VWR), addition of a further 400 µL 1x STE buffer, and incubation for 10 min

639 at 65 ˚C. Samples were cooled on ice and 500 µL Phenol-Chloroform-isoamyl alcohol

640 (25:24:1; Sigma) added. Mixtures were emulsified by shaking and pipetting 5 times with a 1

641 ml micropipette, prior to phase separation by centrifugation at 20,000 g for 20 min. By

642 minimising mechanical shearing of the lysate prior to phenol chloroform extraction,

643 peptides that are covalently linked to high molecular weight DNA partition into the aqueous

644 phase. 500 µL of the aqueous phase was removed to a clean microcentrifuge tube and

645 nucleic acids precipitated with 1 ml ice-cold ethanol, pelleted by centrifugation, washed

646 with ice-cold 70% ethanol, and dissolved in TE buffer overnight at 4 ˚C. Samples were then

647 incubated with 0.2 mg/ml RNase A (Sigma) for 1 h at 37 ˚C; nucleic acids were precipitated

648 with 1 ml ethanol, pelleted by centrifugation, washed twice with 70% ethanol and dissolved

649 in TE overnight. Aliquots were combined to a total of 1 ml and sonicated to an average

650 fragment size of 300-400 bp with Covaris (duty cycle: 10%, intensity/peak power incidence:

651 75W, cycles/burst: 200, time: 15 min). 1 ml of sonicated sample was added to 1.2 ml of

652 binding buffer (10 mM Tris pH 8.1, 10 mM EDTA, 0.66 M NaCl, 0.22% SDS, 0.44% N-

653 Lauroylsarcosine sodium salt). Each sample was divided over several Miniprep (QIAGEN)

654 silica-fibre membrane spin columns, such that the total DNA loaded to each was

655 approximately 20 µg. The flowthrough was reapplied to the column to improve binding.

656 Columns were washed 6 times with 600 µL of TEN (10 mM Tris, 1 mM EDTA, 0.3 M NaCl)

657 per 1 min wash, prior to elution with 100 µL TES (10 mM Tris, 1 mM EDTA, 0.5% SDS).

658

659 CC-seq: DNA end repair and adapter ligation

660 Eluted products were pooled to 500 µL in TES and incubated with 1 mg/ml Proteinase K

661 (Sigma) for 30 min at 60 ˚C, prior to overnight ethanol precipitation at -80 ˚C with 1.41 ml bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

662 ethanol, 0.2 mg/ml glycogen and 200 mM NaOAc. The DNA-glycogen precipitate was

663 pelleted by centrifugation at 20,000 g for 1 hr at 4 ˚C, washed once with 1.5 ml 70%

664 ethanol, and re-pelleted by centrifugation. The supernatant was aspirated and the pellet

665 was air-dried for 10 min at RT, prior to solubilisation in 52 µL 10 mM Tris-HCl. DNA

666 concentration was measured in a 2 µL sample with the Qubit (ThermoFisher) and High

667 Sensitivity reagents. The remaining 50 µL was used as input for one round of end repair

668 and adapter ligation with NEBNext Ultra II DNA Library Preparation kit (NEB), according to

669 manufacturer’s instructions, except for the use of a custom P7 adapter (see table). The use

670 of custom adapters is to allow differentiation of the sheared end (P7 adapter) from the

671 Top2/Spo11 end (P5 adapter). After ligation of the P7 adapter, DNA was isolated with

672 AMPure XP beads (Beckman Coulter) according to manufacturer’s instructions (beads:input

673 of 78:90) and eluted in 50 µL 10 mM Tris-HCl. Samples were diluted 2-fold with 50 µL Tdp2

674 reaction buffer (100 mM TrisOAc, 100 mM NaOAc, 2 mM MgOAc, 2 mM DTT, 200 µg/ml

675 BSA) and incubated with 3 µL of 10 µM recombinant human TDP2 (Hornyak et al., 2016;

676 Johnson et al., 2019), for 1 h at 37 ˚C. DNA was isolated again with AMPure XP beads

677 (beads:input of 103:103) and eluted in 52 µL 10 mM Tris-HCl. Next, a second round of end

678 repair and adapter ligation was conducted, using adapter P5 (see table). After ligation of

679 the P5 adapter to the Top2/Spo11-cleaved end, DNA was isolated with AMPure XP beads

680 (beads:input of 78:90) and eluted in 17 µL 10 mM Tris-HCl.

681

682 CC-seq: PCR and size selection

683 DNA concentration was measured in 2 µL using the Qubit. The remaining 15 µL was used

684 as template for the PCR step of the NEBNext Ultra II PCR step using universal primer (P5

685 end) and indexed primers for multiplexing (P7 end), according to manufacturer's

686 instructions. PCR reactions were diluted with 50 µL 10 mM Tris-HCl, DNA was isolated with

687 AMPure XP beads (beads:input of 84:100) and eluted in 30 µL 1 mM Tris-HCl pH 8.1.

688 Samples were then subjected to 200-600 bp size selection using the BluePippin (Sage

689 Science), prior to quantification of molarity using the Bioanalyzer (Agilent).

690

691 CC-seq: NGS and data pipeline

692 Multiplexed library pools were sequenced on the Illumina MiSeq (Kit v3 - 150 cycles) or

693 Illumina NextSeq 500 (Kit v2 - 75 cycles), with paired-end read lengths of 75 or 42 bp,

694 respectively. Paired end reads that passed filter were aligned using bowtie2 (options: -X

695 1000 --no-discordant --very-sensitive --mp 5,1 --np 0), using MAPQ0 settings for yeast or

696 MAPQ10 settings for human experiments, then SAM files processed by the custom-built bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

697 Perl program termMapper that computes the coordinates of the protein-linked 5’-terminal

698 nucleotide. The reference genomes used in this study are hg19 (human), and Cer3H4L2 (S.

699 cerevisiae), which we generated by inclusion of the his4::LEU2 and leu2::hisG loci into the

700 Cer3 yeast genome build. Yeast data sets were filtered to exclude long terminal repeats,

701 retrotransposons, telomeres, and the rDNA. Human datasets were filtered to remove known

702 ultra-high signal regions (Hoffman et al., 2013; ENCODE, 2012) and repeat regions

703 (ENCODE, 2012). All subsequent analyses were performed in R (Version 3.4.3) using

704 RStudio (Version 1.1.383), unless indicated otherwise.

705

706 CC-seq: Calibrated library generation

707 Calibration of Top2 CC-seq experiments was conducted to allow comparison of relative

708 peak intensities in different yeast strains. This was achieved by spike-in of human DNA

709 following the sonication stage of the protocol, which is a method that has been used to

710 calibrate other sequencing methods (Hu et al., 2015; Grzybowski et al., 2015). S. cerevisiae

711 and RPE-1 cells were exposed to etoposide and processed until just after sonication,

712 exactly as according to the cell treatment and CC-seq protocols above. DNA concentration

713 was quantified by Qubit, and then mixed at a molar ratio of human DNA:yeast DNA of

714 1:100. All subsequent stages of the protocol were identical, except read alignment, for

715 which we used both hg19 and Cer3H4L2 builds successively. Cer3H4L2-aligned peak

716 heights were corrected in each sample by multiplying by the reciprocal fraction of human

717 reads in that sample.

718

719 CC-seq: Fine-scale mapping

720 Fine-scale (nucleotide resolution) maps of Spo11/Top2 CCs were produced as simple

721 histograms over a specified region (Figures 1B, 2B, 3D, 4A, 5A, S1A, S3A, S5A and S5B).

722 Dark red and blue line heights indicate numbers of 5’-terminal nucleotides detected at that

723 position on the Watson and Crick strands, in units of HpM. Where indicated in the figure

724 caption, smoothed data are also plotted as pale red and blue polygons, in addition to the

725 unsmoothed nucleotide resolution data. This smoothing was either applied using a sliding

726 Hann window of the indicated width, or using a custom smoothing function (VarX), as

727 indicated in the figure caption.

728

729 CC-seq: Broad-scale mapping

730 Broad-scale maps of Top2 CCs were produced by binning nucleotide resolution data at 10

731 Kbp (Figure 3B) or 100 Kbp (Figure S4D) resolution. Binned data were either plotted bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

732 directly (Figure S4D); or first scaled according to the estimated noise fraction (see below),

733 smoothed with a 10-bin Hanning window, and median subtracted (Figure 3B).

734

735 CC-seq: Estimation of the noise fraction in Human datasets

736 The noise fraction in each sample was estimated using an adaptation of the previously

737 published NCIS method (Liang and Keleş, 2012). Briefly: the data for -VP16 and +VP16

738 samples were first binned at 10 Kbp resolution. Then the subpopulation of bins with the

739 lowest TOP2 CC signal was identified in each sample. The average signal density of this

740 subpopulation of bins, in each sample, was defined as the noise density (d-VP16 and d+VP16).

741 Signal in the -VP16 sample was scaled by a normalisation factor equal to:

$%&'() 742 ! = $*&'()

743

744 CC-seq: Quantification of Spo11 and Top2 activity in defined genomic regions

745 Yeast Spo11 hotspots were the same as defined previously (Pan et al., 2011). Yeast

746 intra/intergenomic regions were defined based on TSS and TTS coordinates reported on

747 the Saccharomyces Genome Database (https://www.yeastgenome.org). Human chromatin

748 compartments A and B were defined based on eigenvector analysis of previously published

749 100 kb resolution RPE-1 Hi-C data (Darrow et al., 2016) using the Juicer package

750 (Lieberman-Aiden et al., 2009; Durand et al., 2016). Human “CTCF” and “TSS” regions were

751 defined as 10 kb regions centred on the top quartile of expressed TSSs or the midpoint of

752 top quartile CTCF-bound CTCF-binding motifs, with overlaps merged using

753 GenomicFeatures::reduce. The “Both” region was defined as the intersection of the “TSS”

754 and “CTCF” regions using GenomicFeatures::intersect. This region was then excluded from

755 the individual “TSS” and “CTCF” regions. The “Neither” region was defined as regions of

756 hg19 which overlapped neither “TSS”, nor “CTCF”, nor “Both”, using

757 GenomicFeatures::gaps. Nucleotide resolution data was tallied within defined regions, and

758 is expressed as an aggregate (bar plots) in which the signal is corrected using the average

759 signal in the “Neither” compartment, and/or as the distribution of region signal densities

760 (box-and-whisker plot), as indicated in figure captions.

761

762 CC-seq: Correlation of Watson/Crick cleavage positions

763 Nucleotide resolution human TOP2 data was thresholded at 0.01 HpM. Nucleotide

764 resolution yeast Top2 and Spo11 data were not thresholded. Peak coordinates on the Crick bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

765 strand were offset over the range -100 to +100, relative to Watson coordinates. After each

766 offset, the data was filtered to include only sites with both Watson and Crick hits. The

767 Pearson correlation (r) between the Watson and Crick signal intensities in these n sites was

768 calculated. We also counted the fraction of reads found within these n sites, and

769 normalised this number over the -100 to +100 bp range. Data are expressed as R values

770 and/or normHpM values, as indicated in the figure captions.

771

772 CC-seq: Aggregation of CC-seq data around loci of interest

773 Nucleotide resolution CC-seq data (line-plots) or binned CC-seq data (heatmaps) were

774 aggregated within regions of specified size, centred on the loci of interest. The resulting

775 sum total HpM was divided by numbers of loci to give a mean HpM per locus.

776

777 CC-seq: Loci of interest used in this study

778 Yeast TSS and TTS coordinates were obtained from the Saccharomyces Genome

779 Database https://www.yeastgenome.org. These were stratified based on associated gene

780 length, expression level in vegetative SK1 (GSM907178, GSM907179, GSM907180), or

781 expression level in meiotic SK1 (GSM907176, GSM907177; (Dominissini et al., 2012)).

782 Human TSS and TTS coordinates were obtained from the UCSC hg19 knownGene

783 annotation, and stratified based on associated gene length, or expression level

784 (GSM1395252, GSM1395253, GSM1395254; (Ganem et al., 2014)). Occupied RPE-1 CTCF

785 motifs were identified as follows: The FIMO tool (Grant et al., 2011) and the CTCF Position

786 Weight Matrix (PWM) from the JASPAR database (Mathelier et al., 2014) were used to find

-4 787 all significant hg19 CTCF motifs (p < 1x10 ). These were filtered to include only those

788 which overlapped positions of RPE-1 CTCF ChIP-seq peaks (GSM749673, GSM1022665;

789 (ENCODE, 2012)), and stratified based on this ChIP-seq data. RPE-1 loop anchor-

790 associated CTCF motifs were identified using the Juicer MotifFinder (Durand et al., 2016)

791 with the RPE-1 WT Hi-C looplist (GSE71831(Darrow et al., 2016)) and CTCF ChIP-seq BED

792 file (GSM749673, GSM1022665; (ENCODE, 2012)) as input. Human TOP2 CC-seq peak

793 coordinates were identified by thresholding (0.05 HpM) of pooled +VP16 data presented.

794

795 CC-seq: Spatial correlation of yeast Top2 and Spo11 signals

796 Nucleotide resolution Top2/Spo11 CC-seq data was binned at 50 bp resolution and the

797 Pearson correlation was calculated between the sum HpM for all bins at a given inter-bin

798 distance.

799 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

800 CC-seq: Quantifying single and double-strand Top2 CCs

801 Nucleotide resolution Human and Yeast Top2 CC-seq data were first thresholded at 0.05

802 and 1 HpM, respectively. SSB sites were defined as those sites without a cognate on the

803 opposite strand at the expected offset of 3 bp. DSBs were defined as cognate sites with a

804 3 bp offset. SSB and DSB percentages are expressed as a percentage of the total number

805 of sites (SSB + DSB). A randomisation experiment was conducted in order to estimate

806 expected numbers of SSBs and DSBs within the sample under a random model where

807 signal is distributed independently on each strand. To achieve this, the amplitudes of Top2

808 CCs (HpM) were shuffled amongst the positions in the nucleotide resolution datasets, prior

809 to thresholding and offset analysis as described above.

810

811 CC-seq: DNA sequence composition of single and double-strand Top2 CCs

812 Nucleotide resolution Human and Yeast Top2 CC-seq data were first thresholded at 0.05

813 and 1 HpM, respectively. The high frequency positions remaining were classified as part of

814 a single or double-stranded Top2 CC based on absence or presence of a cognate cleavage

815 at the expected W-C offset of 3 bp. For double-stranded Top2 CCs, a dyad axis coordinate

816 was defined as the centrepoint between the Top2-linked nucleotides on the W and C strand

817 (that is, the midpoint of the central two base pairs in the four overhang). For

818 single-stranded Top2 CCs, we used an imaginary dyad axis in the same relative position.

819 DNA sequence was aggregated ±20 bp around these two classes of Top2 CCs. Statistical

820 significance was determined using the one sample (goodness-of-fit) Chi-squared test, as

821 described previously (Pommier et al., 1991).

822

823 CC-seq - MethylC Analysis

824 A publically-available RPE-1 reduced representation bisulphite sequencing (RRBS) dataset

825 (two replicates) was used to define coordinates of nucleotides immediately 3’-relative to

826 919,817 unmethylated or 341,136 methylated Cs, based on <10% or >90% methylation.

827 Next we aggregated RPE-1 CC-seq (0.01 HpM thresholded) hits on these loci, and divided

828 by the number of loci to give a density (barplot). Statistical significance was determined by

829 the Kolmogorov-Smirnov test.

830

831 Ideograms

832 Human chromosome ideograms were adapted from the open source ideogram package

833 (https://eweitz.github.io/ideogram/).

834 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

835 Figure Legends 836 837 Figure 1. CC-seq maps covalent Spo11-linked DNA breaks in S. cerevisiae meiosis with 838 nucleotide accuracy 839 A) Schematic of the CC-seq method. 840 B) Column-based enrichment of Spo11-linked DNA fragments detected by Southern blotting at the 841 his4::LEU2 recombination hotspot (Methods). Arrows indicate expected sizes of Spo11-DSBs. 842 C) Nucleotide resolution mapping of S. cerevisiae Spo11 hotspots by CC-seq or oligo-seq (Pan et 843 al., 2011). Red and blue traces indicate Spo11-linked 5’ DNA termini on the Watson and Crick 844 strands, respectively. Grey arrows indicate positions of gene open reading frames. 845 D) Pearson correlation (r) of Spo11 CC-seq signal between Watson and Crick strands, offset by the 846 indicated distances. 847 E) Average nucleotide composition over a 200 (top) and 30 bp (bottom) window centred on Spo11 848 breaks. Bases reported are for the top strand only. 849 HpM = Hits per million mapped reads per base pair. 850 851 Figure 2. CC-seq maps covalent Top2-linked DNA breaks in S. cerevisiae cycling cells with 852 nucleotide accuracy 853 A) Serial dilution spot tests of VP16 tolerance for the indicated strains. 854 B) Nucleotide resolution mapping of S. cerevisiae Top2 CCs by CC-seq of the indicated strains after 855 treatment for 4 hours with 1 mM VP16. Spo11-CC data is plotted for comparison. Top2 CC data 856 were calibrated using a human DNA spike-in (Methods). Red and blue traces indicate CC-linked 5’ 857 DNA termini on the Watson and Crick strands, respectively. Grey arrows indicate positions of gene 858 open reading frames. Lower panels show an expanded view of the region from 31.5 to 40 Kbp. 859 C) Quantification of Top2 and Spo11 CC signal stratified by genomic region. The genome was 860 divided into intra and intergenic regions; the intergenic region was further divided into divergent, 861 tandem and convergent based on orientation of flanking genes. Spo11 and Top2 activity mapped by 862 CC-seq is expressed as box-and-whisker plots of density (upper and lower box limit: 3rd and 1st 863 quartile; bar: median; upper and lower whisker: highest and lowest values within 1.5-fold of the 864 interquartile range), or as the percentage of total mapped reads. 865 D) Local correlation of Top2 or Spo11 CC-seq signals. Top2 or Spo11 CC-seq data were binned at 866 50 bp resolution and the Pearson correlation calculated between bins of increasing separation. 867 E) Pearson correlation (r) of Top2 CC-seq signal on Watson and Crick strands, offset by the 868 indicated distance. 869 HpM = Hits per million mapped reads per base pair. 870 871 Figure 3. CC-seq maps TOP2-linked DNA breaks in Human cells with nucleotide accuracy 872 A) Anti-TOP2β western slot blot of input, flow through, wash, and elution fractions. RPE-1 cells were 873 treated or not with VP16, prior to processing according to Figure 1A. 874 B) Broad-scale maps of H. sapiens TOP2 CCs produced by CC-seq in RPE-1 cells ±VP16. Raw data 875 were scaled, binned, smoothed and median subtracted prior to plotting (Methods). Chromatin 876 compartments revealed by Hi-C eigenvector analysis (Darrow et al., 2016), and supercoiling revealed 877 by bTMP ChIP-seq (Naughton et al., 2013) are shown for comparison. 878 C) Quantification of TOP2 CC in chromatin compartments A and B. Data are expressed as box-and- 879 whisker plots of density as for Figure 2C. 880 D) Fine-scale mapping of H. sapiens TOP2 CCs by CC-seq in RPE-1 cells ±VP16. Red and blue 881 traces indicate TOP2-linked 5’ DNA termini on the Watson and Crick strands, respectively. Pale 882 shaded areas are the same data smoothed according to local density (see Methods). RPE-1 CTCF 883 and H3K4Me3 ChIP-seq data plus H3K27Ac ChIP-seq data overlaid from seven cell lines (ENCODE, 884 2012) is shown for comparison. bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

885 E) Medium-scale aggregate of END-seq mapped DSBs (Canela et al., 2017) surrounding nucleotide 886 resolution CC-seq mapped TOP2 CCs. 887 F) Fine-scale strand-specific aggregate of END-seq mapped DSB ends (red and blue bars) and 888 nucleotide resolution CC-seq mapped TOP2 CCs (grey bars) surrounding strong TOP2 CC sites. 889 890 Figure 4. CTCF-proximal TOP2 activity is tightly confined within nucleosome-depleted regions 891 A) Aggregation of TOP2 CCs in a 1 Kbp window centred on orientated CTCF motifs in human RPE-1 892 cells. Four example CTCF loci are shown orientated in the 5’-3’ direction (top). A heatmap of all 893 CTCF-motifs in the human genome, with 25 rows stratified by the strength of colocalising CTCF 894 ChIP-seq peaks in RPE-1 cells (middle). The colour scale indicates average TOP2 CC density. Motifs 895 are also stratified into 4 quartiles of CTCF-binding, and the average TOP2 CC distribution in each 896 quartile plotted (bottom). 897 B) Aggregated TOP2 CC distribution (red line) in the highest quartile of CTCF-binding compared with 898 the average MNase-seq signal (blue line). 899 C) Aggregated TOP2 CC distribution (black and red lines: single-nucleotide resolution and 900 smoothed, respectively) and the average MNase-seq signal (blue line) surrounding strong TOP2 CC 901 sites. 902 In (B) and (C) peaks in MNase-seq signal indicate inferred nucleosome positions (blue circles). 903 904 Figure 5. TSS-proximal TOP2 activity is strongly correlated with gene transcription 905 A) Aggregation of TOP2 CCs in a 10 Kbp window centred on orientated TSSs in human RPE-1 cells. 906 Four example TSSs are shown orientated in the 5’-3’ direction (top). A heatmap of all TSSs in the 907 human genome, with 25 rows stratified by gene expression level in RPE-1 cells (middle). The colour 908 scale indicates average TOP2 CC density. Motifs are also stratified into 4 quartiles of gene 909 expression, and the average TOP2 CC distribution in each quartile plotted (bottom). 910 B) Comparison of CTCF-proximal and TSS-proximal TOP2 CC distributions in the highest quartile of 911 CTCF-binding (pink) and gene-expression (blue), respectively. 912 C) Average TOP2 CC density in 10 Kbp regions centred on the highest quartile of TSS, CTCF, or 913 regions where both features are present. Data are expressed as box-and-whisker plots of density as 914 in Figure 2C. Statistical significance was determined using the KS test. 915 D) As in (C) but sum total TOP2 CCs found in these regions. 916 917 Figure 6. Local DNA sequence and methylation status direct the formation of Top2 CCs 918 A) The percentage of strong sites that are SSBs and DSBs in etoposide-treated S. cerevisiae and 919 human cells. 920 B) Average nucleotide composition over a 30 bp window centred on the DSB or (inferred) SSB dyad 921 axis, in etoposide-treated S. cerevisiae and human cells. Values reported are for the top strand only. 922 C) Heatmaps describing pairwise similarity of nucleotide composition patterns shown in (B). Rows 1- 923 4 are the absolute differences between: yeast and human SSB patterns, yeast and human DSB 924 patterns, yeast SSB and DSB patterns, and human SSB and DSB patterns, respectively. 925 D) The average number of TOP2 CCs at the +1 position relative to methylated and unmethylated 926 cytosines in human RPE-1 cells, ±VP16. Statistical significance was determined using KS test. 927 928

929 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

930 Supplementary Figure Legends

931 932 Figure S1. CC-seq maps covalent Spo11-linked DNA breaks in S. cerevisiae meiosis with 933 nucleotide accuracy (related to Figure 1) 934 A) Correlation of hotspot signals from CC-seq and oligo-seq. 935 B) Correlation of 500 bp binned Spo11 maps from two representative replicates. 936 C) Correlation of Spo11 cleavages on Watson and Crick strand when offset by 1 bp (boxed), relative 937 to other offsets from -4 to +4 bp. 938 D) Spo11 breaks mapped by CC-seq (raw=black, or smoothed=red) anticorrelate with nucleosome 939 occupancy measured by MNase-seq (blue). 940 E) Aggregation of Spo11 activity in a 10 Kbp window centred on orientated TSSs in S. cerevisiae. 941 Three example TSSs are shown orientated in the 5’-3’ direction (top). A heatmap of all TSSs in the S. 942 cerevisiae genome, with 25 rows stratified by gene expression level in SK1 cells (middle). The colour 943 scale indicates average Spo11 break density. Motifs are also stratified into 4 quartiles of gene 944 expression in meiotic SK1 cells, and the average distribution of Spo11 activity in each quartile 945 plotted (bottom). 946 947 Figure S2. CC-seq maps covalent Top2-linked DNA breaks in S. cerevisiae cycling cells with 948 nucleotide accuracy (related to Figure 2). 949 A-C) Correlation of 1 Kbp binned Top2 CC maps from pdr1∆ (A), pdr1∆sae2∆ (B) and pdr1∆mre11∆ 950 (C) cells treated with VP16. 951 D) Pairwise Pearson correlation values for 100 bp binned Top2 CC maps from all assayed 952 conditions. Each condition is a pool of two biological replicates. 953 E) Nucleotide-resolution S. cerevisiae Top2 CC map of chromosome 3 for all assayed conditions. 954 Each condition is a pool of two biological replicates. 955 F-H) The normalised number of Top2 CCs retained in the pdr1∆ (F), pdr1∆sae2∆ (G) and 956 pdr1∆mre11∆ (H) cells treated ±VP16, after filtering to include only sites offset by the given number 957 of base pairs. All data were normalised over a -100 to +100 bp window. 958 I-K) Pearson correlation (r) of Top2 CC-seq signal on Watson and Crick strands, offset by the 959 indicated distance in pdr1∆ (I), pdr1∆sae2∆ (J) and pdr1∆mre11∆ (K) cells treated ±VP16. 960 961 Figure S3. Top2-linked DNA breaks in S. cerevisiae are not correlated with gene expression, 962 anticorrelate with nucleosomes, and have biased nucleotide skews indicative of bent DNA 963 (related to Figure 2) 964 A) Top2 CC mapped by CC-seq (raw=black, or smoothed=red) anticorrelate with nucleosome 965 occupancy measured by MNase-seq (blue). 966 B) Aggregation of Top2 CCs in a 10 Kbp window centred on orientated transcription start sites (TSS) 967 in pdr1∆ S. cerevisiae. Four example TSSs are shown orientated in the 5’-3’ direction (top). Heatmap 968 of all TSSs in the S. cerevisiae genome, with 25 rows stratified by gene expression level in vegetative 969 growth (middle). Colour scale indicates average Top2 CC density. Motifs are also stratified into four 970 quartiles of gene expression, and the average distribution of Top2 CCs in each quartile plotted for 971 untreated and +VP16 conditions (bottom). 972 C) The percentage of strong sites that are SSBs and DSBs in etoposide-treated pdr1∆ and 973 pdr1∆mre11∆ S. cerevisiae. Sites were thresholded at 1 HpM prior to sorting into DSB or SSB 974 classes based on presence or absence of a 3 bp offset cognate (Obs.). As a control, the amplitudes 975 of Top2 CCs (HpM) were randomised amongst the positions in the nucleotide resolution datasets, 976 prior to thresholding and offset analysis as described above (Rand.) 977 D) Average nucleotide composition over a 200 bp window centred on the DSB dyad axis, in 978 etoposide-treated S. cerevisiae. The position of the flanking regions (F.R.) identified in Gyrase bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

979 mapping experiments, and the flanking regions observed here for Top2 are indicated in blue and 980 black, respectively. 981 D) Average nucleotide composition over a 120 bp window centred on the DSB dyad axis, in 982 etoposide-treated S. cerevisiae. The signal pattern is separated into A+T and G+C (top left and right), 983 and into individual A, T, C and G plots (middle left, bottom left, middle right, and bottom right, 984 respectively). The position of the flanking regions (F.R.) observed here for Top2 are indicated in 985 black, and their ~10.5 bp periodicity is highlighted with black and red crosses. 986 987 Figure S4. CC-seq maps of TOP2-linked DNA breaks in Human cells are enriched by 988 etoposide, and show high reproducibility and nucleotide accuracy (related to Figure 3) 989 A) Broad-scale H. sapiens TOP2 CC-seq maps in individual biological replicates of RPE-1 cells 990 ±VP16. Raw data were binned at 100 Kbp prior to plotting. Each plot is offset on the y-axis by +0.3 991 HpB. 992 B) Replicate-to-replicate Pearson correlation values (r) for 10 Kbp binned TOP2 CC-seq maps of 993 RPE-1 cells ±VP16. 994 C) Scatter plot of -VP16 and +VP16 TOP2 CC-seq maps binned at 10 Kbp resolution. Data were first 995 scaled according to the estimated noise fraction (Methods), and are presented in a hexagonal- 996 binned format, where the density of overplotting is indicated by the colour scale. 997 D) Violin plots of TOP2 CC-seq maps ±VP16 binned at 100 Kbp resolution. The inner black bar, 998 black dot, and dotted horizontal line indicate the interquartile range, median, and expected mean 999 Top2 CC density based on random distribution. 1000 E) The normalised number of TOP2 CCs retained in the CC-seq maps in RPE-1 cells ±VP16 after 1001 filtering to include only sites offset by the given number of base pairs. Data were normalised over a - 1002 100 to +100 bp window. 1003 1004 Figure S5. CC-seq signal is TOP2-dependent 1005 A) DNA content histograms of wild type (WT) and TOP2B-/- RPE-1 cells under asynchronous (10% 1006 FCS) and serum-deprived (0% FCS) conditions, as measured by FACS following propidium iodide 1007 (PI) staining. G1, S and G2 populations are clearly present under asynchronous growing conditions. 1008 A strong G1 arrest is observed in serum deprived conditions. Percentages of cells in each of the 1009 indicated regions (red dotted brackets) are given. 1010 B) Western blots demonstrating the absence of TOP2β protein in serum deprived and asynchronous 1011 TOP2B-/- RPE-1 cells (left), and the absence of TOP2α in serum deprived wild type and TOP2B-/- 1012 RPE-1 cells (right). Ponceau S total protein loading is presented (Pon. S) for the left and right panels, 1013 and additionally a Ku80 loading control is included for the right panel. 1014 C) Immunofluorescence experiment demonstrating induction of γ-H2AX foci (green) in asynchronous 1015 (Async.) and serum-deprived (Ser. Dep.) wild type (WT) and TOP2B-/- RPE-1 cells, all co-stained with 1016 DAPI (blue). Galleries of nine cells per condition were chosen randomly using Olympus ScanR 1017 Analysis software. 1018 D) Quantification of (C). Numbers of γ-H2AX foci per cell were counted automatically using Olympus 1019 ScanR Analysis software. The mean ±SEM is reported for n= 3 biological replicate experiments. 1020 E) Broad-scale H. sapiens TOP2 CC-seq maps in asynchronous and serum-deprived wild type (WT) 1021 and TOP2B-/- RPE-1 cells -VP16 (orange) and +VP16 (green). Raw hits on Watson and Crick strands 1022 were summed and smoothed according to local signal density (Fsize=501). 1023 1024 Figure S6. TOP2 CC-seq signal enrichment around CTCF and TSS sites, compared with END- 1025 seq DSB signal at TSSs 1026 A) Fine-scale TOP2 CC-seq maps of H. sapiens CTCF-proximal loci in RPE-1 cells +VP16. Red and 1027 blue traces indicate TOP2-linked 5’ DNA termini on the Watson and Crick strands, respectively. Pale 1028 shaded areas are the same data smoothed with a sliding 11 bp Hanning window. Red and Blue bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1029 rectangles indicate the positions of CTCF motifs on the Watson and Crick strands respectively. The 1030 grey line indicates Hanning-smoothed sum of Watson and Crick TOP2 CCs. 1031 B) Fine-scale mapping of TOP2 CCs surrounding three complex CTCF loci, processed as in (A). 1032 C) Aggregation of TOP2 CCs in a 1 Kbp window centred on the subset of orientated CTCF motifs 1033 that can be assigned to a chromatin loop anchor in human RPE-1 cells (Darrow et al., 2016). Motifs 1034 are stratified into 4 quartiles of loop anchor interaction strength, and the average TOP2 CC 1035 distribution in each quartile plotted. 1036 D) Fine-scale aggregation of TOP2 CCs (red) in a 800 bp window centred on TSSs in human RPE-1 1037 cells, showing anticorrelation with aggregated MNase-seq signal (blue). 1038 E) Aggregation of END-seq mapped DSBs in a 10 Kbp window centred on orientated TSSs in human 1039 MCF7 cells. Four example TSSs are shown orientated in the 5’-3’ direction (top). A heatmap of all 1040 TSSs in the human genome, with 25 rows stratified by flanking gene expression level in MCF7 cells 1041 (middle). The colour scale indicates average END-seq DSB density. TSSs are also stratified into 4 1042 quartiles based on strength of flanking gene expression, and the average END-seq DSB distribution 1043 in each quartile plotted (bottom). 1044 1045 Figure S7. TSS-proximal TOP2-linked DNA breaks in humans are correlated with gene length, 1046 independently from gene expression level (related to Figure 5); TSS-proximal Top2-linked DNA 1047 breaks in yeast are not strongly correlated with gene length, (related to Figure 2) 1048 A) Aggregation of TOP2 CCs in a 10 Kbp window centred on orientated TSSs in human RPE-1 cells. 1049 A heatmap of all TSSs in the human genome, with 25 rows stratified by gene length (top). The colour 1050 scale indicates average TOP2 CC density. Motifs are also stratified into 4 quartiles of gene length, 1051 and the average TOP2 CC distribution in each quartile plotted (bottom). 1052 B) Aggregation of Top2 CCs in a 10 Kbp window centred on orientated transcription start sites (TSS) 1053 in pdr1∆ S. cerevisiae. Heatmap of all TSSs in the S. cerevisiae genome, with 25 rows stratified by 1054 gene length level (top). Colour scale indicates average Top2 CC density. Motifs are also stratified 1055 into four quartiles of gene length, and the average distribution of Top2 CCs in each quartile plotted 1056 (bottom). 1057 C) Scatter plot of human gene length and RPE-1 gene expression, showing no correlation. 1058 D) Scatter plot of S. cerevisiae gene length and gene expression during vegetative growth, showing 1059 little correlation. 1060 1061 1062 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1063 Supplementary Tables 1064 1065 1066 Table S1. DNA libraries used for mapping Spo11 activity. The number of reads 1067 remaining after each stage of the data pipeline are indicated. 1068 Species Strain/ Cell Condition NGS Platform Library code Total Read Mapped reads Blacklist- % of Total line Pairs filtered reads 1 3512089 3376630 3177210 90.5

2A 10101423 8796998 8564017 84.8

S. cerevisiae sae2∆ Meiosis MiSeq 3 4957477 4745192 4502308 90.8

5 4012972 3705100 3559541 88.7

7BC 4065600 3779390 3569290 87.8

1069 1070 1071 1072 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1073 Table S2. DNA libraries used for mapping Top2 activity. The number of reads 1074 remaining after each stage of the data pipeline are indicated. 1075 Species Strain/ Cell Condition NGS Platform Library Total Read Mapped reads De-duplicated Blacklist- % of Pooled line code Pairs reads filtered Total Reads reads (Millions) NextSeq 500 RA1 3741251 3542955 3179049 2746294 73.4 S. cerevisiae pdr1∆ -VP16 3.50 MiSeq RA8 1118483 879107 842290 751850 67.2

NextSeq 500 RA2 3096257 3053356 2679303 2234941 72.2 S. cerevisiae pdr1∆ +VP16 3.93 MiSeq RA9 2262958 2071231 1958341 1690075 74.7

NextSeq 500 RA4 3634284 3552818 3215523 2745668 75.5 pdr1∆ S. cerevisiae -VP16 3.79 sae2∆ MiSeq RA10 1420662 1214067 1181399 1046827 73.7

NextSeq 500 RA5 2529368 2476714 1386551 1097282 43.4 pdr1∆ S. cerevisiae +VP16 3.54 sae2∆ MiSeq RA11 3226626 3063388 2824882 2444486 75.8

NextSeq 500 RA6 5345587 5205411 4288662 3646842 68.2 pdr1∆ S. cerevisiae -VP16 4.76 mre11∆ MiSeq RA12 1519020 1326435 1280137 1109310 73.0

NextSeq 500 RA7 3382546 3357168 2921333 2526342 74.7 pdr1∆ S. cerevisiae +VP16 5.54 mre11∆ MiSeq RA13 4133347 4032996 3540366 3010542 72.8

WG11 125203096 109503841 68845619 68639594 54.8

Human RPE-1 WT -VP16 NextSeq 500 WG15 125328792 110432447 46588296 46449378 37.1 185.36

WG23 89154181 77728045 70484665 70269334 78.8

WG12 110910655 95546007 55848200 55681939 50.2

WG16 102804692 91872706 54312084 54164650 52.7 Human RPE-1 WT +VP16 NextSeq 500 276.97 WG24 130816235 114086421 101162102 100857557 77.1

WG34 91829516 79751337 66458538 66270349 72.2

-VP16 WG13 2601642 2292863 2007177 1991088 76.5 Human RPE-1 WT MiSeq 4.27 G1 WG21 2610523 2298289 2295625 2277124 87.2

+VP16 WG14 3075620 2771819 2491674 2475350 80.5 Human RPE-1 WT MiSeq 5.23 G1 WG22 3139164 2779809 2772349 2751281 87.6

-VP16 WG15b* 3207507 2866283 2782876 2760652 86.1 Human RPE-1 WT MiSeq 5.35 Async WG23 2959289 2616954 2608670 2586523 87.4

+VP16 WG16b* 3018163 2721770 2673273 2656580 88.0 Human RPE-1 WT MiSeq 6.30 Async WG24 4165048 3682708 3668602 3639324 87.4

WG17 2978044 2624262 2558332 2536752 85.2 RPE-1 -VP16 Human MiSeq 4.69 -/- TOP2B G1 WG25 2460949 2181002 2173777 2155101 87.6

RPE-1 +VP16 WG18 2927745 2588870 2557532 2537085 86.7 Human MiSeq 5.10 TOP2B-/- G1 WG26 2922624 2596547 2587152 2565454 87.8

RPE-1 -VP16 WG19 3713068 3278164 3188049 3162895 85.2 Human MiSeq 7.04 -/- TOP2B Async WG27 4413242 3943962 3910035 3877611 87.9

RPE-1 +VP16 WG20 3426100 3038601 2978047 2956076 86.3 Human MiSeq 8.13 -/- TOP2B Async WG28 5878342 5251089 5211027 5169010 87.9

1076 1077 1078 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1079 Table S3. S. cerevisiae strains used in this study. Strains MJ315 and M319 were 1080 used for Spo11 mapping; strains MJ429, MJ475 and MJ551 were used for Top2 1081 mapping. 1082 Strain Number Genotype

sae2∆ MJ315 SK1: ho::LYS2/’, lys2/’, ura3/’, arg4-nsp/’, leu2::hisG/’, his4X::LEU2/’, nuc1::LEU2/’, sae2∆::KanMX6/’ sae2∆ spo11-Y135F MJ319 SK1: ho::LYS2/’, lys2/’, ura3/’, arg4-nsp/’, leu2::hisG/’, his4X::LEU2/’, nuc1::LEU2/’, spo11(Y135F)-HA3His6::KanMX4/’, sae2∆::KanMX6/’ pdr1∆ MJ429 BY4741: ura3∆0, leu2∆0, his3∆1, met15∆0, pdr1∆::PDR1-DBD-CYC8::LEU2

pdr1∆ sae2∆ MJ475 BY4741: ura3∆0, leu2∆0, his3∆1, met15∆0, pdr1∆::PDR1-DBD-CYC8::LEU2, sae2∆::KanMX6 pdr1∆ mre11∆ MJ551 BY4741: ura3∆0, leu2∆0, his3∆1, met15∆0, pdr1∆::PDR1-DBD-CYC8::LEU2, mre11∆::KanMX4 1083 1084 1085 1086 Table S4. Human cell lines used in this study. 1087 Parent cell line Clone number Genotype

hTERT RPE-1 WT WT

hTERT RPE-1 T2B/1 TOP2B-/- 1088 1089 1090

1091 Table S5. The annealed structures of the custom P5 and P7 adapters. “Ti” indicates 1092 an inverted dT, linked via a 3’-3’ phosphodiester. This inhibits unwanted ligation at this 1093 end of the adapter. Also note that the 5’-terminal moieties of each adapter (A for P5, G 1094 for P7) are nucleosides (3’-OH), which further inhibits ligation at this end 1095 Adapter Annealed Structure

P5 blockedi ACACTCTTTCCCTACACGACGCTCTTCCGATCT ligated end TiTGTGAGAAAGGGATGTGCTGCGAGAAGGCTAGp end P7 ligated pGATCGGAAGAGCACACGTCTGAACTCCAGTCACTi blocked end TCTAGCCTTCTCGTGTGCAGACTTGAGGTCAGTG end 1096 1097 1098 Table S6. Publically available datasets used in this study. 1099 Species/ Cell line Dataset Type Accession(s) Reference

S. cerevisiae / Gene expression GSM907178, GSM90719, GSM907180 (Dominissini et al., 2012) SK1 (meiotic) microarray S. cerevisiae / Gene expression GSM907176, GSM907177 (Dominissini et al., 2012) SK1 (vegetative) microarray H. sapiens / RPE-1 Gene expression GSM1395252, GSM1395253, GSM1395254 (Ganem et al., 2014) microarray H. Sapiens / MCF7 Gene expression GSM1141244 (Nelson et al., 2013) microarray H. sapiens / RPE-1 Hi-C GSE71831 (Darrow et al., 2016)

H. sapiens / RPE-1 bTMP-seq GSM1062645, GSM1062646 (Naughton et al., 2013) H. sapiens / MCF7 END-seq GSM2635568 (Canela et al., 2017)

H. sapiens / RPE-1 CTCF ChIP-seq GSM749673, GSM1022665 (ENCODE, 2012) H. sapiens / RPE-1 H3K4Me3 ChIP-seq GSM945271, GSM945271 (ENCODE, 2012)

H. sapiens / RPE-1 H3K27Ac ChIP-seq GSM733771, GSM733718, GSM733755, (ENCODE, 2012) GSM733691, GSM733656, GSM733674, GSM733646 H. sapiens / RPE-1 Methyl-RRBS GSM683773, GSM683905 (ENCODE, 2012) H. sapiens / MNase-seq GSM920557 (ENCODE, 2012) K562 bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1100 References

1101 1102 Akimitsu, N., Adachi, N., Hirai, H., Hossain, M. S., Hamamoto, H., Kobayashi, M., Aratani, Y., Koyama, H., and Sekimizu, K. 1103 (2003). Enforced cytokinesis without complete nuclear division in embryonic cells depleting the activity of DNA topoisomerase 1104 IIalpha. Genes Cells 8, 393-402. 1105 Anand, R., Ranjha, L., Cannavo, E., and Cejka, P. (2016). Phosphorylated CtIP Functions as a Co-factor of the MRE11- 1106 RAD50-NBS1 Endonuclease in DNA End Resection. Mol Cell 64, 940-950. 1107 Austin, C. A., Lee, K. C., Swan, R. L., Khazeem, M. M., Manville, C. M., Cridland, P., Treumann, A., Porter, A., Morris, N. J., 1108 and Cowell, I. G. (2018). TOP2B: The First Thirty Years. Int J Mol Sci 19, 1109 Baranello, L., Kouzine, F., Wojtowicz, D., Cui, K., Przytycka, T. M., Zhao, K., and Levens, D. (2014). DNA break mapping 1110 reveals topoisomerase II activity genome-wide. Int J Mol Sci 15, 13111-13122. 1111 Baxter, J., and Diffley, J. F. (2008). Topoisomerase II inactivation prevents the completion of DNA replication in budding yeast. 1112 Mol Cell 30, 790-802. 1113 Bergerat, A., de Massy, B., Gadelle, D., Varoutas, P. C., Nicolas, A., and Forterre, P. (1997). An atypical topoisomerase II from 1114 Archaea with implications for meiotic recombination. Nature 386, 414-417. 1115 Bermejo, R., Capra, T., Gonzalez-Huici, V., Fachinetti, D., Cocito, A., Natoli, G., Katou, Y., Mori, H., Kurokawa, K., Shirahige, 1116 K., and Foiani, M. (2009). Genome-organizing factors Top2 and Hmo1 prevent chromosome fragility at sites of S phase 1117 transcription. Cell 138, 870-884. 1118 Borgnetto, M. E., Zunino, F., Tinelli, S., Kas, E., and Capranico, G. (1996). Drug-specific sites of topoisomerase II DNA 1119 cleavage in Drosophila chromatin: heterogeneous localization and reversibility. Cancer Res 56, 1855-1862. 1120 Brill, S. J., DiNardo, S., Voelkel-Meiman, K., and Sternglanz, R. (1987). Need for DNA topoisomerase activity as a swivel for 1121 DNA replication for transcription of ribosomal RNA. Nature 326, 414-416. 1122 Bromberg, K. D., Burgin, A. B., and Osheroff, N. (2003). A two-drug model for etoposide action against human topoisomerase 1123 IIalpha. J Biol Chem 278, 7406-7412. 1124 Canela, A., Maman, Y., Jung, S., Wong, N., Callen, E., Day, A., Kieffer-Kwon, K.-R., Pekowska, A., Zhang, H., Rao, S. S. P., 1125 Huang, S.-C., Mckinnon, P. J., Aplan, P. D., Pommier, Y., Aiden, E. L., Casellas, R., and Nussenzweig, A. (2017). Genome 1126 Organization Drives Chromosome Fragility. Cell 170, 507-521.e18. 1127 Cannavo, E., Johnson, D., Andres, S. N., Kissling, V. M., Reinert, J. K., Garcia, V., Erie, D. A., Hess, D., Thomä, N. H., Enchev, 1128 R. I., Peter, M., Williams, R. S., Neale, M. J., and Cejka, P. (2018). Regulatory control of DNA end resection by Sae2 1129 phosphorylation. Nat Commun 9, 4016. 1130 Cannavo, E., and Cejka, P. (2014). Sae2 promotes dsDNA endonuclease activity within Mre11-Rad50-Xrs2 to resect DNA 1131 breaks. Nature 514, 122-125. 1132 Capranico, G., Jaxel, C., Roberge, M., Kohn, K. W., and Pommier, Y. (1990). Nucleosome positioning as a critical determinant 1133 for the DNA cleavage sites of mammalian DNA topoisomerase II in reconstituted simian virus 40 chromatin. Nucleic Acids Res 1134 18, 4553-4559. 1135 Chen, G. L., Yang, L., Rowe, T. C., Halligan, B. D., Tewey, K. M., and Liu, L. F. (1984). Nonintercalative antitumor drugs 1136 interfere with the breakage-reunion reaction of mammalian DNA topoisomerase II. J Biol Chem 259, 13560-13566. 1137 Coombs, D. H., and Pearson, G. D. (1978). Filter-binding assay for covalent DNA-protein complexes: adenovirus DNA-terminal 1138 protein complex. Proc Natl Acad Sci U S A 75, 5291-5295. 1139 Cortes Ledesma, F., El Khamisy, S. F., Zuma, M. C., Osborn, K., and Caldecott, K. W. (2009). A human 5’-tyrosyl DNA 1140 phosphodiesterase that repairs topoisomerase-mediated DNA damage. Nature 461, 674-678. 1141 Cowell, I. G., Sondka, Z., Smith, K., Lee, K. C., Manville, C. M., Sidorczuk-Lesthuruge, M., Rance, H. A., Padget, K., Jackson, 1142 G. H., Adachi, N., and Austin, C. A. (2012). Model for MLL translocations in therapy-related leukemia involving topoisomerase 1143 IIβ-mediated DNA strand breaks and gene proximity. Proc Natl Acad Sci U S A 109, 8989-8994. 1144 Crosetto, N., Mitra, A., Silva, M. J., Bienko, M., Dojer, N., Wang, Q., Karaca, E., Chiarle, R., Skrzypczak, M., Ginalski, K., 1145 Pasero, P., Rowicka, M., and Dikic, I. (2013). Nucleotide-resolution DNA double-strand break mapping by next-generation 1146 sequencing. Nat Methods 10, 361-365. 1147 Darrow, E. M., Huntley, M. H., Dudchenko, O., Stamenova, E. K., Durand, N. C., Sun, Z., Huang, S. C., Sanborn, A. L., Machol, 1148 I., Shamim, M., Seberg, A. P., Lander, E. S., Chadwick, B. P., and Aiden, E. L. (2016). Deletion of DXZ4 on the human inactive 1149 X chromosome alters higher-order genome architecture. Proc Natl Acad Sci U S A 113, E4504-12. 1150 Deweese, J. E., and Osheroff, N. (2009). The DNA cleavage reaction of topoisomerase II: wolf in sheep’s clothing. Nucleic 1151 Acids Res 37, 738-748. 1152 DiNardo, S., Voelkel, K., and Sternglanz, R. (1984). DNA topoisomerase II mutant of Saccharomyces cerevisiae: 1153 topoisomerase II is required for segregation of daughter molecules at the termination of DNA replication. Proc Natl Acad Sci U 1154 S A 81, 2616-2620. 1155 Dominissini, D., Moshitch-Moshkovitz, S., Schwartz, S., Salmon-Divon, M., Ungar, L., Osenberg, S., Cesarkas, K., Jacob- 1156 Hirsch, J., Amariglio, N., Kupiec, M., Sorek, R., and Rechavi, G. (2012). Topology of the human and mouse m6A RNA 1157 methylomes revealed by m6A-seq. Nature 485, 201-206. 1158 Durand, N. C., Shamim, M. S., Machol, I., Rao, S. S., Huntley, M. H., Lander, E. S., and Aiden, E. L. (2016). Juicer Provides a 1159 One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst 3, 95-98. 1160 ENCODE, P. C. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57-74. 1161 Ganem, N. J., Cornils, H., Chiu, S. Y., O’Rourke, K. P., Arnaud, J., Yimlamai, D., Théry, M., Camargo, F. D., and Pellman, D. 1162 (2014). Cytokinesis failure triggers hippo tumor suppressor pathway activation. Cell 158, 833-848. bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1163 Garcia, V., Phelps, S. E. L., Gray, S., and Neale, M. J. (2011). Bidirectional resection of DNA double-strand breaks by Mre11 1164 and Exo1. Nature 479, 241-244. 1165 Gómez-Herreros, F., Schuurs-Hoeijmakers, J. H., McCormack, M., Greally, M. T., Rulten, S., Romero-Granados, R., Counihan, 1166 T. J., Chaila, E., Conroy, J., Ennis, S., Delanty, N., Cortés-Ledesma, F., de Brouwer, A. P., Cavalleri, G. L., El-Khamisy, S. F., 1167 de Vries, B. B., and Caldecott, K. W. (2014). TDP2 protects transcription from abortive topoisomerase activity and is required 1168 for normal neural function. Nat Genet 46, 516-521. 1169 Gothe, H. J., Bouwman, B. A. M., Gusmao, E. G., Piccinno, R., Sayols, S., Drechsel, O., Petrosino, G., Minneker, V., Josipovic, 1170 N., Mizi, A., Nielsen, C. F., Wagner, E.-M., Takeda, S., Sasanuma, H., Hudson, D. F., Kindler, T., Baranello, L., Papantonis, A., 1171 Crosetto, N., and Roukos, V. (2018). Spatial chromosome folding and active transcription drive DNA fragility and formation of 1172 oncogenic MLL translocations. bioRxiv http://dx.doi.org/10.1101/485763, 1173 Grant, C. E., Bailey, T. L., and Noble, W. S. (2011). FIMO: scanning for occurrences of a given motif. Bioinformatics 27, 1017- 1174 1018. 1175 Grue, P., Grässer, A., Sehested, M., Jensen, P. B., Uhse, A., Straub, T., Ness, W., and Boege, F. (1998). Essential mitotic 1176 functions of DNA topoisomerase IIalpha are not adopted by topoisomerase IIbeta in human H69 cells. J Biol Chem 273, 33660- 1177 33666. 1178 Grzybowski, A. T., Chen, Z., and Ruthenburg, A. J. (2015). Calibrating ChIP-Seq with Nucleosomal Internal Standards to 1179 Measure Histone Modification Density Genome Wide. Mol Cell 58, 886-899. 1180 Haffner, M. C., Aryee, M. J., Toubaji, A., Esopi, D. M., Albadine, R., Gurel, B., Isaacs, W. B., Bova, G. S., Liu, W., Xu, J., 1181 Meeker, A. K., Netto, G., De Marzo, A. M., Nelson, W. G., and Yegnasubramanian, S. (2010). Androgen-induced TOP2B- 1182 mediated double-strand breaks and prostate cancer gene rearrangements. Nat Genet 42, 668-675. 1183 Hamilton, N. K., and Maizels, N. (2010). MRE11 function in response to topoisomerase poisons is independent of its function in 1184 double-strand break repair in Saccharomyces cerevisiae. PLoS One 5, e15387. 1185 Hartsuiker, E., Neale, M. J., and Carr, A. M. (2009). Distinct requirements for the Rad32(Mre11) nuclease and Ctp1(CtIP) in the 1186 removal of covalently bound topoisomerase I and II from DNA. Mol Cell 33, 117-123. 1187 Heck, M. M., Hittelman, W. N., and Earnshaw, W. C. (1988). Differential expression of DNA topoisomerases I and II during the 1188 eukaryotic cell cycle. Proc Natl Acad Sci U S A 85, 1086-1090. 1189 Hoa, N. N., Shimizu, T., Zhou, Z. W., Wang, Z. Q., Deshpande, R. A., Paull, T. T., Akter, S., Tsuda, M., Furuta, R., Tsusui, K., 1190 Takeda, S., and Sasanuma, H. (2016). Mre11 Is Essential for the Removal of Lethal Topoisomerase 2 Covalent Cleavage 1191 Complexes. Mol Cell 64, 580-592. 1192 Hoffman, M. M., Ernst, J., Wilder, S. P., Kundaje, A., Harris, R. S., Libbrecht, M., Giardine, B., Ellenbogen, P. M., Bilmes, J. A., 1193 Birney, E., Hardison, R. C., Dunham, I., Kellis, M., and Noble, W. S. (2013). Integrative annotation of chromatin elements from 1194 ENCODE data. Nucleic Acids Res 41, 827-841. 1195 Holm, C., Goto, T., Wang, J. C., and Botstein, D. (1985). DNA topoisomerase II is required at the time of mitosis in yeast. Cell 1196 41, 553-563. 1197 Hornyak, P., Askwith, T., Walker, S., Komulainen, E., Paradowski, M., Pennicott, L. E., Bartlett, E. J., Brissett, N. C., Raoof, A., 1198 Watson, M., Jordan, A. M., Ogilvie, D. J., Ward, S. E., Atack, J. R., Pearl, L. H., Caldecott, K. W., and Oliver, A. W. (2016). 1199 Mode of action of DNA-competitive small molecule inhibitors of tyrosyl DNA phosphodiesterase 2. Biochem J 473, 1869-1879. 1200 Hsiang, Y. H., Lihou, M. G., and Liu, L. F. (1989). Arrest of replication forks by drug-stabilized topoisomerase I-DNA cleavable 1201 complexes as a mechanism of cell killing by camptothecin. Cancer Res 49, 5077-5082. 1202 Hsu, P. D., Scott, D. A., Weinstein, J. A., Ran, F. A., Konermann, S., Agarwala, V., Li, Y., Fine, E. J., Wu, X., Shalem, O., 1203 Cradick, T. J., Marraffini, L. A., Bao, G., and Zhang, F. (2013). DNA targeting specificity of RNA-guided Cas9 nucleases. Nat 1204 Biotechnol 31, 827-832. 1205 Hu, B., Petela, N., Kurze, A., Chan, K.-L., Chapard, C., and Nasmyth, K. (2015). Biological chromodynamics: a general method 1206 for measuring protein occupancy across the genome by calibrating ChIP-seq. Nucleic Acids Res 1207 Huang, W. C., Lee, C. Y., and Hsieh, T. S. (2017). Single-molecule Förster resonance energy transfer (FRET) analysis 1208 discloses the dynamics of the DNA-topoisomerase II (Top2) interaction in the presence of TOP2-targeting agents. J Biol Chem 1209 292, 12589-12598. 1210 Johnson, D., Allison, R. M., Cannavo, E., Cejka, P., and Neale, M. (2019). Removal of Spo11 from meiotic DNA breaks in vitro 1211 but not in vivo by Tyrosyl DNA Phosphodiesterase 2. bioRxiv http://dx.doi.org/10.1101/527333, 1212 Käs, E., and Laemmli, U. K. (1992). In vivo topoisomerase II cleavage of the Drosophila histone and satellite III repeats: DNA 1213 sequence and structural characteristics. EMBO J 11, 705-716. 1214 Keeney, S., Giroux, C. N., and Kleckner, N. (1997). Meiosis-specific DNA double-strand breaks are catalyzed by Spo11, a 1215 member of a widely conserved protein family. Cell 88, 375-384. 1216 Keeney, S., and Kleckner, N. (1995). Covalent protein-DNA complexes at the 5’ strand termini of meiosis-specific double-strand 1217 breaks in yeast. Proc Natl Acad Sci U S A 92, 11274-11278. 1218 Khan, Q. A., Kohlhagen, G., Marshall, R., Austin, C. A., Kalena, G. P., Kroth, H., Sayer, J. M., Jerina, D. M., and Pommier, Y. 1219 (2003). Position-specific trapping of topoisomerase II by benzo[a]pyrene diol epoxide adducts: implications for interactions with 1220 intercalating anticancer agents. Proc Natl Acad Sci U S A 100, 12498-12503. 1221 Kingma, P. S., Corbett, A. H., Burcham, P. C., Marnett, L. J., and Osheroff, N. (1995). Abasic sites stimulate double-stranded 1222 DNA cleavage mediated by topoisomerase II. DNA lesions as endogenous topoisomerase II poisons. J Biol Chem 270, 21441- 1223 21444. 1224 Lee, M. P., Sander, M., and Hsieh, T. (1989). Nuclease protection by Drosophila DNA topoisomerase II. Enzyme/DNA contacts 1225 at the strong topoisomerase II cleavage sites. J Biol Chem 264, 21779-21787. 1226 Liang, K., and Keleş, S. (2012). Normalization of ChIP-seq data with control. BMC Bioinformatics 13, 199. 1227 Lieberman-Aiden, E., van Berkum, N. L., Williams, L., Imakaev, M., Ragoczy, T., Telling, A., Amit, I., Lajoie, B. R., Sabo, P. J., 1228 Dorschner, M. O., Sandstrom, R., Bernstein, B., Bender, M. A., Groudine, M., Gnirke, A., Stamatoyannopoulos, J., Mirny, L. A., bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1229 Lander, E. S., and Dekker, J. (2009). Comprehensive mapping of long-range interactions reveals folding principles of the 1230 human genome. Science 326, 289-293. 1231 Lin, C., Yang, L., Tanasa, B., Hutt, K., Ju, B. G., Ohgi, K., Zhang, J., Rose, D. W., Fu, X. D., Glass, C. K., and Rosenfeld, M. G. 1232 (2009). Nuclear receptor-induced chromosomal proximity and DNA breaks underlie specific translocations in cancer. Cell 139, 1233 1069-1083. 1234 Liu, J., Wu, T. C., and Lichten, M. (1995). The location and structure of double-strand DNA breaks induced during yeast 1235 meiosis: evidence for a covalently linked DNA-protein intermediate. EMBO J 14, 4599-4608. 1236 Liu, L. F., Rowe, T. C., Yang, L., Tewey, K. M., and Chen, G. L. (1983). Cleavage of DNA by mammalian DNA topoisomerase 1237 II. J Biol Chem 258, 15365-15370. 1238 Liu, Q., and Wang, J. C. (1999). Similarity in the catalysis of DNA breakage and rejoining by type IA and IIA DNA 1239 topoisomerases. Proc Natl Acad Sci U S A 96, 881-886. 1240 Long, B. H., Musial, S. T., and Brattain, M. G. (1986). DNA breakage in human lung carcinoma cells and nuclei that are 1241 naturally sensitive or resistant to etoposide and teniposide. Cancer Res 46, 3809-3816. 1242 Lotito, L., Russo, A., Chillemi, G., Bueno, S., Cavalieri, D., and Capranico, G. (2008). Global transcription regulation by DNA 1243 topoisomerase I in exponentially growing Saccharomyces cerevisiae cells: activation of telomere-proximal genes by TOP1 1244 deletion. J Mol Biol 377, 311-322. 1245 Mathelier, A., Zhao, X., Zhang, A. W., Parcy, F., Worsley-Hunt, R., Arenillas, D. J., Buchman, S., Chen, C. Y., Chou, A., 1246 Ienasescu, H., Lim, J., Shyr, C., Tan, G., Zhou, M., Lenhard, B., Sandelin, A., and Wasserman, W. W. (2014). JASPAR 2014: 1247 an extensively expanded and updated open-access database of transcription factor binding profiles. Nucleic Acids Res 42, 1248 D142-7. 1249 Minocha, A., and Long, B. H. (1984). Inhibition of the DNA catenation activity of type II topoisomerase by VP16-213 and VM26. 1250 Biochem Biophys Res Commun 122, 165-170. 1251 Nakamura, K., Kogame, T., Oshiumi, H., Shinohara, A., Sumitomo, Y., Agama, K., Pommier, Y., Tsutsui, K. M., Tsutsui, K., 1252 Hartsuiker, E., Ogi, T., Takeda, S., and Taniguchi, Y. (2010). Collaborative action of Brca1 and CtIP in elimination of covalent 1253 modifications from double-strand breaks to facilitate subsequent break repair. PLoS Genet 6, e1000828. 1254 Naughton, C., Avlonitis, N., Corless, S., Prendergast, J. G., Mati, I. K., Eijk, P. P., Cockroft, S. L., Bradley, M., Ylstra, B., and 1255 Gilbert, N. (2013). Transcription forms and remodels supercoiling domains unfolding large-scale chromatin structures. Nat 1256 Struct Mol Biol 20, 387-395. 1257 Neale, M. J., Pan, J., and Keeney, S. (2005). Endonucleolytic processing of covalent protein-linked DNA double-strand breaks. 1258 Nature 436, 1053-1057. 1259 Negri, C., Chiesa, R., Cerino, A., Bestagno, M., Sala, C., Zini, N., Maraldi, N. M., and Astaldi Ricotti, G. C. (1992). Monoclonal 1260 antibodies to human DNA topoisomerase I and the two isoforms of DNA topoisomerase II: 170- and 180-kDa isozymes. Exp 1261 Cell Res 200, 452-459. 1262 Nelson, E. R., Wardell, S. E., Jasper, J. S., Park, S., Suchindran, S., Howe, M. K., Carver, N. J., Pillai, R. V., Sullivan, P. M., 1263 Sondhi, V., Umetani, M., Geradts, J., and McDonnell, D. P. (2013). 27-Hydroxycholesterol links hypercholesterolemia and 1264 breast cancer pathophysiology. Science 342, 1094-1098. 1265 Nitiss, J. L. (2009). Targeting DNA topoisomerase II in cancer chemotherapy. Nat Rev Cancer 9, 338-350. 1266 Pan, J., Sasaki, M., Kniewel, R., Murakami, H., Blitzblau, H. G., Tischfield, S. E., Zhu, X., Neale, M. J., Jasin, M., Socci, N. D., 1267 Hochwagen, A., and Keeney, S. (2011). A Hierarchical Combination of Factors Shapes the Genome-wide Topography of Yeast 1268 Meiotic Recombination Initiation. Cell 144, 719-731. 1269 Pommier, Y., Capranico, G., Orr, A., and Kohn, K. W. (1991). Local base sequence preferences for DNA cleavage by 1270 mammalian topoisomerase II in the presence of amsacrine or teniposide. Nucleic Acids Res 19, 5973-5980. 1271 Pommier, Y., Schwartz, R. E., Kohn, K. W., and Zwelling, L. A. (1984). Formation and rejoining of deoxyribonucleic acid 1272 double-strand breaks induced in isolated cell nuclei by antineoplastic intercalating agents. Biochemistry 23, 3194-3201. 1273 Pommier, Y., Sun, Y., Huang, S. N., and Nitiss, J. L. (2016). Roles of eukaryotic topoisomerases in transcription, replication 1274 and genomic stability. Nat Rev Mol Cell Biol 17, 703-721. 1275 Reece, R. J., and Maxwell, A. (1991). DNA gyrase: structure and function. Crit Rev Biochem Mol Biol 26, 335-375. 1276 Rowley, J. D., and Olney, H. J. (2002). International workshop on the relationship of prior therapy to balanced chromosome 1277 aberrations in therapy-related myelodysplastic syndromes and acute leukemia: overview report. Genes Cancer 1278 33, 331-345. 1279 Sabourin, M., and Osheroff, N. (2000). Sensitivity of human type II topoisomerases to DNA damage: stimulation of enzyme- 1280 mediated DNA cleavage by abasic, oxidized and alkylated lesions. Nucleic Acids Res 28, 1947-1954. 1281 Sasanuma, H., Tsuda, M., Morimoto, S., Saha, L. K., Rahman, M. M., Kiyooka, Y., Fujiike, H., Cherniack, A. D., Itou, J., Callen 1282 Moreu, E., Toi, M., Nakada, S., Tanaka, H., Tsutsui, K., Yamada, S., Nussenzweig, A., and Takeda, S. (2018). BRCA1 ensures 1283 genome integrity by eliminating estrogen-induced pathological topoisomerase II-DNA complexes. Proc Natl Acad Sci U S A 1284 115, E10642-E10651. 1285 Schalbetter, S. A., Fudenberg, G., Baxter, J., Pollard, K. S., and Neale, M. J. (2018). Principles of Meiotic Chromosome 1286 Assembly. bioRXiv http://dx.doi.org/10.1101/442038, 1287 Stepanov, A., Nitiss, K. C., Neale, G., and Nitiss, J. L. (2008). Enhancing drug accumulation in Saccharomyces cerevisiae by 1288 repression of pleiotropic drug resistance genes with chimeric transcription repressors. Mol Pharmacol 74, 423-431. 1289 Stingele, J., Habermann, B., and Jentsch, S. (2015). DNA-protein crosslink repair: proteases as DNA repair enzymes. Trends 1290 Biochem Sci 40, 67-71. 1291 Strumberg, D., Nitiss, J. L., Dong, J., Kohn, K. W., and Pommier, Y. (1999). Molecular analysis of yeast and human type II 1292 topoisomerases. Enzyme-DNA and drug interactions. J Biol Chem 274, 28246-28255. 1293 Sun, X., Huang, L., Markowitz, T. E., Blitzblau, H. G., Chen, D., Klein, F., and Hochwagen, A. (2015). Transcription dynamically 1294 patterns the meiotic chromosome-axis interface. Elife 4, bioRxiv preprint doi: https://doi.org/10.1101/530667; this version posted January 26, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license. Gittens et al. 2019

1295 Sutormin, D., Rubanova, N., Logacheva, M., Ghilarov, D., and Severinov, K. (2018). Single-nucleotide-resolution mapping of 1296 DNA gyrase cleavage sites across the Escherichia coli genome. Nucleic Acids Res 1297 Thomsen, B., Bendixen, C., Lund, K., Andersen, A. H., Sørensen, B. S., and Westergaard, O. (1990). Characterization of the 1298 interaction between topoisomerase II and DNA by transcriptional footprinting. J Mol Biol 215, 237-244. 1299 Tiwari, V. K., Burger, L., Nikoletopoulou, V., Deogracias, R., Thakurela, S., Wirbelauer, C., Kaut, J., Terranova, R., Hoerner, L., 1300 Mielke, C., Boege, F., Murr, R., Peters, A. H., Barde, Y. A., and Schübeler, D. (2012). Target genes of Topoisomerase IIβ 1301 regulate neuronal survival and are defined by their chromatin state. Proc Natl Acad Sci U S A 109, E934-43. 1302 Udvardy, A., and Schedl, P. (1991). Chromatin structure, not DNA sequence specificity, is the primary determinant of 1303 topoisomerase II sites of action in vivo. Mol Cell Biol 11, 4973-4984. 1304 Uemura, T., and Yanagida, M. (1984). Isolation of type I and II DNA topoisomerase mutants from fission yeast: single and 1305 double mutants show different phenotypes in cell growth and chromatin organization. EMBO J 3, 1737-1744. 1306 Uusküla-Reimand, L., Hou, H., Samavarchi-Tehrani, P., Rudan, M. V., Liang, M., Medina-Rivera, A., Mohammed, H., Schmidt, 1307 D., Schwalie, P., Young, E. J., Reimand, J., Hadjur, S., Gingras, A. C., and Wilson, M. D. (2016). Topoisomerase II beta 1308 interacts with cohesin and CTCF at topological domain borders. Genome Biol 17, 182. 1309 Vélez-Cruz, R., Riggins, J. N., Daniels, J. S., Cai, H., Guengerich, F. P., Marnett, L. J., and Osheroff, N. (2005). Exocyclic DNA 1310 lesions stimulate DNA cleavage mediated by human topoisomerase II alpha in vitro and in cultured cells. Biochemistry 44, 1311 3972-3981. 1312 Wang, J. C. (2009). Untangling the double helix Cold Spring Harbor Laboratory Pr). 1313 Wang, Y., Knudsen, B. R., Bjergbaek, L., Westergaard, O., and Andersen, A. H. (1999). Stimulated activity of human 1314 topoisomerases IIalpha and IIbeta on RNA-containing substrates. J Biol Chem 274, 22839-22846. 1315 Wang, Y., Thyssen, A., Westergaard, O., and Andersen, A. H. (2000). Position-specific effect of ribonucleotides on the 1316 cleavage activity of human topoisomerase II. Nucleic Acids Res 28, 4815-4821. 1317 Woessner, R. D., Mattern, M. R., Mirabelli, C. K., Johnson, R. K., and Drake, F. H. (1991). Proliferation- and cell cycle- 1318 dependent differences in expression of the 170 kilodalton and 180 kilodalton forms of topoisomerase II in NIH-3T3 cells. Cell 1319 Growth Differ 2, 209-214. 1320 Wozniak, A. J., and Ross, W. E. (1983). DNA damage as a basis for 4’-demethylepipodophyllotoxin-9-(4,6-O-ethylidene-beta- 1321 D-glucopyranoside) (etoposide) cytotoxicity. Cancer Res 43, 120-124. 1322 Wu, C. C., Li, T. K., Farh, L., Lin, L. Y., Lin, T. S., Yu, Y. J., Yen, T. J., Chiang, C. W., and Chan, N. L. (2011). Structural basis 1323 of type II topoisomerase inhibition by the anticancer drug etoposide. Science 333, 459-462. 1324 Wu, J., and Liu, L. F. (1997). Processing of topoisomerase I cleavable complexes into DNA damage by transcription. Nucleic 1325 Acids Res 25, 4181-4186. 1326 Yan, W. X., Mirzazadeh, R., Garnerone, S., Scott, D., Schneider, M. W., Kallas, T., Custodio, J., Wernersson, E., Li, Y., Gao, 1327 L., Federova, Y., Zetsche, B., Zhang, F., Bienko, M., and Crosetto, N. (2017). BLISS is a versatile and quantitative method for 1328 genome-wide profiling of DNA double-strand breaks. Nat Commun 8, 15058. 1329 Yang, F., Kemp, C. J., and Henikoff, S. (2015). Anthracyclines induce double-strand DNA breaks at active gene promoters. 1330 Mutat Res 773, 9-15. 1331 Yu, X., Davenport, J. W., Urtishak, K. A., Carillo, M. L., Gosai, S. J., Kolaris, C. P., Byl, J. A. W., Rappaport, E. F., Osheroff, N., 1332 Gregory, B. D., and Felix, C. A. (2017). Genome-wide TOP2A DNA cleavage is biased toward translocated and highly 1333 transcribed loci. Genome Res 27, 1238-1249. 1334 Zwelling, L. A., Michaels, S., Erickson, L. C., Ungerleider, R. S., Nichols, M., and Kohn, K. W. (1981). Protein-associated 1335 deoxyribonucleic acid strand breaks in L1210 cells treated with the deoxyribonucleic acid intercalating agents 4’-(9- 1336 acridinylamino) methanesulfon-m-anisidide and adriamycin. Biochemistry 20, 6553-6563. 1337 Fig 1. Take home messages: - CCseq is able to map Spo11 cleavage positions in yeast. - Positions corroborate previous study, and hotspot activity is highly correlated. Chapter 5: Genome-wide mapping of Spo11-DSBs - W-C Offset analysis demonstrates nucleotide resolution.

Figure 1

A / Spo11 HIS4::LEU2 DSB Top2hotspotTop2 ! Spo11

B PK D W - -through C Input Input +PKFlow Wash 1 Wash 6 Elution Spo11 Top2 Parental Top2 / ParentalRestriction! 0.8 Spo11 fragment Library Prep + 0.6 Unproteolysed Tdp2 gDNA 0.4 extraction Spo11DSB II-!DSB 1 Illumina Sequencing 0.2

Pearson correlation (r) Pearson correlation 0 CC Enrichment -10 -8 -6 -4 -2 0 2 4 6 8 10 on Column Single nucleotide resolution mapping Spo11DSB I-!DSB 2 Watson-Crick Rel. Offset (bp) Right side of break (Watson) Left side of break E (Crick) Slot Blot 0. 4 or Southern Blot

C 15 0. 2 composition

CC-seq base Fractional 0 Figure 5.3: Southern blot of Spo11-DSB enrichment. (SPO11+) 0

) A meiotic time course is performed for a sae2! strain and cells harvested at 6 hours. Unproteolysed -100 -60 -20 20 60 100 Position relative to dyad axis (bp) genomic DNA was extracted as described in Figure 5.2. Molecules were precipitated with ethanol, HpM 15 Genes Spo11 A 15 resuspended in 1! TE and restriction enzyme digested with PstI for 1 h at 37 °C to fragment the DNA W T C (‘INPUT’). The sample is bound to the glass fibre membrane of a QIAQuick spin column and Spo11 G C DNA CC ( CC DNA oligo-seq 0.4 - 0 centrifuged. The flow-through was rebound to the column and centrifuged again to increase yield (second flow-through – ‘UNBOUND’). The membrane is washed using TEN (10 mM(SPO11 Tris Base+) !HCl

Spo11 pH 8.0, 1 mM EDTA, 300 mM NaCl) to remove any non-protein-bound DNA (fraction taken – 0.2 15 ‘WASH’). Spo11-bound DNA is released from the column using two sequential 50 µl TES buffer 5 composition (10 mM Tris Base!HCl pH 8.0, 1 mM EDTA, 0.5% SDS) (‘ELUATE’). All CC samples-seq were base Fractional 0 (spo11-Y135F) 5 Proteinase K treated (unless stated) at 60 °C for 1 hour and the samples separated on a 0.7% agarose 0 150 160 170 180 190 200 gel for 18 hours at 60 V. The gel was transferred to nylon membrane under denaturing conditions and -12 -8 -4 +4 +8 +12 Position on Chromosome 6 (Kbp) Position relative to dyad axis (bp) hybridised with a radioactive probe for the HIS4::LEU2 locus. The membrane was exposed to a phosphor screen and image taken using a Fuji phosphor scanner.

Figure 1. CC-seq maps covalent Spo11-linked DNA breaks in S. cerevisiae meiosis with nucleotide accuracy A) Schematic of the CC-seq method. B) Column-based enrichment of Spo11-linked DNA fragments detected by Southern blotting at the his4::LEU2 recombination hotspot (Methods). Arrows indicate expected sizes of Spo11-DSBs. C) Nucleotide resolution mapping of S. cerevisiae Spo11 hotspots by CC-seq or oligo-seq (Pan et al., 2011). Red and blue traces indicate Spo11-linked 5’ DNA termini on the Watson and Crick strands, respectively. Grey arrows indicate positions of gene open reading frames. D) Pearson correlation (r) of Spo11 CC-seq signal between Watson and Crick strands, offset by the indicated distances. E) Average nucleotide composition over a 200 (top) and 30 bp (bottom) window centred on Spo11 breaks. Bases reported are for the top strand only. HpM = Hits per million mapped reads per base pair.

134 is able to map Top2cc in yeast with nucleotide resolution. Top2cc are subject to repair by the Mre11 complex. The distribution of Top2/Spo11 cleavages is very different.

Figure 2 Chapter 7: Genome-wide mapping of Top2-DSBs

A ! Etoposide concentration! A + VP16 B YPDYPD ! 0.10.1 μmMM! 0.30.3 mMμM! 1.0 mM μM! 20 0 -VP16 pdr1∆ PDR1pdr1∆! 20 45 PDR1pdr1 ∆sae2sae2"!∆

wtwt! ) 0 +VP16 pdr1∆

PDR1pdr1 mre11∆ mre11"!∆ cHpM 45 1 2 3 4 1 2 3 4 1 2 3 4 1 2 3 4 Fold dilution: — 10 10 10 10 — 10 10 10 10 — 10 10 10 10 — 10 10 10 10 45

Top2 CC ( pdr1∆ +VP16 B ! C 1 0 mre11∆ 1.001.00 Spo11 DExposure time! Top2 0.8 Top2 45 ) )

0.75 Genes 0.75 1 h! 2 h! 0.6 4 h! Spo11 24 h! 45 HpM HpM PDR1! t0! 0.4 0.50 0 Meiosis (Spo11)

0.50 Pearson(R) mre11"! HpMpKb 0.2 DMSO! 0.25 45 0.25 0 ( CC Spo11 CC Density ( Density CC 1 μM etoposide! 0 1 2 3 4 5 6 7 10 20 30 40 50 60 Position on Chromosome 11 (Kbp) 0.000.00 E Inter-bin distance (Kb) Intergenic Divergent Tandem Convergent Genic PDR1! 80 t0! Region ) Top2 sae2"! 45 70 0.7 W cHpM DMSO! C pdr1∆ 60 0.6 +VP16 1 μM etoposide! Top2 mre11∆ 50 0.5

Top2 CC ( 45 40 0.4 ) 10 30 0.3 Genes HpM 20 Pearson(R) 0.2 meiosis C! DMSO volume! Percentage of totalof signal Percentage 10 0.1 0 0 10

0 μl! 15 μl! 45 μl! 150 μl! ( CC Spo11 Inter.:(Div. Tan. Conv.)Intra. -10 -8 -6 -4 -2 0 2 4 6 8 10 32 34 36 38 40 PDR1! Watson-Crick Rel. Offset (bp) Position on Chromosome 11 (Kbp) PDR1 sae2"! wt! PDR1 mre11"!

Figure 2. CC-seq maps covalent Top2-linked DNA breaks in S. cerevisiae cycling cells with nucleotide accuracy Figure 7.1:A) A sensitivitySerial dilutioncassette renders spot wildtests type, of sae2 VP16! and tolerance mre11! cells forsensitive the toindicated chronic strains. and acute etoposide exposure. Serial spot B)dilutionNucleotide assays of sensitivity resolution to etoposide mapping of stated of S. S.cerevisiae cerevisiae mutants. Top2The sensitivity CCs by CC-seq of the indicated strains after treatment for 4 hours with cassette (‘PDR1’1 mM) consistsVP16. of a Spo11deletion of-CC the PDR1 data gene is andplotted a fusion for of thecomparison. transcriptional repressor Top2 CC data were calibrated using a human DNA spike-in (Methods). CYC8p to theRed DNA and binding blue domain traces of Pdr1p. indicate A&C. Cells CC were-linked grown 5’ overnight DNA attermini 30 °C, diluted on the to Watson and Crick strands, respectively. Grey arrows indicate an OD of 0.5 in fresh YPD and grown to an OD of 2.0. A serial 10-fold dilution series, starting at 600 positions of gene open reading600 frames. Lower panels show an expanded view of the region from 31.5 to 40 Kbp. OD600 2.0, were spotted onto YPD plates containing the stated final concentration of etoposide (A) or DMSO (C) C)and Quantificationgrown at 30 °C for 3 days.of Top2 B. Overnight and Spo11 cells were CC diluted signal to an OD stratified600 of 0.5, grown by genomic region. The genome was divided into intra and intergenic to an OD of 2.0 and the culture split into four. Cells were taken at time 0 (‘t0’) and a serial dilution 600regions; the intergenic region was further divided into divergent, tandem and convergent based on orientation of flanking genes. series was spotted. 1 µM Etoposide or DMSO was added and incubated for the stated exposure time. rd A serial 10-foldSpo11 dilution and series Top2 was spottedactivity and mapped cells grown byat 30 CC °C -forseq 3 days.is expressed Images were taken as box-and-whisker plots of density (upper and lower box limit: 3 using a Syngeneand InGenius 1st quartile; bioimaging bar: system. median; Experiment upper was performed and lower by Holly whisker: Thomas. highest and lowest values within 1.5-fold of the interquartile range), or as the percentage of total mapped reads. D) Local correlation of Top2 or Spo11 CC-seq signals. Top2 or Spo11 CC-seq data were binned at 50 bp resolution and the Pearson correlation calculated between bins of increasing separation. E) Pearson correlation (r) of Top2 CC-seq signal on Watson and Crick strands, offset by the indicated distance. HpM = Hits per million mapped reads per base pair.

182 Figure 3 is able to map Top2cc in humans. scale mapping reveals large domains of high and low activity. Offset analysis demonstrates nucleotide resolution. Short Exposure Long Exposure A -VP16 +VP16 -VP16 +VP16 C D

0.80.8 ) Input 20 0 0.6

0.6 VP16

20 -

F.T. HpBpbp 60 0.40.4 Density

Wash 1 HpM Window Position Human Feb. 2009 (GRCh37/hg19) chr1:114,400,000-114,500,000 (100,001 bp) Scale 50 kb Window Position Human Feb. 2009 (GRCh37/hg19) chr1:114,400,000-114,500,000 (100,001 bp) 253.4 _ HRPEpiC H3K4me3 signal from UW 0Scale 50 kb TOP2 CC 0.20.2 253.4 _ HRPEpiC H3K4me3 signal from UW HRPEpiC H3K4me3 Wash 2 +VP16 HRPEpiC H3K4me3 0 _ Top2 CC Density ( 0.0 6054.4 _ HRPEpiC TFBS Signal of CTCF from UW 0.0 0 _ Elution A B 5454.4 _ HRPEpiC TFBS Signal of CTCF from UW A Region B HRPEpiC CTCF Sg HRPEpiC CTCF Sg Compartment Window Position Human Feb. 2009 (GRCh37/hg19) chr1:114,400,000-114,500,000 (100,001 bp) 0Scale0 _ 50 kb CTCF UCSC Genes (RefSeq, GenBank, CCDS, Rfam, tRNAs & Comparative Genomics) 253.40 _ HRPEpiC H3K4me3 signal from UW UCSC DNaseIGenes (RefSeq,Hypersensitivity GenBank, Clusters CCDS, in Rfam,125 cell tRNAs types & from Comparative ENCODE Genomics) (V3) 253DNase Clusters HRPEpiC H3K4me3 DNaseI HypersensitivitychromHMM Clusters tracks in 125 from cell Roadmap types from ENCODE (V3) DNaseLNG.IMR90 Clusters AU LNG.IMR90 chromHMM tracks from Roadmap LNG.IMR90306.627 _ H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE LNG.IMR9000 _ H3K4Me3 306.62754.4 _ H3K27Ac Mark (Often FoundHRPEpiC Near Active TFBS Regulatory Signal of CTCF Elements) from UWon 7 cell lines from ENCODE B Layered307 H3K27Ac LayeredHRPEpiC H3K27Ac CTCF Sg 0 _ No data _ AlkB demethylated RNA-seq Plus signal GM12878 Rep 1 from UC Santa Cruz 00 _ Plus dm-GM12878_1No data _ UCSCAlkB Genesdemethylated (RefSeq, RNA-seq GenBank, Plus CCDS, signal Rfam,GM12878 tRNAs Rep & 1Comparative from UC Santa Genomics) Cruz H3K27Ac Plus dm-GM12878_1No data _ DNaseI Hypersensitivity Clusters in 125 cell types from ENCODE (V3) 0.505e−04 DNase Clusters Transcription Factor ChIP-seq (161 factors) from ENCODE with Factorbook Motifs 4e−04 TOP2 CC -VP16 Txn FactorNo data ChIP _ chromHMM tracks from Roadmap 3e−04 114.40LNG.IMR901 _ 114.42 114.44TranscriptionAlkB demethylated Factor ChIP-seq RNA-seq (161 Plus factors) signal GM05372 from ENCODE Rep 1 with from Factorbook UC Santa114.46 CruzMotifs 114.48 114.50 2e−04 Txn FactorLNG.IMR90 ChIP Plus dm-GM05372_1 . 1e−04 1 _ AlkB demethylated RNA-seq Plus signal GM05372 Rep 1 from UC Santa Cruz

HpBpBp 306.627 _ H3K27Ac Mark (Often Found Near Active Regulatory Elements) on 7 cell lines from ENCODE 0e+00 Plus dm-GM05372_1 0 _ Position on (Mbp) −1e−04 HpBpbp 2 _ AlkB demethylated RNA-seq Minus signal GM12878 Rep 1 from UC Santa Cruz -0.10 0 _ −2e−04 Layered H3K27Ac 2e+07 3e+07 4e+07 Minus dm-GM12878_12 _ AlkB demethylated RNA-seq Minus signal GM12878 Rep 1 from UC Santa Cruz Subtr x Minus dm-GM12878_10 _ 04 __ AlkB demethylated RNA-seq Minus signal GM05372 Rep 1 from UC Santa Cruz 0.505e−04 TOP2 CC +VP16 No data0 _ AlkB demethylated RNA-seq Plus signal GM12878 Rep 1 from UC Santa Cruz 4e−04 Minus dm-GM05372_14 _ AlkB demethylated RNA-seq Minus signal GM05372 Rep 1 from UC Santa Cruz Plus dm-GM12878_1 Top2

Med. Med. 3e−04 Minus dm-GM05372_10 _ W 2e−04 E No data _ K562 Whole Cell Short RNA CSHL Contigs C 1e−04 0 _ HpBpBp Transcription FactorK562 ChIP-seq Whole (161 Cell factors) PolyA Long from FCSHLENCODE Contigs with Factorbook Motifs Top2 0e+00 Txn Factor ChIP K562 Whole Cell Short RNA CSHL Contigs K562K562 Whole Whole Cell Cell non-PolyA PolyA Long Long CSHL CSHL Contigs Contigs −1e−04 1 _ AlkB demethylated RNA-seq Plus signal GM05372 Rep 1 from UC Santa Cruz HpBpbp -0.10 −2e−04

0.10 Plus dm-GM05372_1 K562 RepeatingWhole Cell Elements non-PolyA by Long RepeatMasker CSHL Contigs 0.06 2e+07 3e+07 4e+07

0.04 60 x RepeatMasker 40 0 _ 0.09 GM12878Repeating Whole Elements Cell Short by RepeatMaskerRNA CSHL Contigs

) 90 A RepeatMasker2 _ AlkB demethylatedGM12878 RNA-seq Nucleus Minus signal Short GM12878 RNA CSHL Rep Contigs 1 from UC Santa Cruz 0.05 Compartment Eigenvector GM12878 Whole Cell Short RNA CSHL Contigs 400.04 0.05 Minus dm-GM12878_1 GM12878GM12878 Nucleus Cytosol Short RNA CSHL Contigs

GM12878 Whole Cell PolyA Long CSHL Contigs 0.02 GM12878 Cytosol Short RNA CSHL Contigs 20 0 _ ) 0.02 ) 4 _HpB AlkB demethylatedGM12878GM12878 RNA-seq Whole Whole Minus Cell Cell non-PolyAsignal PolyA GM05372 Long Long CSHL CSHLRep Contigs 1 Contigsfrom UC Santa Cruz 0.00 ( 20 0.06 AU Eigen Minus dm-GM05372_1 60 GM12878GM12878 Whole Nucleus Cell non-PolyA PolyAvariable Long Long CSHL CSHL Contigs Contigs 0.00 0.00

0 _ value GM12878GM12878 Nucleus Nucleus non-PolyA PolyA Long Long CSHLFiltered.Total CSHL Contigs Contigs 0 0 B HpB HpB −0.05 K562 Whole Cell Short RNA CSHL Contigs (

-0.05 seq ( GM12878K562 Whole Nucleus Cell non-PolyA PolyA Long Long CSHL CSHL Contigs Contigs - RNA-seq Signal from REMC 0.02

NCCCD smRNA HuFNSC02 20− 0.03 K562 WholeRNA-seq Cell non-PolyA Signal from Long REMC CSHL Contigs 0.02 NCCCD smRNA HuFNSC03 − −0.10 30 20 NCCCD smRNA HuFNSC02 ENDSEQ_PILEUP$WatsonW N smRNA HuFNSC01 Repeating Elements by RepeatMasker seq 2e+07 3e+07 4e+07 NCCCD smRNA HuFNSC03 seq N smRNA HuFNSC02 -

RepeatMasker - 40 N smRNA HuFNSC01END 0.40 Supercoiling x Neg PBMC smRNA TC014 GM12878 Whole Cell Short RNA CSHL Contigs /input) N smRNA HuFNSC02 PFF smRNA 01 84 GM12878 Nucleus Short RNA CSHL Contigs 0.04 − PBMC smRNA TC014 40 0.06 CC PFK smRNA 01 85 GM12878 Cytosol Short RNA CSHL Contigs 60− PFMPFF smRNAsmRNA 0101 8483 PFK smRNA 01 85 0.00 GM12878 Whole Cell PolyA Long CSHL Contigs 0 Haplotypes to GRCh37 Reference Sequence END PFM smRNA 01 83 GM12878Patches Whole to GRCh37Cell non-PolyA Reference Long Sequence CSHL Contigs −400 −200 0 200 400Haplotypes to GRCh37 Reference Sequence −20 −10 0 10 20 bTMP RefSeq gene predictions from NCBI RefSeq Curated -400 -200 Pos0 +200 +400GM12878Patches Nucleusto GRCh37 PolyA Reference Long CSHL Sequence Contigs -20 -10 0 +10 +20 -0.20 Gene Expression in 53RefSeq tissues genefrom GTExpredictions RNA-seq from of NCBI 8555 samples (570 donors) ENDSEQ_PILEUP$Pos Pos RefSeq Curated GM12878 Nucleus non-PolyA Long CSHL Contigs PTPN22 BCL2L15 AP4B1 Gene Expression in 53 tissues from GTEx RNA-seq of 8555 samples (570 donors) Pos. rel. to TOP2 CC (bp) 20 30 40 RNA-seqDCLRE1B Signal from REMC HIPK1-AS1 Log2( Pos. rel. to TOP2 CC (bp) NCCCD smRNA HuFNSC02PTPN22 BCL2L15 AP4B1 DCLRE1B HIPK1-AS1 NCCCD smRNA HuFNSC03 HIPK1 N smRNA AP4B1-AS1HuFNSC01 N smRNAAP4B1-AS1 HuFNSC02 Simple Nucleotide Polymorphisms (dbSNP 151) Found in >= 1% of Samples HIPK1 Position on Chromosome 11 (Mbp) PBMCCommon smRNA SNPs(151) TC014 PFF smRNA 01 84 Simple Nucleotide Polymorphisms (dbSNP 151) Found in >= 1% of Samples CommonPFK smRNA SNPs(151) 01 85 PFM smRNA 01 83 Haplotypes to GRCh37 Reference Sequence Patches to GRCh37 Reference Sequence RefSeq gene predictions from NCBI RefSeq Curated Gene Expression in 53 tissues from GTEx RNA-seq of 8555 samples (570 donors) PTPN22 BCL2L15 AP4B1 DCLRE1B HIPK1-AS1

AP4B1-AS1 HIPK1 Simple Nucleotide Polymorphisms (dbSNP 151) Found in >= 1% of Samples Common SNPs(151) Figure 3. CC-seq maps TOP2-linked DNA breaks in Human cells with nucleotide accuracy A) Anti-TOP2β western slot blot of input, flow through, wash, and elution fractions. RPE-1 cells were treated or not with VP16, prior to processing according to Figure 1A. B) Broad-scale maps of H. sapiens TOP2 CCs produced by CC-seq in RPE-1 cells ±VP16. Raw data were scaled, binned, smoothed and median subtracted prior to plotting (Methods). Chromatin compartments revealed by Hi-C eigenvector analysis (Darrow et al., 2016), and supercoiling revealed by bTMP ChIP-seq (Naughton et al., 2013) are shown for comparison. C) Quantification of TOP2 CC in chromatin compartments A and B. Data are expressed as box-and-whisker plots of density as for Figure 2C. D) Fine-scale mapping of H. sapiens TOP2 CCs by CC-seq in RPE-1 cells ±VP16. Red and blue traces indicate TOP2-linked 5’ DNA termini on the Watson and Crick strands, respectively. Pale shaded areas are the same data smoothed according to local density (see Methods). RPE-1 CTCF and H3K4Me3 ChIP-seq data plus H3K27Ac ChIP-seq data overlaid from seven cell lines (ENCODE, 2012) is shown for comparison. E) Medium-scale aggregate of END-seq mapped DSBs (Canela et al., 2017) surrounding nucleotide resolution CC-seq mapped TOP2 CCs. F) Fine-scale strand-specific aggregate of END-seq mapped DSB ends (red and blue bars) and nucleotide resolution CC-seq mapped TOP2 CCs (grey bars) surrounding strong TOP2 CC sites.

2 column (full wicth) detects CTCF-proximal Top2cc in human cells. Typically a strong peak is detected at ~55bp 5’-relative to the motif. Complex CTCF loci typically exhibit additive patterns. The aggregated Top2cc around CTCF motifs corroborates previously published data. proximal Top2cc positions anticorrelate with nucleosome positions. proximal Top2cc positions are correlate with CTCF occupancy at the motif. Figure 4

A

5’ 3’

B 3.0 3.0

● ●2.53.015● ● ● ● ● ● ● ● ● ● ● ● 2.5● ) C 2.02.5(AU) 2.0 HpB

High seq

1.52.0- 10 1.5

1.01.5 1.0 data2$WCTotal2 TOP2 TOP2 CC ( MNase 0.51.05 0.5 data2$WCTotal2 0.00.5 0.0

0.0 −1000-1 −-0.5500 0 +0.5500 1000+1 TOP2 CC Density Position relative to CTCF motif (Kbp) Low −1000 −500 data2$Pos0 500 1000

CTCF CTCF binding data2$Pos C 0.0099 0.8 0.8 ●●0.0300.008●8 ● ● ● ● ● ● ● ● ● ● ● ● ● 0.0250.6(AU) 0.6

3.0 -54 (smoothed)

0.0030 seq

0.0077 ) 0.020- ) 0.4 2.5 4th quart. 0.4

0.0025 HpM 0.0150.0066

HpB rd

3 quart. MNase 2.0 0.0100.2 0.2 0.0020 +54 2nd quart. PileUp$Filtered.Total PileUp2$Filtered.Total 0.0055 1st quart. PileUp2$Filtered.Total 0.005 0.00151.5 CTCF CTCF binding 0.0 0.0 TOP2 CC ( 0.000

TOP2 CC ( -1 -0.5 0 +0.5 +1 0.00101.0 −1000 −500 0 500 1000 data[[1]]$Pos −1000Position −relative500 to TOP20 CC 500peak (Kbp1000) 0.5 PileUp2$PosPileUp$Pos 0.0005 PileUp2$Pos -400 -200 0 +200 +400

Position relative to 0 CTCF motif (bp) 400 200 200 400 − − Index

Figure 4. CTCF-proximal TOP2 activity is tightly confined within nucleosome-depleted regions A) Aggregation of TOP2 CCs in a 1 Kbp window centred on orientated CTCF motifs in human RPE-1 cells. Four example CTCF loci are shown orientated in the 5’-3’ direction (top). A heatmap of all CTCF-motifs in the human genome, with 25 rows stratified by the strength of colocalising CTCF ChIP-seq peaks in RPE-1 cells (middle). The colour scale indicates average TOP2 CC density. Motifs are also stratified into 4 quartiles of CTCF-binding, and the average TOP2 CC distribution in each quartile plotted (bottom). B) Aggregated TOP2 CC distribution (red line) in the highest quartile of CTCF-binding compared with the average MNase-seq signal (blue line). C) Aggregated TOP2 CC distribution (black and red lines: single-nucleotide resolution and smoothed, respectively) and the average MNase-seq signal (blue line) surrounding strong TOP2 CC sites. In (B) and (C) peaks in MNase-seq signal indicate inferred nucleosome positions (blue circles). D) D) the KS test. as box expressed are present. Data C) gene B) the average and expression, of gene quartiles TOP2 (bottom).plotted quartile each in distribution CC RPE in level expression the in 5’ orientated shown A) TSS 5. Figure As in (C) As but (C) sum total in TOP2 these regions. in found CCs Average TOP2 10 in density CC of CTCF Comparison of Aggregation TOP2 a 10 in CCs - expression (blue), respectively. (blue), expression - proximal TOP2 activity is strongly correlated with gene transcription gene with correlated strongly is TOP2proximal activity Figure5 MAN1A2 ABCD3 SZRD1 A ACAT1

- TOP2 CC (HpB) - 1 cells (middle). (middle). average 1 cells indicates scale The colour TOP2 density.CC 4 into stratified Motifs also are 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 High Low

proximal and and proximal TSS TOP2 CC Density - 3’ (top). direction A of all heatmap by stratified gene rows 25 with TSSsgenome, the in human - 4 Positionrelative to TSS ( Kbp - - 2 and Kbp regions centred on the highest quartile of quartile the on highest centred regions TSS, CTCF, both features are where or regions - whisker plots ofplots as in density whisker 0 window centred on orientated orientated on centred window RPE TSSs human in TSS - proximal proximal TOP2 of CTCF quartile the in highest distributions CC +2

Kbp Gene exp 2 3 4 1 nd rd th st ) +4 quart. quart. quart. quart.

Gene expression level D C B

BackgroundSmooth-subtracted

Background-subtracted Background-subtracted 0.0000 0.0005 0.0010 0.0015 0.0020 0.0025 TOP2 CC (HpB) Sum TOP2 CC (KHpM) 0.0 0.5 1.0 1.5 2.0 2.5 TOP2- CC Density (HpBpbp) 10 15 −

0.2 0.0 0.2 BGSubDensity0.4 0.6 0.8 1.0 - − 5000 0.0000 0.0002 0.0004 0.0006 0.0008 0.0010 0.0002 0 5 Figure 2C Figure 5.0 TSS TSS − TSS - 2500 2.5 P < 2.2x10 . Statistical significance was determined using using determined was . significance Statistical Pos Pos - p p = 4.3x10 CTCF 17 CTCF 0 0 ( Region CTCF bp ) P < 2.2x10 - 7 +2.5 2500 - 17 - 1 cells. Four example Four 1 example cells. TSSs are Both Both Both CTCF TSS +5.0 5000 Type Region Both CTCF TSS TSS CTCF - binding (pink) and and (pink) binding Figure 6

A C Top2 100% W

80% C Top2 60% SSBs Yeast vs Dissimilar 40% DSBs Human Yeast DSBs vs 20% Human SSBs

Percentage of sites Percentage Similar 11 10 9 8 7 6 5 4 3 2 1 1 2 3 4 5 6 7 8 9 10 11 0% Negative Positive Yeast Human Position relative to dyad axis (nucleotides) SSBs DSBs

DSBs A G SSBs T C Top2 B W D C

0.8 0.8

0.6 0.6 3.5 P < 4.2x10-15

0.4 0.4 3

Yeast C )

0.2 0.2 2.5 MeC HpB 0 0 2

0.8 0.8 1.5 P = 1 0.6 0.6 1 TOP2 CC TOP2 ( 0.4 0.4

Human 0.5

Fractional base composition base Fractional 0.2 0.2 0 0 0 -VP16 +VP16 -12 -8 -4 +4 +8 +12 -12 -8 -4 +4 +8 +12 Position relative to dyad axis (nucleotides)

Figure 6. Local DNA sequence and methylation status direct the formation of Top2 CCs A) The percentage of strong sites that are SSBs and DSBs in etoposide-treated S. cerevisiae and human cells. B) Average nucleotide composition over a 30 bp window centred on the DSB or (inferred) SSB dyad axis, in etoposide-treated S. cerevisiae and human cells. Values reported are for the top strand only. C) Heatmaps describing pairwise similarity of nucleotide composition patterns shown in (B). Rows 1-4 are the absolute differences between: yeast and human SSB patterns, yeast and human DSB patterns, yeast SSB and DSB patterns, and human SSB and DSB patterns, respectively. D) The average number of TOP2 CCs at the +1 position relative to methylated and unmethylated cytosines in human RPE-1 cells, ±VP16. Statistical significance was determined using KS test. Spo11 Supplement activity in each quartile plotted (bottom).plotted quartile each activity in of Spo11 distribution the average SK1 and cells, meiotic in expression of gene 4 into quartiles stratified Motifs also are Spo11 (middle). SK1 average in cells indicates level scale The colour expression by stratified gene density. break the in 5’ orientated shown are E) by measured D) from C) B) A) (related to Figure 1) CC S1. Figure Aggregation of Spo11Aggregation a activity 10 in Spo11 by CC mapped breaks of Spo11Correlation Watsonon cleavages offset when strand Crick and by 1 of 500 Correlation from CC of hotspot signals Correlation - 4 to +4 bp. MNase FigureS1 - A seq B C maps covalent Spo11 Spo11 oligo-seq (normalized HpM) - 10 10 10 10 seq bp Spo11 CC (HpM) (Replicate 2) 10 100 3 0 1 2 4 0.1

Crick Spo11 CC (HpM) 10 1 binned Spo11binned replicates. from maps two representative Spo11CC ( (blue). 10 WatsonSpo11CC( 0 CC 0.1 - - 3’ (top). direction A of all heatmap TSSs the in 10 seq Pearson r = 0.82 1 - seq HpM (normalized r = 0.98 = r 1 10 ) ( ) 2 (raw=black, or smoothed=red) anticorrelate with nucleosome occupancy occupancy nucleosome with anticorrelate or smoothed=red) (raw=black, Replicate1 10 Kbp 10 HpM - HpM 3 seq - linked DNAlinked in breaks 100 ) ) window centred on orientated orientated on centred window TSSs in 10 ) and oligo 4 E − − − D − 20000 15000 10000 data[[1]]$Pos MNase$Filtered.TotalMNase$Filtered.Total 10000 5000 5000

FUN12 MNase-seq (AU) 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 - - - ADE1 -

Spo11 CC (HpM) 10 20 30 40 50 60 10 15 20 RFT1 20 15 10 10 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 - 0 0 seq. seq. 0 5 High Low Spo11 CC Density 5 0 5 - − − − Position relative Position Spo11peakto ( 400 400 400 400 - − 4000 4 Positionrelative to TSS ( - − − − 200 200 200 200 S. cerevisiae - − 2000 2 MNase$Pos MNase$Pos MNase$Pos Index 0 0 0 0 0 0 +200 200 200 200 S. cerevisiae +2

2000Kbp

bp Expression level 2 meiosis with nucleotide accuracy +400 3 4 1 bp 400 400 400 nd rd th st ) quart. quart. quart. quart. S. cerevisiae ) (boxed), relative to offsetsother relative (boxed), 4000 +4 0 5 10 15 20

genome, with 25 rows rows 25 with genome, Spo11 CC (HpM) (smoothed) Expression Level . example Three TSSs Yeast Top2 Supplement 1

Figure S2

A E S. cerevisiae chromosome 3 F I pdr1∆ + VP16 0. 03 pdr1∆ 0. 8 pdr1∆ 2.5 r = 0.89 0. 7 ) ) 0. 025 -VP16 Series1-VP16

VP16 0. 6 +VP16 Series2+VP16 - 0. 02

0. 5 2.0 count ∆ HpMpKb 20.085537

normHpM 0. 015 0. 4 7.389056 2.718282 0. 3 Log2( Rep.2 Top2 CC Top2 Rep.2 1.000000 pdr1 0. 01 1.5 Log10_HpMp1Kb_RA9.429E_bf 0. 2 Pearson correlation (r)

Top2 CC ( 0. 005 0. 1

+VP16 0 0 1.0 1.0 1.5 2.0 2.5 1 4 7 1 4 7 -8 -5 -2 -8 -5 -2 Log10_HpMp1Kb_RA2.429E_bf 10 13 16 19 10 13 16 19 -20 -17 -14 -11 -20 -17 -14 -11 Rep.1 Top2 CC Watson-Crick Relative Offset Watson-Crick Relative Offset Log2(HpMpKb) (bp) (bp)

pdr1∆ sae2∆ +VP16 0. 8 B 0. 03 2.5 r = 0.95 G pdr1∆ sae2∆ J pdr1∆ sae2∆ 0. 7 ) ) 0. 025 Series1-VP16

∆ -VP16 VP16 0. 6 ) Series2+VP16 - +VP16 2.0 0. 02 0. 5 count 20.085537 sae2

cHpM

normHpM 0. 4 7.389056 0. 015 ∆

HpMpKb 2.718282 0. 3 1.000000 1.5 0. 01 Log10_HpMp1Kb_RA11.475E_bf 0. 2 Log2( Rep.2 Top2 CC CC Top2 Rep.2 pdr1 Pearson correlation (r) Top2 CC ( 0. 005 0. 1

1.0 0 +VP16

1.0 1.5 2.0 2.5 Top2 CC (

0 1 4 7

Log10_HpMp1Kb_RA5.475E_bf -8 -5 -2 10 13 16 19 1 4 7 -20 -17 -14 -11 -8 -5 -2 10 13 16 19

Rep.1 Top2 CC -20 -17 -14 -11 Log2(HpMpKb) Watson-Crick Relative Offset Watson-Crick Relative Offset C (bp) (bp) pdr1∆ mre11∆ +VP16 H K 0. 03 0. 8 2.5 r = 0.97 pdr1∆ mre11∆ pdr1∆ mre11∆ 0. 7 ∆ ) 0. 025 -VP16 Series1-VP16

) 0. 6 VP16 +VP16 Series2+VP16 - 2.0 0. 02 count 0. 5 mre11

7.389056 normHpM 0. 015 0. 4

2.718282 ∆ HpMpKb

1.000000 0. 3 1.5 0. 01 Log10_HpMp1Kb_RA13.551E_bf Log2( pdr1 Rep.2 Top2 CC Top2 Rep.2 0. 2 Pearson correlation (r)

Top2 CC ( 0. 005 0. 1

1.0 +VP16 1.0 1.5 2.0 2.5 0 0 1 4 7 Log10_HpMp1Kb_RA7.551E_bf 1 4 7 -8 -5 -2 -8 -5 -2 10 13 16 19 10 13 16 19 -20 -17 -14 -11 Rep.1 Top2 CC -20 -17 -14 -11 Log2(HpMpKb) Watson-Crick Relative Offset Watson-Crick Relative Offset (bp) (bp) D Chromosome coordinate (bp) r values

sae2∆ 0.84 0.94

mre11∆ 0.82 0.85 0.95 0.93

WT sae2∆ WT sae2∆ -VP16 +VP16

Figure S2. CC-seq maps covalent Top2-linked DNA breaks in S. cerevisiae cycling cells with nucleotide accuracy (related to Figure 2). A-C) Correlation of 1 Kbp binned Top2 CC maps from pdr1∆ (A), pdr1∆sae2∆ (B) and pdr1∆mre11∆ (C) cells treated with VP16. D) Pairwise Pearson correlation values for 100 bp binned Top2 CC maps from all assayed conditions. Each condition is a pool of two biological replicates. E) Nucleotide-resolution S. cerevisiae Top2 CC map of chromosome 3 for all assayed conditions. Each condition is a pool of two biological replicates. F-H) The normalised number of Top2 CCs retained in the pdr1∆ (F), pdr1∆sae2∆ (G) and pdr1∆mre11∆ (H) cells treated ±VP16, after filtering to include only sites offset by the given number of base pairs. All data were normalised over a -100 to +100 bp window. I-K) Pearson correlation (r) of Top2 CC-seq signal on Watson and Crick strands, offset by the indicated distance in pdr1∆ (I), pdr1∆sae2∆ (J) and pdr1∆mre11∆ (K) cells treated ±VP16. Yeast Top2 Supplement 2 periodicity is highlighted with black and red crosses. red and black with is highlighted periodicity bottom right, respectively). (F.R.) regions of the flanking The position for here observed Top2 t and black, in indicated are into pattern is separated A+T G+C and (top individual into left right), and and A, T, left, G bottom C and (middle left plots D) respectively.black, and (F.R.) regions the flanking for here observed regions the flanking and Top2experiments, Gyrase mapping in identified ind are D) offset (Rand.) above as described analysis of the amplitudes Top2 ( CCs thresholded C) of distribution the average and expression, of gene Top2 (bot +VP16 conditions forand plotted untreated quartile each in CCs average indicates scale Top2Colour (middle). growth vegetative in level expression density.CC in stratified Motifs also are the in 5’ orientated TSSs shown are B) A) DNA 2) bent of Figure to indicative (related skews nucleotide biased have Top2 S3. Figure Average nucleotide composition over a 120 a Average120 over composition nucleotide a Average200 over composition nucleotide of sites strong that The percentage SSBsetoposide are DSBs in and of Aggregation Top2 a 10 in CCs Top2 by CC mapped CC FigureS3 at 1 - HpM linked DNAlinked in breaks C E

prior to sorting into DSB or SSB classes based on presence or absence of a 3 or absence presence on DSB tointo or SSBbased sorting prior classes Fractional base composition DSBs SSBs

Percentage of Sites 100% 20% 40% 60% 80% 0% - − − seq − 15000 MNase$Filtered.Total10000 HpM 5000 A pdr1

Rand. 10 15 20 25 30

- MNase- -seq (AU) 0 5 0 1 2 3 4 5 15 10 0 - 5 0 ∆ (raw=black, or smoothed=red) anticorrelate with nucleosome occupancy measured by measured occupancy nucleosome with anticorrelate or smoothed=red) (raw=black, Obs. ) were randomised amongst the positions in the nucleotide resolution datasets, prior to thresholding and and todatasets, thresholding prior resolution the in nucleotide the positions amongst randomised ) were mre11 pdr1 Position relative Position Top2to peakCC ( - − − − -

Rand. 400 Kbp 400 400 400 3’ of(top). all Heatmap direction TSSs the in ∆ ∆ Obs. Pos.relative to dyad axis ( S. cerevisiae − − window centred on orientated transcription start sites transcription (TSS) in orientated on centred window − - 200 200 200 200 D MNase$Pos MNase$Pos MNase$Pos Fractional base composition bp bp 0 0 0 0 0.2 0.4 0.6 0.8 0 -100 window centred on the DSB dyad axis, in etoposide the axis, on in DSB dyad centred window etoposide the axis, on in DSB dyad centred window Pos.relative to dyad axis ( Gyrase F.R. Top2 F.R. Top2 +200 200 200 200 -60 are not correlated with gene expression, anticorrelate with nucleosomes, and +400 bp 400 400 400 -20 bp ) ) 20 0 1 2 3 4 5

60 Top2 CC (HpM) (smoothed) bp 100 ) G C A T - treated data[[1]]$Pos data[[1]]$Pos B ECM31 0.05 0.10 0.15 0.20 0.25 0.07 0.08 Top20.09 CC0.10 HpM0.11 0.12 Top2 CC HpM PRK1 0.05 0.10 0.15 0.20 0.25 FAF1 0.07 0.08 0.09 0.10 0.11 0.12 High HPC2 Low Top2 CC pdr1 S. cerevisiae - - 4 4 ∆ and and ∆

− 4000Positionrelative to TSS ( − 4000 - - − 2000 2 − 2000 2 pdr1 Index Index genome, with 25 rows stratified by gene by stratified gene rows 25 with genome, 0 0 TSS ∆ 0 0 mre11 +2 +2 bp Kbp - - 2000

treated treated 2000 Expression level Expression level pdr1 2 2 3 3 ∆ ∆ 4 4 1 1 offset (Obs.). cognate As a control, ) nd nd rd rd th th st st quart. quart. quart. quart. quart. quart. quart. quart. +4 +4 S. cerevisiae

∆ 4000 4000 S. cerevisiae S. cerevisiae S. cerevisiae –VP16 +VP16 Expression Level MNase , m . Sites were hei iddle right, and iddle to to . Four example . Four example . The signal . of The position r ~10.5 ica four quartiles four quartiles tom - seq ted in blue blue in ted ). bp (blue). Human Top2 Supplement 1

Figure S4

A C 1.0010.0 -VP16 +VP16 1.5 ) 0.757.5

) -VP16 WG11 (R1)

-VP16 WG15 (R2) count

HpB 1.0 -VP16 WG23 (R3) 0.505.0 403.428793

+VP16 WG12 (R1) normHpB 54.598150 7.389056

+VP16 WG16 (R2) 1.000000Density 0.5 0.252.5 +VP16 WG24 (R3) +VP16 ( +VP16

TOP2 CC ( +VP16 WG34 (R4)

0.000.0 0 0 50 100 0 50 100 0.0 2.5 5.0 7.5 10.0 HpMp10Kb_WT_UT_WG11B_WG15B_WG23B_MAPQ10_dupfree_abBF_pooled0.00 0.25 0.50 0.75 1.00 HpMp10Kb_WT_VP16_WG12B_WG16B_WG24B_WG34A_MAPQ10_dupfree_abBF_pooled Pos. on Chr11 (Mbp) -VP16 (normHpB) D E B 0.7575 **** -VP16 R2 0.95 +VP16

) 0. 04 )

R2 0.95 R3 0.87 0.79 HpB -VP16 0.5050 Condition 0. 03 ● WT_UT_WG11B_WG15B_WG23B_MAPQ10_dupfree_abBF_pooled+VP16 ● WT_VP16_WG12B_WG16B_WG24B_WG34A_MAPQ10_dupfree_abBF_pooled

R3 0.93 0.96 R4 0.98 0.95 0.84 HpMp100Kb normHpM ● ● 0. 02

R1 R2 R1 R2 R3 TOP2 CC ( 0.2525 Pair-wise Pearson correlation values 0. 01 TOP2 CC ( TOP2 0 0 1 2 3 4 5 6 7 8 9 -9 -8 -7 -6 -5 -4 -3 -2 -1 10 00 -10 WT_UT_WG11B_WG15B_WG23B_MAPQ10_dupfree_abBF_pooledWT_VP16_WG12B_WG16B_WG24B_WG34A_MAPQ10_dupfree_abBF_pooled Watson-Crick Relative Offset (bp) -VP16Condition+VP16

Figure S4. CC-seq maps of TOP2-linked DNA breaks in Human cells are enriched by etoposide, and show high reproducibility and nucleotide accuracy (related to Figure 3) A) Broad-scale H. sapiens TOP2 CC-seq maps in individual biological replicates of RPE-1 cells ±VP16. Raw data were binned at 100 Kbp prior to plotting. Each plot is offset on the y-axis by +0.3 HpB. B) Replicate-to-replicate Pearson correlation values (r) for 10 Kbp binned TOP2 CC-seq maps of RPE-1 cells ±VP16. C) Scatter plot of -VP16 and +VP16 TOP2 CC-seq maps binned at 10 Kbp resolution. Data were first scaled according to the estimated noise fraction (Methods), and are presented in a hexagonal-binned format, where the density of overplotting is indicated by the colour scale. D) Violin plots of TOP2 CC-seq maps ±VP16 binned at 100 Kbp resolution. The inner black bar, black dot, and dotted horizontal line indicate the interquartile range, median, and expected mean Top2 CC density based on random distribution. E) The normalised number of TOP2 CCs retained in the CC-seq maps in RPE-1 cells ±VP16 after filtering to include only sites offset by the given number of base pairs. Data were normalised over a -100 to +100 bp window. Human Top2 Supplement 3 -KO controls

Figure S5

WT TOP2B-/- A B WT TOP2B-/- Asyn. S.D.

- - / / - -

G1 G1 . . 67.5% 66.5%

G2 G2 Ser. Dep. Asyn Ser. Dep. Asyn WT TOP2B WT TOP2B Count S 13.7% S 14.7%

18.3% 18.7% 10% FCS TOP2β TOP2⍺

Pon. S

G1 G1 98.5% 99.1% FCSf Count Ku80

S G2 S G2 0% 0% 1.5% 0% 0.9%

PI Intensity (AU) PI Intensity (AU)

WT G1 UT C γ-H2AX D E 2.5 γ-H2AX /DAPI 80 1.0

70 T2B G1 UT WT 2.0 VP16

- 1.0 60 . ) 4.5 WT G1 VP16 50 WT Async 40 HpM -

/ 1.0 - 30 T2B G1 VP16 2.5

TOP2B 20

TOP2 CC ( 1.0 H2AXFociper Cell +VP16

- 10 γ

WT T2B Async VP16 4.5 0 G1 - / - -VP16 -VP16 -VP16 -VP16 +VP16 +VP16 +VP16 +VP16 -WTWT TOP2B+TOP2B-/- WT-WT TOP2B+TOP2B-/- 1.0 20 25 30 35 40 45 TOP2B As ynchronousAsync. SerumG1 Depl eted Position on Chromosome 11 (Mbp)

Figure S5. CC-seq signal is TOP2-dependent A) DNA content histograms of wild type (WT) and TOP2B-/- RPE-1 cells under asynchronous (10% FCS) and serum-deprived (0% FCS) conditions, as measured by FACS following propidium iodide (PI) staining. G1, S and G2 populations are clearly present under asynchronous growing conditions. A strong G1 arrest is observed in serum deprived conditions. Percentages of cells in each of the indicated regions (red dotted brackets) are given. B) Western blots demonstrating the absence of TOP2β protein in serum deprived and asynchronous TOP2B-/- RPE-1 cells (left), and the absence of TOP2α in serum deprived wild type and TOP2B-/- RPE-1 cells (right). Ponceau S total protein loading is presented (Pon. S) for the left and right panels, and additionally a Ku80 loading control is included for the right panel. C) Immunofluorescence experiment demonstrating induction of γ-H2AX foci (green) in asynchronous (Async.) and serum-deprived (Ser. Dep.) wild type (WT) and TOP2B-/- RPE-1 cells, all co-stained with DAPI (blue). Galleries of nine cells per condition were chosen randomly using Olympus ScanR Analysis software. D) Quantification of (C). Numbers of γ-H2AX foci per cell were counted automatically using Olympus ScanR Analysis software. The mean ±SEM is reported for n= 3 biological replicate experiments. E) Broad-scale H. sapiens TOP2 CC-seq maps in asynchronous and serum-deprived wild type (WT) and TOP2B-/- RPE-1 cells -VP16 (orange) and +VP16 (green). Raw hits on Watson and Crick strands were summed and smoothed according to local signal density (Fsize=501). Human Top2 Supplement 2 FigureS6 C

0.000 data[[1]]$Pos0.002 0.004 0.006 0.008 A expression, and the average END the average and expression, END average (middle). MCF7 cells in indicates level scale The colour expression gene by stratified flanking rows 25 with Four example the in 5’ orientated TSSs shown are E) aggregated with anticorrelation D) strength, the average and interaction anchor of loop quartiles TOP2 plotted. quartile each in distribution CC RPE human in anchor loop to a chromatin assigned C) B) Crick and TOP2 CCs. motifs the on Watson respectively.strands Crick and indicates The line grey 11 a sliding with data smoothed same indicate TOP2 A) signal at TSSs TOP2 S6. Figure CC TOP2 CC (HpB) TOP2 CC HpM 0.0 2.0 4.0 6.0 8.0 Aggregation of END Aggregation Fine of Aggregation TOP2 a 1 in CCs Fine Fine Chr2: 15,944,725 Chr2: Chr19: 20,564,557 Chr19: 32,219,513 Chr16: - 400 - 400 Positionrelative to CTCF motif ( - - - scale aggregation of aggregation scale TOP2 a 800 in (red) CCs of mapping scale TOP2 (A). as in processed CTCF loci, complex three surrounding CCs scale TOP2 CC - −400 200 Watson Motifs Watson - 200 0 - Positionrelative to locus (

- −200 seq linked 5’linked DNA the on Watson termini strands, respectively. Crick and the are areas shaded Pale +200 Index 0 DSB DSB density. gene strength ofon flanking based 4 TSSsinto quartiles stratified also are 0 +400 - seq +200 - seq - Chr22: 21,111,194 Chr22: 44,378,835 Chr5: Chr20: 43,739,807 Chr20: seq 200 - 400 signal enrichment around CTCF and TSS sites, compared with END 2 3 4 1 +400 nd rd th bp st mapped DSBs in a DSBs 10 in mapped quart. quart. quart. quart. - maps of maps 200 400) CrickMotifs bp MNase ) 0 - Hi-C interaction seq

Kbp Value +200 H. sapiens bp - DSB distribution in each quartile plotted (bottom).plotted quartile each in DSB distribution D seq window centred on the CTCF on motifssubset of centred that orientated be can window TOP2Top2cc CC signal HpB (HpM) +400 Top2cc signal (HpM) Hanning 0.24 0.26 0.28 0.30 0.32 0.35 0.40 0.45 0.50 0.55 0.60 - - 400 400 signal (blue). (blue). signal B - 4000 TOP2 CC HpM Positionrelative to TSS ( CTCF - - Position relative to TSS ( to TSS Position relative 200 200 Position relative to TSS ( to TSS Position relative - - window. of CTCF the positions indicate rectangles Blue and Red 2000 Kbp Chr15: 56,538,425 Chr15: 192,178 Chr11: 32,429,883 Chr1: 3’ (top). direction A of all heatmap TSSsgenome, the in human Positionrelative to locus ( - - 400 1 cells (Darrow et al., Motifs 4 into 2016). stratified (Darrow are 1 cells bp - window centred on orientated orientated on centred window MCF7 cells. TSSs human in - proximal loci in RPE in loci proximal 200 0 0 0 Complex Loci Complex window centred on on centred window RPE TSSs human in 0 2000 +200 bp bp 200 ) ) +200 bp ) 4000 +400 bp +400 400

)

0.20 0.22 0.24 0.26 0.28 0.30

signal (AU) signal seq - Mnase signal (AU) signal seq - Mnase - Hanning SLC25A33 1 cells +VP16. Red and blue traces blue +VP16.and 1 Red cells 0.0004 data[[1]]$Pos0.0006 0.0008 0.0010 0.0012 0.0014 SPSB1 E End-Seq DSBs (HpB) KIF1B High 0.4 0.6 0.8 1.0 1.2 1.4 Low DSB Density SKI - smoothed sum of smoothed Watson - − 4000 4 Positionrelative to TSS ( - − 2000 2 Index 0 TSS - 0 1 cells, showing showing 1 cells, - seq +2

DSB DSB 2000 Kbp 2 3 4 1 nd rd th st ) quart. quart. quart. quart. 4000 +4

Expression level Expression Level D) C) Motifs are stratifiedalso into four quartiles of gene length, and the average distribution of Top2 each in quartile plot CCs all thein TSSs B) quartiles of gene length, and the average TOP2 distribution each in quartile plottedCC (bottom). genome, 25 with rows stratified by gene length (top). The average colour indicates scale TOP2 density.CC Motifs are stralso A) length, (related to Figure 2) expression level (related to Figure 5); TSS Figure S7. TSS Scatter plot of Scatter plot of human gene length and RPE Aggregation of Top2 a in 10 CCs Aggregation of TOP2 a in 10 CCs FigureS7 3e 4e 5e 6e 7e 8e C

data[[1]]$Pos A

− − TOP2− CC− HpM− − 0.05 0.10 0.15 0.20 0.25 04 04 04 04 04 04

Gene Expression log2(AU) High Low TOP2 CC S. cerevisiae 10 12 14 - 4 6 8 proximal TOP2 S. cerevisiae S. cerevisiae 5 - − 4000 4 Positionrelative to TSS ( GeneLength log2( - 2 10 genome, 25 with rows stratified by gene (top). length level average indicates scale Colour Top2 density.CC − 2000 Human Human r = = r gene gene length and gene expression during vegetative growth,correlation.little showing - Index linked DNA breaks in are humans correlated with length,gene fromindependently gene - 0.087 0 0 TSS Kbp 15 Kbp +2 window centred window on orientated transcription start sites (TSS) in bp window centred window on orientated humanin TSSs RPE

2000Kbp ) Gene Length 2 3 4 1 - 20 nd proximal Top2 ) rd th st - quart. quart. quart. quart. 1 1 gene no expression, correlation.showing 4000 +4

+VP16 5.7 20 58 4.8 Mb 4.8 Kb Kb Kb - linked DNA breaks in yeast are not strongly correlated with gene Gene Length

data[[1]]$Pos B 0.05 0.10 0.15 0.20 0.25

D Top2 CC HpM 0.05 0.10 0.15 0.20 0.25 High Gene Expression log2(AU) Low Top2 CC 10 12 14 2 4 6 8 5 - − 4000 4 Positionrelative to TSS ( GeneLength log2( - 8 2 S. cerevisiae − 2000 S. cerevisiae r = = r Index - - 0 1 cells. 1 cells. A heatmap of all thein TSSs human 0.23 TSS 10 0 +2 bp

2000Kbp 12 Gene Length ) pdr1 2 3 4 1 nd ) rd th st quart. quart. quart. quart. +4

∆ 4000 S. cerevisiae 1.1 1.1 +VP16 0.6 1.8 1.8 12.8 12.8 Kb Kb Kb Kb ted Gene Length . Heatmap of (bottom). (bottom). ati fied into 4