Sister chromatid, but not NHEJ-mediated inter-chromosomal telomere fusions, occur independently of DNA ligases 3 and 4

Kate Liddiard,1 Brian Ruis,2 Taylor Takasugi,2 Adam Harvey,2 Kevin E. Ashelford,1 Eric A. Hendrickson2¶ and Duncan M. Baird1¶

Supplementary materials.

Supplementary Methods

Cells

The MRC5 diploid human fibroblast cell line, retrovirally transduced with amphotropic vectors for forced expression of HPV16 E6E7 under 0.4 mg/ml G418 selection, was generated by R. Capper (Capper et al. 2007). MRC5HPVE6E7 Clone 1 cells were cultured in Dulbecco's minimal essential medium supplemented with 10% (v/v) fetal calf serum, 1 x 10^5 IU/l penicillin, 100 mg/l streptomycin, and 2 mM glutamine. Cells were maintained at 70 to 85% confluency, with population doubling (PD) calculated at each passage until entry into telomere-driven crisis was reached, at PD46.5. This followed a 7-day period of stable population size and the appearance of vacuolated cells.

HCT116 cell lines were cultured in McCoy's 5A medium supplemented with 10% (v/v) fetal calf serum, 1 x 10^5 IU/l penicillin, 100 mg/l streptomycin, and 2 mM glutamine. Antibiotic selection agents were also added as required (2.5 µg/ml G418 for LIG3-/- and 2.5 µg/ml G418 and 2.5 µg/ml puromycin for LIG3-/-:NC3). All cells were routinely cultured at 37°C in 5% CO2, screened for the absence of mycoplasma, and authenticated by functional or expression assays appropriate to their specific genetic backgrounds. Cell counts were performed in triplicate using a haemocytometer.

Construction of the LIG3-/-:LIG4-/- HCT116 cell line

To generate an HCT116 cell line deficient for both LIG3 and LIG4, the LIG4 gene was disrupted in the LIG3-/- cell line (expressing only the mitochondrial form of DNA ligase 3) by CRISPR/Cas9-mediated gene targeting. Briefly, cells were electroporated with a CRISPR/Cas9-GFP targeting vector containing a guide sequence specific for the DNA binding domain of the human LIG4 gene (Supplementary Fig. 12B). The GFP-expressing cells were sorted by flow cytometry and plated onto 10 cm tissue culture plates for clonal isolation. Successful CRISPR/Cas9-mediated targeting was determined by genomic PCR of individual clones using primers that flank the CRISPR guide sequence, which includes a unique MwoI restriction enzyme site.

Construction of LIG3-/-:TP53-/- HCT116 cell line

To generate an HCT116 cell line deficient for both LIG3 and TP53, the sequence encoding the nuclear isoform of LIG3 was mutated in the TP53-/- HCT116 cell line obtained from Bunz (Bunz et al. 1998) by rAAV-mediated gene targeting. TP53-/- cells were infected with rAAV-LIG3-Exon1-knockin virus that included two ATG to ATC point mutations in the right homology arm to replace the second (nuclear isoform) start and a downstream in-frame ATG codon with ATC (Supplementary Fig. 2C). Correctly targeted clones were identified by PCR screening of genomic DNA and the retention of the ATC mutations was verified by Sanger sequencing. A second round of targeting was performed using the same targeting vector after first subjecting the clone successfully modified in the initial round to Cre recombination using an AdCre virus. Subsequent targeting of this line resulted in an additional 4 clones bearing both ATC mutations. Each of these clones was then further subcloned.

Corroboration of the two new double knockout lines is shown in Supplementary Figs. 2B and 2C and Supplementary Figs. 3A to 3C.

Western blotting

Whole cell lysates and nuclear extracts were prepared from HCT116 WT, LIG3-/-, LIG4-/-, LIG3-/-:TP53-/- and LIG3-/-:LIG4-/- clones to test for effective gene targeting and reduced expression of TP53, LIG3 and LIG4 proteins, as appropriate. 30 µg of reduced extracts were resolved by SDS polyacrylamide gel electrophoresis using 4 to 20% Criterion™ TGX gels and blotted onto Immobilon-FL nylon membranes. Mouse anti-human DNA ligase 3 (GeneTex GTX70143), rabbit anti-human DNA ligase 4 (Santa Cruz sc2832) and rabbit anti-human p53 (Santa Cruz sc6243) primary antibodies were used to detect specific immuno-reactive antigens and MW was confirmed using a pre-stained protein ladder.

In vivo plasmid end-joining assays

The in vivo plasmid end-joining assay utilizing the pEGFP-Pem1-Ad2 plasmid has been described (Seluanov et al. 2004). Before transfection, the pEGFP-Pem1-Ad2 plasmid was digested with either HindIII to generate cohesive 4 bp overlapping ends, or I-SceI to create incompatible ends that require processing prior to re-ligation. Parental HCT116, LIG3-/-, LIG4-/- and LIG3-/-:LIG4-/- cells were transfected with either a HindIII-linearised or I-SceI-linearised pEGFP-Pem1-Ad2 plasmid using Lipofectamine 2000 (Invitrogen). A pCherry plasmid (Clontech) was co-transfected to control for transfection efficiency. Cells were allowed to repair the linearised plasmid for 24 hr, after which they were harvested and assayed for total repair by flow cytometry to determine the ratio of cells that are both red (pCherry+; transfected) and green (EGFP+; repaired) compared to the total number of red cells for each transfection. The final value for each mutant is reported as a percent repair compared to parental HCT116 cells.

17p subtelomere TALEN transfections

Subconfluent HCT116 cells were nucleofected with 2.5 µg of each TALEN plasmid in batches of 1 x 10^6 cells/cuvette in 100 µl supplemented nucleofection mix (Lonza SE Cell Line Kit L) using the Amaxa 4D nucleofector program DS-138, optimised to achieve 98% transfection efficiency with the pmaxGFP expression vector in > 80% viable HCT116 cells 24 hr post-nucleofection. Six replicates of each transfection were performed and cultured in individual wells prior to cell harvests at 48 (t48) and 120 (t120) hr post-nucleofection. At these time points, cells were photographed and replicate transfectants pooled for counting prior to pelleting for DNA extraction, or re-plating 2.5 x 10^5 cells for subculture to the next experimental time point. Cell yields at t48 were in the range 1.2 to 2.3 x 10^6 cells/line and (non-cumulative) yields at t120 were in the range 1 to 7 x 10^6 cells/line.

Genomic DNA extraction

Total genomic DNA was extracted from crisis-stage (PD46.5) MRC5HPVE6E7 cells using standard Tris-HCl lysis buffer in the presence of proteinase K and RNase A, followed by phenol/chloroform purification, precipitation with sodium acetate, and solubilisation in 10 mM Tris-HCl (pH 7.5). Total genomic DNA was extracted from TALEN-treated HCT116 cells using Sigma GenElute™ Mammalian Genomic DNA Miniprep Kits and eluted in nuclease-free water. DNA was quantified using a NanoDrop spectrophotometer to confirm a consistent quality and DNA yield for use in telomere length and fusion PCR and sequencing reactions.

Telomere length PCR

Telomere length at 17p and XpYp telomeres was examined for 2.5 ng sample input genomic DNA in 4 replicate reactions using a modified Single Telomere Length Analysis (STELA) protocol with 17p6 and XpYpC telomere-adjacent and Teltail primers, as described (Baird et al. 2003; Britt-Compton et al. 2006; Capper et al. 2007). Following 0.5% TAE agarose gel electrophoresis, resolved telomere length amplicons were detected by Southern blotting using a random-primed α-33P-radiolabelled telomere repeat or telomere-adjacent sequence probe in conjunction with a probe to detect the molecular weight markers.

Selected fusion amplicons were reamplified from diluted primary reactions in a second round of long-range PCR using combinations of the same subtelomere-specific primers or internal primers where possible. The reamplified products were resolved by agarose gel electrophoresis for size confirmation and subsequent excision and gel purification for Sanger sequencing using BigDye 3.1 or 3.0 for GC-rich amplicons.

Illumina HiSeq 2000 paired-end sequencing data handling and quality control

Paired-end sequence read quality was checked prior to analysis using the FastQC (v0.11.2; http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) software package alongside in-house tools that assessed mapping success and insert size-range. All reads were trimmed and filtered with Trimmomatic (version 0.30 (Bolger et al. 2014)) to remove sequencing adaptors and over-represented primer sequences.

All scripts developed for the mapping and analysis of sequence data are available as Supplemental Scripts or can be downloaded from GitHub (https://github.com). The main repository can be downloaded via https://github.com/nestornotabilis/GenomeResearch_2016_scripts. This repository consists of a number of Perl scripts, orchestrated by a Bash wrapper script. Several of the scripts call Java code which can be downloaded via https://github.com/nestornotabilis/WGP-Toolkit.

Training datasets

To establish the architecture of a genuine telomere fusion event mapped in the context of a genomic sequencing landscape, we performed paired-end sequencing of a composite sample of re-amplified and gel-purified telomere fusions previously characterised by Sanger sequencing. These complementary datasets guided the development of our bespoke mapping and filtering strategies subsequently applied to other experimental datasets, including those derived from 17p TALEN-treated HCT116 cells. The mapping and sequence validation of known fusion junctions from the Illumina paired-end data revealed that mapping quality score (mean MAPQ for paired reads), read orientation frequency and the percentage of read soft-clipping were critical indicators of authentic and unambiguous fusion events, independent of read coverage.
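Two of the indicators named above can be computed directly from SAM fields. The following is a minimal sketch, not the authors' in-house tooling: the function names are hypothetical, and "percentage of read soft-clipping" is interpreted here as the percentage of reads at a locus that carry any soft clip, which is an assumption.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def soft_clip_fraction(cigar):
    """Fraction of the bases of one read that are soft-clipped (S)."""
    ops = CIGAR_OP.findall(cigar)
    # M, I, S, = and X consume bases of the read sequence.
    read_len = sum(int(n) for n, op in ops if op in "MIS=X")
    clipped = sum(int(n) for n, op in ops if op == "S")
    return clipped / read_len if read_len else 0.0

def summarise(reads):
    """reads: list of (mapq, cigar) tuples for one candidate fusion locus.

    Returns (mean MAPQ, percentage of reads with any soft clip).
    """
    mean_mapq = sum(mapq for mapq, _ in reads) / len(reads)
    pct_clipped = 100.0 * sum(1 for _, cigar in reads if "S" in cigar) / len(reads)
    return mean_mapq, pct_clipped
```

For example, `summarise([(60, "100M"), (40, "60M40S")])` gives a mean MAPQ of 50.0 with 50.0% of reads soft-clipped.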

Reads with MAPQ scores of 0 could not be assigned to unique genomic loci; hence, fusion junction characteristics could not be reliably assessed. An orientation frequency of 1 was indicative of a single telomere junction, whereas lower frequencies reflected the involvement of more than one telomere or complex multi-junction events. The distribution of reads stacked up to generate a sharp edge with a degree of soft-clipping provided a distinctive and diagnostic visual manifestation of the fusion event, where the soft-clipped information represents the junction with another area of the genome that cannot, hence, be mapped in the same frame.

Mapping strategies to detect telomere fusion events using paired-end sequence reads

Two mapping strategies were applied to the trimmed paired-end sequencing reads to detect two types of fusion events. One strategy was used to investigate inter-chromosomal genomic fusion events (defined here as those linkages between at least one telomere end and a non-telomeric region of the genome). The second strategy was applied to the same data to investigate intra-chromosomal telomere fusion events (as represented by reads mapping to a single subtelomeric sequence in opposite orientations, indicative of head-to-head juxtapositions). Throughout, BWA-MEM (Li and Durbin 2009) version 0.7.4 (Li, H., 2013. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM, arXiv:1303.3997) was used to map reads. All mappings were performed with the default parameters save for the addition of the -M flag to mark shorter split hits as secondary for later filtering. Unless otherwise stated, all manipulations of the mapped outputs were performed using SAMtools version 0.1.19 (Li et al. 2009).

Detecting inter-chromosomal genomic fusion events

A two-step mapping strategy was used to identify reads mapping partly to subtelomeric regions (17p, 21q, or XpYp) and partly to non-telomeric chromosomal sequence and so represent inter-chromosomal fusion events.

Step one involved mapping the reads to a custom reference of telomere sequences to identify for later exclusion those telomere-derived reads not involved in inter-chromosomal fusion events. The custom reference was prepared from a multi-sequence fasta file comprising (i) 17p family, (ii) 21q family, and (iii) XpYp subtelomeric sequences as described (Jones et al. 2014) along with all known telomere variant repeats (TVRs). Properly mapped concordant reads - namely those where the R1 and R2 tags mapped within the expected insert size-range and with the correct orientation - were identified using the SAMtools command ‘samtools view -S -f67’ applied to the SAM file output.
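The value 67 passed to `-f` is a sum of SAM flag bits (1 + 2 + 64). The sketch below decodes a flag and mimics the "all required bits set" behaviour of `samtools view -f`; the bit meanings follow the SAM specification, while the function and dictionary names are illustrative only.

```python
# SAM FLAG bits as defined in the SAM specification.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

def decode(flag):
    """List the names of all bits set in a SAM flag."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]

def passes_required(flag, required):
    """Mimic `samtools view -f`: keep a read only if every required bit is set."""
    return (flag & required) == required
```

Here `decode(67)` returns `['paired', 'proper_pair', 'first_in_pair']`, so the filter retains the first read (R1) of each properly paired fragment.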

Step two involved re-mapping the reads to a custom version of the hg19 human reference sequence. This reference was prepared from the unmasked human reference hg19 supplemented with the subtelomeric sequences described by Stong et al. (2014) and the 17p family, 21q family, and XpYp series of subtelomeric sequences (Jones et al. 2014). An in-house Java script, based on the Picard API (version 1.108) library, was used to filter for discordant reads mapping across chromosomal boundaries. SAMtools (samtools view -bF 256) was then used to filter out secondary mappings. A second in-house Java script then removed all reads that might otherwise map solely within telomeric regions as identified during step one. Finally, using a custom script based around SAMtools, the output was subdivided into mappings involving (i) 17p, (ii) 21q, or (iii) XpYp and these mappings written as separate sorted and indexed BAM files for further analysis and visualisation.
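The step-two filters can be illustrated on simplified alignment records. This is a sketch under stated assumptions rather than the in-house Java code: each record is a hypothetical `(read_name, flag, rname, rnext)` tuple taken from SAM columns 1, 2, 3 and 7, and `telomere_only_names` stands for the read names identified in step one.

```python
def filter_inter_chromosomal(records, telomere_only_names):
    """Keep primary alignments whose mate maps to a different reference sequence.

    records: iterable of (read_name, flag, rname, rnext) tuples (hypothetical layout).
    telomere_only_names: set of read names flagged for exclusion in step one.
    """
    kept = []
    for name, flag, rname, rnext in records:
        if flag & 0x100:            # secondary alignment (as removed by -F 256)
            continue
        if flag & (0x4 | 0x8):      # read or mate unmapped
            continue
        if rnext in ("=", rname):   # mate on the same reference sequence
            continue
        if name in telomere_only_names:
            continue
        kept.append((name, flag, rname, rnext))
    return kept
```

The order of the filters differs from the description above, but since each test is applied independently to every record the retained set is the same.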

Detecting intra-chromosomal telomere fusion events

A different strategy was used to identify read pairs straddling fusion sites within the same telomeric region resulting from intra-chromosomal fusion events. When mapped to the appropriate telomere reference, such straddling read pairs should map with both R1 and R2 tags orientated in the same direction (hence flagged as discordant) on the same strand with the location of the mapping tags describing the two fusion sites within the telomere.

Thus, reads were first mapped to each of the telomeric sequences (17p, 21q, or XpYp) separately. The SAMtools view command (samtools view -bF316) was then applied to each output to filter for mapped paired-end reads where both the R1 and R2 tags were orientated on the same strand and in the forward direction. A SAMtools mpileup command (samtools mpileup -ABd100000) was next applied to the filtered output to extract per-base coverage data. From these outputs, mapping ‘hotspots’ were identified with an in-house Perl script. A mapping hotspot was classified as any region where an average coverage depth of three reads or more was achieved with no more than 10 bases separating individual reads. Hotspots should be linked together in pairs reflecting the two sides of a fusion event. These paired hotspots were described and catalogued by a separate in-house script.
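The exclusion flag 316 is the sum 4 + 8 + 16 + 32 + 256, that is, read unmapped, mate unmapped, read reverse, mate reverse and secondary alignment. The hotspot rule can be sketched as follows. This is one possible reading of the definition, not the in-house Perl script: covered positions are grouped when separated by at most 10 uncovered bases, and the mean depth is taken over covered positions only (both assumptions); the function name is hypothetical.

```python
def find_hotspots(coverage, min_mean_depth=3.0, max_gap=10):
    """Identify mapping hotspots from per-base coverage.

    coverage: dict {position: depth}, e.g. parsed from mpileup columns 2 and 4.
    Returns a list of (start, end, mean_depth) tuples.
    """
    positions = sorted(p for p, depth in coverage.items() if depth > 0)
    regions, current = [], []
    for pos in positions:
        # Start a new region when more than max_gap uncovered bases intervene.
        if current and pos - current[-1] - 1 > max_gap:
            regions.append(current)
            current = []
        current.append(pos)
    if current:
        regions.append(current)

    hotspots = []
    for region in regions:
        mean_depth = sum(coverage[p] for p in region) / len(region)
        if mean_depth >= min_mean_depth:
            hotspots.append((region[0], region[-1], mean_depth))
    return hotspots
```

With coverage `{100: 3, 101: 4, 102: 5, 110: 3, 200: 1, 201: 1}`, positions 100 to 110 form one hotspot with mean depth 3.75, while positions 200 to 201 fall below the depth threshold.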

Data visualisation

Indexed BAM files for all sample inter- and intra-chromosomal mapped and filtered read pairs were prepared for visualization in IGV (Robinson et al. 2011) alongside custom tracks relating to RepeatMasker (Smit, A.F.A., Hubley, R. and Green, P. 2013-2015) defined repeat sequences and aphidicolin-induced common fragile sites (aCFS) (Fungtammasan et al. 2012). Sequence-authenticated fusion junctions were prepared as BED files for Ensembl (Cunningham et al. 2015) karyotype plots (Fig. 1B), as well as IGV alignments. Circos plots (Krzywinski et al. 2009) depicting telomere linkages with genomic locations were also generated for low-resolution data exploration. All genomic coordinates are appropriate to the Feb 2009 GRCh37/hg19 assembly reference and the robustness of the mappings was verified by cross-reference with the GRCh38 assembly when it was released. We would not anticipate any substantive changes to our results or conclusions arising from realigning reads to this newer reference since we have focussed on the high resolution investigation of sequences at the junctions between telomeres (mapped in detail with reference to Stong et al. 2014) and predominantly gene-rich well-curated regions of the genome that are not considerably altered by mapping to either reference.

Enumeration of reads

For comparison purposes, raw read numbers were normalised with respect to total read yields achieved for each sample (Supplementary Table 2B).
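As a minimal illustration of this normalisation, raw counts can be scaled by each sample's total read yield. The per-million scale and the function name are assumptions; the source does not state the scaling factor used.

```python
def normalise(raw_counts, total_yields, scale=1_000_000):
    """Express raw read counts per `scale` total reads for each sample.

    raw_counts, total_yields: dicts keyed by sample name.
    """
    return {sample: raw_counts[sample] * scale / total_yields[sample]
            for sample in raw_counts}
```

For instance, 50 fusion reads in a sample yielding 10,000,000 reads correspond to 5.0 reads per million.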

Identification of telomere fusion junctions

Genomic loci and associated read coverage and quality data linked to each telomere were rendered into spreadsheets for each sample. Read pairs were filtered on a MAPQ (mapping quality) value > 0 to exclude ambiguous (mapping to more than one location) or poor quality mappings. Each genomic location was subsequently interrogated by repeated mapping of reads using BLAST (Altschul et al. 1990) to ensure unique and accurate mapping (based on E-value significance and % sequence identity of alignment) to the hg19 reference. Linkages where a single unique genomic location could not be defined were excluded from analyses since junction sequence context data was considered unreliable. We observed that fusion junctions were characterised by asymmetrical read distributions with a sharp edge of soft-clipped reads (containing mismatches with the reference sequence) to one side where the soft-clipped information represented the straddling of distant loci. We therefore extracted soft-clipped read information to precisely map and validate the junction points of each fusion event. For one MRC5HPVE6E7 and twenty-five HCT116 inter-chromosomal genomic linkages, the reads were not soft-clipped, preventing us from discovering the exact nucleotides at which the fusion occurred. However, by arbitrarily assigning junction points at 50 bp from the most terminal read we were still able to extract valuable genomic context information for these junctions.

We precisely mapped junctions by re-mapping only those reads that had been soft-clipped and with the correct orientation. Unclipped reads were included in low-resolution histogram comparisons of intra-chromosomal fusions in Supplementary Fig. 5 to extract additional important positional information from these reads. Junction points in soft-clipped reads were mapped by pair-wise BLAST alignment with each individual subtelomeric sequence, as well as the hg19 modified human reference and the Ensembl nucleotide database for comprehensive characterisation of any insertions detected at telomeric junctions.
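The link between a soft-clipped alignment and a candidate junction coordinate can be sketched from the SAM POS and CIGAR fields. This is an illustrative simplification and not the BLAST-based validation described above: the function name is hypothetical, hard clips are ignored, and a read clipped at both ends is reported by its right-hand clip only.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def junction_from_clip(pos, cigar):
    """Return (reference_position, side) of the clip boundary, or None if unclipped.

    pos: 1-based leftmost aligned reference position (SAM POS field).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return None
    # M, D, N, = and X consume reference bases.
    ref_len = sum(n for n, op in ops if op in "MDN=X")
    if ops[-1][1] == "S":
        return pos + ref_len - 1, "right"   # last aligned base before the clip
    if ops[0][1] == "S":
        return pos, "left"                  # first aligned base after the clip
    return None
```

A read aligned at position 1000 with CIGAR `70M30S` places the boundary at reference position 1069; a fully matched `100M` read yields `None`, corresponding to the unclipped linkages for which junction points were assigned arbitrarily.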

Junction analyses

The coincidence of each validated fusion junction with selected genomic features was assessed using the UCSC Genome Browser (Kent et al. 2002) and Ensembl (Cunningham et al. 2015) based on the Feb 2009 GRCh37/hg19 assembly. Selected features included coding sequence (exons and introns), repeat features, DNase I hypersensitivity sites and transcription factor binding sites, as well as banding positions. Only RefSeq (Pruitt et al. 2014) and Ensembl curated annotations were used for these analyses, although GenBank (Benson et al. 2005) annotations were noted in Supplementary Table 1 for reference. Significance of association with selected features was determined by Chi-squared comparison with genome average values for gene content, repetitive DNA and fragile sites according to RefSeq (Pruitt et al. 2014), RepeatMasker (Smit, A.F.A., Hubley, R. and Green, P. 2013-2015) and Tandem Repeats Finder (TRF) databases (Benson 1999) and aCFS genome content (Fungtammasan et al. 2012).
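A Chi-squared comparison of observed junction counts against a genome-average expectation can be sketched as a one-degree-of-freedom goodness-of-fit test. This is not the GraphPad Prism workflow used for the reported statistics; the function name and the example numbers are illustrative assumptions.

```python
import math

def chi_squared_vs_genome(n_in_feature, n_total, genome_fraction):
    """Goodness-of-fit test of junctions inside vs outside a feature (1 d.f.).

    genome_fraction: fraction of the genome occupied by the feature (0 < f < 1).
    Returns (chi-squared statistic, P value).
    """
    expected_in = n_total * genome_fraction
    expected_out = n_total - expected_in
    observed_out = n_total - n_in_feature
    stat = ((n_in_feature - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-squared distribution with 1 d.f.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With 60 of 100 junctions inside a feature that covers half of the genome, the statistic is 4.0 and the P value is approximately 0.046.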

The incidence of microhomology, insertions and resection from each individual inter-chromosomal and intra-chromosomal fusion junction was determined manually. Microhomology was defined as > 1 bp overlap in nt usage in both fusion partners and events were subsequently categorised in terms of frequency, as well as the numbers of nt involved at each junction. Insertions (< 50 bp) between apposed fusion partners were categorised as templated if > 2 nt localised duplication or sequence similarity could be determined, or untemplated if unidentified by low-stringency BLAST mapping to any reference sequences.

Resection was measured back from the telomere-proximal 17p TALEN target site at 3024 bp from the 3' terminal end of the 17p6 fusion amplifying primer. 17p chromatids extending beyond the TALEN cut site (either uncut or extended post-cleavage) were assigned a resection value of 0. For the crisis-stage MRC5HPVE6E7 samples, 17p chromatid resection was instead measured from the start of the telomere variant repeats at 17p 3038 bp from the 3' terminal end of the 17p6 fusion primer, with chromatids that contained telomere repeats being classified as having a resection value of 0. Incidence of the TALEN binding recognition footprint at 17p telomere fusions was calculated from all junctions occurring within 25 bp of the TALEN cleavage site.
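The resection rule can be written as a simple calculation. The sketch assumes that each junction is expressed as a distance in bp from the 3' terminal end of the 17p6 primer towards the telomere; the constant and function names are hypothetical, while the values 3024 and 3038 come from the text above.

```python
TALEN_SITE_BP = 3024   # TALEN target site, distance from the 17p6 primer 3' end
TVR_START_BP = 3038    # start of telomere variant repeats (crisis-stage samples)

def resection(junction_distance_bp, reference_bp=TALEN_SITE_BP):
    """Resection in bp measured back from the reference point.

    Chromatids extending to or beyond the reference point score 0.
    """
    return max(0, reference_bp - junction_distance_bp)
```

A junction 2500 bp from the primer therefore scores 524 bp of resection relative to the TALEN site, and a junction at 3100 bp scores 0.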

Junction sequence content analyses

To investigate the impact of genomic sequence context for the location of telomere fusion junctions and to determine whether proximal breakpoints could be identified, 500 bp of unidirectional proximal sequence relating to the genomic sequence displaced by the recombination was extracted for each validated inter-chromosomal fusion junction using a customised Perl script. These contextual sequences were analysed for GC content using Geecee (EMBOSS (Rice et al. 2000)), polyA/T (N5 and N10) and recombination motifs using FUZZNUC (EMBOSS), non-B DNA motifs using nBMST (Cer et al. 2013) (ABCC) and unsupervised motif searches using the MEME suite tools (Bailey et al. 2009), including MEME, MAST, GOMO, TOMTOM and FIMO. More extensive explorations of local GC content were performed using both sliding window and discrete window sequence scanning with progressively-increasing size intervals using Artemis (Rutherford et al. 2000) and Geecee. Distance to the telomere of the chromosome arm harbouring each inter-chromosomal fusion junction was calculated from Ensembl genomic telomere position coordinates. Distances were subsequently binned for histogram plots.
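The sliding-window and discrete-window GC scans mentioned above can be approximated as follows. This sketch does not reproduce Geecee or Artemis output; the function names are hypothetical, and ignoring ambiguous bases such as N is an assumption.

```python
def gc_fraction(seq):
    """GC fraction of a sequence, counting only unambiguous A, C, G and T bases."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def sliding_gc(seq, window, step=1):
    """GC fraction in windows of `window` bases, advancing by `step` bases.

    step=1 gives a sliding scan; step=window gives discrete, non-overlapping windows.
    """
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]
```

For the 500 bp context sequences, calling `sliding_gc(seq, window)` with increasing `window` values mirrors the progressively-increasing size intervals described above.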

To investigate whether fusion junctions were more likely to occur near polyA/T motifs, all polyA and polyT stretches of five bases or greater were identified within the hg19 reference. From this dataset, the closest motif to each inter-chromosomal fusion junction was identified and the distance recorded. From these distances a histogram was constructed to describe how fusion junctions distribute around polyA/T motifs.

To see whether this distribution could be explained by chance alone, an equivalent distribution was prepared from 10,000 randomly chosen locations within high-quality regions of the hg19 reference (filtered down to 8,998 sites to remove random positions falling within low-quality polyN stretches).
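The nearest-motif distance and the random-site control can be sketched on a single sequence string. This is a simplified illustration and not the script used for the analysis: distances are measured to motif start positions only, the function names are hypothetical, and random positions falling on N are discarded in the same spirit as the polyN filtering described above.

```python
import bisect
import random
import re

# PolyA or polyT stretches of five bases or greater.
POLY_AT = re.compile(r"A{5,}|T{5,}")

def poly_at_starts(seq):
    """Sorted 0-based start positions of all polyA/T stretches of >= 5 bases."""
    return [m.start() for m in POLY_AT.finditer(seq.upper())]

def nearest_distance(pos, sorted_starts):
    """Distance from pos to the closest motif start, or None if there are no motifs."""
    i = bisect.bisect_left(sorted_starts, pos)
    candidates = sorted_starts[max(0, i - 1):i + 1]
    return min(abs(pos - s) for s in candidates) if candidates else None

def random_control(seq, n, seed=0):
    """Nearest-motif distances for up to n random positions, skipping N bases."""
    rng = random.Random(seed)
    starts = poly_at_starts(seq)
    sites = [p for p in (rng.randrange(len(seq)) for _ in range(n))
             if seq[p].upper() != "N"]
    return [nearest_distance(p, starts) for p in sites]
```

Binning the output of `nearest_distance` for the observed junctions and of `random_control` for the random sites gives the two histograms to be compared.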

Strand bias

Full-length unclipped 17p intra-chromosomal read pairs were extracted for each HCT116 sample for comparisons of de novo nucleotide changes in the long and short chromatids comprising each intra-chromosomal fusion event. Reads for all samples were first aligned to identify constitutive HCT116-specific nt changes to exclude from subsequent read error calculations. Reads were visualised as pairs in IGV to separate those mapping the shorter centromeric chromatid from those mapping the longer telomeric chromatids. Unique nt changes from the defined HCT116 17p subtelomeric reference were counted for all reads and independent error rates (expressed as errors/kb sequenced DNA) generated for the centromeric and telomeric components of each read pair.
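The error-rate calculation can be illustrated as follows. The representation of a change as a `(position, alternative_base)` tuple and the function name are assumptions; the counting itself was performed on reads inspected in IGV.

```python
def errors_per_kb(read_changes, constitutive_changes, bases_sequenced):
    """Unique de novo nt changes per kb of sequenced DNA.

    read_changes: iterable of (position, alt_base) tuples observed in the reads.
    constitutive_changes: iterable of (position, alt_base) tuples shared by all
        samples (HCT116-specific), which are excluded from the count.
    """
    de_novo = set(read_changes) - set(constitutive_changes)
    return 1000.0 * len(de_novo) / bases_sequenced
```

Applying the function separately to the centromeric and telomeric components of the read pairs gives the two independent rates that are compared.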

Haplotype analysis

The two MRC5HPVE6E7 alleles at the XpYp subtelomere have characteristic sequence polymorphisms that segregate to define the A and B haplotypes (Baird et al. 2003). To determine whether intra-chromosomal XpYp telomere fusions sequenced from the MRC5HPVE6E7 crisis-stage sample were derived from a single allele or both (and thus infer sister chromatid or homologous chromosome fusion), these fusion reads were aligned in IGV to perform diagnostic characterisation of sequence at these recognised polymorphic nucleotides (Supplementary Fig. 1B).

Ontology searches

A list of all unique genes harbouring inter-chromosomal fusion junctions was compiled as an input database for ontology searching using DAVID (Huang da et al. 2009; Huang et al. 2007a) and GSEA (Mootha et al. 2003; Subramanian et al. 2005) tools. For functional annotation using DAVID, this gene list was subdivided further to examine any differential enrichment of targets derived from the MRC5HPVE6E7 samples or the samples sufficient for A- versus C-NHEJ repair. The numbers of genes from each list involved in specific cellular functions (and associated P-values) are detailed in Supplementary Table 3B. GSEA ontology searches were performed to distinguish the genes involved in each functional annotation as well as to identify genes enriched for cancer terms and binding motifs, including those for transcription factors (Supplementary Table 3C).

Statistical analyses

All statistical analyses, including one- and two-tailed t-tests, ANOVA and Chi-square tests were performed using GraphPad Prism 6. A P value of less than 0.05 was considered statistically significant. One-tailed binomial tests were performed in R (version 3.0.2).
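For orientation, a one-tailed (upper-tail) binomial test computes the probability of observing at least k successes in n trials under a null success probability p, which is what R's `binom.test(k, n, p, alternative = "greater")` reports as its P value. The Python sketch below is an illustrative equivalent, not the R code used for the reported analyses; the function name is hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the one-tailed (greater) P value."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))
```

For example, 8 or more successes in 10 trials at p = 0.5 has probability 56/1024, which lies just above the 0.05 significance threshold.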

References for supplementary methods:

Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. Journal of molecular biology 215: 403-410.
Bailey, T.L., M. Boden, F.A. Buske, M. Frith, C.E. Grant, L. Clementi, J. Ren, W.W. Li, and W.S. Noble. 2009. MEME SUITE: tools for motif discovery and searching. Nucleic acids research 37: W202-208.
Baird, D.M., J. Rowson, D. Wynford-Thomas, and D. Kipling. 2003. Extensive allelic variation and ultrashort telomeres in senescent human cells. Nature genetics 33: 203-207.
Benson, D.A., I. Karsch-Mizrachi, D.J. Lipman, J. Ostell, and D.L. Wheeler. 2005. GenBank. Nucleic acids research 33: D34-38.
Benson, G. 1999. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research 27: 573-580.
Bolger, A.M., M. Lohse, and B. Usadel. 2014. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30: 2114-2120.
Britt-Compton, B., J. Rowson, M. Locke, I. Mackenzie, D. Kipling, and D.M. Baird. 2006. Structural stability and chromosome-specific telomere length is governed by cis-acting determinants in humans. Human molecular genetics 15: 725-733.
Bunz, F., A. Dutriaux, C. Lengauer, T. Waldman, S. Zhou, J.P. Brown, J.M. Sedivy, K.W. Kinzler, and B. Vogelstein. 1998. Requirement for p53 and p21 to sustain G2 arrest after DNA damage. Science 282: 1497-1501.
Capper, R., B. Britt-Compton, M. Tankimanova, J. Rowson, B. Letsolo, S. Man, M. Haughton, and D.M. Baird. 2007. The nature of telomere fusion and a definition of the critical telomere length in human cells. Genes & development 21: 2495-2508.
Cer, R.Z., D.E. Donohue, U.S. Mudunuri, N.A. Temiz, M.A. Loss, N.J. Starner, G.N. Halusa, N. Volfovsky, M. Yi, B.T. Luke, A. Bacolla, J.R. Collins, and R.M. Stephens. 2013. Non-B DB v2.0: a database of predicted non-B DNA-forming motifs and its associated tools. Nucleic acids research 41: D94-D100.
Cooper, D.N., A. Bacolla, C. Ferec, K.M. Vasquez, H. Kehrer-Sawatzki, and J.M. Chen. 2011. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Human mutation 32: 1075-1099.
Cunningham, F., M.R. Amode, D. Barrell, K. Beal, K. Billis, S. Brent, D. Carvalho-Silva, P. Clapham, G. Coates, S. Fitzgerald, L. Gil, C.G. Giron, L. Gordon, T. Hourlier, S.E. Hunt, S.H. Janacek, N. Johnson, T. Juettemann, A.K. Kahari, S. Keenan, F.J. Martin, T. Maurel, W. McLaren, D.N. Murphy, R. Nag, B. Overduin, A. Parker, M. Patricio, E. Perry, M. Pignatelli, H.S. Riat, D. Sheppard, K. Taylor, A. Thormann, A. Vullo, S.P. Wilder, A. Zadissa, B.L. Aken, E. Birney, J. Harrow, R. Kinsella, M. Muffato, M. Ruffier, S.M. Searle, G. Spudich, S.J. Trevanion, A. Yates, D.R. Zerbino, and P. Flicek. 2015. Ensembl 2015. Nucleic acids research 43: D662-669.
Fungtammasan, A., E. Walsh, F. Chiaromonte, K.A. Eckert, and K.D. Makova. 2012. A genome-wide analysis of common fragile sites: what features determine chromosomal instability in the human genome? Genome research 22: 993-1005.
Huang da, W., B.T. Sherman, and R.A. Lempicki. 2009. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature protocols 4: 44-57.
Huang, D.W., B.T. Sherman, Q. Tan, J.R. Collins, W.G. Alvord, J. Roayaei, R. Stephens, M.W. Baseler, H.C. Lane, and R.A. Lempicki. 2007a. The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome biology 8: R183.
Huang, D.W., B.T. Sherman, Q. Tan, J. Kir, D. Liu, D. Bryant, Y. Guo, R. Stephens, M.W. Baseler, H.C. Lane, and R.A. Lempicki. 2007b. DAVID Bioinformatics Resources: expanded annotation database and novel algorithms to better extract biology from large gene lists. Nucleic acids research 35: W169-175.
Jones, R.E., S. Oh, J.W. Grimstead, J. Zimbric, L. Roger, N.H. Heppel, K.E. Ashelford, K. Liddiard, E.A. Hendrickson, and D.M. Baird. 2014. Escape from Telomere-Driven Crisis Is DNA Ligase III Dependent. Cell reports 8: 1063-1076.
Kent, W.J., C.W. Sugnet, T.S. Furey, K.M. Roskin, T.H. Pringle, A.M. Zahler, and D. Haussler. 2002. The human genome browser at UCSC. Genome research 12: 996-1006.
Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S.J. Jones, and M.A. Marra. 2009. Circos: an information aesthetic for comparative genomics. Genome research 19: 1639-1645.
Li, H. and R. Durbin. 2009. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754-1760.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, and R. Durbin. 2009. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078-2079.
Mootha, V.K., C.M. Lindgren, K.F. Eriksson, A. Subramanian, S. Sihag, J. Lehar, P. Puigserver, E. Carlsson, M. Ridderstrale, E. Laurila, N. Houstis, M.J. Daly, N. Patterson, J.P. Mesirov, T.R. Golub, P. Tamayo, B. Spiegelman, E.S. Lander, J.N. Hirschhorn, D. Altshuler, and L.C. Groop. 2003. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nature genetics 34: 267-273.
Nick McElhinny, S.A., D.A. Gordenin, C.M. Stith, P.M. Burgers, and T.A. Kunkel. 2008. Division of labor at the eukaryotic replication fork. Molecular cell 30: 137-144.
Oh, S., A. Harvey, J. Zimbric, Y. Wang, T. Nguyen, P.J. Jackson, and E.A. Hendrickson. 2014. DNA ligase III and DNA ligase IV carry out genetically distinct forms of end joining in human somatic cells. DNA repair 21: 97-110.
Pruitt, K.D., G.R. Brown, S.M. Hiatt, F. Thibaud-Nissen, A. Astashyn, O. Ermolaeva, C.M. Farrell, J. Hart, M.J. Landrum, K.M. McGarvey, M.R. Murphy, N.A. O'Leary, S. Pujar, B. Rajput, S.H. Rangwala, L.D. Riddick, A. Shkeda, H. Sun, P. Tamez, R.E. Tully, C. Wallin, D. Webb, J. Weber, W. Wu, M. DiCuccio, P. Kitts, D.R. Maglott, T.D. Murphy, and J.M. Ostell. 2014. RefSeq: an update on mammalian reference sequences. Nucleic acids research 42: D756-763.
Rice, P., I. Longden, and A. Bleasby. 2000. EMBOSS: the European Molecular Biology Open Software Suite. Trends in genetics : TIG 16: 276-277.
Robinson, J.T., H. Thorvaldsdottir, W. Winckler, M. Guttman, E.S. Lander, G. Getz, and J.P. Mesirov. 2011. Integrative genomics viewer. Nature biotechnology 29: 24-26.
Rutherford, K., J. Parkhill, J. Crook, T. Horsnell, P. Rice, M.A. Rajandream, and B. Barrell. 2000. Artemis: sequence visualization and annotation. Bioinformatics 16: 944-945.
Seluanov, A., D. Mittelman, O.M. Pereira-Smith, J.H. Wilson, and V. Gorbunova. 2004. DNA end joining becomes less efficient and more error-prone during cellular senescence. Proceedings of the National Academy of Sciences of the United States of America 101: 7624-7629.
Stong, N., Z. Deng, R. Gupta, S. Hu, S. Paul, A.K. Weiner, E.E. Eichler, T. Graves, C.C. Fronick, L. Courtney, R.K. Wilson, P.M. Lieberman, R.V. Davuluri, and H. Riethman. 2014. Subtelomeric CTCF and cohesin binding site organization using improved subtelomere assemblies and a novel annotation pipeline. Genome research 24: 1039-1050.
Subramanian, A., P. Tamayo, V.K. Mootha, S. Mukherjee, B.L. Ebert, M.A. Gillette, A. Paulovich, S.L. Pomeroy, T.R. Golub, E.S. Lander, and J.P. Mesirov. 2005. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America 102: 15545-15550.

Supplementary figure legends

Supplementary Figure 1

Supplementary Figure 1 (A) Inter-chromosomal fusions between 17p, XpYp and 21q family telomeres and diverse genomic loci mapped using Illumina HiSeq 2000 paired-end sequencing. Linkages between genomic loci and all telomere ends targeted by long-range fusion PCR primers can be mapped by paired-end sequencing of fusion amplicons derived from crisis-stage MRC5HPVE6E7 cells and 17p TALEN-transfected HCT116 cell lines. Illumina HiSeq 2000 read pairs were mapped to the human hg19 reference sequence with curated subtelomeric sequences (Stong et al. 2014), as well as to a bespoke reference constructed from the subtelomeric regions of 17p, XpYp and a composite 21q family subtelomeric sequence. Inter-chromosomal telomere-genomic fusion events were defined as those discordant and soft-clipped linkages mapping to at least one telomere end (17p, XpYp or the 21q family) and a non-telomeric genomic location. Sequence reads mapping inter-chromosomal links between genomic loci and each telomere amplified from MRC5HPVE6E7 cells undergoing a telomere-induced crisis are displayed as separate bars on the chart. The reduced number of reads mapping 17p fusions is consistent with the single amplifiable 17p allele in the MRC5HPVE6E7 clone employed (Capper et al. 2007). (B) MRC5HPVE6E7 intra-chromosomal fusions involving the XpYp telomere are pure haplotype B sister-chromatid events rather than inter-allelic fusions (Baird et al. 2003; Capper et al. 2007). The sequence of the two (A and B) MRC5HPVE6E7 haplotypes at the XpYp subtelomere is shown above the IGV (Robinson et al. 2011) representation of sequenced intra-chromosomal fusions from this sample, confirming the involvement of a single XpYp allele in the mapped events. (C) Inter-chromosomal fusion links sequenced from the HCT116 cell samples map predominantly to the 17p chromosome, indicating specificity of targeting activity of the 17p TALEN used to promote telomere fusions in these assays. Reads mapping inter-chromosomal links between genomic loci and each telomere amplified from DNA extracted from DNA repair-deficient and WT HCT116 cell lines at 48 (t48) and 120 (t120) hr post-nucleofection with 17p-specific TALEN nucleases are displayed as separate coloured bars on the chart. (D) Transfection of 17p TALEN pairs does not result in gross truncation of 17p telomeres in HCT116 cells. The length of 17p telomeres in bulk populations of HCT116 cell lines pre-transfection and 48 and 120 hr post-transfection with 17p TALEN vectors was measured by single telomere length analysis (STELA) (Baird et al. 2003) using a telomere repeat probe. (E) Examples of intra- and inter-chromosomal fusions reamplified and Sanger sequenced from all principal HCT116 lines employed are shown to illustrate the effective targeting of 17p with TALEN to induce 17p telomere fusions. The TALEN recognition site is in italics and underlined and distance from the fusion junction to the cleavage site is noted (Δ, deletion from this position). Junction insertions and microhomology are marked above and below the sequence, respectively. Distinct genomic loci are represented in different colours and annotated with their genomic locations.
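The definition of an inter-chromosomal telomere-genomic fusion event given in (A) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the `Linkage` record, the reference names in `TELOMERE_REFS` and the evidence labels are hypothetical, and the sketch assumes that either a discordant pair or a soft-clipped read counts as supporting evidence.

```python
from dataclasses import dataclass

# Assumed names for the bespoke subtelomeric references (hypothetical labels).
TELOMERE_REFS = frozenset({"17p", "XpYp", "21q_family"})

# Evidence types treated as fusion-supporting in this sketch (an assumption).
FUSION_EVIDENCE = frozenset({"discordant", "soft_clipped"})


@dataclass(frozen=True)
class Linkage:
    """One mapped link between two reference locations (hypothetical record)."""
    ref_a: str      # reference name at one end of the link
    ref_b: str      # reference name at the other end of the link
    evidence: str   # "discordant", "soft_clipped" or "concordant"


def is_telomere_genomic_fusion(link: Linkage) -> bool:
    """Return True for a discordant or soft-clipped link that joins exactly
    one telomere reference to a non-telomeric genomic reference."""
    if link.evidence not in FUSION_EVIDENCE:
        return False
    a_is_telomere = link.ref_a in TELOMERE_REFS
    b_is_telomere = link.ref_b in TELOMERE_REFS
    return a_is_telomere != b_is_telomere


def count_fusions_by_telomere(links):
    """Tally qualifying links per telomere end, one count per bar in a chart."""
    counts = {name: 0 for name in sorted(TELOMERE_REFS)}
    for link in links:
        if is_telomere_genomic_fusion(link):
            telomere = link.ref_a if link.ref_a in TELOMERE_REFS else link.ref_b
            counts[telomere] += 1
    return counts
```

Links between two telomere references, or between two genomic loci, are deliberately excluded, which mirrors the requirement for one telomere end and one non-telomeric location.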

Supplementary Figure 2

Supplementary Figure 2 (A) Telomere fusion amplicons generated from HCT116 cell lines harvested at 48 and 120 hr post-nucleofection with 17p TALEN pairs were subjected to Illumina HiSeq 2000 paired-end sequencing. Reads were mapped to the human hg19 reference sequence with refinement of subtelomeric sequences (Stong et al. 2014), as well as to a bespoke reference constructed from the subtelomeric regions of 17p, XpYp and a composite 21q family subtelomeric sequence. The incidence of the 17p TALEN recognition footprint at the fusion junctions of inter- (left) and intra- (right) chromosomal fusion events sequenced is shown as a percentage of all junctions sequenced for each sample with a 95% confidence interval (CI). Incidence was defined as each junction located within 25 bp of the 17p TALEN cleavage site (and not further downstream within the 17p telomere variant or pure repeats), which represents the position of the most centromeric nucleotide of the TALEN footprint (left TALEN half-site).
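The footprint incidence described in (A) is a proportion with a 95% CI. The sketch below shows one way such a value could be computed. The legend does not state which interval was used, so the Wilson score interval here is an assumption, as are the function names and the representation of each junction by its deletion distance (in bp) from the TALEN cleavage site.

```python
import math


def has_talen_footprint(deletion_bp: int, window_bp: int = 25) -> bool:
    """A junction counts as carrying the footprint when it lies within
    window_bp of the cleavage site (25 bp in the legend)."""
    return 0 <= deletion_bp <= window_bp


def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (assumed method, z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return centre - half, centre + half


def footprint_incidence(deletions_bp, window_bp: int = 25):
    """Return (percentage, CI lower %, CI upper %) for a list of junctions."""
    total = len(deletions_bp)
    hits = sum(has_talen_footprint(d, window_bp) for d in deletions_bp)
    low, high = wilson_ci(hits, total)
    return 100.0 * hits / total, 100.0 * low, 100.0 * high
```

For example, `footprint_incidence([3, 20, 29, 86])` treats the first two junctions as footprint-positive and reports 50% with its interval.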

(B) Generation of the new double knockout HCT116 LIG3-/-:LIG4-/- cell line using CRISPR/Cas9-mediated gene targeting. The upper diagram depicts the functional domains of the human LIG4 peptide and the CRISPR guide and PAM sequences used to target the sequence encoding the DNA binding domain (OB-fold; oligonucleotide/oligosaccharide-binding, BRCT; BRCA1 C-terminus, XIR; XRCC4-interacting region). The MwoI restriction enzyme recognition site used to determine the correct CRISPR/Cas9 targeting is boxed. Sequencing results from successfully-targeted subclones 3-4 and 3-16 are shown in the central panel with the parental allele and the MwoI restriction enzyme recognition site displayed above. The specific mutations detected for each allele are detailed with the corresponding nucleotides highlighted. The lower panel shows the western blot validation of these two LIG3-/-:LIG4-/- clones. Significantly-reduced expression of the protein products of the targeted alleles was determined using nuclear extracts derived from each knockout line (recorded above the lanes; particular clones indicated by #) in comparison with WT and single knockout control HCT116 cells. Primary antibodies used for detection are listed below each blot and the molecular weights of the specific reactive bands are indicated with arrowheads. The LIG3-/-:LIG4-/- clone #3-4 was used in all TALEN and telomere fusion assays contained in this study. (C) Generation of the new double knockout HCT116 LIG3-/-:TP53-/- cell line using recombinant adeno-associated virus (rAAV)-mediated gene targeting. The upper diagram depicts the functional domains of the human LIG3 peptide, including the first and second transcriptional start sites either side of the mitochondrial localization sequence (marked MLS) that produce mitochondrial or nuclear LIG3 isoforms, respectively (ZnF; zinc finger, OB-fold; oligonucleotide/oligosaccharide-binding). The rAAV vector used to target LIG3 and replace both the second start and a proximal in-frame ATG upstream of exon 2 with ATC (marked with asterisks) is shown in the central panel (Puro; puromycin selection cassette). The sequences of 8 independent clones resulting from sequential rounds of gene targeting are shown below with the successful ATG to ATC mutations indicated. Clones 35, 43, 82 and 94 were identified as carrying ATC mutations at both sites and were subsequently subcloned. The lower panel shows the western blot validation of three LIG3-/-:TP53-/- subclones. Significantly-reduced expression of the protein products of the targeted alleles was determined using whole cell lysates prepared from each knockout line (recorded above the lanes; particular clones indicated by #) in comparison with WT and single knockout control HCT116 cells. Primary antibodies used for detection are listed below each blot and the molecular weights of the specific reactive bands are indicated with arrowheads. The truncated TP53 protein resulting from the targeted removal of exon 2 from the parental TP53-/- line is indicated (Bunz et al. 1998). The LIG3-/-:TP53-/- clone #35-1 was used in all TALEN and telomere fusion assays contained in this study. (D) Transfection of 17p TALEN pairs induces a reduction in bulk population viability. HCT116 WT and engineered lines compromised in A-NHEJ (LIG3-/-) or C-NHEJ (LIG4-/-) DNA repair (Oh et al. 2014) were transfected with vectors encoding 17p TALEN pairs or empty control vectors. Bulk population numbers were assayed 48 hr post-nucleofection and the reduction in TALEN-treated cell numbers compared with control populations is represented as the mean (and SEM of two experiments) percentage of growth recovered for each sample. (E) Cell counts of all 17p-transfected HCT116 lines used as input samples for Illumina paired-end sequencing of telomere fusion amplicons are displayed at relevant time points to highlight the differential growth responses of the DNA repair-deficient lines to the induced 17p telomere DSBs. (F) The total numbers of filtered reads mapping 17p inter- and intra-chromosomal fusion events in HCT116 cell lines 48 hr after TALEN nucleofection were normalised to fusion PCR genomic DNA input to give a value of 17p telomere fusion frequency per diploid human genome, expressed as a bar chart. (G) The numbers of read pairs mapping inter-chromosomal telomere-genomic fusions for each sample were calculated from BAM (binary) files containing reads filtered for those discordant pairs that straddled the genomic reference and at least one subtelomeric reference sequence (blue bars). Intra-chromosomal 17p telomere fusion reads were calculated from BAM files containing reads filtered for discordant pairs mapping to a single subtelomeric sequence in opposite orientations (red bars). Read numbers for each sample were normalised to the sample-specific sequencing yields.
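The two normalisations described in (F) and (G) can be sketched as follows. The exact calculation used in the study is not given in the legend, so this is only an illustration: the function names are hypothetical, the mass of one diploid human genome (taken here as about 6.6 pg) is an assumed constant, and scaling to a common reference yield is an assumed way of normalising to sample-specific sequencing yields.

```python
# Assumed approximate mass of one diploid human genome, in picograms.
DIPLOID_GENOME_PG = 6.6


def fusions_per_diploid_genome(fusion_reads: float, input_dna_ng: float,
                               pg_per_genome: float = DIPLOID_GENOME_PG) -> float:
    """Fusion reads divided by the number of diploid genomes in the PCR input."""
    if input_dna_ng <= 0 or pg_per_genome <= 0:
        raise ValueError("DNA input and genome mass must be positive")
    genomes_in_input = input_dna_ng * 1000.0 / pg_per_genome  # ng -> pg
    return fusion_reads / genomes_in_input


def normalise_to_yield(read_count: float, sample_yield: float,
                       reference_yield: float) -> float:
    """Scale a read count to what it would be at a common sequencing yield."""
    if sample_yield <= 0:
        raise ValueError("sample yield must be positive")
    return read_count * reference_yield / sample_yield
```

A sample sequenced twice as deeply as the reference has its read count halved by `normalise_to_yield`, so that samples can be compared on one axis.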

Supplementary Figure 3

Supplementary Figure 3 HCT116 cell lines lacking LIG4 display compromised in vitro end-joining activity.

(A) Diagram of the pEGFP-Pem1-Ad2 NHEJ reporter substrate plasmid used to test extra-chromosomal end-joining function in LIG3-/-, LIG4-/- and LIG3-/-:LIG4-/- HCT116 cells (upper panel). The vector contains an enhanced green fluorescent protein (EGFP) expression cassette driven by a cytomegaloviral (CMV) promoter and terminated by the simian virus (SV40) polyA sequence. The 5’ (EG) portion of EGFP is separated from the 3’ (FP) portion by a 2.4 kb phosphatidylethanolamine methylation gene 1 (Pem1) intron interrupted by an Ad2 (adenovirus 2) exon that is flanked by HindIII and I-SceI restriction enzyme recognition sites. Splice donor (SD) and splice acceptor (SA) are also noted. Digestion of the two HindIII sites produces compatible cohesive ends while digestion of the inverted 18 bp non-palindromic I-SceI sites produces incompatible ends (central panel). The Ad2-exon, within the Pem1 intron, is efficiently spliced into the middle of the EGFP ORF resulting in an inactivated EGFP gene product that leads to EGFP-negative cells (lower panel). Removal of the Ad2 exon by either HindIII or I-SceI results in a linearised plasmid that drives EGFP expression, upon successful intracellular recircularisation. (B) Repair of the HindIII or I-SceI digested plasmid can be quantified as the percentage of EGFP-expressing cells (green) by flow cytometry and normalised to the overall proportion of cells successfully transfected with a pCherry control reporter plasmid (red). The indicated HCT116 cell lines were analysed by flow cytometry 24 hr post-transfection with HindIII or I-SceI linearised plasmid. Cell lines lacking LIG4 have notably reduced EGFP+pCherry+ populations (upper right-hand quadrant). (C) Quantification of the results shown in (B), plotted as relative plasmid rejoining for HindIII (closed bars) or I-SceI (open bars). NHEJ efficiency was determined from the ratio of EGFP+ pCherry+:pCherry+ (mean and SEM for two independent experiments).
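The NHEJ efficiency ratio defined in (C) can be written out as a short sketch. The function names and the expression of relative rejoining as a percentage of the WT value are assumptions made for illustration; any cell counts passed in would be hypothetical.

```python
def nhej_efficiency(egfp_cherry_positive: int, cherry_positive: int) -> float:
    """Ratio of EGFP+ pCherry+ cells to all pCherry+ (transfected) cells."""
    if cherry_positive <= 0:
        raise ValueError("no pCherry+ cells to normalise against")
    if egfp_cherry_positive > cherry_positive:
        raise ValueError("double-positive cells cannot exceed pCherry+ cells")
    return egfp_cherry_positive / cherry_positive


def relative_end_joining(sample_efficiency: float, wt_efficiency: float) -> float:
    """Express a sample's efficiency as a percentage of the WT efficiency
    (assumed meaning of the relative % end-joining axis)."""
    if wt_efficiency <= 0:
        raise ValueError("WT efficiency must be positive")
    return 100.0 * sample_efficiency / wt_efficiency
```

Normalising to the pCherry+ population removes differences in transfection efficiency between cell lines before the lines are compared with WT.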

The statistical significance of the increased end-joining repair efficiency measured in LIG3-/-:LIG4-/- compared with LIG4-/- cells on HindIII (P = 0.0098) or I-SceI (P = 0.016) linearised plasmids was assessed by one-tailed unpaired t-tests with Welch's correction. (D) Inter-chromosomal telomere-genomic linkages for each sample were corroborated using BLAST (Altschul et al. 1990) to map precise fusion junction sequences. The numbers of fusion junctions localised to each human chromosome arm are plotted for each of the MRC5HPVE6E7 and 17p TALEN-transfected HCT116 cell line samples. (E) The numbers of all HCT116 inter-chromosomal fusion junctions per chromosome were used to generate the number of junctions per Mb of DNA for each chromosome, based on hg19 human chromosome sizes obtained from Ensembl (Cunningham et al. 2015).
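The junction density in (E) is a count divided by chromosome length in Mb. A minimal sketch is given below; the function name is hypothetical, and both the junction counts and the chromosome sizes are supplied by the caller rather than taken from this study.

```python
def junctions_per_mb(junction_counts: dict, chrom_sizes_bp: dict) -> dict:
    """Return fusion junctions per Mb of DNA for each chromosome.

    junction_counts: chromosome name -> number of junctions (caller-supplied)
    chrom_sizes_bp:  chromosome name -> length in bp (e.g. hg19 sizes)
    """
    densities = {}
    for chrom, count in junction_counts.items():
        size_bp = chrom_sizes_bp[chrom]
        if size_bp <= 0:
            raise ValueError(f"invalid size for {chrom}")
        densities[chrom] = count / (size_bp / 1_000_000)
    return densities
```

Dividing by length allows small chromosomes to be compared with large ones, which a raw junction count per chromosome would not.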

Supplementary Figure 4

Supplementary Figure 4 (A) Characteristics and sequence context of inter-chromosomal fusions between 17p and diverse genomic loci sequenced from MRC5HPVE6E7 cells undergoing telomere-driven crisis or HCT116 cells undergoing 17p TALEN-induced DNA damage and repair. Inter-chromosomal fusion junctions are located within a broad range of distances from the telomere. Distance to the telomere of the chromosome arm harbouring each inter-chromosomal fusion junction was calculated from Ensembl genomic telomere position coordinates and all events plotted as a scatter chart with a 95% CI. The mean distance to the nearest telomere of all MRC5HPVE6E7 inter-chromosomal junctions was 1.5-fold greater than for all HCT116 junctions (one-way ANOVA with Bonferroni's Multiple Comparison Test; P < 0.001). (B) Inter-chromosomal junction locations were investigated using the UCSC Genome Browser (Kent et al. 2002) to determine coincidence with repetitive DNA features defined by RepeatMasker (Smit, A.F.A., Hubley, R. and Green, P. 2013-2015) tracks (LINE, SINE, LTR, DNA, simple, low complexity, satellite, RNA) and short tandem repeats defined by Tandem Repeats Finder (TRF) (Benson 1999). No samples contained a higher proportion of fusion junctions within these repetitive DNA sequences than would be predicted by chance based on the RepeatMasker estimate of human genome repeat content of 46.7% (Chi-squared analysis). (C) Inter-chromosomal junction locations were investigated for coincidence with aphidicolin common fragile sites (aCFS) (Fungtammasan et al. 2012) that were used to generate a custom track for analysis in IGV (Robinson et al. 2011), Ensembl (Cunningham et al. 2015) and the UCSC Genome Browser. No samples contained a higher proportion of fusion junctions within these aCFS than would be predicted by chance based on the aCFS human genome coverage of 15.1% (Chi-squared analysis). (D) LIG3-/-:LIG4-/- inter-chromosomal fusion junctions are enriched for proximal non-B DNA motifs (Cooper et al. 2011) compared with WT junction-proximal sequence. 500 bp junction-proximal sequence for each inter-chromosomal fusion junction was analysed for non-B DNA motif content (A-phased, direct, inverted, mirror and short tandem repeats and G-quadruplex-forming and Z-DNA motifs) using nBMST (ABCC) (Cer et al. 2013). The fold-change over the human genome average content for all non-B DNA motifs (4.2%) was calculated with a 95% CI and samples compared by one-way ANOVA with Bonferroni's Multiple Comparison Test to reveal a statistically significant 8.7-fold higher enrichment in all motifs for the LIG3-/-:LIG4-/- sample than WT (P < 0.001). (E) Inter-chromosomal fusion junction-proximal sequences have divergent GC content independent of sample genotype. The percentage GC content of all MRC5HPVE6E7 and HCT116 500 bp junction-proximal sequences was determined using Geecee (EMBOSS) (Rice et al. 2000) and is plotted as a box and whiskers chart analysed by one-way ANOVA. (F) The distance from each validated crisis-stage MRC5HPVE6E7 (n = 70) and 17p TALEN-transfected HCT116 (n = 330) inter-chromosomal junction to the nearest polyA/T motif (defined as homopolymeric runs of A or T > 5) mapped throughout the genome was calculated using custom scripts in comparison with 8988 randomly-generated genomic locations (1012 out of the original 10000 loci were excluded due to lack of sequence information at these locations). The 2-fold MRC5HPVE6E7 and 3-fold HCT116 higher mean distances to the nearest polyA/T motif compared with random genomic loci are plotted as a logarithmic scale box and whiskers chart and were determined to be statistically significant by one-way ANOVA with Bonferroni's Multiple Comparison Test (P < 0.001). (G) The percentages of inter-chromosomal fusion junctions for these samples within 100 bp binned distances from the nearest polyA/T motif are shown. The proportions of MRC5HPVE6E7 and HCT116 junctions within 100 bp of a polyA/T motif were significantly lower than would be expected by chance based on the 8988 random locations modelled (two-way ANOVA with Bonferroni post-tests; P < 0.001).
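The custom scripts used for the polyA/T analysis in (F) and (G) are not reproduced here. The following minimal Python sketch only illustrates the idea of measuring the distance from a junction to the nearest homopolymeric A or T run within a single sequence. The function names are hypothetical, and the default minimum run length of 6 is an assumption based on a literal reading of "> 5"; it can be changed through `min_run`.

```python
import re


def polyat_runs(sequence: str, min_run: int = 6):
    """Return (start, end) half-open intervals of A-only or T-only runs
    that are at least min_run nucleotides long."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_run, min_run))
    return [(m.start(), m.end()) for m in pattern.finditer(sequence.upper())]


def distance_to_nearest_run(position: int, runs):
    """Distance in bp from a 0-based position to the closest run.

    Returns 0 when the position lies inside a run and None when the
    sequence contains no qualifying run."""
    best = None
    for start, end in runs:
        if start <= position < end:
            return 0
        if position < start:
            distance = start - position
        else:
            distance = position - (end - 1)
        if best is None or distance < best:
            best = distance
    return best
```

Applying the same function to randomly chosen positions would give a comparison distribution of the kind described for the random genomic loci.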

Supplementary Figure 5

Supplementary Figure 5 (A) Intra-chromosomal fusion junctions resolved from crisis-stage MRC5HPVE6E7 and 17p TALEN-transfected HCT116 cells are asymmetric with respect to length of contributing chromatids.

The subtelomeric distributions of all unclipped (white) and soft-clipped (black) sequencing reads mapping 17p intra-chromosomal fusions in all 17p TALEN-transfected HCT116 cell lines at t48 and t120 are shown as frequency plots with 100 bp binning. Position 0 is furthest from the start of the telomere repeats and position 3024 is the cleavage site of the 17p TALEN pair.
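The 100 bp binning behind these frequency plots can be sketched as below. The function name is hypothetical and the region length of 3200 bp is an assumption chosen so that the TALEN cleavage site at position 3024 falls inside the last bins; read positions are taken to be 0-based coordinates along the 17p subtelomere.

```python
def percent_reads_per_bin(positions, bin_size: int = 100, region_length: int = 3200):
    """Return the percentage of reads falling in each bin along the region.

    Positions outside [0, region_length) are ignored. An empty input
    yields a list of zeros rather than a division by zero."""
    n_bins = -(-region_length // bin_size)  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        if 0 <= pos < region_length:
            counts[pos // bin_size] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * count / total for count in counts]
```

Running the function separately on unclipped and soft-clipped read positions would give the two series drawn for each sample and time point.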

Supplementary Figure 6

Supplementary Figure 6 Differential processing of inter- and intra-chromosomal fusion junctions in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cell lines. The percentages of inter-chromosomal (A) and intra-chromosomal (B) fusion junctions for each sample displaying specific nucleotides of microhomology usage are plotted as individual bar charts. (C) The overall proportions of all inter-chromosomal (blue) and intra-chromosomal (red) fusions for each sample displaying microhomology usage (> 1 bp sequence overlap) at the junctions are shown with a 95% CI. (D) The statistically-significant 1.36-fold increase in proportions of junctions demonstrating microhomology usage at intra-chromosomal compared with inter-chromosomal fusion junctions for all samples is shown as a box and whiskers plot analysed by one-tailed paired t-test.
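The legend does not describe how microhomology was scored at each junction. One common approach, shown here purely as an assumed illustration with hypothetical function names, counts the nucleotides around the breakpoint that are identical in both partner reference sequences, so that the junction could be placed at any of those positions.

```python
def microhomology_length(left_ref: str, left_break: int,
                         right_ref: str, right_break: int) -> int:
    """Count bases shared by both partners at a fusion junction.

    The fusion product is modelled as left_ref[:left_break] followed by
    right_ref[right_break:]. Matching bases are counted forwards from the
    breakpoint and backwards from it, and the two counts are summed."""
    forward = 0
    while (left_break + forward < len(left_ref)
           and right_break + forward < len(right_ref)
           and left_ref[left_break + forward] == right_ref[right_break + forward]):
        forward += 1
    backward = 0
    while (left_break - 1 - backward >= 0
           and right_break - 1 - backward >= 0
           and left_ref[left_break - 1 - backward] == right_ref[right_break - 1 - backward]):
        backward += 1
    return forward + backward


def percent_with_microhomology(lengths, min_length: int = 1) -> float:
    """Percentage of junctions with at least min_length nt of microhomology."""
    if not lengths:
        raise ValueError("no junctions supplied")
    return 100.0 * sum(1 for n in lengths if n >= min_length) / len(lengths)
```

In the example `microhomology_length("AAACCGTTT", 5, "GGGCCGAAA", 5)` the shared trinucleotide CCG spans the breakpoint, giving 3 nt of microhomology.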

Supplementary Figure 7

Supplementary Figure 7 Incidence of insertions and mutations within HCT116 intra-chromosomal telomere fusions. (A) Examples of templated (> 2 nt sequence similarity) and untemplated (no BLAST nucleotide alignment) insertions (< 50 bp) at 17p intra-chromosomal fusion junctions are presented. The proximity of fusion junctions to the TALEN cleavage site in each chromatid is marked (Δ, deletion from this position), as well as the sequences and locations of insertions and microhomology. (B) Model exploring the generation of asymmetrical intra-chromosomal fusions by compound ligation of leading and lagging strand chromatids replicated by distinct DNA polymerases during DNA replication. Schematic of a DNA replication fork with high-fidelity leading strand DNA synthesis (red) mediated by DNA polymerases delta (δ) and epsilon (ε) and lagging strand (blue) replication by Okazaki fragment ligation of fragments synthesised by DNA polymerase alpha (α) and delta (δ), resulting in increased error incorporation (Nick McElhinny et al. 2008). Where the lagging strand is incompletely replicated or disproportionately resected, ligation of leading and lagging strands would result in an asymmetrical intra-chromosomal fusion product with respect to length of contributing chromatids. This hypothesised differential-strand usage would manifest as increased sequence errors (boxes inserted into blue strand) within the abridged chromatid revealed by comparison of the centromeric (blue) and telomeric (red) sequence components of each read pair mapping intra-chromosomal fusion events (illustrated in the panel below). (C) Experimental enumeration of de novo nucleotide changes within all centromeric (blue) and telomeric (red) sequence read pairs for each 17p TALEN-transfected HCT116 sample expressed as errors/kb DNA sequenced (mean and SEM for two experimental time points). For all but the LIG3-/- samples, the shorter centromeric chromatids contain higher rates of error incorporation than the longer chromatids in each asymmetric fusion (one-tailed unpaired t-test with Welch's correction; P = 0.0524). (D) Statistical significance of the differences in all samples excluding LIG3-/- centromeric and telomeric paired chromatids was assessed by one-tailed unpaired t-test with Welch's correction (P = 0.0126).
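The errors/kb values in (C), with their mean and SEM over two time points, can be illustrated with the sketch below. The function names are hypothetical and the sketch assumes that the error count and the number of bases sequenced are already known for each sample and time point.

```python
import math
import statistics


def errors_per_kb(n_errors: int, bases_sequenced: int) -> float:
    """De novo nucleotide changes per kb of DNA sequenced."""
    if bases_sequenced <= 0:
        raise ValueError("bases_sequenced must be positive")
    return 1000.0 * n_errors / bases_sequenced


def mean_and_sem(values):
    """Mean and standard error of the mean for two or more values."""
    if len(values) < 2:
        raise ValueError("at least two values are needed for an SEM")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem
```

Computing the rate separately for the centromeric and telomeric components of each read pair gives the two values compared for each sample.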

Liddiard et al. Supplementary Figure 1

Panel A (bar chart). Y axis: number of filtered read pairs. X axis: 17p, 21q and XpYp linkages.

Panel B (XpYp subtelomere, positions 1081 to 1241).

Haplotype A: GGAACTGGGTTAGAGACCAGGGGCTGATGTAACGGGCTGTCCCTGGTCCCCTAAATCCCCACAGGGGAACTGGGTTAGAGATGAGGAGCTCATTTTCCGGGCTGTCCAGGTCCCCTAAATCCCAGATGGGAACTGGGTTATCGACCAGGTGCTCCTCTAGG

Haplotype B: GGAACTGGGTTAGAGTCCAGGGGCTCATGCAACGGGCTGTCCCTGGTCCCCTAAATCCCCACAGGGGAACTGGGTTAGAGATGAGGAGCTCATTTTCCGGGCTGTCCAGGTCCCCTAAATCCCAGATGGGAACTGGGTTATCAACCAGGTGCTCTTCTAGG

Panel C (bar chart). Y axis: number of filtered read pairs. X axis: genotype (WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-) at time points t48hr and t120hr. Series: 17p, 21q and XpYp linkages.

Panel D (STELA gel). Lanes: WT, PARP1-/-, LIG3-/- and LIG4-/- at time points t0hr, t48hr and t120hr, with (+) or without (-) 17p TALEN. Size markers: 20 kb, 10 kb, 5 kb, 2 kb.

Panel E (Sanger-sequenced fusion junctions; annotations as labelled in the original figure).

HCT116 WT inter-chromosomal fusion (17p TALEN footprint; TALEN cut site; Chr17q23.2):
TAGCATATACATATAAAATGGAATGTTTGGCTAGGCACCCGGGCTCATACCTAAGTGGGTAAATCCTGGGTTTGTTGTCTCTGTTCTCCCCACACCTGCCCCTACGCAATCTCACTGCTCCAATGGCAGC

HCT116 TP53-/- intra-chromosomal fusion (insertion; 17p Δ29bp from TALEN cut site; 17p Δ2040bp from TALEN cut site):
GTTTAGAGATTATTCATTCTTAGCATATCCATATAAAATGGAACGATCAATCATTTCATTAGCGTACGAAAGACACATTACTTCGTA

HCT116 PARP1-/- intra-chromosomal fusion (insertion; 17p Δ3bp from TALEN cut site; 17p Δ2592bp from TALEN cut site):
TAGCATATACATATAAAATGGAATGTTTGGCTAGGCACCCGGGCTCATAAATGCTCTTTAAATGTGTCAAGGTCATAAATATGCAGGAAAGTCTGGGGA

HCT116 LIG3-/- inter-chromosomal fusion (17p Δ86bp from TALEN cut site; Chr19q13.2; microhomology):
TCCTAAGTATAATATTGCTATTGGTATTGTTTAAGGTAGAGTAAACAGACCACTCACAGAGTGGGGAAAAAATATTCACAACTATGCATCGACAAAAGACTAATATCCAGAATTTACAAAGAACTCAAA

HCT116 LIG4-/- intra-chromosomal fusion (17p Δ20bp from TALEN cut site; 17p Δ2594bp from TALEN cut site; microhomology):
TAGCATATACATATAAAATGGAATGTTTGGCTCTTTAAATGTGTCAAGGTCATAAATATGCAGGAAAGTCTGGGGAACAGTTCCAGG

Liddiard et al. Supplementary Figure 2

Panel A (bar chart). Y axis: % fused chromatids containing 17p TALEN footprint. X axis: inter-chromosomal and intra-chromosomal fusions. Series: WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel B. LIG4 domain diagram: DNA Binding, Adenylation, OB-fold, BRCT, XIR, BRCT.

CRISPR guide sequence and PAM sequence (MwoI site marked in the original figure): GTTCAGCACTTGAGCAAAAGTGGc

Parental allele: CAActtATAactCAGagtTCAgcaCTTgagCAAaagTGGcttATAcggATGatc

Allele 1 (1 bp deletion): CAActtATAactCAGagtTCAgcaCTTgagCAAa-gtGGCttaTACggaTGA

Allele 2 (2 bp insertion): CAActtATAactCAGagtTCAgcaCTTgagCAAaaaAGTggcTTAtacGGAtga

Western blots. Lanes: WT, LIG3-/-, LIG4-/-, LIG3-/-:LIG4-/- #3-4, LIG3-/-:LIG4-/- #3-16. Bands: LIG3, 113 kDa (mouse anti-human LIG3); LIG4, 103 kDa (rabbit anti-human LIG4).

Panel C. Nuclear DNA Ligase 3 domain diagram: MLS, ZnF, DNA Binding, Adenylation, OB-fold. Targeting vector labels: Puro, 2, 3, ATC**.

Desired sequence: GAGATCGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 3: GAGATGGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 5: GAGATGGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 18: GAGATGGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATGAAA

Clone 35: GAGATCGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 43: GAGATCGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 81: GAGATGGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 82: GAGATCGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Clone 94: GAGATCGCTGAGCAACGGTTCTGTGTGGACTATGCCAAGCGTGGCACAGCTGGCTGCAAAAAATGCAAGGAAAAGATTGTGAAGGGCGTATGCCGAATTGGCAAAGTGGTGCCCAATCCCTTCTCAGAGTCTGGGGGTGATATCAAA

Western blots. Lanes: WT, TP53-/-, LIG3-/-, LIG3+/-:TP53-/-, LIG3-/-:TP53-/- #35-1, LIG3-/-:TP53-/- #43-1, LIG3-/-:TP53-/- #82-9. Bands: LIG3, 113 kDa (mouse anti-human LIG3); TP53, 55 kDa, and truncated TP53 (rabbit anti-human TP53).

Panel D (bar chart). Y axis: % recovery with 17p TALEN compared with control. X axis: WT, LIG3-/-, LIG4-/-.

Panel E (line chart). Y axis: cumulative counts of 17p TALEN-treated cells. X axis: time in hr post-nucleofection with 17p TALENs (0, 48, 120). Series: WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel F (bar chart). Y axis: frequency of reads mapping 17p telomere fusions per diploid genome. Series: WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel G (bar chart). Y axis: number of normalised 17p-linked sequence reads. X axis: MRC5HPVE6E7 and HCT116 genotypes (WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-) at time points t48hr and t120hr. Series: inter-chromosomal linkages, intra-chromosomal linkages.

Liddiard et al. Supplementary Figure 3

Panel A (reporter plasmid diagram). Labels: pCMV, HindIII/I-SceI, I-SceI/HindIII, SV40 PolyA, EG, Ad2, FP, SD, SA. HindIII compatible ends and I-SceI incompatible ends (end sequences as extracted: A AGCTT TAGGGATAA CCCTA TTCGA A ATCCC AATAGGGAT). Steps shown: splicing (EG Ad2 FP); repair and splicing (EG FP), giving green cells.

Panel B (flow cytometry plots). Axes: EGFP and pCherry. Columns: WT, LIG3-/-, LIG4-/-, LIG3-/-:LIG4-/-. Percentages shown for HindIII: 45%, 56%, 4%, 6%. Percentages shown for I-SceI: 53%, 57%, 2%, 7%.

Panel C (bar chart). Y axis: relative % end-joining. X axis: WT, LIG3-/-, LIG4-/-, LIG3-/-:LIG4-/-. Series: HindIII, I-SceI. P = 0.0098; P = 0.016.

Panel D (bar charts, one per sample). Y axis: numbers of sequence-verified inter-chromosomal fusion junctions. X axis: chromosome arm containing telomere-genomic fusion junction (1p to Xq). Samples: MRC5HPVE6E7, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel E (bar chart, all HCT116). Y axis: number of junctions/Mb DNA. X axis: chromosome containing telomere-genomic fusion junction.

Liddiard et al. Supplementary Figure 4

Panel A (scatter chart). Y axis: distance from fusion junction to the nearest telomere in Mb. X axis: MRC5HPVE6E7, All HCT116, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-. P<0.001.

Panel B (bar chart). Y axis: % inter-chromosomal fusion junctions within repetitive DNA. Series: Genome, MRC5HPVE6E7, All HCT116, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel C (bar chart). Y axis: % inter-chromosomal fusion junctions within fragile sites. Series: Genome, MRC5HPVE6E7, All HCT116, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel D (bar chart). Y axis: mean fold-change in non-B DNA motifs/kb over genome average. Series: MRC5HPVE6E7, All HCT116, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-. P<0.001.

Panel E (box and whiskers chart). Y axis: % GC content of 0.5 kb junction-proximal sequence. X axis: MRC5HPVE6E7, All HCT116, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-. P=0.0477; P=0.0445.

Panel F (box and whiskers chart, log10 scale). Y axis: distance from fusion junction to nearest (A/T)5 motif in bp. X axis: Genome, MRC5HPVE6E7, All HCT116. P<0.001.

Panel G (bar chart). Y axis: % sample junctions within distance bin. X axis: distance from fusion junction to nearest (A/T)5 motif in kb (100 bp bins, 0 to 2.8 kb). P<0.001.

Liddiard et al. Supplementary Figure 5

Frequency plots, one per sample and time point (t48 and t120) for WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/- and LIG3-/-:TP53-/-. Y axis: % reads within defined 100 bp distance bin (0 to 50). X axis: distance along the 17p subtelomere in bp x10^2 (bins 1 to 32). Series: soft-clipped reads, unclipped reads.

Liddiard et al. Supplementary Figure 6

Panel A, inter-chromosomal fusion junctions (bar charts, one per sample). Y axis: % fusion junctions with defined microhomology. X axis: number of nt of microhomology at junction. Samples: MRC5HPVE6E7, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel B, intra-chromosomal fusion junctions (bar charts, one per sample). Y axis: % fusion junctions with defined microhomology. X axis: number of nt of microhomology at junction. Samples: MRC5HPVE6E7, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-.

Panel C (bar chart). Y axis: % all sample fusion junctions with microhomology. X axis: MRC5HPVE6E7, WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-. Series: inter-chromosomal linkages, intra-chromosomal linkages.

Panel D (box and whiskers plot). Y axis: % fusion junctions with microhomology. X axis: inter-chromosomal, intra-chromosomal. P<0.0001.

Liddiard et al. Supplementary Figure 7

A

Templated Insertion; 13bp duplication; Microhomology; 17p ∆2900bp TALEN cut site; 17p ∆18bp TALEN cut site
CAACTTCCAGTAGTAGTACAAAGTACAAGTACAAAGTACAACTAGCCAAACATTCCATTTTATATGT

Templated Insertion; 5bp duplication; 17p ∆2993bp TALEN cut site; 17p ∆68bp TALEN cut site
GTAGTAGTACAAAGTACAACTTGTTTCTAAACCAAACCAAATTATGAAAATTCTACCTTAA

Untemplated Insertion; 17p ∆1294bp TALEN cut site; 17p ∆136bp TALEN cut site
GGAGATTGGCAGCAGACAACAGTCAATTGTCTGAAACGACAAGACTTAGATGAGGGAAA

Untemplated Insertion; 17p ∆2773bp TALEN cut site; 17p ∆25bp TALEN cut site
CCAGGTAACTACCACATAACCCCTAGATTTAACATTCCATTTTATATGTATATGCTAG

[B: schematic of a DNA replication fork (leading strand synthesis, high fidelity Polδ/Polε; lagging strand synthesis, Okazaki fragments, Polα/δ) and of an asymmetric fusion molecule with high and low error incorporation, with the distribution of sequence read pairs along the 17p subtelomere (centromere to telomere).]

[C, D: Number of errors per kb sequenced DNA for the centromeric and telomeric read of each pair, shown for combined samples (centromeric versus telomeric; P=0.0126) and for each sample (WT, TP53-/-, PARP1-/-, LIG3-/-, LIG4-/-, LIG3-/-:NC3, LIG3-/-:LIG4-/-, LIG3-/-:TP53-/-).]

Supplementary Table legends

Supplementary Table 1

Key features of all 70 inter-chromosomal fusion junctions sequenced from crisis-stage MRC5HPVE6E7 cells and all 330 inter-chromosomal fusion junctions sequenced from 17p TALEN-transfected HCT116 cell lines with differentially-compromised DNA repair functions. The telomere involved, as well as the hg19 human genomic coordinate and chromosome position of each telomere-genome fusion junction, are listed. Coincidence with Ensembl (Cunningham et al. 2015) or HGNC (HUGO Gene Nomenclature Committee) curated genes, DNA repeat features (RepeatMasker (Smit, A. F. A., Hubley, R. and Green, P. 2013-2015) and TRF (Benson 1999)) and aphidicolin common fragile sites (aCFS) (Fungtammasan et al. 2012) is documented in separate columns. The additional notes column provides supporting information outside of the stringent analysis criteria used.

Supplementary Table 2A

Key details pertaining to the Illumina paired-end sequencing methodology applied to all fusion amplicons from crisis-stage MRC5HPVE6E7 cells and HCT116 cell lines harvested at 48 and 120 hr post-nucleofection with 17p TALEN pairs. Sequence platform and library preparations are listed in columns 3 and 4, and the length of each R1 and R2 read that comprises a single read pair is listed in column 5. Total sample read yields are indicated in column 6, the mean fragment insert size between the R1 and R2 reads in column 7, and reads mapping to the human hg19 reference with appended subtelomeric sequences in column 8.

Supplementary Table 2B

Telomere fusion amplicons generated from crisis-stage MRC5HPVE6E7 cells and HCT116 cell lines harvested at 48 and 120 hr post-nucleofection with 17p TALEN pairs were subjected to Illumina HiSeq 2000 paired-end sequencing. Total reads for each sample are listed and were used to normalise all 17p inter- and intra-chromosomal fusion read counts. Sequence reads were mapped first to the 17p telomere (column 3) and then to the human hg19 reference sequence with refinement of subtelomeric sequences (column 6) (Stong et al. 2014). Intra-chromosomal 17p-17p telomere fusion reads (columns 4 and 5) were defined as discordant read pairs mapping to 17p subtelomeric sequence in opposite orientations. Inter-chromosomal telomere-genomic fusions were defined as discordant pairs that straddled the genomic reference and 17p subtelomeric reference sequence (columns 7 and 8). The ratio of inter-chromosomal:intra-chromosomal 17p fusion frequency was calculated from the normalised reads mapping these events in each sample.
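The normalisation and ratio described for Supplementary Table 2B can be sketched as follows. The legend does not state the scaling constant; the tabulated values are consistent with scaling filtered fusion read counts to 50 million total reads prior to mapping, so `SCALE` below is an inference from the table rather than a documented parameter, and the function names are hypothetical.

```python
# Sketch of the read-count normalisation implied by Supplementary Table 2B.
# ASSUMPTION: counts are scaled to 50 million total reads (prior to mapping).
# This constant is inferred from the tabulated values, not stated in the legend.
SCALE = 50_000_000


def normalise(fusion_reads: int, total_reads: int, scale: int = SCALE) -> float:
    """Scale a filtered fusion read count to a common sequencing depth."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return fusion_reads * scale / total_reads


def inter_intra_ratio(inter_reads: int, intra_reads: int, total_reads: int) -> float:
    """Ratio of normalised inter-chromosomal to intra-chromosomal fusion reads."""
    return normalise(inter_reads, total_reads) / normalise(intra_reads, total_reads)


# Crisis-stage MRC5HPVE6E7 row of Supplementary Table 2B.
total = 56_383_986   # total reads prior to mapping
intra = 602          # filtered reads mapping 17p-17p fusions
inter = 5_603        # filtered reads mapping 17p-genomic fusions

print(round(normalise(intra, total), 2))                  # 533.84
print(round(normalise(inter, total), 2))                  # 4968.61
print(round(inter_intra_ratio(inter, intra, total), 2))   # 9.31
```

Because both counts are divided by the same total, the ratio does not depend on the scaling constant; only the normalised counts do.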

Supplementary Table 3A

Identities and locations of all genes disrupted by telomere inter-chromosomal fusion events in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cell lines. The hg19 human genomic coordinates of all junctions contained within each gene identified are listed for each sample, as well as the HGNC (HUGO Gene Nomenclature Committee) or Ensembl (Cunningham et al. 2015) gene names, Ensembl identifiers and chromosome locations.

Supplementary Table 3B

Gene ontologies (GO) enriched in lists of genes disrupted by inter-chromosomal telomere fusions in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 samples, mapped using DAVID Bioinformatics Resources 6.7 (NIAID/NIH; https://david.ncifcrf.gov/) (Huang et al. 2007a; Huang et al. 2007b). Separate enrichment searches were performed for (i) all 176 genes identified, (ii) 23 MRC5HPVE6E7 genes, (iii) 22 Supra-A-NHEJ (LIG4-/- and LIG3-/-:NC3) genes, and (iv) 41 Supra-C-NHEJ (PARP1-/- and LIG3-/-) genes. The number and proportions of genes within each DAVID GO term are listed, accompanied by their associated statistical significance (P-value) and Benjamini-Hochberg multiple testing corrected values.
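For readers unfamiliar with the Benjamini-Hochberg correction reported alongside the P-values, a minimal generic sketch is given below. It illustrates the standard step-up adjustment only; it is not DAVID's code, the function name is hypothetical, and the example P-values are made up for illustration rather than taken from Supplementary Table 3B.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative (made-up) P-values for four GO terms.
example = [0.01, 0.04, 0.03, 0.005]
print([round(q, 4) for q in benjamini_hochberg(example)])
```

Each adjusted value is the smallest false discovery rate at which the corresponding term would still be called significant.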

Supplementary Table 3C

Enrichment of DNA binding motifs defined by the Molecular Signatures Database (MSigDB v. 5.0) (Subramanian et al. 2005) within the list of all 176 genes disrupted by inter-chromosomal telomere fusions in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cells, identified using Gene Set Enrichment Analysis (GSEA v.2.2.0; Broad Institute) (Subramanian et al. 2005). The overlaps between this experimental list of 176 genes and the genes within each MSigDB gene set are listed in column 4, and the statistical significance of association (P-value) and false discovery rate (q-value) in columns 5 and 6, respectively.

Supplementary Table 4

17p subtelomeric coordinates and sequences of all intra-chromosomal fusion junctions mapped using soft-clipped Illumina HiSeq 2000 paired-end sequence reads from crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cells. Subtelomeric positions are numbered from the 3' terminus of the 17p6 fusion primer towards the telomere; the 17p TALEN cleavage site is at position 3024 and 17p telomere variant repeats begin at position 3038. Junction coordinates detail the precise fusion point of the two 17p chromatids (chromatid 1:chromatid 2), with a designation of 'telomere' assigned to all locations within pure telomere repeats. Minimal identifying junction sequence is also provided, with chromatid 1 and any inserted sequence in upright font and chromatid 2 in italics.
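The coordinate convention above can be illustrated with a small helper that places a 17p subtelomeric position relative to the two landmarks given in the legend. Only the positions 3024 and 3038 come from the legend; the function name, the region labels and the returned fields are hypothetical and chosen for illustration.

```python
# Landmarks stated in the Supplementary Table 4 legend (positions are numbered
# from the 3' terminus of the 17p6 fusion primer towards the telomere).
TALEN_CUT_SITE = 3024
VARIANT_REPEAT_START = 3038


def describe_position(position: int) -> dict:
    """Locate a 17p subtelomeric position relative to the legend's landmarks.

    The region labels are illustrative assumptions, not the authors' terms.
    """
    if position < 1:
        raise ValueError("positions are numbered from 1")
    if position < TALEN_CUT_SITE:
        region = "subtelomere, centromeric of TALEN cut site"
    elif position < VARIANT_REPEAT_START:
        region = "TALEN cut site up to start of variant repeats"
    else:
        region = "telomere variant repeats or beyond"
    return {
        "position": position,
        "region": region,
        # Distance in bp on the centromeric side of the cut site (0 otherwise).
        "bp_centromeric_of_cut_site": max(TALEN_CUT_SITE - position, 0),
    }


print(describe_position(124))
print(describe_position(3030))
print(describe_position(3038))
```

Under this convention a junction at position 124 lies 2900 bp on the centromeric side of the TALEN cleavage site.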

Supplementary Table 1:

Features of 70 inter-chromosomal fusion junctions sequenced from crisis-stage MRC5HPVE6E7 and 330 inter-chromosomal fusion junctions sequenced from 17p TALEN-treated DNA repair-deficient HCT116 cell lines at t48 and t120 post-transfection

Columns: Telomere fused | Genomic coordinate of junction | HGNC symbol or Ensembl gene annotation | Intron or exon | Coincident DNA repeat feature | Chromosome location | Coincident aCFS fragile site | Additional notes

Sample: Crisis-stage MRC5HPVE6E7
| 17p | chr1:15978720 | DDI2 | INTRON | SINE | 1p36.21 | | |
| 17p | chr1:201037553 | CACNA1S | INTRON | SINE | 1q32.1 | | |
| 17p | chr11:63431645 | ATL3 | INTRON | | 11q13.1 | | |
| 17p | chr13:44803342 | SMIM2-AS | INTRON | LINE | 13q14.11 | | anti-sense transcript |
| 17p | chr13:44803396 | SMIM2-AS | INTRON | LINE | 13q14.11 | | anti-sense transcript |
| 17p | chr14:102510316 | DYNC1H1 | EXON | | 14q32.31 | | |
| 17p | chr14:52478136 | NID2 | INTRON | | 14q22.1 | | |
| 17p | chr16:78394880 | WWOX | INTRON | SINE | 16q23.1 | | |
| 17p | chr18:26917657 | CTD-2515C13.1 | INTRON | | 18q12.1 | | |
| 17p | chr2:114363749 | | | LINE | 2q13 | | GenBank X92108 mRNA exon |
| 17p | chr4:121600160 | | | LINE | 4q27 | | |
| 17p | chr4:39777825 | UBE2K | INTRON | | 4p14 | | |
| 17p | chr8:135550045 | ZFAT | INTRON | LINE | 8q24.22 | | |
| 17p | chr8:19477341 | CSGALNACT1 | INTRON | | 8p21.3 | | |
| 21q | chr1:15978977 | DDI2 | INTRON | SINE | 1p36.21 | | |
| 21q | chr1:188075973 | | | | 1q31.1 | FRA1K | |
| 21q | chr11:58010338 | EIF4A2P3 | EXON | | 11q12.1 | | |
| 21q | chr12:104853154 | CHST11 | INTRON | | 12q23.3 | | |
| 21q | chr13:105060062 | | | SINE | 13q33.2 | | GenBank JD472513 mRNA exon |
| 21q | chr13:66068304 | | | | 13q21.32 | | |
| 21q | chr14:102509957 | DYNC1H1 | INTRON | | 14q32.31 | | Exon of one Ensembl splice form |
| 21q | chr14:105465985 | C14orf79 | INTRON | | 14q32.33 | | |
| 21q | chr14:105465922 | C14orf79 | INTRON | | 14q32.33 | | |
| 21q | chr14:52478618 | NID2 | INTRON | SINE | 14q22.1 | | |
| 21q | chr17:51404868 | | | SINE | 17q22 | | |
| 21q | chr18:26915207 | CTD-2515C13.1 | INTRON | | 18q12.1 | | |
| 21q | chr19:32553337 | | | DNA | 19q13.11 | | |
| 21q | chr2:114363891 | | | LTR | 2q13 | | |
| 21q | chr20:62918877 | PCMTD2 | INTRON | Simple | 20q13.33 | | GenBank BC032332 mRNA |
| 21q | chr21:36085143 | CLIC6 | INTRON | | 21q22.12 | | |
| 21q | chr4:19147762 | | | | 4p15.31 | FRA4D | |
| 21q | chr4:39777663 | UBE2K | INTRON | | 4p14 | | |
| 21q | chr4:62150579 | LPHN3 | INTRON | Low complexity | 4q13.1 | | |
| 21q | chr6:102229938 | GRIK2 | INTRON | LINE | 6q16.3 | | |
| 21q | chr6:119558701 | MAN1A1 | INTRON | | 6q22.31 | | |
| 21q | chr7:131159323 | MKLN1 | INTRON | | 7q32.3 | FRA7H | |
| 21q | chr7:66383494 | | | SINE | 7q11.21 | | GenBank JD364510 mRNA |
| 21q | chr8:135550320 | ZFAT | INTRON | LINE | 8q24.22 | | |
| 21q | chr8:60148617 | | | | 8q12.1 | | |
| 21q | chr9:23106285 | | | LINE | 9p21.3 | | |
| 21q | chrX:44244278 | | | | Xp11.3 | | |
| XpYp | chr1:155626165 | MSTO1 | INTRON | LTR | 1q22 | | GenBank JD534722, JD452430, JD367534 mRNAs |
| XpYp | chr1:188076126 | | | LINE | 1q31.1 | FRA1K | |
| XpYp | chr11:57341038 | | | SINE | 11q12.1 | | |
| XpYp | chr11:58010218 | EIF4A2P3 | EXON | | 11q12.1 | | |
| XpYp | chr12:104853460 | CHST11 | INTRON | | 12q23.3 | | |
| XpYp | chr13:44803396 | SMIM2-AS | INTRON | LINE | 13q14.11 | | anti-sense transcript |
| XpYp | chr13:44803342 | SMIM2-AS | INTRON | LINE | 13q14.11 | | anti-sense transcript |
| XpYp | chr13:83613646 | | | LTR | 13q31.1 | | |
| XpYp | chr13:83613576 | | | LTR | 13q31.1 | | |
| XpYp | chr14:20679594 | | | SINE | 14q11.2 | | GenBank JD452621 mRNA |
| XpYp | chr14:20679509 | | | SINE | 14q11.2 | | |
| XpYp | chr14:52478136 | NID2 | INTRON | | 14q22.1 | | |
| XpYp | chr17:51405080 | | | Simple | 17q22 | | |
| XpYp | chr18:26917657 | CTD-2515C13.1 | INTRON | | 18q12.1 | | |
| XpYp | chr2:114363757 | | | LINE | 2q13 | | GenBank X92108 mRNA exon |
| XpYp | chr21:17754221 | LINC00478 and MIR99AHG | INTRONS | | 21q21.1 | | lincRNA and miRNA |
| XpYp | chr4:137864375 | RP11-138I17.1 | INTRON | LINE | 4q28.3 | | |
| XpYp | chr4:39777825 | UBE2K | INTRON | | 4p14 | | |
| XpYp | chr4:62150846 | LPHN3 | INTRON | | 4q13.1 | | |
| XpYp | chr4:64876248 | | | | 4q13.1 | | |
| XpYp | chr4:64876342 | | | | 4q13.1 | | |
| XpYp | chr6:102230230 | GRIK2 | INTRON | LINE | 6q16.3 | | |
| XpYp | chr6:119558559 | MAN1A1 | INTRON | | 6q22.31 | | |
| XpYp | chr7:131159016 | MKLN1 | INTRON | LINE | 7q32.3 | FRA7H | |
| XpYp | chr8:135550045 | ZFAT | INTRON | LINE | 8q24.22 | | |
| XpYp | chr8:19477341 | CSGALNACT1 | INTRON | | 8p21.3 | | |
| XpYp | chr8:60148778 | | | DNA | 8q12.1 | | |
| XpYp | chr9:18081169 | | | | 9p22.2 | | |
| XpYp | chrX:44244559 | | | LTR | Xp11.3 | | |

Sample: t48 HCT116 WT
| 17p | chr11:26457598 | ANO3 | INTRON | LINE | 11p14.2 | FRA11D | |
| 17p | chr4:181932570 | | | LINE | 4q34.3 | | |

Sample: t48 HCT116 TP53-/-
| 17p | chr1:205423723 | LEMD1 | INTRON | | 1q32.1 | | |
| 17p | chr1:205423834 | LEMD1 | INTRON | | 1q32.1 | | |
| 17p | chr19:3423228 | NFIC | INTRON | | 19p13.3 | | |
| 17p | chr19:20007283 | | | LTR and Microsatellites | 19p12 | | Genbank JD380087 mRNA |
| 17p | chr1:248063013 | | | LTR | 1q44 | FRA1I | |
| 17p | chr1:31324196 | | | | 1p35.2 | | |
| 17p | chr1:31324319 | | | SINE | 1p35.2 | | |
| 17p | chr12:121984196 | KDM2B | INTRON | Simple Tandem | 12q24.31 | FRA12E | |
| 17p | chr12:121984105 | KDM2B | INTRON | | 12q24.31 | FRA12E | |
| 17p | chr19:50945470 | MYBPC2 | EXON | Microsatellites | 19q13.33 | | |
| 17p | chr12:2748188 | CACNA1C | INTRON | LINE | 12p13.33 | | |
| 17p | chr13:113266239 | | | LINE | 13q34 | | |
| 17p | chr19:247341 | | | Satellite | 19p13.3 | | |
| 17p | chr13:113266356 | | | LINE | 13q34 | | |
| 17p | chr13:84305490 | | | | 13q31.1 | | |
| 17p | chr13:84305557 | | | | 13q31.1 | | |
| 17p | chr14:65156322 | | | | 14q23.3 | FRA14B | |
| 17p | chr8:119762719 | | | | 8q24.12 | FRA8C | Genbank AY423624 mRNA |
| 17p | chr14:65212590 | PLEKHG3 | INTRON | DNA | 14q23.3 | FRA14B | |
| 17p | chr14:65212728 | PLEKHG3 | INTRON | | 14q23.3 | FRA14B | |
| 17p | chr17:29226407 | TEFM | EXON | | 17q11.2 | | |
| 17p | chr17:29226530 | TEFM | EXON | | 17q11.2 | | |
| 17p | chr13:84305556 | | | | 13q31.1 | | |
| 17p | chr2:160097324 | WDSUB1 | INTRON | | 2q24.2 | | |
| 17p | chr2:160097076 | WDSUB1 | INTRON | LINE | 2q24.2 | | |
| 17p | chr2:162573855 | SLC4A10 | INTRON | LINE | 2q24.2 | | |
| 17p | chr2:162574561 | SLC4A10 | INTRON | LINE | 2q24.2 | | |
| 17p | chr20:2790107 | | | SINE | 20p13 | | Genbank JD531480 mRNA |
| 17p | chr20:2790396 | | | LTR | 20p13 | | Genbank JD531480 mRNA |
| 17p | chr3:113455015 | NAA50 | INTRON | | 13q13.2 | | |
| 17p | chr8:38088087 | DDH2 | INTRON | SINE | 8p11.23 | | |
| 17p | chr3:146494387 | | | Microsatellites | 3q24 | | |
| 17p | chr5:166423353 | | | SINE and Simple Tandem | 5q34 | | |
| 17p | chr5:166423422 | | | DNA | 5q34 | | |
| 17p | chr3:53045108 | SFMBT1 | INTRON | | 3p21.1 | | |
| 17p | chr3:53090675 | RP11-894J14.5 | INTRON | | 3p21.1 | | NMD transcript |
| 17p | chr5:59048661 | PDE4D | INTRON | | 5q12.1 | | |
| 17p | chr6:67846850 | | | Simple and Simple Tandem | 6q12 | | |
| 17p | chr6:49502383 | | | LINE | 6p12.3 | | |
| 17p | chr7:9985814 | | | | 7p21.3 | | |
| 17p | chr8:119762735 | | | LTR | 8q24.12 | FRA8C | Genbank AY423624 mRNA |
| 17p | chr9:127644716 | GOLGA1 | INTRON | | 9q33.3 | | |
| 17p | chr9:127644881 | GOLGA1 | INTRON | | 9q33.3 | | |
| 17p | chr16:28,878,650 | SH2B1 | INTRON | | 16p11.2 | | |
| 21q | chr4:94537771 | GRID2 | INTRON | Low complexity | 4q22.2 | FRA4F | |
| 21q | chr4:94538210 | GRID2 | INTRON | | 4q22.2 | FRA4F | |
| 21q | chr2:162575323 | SLC4A10 | INTRON | LINE | 2q24.2 | | |

Sample: t48 HCT116 PARP1-/-
| 17p | chr1:90245629 | RP11-302M6.4 | INTRON | SINE | 1p22.2 | FRA1D | NMD transcript |
| 17p | chr1:90245755 | RP11-302M6.4 | INTRON | SINE | 1p22.2 | FRA1D | NMD transcript |
| 17p | chr1:92605603 | BTBD8 | INTRON | SINE | 1p22.1 | FRA1D | |
| 17p | chr11:32803469 | NUP98 | INTRON | | 11p15.4 | FRA11E | |
| 17p | chr11:65138671 | RP11-86708.5 | INTRON | | 11q13.1 | | Proximal to POLA2 at chr11:65073060 |
| 17p | chr13:54694030 | LINC00458 | INTRON | | 13q14.3 | | lincRNA |
| 17p | chr17:32650031 | | | LTR | 17q12 | | |
| 17p | chr20:61382288 | NTSR1 | INTRON | SINE | 20q13.33 | | |
| 21q | chr1:236108968 | | | LINE | 1q42.3 | | |
| 17p | chr1:236108948 | | | SINE | 1q42.3 | | |
| 17p | chr3:45125764 | CDCP1 | EXON | | 3p21.31 | | |
| 17p | chr3:45126088 | CDCP1 | EXON | LTR | 3p21.31 | | |
| 17p | chr9:136219546 | SURF1 | INTRON | | 9q34.2 | | |
| 17p | chr9:136219837 | SURF1 | INTRON | | 9q34.2 | | |
| 17p | chr5:158729149 | | | LTR | 5q33.3 | | |
| 17p | chr5:158729202 | | | LTR | 5q33.3 | | |
| 17p | chr2:121916029 | | | LINE | 2q14.2 | | |

Sample: t48 HCT116 LIG3-/-
| 17p | chr10:98612890 | LCOR | INTRON | SINE | 10q24.1 | | |
| 17p | chr11:1083184 | MUC2 | EXON | | 11p15.5 | | |
| 17p | chr10:98612754 | LCOR | INTRON | | 10q24.1 | | |
| 17p | chr11:75176251 | GDPD5 | INTRON | | 11q13.4 | | |
| 17p | chr2:70587493 | | | | 2p13.3 | | |
| 17p | chr5:31122528 | RP11-152KL.2 | INTRON | | 5p13.3 | | anti-sense transcript |
| 17p | chr17:72939724 | OTOP3 | EXON | | 17q25.1 | | |
| 17p | chr1:196147521 | | | LTR | 1q31.3 | FRA1K | |
| 17p | chr6:32102727 | | | | 6p21.32 | | Genbank JD364472 mRNA |
| 21q | chr15:73141660 | | | LTR | 15q24.1 | | |
| 21q | chr1:116435020 | | | | 1p13.1 | | |

Sample: t48 HCT116 LIG4-/-
| 17p | chr12:111322873 | CCDC63 | INTRON | | 12q24.11 | FRA12E | |
| 17p | chr12:111324228 | CCDC63 | INTRON | | 12q24.11 | FRA12E | |
| 17p | chr17:63908194 | CEP112 | INTRON | LTR | 17q24.1 | | |
| 17p | chr4:135898926 | | | | 4q26.3 | | |
| XpYp | chr13:35365201 | | | | 13q13.2 | FRA13A | |

Sample: t48 HCT116 LIG3-/-:NC3
| 17p | chr10:71866448 | H2AFY2 | INTRON | SINE | 10q22.1 | FRA10D | |
| 17p | chr19:46258100 | AC074212.3 | INTRON | | 19q13.32 | | |
| 17p | chr3:70598952 | | | LINE | 3p13 | | |
| 17p | chr3:70600987 | | | LTR | 3p13 | | |
| 17p | chr7:50063983 | ZFBP | INTRON | LINE | 7p12.2 | | |
| 17p | chr9:139748209 | MAMDC4 | INTRON | | 9q34.3 | | |
| 17p | chr15:85386692 | ALPK3 | INTRON | LINE | 15q25.3 | | |
| 17p | chr19:6394450 | CTB-180A7.8 | INTRON | LINE | 19p13.3 | | |
| 17p | chr19:6394512 | CTB-180A7.8 | INTRON | SINE | 19p13.3 | | |

Sample: t48 HCT116 LIG3-/-:LIG4-/-
| 17p | chr2:184146861 | | | | 2q32.1 | FRA2H | |
| 17p | chr8:18513889 | PSD3 | INTRON | | 8p22 | | |
| 17p | chr11:32543544 | | | Simple | 11p13 | FRA11E | |
| 21q | chr17:39958934 | LEPREL4 | EXON | SINE | 17q21.2 | | |
| 21q | chr15:89428980 | HAPLN3 | INTRON | SINE | 15q26.1 | | |
| 21q | chr1:184853591 | FAM129A | INTRON | SINE | 1q25.3 | | |

Sample: t48 HCT116 LIG3-/-:TP53-/-
| 17p | chr1:24291140 | SRSF10 | EXON | Simple Tandem | 1p36.11 | | |
| 17p | chr1:24293473 | SRSF10 | EXON | Simple Tandem | 1p36.11 | | |
| 17p | chr1:27271348 | NUDC | INTRON | | 1p36.11 | | |
| 17p | chr1:179363129 | AXDND1 | EXON | | 1q25.2 | | |
| 17p | chr1:179364353 | AXDND1 | INTRON | | 1q25.2 | | |
| 17p | chr2:49561344 | | | | 2p16.3 | | |
| 17p | chr2:49561629 | | | | 2p16.3 | | |
| 17p | chr4:70714934 | SULTIE1 | EXON | | 4q13.3 | | |
| 17p | chr21:22082117 | | | | 21q21.1 | | |
| 17p | chr4:70715319 | SULTIE1 | INTRON | | 4q13.3 | | |
| 17p | chr5:128011517 | SLC27A6 | INTRON | | 5q23.3 | | |
| 17p | chr5:137738851 | KDM3B | INTRON | LINE | 5q31.2 | | |
| 17p | chr5:137739212 | KDM3B | INTRON | SINE | 5q31.2 | | |
| 17p | chr5:176732265 | PRELID1 | INTRON | SINE | 5q35.3 | | |
| 17p | chr11:66781086 | SYT12 | INTRON | | 11q13.2 | | |
| 17p | chr13:73800143 | | | DNA | 13q22.1 | | |
| 17p | chr13:73801082 | | | | 13q22.1 | | |
| 17p | chr19:56480469 | NLRP8 | INTRON | | 19q13.43 | | |
| 17p | chr20:2372026 | TGM6 | INTRON | | 20p13 | | |
| 17p | chr20:2372375 | TGM6 | INTRON | LINE | 20p13 | | |
| 17p | chr20:5625692 | | | LINE | 20p12.3 | | |
| 17p | chr22:39713390 | RPL3 | EXON | | 22q13.1 | | |
| 17p | chr7:26239040 | HNPNPA2B1 | INTRON | | 7p15.2 | | |
| 17p | chr22:39713924 | RPL3 | INTRON | | 22q13.1 | | |
| 17p | chrX:19767350 | SH3KBP1 | INTRON | | Xp22.12 | | |
| 17p | chrX:32899288 | DMD | INTRON | | Xp21.1 | | |
| 17p | chrX:32899358 | DMD | INTRON | | Xp21.1 | | |
| 17p | chr5:50794766 | | | LINE | 5q11.2 | | centromere-adjacent |
| 17p | chr3:163236150 | | | SINE | 3q26.1 | | |
| 17p | chr3:163236245 | | | SINE | 3q26.1 | | |
| 17p | chr14:23496362 | PSMB5 | INTRON | SINE | 14q11.2 | | |
| 17p | chr16:55791889 | | | LINE | 16q12.2 | | |
| 17p | chr16:55793321 | | | LINE | 16q12.2 | | |
| 17p | chr10:100540984 | HPSE2 | INTRON | | 10q24.2 | | |
| 17p | chr10:100541189 | HPSE2 | INTRON | | 10q24.2 | | |
| 17p | chr9:11252041 | | | LINE | 9p23 | | |
| 17p | chr9:68499339 | | | DNA and Simple Tandem | 9q13 | | |
| 17p | chr9:66833179 | | | Satellite and Simple Tandem | 9q13 | | |
| 17p | chr15:78863110 | CHRNA5 | INTRON | | 15q25.1 | | |
| 17p | chr21:10017301 | | | | 21p11.2 | | |
| 17p | chr9:13168100 | MPDZ | INTRON | | 9p23 | | |
| 17p | chr17:46989758 | UBE2Z | INTRON | | 17q21.32 | | |
| 17p | chr1029112710 | | | | 10p12.1 | | |

Sample: t120 HCT116 WT
| 17p | chr1:233080282 | | | LTR | 1q42.2 | | |
| 17p | chr16:81707652 | CMIP | INTRON | | 16q23.3 | FRA16D | |
| 17p | chr16:81707324 | CMIP | INTRON | LTR | 16q23.3 | FRA16D | |
| 17p | chr17:33810356 | SLFN12L | INTRON | LTR | 17q12 | | |
| 17p | chr1:233080086 | | | | 1q42.2 | | |
| 17p | chr17:33810217 | SLFN12L | INTRON | LTR | 17q12 | | |
| 17p | chr18:41206437 | | | | 18q12.3 | | |
| 17p | chr19:39567651 | | | SINE | 19q13.2 | | |
| 17p | chr7:152039382 | KMT2C | INTRON | | 7q36.1 | FRA7I | |
| 21q | chr11:64428465 | NRXN2 | EXON | | 11q13.1 | | |

Sample: t120 HCT116 TP53-/-
| 17p | chr1:208008668 | C1orf132 | INTRON | | 1q32.2 | | lincRNA |
| 17p | chr1:208008949 | C1orf132 | INTRON | | 1q32.2 | | lincRNA |
| 17p | chr1:21960825 | RAP1GAP | INTRON | | 1p36.12 | | |
| 17p | chr1:169170478 | NME7 | INTRON | SINE | 1q24.2 | | |
| 17p | chr1:21960420 | RAP1GAP | INTRON | | 1p36.12 | | |
| 17p | chr1:26396842 | | | SINE | 1p36.11 | | |
| 17p | chr1:26398154 | AL391650.1 | INTRON | SINE | 1p36.11 | | miRNA |
| 17p | chr1:26398238 | | | | 1p36.11 | | |
| 17p | chr1:26679384 | AIM1L | INTRON | | 1p36.11 | | |
| 17p | chr11:122763040 | C11orf63 | INTRON | SINE | 11q24.1 | | |
| 17p | chr11:122766057 | C11orf63 | INTRON | | 11q24.1 | | |
| 17p | chr11:43440447 | TTC17 | INTRON | | 11p12 | | |
| 17p | chr11:31371068 | DCDC1 | INTRON | | 11p13 | FRA11E | |
| 17p | chr11:43439924 | TTC17 | INTRON | | 11p12 | | |
| 17p | chr12:76234029 | RNF41 | INTRON | SINE | 12q21.2 | | lincRNA |
| 17p | chr12:76240715 | RNF41 | INTRON | | 12q21.2 | | lincRNA |
| 17p | chr13:65792955 | | | LTR | 13q21.32 | | |
| 17p | chr13:65793190 | | | | 13q21.32 | | |
| 17p | chr8:92372990 | SLC26A7 | INTRON | | 8q21.3 | | |
| 17p | chr17:73104253 | | | | 17q25.1 | | |
| 17p | chr17:73103992 | | | SINE | 17q25.1 | | Genbank JD434969 mRNA |
| 17p | chr17:74092305 | EXOC7 | INTRON | | 17q25.1 | | |
| 17p | chr19:38901904 | RASGRP4 | INTRON | | 19q13.2 | | |
| 17p | chr3:169820733 | PHC3 | INTRON | | 3q26.2 | | |
| 17p | chr3:169821495 | PHC3 | INTRON | DNA | 3q26.2 | | |
| 17p | chr4:1903182 | WHSC1 | INTRON | | 4p16.3 | | |
| 17p | chr4:1901424 | WHSC1 | INTRON | | 4p16.3 | | |
| 17p | chr7:151691713 | GALNTL5 | INTRON | LINE | 7q36.1 | FRA7I | |
| 17p | chr8:15216905 | | | | 8p22 | | |
| 17p | chrX:46714213 | RP2 | INTRON | SINE | Xp11.23 | | |
| 17p | chr6:43011461 | CUL7 and KLC4 | INTRONS | | 6p21.1 | | |
| 17p | chr6:43011611 | CUL7 and KLC4 | INTRONS | SINE | 6p21.1 | | |
| 17p | chr2:151149320 | AC016682.1 | INTRON | | 2q23.3 | | |
| 17p | chr1:36836337 | STK40 | INTRON | | 1p34.3 | | |
| 21q | chr21:44772891 | | | LTR | 21q22.3 | | |

Sample: t120 HCT116 PARP1-/-
| 17p | chr1:4173892 | | | SINE | 1p36.32 | FRA1A | |
| 17p | chr12:115867338 | | | | 12q24.21 | | Genbank JD362661 mRNA |
| 17p | chr12:66360096 | | | | 12q14.3 | | |
| 17p | chr17:76354662 | SOCS3 | EXON | | 17q25.3 | | |
| 17p | chr12:66359974 | HMGA2 | EXON | | 12q14.3 | | |
| 17p | chr14:85206413 | | | | 14q31.3 | | |
| 17p | chr14:85209778 | | | | 14q31.3 | | |
| 17p | chr16:89631043 | RPL13 | EXON | SINE | 16q24.3 | | |
| 17p | chr16:89631887 | RPL13 | EXON | | 16q24.3 | | |
| 17p | chr17:2979981 | | | | 17p13.3 | | Genbank BC172308 mRNA ORID4 |
| 17p | chr17:6892388 | AC027763.2 and ALOX12-AS1 | INTRON | SINE | 17p13.1 | | |
| 17p | chr17:36559012 | SOCS7 | EXON | | 17q12 | | |
| 17p | chr17:58537143 | APPBP2 | INTRON | SINE | 17q23.2 | | |
| 17p | chr2:43365782 | | | LTR | 2p21 | | |
| 17p | chr21:35464287 | SLC5A3 and MRPS6 | INTRON | | 21q22.11 | | |
| 17p | chr18:34096762 | FHOD3 | INTRON | | 18q12.2 | FRA18A | |
| 17p | chr4:123433975 | | | | 4q27 | | |
| 17p | chr4:123434049 | | | | 4q27 | | |
| 17p | chr1:712171 | RP11-206L10.2 | INTRON | SINE | 1p36.33 | FRA1A | |
| 17p | chr4:79574267 | LINC01094 | INTRON | SINE | 4q21.21 | | lincRNA |
| 17p | chr4:79574800 | LINC01094 | INTRON | LTR | 4q21.21 | | lincRNA |
| 17p | chr5:176808447 | | | LINE | 5q35.5 | | |
| 17p | chr7:31095764 | ADCYAP1R1 | INTRON | | 7p14.3 | | |
| 17p | chr19:6434192 | | | SINE | 19p13.3 | | Genbank JD362661 mRNA |
| 17p | chr19:6434350 | | | LTR | 19p13.3 | | Genbank JD362661 mRNA |
| 21q | chr11:111896100 | DLAT | EXON | | 11q23.1 | FRA12E | |
| 21q | chr8:133085554 | HHLA1 | INTRON | | 8q24.22 | | |
| 21q | chrX:98744344 | XRCC6P5 | INTRON | LINE | Xq22.1 | FRAXC | |

Sample: t120 HCT116 LIG3-/-
| 17p | chr11:107687318 | SLC35F2 | INTRON | | 11q22.3 | | |
| 17p | chr13:107234605 | | | LTR | 13q33.3 | | |
| 17p | chr4:89715874 | FAM13A | INTRON | | 4q22.1 | FRA4F | |
| 17p | chr13:107235329 | | | LTR | 13q33.3 | | |
| 17p | chr16:2140355 | PKD1 | EXON | | 16p13.3 | | |
| 17p | chr16:2140216 | PKD1 and miR1225 | INTRON and EXON | | 16p13.3 | | miRNA |
| 17p | chr16:3235762 | | | LINE | 16p13.3 | | Genbank AF338196 mRNA |
| 17p | chr17:1977456 | SMG6 | INTRON | SINE | 17p13.3 | | |
| 17p | chr17:21285018 | SMG6 | INTRON | SINE | 17p13.3 | | |
| 17p | chr17:3723625 | C17orf85 | INTRON | | 17p13.2 | | |
| 17p | chr17:4630066 | | | SINE | 17p13.2 | | Genbank JD410063 mRNA |
| 17p | chr17:4629856 | | | LINE | 17p13.2 | | |
| 17p | chr22:42252650 | SREBF2 | EXON | | 22q13.2 | | |
| 17p | chr5:166458099 | | | | 5q34 | | |
| 17p | chr6:43037144 | KLC4 | INTRON | SINE | 6p21.1 | | |
| 17p | chr7:111315617 | | | | 7q31.1 | | |
| 17p | chr8:140779263 | TRAPPC9 | INTRON | | 8q24.3 | FRA8D | |
| 17p | chr8:140779460 | TRAPPC9 | INTRON | SINE | 8q24.3 | FRA8D | |
| 17p | chrX:41206622 | DDX3X | EXON | Simple | Xp11.4 | | |
| 17p | chr11:34497285 | | | | 11p13 | FRA1E | |
| 21q | chr11:58411940 | GLYAT | INTRON | DNA | 11q12.1 | | |

Sample: t120 HCT116 LIG4-/-
| 17p | chr14:80956647 | CEP128 | INTRON | | 14q31.1 | | |
| 17p | chr14:80957623 | CEP128 | INTRON | | 14q31.1 | | |
| 17p | chr16:20712863 | ACSM3 | INTRON | LINE | 16p12.3 | | |
| 17p | chr16:20719063 | ACSM3 | INTRON | LINE | 16p12.3 | | |

Sample: t120 HCT116 LIG3-/-:NC3
| 17p | chr1:167109640 | | | SINE | 1q24.1 | | |
| 17p | chr1:33264009 | YARS | INTRON | | 1p35.1 | | |
| 17p | chr1:33264223 | YARS | INTRON | SINE | 1p35.1 | | |
| 17p | chr14:92376909 | FBLN5 | INTRON | | 14q32.12 | | |
| 17p | chr1:50953111 | FAF1 | INTRON | LINE | 1p32.3 | | |
| 17p | chr12:95392590 | NDUFA12 | INTRON | LINE | 12q22 | | |
| 17p | chr12:95392689 | NDUFA12 | INTRON | LINE | 12q22 | | |
| 17p | chr16:88230816 | | | | 16q24.2 | | |
| 17p | chr16:84347911 | WFDC1 | INTRON | | 16q24.1 | | |
| 17p | chr19:50097317 | PRR12 | INTRON | | 19q13.33 | | |
| 17p | chr22:19921607 | TXNRD2 | INTRON | | 22q11.21 | | |
| 17p | chr22:22052426 | YPEL1 | EXON | | 22q11.21 | | |
| 17p | chr3:9437280 | SETD5-AS1 | EXON and INTRON | | 3p25.3 | | anti-sense transcript |
| 17p | chr6:136878873 | MAP3K5 | EXON | | 6q23.3 | | |
| 17p | chr6:159152538 | SYTL3 | INTRON | LTR | 6q25.3 | | |
| 17p | chr8:22944386 | TNFRSF10c | INTRON | LINE | 8p21.3 | | |
| 17p | chr8:22944735 | TNFRSF10c | INTRON | | 8p21.3 | | |
| XpYp | chr6:136876051 | | | SINE | 6q23.3 | | |
| XpYp | chr6:136876411 | | | | 6q23.3 | | |
| XpYp | chr22:19922030 | TXNRD2 | INTRON | LTR | 22q11.21 | | |
| XpYp | chr1:50943506 | FAF1 | INTRON | | 1p32.3 | | |
| XpYp | chr1:167109721 | | | | 1q24.1 | | |
| 21q | chr14:92377206 | FBLN5 | INTRON | | 14q32.12 | | |

Sample: t120 HCT116 LIG3-/-:LIG4-/-
| 17p | chr16:65606589 | LINC00922 | INTRON | | 16q21 | | lincRNA |
| 17p | chr16:65606939 | LINC00922 | INTRON | | 16q21 | | lincRNA |
| 17p | chr16:66064467 | | | | 16q21 | | |
| 17p | chr16:51424936 | | | | 16q12.1 | | |
| 17p | chr16:51425042 | | | | 16q12.1 | | |
| 21q | chr6:17092209 | | | SINE | 6p22.3 | | |
| 21q | chr10:38174141 | | | SINE | 10p centromere | | Genbank JD428778 exon |
| 21q | chr4:65629817 | | | SINE | 4q13.1 | | |

Sample: t120 HCT116 LIG3-/-:TP53-/-
| 17p | chr1:63986165 | ITGB3BP | INTRON | | 1p31.3 | FRA1L | |
| 17p | chr1:64104767 | PGM1 | INTRON | | 1p31.3 | FRA1L | |
| 17p | chr3:128526843 | RAB7A | INTRON | SINE | 3q21.3 | | |
| 17p | chr12:133598101 | | | SINE | 12q24.33 | FRA12E | |
| 17p | chr3:181967503 | | | LINE | 3q26.33 | | |
| 17p | chr4:78115944 | CCNG2 | INTRON | | 4q21.1 | | |
| 17p | chr4:88942405 | PKD2 | INTRON | | 4q22.1 | FRA4F | |
| 17p | chr4:88942895 | PKD2 | INTRON | | 4q22.1 | FRA4F | |
| 17p | chr7:91628652 | AKAP9 | INTRON | SINE | 7q21.2 | FRA7E | |
| 17p | chr7:91628459 | AKAP9 | INTRON | MamRep605 (likely LTR) | 7q21.2 | FRA7E | |
| 17p | chr8:145126788 | | | SINE | 8q24.3 | FRA8D | GenBank JD514440 mRNA |
| 17p | chr11:80957241 | | | LTR | 11q14.1 | | |
| 17p | chr11:80957370 | | | | 11q14.1 | | |
| 17p | chr12:69469991 | | | LINE | 12q15 | | |
| 17p | chr16:85794772 | | | DNA | 16q24.1 | | |
| 17p | chr16:85796256 | | | | 16q24.1 | | |
| 17p | chr17:58958091 | BCAS3 | INTRON | | 17q23.2 | | |
| 17p | chr19:12952782 | MAST1 and HOOK2 | INTRONS | | 19p13.2 | | |
| 17p | chr19:12953042 | MAST1 and HOOK2 | INTRONS | | 19p13.2 | | |
| 17p | chrX:14547908 | GLRA2 | EXON | | Xp22.2 | | |
| 17p | chr18:64392060 | | | | 18q22.1 | | |
| 17p | chr18:64392351 | | | | 18q22.1 | | |
| 17p | chr8:106070965 | RP11-127H5.1 | INTRON | LINE | 8q22.3 | | |
| 17p | chr16:59647968 | | | | 16q21 | | |
| 17p | chr5:109795987 | TMEM232 | INTRON | DNA | 5q22.1 | | |
| 17p | chr16:59647905 | | | | 16q21 | | |
| 17p | chr1:33300876 | S100PBP | EXON | LINE | 1p35.1 | | |
| 17p | chr1:33300997 | S100PBP | EXON | | 1p35.1 | | |
| 17p | chr2:35998799 | | | | 2p22.3 | | |
| 17p | chr3:145309692 | | | | 3q24 | | |
| 17p | chr3:145309709 | | | | 3q24 | | |
| 17p | chr22:33946960 | LARGE | INTRON | LINE | 22q12.3 | | |
| 17p | chr21:23270548 | | | | 21q21.1 | | |
| 17p | chr21:23270569 | | | | 21q21.1 | | |
| 17p | chr10:107398659 | | | LTR | 10q25.1 | | |
| 17p | chr10:107398685 | | | LTR | 10q25.1 | | |
| 17p | chr14:78430874 | | | SINE | 14q24.3 | | GenBank JD41770 exon (and others) |
| 17p | chr10:35310372 | CUL2 | INTRON | Simple | 10p11.21 | | |
| 17p | chr4:169421799 | PALLD and DDX60L | INTRONS | SINE | 4q32.3 | | |
| 17p | chr10:35310458 | CUL2 | INTRON | Simple | 10p11.21 | | |
| 17p | chr17:11069096 | | | LINE | 17p12 | | GenBank BC018500 exon |
| 17p | chr10:42596996 | | | Satellite and Simple Tandem | 10q11.21 | | centromere-adjacent |
| 17p | chr20:61420134 | | | Simple Tandem | 20q13.33 | | |
| 17p | chr12:66707833 | MAP2K1 | INTRON | | 15q22.31 | FRA15A | |
| 17p | chr12:69469708 | | | | 12q15 | | |
| 17p | chr5:90277489 | GPR98 | INTRON | DNA | 5q14.3 | | |
| 17p | chr5:90277504 | GPR98 | INTRON | DNA | 5q14.3 | | |
| 17p | chr14:88669322 | | | LINE | 6q15 | FRA6G | |
| 17p | chr10:101640086 | | | | 14q32.31 | | |
| 17p | chr10:63139493 | | | LINE | 10q21.2 | | |
| 17p | chr6:140749025 | | | LINE | 6q24.1 | | |
| 17p | chr6:146376502 | GRM1 | INTRON | | 6q24.3 | | |
| 17p | chr8:104446775 | DCAF13 | INTRON | LINE | 8q22.3 | | |
| 17p | chr7:150195823 | | | LINE | 7q36.1 | FRA7I | |
| 17p | chr10:42387737 | | | Satellite and Simple Tandem | 10q11.21 | | |
| 17p | chr18:9818872 | RAB31 | INTRON | LINE | 18p11.22 | | |
| 17p | chr15:66708077 | MAP2K1 | INTRON | | 15q22.31 | FRA15A | |
| 17p | chr7:6097835 | EIF2AK1 | INTRON | SINE | 7p22.1 | FRA7B | |
| 17p | chr18:73364221 | | | | 18q23 | | |
| 21q | chr16:15946202 | MYH11 | INTRON | SINE | 16p13.11 | | |
| 21q | chr16:15946417 | MYH11 | INTRON | SINE | 16p13.11 | | |

Supplementary Table 2A:

Basic metrics associated with Illumina HiSeq 2000 sequencing of crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cell lines

| Sample | Sample type | Sequence platform | Paired-end library | Single read length (bp) | Total yield (Mb) | Mean insert size (bp) | Total reads mapped to extended hg19 genome reference |
| Crisis-stage MRC5HPVE6E7 | PCR amplicons | Illumina HiSeq 2000 | BGI short insert | 91 | 10,793.7 | 126.00 | 56,107,446 |
| t48 HCT116 WT | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 13,921.9 | 96.00 | 36,078,028 |
| t48 HCT116 TP53-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 12,406.0 | 136.00 | 42,045,162 |
| t48 HCT116 PARP1-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 13,236.7 | 127.00 | 42,627,224 |
| t48 HCT116 LIG3-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 13,604.5 | 128.00 | 43,095,732 |
| t48 HCT116 LIG4-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 12,978.1 | 135.00 | 43,872,265 |
| t48 HCT116 LIG3-/-:NC3 | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 8,162.5 | 141.00 | 28,852,053 |
| t48 HCT116 LIG3-/-:LIG4-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 10,737.4 | 152.00 | 38,164,928 |
| t48 HCT116 LIG3-/-:TP53-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 11,504.6 | 147.00 | 40,094,382 |
| t120 HCT116 WT | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 11,429.1 | 85.00 | 27,166,972 |
| t120 HCT116 TP53-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 11,429.7 | 100.00 | 30,676,192 |
| t120 HCT116 PARP1-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 15,513.9 | 127.00 | 50,279,210 |
| t120 HCT116 LIG3-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 10,000.7 | 155.00 | 35,156,983 |
| t120 HCT116 LIG4-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 13,836.6 | 112.00 | 39,759,999 |
| t120 HCT116 LIG3-/-:NC3 | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 12,846.6 | 102.00 | 34,303,055 |
| t120 HCT116 LIG3-/-:LIG4-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 10,186.9 | 147.00 | 35,865,442 |
| t120 HCT116 LIG3-/-:TP53-/- | PCR amplicons | Illumina HiSeq 2000 | Nextera XT | 100 | 11,509.6 | 151.00 | 40,561,826 |

Supplementary Table 2B:

Raw and processed Illumina HiSeq 2000 paired-end read numbers mapping 17p fusions in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cell lines

| Sample | Total reads prior to mapping | Total reads mapped to 17p | Total filtered reads mapping 17p-17p fusions | Normalised 17p-17p intra-chromosomal read count | Total reads mapped to extended hg19 genome reference | Total filtered reads mapping 17p-genomic fusions | Normalised 17p-genomic inter-chromosomal read count | Frequencies of 17p inter-chromosomal fusions relative to 17p intra-chromosomal fusions |
| Crisis-stage MRC5HPVE6E7 | 56,383,986 | 304,727 | 602 | 533.84 | 56,107,446 | 5,603 | 4,968.61 | 9.31 |
| t48 HCT116 WT | 36,329,418 | 30,219 | 64 | 88.08 | 36,078,028 | 205 | 282.14 | 3.2 |
| t48 HCT116 TP53-/- | 42,394,853 | 134,636 | 437 | 515.39 | 42,045,162 | 2,082 | 2,455.49 | 4.76 |
| t48 HCT116 PARP1-/- | 42,929,419 | 69,749 | 209 | 243.42 | 42,627,224 | 1,206 | 1,404.63 | 5.77 |
| t48 HCT116 LIG3-/- | 43,568,154 | 49,395 | 70 | 80.33 | 43,095,732 | 1,004 | 1,152.22 | 14.34 |
| t48 HCT116 LIG4-/- | 44,277,925 | 39,457 | 100 | 112.92 | 43,872,265 | 470 | 530.74 | 4.7 |
| t48 HCT116 LIG3-/-:NC3 | 29,186,082 | 58,395 | 75 | 128.49 | 28,852,053 | 1,085 | 1,858.76 | 14.47 |
| t48 HCT116 LIG3-/-:LIG4-/- | 38,485,338 | 19,250 | 194 | 252.04 | 38,164,928 | 433 | 562.55 | 2.23 |
| t48 HCT116 LIG3-/-:TP53-/- | 40,504,584 | 97,325 | 451 | 556.73 | 40,094,382 | 3,427 | 4,230.39 | 7.6 |
| t120 HCT116 WT | 27,302,175 | 18,690 | 77 | 141.01 | 27,166,972 | 294 | 538.42 | 3.82 |
| t120 HCT116 TP53-/- | 30,842,461 | 87,399 | 387 | 627.38 | 30,676,192 | 1,163 | 1,885.39 | 3.01 |
| t120 HCT116 PARP1-/- | 50,507,945 | 71,974 | 189 | 187.10 | 50,279,210 | 982 | 972.12 | 5.2 |
| t120 HCT116 LIG3-/- | 35,427,121 | 50,018 | 308 | 434.70 | 35,156,983 | 1,243 | 1,754.31 | 4.04 |
| t120 HCT116 LIG4-/- | 39,960,600 | 35,474 | 118 | 147.65 | 39,759,999 | 761 | 952.19 | 6.45 |
| t120 HCT116 LIG3-/-:NC3 | 34,578,491 | 63,944 | 131 | 189.42 | 34,303,055 | 1,717 | 2,482.76 | 13.11 |
| t120 HCT116 LIG3-/-:LIG4-/- | 36,117,596 | 48,494 | 310 | 429.15 | 35,865,442 | 1,547 | 2,141.62 | 4.99 |
| t120 HCT116 LIG3-/-:TP53-/- | 40,868,771 | 122,694 | 800 | 978.74 | 40,561,826 | 4,138 | 5,062.55 | 5.17 |

Supplementary Table 3A:

Genomic locations and identities of all genes disrupted by telomere inter-chromosomal fusion events in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cell lines

Columns: Genomic coordinate/s of fusion junction/s | HGNC gene symbol or Ensembl gene name | Ensembl identity of gene | Coding context of junction | Chromosome location of junction

Sample: Crisis-stage MRC5HPVE6E7
| chr1:15978720, chr1:15978977 | DDI2 | ENSG00000197312 | INTRON | 1p36.21 |
| chr1:201037553 | CACNA1S | ENSG00000081248 | INTRON | 1q32.1 |
| chr1:155626165 | MSTO1 | ENSG00000125459 | INTRON | 1q22 |
| chr4:39777825, chr4:39777663 | UBE2K | ENSG00000078140 | INTRON | 4p14 |
| chr4:62150579, chr4:62150846 | LPHN3 | ENSG00000150471 | INTRON | 4q13.1 |
| chr4:137864375 | RP11-138I17.1 | ENSG00000248869 | INTRON | 4q28.3 |
| chr6:102229938, chr6:102230230 | GRIK2 | ENSG00000164418 | INTRON | 6q16.3 |
| chr6:119558701, chr6:119558559 | MAN1A1 | ENSG00000111885 | INTRON | 6q22.31 |
| chr7:131159323, chr7:131159016 | MKLN1 | ENSG00000128585 | INTRON | 7q32.3 |
| chr8:135550045, chr8:135550320 | ZFAT | ENSG00000066827 | INTRON | 8q24.22 |
| chr8:19477341, chr8:19477341 | CSGALNACT1 | ENSG00000147408 | INTRON | 8p21.3 |
| chr11:63431645 | ATL3 | ENSG00000184743 | INTRON | 11q13.1 |
| chr11:58010338, chr11:58010218 | EIF4A2P3 | ENSG00000254589 | EXON | 11q12.1 |
| chr12:104853154, chr12:104853460 | CHST11 | ENSG00000171310 | INTRON | 12q23.3 |
| chr13:44803342, chr13:44803396 | SMIM2-AS | ENSG00000227258 | INTRON | 13q14.11 |
| chr14:102510316, chr14:102509957 | DYNC1H1 | ENSG00000197102 | EXON | 14q32.31 |
| chr14:52478136, chr14:52478618 | NID2 | ENSG00000087303 | INTRON | 14q22.1 |
| chr14:105465985, chr14:105465922 | C14orf79 | ENSG00000140104 | INTRON | 14q32.33 |
| chr16:78394880 | WWOX | ENSG00000186153 | INTRON | 16q23.1 |
| chr18:26917657, chr18:26915207 | CTD-2515C13.1 | ENSG00000266479 | INTRON | 18q12.1 |
| chr20:62918877 | PCMTD2 | ENSG00000203880 | INTRON | 20q13.33 |
| chr21:36085143 | CLIC6 | ENSG00000159212 | INTRON | 21q22.12 |
| chr21:17754221 | LINC00478 | ENSG00000215386 | INTRONS | 21q21.1 |

Sample: HCT116 WT
| chr7:152039382 | KMT2C | ENSG00000055609 | INTRON | 7q36.1 |
| chr11:26457598 | ANO3 | ENSG00000134343 | INTRON | 11p14.2 |
| chr11:64428465 | NRXN2 | ENSG00000110076 | EXON | 11q13.1 |
| chr16:81707652, chr16:81707324 | CMIP | ENSG00000153815 | INTRON | 16q23.3 |
| chr17:33810356, chr17:33810217 | SLFN12L | ENSG00000205045 | INTRON | 17q12 |

Sample: HCT116 TP53-/-
| chr1:205423723, chr1:205423834 | LEMD1 | ENSG00000186007 | INTRON | 1q32.1 |
| chr1:208008668, chr1:208008949 | C1orf132 | ENSG00000203709 | INTRON | 1q32.2 |
| chr1:21960825, chr1:21960420 | RAP1GAP | ENSG00000076864 | INTRON | 1p36.12 |
| chr1:169170478 | NME7 | ENSG00000143156 | INTRON | 1q24.2 |
| chr1:26398154 | AL391650.1 | ENSG00000265320 | INTRON | 1p36.11 |
| chr1:26679384 | AIM1L | ENSG00000176092 | INTRON | 1p36.11 |
| chr1:36836337 | STK40 | ENSG00000196182 | INTRON | 1p34.3 |
| chr2:160097324, chr2:160097076 | WDSUB1 | ENSG00000196151 | INTRON | 2q24.2 |
| chr2:162573855, chr2:162574561, chr2:162575323 | SLC4A10 | ENSG00000144290 | INTRON | 2q24.2 |
| chr2:151149320 | AC016682.1 | ENSG00000230645 | INTRON | 2q23.3 |
| chr3:53045108 | SFMBT1 | ENSG00000163935 | INTRON | 3p21.1 |
| chr3:53090675 | RP11-894J14.5 | ENSG00000272305 | INTRON | 3p21.1 |
| chr3:169820733, chr3:169821495 | PHC3 | ENSG00000173889 | INTRON | 3q26.2 |
| chr4:94537771, chr4:94538210 | GRID2 | ENSG00000152208 | INTRON | 4q22.2 |
| chr4:1903182, chr4:1901424 | WHSC1 | ENSG00000109685 | INTRON | 4p16.3 |
| chr5:59048661 | PDE4D | ENSG00000113448 | INTRON | 5q12.1 |
| chr6:43011461, chr6:43011611 | CUL7, KLC4 | ENSG00000044090, ENSG00000137171 | INTRONS | 6p21.1 |
| chr7:151691713 | GALNTL5 | ENSG00000106648 | INTRON | 7q36.1 |
| chr8:38088087 | DDH2 | ENSG00000085788 | INTRON | 8p11.23 |
| chr8:92372990 | SLC26A7 | ENSG00000147606 | INTRON | 8q21.3 |
| chr9:127644716, chr9:127644881 | GOLGA1 | ENSG00000136935 | INTRON | 9q33.3 |
| chr11:122763040, chr11:122766057 | C11orf63 | ENSG00000109944 | INTRON | 11q24.1 |
| chr11:43440447, chr11:43439924 | TTC17 | ENSG00000052841 | INTRON | 11p12 |
| chr11:31371068 | DCDC1 | ENSG00000170959 | INTRON | 11p13 |
| chr12:121984196, chr12:121984105 | KDM2B | ENSG00000089094 | INTRON | 12q24.31 |
| chr12:2748188 | CACNA1C | ENSG00000151067 | INTRON | 12p13.33 |
| chr12:76234029, chr12:76240715 | RNF41 | ENSG00000181852 | INTRON | 12q21.2 |
| chr3:113455015 | NAA50 | ENSG00000121579 | INTRON | 13q13.2 |
| chr14:65212590, chr14:65212728 | PLEKHG3 | ENSG00000126822 | INTRON | 14q23.3 |
| chr16:28,878,650 | SH2B1 | ENSG00000178188 | INTRON | 16p11.2 |
| chr17:29226407, chr17:29226530 | TEFM | ENSG00000172171 | EXON | 17q11.2 |
| chr17:74092305 | EXOC7 | ENSG00000182473 | INTRON | 17q25.1 |
| chr19:3423228 | NFIC | ENSG00000141905 | INTRON | 19p13.3 |
| chr19:50945470 | MYBPC2 | ENSG00000086967 | EXON | 19q13.33 |
| chr19:38901904 | RASGRP4 | ENSG00000171777 | INTRON | 19q13.2 |
| chrX:46714213 | RP2 | ENSG00000102218 | INTRON | Xp11.23 |

Sample: HCT116 PARP1-/-
| chr1:90245629, chr1:90245755 | RP11-302M6.4 | ENSG00000271949 | INTRON | 1p22.2 |
| chr1:92605603 | BTBD8 | ENSG00000189195 | INTRON | 1p22.1 |
| chr1:712171 | RP11-206L10.2 | ENSG00000228327 | INTRON | 1p36.33 |
| chr3:45125764, chr3:45126088 | CDCP1 | ENSG00000163814 | EXON | 3p21.31 |
| chr4:79574267, chr4:79574800 | LINC01094 | ENSG00000251442 | INTRON | 4q21.21 |
| chr7:31095764 | ADCYAP1R1 | ENSG00000078549 | INTRON | 7p14.3 |
| chr8:133085554 | HHLA1 | ENSG00000132297 | INTRON | 8q24.22 |
| chr9:136219546, chr9:136219837 | SURF1 | ENSG00000148290 | INTRON | 9q34.2 |
| chr11:32803469 | NUP98 | ENSG00000110713 | INTRON | 11p15.4 |
| chr11:65138671 | RP11-86708.5 | ENSG00000255478 | INTRON | 11q13.1 |
| chr11:111896100 | DLAT | ENSG00000150768 | EXON | 11q23.1 |
| chr12:66359974 | HMGA2 | ENSG00000149948 | EXON | 12q14.3 |
| chr13:54694030 | LINC00458 | ENSG00000234787 | INTRON | 13q14.3 |
| chr16:89631043, chr16:89631887 | RPL13 | ENSG00000167526 | EXON | 16q24.3 |
| chr17:76354662 | SOCS3 | ENSG00000184557 | EXON | 17q25.3 |
| chr17:6892388 | AC027763.2, ALOX12-AS1 | ENSG00000215067, ENSG00000267047 | INTRON | 17p13.1 |
| chr17:36559012 | SOCS7 | ENSG00000174111 | EXON | 17q12 |
| chr17:58537143 | APPBP2 | ENSG00000062725 | INTRON | 17q23.2 |
| chr18:34096762 | FHOD3 | ENSG00000134775 | INTRON | 18q12.2 |
| chr20:61382288 | NTSR1 | ENSG00000101188 | INTRON | 20q13.33 |
| chr21:35464287 | SLC5A3, MRPS6 | ENSG00000198743, ENSG00000243927 | INTRON | 21q22.11 |
| chrX:98744344 | XRCC6P5 | ENSG00000215070 | INTRON | Xq22.1 |

Sample: HCT116 LIG3-/-
| chr4:89715874 | FAM13A | ENSG00000138640 | INTRON | 4q22.1 |
| chr5:31122528 | RP11-152KL.2 | ENSG00000254138 | INTRON | 5p13.3 |
| chr6:43037144 | KLC4 | ENSG00000137171 | INTRON | 6p21.1 |
| chr8:140779263, chr8:140779460 | TRAPPC9 | ENSG00000167632 | INTRON | 8q24.3 |
| chr10:98612890, chr10:98612754 | LCOR | ENSG00000196233 | INTRON | 10q24.1 |
| chr11:1083184 | MUC2 | ENSG00000198788 | EXON | 11p15.5 |
| chr11:75176251 | GDPD5 | ENSG00000158555 | INTRON | 11q13.4 |
| chr11:107687318 | SLC35F2 | ENSG00000110660 | INTRON | 11q22.3 |
| chr11:58411940 | GLYAT | ENSG00000149124 | INTRON | 11q12.1 |
| chr16:2140355 | PKD1 | ENSG00000008710 | EXON | 16p13.3 |
| chr16:2140216 | PKD1, miR1225 | ENSG00000008710, ENSG00000221656 | INTRON, EXON | 16p13.3 |
| chr17:72939724 | OTOP3 | ENSG00000182938 | EXON | 17q25.1 |
| chr17:1977456, chr17:21285018 | SMG6 | ENSG00000070366 | INTRON | 17p13.3 |
| chr17:3723625 | C17orf85 | ENSG00000074356 | INTRON | 17p13.2 |
| chr22:42252650 | SREBF2 | ENSG00000198911 | EXON | 22q13.2 |
| chrX:41206622 | DDX3X | ENSG00000215301 | EXON | Xp11.4 |

Sample: HCT116 LIG4-/-
| chr12:111322873, chr12:111324228 | CCDC63 | ENSG00000173093 | INTRON | 12q24.11 |
| chr14:80956647, chr14:80957623 | CEP128 | ENSG00000100629 | INTRON | 14q31.1 |
| chr16:20712863, chr16:20719063 | ACSM3 | ENSG00000005187 | INTRON | 16p12.3 |
| chr17:63908194 | CEP112 | ENSG00000154240 | INTRON | 17q24.1 |

Sample: HCT116 LIG3-/-:NC3
| chr1:33264009, chr1:33264223 | YARS | ENSG00000134684 | INTRON | 1p35.1 |
| chr1:50953111, chr1:50943506 | FAF1 | ENSG00000185104 | INTRON | 1p32.3 |
| chr3:9437280 | SETD5-AS1 | ENSG00000206573 | EXON and INTRON | 3p25.3 |
| chr6:136878873 | MAP3K5 | ENSG00000197442 | EXON | 6q23.3 |
| chr6:159152538 | SYTL3 | ENSG00000164674 | INTRON | 6q25.3 |
| chr7:50063983 | ZPBP | ENSG00000042813 | INTRON | 7p12.2 |
| chr8:22944386, chr8:22944735 | TNFRSF10c | ENSG00000173535 | INTRON | 8p21.3 |
| chr9:139748209 | MAMDC4 | ENSG00000177943 | INTRON | 9q34.3 |
| chr10:71866448 | H2AFY2 | ENSG00000099284 | INTRON | 10q22.1 |
| chr12:95392590, chr12:95392689 | NDUFA12 | ENSG00000184752 | INTRON | 12q22 |
| chr14:92376909, chr14:92377206 | FBLN5 | ENSG00000140092 | INTRON | 14q32.12 |
| chr15:85386692 | ALPK3 | ENSG00000136383 | INTRON | 15q25.3 |
| chr16:84347911 | WFDC1 | ENSG00000103175 | INTRON | 16q24.1 |
| chr19:46258100 | AC074212.3 | ENSG00000237452 | INTRON | 19q13.32 |
| chr19:6394450, chr19:6394512 | CTB-180A7.8 | ENSG00000214347 | INTRON | 19p13.3 |
| chr19:50097317 | PRR12 | ENSG00000126464 | INTRON | 19q13.33 |
| chr22:19921607, chr22:19922030 | TXNRD2 | ENSG00000184470 | INTRON | 22q11.21 |
| chr22:22052426 | YPEL1 | ENSG00000100027 | EXON | 22q11.21 |

Sample: HCT116 LIG3-/-:LIG4-/-
| chr1:184853591 | FAM129A | ENSG00000135842 | INTRON | 1q25.3 |
| chr8:18513889 | PSD3 | ENSG00000156011 | INTRON | 8p22 |
| chr15:89428980 | HAPLN3 | ENSG00000140511 | INTRON | 15q26.1 |
| chr16:65606589, chr16:65606939 | LINC00922 | ENSG00000261742 | INTRON | 16q21 |
chr17:39958934 LEPREL4 
ENSG00000141696 EXON 17q21.2 HCT116 LIG3-/-:TP53-/- chr1:24291140, chr1:24293473 SRSF10 ENSG00000188529 EXON 1p36.11 chr1:27271348 NUDC ENSG00000090273 INTRON 1p36.11 chr1:179363129, chr1:179364353 AXDND1 ENSG00000162779 EXON 1q25.2 chr1:63986165 ITGB3BP ENSG00000142856 INTRON 1p31.3 chr1:64104767 PGM1 ENSG00000079739 INTRON 1p31.3 chr1:33300876, chr1:33300997 S100PBP ENSG00000116497 EXON 1p35.1 chr3:128526843 RAB7A ENSG00000075785 INTRON 3q21.3 chr4:70714934, chr4:70715319 SULT1E1 ENSG00000109193 EXON 4q13.3 chr4:78115944 CCNG2 ENSG00000138764 INTRON 4q21.1 chr4:88942405, chr4:88942895 PKD2 ENSG00000118762 INTRON 4q22.1 chr4:169421799 PALLD, DDX60L ENSG00000129116, ENSG00000181381 INTRONS 4q32.3 chr5:128011517 SLC27A6 ENSG00000113396 INTRON 5q23.3 chr5:137738851, chr5:137739212 KDM3B ENSG00000120733 INTRON 5q31.2 chr5:176732265 PRELID1 ENSG00000169230 INTRON 5q35.3 chr5:109795987 TMEM232 ENSG00000186952 INTRON 5q22.1 chr5:90277489, chr5:90277504 GPR98 ENSG00000164199 INTRONS 5q14.3 chr6:146376502 GRM1 ENSG00000152822 INTRON 6q24.3 chr7:26239040 HNRNPA2B1 ENSG00000122566 INTRON 7p15.2 chr7:91628652, chr7:91628459 AKAP9 ENSG00000127914 INTRON 7q21.2 chr7:6097835 EIF2AK1 ENSG00000086232 INTRON 7p22.1 chr8:106070965 RP11-127H5.1 ENSG00000253350 INTRON 8q22.3 chr8:104446775 DCAF13 ENSG00000164934 INTRON 8q22.3 chr9:13168100 MPDZ ENSG00000107186 INTRON 9p23 chr10:100540984, chr10:100541189 HPSE2 ENSG00000172987 INTRON 10q24.2 chr10:35310372, chr10:35310458 CUL2 ENSG00000108094 INTRON 10p11.21 chr11:66781086 SYT12 ENSG00000173227 INTRON 11q13.2 chr14:23496362 PSMB5 ENSG00000100804 INTRON 14q11.2 chr15:78863110 CHRNA5 ENSG00000169684 INTRON 15q25.1 chr15:66707833, chr15:66708077 MAP2K1 ENSG00000169032 INTRON 15q22.31 chr16:15946202, chr16:15946417 MYH11 ENSG00000133392 INTRON 16p13.11 chr17:46989758 UBE2Z ENSG00000159202 INTRON 17q21.32 chr17:58958091 BCAS3 ENSG00000141376 INTRON 17q23.2 chr18:9818872 RAB31 ENSG00000168461 INTRON 18p11.22 chr19:56480469 NLRP8 
ENSG00000179709 INTRON 19q13.43 chr19:12952782, chr19:12953042 MAST1, HOOK2 ENSG00000105613, ENSG00000095066 INTRONS 19p13.2 chr20:2372026, chr20:2372375 TGM6 ENSG00000166948 INTRON 20p13 chr22:39713390, chr22:39713924 RPL3 ENSG00000100316 EXON 22q13.1 chr22:33946960 LARGE ENSG00000133424 INTRON 22q12.3 chrX:19767350 SH3KBP1 ENSG00000147010 INTRON Xp22.12 chrX:32899288, chrX:32899358 DMD ENSG00000198947 INTRON Xp21.1 chrX:14547908 GLRA2 ENSG00000101958 EXON Xp22.2 Supplementary Table 3B:

Gene ontologies enriched in lists of genes disrupted by inter-chromosomal telomere fusions in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 samples mapped using DAVID Bioinformatics Resources 6.7; NIAID (NIH): https://david.ncifcrf.gov/

(i) All 176 genes disrupted in all MRC5HPVE6E7 and 17p TALEN-treated HCT116 samples

Columns: DAVID assigned GO Term | GO category gene count | % genes within GO category | P-Value | Benjamini-Hochberg corrected P-value

sequence variant | 108 | 69.2 | 8.90E-03 | 8.60E-01
polymorphism | 105 | 67.3 | 6.20E-03 | 2.00E-01
cellular process | 99 | 63.5 | 4.50E-02 | 9.40E-01
phosphoprotein | 88 | 56.4 | 1.60E-07 | 4.00E-05
alternative splicing | 87 | 55.8 | 1.70E-06 | 2.10E-04
splice variant | 85 | 54.5 | 8.90E-06 | 5.90E-03
membrane | 58 | 37.2 | 7.20E-02 | 5.70E-01
cytoplasm | 42 | 26.9 | 1.10E-03 | 8.70E-02
localization | 36 | 23.1 | 2.40E-02 | 9.90E-01
non-membrane-bounded organelle | 35 | 22.4 | 2.00E-03 | 1.10E-01
intracellular non-membrane-bounded organelle | 35 | 22.4 | 2.00E-03 | 1.10E-01
cellular component organization | 32 | 20.5 | 1.60E-02 | 9.70E-01
establishment of localization | 32 | 20.5 | 3.50E-02 | 9.70E-01
transport | 31 | 19.9 | 5.00E-02 | 9.30E-01
mutagenesis site | 29 | 18.6 | 1.90E-03 | 4.80E-01
cellular protein metabolic process | 28 | 17.9 | 6.00E-02 | 8.90E-01
acetylation | 28 | 17.9 | 7.20E-02 | 5.90E-01
nucleotide binding | 26 | 16.7 | 8.60E-02 | 6.90E-01
coiled coil | 24 | 15.4 | 3.60E-02 | 5.00E-01
purine nucleotide binding | 23 | 14.7 | 8.20E-02 | 7.00E-01
cytoskeleton | 22 | 14.1 | 3.40E-03 | 9.20E-02
organelle organization | 22 | 14.1 | 4.00E-03 | 9.90E-01
ribonucleotide binding | 22 | 14.1 | 9.10E-02 | 6.80E-01
purine ribonucleotide binding | 22 | 14.1 | 9.10E-02 | 6.80E-01
transport | 21 | 13.5 | 3.10E-02 | 4.90E-01
nucleotide-binding | 20 | 12.8 | 5.90E-02 | 6.20E-01
adenyl nucleotide binding | 20 | 12.8 | 7.10E-02 | 7.40E-01
purine nucleoside binding | 20 | 12.8 | 8.00E-02 | 7.10E-01
nucleoside binding | 20 | 12.8 | 8.40E-02 | 7.00E-01
cell fraction | 19 | 12.2 | 2.70E-03 | 8.40E-02
transferase | 19 | 12.2 | 2.10E-02 | 4.20E-01
ATP binding | 19 | 12.2 | 7.10E-02 | 7.20E-01
adenyl ribonucleotide binding | 19 | 12.2 | 7.90E-02 | 7.20E-01
cytoskeletal part | 18 | 11.5 | 1.70E-03 | 1.70E-01
Golgi apparatus | 17 | 10.9 | 1.80E-03 | 1.30E-01
macromolecule localization | 16 | 10.3 | 3.90E-02 | 9.60E-01
atp-binding | 16 | 10.3 | 8.60E-02 | 5.90E-01
insoluble fraction | 15 | 9.6 | 7.90E-03 | 1.50E-01
cellular localization | 15 | 9.6 | 2.60E-02 | 9.80E-01
membrane fraction | 14 | 9 | 1.40E-02 | 2.40E-01
calcium ion binding | 14 | 9 | 4.70E-02 | 6.80E-01
calcium | 13 | 8.3 | 2.10E-02 | 3.90E-01
ion transport | 13 | 8.3 | 2.30E-02 | 1.00E+00
ion transport | 13 | 8.3 | 3.00E-02 | 9.70E-01
ion transport | 12 | 7.7 | 5.20E-03 | 2.30E-01
synapse | 11 | 7.1 | 6.20E-04 | 1.30E-01
activity | 11 | 7.1 | 1.60E-03 | 3.90E-01
substrate specific channel activity | 11 | 7.1 | 2.00E-03 | 2.70E-01
channel activity | 11 | 7.1 | 2.60E-03 | 2.40E-01
passive transmembrane transporter activity | 11 | 7.1 | 2.60E-03 | 1.90E-01
golgi apparatus | 11 | 7.1 | 1.60E-02 | 3.60E-01
cellular macromolecule catabolic process | 11 | 7.1 | 7.40E-02 | 9.10E-01
cellular macromolecule catabolic process | 11 | 7.1 | 8.90E-02 | 9.10E-01
enzyme binding | 10 | 6.4 | 3.30E-02 | 5.80E-01
microtubule cytoskeleton | 10 | 6.4 | 3.50E-02 | 4.00E-01
cell cycle process | 10 | 6.4 | 4.30E-02 | 9.70E-01
modification-dependent protein catabolic process | 10 | 6.4 | 4.60E-02 | 9.40E-01
modification-dependent macromolecule catabolic process | 10 | 6.4 | 4.60E-02 | 9.40E-01
cell cycle process | 10 | 6.4 | 5.10E-02 | 9.10E-01
modification-dependent macromolecule catabolic process | 10 | 6.4 | 5.60E-02 | 8.90E-01
modification-dependent protein catabolic process | 10 | 6.4 | 5.60E-02 | 8.90E-01
cytoskeleton | 10 | 6.4 | 5.70E-02 | 6.30E-01
proteolysis involved in cellular protein catabolic process | 10 | 6.4 | 5.80E-02 | 9.20E-01
cellular protein catabolic process | 10 | 6.4 | 6.00E-02 | 9.10E-01
protein catabolic process | 10 | 6.4 | 7.00E-02 | 9.20E-01
proteolysis involved in cellular protein catabolic process | 10 | 6.4 | 7.00E-02 | 9.00E-01
cellular protein catabolic process | 10 | 6.4 | 7.10E-02 | 9.00E-01
protein catabolic process | 10 | 6.4 | 8.30E-02 | 9.10E-01
ionic channel | 9 | 5.8 | 3.30E-03 | 1.90E-01
ubl conjugation pathway | 9 | 5.8 | 4.30E-02 | 5.50E-01
cell junction | 9 | 5.8 | 6.00E-02 | 5.30E-01
synapse part | 8 | 5.1 | 3.90E-03 | 9.50E-02
synaptic transmission | 8 | 5.1 | 1.10E-02 | 1.00E+00
synaptic transmission | 8 | 5.1 | 1.40E-02 | 9.90E-01
gated channel activity | 8 | 5.1 | 1.60E-02 | 4.30E-01
transmission of nerve impulse | 8 | 5.1 | 2.50E-02 | 9.90E-01
transmission of nerve impulse | 8 | 5.1 | 3.00E-02 | 9.80E-01
protein kinase cascade | 8 | 5.1 | 3.30E-02 | 9.90E-01
protein kinase cascade | 8 | 5.1 | 3.90E-02 | 9.70E-01
dendrite | 7 | 4.5 | 2.10E-03 | 9.30E-02
synapse | 7 | 4.5 | 6.20E-03 | 2.30E-01
cellular macromolecular complex assembly | 7 | 4.5 | 4.70E-02 | 9.20E-01
cellular macromolecular complex assembly | 7 | 4.5 | 5.40E-02 | 9.00E-01
neuron projection | 7 | 4.5 | 6.00E-02 | 5.10E-01
protein domain specific binding | 7 | 4.5 | 6.20E-02 | 7.40E-01
cellular macromolecular complex subunit organization | 7 | 4.5 | 7.30E-02 | 9.20E-01
Apoptosis | 7 | 4.5 | 7.40E-02 | 5.70E-01
serine/threonine-protein kinase | 7 | 4.5 | 7.40E-02 | 5.70E-01
mitotic cell cycle | 7 | 4.5 | 8.40E-02 | 9.20E-01
cellular macromolecular complex subunit organization | 7 | 4.5 | 8.40E-02 | 9.10E-01
cell junction | 7 | 4.5 | 8.80E-02 | 5.80E-01
mitotic cell cycle | 7 | 4.5 | 9.60E-02 | 9.20E-01
postsynaptic membrane | 6 | 3.8 | 4.90E-03 | 1.10E-01
anion transmembrane transporter activity | 6 | 3.8 | 8.20E-03 | 2.70E-01
ion channel complex | 6 | 3.8 | 2.60E-02 | 3.30E-01
chromosomal rearrangement | 6 | 3.8 | 6.50E-02 | 6.10E-01
microtubule | 6 | 3.8 | 7.30E-02 | 5.40E-01
ligase | 6 | 3.8 | 8.70E-02 | 5.80E-01
postsynaptic density | 5 | 3.2 | 2.60E-03 | 9.60E-02
activity | 5 | 3.2 | 3.20E-03 | 1.80E-01
anion channel activity | 5 | 3.2 | 4.30E-03 | 1.70E-01
postsynaptic cell membrane | 5 | 3.2 | 1.00E-02 | 2.80E-01
compositionally biased region:Gln-rich | 5 | 3.2 | 3.00E-02 | 9.80E-01
muscle contraction | 5 | 3.2 | 3.70E-02 | 9.80E-01
muscle contraction | 5 | 3.2 | 4.20E-02 | 9.50E-01
cellular protein complex assembly | 5 | 3.2 | 4.40E-02 | 9.60E-01
Ubiquitin mediated proteolysis | 5 | 3.2 | 4.50E-02 | 9.80E-01
extracellular structure organization | 5 | 3.2 | 4.50E-02 | 9.60E-01
cellular protein complex assembly | 5 | 3.2 | 4.90E-02 | 9.40E-01
muscle system process | 5 | 3.2 | 4.90E-02 | 9.10E-01
extracellular structure organization | 5 | 3.2 | 5.00E-02 | 9.20E-01
muscle system process | 5 | 3.2 | 5.50E-02 | 9.00E-01
chromatin regulator | 5 | 3.2 | 8.20E-02 | 5.90E-01
ubiquitin protein ligase binding | 4 | 2.6 | 3.40E-03 | 1.60E-01
chloride transport | 4 | 2.6 | 1.40E-02 | 9.90E-01
chloride transport | 4 | 2.6 | 1.50E-02 | 9.80E-01
sarcolemma | 4 | 2.6 | 1.70E-02 | 2.50E-01
extracellular ligand-gated ion channel activity | 4 | 2.6 | 2.30E-02 | 5.20E-01
domain:SAM | 4 | 2.6 | 2.50E-02 | 9.90E-01
cell-matrix adhesion | 4 | 2.6 | 3.70E-02 | 9.90E-01
cell-matrix adhesion | 4 | 2.6 | 4.00E-02 | 9.60E-01
inorganic anion transport | 4 | 2.6 | 4.10E-02 | 9.70E-01
neuropeptide signaling pathway | 4 | 2.6 | 4.10E-02 | 9.70E-01
Sterile alpha motif SAM | 4 | 2.6 | 4.10E-02 | 9.30E-01
inorganic anion transport | 4 | 2.6 | 4.50E-02 | 9.40E-01
neuropeptide signaling pathway | 4 | 2.6 | 4.50E-02 | 9.40E-01
RNA transport | 4 | 2.6 | 4.60E-02 | 9.50E-01
nucleic acid transport | 4 | 2.6 | 4.60E-02 | 9.50E-01
establishment of RNA localization | 4 | 2.6 | 4.60E-02 | 9.50E-01
cell-substrate adhesion | 4 | 2.6 | 4.70E-02 | 9.30E-01
RNA localization | 4 | 2.6 | 4.90E-02 | 9.20E-01
establishment of RNA localization | 4 | 2.6 | 5.00E-02 | 9.40E-01
RNA transport | 4 | 2.6 | 5.00E-02 | 9.40E-01
nucleic acid transport | 4 | 2.6 | 5.00E-02 | 9.40E-01
cell-substrate adhesion | 4 | 2.6 | 5.10E-02 | 9.10E-01
SAM | 4 | 2.6 | 5.20E-02 | 8.50E-01
RNA localization | 4 | 2.6 | 5.40E-02 | 9.10E-01
anatomical structure homeostasis | 4 | 2.6 | 5.70E-02 | 9.20E-01
anatomical structure homeostasis | 4 | 2.6 | 6.20E-02 | 8.90E-01
contractile fiber part | 4 | 2.6 | 6.50E-02 | 5.20E-01
nucleobase, nucleoside, nucleotide and nucleic acid transport | 4 | 2.6 | 6.60E-02 | 9.20E-01
nucleobase, nucleoside, nucleotide and nucleic acid transport | 4 | 2.6 | 7.20E-02 | 9.00E-01
contractile fiber | 4 | 2.6 | 7.60E-02 | 5.30E-01
compositionally biased region:Poly-Leu | 4 | 2.6 | 8.60E-02 | 1.00E+00
ligand-gated channel activity | 4 | 2.6 | 9.30E-02 | 6.80E-01
ligand-gated ion channel activity | 4 | 2.6 | 9.30E-02 | 6.80E-01
Vascular smooth muscle contraction | 4 | 2.6 | 9.70E-02 | 9.80E-01
detection of mechanical stimulus | 3 | 1.9 | 7.40E-03 | 1.00E+00
detection of mechanical stimulus | 3 | 1.9 | 8.00E-03 | 9.90E-01
AT hook, DNA-binding, conserved site | 3 | 1.9 | 2.10E-02 | 1.00E+00
AT_hook | 3 | 1.9 | 2.40E-02 | 9.30E-01
glutamate receptor activity | 3 | 1.9 | 2.80E-02 | 5.50E-01
smooth muscle contraction | 3 | 1.9 | 3.00E-02 | 9.90E-01
domain:GPS | 3 | 1.9 | 3.20E-02 | 9.50E-01
smooth muscle contraction | 3 | 1.9 | 3.20E-02 | 9.70E-01
Extracellular ligand-binding receptor | 3 | 1.9 | 3.50E-02 | 9.90E-01
Cell division and chromosome partitioning | 3 | 1.9 | 3.80E-02 | 1.10E-01
GPS | 3 | 1.9 | 3.90E-02 | 9.80E-01
JAK-STAT cascade | 3 | 1.9 | 4.10E-02 | 9.80E-01
JAK-STAT cascade | 3 | 1.9 | 4.30E-02 | 9.50E-01
GPS | 3 | 1.9 | 4.60E-02 | 9.20E-01
structural constituent of muscle | 3 | 1.9 | 4.90E-02 | 6.70E-01
GPCR, family 2, secretin-like, conserved site | 3 | 1.9 | 5.90E-02 | 9.50E-01
growth regulation | 3 | 1.9 | 6.30E-02 | 6.20E-01
protein methyltransferase activity | 3 | 1.9 | 6.70E-02 | 7.40E-01
chloride channel | 3 | 1.9 | 6.90E-02 | 5.90E-01
chloride channel complex | 3 | 1.9 | 7.40E-02 | 5.30E-01
I band | 3 | 1.9 | 7.40E-02 | 5.30E-01
response to mechanical stimulus | 3 | 1.9 | 7.70E-02 | 9.20E-01
response to mechanical stimulus | 3 | 1.9 | 8.20E-02 | 9.10E-01
GPCR, family 2-like | 3 | 1.9 | 8.50E-02 | 9.40E-01
repeat:TPR 6 | 3 | 1.9 | 9.80E-02 | 9.90E-01
detection of abiotic stimulus | 3 | 1.9 | 1.00E-01 | 9.40E-01
polycystin complex | 2 | 1.3 | 1.60E-02 | 2.50E-01
region of interest:Dihydropyridine binding | 2 | 1.3 | 3.10E-02 | 9.70E-01
region of interest:Phenylalkylamine binding | 2 | 1.3 | 3.10E-02 | 9.70E-01
Voltage-dependent , L-type, alpha-1 subunit | 2 | 1.3 | 3.20E-02 | 1.00E+00
short sequence motif:Polycystin motif | 2 | 1.3 | 3.80E-02 | 9.60E-01
HMG-I and HMG-Y, DNA-binding, conserved site | 2 | 1.3 | 4.00E-02 | 9.60E-01
PIRSF005657:voltage-gated calcium channel | 2 | 1.3 | 4.50E-02 | 9.40E-01
motile primary cilium | 2 | 1.3 | 4.80E-02 | 4.80E-01
dendrite cytoplasm | 2 | 1.3 | 4.80E-02 | 4.80E-01
cross-link:Glycyl lysine isopeptide (Lys-Gly) (interchain with G-Cter in NEDD8) | 2 | 1.3 | 5.30E-02 | 9.80E-01
region of interest:Binding to the beta subunit | 2 | 1.3 | 5.30E-02 | 9.80E-01
cell projection cytoplasm | 2 | 1.3 | 5.60E-02 | 5.20E-01
bicarbonate transport | 2 | 1.3 | 5.60E-02 | 9.30E-01
Voltage-dependent calcium channel, alpha-1 subunit, IQ domain | 2 | 1.3 | 5.60E-02 | 9.60E-01
bicarbonate transport | 2 | 1.3 | 5.80E-02 | 8.90E-01
chondroitin sulfate biosynthetic process | 2 | 1.3 | 6.40E-02 | 9.20E-01
Polycystin cation channel, PKD1/PKD2 | 2 | 1.3 | 6.40E-02 | 9.40E-01
chondroitin sulfate biosynthetic process | 2 | 1.3 | 6.60E-02 | 9.00E-01
sulfotransferase | 2 | 1.3 | 6.70E-02 | 6.00E-01
UTP biosynthetic process | 2 | 1.3 | 7.10E-02 | 9.20E-01
UTP metabolic process | 2 | 1.3 | 7.10E-02 | 9.20E-01
GTP biosynthetic process | 2 | 1.3 | 7.10E-02 | 9.20E-01
Cullin homology | 2 | 1.3 | 7.20E-02 | 9.40E-01
Voltage-dependent calcium channel, alpha-1 subunit | 2 | 1.3 | 7.20E-02 | 9.40E-01
nucleoside diphosphate kinase activity | 2 | 1.3 | 7.40E-02 | 7.10E-01
GTP biosynthetic process | 2 | 1.3 | 7.40E-02 | 9.00E-01
UTP biosynthetic process | 2 | 1.3 | 7.40E-02 | 9.00E-01
UTP metabolic process | 2 | 1.3 | 7.40E-02 | 9.00E-01
site:Calcium ion selectivity and permeability | 2 | 1.3 | 7.50E-02 | 9.90E-01
zinc finger region:PHD-type 3 | 2 | 1.3 | 7.50E-02 | 9.90E-01
CULLIN | 2 | 1.3 | 7.80E-02 | 8.90E-01
pyrimidine ribonucleoside triphosphate metabolic process | 2 | 1.3 | 7.90E-02 | 9.10E-01
CTP biosynthetic process | 2 | 1.3 | 7.90E-02 | 9.10E-01
CTP metabolic process | 2 | 1.3 | 7.90E-02 | 9.10E-01
pyrimidine ribonucleoside triphosphate biosynthetic process | 2 | 1.3 | 7.90E-02 | 9.10E-01
Cullin, conserved site | 2 | 1.3 | 7.90E-02 | 9.40E-01
Cullin, N-terminal region | 2 | 1.3 | 7.90E-02 | 9.40E-01
Cullin, N-terminal | 2 | 1.3 | 7.90E-02 | 9.40E-01
CTP metabolic process | 2 | 1.3 | 8.20E-02 | 9.20E-01
pyrimidine ribonucleoside triphosphate biosynthetic process | 2 | 1.3 | 8.20E-02 | 9.20E-01
CTP biosynthetic process | 2 | 1.3 | 8.20E-02 | 9.20E-01
pyrimidine ribonucleoside triphosphate metabolic process | 2 | 1.3 | 8.20E-02 | 9.20E-01
chondroitin sulfate proteoglycan biosynthetic process | 2 | 1.3 | 8.70E-02 | 9.20E-01
DNA-binding region:A.T hook 1 | 2 | 1.3 | 8.90E-02 | 9.90E-01
DNA-binding region:A.T hook 2 | 2 | 1.3 | 8.90E-02 | 9.90E-01
fatty-acid ligase activity | 2 | 1.3 | 8.90E-02 | 6.90E-01
chondroitin sulfate proteoglycan biosynthetic process | 2 | 1.3 | 9.00E-02 | 9.10E-01
pyrimidine nucleoside triphosphate biosynthetic process | 2 | 1.3 | 9.40E-02 | 9.30E-01
GTP metabolic process | 2 | 1.3 | 9.40E-02 | 9.30E-01
endoplasmic reticulum organization | 2 | 1.3 | 9.40E-02 | 9.30E-01
GTP metabolic process | 2 | 1.3 | 9.70E-02 | 9.20E-01
endoplasmic reticulum organization | 2 | 1.3 | 9.70E-02 | 9.20E-01
pyrimidine nucleoside triphosphate biosynthetic process | 2 | 1.3 | 9.70E-02 | 9.20E-01
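
The Benjamini-Hochberg corrected P-values in these tables are those reported by DAVID. As a reminder of what such an adjustment does, the following is a minimal Python sketch of the standard Benjamini-Hochberg step-up procedure. It is illustrative only: the function name `benjamini_hochberg` is hypothetical, the input is a toy list rather than data from the tables, and because the correction depends on the full set of terms tested (which is not listed here) the sketch is not expected to reproduce the tabulated values.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest, so that each
    # adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Toy input (hypothetical values, not taken from the tables).
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone in the raw p-values.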

(ii) 23 genes disrupted in MRC5HPVE6E7 sample

Columns: DAVID assigned GO Term | GO category gene count | % genes within GO category | P-Value | Benjamini-Hochberg corrected P-value

alternative splicing | 13 | 68.4 | 8.20E-03 | 4.90E-01
splice variant | 13 | 68.4 | 8.40E-03 | 4.10E-01
membrane | 10 | 52.6 | 6.50E-02 | 6.00E-01
cytoplasm | 7 | 36.8 | 5.90E-02 | 6.30E-01
topological domain:Cytoplasmic | 7 | 36.8 | 6.40E-02 | 9.40E-01
Golgi apparatus | 5 | 26.3 | 6.80E-03 | 4.40E-01
topological domain:Lumenal | 4 | 21.1 | 6.60E-03 | 5.70E-01
golgi apparatus | 4 | 21.1 | 1.40E-02 | 4.40E-01
negative regulation of cell communication | 3 | 15.8 | 2.30E-02 | 5.20E-01
ionic channel | 3 | 15.8 | 3.10E-02 | 5.80E-01
skeletal system development | 3 | 15.8 | 3.60E-02 | 6.10E-01
gated channel activity | 3 | 15.8 | 4.30E-02 | 8.60E-01
Signal-anchor | 3 | 15.8 | 5.20E-02 | 6.70E-01
ion channel activity | 3 | 15.8 | 6.30E-02 | 8.60E-01
substrate specific channel activity | 3 | 15.8 | 6.70E-02 | 7.90E-01
channel activity | 3 | 15.8 | 7.10E-02 | 7.30E-01
passive transmembrane transporter activity | 3 | 15.8 | 7.10E-02 | 6.70E-01
protein complex assembly | 3 | 15.8 | 8.30E-02 | 7.40E-01
protein complex biogenesis | 3 | 15.8 | 8.30E-02 | 7.40E-01
ion transport | 3 | 15.8 | 9.10E-02 | 6.70E-01
ubl conjugation | 3 | 15.8 | 9.40E-02 | 6.40E-01
chondroitin sulfate biosynthetic process | 2 | 10.5 | 7.70E-03 | 8.60E-01
chondroitin sulfate proteoglycan biosynthetic process | 2 | 10.5 | 1.10E-02 | 7.40E-01
endoplasmic reticulum organization | 2 | 10.5 | 1.10E-02 | 6.20E-01
chondroitin sulfate metabolic process | 2 | 10.5 | 1.30E-02 | 5.70E-01
chondroitin sulfate proteoglycan metabolic process | 2 | 10.5 | 1.70E-02 | 5.80E-01
glycosaminoglycan biosynthetic process | 2 | 10.5 | 2.00E-02 | 5.70E-01
Chondroitin sulfate biosynthesis | 2 | 10.5 | 2.10E-02 | 2.60E-01
aminoglycan biosynthetic process | 2 | 10.5 | 2.20E-02 | 5.50E-01
proteoglycan biosynthetic process | 2 | 10.5 | 2.50E-02 | 5.00E-01
ubiquitin protein ligase binding | 2 | 10.5 | 3.80E-02 | 9.70E-01
proteoglycan metabolic process | 2 | 10.5 | 4.10E-02 | 6.10E-01
polysaccharide biosynthetic process | 2 | 10.5 | 4.20E-02 | 6.00E-01
sulfur compound biosynthetic process | 2 | 10.5 | 4.70E-02 | 6.10E-01
glycosaminoglycan metabolic process | 2 | 10.5 | 5.20E-02 | 6.10E-01
aminoglycan metabolic process | 2 | 10.5 | 6.10E-02 | 6.50E-01
carbohydrate biosynthetic process | 2 | 10.5 | 9.80E-02 | 7.80E-01
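
The percentage column can be sanity-checked against the gene counts. The short Python sketch below back-calculates the denominator implied by a few rows of table (ii); the helper name `implied_total` is hypothetical. For these rows the implied denominator is about 19 rather than the 23 genes named in the heading. One possible reading is that the percentages are taken over the subset of genes that DAVID could map to annotations, although the source does not state this.

```python
def implied_total(gene_count, percent):
    """Back-calculate the denominator implied by a count and a percentage."""
    return gene_count / (percent / 100.0)


# (term, gene count, % genes within GO category), copied from table (ii).
rows = [
    ("alternative splicing", 13, 68.4),
    ("membrane", 10, 52.6),
    ("cytoplasm", 7, 36.8),
    ("Golgi apparatus", 5, 26.3),
    ("ion transport", 3, 15.8),
]

if __name__ == "__main__":
    for term, count, percent in rows:
        print(f"{term}: implied total = {implied_total(count, percent):.2f}")
```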

(iii) 22 genes disrupted in Supra-A-NHEJ samples (HCT116 LIG4-/- and LIG3-/-:NC3)

Columns: DAVID assigned GO Term | GO category gene count | % genes within GO category | P-Value | Benjamini-Hochberg corrected P-value

adenyl nucleotide binding | 5 | 23.8 | 3.50E-02 | 9.40E-01
purine nucleoside binding | 5 | 23.8 | 3.70E-02 | 7.70E-01
nucleoside binding | 5 | 23.8 | 3.80E-02 | 6.30E-01
purine nucleotide binding | 5 | 23.8 | 6.60E-02 | 7.30E-01
apoptosis | 4 | 19 | 1.80E-02 | 9.60E-01
programmed cell death | 4 | 19 | 1.90E-02 | 8.20E-01
cell death | 4 | 19 | 2.90E-02 | 8.30E-01
death | 4 | 19 | 2.90E-02 | 7.40E-01
Cell division and chromosome partitioning | 3 | 14.3 | 8.50E-03 | 1.70E-02
Apoptosis | 3 | 14.3 | 5.40E-02 | 9.70E-01
oxidoreductase activity, acting on NADH or NADPH | 2 | 9.5 | 6.80E-02 | 6.60E-01

(iv) 41 genes disrupted in Supra-C-NHEJ samples (HCT116 PARP1-/- and LIG3-/-)

Columns: DAVID assigned GO Term | GO category gene count | % genes within GO category | P-Value | Benjamini-Hochberg corrected P-value

phosphoprotein | 19 | 59.4 | 4.00E-03 | 1.90E-01
alternative splicing | 17 | 53.1 | 3.90E-02 | 7.50E-01
splice variant | 16 | 50 | 8.40E-02 | 9.80E-01
mutagenesis site | 7 | 21.9 | 7.20E-02 | 1.00E+00
growth regulation | 3 | 9.4 | 2.70E-03 | 2.40E-01
regulation of growth | 3 | 9.4 | 8.20E-02 | 1.00E+00
response to hormone stimulus | 3 | 9.4 | 9.30E-02 | 1.00E+00
Signal transduction inhibitor | 2 | 6.2 | 5.10E-02 | 7.40E-01
JAK-STAT cascade | 2 | 6.2 | 5.30E-02 | 1.00E+00
domain:SOCS box | 2 | 6.2 | 5.40E-02 | 1.00E+00
polyol metabolic process | 2 | 6.2 | 5.50E-02 | 1.00E+00
SOCS | 2 | 6.2 | 5.70E-02 | 7.80E-01
SOCS protein, C-terminal | 2 | 6.2 | 6.00E-02 | 1.00E+00
repeat:TPR 7 | 2 | 6.2 | 8.30E-02 | 9.90E-01
repeat:TPR 6 | 2 | 6.2 | 9.50E-02 | 9.70E-01

Supplementary Table 3C:

Enrichment of DNA binding motifs defined by the Molecular Signatures Database (MSigDB v. 5.0) within lists of all 176 genes disrupted by inter-chromosomal telomere fusions in crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 identified using Gene Set Enrichment Analysis (GSEA v.2.2.0; Broad Institute)
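
The k/K values in this table are the fraction of each MSigDB gene set (K genes) that overlaps the experimental gene list (k genes). The Python sketch below recomputes that ratio for the CTTTGT_V$LEF1_Q2 gene set and shows one way a hypergeometric tail probability for such an overlap can be calculated. The function names are hypothetical, and the background size `universe` is an assumed placeholder because the value used in the analysis is not given here, so the printed probability is not expected to match the tabulated P-value.

```python
from math import comb


def overlap_fraction(k, K):
    """Fraction of a gene set of size K found in the overlap (k genes)."""
    return k / K


def hypergeom_tail(k, K, n, universe):
    """P(overlap >= k) when n genes are drawn without replacement from
    `universe` genes, K of which belong to the gene set."""
    total = comb(universe, n)
    upper = min(K, n)
    favourable = sum(
        comb(K, i) * comb(universe - K, n - i) for i in range(k, upper + 1)
    )
    return favourable / total


if __name__ == "__main__":
    # CTTTGT_V$LEF1_Q2: K = 1972 genes in the set, k = 29 genes in the overlap.
    print(round(overlap_fraction(29, 1972), 4))
    # n = 176 disrupted genes (from the table title);
    # universe = 20000 is an assumed background size, not a value from the source.
    print(hypergeom_tail(29, 1972, 176, 20000))
```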

Columns: MSigDB gene set name | Number of genes in gene set (K) | Description of gene set | Number of experimental gene list genes in overlap (k) | k/K | P-value | False Discovery Rate q-value

CTTTGT_V$LEF1_Q2 | 1972 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CTTTGT which matches annotation for LEF1: lymphoid enhancer-binding factor 1 | 29 | 0.0147 | 9.35E-12 | 7.81E-09
TGGAAA_V$NFAT_Q4_01 | 1896 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TGGAAA which matches annotation for NFAT NFATC | 25 | 0.0132 | 2.82E-09 | 1.18E-06
CAGGTA_V$AREB6_01 | 792 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CAGGTA which matches annotation for TCF8: transcription factor 8 (represses interleukin 2 expression) | 16 | 0.0202 | 8.39E-09 | 2.34E-06
GGGCGGR_V$SP1_Q6 | 2940 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GGGCGGR which matches annotation for SP1: Sp1 transcription factor | 30 | 0.0102 | 2.20E-08 | 4.59E-06
AACTTT_UNKNOWN | 1890 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif AACTTT. Motif does not match any known transcription factor | 23 | 0.0122 | 5.64E-08 | 9.43E-06
TTGTTT_V$FOXO4_01 | 2061 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TTGTTT which matches annotation for MLLT7: myeloid/lymphoid or mixed-lineage leukemia (trithorax homolog, Drosophila); translocated to, 7 | 23 | 0.0112 | 2.62E-07 | 3.65E-05
CAGGTG_V$E12_Q6 | 2485 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CAGGTG which matches annotation for TCF3: transcription factor 3 (E2A immunoglobulin enhancer binding factors E12/E47) | 25 | 0.0101 | 5.12E-07 | 6.12E-05
V$FOXJ2_02 | 237 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif AYMATAATATTTKN which matches annotation for FOXJ2: forkhead box J2 | 8 | 0.0338 | 1.25E-06 | 1.19E-04
V$EVI1_04 | 238 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif DGATADGAHWAGATA which matches annotation for EVI1: ecotropic viral integration site 1 | 8 | 0.0336 | 1.29E-06 | 1.19E-04
V$GATA1_02 | 244 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNNNNGATANKGNN which matches annotation for GATA1: GATA binding protein 1 (globin transcription factor 1) | 8 | 0.0328 | 1.55E-06 | 1.29E-04
GCAAAAA,MIR-129 | 183 | Targets of MicroRNA GCAAAAA,MIR-129 | 7 | 0.0383 | 2.58E-06 | 1.96E-04
TGACAGNY_V$MEIS1_01 | 827 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TGACAGNY which matches annotation for MEIS1: Meis1, myeloid ecotropic viral integration site 1 homolog (mouse) | 13 | 0.0157 | 3.61E-06 | 2.51E-04
GGGAGGRR_V$MAZ_Q6 | 2274 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GGGAGGRR which matches annotation for MAZ: MYC-associated zinc finger protein (purine-binding transcription factor) | 22 | 0.0097 | 5.03E-06 | 3.23E-04
ATATGCA,MIR-448 | 212 | Targets of MicroRNA ATATGCA,MIR-448 | 7 | 0.033 | 6.78E-06 | 3.96E-04
AAGCACA,MIR-218 | 398 | Targets of MicroRNA AAGCACA,MIR-218 | 9 | 0.0226 | 7.10E-06 | 3.96E-04
WGTTNNNNNAAA_UNKNOWN | 547 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif WGTTNNNNNAAA. Motif does not match any known transcription factor | 10 | 0.0183 | 1.39E-05 | 7.26E-04
TGCACTT,MIR-519C,MIR-519B,MIR-519A | 448 | Targets of MicroRNA TGCACTT,MIR-519C,MIR-519B,MIR-519A | 9 | 0.0201 | 1.81E-05 | 8.91E-04
V$PAX6_01 | 101 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNNNTTCACGCWTGANTKNNN which matches annotation for PAX6: paired box gene 6 (aniridia, keratitis) | 5 | 0.0495 | 2.13E-05 | 9.28E-04
V$AREB6_02 | 254 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif WNWCACCTGWNN which matches annotation for TCF8: transcription factor 8 (represses interleukin 2 expression) | 7 | 0.0276 | 2.18E-05 | 9.28E-04
TGTTTAC,MIR-30A-5P,MIR-30C,MIR-30D,MIR-30B,MIR-30E-5P | 579 | Targets of MicroRNA TGTTTAC,MIR-30A-5P,MIR-30C,MIR-30D,MIR-30B,MIR-30E-5P | 10 | 0.0173 | 2.26E-05 | 9.28E-04
V$ATF4_Q2 | 258 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CVTGACGYMABG which matches annotation for ATF4: activating transcription factor 4 (tax-responsive enhancer element B67) | 7 | 0.0271 | 2.41E-05 | 9.28E-04
TATAAA_V$TATA_01 | 1296 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TATAAA which matches annotation for TAF TATA | 15 | 0.0116 | 2.44E-05 | 9.28E-04
V$HOXA4_Q2 | 267 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif AWAATTRG which matches annotation for HOXA4: homeobox A4 | 7 | 0.0262 | 3.00E-05 | 1.07E-03
TCCAGAT,MIR-516-5P | 109 | Targets of MicroRNA TCCAGAT,MIR-516-5P | 5 | 0.0459 | 3.08E-05 | 1.07E-03
WTGAAAT_UNKNOWN | 616 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif WTGAAAT. Motif does not match any known transcription factor | 10 | 0.0162 | 3.80E-05 | 1.27E-03
RTAAACA_V$FREAC2_01 | 919 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif RTAAACA which matches annotation for FOXF2: forkhead box F2 | 12 | 0.0131 | 5.29E-05 | 1.70E-03
V$PPARA_02 | 129 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNRGGTCATWGGGGTSANG which matches annotation for PPARA: peroxisome proliferative activated receptor, alpha | 5 | 0.0388 | 6.89E-05 | 2.13E-03
TTANTCA_UNKNOWN | 952 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif TTANTCA. Motif does not match any known transcription factor | 12 | 0.0126 | 7.39E-05 | 2.21E-03
TAATTA_V$CHX10_01 | 810 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TAATTA which matches annotation for VSX1: visual system homeobox 1 homolog, CHX10-like (zebrafish) | 11 | 0.0136 | 7.73E-05 | 2.23E-03
TGCGCANK_UNKNOWN | 545 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif TGCGCANK. Motif does not match any known transcription factor | 9 | 0.0165 | 8.22E-05 | 2.28E-03
YNTTTNNNANGCARM_UNKNOWN | 70 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif YNTTTNNNANGCARM. Motif does not match any known transcription factor | 4 | 0.0571 | 8.46E-05 | 2.28E-03
CTTTAAR_UNKNOWN | 972 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif CTTTAAR. Motif does not match any known transcription factor | 12 | 0.0123 | 8.98E-05 | 2.35E-03
YCATTAA_UNKNOWN | 556 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif YCATTAA. Motif does not match any known transcription factor | 9 | 0.0162 | 9.56E-05 | 2.42E-03
TTCYRGAA_UNKNOWN | 326 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif TTCYRGAA. Motif does not match any known transcription factor | 7 | 0.0215 | 1.05E-04 | 2.58E-03
V$BRN2_01 | 237 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNCATNSRWAATNMRN which matches annotation for POU3F2: POU domain, class 3, transcription factor 2 | 6 | 0.0253 | 1.36E-04 | 3.23E-03
V$CHOP_01 | 238 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNRTGCAATMCCC which matches annotation for DDIT3: DNA-damage-inducible transcript 3 CEBPA DIFF GENES | 6 | 0.0252 | 1.39E-04 | 3.23E-03
V$CEBPDELTA_Q6 | 240 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif MATTKCNTMAYY which matches annotation for CEBPD: CCAAT/enhancer binding protein (C/EBP), delta | 6 | 0.025 | 1.46E-04 | 3.29E-03
ACCTGTTG_UNKNOWN | 154 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif ACCTGTTG. Motif does not match any known transcription factor | 5 | 0.0325 | 1.58E-04 | 3.45E-03
CTCCAAG,MIR-432 | 83 | Targets of MicroRNA CTCCAAG,MIR-432 | 4 | 0.0482 | 1.64E-04 | 3.45E-03
SCGGAAGY_V$ELK1_02 | 1199 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif SCGGAAGY which matches annotation for ELK1: ELK1, member of ETS oncogene family | 13 | 0.0108 | 1.65E-04 | 3.45E-03
ACACTGG,MIR-199A,MIR-199B | 157 | Targets of MicroRNA ACACTGG,MIR-199A,MIR-199B | 5 | 0.0318 | 1.73E-04 | 3.53E-03
GAGCTGG,MIR-337 | 159 | Targets of MicroRNA GAGCTGG,MIR-337 | 5 | 0.0314 | 1.84E-04 | 3.66E-03
ATGTTTC,MIR-494 | 162 | Targets of MicroRNA ATGTTTC,MIR-494 | 5 | 0.0309 | 2.01E-04 | 3.90E-03
CTTTGA_V$LEF1_Q2 | 1232 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CTTTGA which matches annotation for LEF1: lymphoid enhancer-binding factor 1 | 13 | 0.0106 | 2.15E-04 | 4.00E-03
V$IRF1_Q6 | 258 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TTCACTT which matches annotation for IRF1: interferon regulatory factor 1 | 6 | 0.0233 | 2.15E-04 | 4.00E-03
V$IPF1_Q4 | 260 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GHNNTAATGACM which matches annotation for IPF1: insulin promoter factor 1, homeodomain transcription factor | 6 | 0.0231 | 2.24E-04 | 4.07E-03
V$NF1_Q6 | 261 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNTTGGCNNNNNNCCNNN which matches annotation for NF1: neurofibromin 1 (neurofibromatosis, von Recklinghausen disease, Watson disease) | 6 | 0.023 | 2.29E-04 | 4.07E-03
V$CREB_Q2 | 263 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NSTGACGTAANN which matches annotation for CREB1: cAMP responsive element binding protein 1 | 6 | 0.0228 | 2.38E-04 | 4.15E-03
V$NFAT_Q4_01 | 266 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NWGGAAANWN which matches annotation for NFAT NFATC | 6 | 0.0226 | 2.53E-04 | 4.32E-03
V$CREB_Q4 | 268 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NSTGACGTMANN which matches annotation for CREB1: cAMP responsive element binding protein 1 | 6 | 0.0224 | 2.64E-04 | 4.32E-03
V$OCT_C | 268 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif CTNATTTGCATAY. Motif does not match any known transcription factor | 6 | 0.0224 | 2.64E-04 | 4.32E-03
V$NKX22_01 | 190 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TTAAGTRSTT which matches annotation for NKX2-2: NK2 transcription factor related, locus 2 (Drosophila) | 5 | 0.0263 | 4.18E-04 | 6.71E-03
YYCATTCAWW_UNKNOWN | 191 | Genes with promoter regions [-2kb,2kb] around transcription start site containing motif YYCATTCAWW. Motif does not match any known transcription factor | 5 | 0.0262 | 4.28E-04 | 6.75E-03
CAGCTG_V$AP4_Q5 | 1524 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CAGCTG which matches annotation for REPIN1: replication initiator 1 | 14 | 0.0092 | 4.96E-04 | 7.68E-03
GTGCCAA,MIR-96 | 303 | Targets of MicroRNA GTGCCAA,MIR-96 | 6 | 0.0198 | 5.05E-04 | 7.68E-03
V$CEBP_C | 200 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NGWVTKNKGYAAKNSAYA which matches annotation for CEBPA: CCAAT/enhancer binding protein (C/EBP), alpha | 5 | 0.025 | 5.27E-04 | 7.86E-03
GCCATNTTG_V$YY1_Q6 | 427 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GCCATNTTG which matches annotation for YY1: YY1 transcription factor | 7 | 0.0164 | 5.36E-04 | 7.86E-03
CACGTG_V$MYC_Q2 | 1032 | Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CACGTG which matches annotation for MYC: v-myc myelocytomatosis viral oncogene homolog (avian) | 11 | 0.0107 | 6.10E-04 | 8.79E-03
TTTGCAG,MIR-518A-2 | 210 | Targets of MicroRNA TTTGCAG,MIR-518A-2 | 5 | 0.0238 | 6.57E-04 | 9.31E-03
AGCYRWTTC_UNKNOWN | 122 | Genes with promoter regions
[-2kb,2kb] around transcription start site containing motif AGCYRWTTC. Motif does not match any known transcription factor 4 0.0328 7.10E-04 9.88E-03 GGATTA_V$PITX2_Q2 587 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GGATTA which matches annotation for PITX2: paired-like homeodomain transcription factor 2 8 0.0136 7.29E-04 9.88E-03 ACTGTAG,MIR-139 123 Targets of MicroRNA ACTGTAG,MIR-139 4 0.0325 7.32E-04 9.88E-03 GCACTTT,MIR-17-5P,MIR-20A,MIR-106A,MIR-106B,MIR-20B,MIR-519D 595 Targets of MicroRNA GCACTTT,MIR-17-5P,MIR-20A,MIR-106A,MIR-106B,MIR-20B,MIR-519D 8 0.0134 7.95E-04 1.06E-02 ATGCTGC,MIR-103,MIR-107 221 Targets of MicroRNA ATGCTGC,MIR-103,MIR-107 5 0.0226 8.26E-04 1.08E-02 V$PAX2_01 58 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNNNGTCANGNRTKANNNN which matches annotation for PAX2: paired box gene 2 3 0.0517 9.22E-04 1.19E-02 RNGTGGGC_UNKNOWN 766 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif RNGTGGGC. 
Motif does not match any known transcription factor 9 0.0117 9.78E-04 1.24E-02 V$HNF6_Q6 234 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif HWAAATCAATAW which matches annotation for ONECUT1: one cut domain, family member 1 5 0.0214 1.07E-03 1.33E-02 V$NKX3A_01 237 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NWATAAGTATWT which matches annotation for NKX3-1: NK3 transcription factor related, locus 1 (Drosophila) 5 0.0211 1.13E-03 1.39E-02 CTAWWWATA_V$RSRFC4_Q2 361 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif CTAWWWATA which matches annotation for MEF2A: MADS box transcription enhancer factor 2, polypeptide A (myocyte enhancer factor 2A) 6 0.0166 1.25E-03 1.50E-02 ATGTTAA,MIR-302C 243 Targets of MicroRNA ATGTTAA,MIR-302C 5 0.0206 1.26E-03 1.50E-02 GACAATC,MIR-219 143 Targets of MicroRNA GACAATC,MIR-219 4 0.028 1.28E-03 1.51E-02 V$GATA1_04 245 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNCWGATARNNNN which matches annotation for GATA1: GATA binding protein 1 (globin transcription factor 1) 5 0.0204 1.31E-03 1.52E-02 V$YY1_01 246 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNNNNCCATNTWNNNWN which matches annotation for YY1: YY1 transcription factor 5 0.0203 1.33E-03 1.52E-02 V$HMGIY_Q6 248 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GGAAAWT which matches annotation for HMGA1: high mobility group AT-hook 1 5 0.0202 1.38E-03 1.56E-02 V$IRF1_01 250 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif SAAAAGYGAAACC which matches annotation for IRF1: interferon regulatory factor 1 5 0.02 1.43E-03 1.57E-02 V$CEBP_Q3 251 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNNTKNNGNAAN which matches annotation for CEBPA: 
CCAAT/enhancer binding protein (C/EBP), alpha 5 0.0199 1.45E-03 1.57E-02 V$T3R_Q6 251 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif MNTGWCCTN. Motif does not match any known transcription factor 5 0.0199 1.45E-03 1.57E-02 GCGSCMNTTT_UNKNOWN 69 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif GCGSCMNTTT. Motif does not match any known transcription factor 3 0.0435 1.53E-03 1.57E-02 V$CDX2_Q5 254 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif ANANTTTTATKRCC which matches annotation for CDX2: caudal type homeobox transcription factor 2 5 0.0197 1.53E-03 1.57E-02 V$HLF_01 254 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif GTTACRYAAT which matches annotation for HLF: hepatic leukemia factor 5 0.0197 1.53E-03 1.57E-02 V$NGFIC_01 255 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif WTGCGTGGGYGG which matches annotation for EGR4: early growth response 4 5 0.0196 1.56E-03 1.57E-02 V$USF_01 256 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif NNRYCACGTGRYNN. Motif does not match any known transcription factor 5 0.0195 1.58E-03 1.57E-02 TTTGCAC,MIR-19A,MIR-19B 516 Targets of MicroRNA TTTGCAC,MIR-19A,MIR-19B 7 0.0136 1.60E-03 1.57E-02 TGACATY_UNKNOWN 665 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif TGACATY. 
Motif does not match any known transcription factor 8 0.012 1.61E-03 1.57E-02 GTTTGTT,MIR-495 257 Targets of MicroRNA GTTTGTT,MIR-495 5 0.0195 1.61E-03 1.57E-02 V$DBP_Q6 257 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif AGCAHAC which matches annotation for DBP: D site of albumin promoter (albumin D-box) binding protein 5 0.0195 1.61E-03 1.57E-02 V$TST1_01 262 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNKGAATTAVAVTDN which matches annotation for POU3F1: POU domain, class 3, transcription factor 1 5 0.0191 1.75E-03 1.68E-02 V$AML_Q6 266 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNGKNTGTGGTTWNC which matches annotation for RUNX1: runt-related transcription factor 1 (acute myeloid leukemia 1; aml1 oncogene) 5 0.0188 1.87E-03 1.71E-02 V$OCT1_01 266 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNNNWTATGCAAATNTNNN which matches annotation for POU2F1: POU domain, class 2, transcription factor 1 5 0.0188 1.87E-03 1.71E-02 V$CEBP_Q2_01 267 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NTTRCNNAANNN which matches annotation for CEBPA: CCAAT/enhancer binding protein (C/EBP), alpha 5 0.0187 1.90E-03 1.71E-02 V$IK2_01 267 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif NNNTGGGAWNNC. 
Motif does not match any known transcription factor 5 0.0187 1.90E-03 1.71E-02 V$OCT1_B 267 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif TATGCAAATN which matches annotation for POU2F1: POU domain, class 2, transcription factor 1 5 0.0187 1.90E-03 1.71E-02 V$NF1_Q6_01 268 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NTGGNNNNNNGCCAANN which matches annotation for NF1: neurofibromin 1 (neurofibromatosis, von Recklinghausen disease, Watson disease) 5 0.0187 1.93E-03 1.71E-02 CAGNYGKNAAA_UNKNOWN 75 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif CAGNYGKNAAA. Motif does not match any known transcription factor 3 0.04 1.94E-03 1.71E-02 GAANYNYGACNY_UNKNOWN 75 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif GAANYNYGACNY. Motif does not match any known transcription factor 3 0.04 1.94E-03 1.71E-02 V$CRX_Q4 269 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif YNNNTAATCYCMN which matches annotation for CRX: cone-rod homeobox 5 0.0186 1.97E-03 1.71E-02 KTGGYRSGAA_UNKNOWN 76 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif KTGGYRSGAA. Motif does not match any known transcription factor 3 0.0395 2.01E-03 1.73E-02 V$AP1_Q4 271 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif RGTGACTMANN which matches annotation for JUN: jun oncogene 5 0.0185 2.03E-03 1.73E-02 YKACATTT_UNKNOWN 276 Genes with promoter regions [-2kb,2kb] around transcription start site containing motif YKACATTT. 
Motif does not match any known transcription factor 5 0.0181 2.20E-03 1.86E-02
V$RFX1_02 278 Genes with promoter regions [-2kb,2kb] around transcription start site containing the motif NNGTNRCNATRGYAACNN which matches annotation for RFX1: regulatory factor X, 1 (influences HLA class II expression) 5 0.018 2.27E-03 1.89E-02

Supplementary Table 4:

17p subtelomeric coordinates and sequences of all intra-chromosomal fusion junctions mapped using soft-clipped Illumina HiSeq 2000 paired-end sequence reads from crisis-stage MRC5HPVE6E7 and 17p TALEN-treated HCT116 cells
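
The junction labels in this table take forms such as 109:telomere, 1269:494, 3015:2672:141 and 2983:3008-3016:3008. The following is a minimal Python sketch of how such labels could be split into their components for downstream tabulation. The function name parse_junction is hypothetical, and a hyphenated token is simply kept as a pair of integers; the source does not state how hyphenated coordinates should be interpreted, so no meaning is assigned to them here.

```python
def parse_junction(label):
    """Split a junction label such as '109:telomere' into its components.

    Assumption (not from the source): components are separated by ':',
    each one being 'telomere', a single coordinate, or a hyphenated pair.
    """
    components = []
    # Tolerate stray spaces and trailing commas left over from table extraction.
    cleaned = label.replace(" ", "").strip(",")
    for token in cleaned.split(":"):
        if token == "telomere":
            components.append("telomere")
        elif "-" in token:
            # Keep hyphenated coordinates as an uninterpreted pair of integers.
            first, second = token.split("-")
            components.append((int(first), int(second)))
        else:
            components.append(int(token))
    return components


if __name__ == "__main__":
    for example in ["109:telomere", "3015:2672:141", "2983:3008-3016:3008"]:
        print(example, "->", parse_junction(example))
```

Keeping the components as a flat list leaves multi-part events (three or more components) distinguishable from simple two-part junctions by list length alone.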

17p junction coordinates (chromatid 1: chromatid 2)    Junction-proximal sequence (chromatid 1 chromatid 2)

Sample: Crisis-stage MRC5HPVE6E7
109:telomere GCAACTTCCAGTAGTA ACCCTAACCCTAACCCT
415:telomere TCCTGCATATTGACC CTG ACCCTAACCCCTAACC
679:telomere GGTTATCTCTTTATG ACCCTAACCCTGA
797:telomere GTTGTAAAAGTTC CGTT ACCCTAACCCTAACCC
956:telomere TGTACGAAGTAATGTG ACCCTGACCCTGACCC
1024:telomere AGTGGGGCTGGCTGCC CTAACCCTAACCCTAA
1219:telomere TGGGAGGATCATATC CCTAACCCTAACCCT
1269:494 TACCTCACCTTGTGC TGATTCCATTTGA
1451:telomere GTAACCTGGGAACC TAACCCCTAACCCT
1587:telomere GCAATTATAAGCTTTAC AACCCTAACCCTAA
1906:telomere TAAGGCAAGAGCCT AACCCTAACCCTA
2109:telomere CCCAAGATTTGGT A CCTAACCCTAA
2219:telomere CATTTTGTCTTTCAAC CCTAACCCTAACCC
2260:telomere AAAGTTGTAAATTT AACCCTAACCCT
2325:telomere AGATGTAAGAAAC CCTGACCCTGACCC
2416:telomere CTTTGATCCATTGTAA CCCTAACCCTAACCC
2452:telomere GTAAGGATACAAT CTAACCCTGACCC
2512:telomere TTGAGGACAGGATTCT AACCCTAACCCTAA
2624:telomere GTTTATTCCTGTGCA CCCTAACCCTAA
2710:telomere TCCACTTCTGTCCC TAACCCTAACCC
2820:telomere AAGATACATTTTTAC CCTAACCCTAACC
2891:telomere TTGTCGTTTCATTC TCCTAACCCTGA
3012:telomere TTTGGCTAGGCACCC TGACCCTGACCCTGA
3060:telomere TGAGGAAGGGTTAG CGATAGGGA TAACCCTAACCCT
3070:telomere TAGGGTTAGGGTTA ACCCTAACCCTAA
telomere:telomere TTAGGGGTTAGGGT AACCCTAACCCT

Sample: t48 HCT116 WT
46:2925 GCATGGTATTGACAT ACCAATAGCAATATT
103:2999 TATTTCTAGGCAA C(A)TTCCA TTTTATATGTATATG
103:2763 TTTCTAGGCAACTTCCA CAAAGCAATCCGTGGA
124:3006 AGTACAAATACTACT ACTAGTA CTAGCCAAACATTC
140:2967 GTTTCTTCCCTTAGA GATTATTCTCTAAA
159:3004 CCAACAGTTATTGCT AGCCAAACATTCCAT
227:2952 GACACTCTCCCAG TGT CCAAATTATGAAAAT
199:3008 TACTCCGGTATGT C GCCTAGCCAAACA
216:2532 ACTGAAAGCATGGA ATGAATAATCTCTA
244:2532 AGGTAACTACCACATAACC AGTAAATTGGGGAA
261:376 CACATAACCCCTAGATCAGGAAATCA AAACTGGATCCTAAT
631:2783 TTGTCAACACTTGTTA AGTTTCCCAAAGTA
980:572 TCTTTCGTACGCTA ATGAAATGAT GCAGCCTATGTGTAA
1260:2566 CCTGCATTCCTTCCATATACCTC TATGCTCAACTGAT
1426:411 CAATCTGTAGGAAAGTC TGGGGAACAGTTCCAG
2289:329 ATTGATTTATTTTTCCTT CCAAGTGTGGCATGA
2392:2757 GAGTTTTTAGTTTTCACCT TTGGGATTTTTGTAG
2529:166 CTTCCCCAATTTACTGGTA CAGCAATAACTGTTG
2519:415 GACAGGATTCTTCCCCAA GGTCATAAATATGCA
2602:143 CAATTCAATTCAGTTGA ATCTAAGGGAAGAAA
2674:293 AATAAAATTTGAAATTG TGGACTGAATTATC
2904:889 TTTCATTCCATTTATTCCCAA AGTACACAAAGCAAT
2948:889 TAAGGTAGAATTTTCATAA CTCTTATTACGTTCC
2972:216 TTTAGAGATTATTCATTCC ATGCTTTCAGTACA
2994:477 ATACATATAAAATGGA TACCCCAAAGACCTG
2998:189 CATATAAAATGGAACGT ATTTTGGTCCCACA
2995:103 TATACATATAAAATGGAA GTTGCCTAGAAATAT
3024:telomere ACCCGGGCTCATACCT AATCCCAGCACTTT
3020:telomere GGCACCCGGGCTCAT AATTGTAAAAGCACC

Sample: t48 HCT116 TP53-/-
172:3009 TCCCCAACAGTTATTGC CTAGCCAAACATTCCA
66:3017 TTAAAAAGTAACT AGCCCGGGTGCCT
52:2961 ATTTGGTGTAGAGA ACAACCATGTCAA
3015:412 GCATATTTATGACC CGGGTGCCTAGCC
3009:197 ATGTTTGGCTAGGCA AAATACCGGAG
387:3015 CCTGGAACTGTTCCC GGGTGCCTAGCCA
3009:755 TTTGGCTAGGCA AAGTCTT TTGAAAAAAGACTTT
382:3000 GTTTTCCTGGAACTG AAACATTCCATTTTA
2945:3015 AAGGTAGAATTTTCA CCCGGGTGCCTAG
3015:2672:141 CAGGCACCCGGG TTGGGA..CAAGG GGAC ATCTAAGGGAAG
71:2846 AAAGTAACTTATTT AAGGAAATTAAAC
2983:3008-3016:3008 CATATACA CCCGG GCCTAGCCAAA
80:2953 TTTCCAATAAT TT CCAATAATTTTCC
65:2999 TTAAAAAGTAAC ATTCCATTTTATAT
81:2890 TTTTCCAATAATG AAACGGCAAGAC
88:2913 CCAATAATGTCATATT ATACTTAGGAATAAA
88:3021 TTTCCAATAATGTCATATT AGCCCGGGTGCCTAGC
98:3015 ATATTTCTAGGCAAC CCGGGTGCCTAG
107:3005 GGCAACTTCCAGTAG CCAAACATTCCAT
109:2270 CAACTTCCAGTAGTA TTTCCTCAAAATT
110:3005 CTTCCAGTAGTAG CCAAACATTCCATTT
143:3016 TCCCTTAGATTC TAG GCCCGGGTGCCTAGC
140:3009 GTTTCTTCCCTTAGA AGAATGTT TGCCTAGCCAAA
167:3016 TGCTGTACCAGA CAATTT GCCCGGGTGCC
146:3015 TCCCTTAGATTCCCC GGGTGCCTAGCCA
165:3002 CAGTTATTGCTGTACCA AACATTCCATTT
171:2722 TGTACCAGATTTG TA CCCAAAGGCCTTGGG
3016:2988 TGCTGTACCAGA GACAATTT GCCCGGGTGCCTAGCC
168:2894 ATTGCTGTACCAGAT ATGGAATGAAACGAC
174:2974 TACCAGATTTGCAG CATATACATATAAAA
188:2914 GTGTCCCACAAAATA TTATACTTAGGAA
201:3009 TACTCCGGTATTTTG CCTAGCCAAACATT
202:2803 TACTCCGGTATGTTGT GGATTTGTAGACC
203:3010
CCCACAAAATACTC ACAAAATACTCCGGTATGTT GTACCTAGCCAAACATT 250:2844 CATAACCCCTAGA AAGG AAGGAAATTAACTTC 243:2778:2773 AGGTAACTACCACATAAC AATGTGG TCCCAAAGTACACAAA 253:2720 AACCCCTAGATCA ATAGGG GGGATTTTTGTAG 253:2435:2977:2987 CTACCACATAAC TACCACATATATACATA 236:3015 CCAGGTAACTACC CGGGTGCCTAGC 245:2975 CTACCACATAACCC TAGGAATGAATAAT 261:3000 TAGATCAGGAAATCA TTCCATTTTATATG 255:3011 CCCCTAGATCAGG TTATGT GGTGCCTAGCCA 259:3020 AGATCAGGAAAT GA ATGAGCCCGGGTG 288:2878 CCTACATGATAATTCAGTC TTAGATGAGGGAA 294:2763 GATAATTCAGTCACAAA GCAATCCGTGGAT 323:3006 TTTACCTCATGCCACACT AGCCAAACATTCCA 335:2920 ACTTGGGAGTATAAT TAGCAATATTATA 358:3049 TTTTTTCATTAGGA GGCTGG CAGCCTCTCAACC 426:2943 TGACACATTTAAA ATTCTACCTTAAACA 559:904 TTTACACGTAGGC AGAGACCACACA 794:3004 ATGGAGTTGTAAAAG CCAAACATTC 905:583 GTGTGGTCTCTGCT TGTGACAATATGA 955:2984 TGTACGAAGTAATGT ATATGCTAGGAA 1146:2597 TAACCAATCTGCCTG AATTGAATTG 1193:1193 TGCTCTCCCAAC CCTGTCCTTTATT 1742:2664 TTACAACGTGGCT G TTTTATTACAAAGGA 2243:908 GTCTTCTGAGAC A TGCAGCAGA 2404:257 CACCTGTTTTGGTCTTC ATGTAGGAACAAT 2435:242 TAAGTTAATTTTTTATGTGGTA GTTACCTGGGAGA 2489:383 GCAAGTTGCCTTACA GTTCCAGGAAAACAA 2511:407 TTGAGGACAGGATTC A TAAATATGCAGG 2565:2006 TGAGCATAGAGGTA CACGTCATGCCCC 2680:523 GAAATTGGGATGT TACAAATAGATTTA 2763:288 CGGATTGCTTTGTG ACTGAATTATC 2744:2854 GTAGGAATAACATCGA CATTACTTAAGGAAAT 2772:84 TTTGTGTACTTTGGG AAGTTGCCTAGAAA 2773:2790 CTTTGTGTACTTTGGGA CCACATTGTTA 2772:102 TTTGTGTACTTTGGG AAGTTGCCTAGAAA 2844:250 AAGTTTAATTTCCTT CCTT TCTAGGGG 2851:2835 TACTGGAAGTTTAAT TTCCTTAAGTAAT 2846:71 TTTAATTTCCTTAA ATAAGTTACTTTTTAA 2928:215 GTATTGTATTGT CAGA CCATGCTTTCAGT 2939:161 TTGTTTAAGGTAGAA CAGCAATAACTG 2982:488 TTCATTCCTAGCTATATAC CCCAAAGACC 2959:3008:488 TTTGGTTTAGA ATAGG GCCTAGCCAAA CATTCCTAGCT 2978:827 TTCATTCCTAGCAT CAGTGTCTGTTATCC 2990: 444 GCATATACATACAAAA ACCTGTATGCTCTT 2991: 518 TAGCATATACATATAAAAT AGATTTACCATAT 2995: 374 TACATATAAAATGGAA AACAAAACT 2987: 2987 TAGCATATACATATA TGG TATATGTATATATG 2998:218 TAAAATGGAATGT CCATGCTTT 2961:2301:1830 
TTGGTGTAGAGA CACAGACCACAA AGGGGATGAGGT 2973:530 AGATTATTCATTCCT CAT TCATGAAGTTACA 2980:251 CATTCCTAGCATAT CTAGGGGTTATG 2980:358 ATTCCTAGCAT TTT TCCTAATGAA 2983:3015 TCCTAGCATATACA CCCGGGCCTAGCCAA 3000:536 AATGGAATGTTT CCTCATGAA 3001:253 AATGGAATGTTTG ATCTAGGGGTTA 46: 2990 TATACATATAAAATG TCAATACCATGCCA 3009:2731 TGTTTGGCTAGGCA TACAAAAA 2998:56 AAAATGGAATGT ATA TAAAACAACCAT 3011:3002 GGCTAGGCACC AAACAT GCT 3007:312 AATGTTTGGCTAGG TAAAAGTGAAGT 2998:221 TACATATAAAATGGAATGT CCATGCTTTCAGTAC 3001:504 ATGGAATGTTTG T TATGAGTCAATGAGTC 3001:553 AAATGGAATGTTTG GCCATA GTGTAAAATAATTT 3010:204 ATGTTTGGCTAGGTAC AACATACCGGAGT 3015:229 TTGGCTAGGCACCCGGG AGAGTGTCCAT 3019:127 GGCACCCGGGCTCA CAAGTTG 3023:40 CCCGGGCTCATACC ATGAGCCCGGGTGCC 3009:2807 GTTTGGCTAGGCA AACA TTATTTGTGGAT 3009:896 AATGTTTGGCTAGGCA AACATTATAGGTT 3001:503 TGGAATGTTTG TTATGAGTCA ATGAGTCAATGA 3009:841 GTTTGGCTAGGCA TTAGATACTTCATCAG 3009:267 GTTTGGCTAGGCA ATGCTGATTTCCTG 3016:167 TAGGCACCCGGGC AAATTGTC TCTGGTACAGCAAT 3023:475 CCCGGGCTCATACC CCAAAGACCTGA 3027:1973 GGGCTCATACCTATA CG CAGCAGGCGG TT AATGACAAACA 3019:2022 CTAGGCACCCGGGCTCA GTCCTAGGAATACA 3009:67 GTTTGGCTAGGCA AACATTAAATGTTACTTTAA AAGTTACTTTTTAA 3015:233 CTAGGCACCCGGG AG AGTTACCTGGGAGAG 3016:732 AGGCACCCGGGC AGCCGCGG CTTCACCAACAAG 3023:2000:522 GGCACCCGGGCTCATGCC CCTTGTTGCTTGCAAAT AGATTTACCATAT 3041:117 AACCCAAGCAGGTTG CCGCCGTTGTTG CTTTGTACTACTA telomere:525 GGCTTAGGGCT AAGTTACAAATA telomere:242 GGGTTAGGGTTA TGTGGTAGTTACC t48 HCT116 PARP1-/- 53:2132 ATGGTATTGACATGGTTTTT CCCCACACAGACCA 64:2459 GTTGTTTTAAAAAGAAA ATTGTATCCTTACC 78:2920, AACTTATTTTCAAATA GCAATATTATACTTAG 59:2956 ACATGGTTGTTTTAAAA CCAAATTAAGAAAAT 79:2552 TTATTTTCCAATAA CTGATTTTAGACAA 90:2735 CCAATAATGTCATATTTC CTACAAAAATCC 92:2962 AATAATGTCATATTTCTA AACCAAATTATGAA 95:3003 TCATATTTCTAGGC CAAACATTCCAT 158:2757 CCCAACAGTTATTGC AATCCGTGGATTC 186:2941 CAGTGTCCCACAAAA CCAAATTATGA 159:3006 CCAACAGTTATTGCT AGCCAAACATTCCA 218:3003 ACTGAAAGCATGGCCA AACATTCCATTT 259:2174 AGATCAGGAAAT ATACTCTCAACAC 
266:2997 AGGAAATCAGCATT CCATTTTAT 275:2085 AAATCAGCATTGTCCTACA GTGCAA 332:3004 CTTGGGAGTAT GTTG AGCCAAACATTCC 359:2974 TTTTTCATTAGGAT GAATAATCTCTAA 390:3017 AACTGTTCCCCAG CCCGGGTGCCTA 466:3006 TGTTTTCAGGTCT AGCCAAACATTCC 522 :3033 ATCTATTTGTAA ATCTATA TTGGGTTATAG 535:1988 TTCATGAGGAAA T TTGCTTGCAAATTT 535:1988-1975:1988 AACTTCATGAGGAAA T TTGCTTGCAAATT 724:585 CATCTTTTCATGTGCTTGT GACAATATGAAAT 780:3003 TTGTCTTTGCTA AACATTCCATTTT 976:1294 TGTGTCTTTCGTATGCAAATGAAA AGCCAGATGAAGAG 1193:1163 TGCTCTCCCAAC CCTGTCCTTTATT 1608:2161 TTTACTGTCTCTCAACACAGAACACTTC TGTTACCAGATA 1988:535 TTGCAAGCAA GCAAATTTGCAAGCAAA TTTCCTCATGAAG 2067:742 ACTGTCACAATT AATACACACTTCACC 2261:305:326 TAAAAGTTGTAAATTTT ACCTCATG CCAAGTGTGGCATGA 2465:390 TTTCATTTCCCTTT CATTTT CCTGGGGAACAG 2549:201 TGGACTTTGTCTAAAATCA AAATACCGGAGTATT 2591:171 TCTGTCTGGACTCCCAAATC TGGTACAGCAATAACT 2762:499 TCCACGGATTGCTTTGT CAATGATTCCATTC 2927:625 ATATTGCTATTGGTATTG ACAAAGATGTGCA 2956:186 CATAATTTGGTTT TGTGGGACACTGC 3000:192 TACATATAAAAT GGAGTATTTTG 2956:212 TCATAATTTGGTTT CAGTACAACATACC 2982:488 GATTATTCATTCCTAGCTATATAC TCCTAGCTATATAC 2992:581 TATACATATAAAATG TGACAATATGAA 2997:266 ATACATATAAAATGGAATG CTGATTT 2993:165 ATACATATAAAATGG TACAGCAATAACTGT 3001:821 TATACATATAAAATGTCTG TTATCCAGAATG 2989:371 GCATATACATATAAA ACAAAACTGGATCC 2991:2106 GCATATACATATAAAAT CTTGGGTACCAGCTG 2992:121 GCATATACATATAAAATG TACTTTGTACTACTA 2990:372 CTAGCATATACATATAAAA CAAAACTGGATCCT 3009:2610 TGTTTGGCTAGGCA GAAAGATCAACTG 2997:600 ACATATAAAATGGAATG AACTCTTAAACACTGCT 3015:2721 TGGCTAGGCACCCGGG TG CCAAAGGCCTTGG 2992:563 TACATATAAAATG TAGCCTATGTGTAA 3001:585 AAAATGGAATGTTTG TGACAATATGAAATGGT 3009:2610:2278:262 TGTTTGGCTAGGCA GAAAGATCAACTGAAT CACGTTCC CAATTTAT CTGATTTCCTGA 3022:230 CCCGGGCTCATAC ACACCAT..TACTAA TACCTGGGAGAGGG telomere:telomere TTAAGGGTAATAT CCACACCATGTAGG t48 HCT116 LIG3-/- 45:3004:3009 CATGGTATTGCCA GCCAAACAT GTT TGCCTAGCCAAA 84:3001 ATAATGTATAATGTCA AACATTCCATTTT 88:2416 AATAATGTCATATT ACAATGGATCGAA 109:2947 ACTTCCAGTAGTA 
TGAAAATTCTAC 112:2987 ACTTCCAGTAGTAGTA TATGTATATGCTAG 127:2985 AGTACAACTTGTGT ATGTATA 131:2956 TACAACTTGTTTCT AAACC AAACCAAATTAT 169:2954 TTGCTGTACCAAATT ATGAAAATTCT 221:2992 AGCATGGACACTC ATTTTATATGTAT 266:2874 TCAGGAAATCAGCATT AGATGAGGGAAATTA 407:2979 TCCTGCATATTTA TGCTAGGAA 2301:156 CCTTTTGTGGTCTGTG GGG AATAACTGTTGGG 2424:604 CCATTGTAAGTTAACTT GAACTCTTAAACACTG 2673:764 AAAATTTGAAATT TGAAATTGGG ATCTAAAAATTG 146:2940 TAAGGTAGAAT C GGGGAATCTAAG 2940:503 ATTGTTTAAGGTAGAAT GTGTCAATGATTCC 2951:731 TTTTCATAATTT CACCAACAAGCACAT 2990:372 CATATACATATAAAA CAAAACTGGATCCT 2995:661 CATATAAAATGGAA CTCACACAA 2998:276 ATATAAAATGGAATGT AGGAACAATGCTGATT 2992:538 CATATAAAATG TTTCCTCATGAAGT telomere:457 TTAGGGTTAGGGTT AACAACTGTTCAAACA t48 HCT116 LIG4-/- 113:1901 CAGTAGTAGTAC TTGCCTTAAG 107:3005 AGGCAACTTCCAGTAG CCAAACATTCCATT 111:2977 ACTTCCAGTAGTAGT TGCTAGGAATGAAT 141:2750 CTTCCCTTAGAT GTGGATTCGATGT 191:2770 CCACAAAATACTC AAAGCAATCC 261:376 CCCCTAGATCAGGAAAACA AAACTGGATCCTAA 253:2992 AACCCCTAGATCA TTTTATATGTA 279:2782 TTCCTACATGAT GGTAAGTTTC 334:3000 TTGGGAGTATAA ACATTCCA 356:3005 TTTTTTCATTAG CCAAACATTCCA 378:2984 GTTTTCCTGGA TGTATATGCT 523:3014 CTATTTGTAAC CCGGTGCCTGG 588:2992 CACAAGCAGTG A CATTTTATA 620:3053 TCTGCACATCTTTGTC CTCAGCCTCTCAACCTGCTTGG 805:738 TTGTAAAAGTTCTTTATACA CTTCACCAACAAGC 1566:2875 ATTTAGCTCCTTAG ATGAGGGAAATTATAAGACATT 1790:2054 CTAATGGAGAAAACATCAGA CCCCACAGGCTAAGGG 2677:374 ATTTGAAATTGGGA AAACAAAACTGGAT 2701:724 TCAGCTTATCCAC AAGCACAT 2697:79 TCAGGATCAGCTTAT TGGAAAATAAGTT 2782:1104 GGGAAACTTACCA TCAGCACCTCTCCCC 2759:732 TCCACGGATTGCTT CACCAACAAGC 2806:2255-2219:726 AATCCACAAATA CAACTTT...AAAAG CAACAAGCACA 2868:518 TTGGTATTGTTTAA ATAGATTTACCAT 2948:424 TAGAATTTTCATAA ATGTGTCAAG 2957:2507 AATTTGGTTTA AA CCTGTCCTCA 2950:140 ATTTTCATAATT CTAAGGGAAGAA 2966:661 GTTTAGAGATTATT AACTCACACAAGA 2993:165 ACATATAAAATGG TACAGCA 2955:450 TCATAATTTGGTT CAAACAAAAACC 2977:111 TATTCATTCCTAGCA ACTACTACTGG 2956:250 AATTTGGGTT T TCTAGGGGTTA 2990:459 TATACATATAAAA CAACTGTTCA 
2998:625 ATGGAATGT GTTGAC 3003:2432 GAATGTTTGGC ACATAAAAAAT 2986:241 CTAGCATATACATAT GTGGTAGTTAC 3001:298 TAAAATGGAATGTTTG AGTTTGTGGACT 2977:382 CATTCCTAGCA GTTCCAGGAAA 2984:376 CTAGCATATACAT G CAGGAAAACAAAA telomere:565 TTAGGGTTAGGGT GCAGCCTATGTG t48 HCT116 LIG3-/-:NC3 92:2939 ATAATGTCATATTTCTA CCTTAAACAAT 107:2922 TCTAGGCAACTTCCAATAG CAATATTATA 173:2858 ACCAGATTTGCA ATAAGACATTACTT 273:3009 AGCATTGTTCCTA GCCAATGTT TGCCTAGCCAAA 321:2443 TTACCTCATGCCACA TAAAAAATTAACT 363:2299 CATTAGGATCCAG ACCACAAAAGG 518:2400 TAAATCTATTT TC ACCAAAACA 724:585 TTTTCATGTGCTTGT GACAATATGAAATGGT 1056:544 CATGAGAGAATTTGA TGTTTCCTCA 1193:1193 TGCTCTCCCAAC CCTGTCCTTTATTAC 1280:830 TGCATCTCTTCATC AGTGTCTGTTA 1263:1163 TTCCATATACCTCACC AGATATGATCCT 1667:1624 AGGGACCAAACCTTG CAGTATAGGAA 1918:1900 CTGTATGTTCCCTT AAGGGAAGCGCTT 2870:81 TAATTTCCCTCAT TTATTGGAAAAT 2882:909 TAAGTCTTGTC T CTGCAGCAGAGACC 2916:472 TAAGTATAATATTGC CCCAAAGACCTGAAA 2992:299 CATATAAAATG TGAGTTTGTGGA 2992:435 TACATATAAAATG TATGCTCTTTAA 3007:telomere GTTTGGCTAGG GTTAGGG 362:telomere ATTAGGATCCA AACCCTAACCCTA telomere: 1807 TTAGGGTTAGGGT TATGGGACATGCT telomere:3023 GGGTTAGGGTTA GGTATGAGCCCGG telomere:3016 TTAGGGT CCCGG GCCCGGGTGCCT telomere:2578 TTAGGGTTAG ACAGAAAA telomere:2974 TTAGGGTTAGG AATGAATAAT telomere:2982 AGGGTTAGGGT ATATGCTAG t48 HCT116 LIG3-/-:LIG4-/- 89:2991 TTCCAATAATGTCAT ATTTTATATGTAT 103:2995 TTTCTAGGCAACTTCCA TTTTATATGTATATG 124:3006 TAGTACAAAGTACAA GTACAAAGTACAA CTAGCCAAACATTC 145:2953 CCCTTAGATTCCC AAATTATGAAAA 188:2952 TGTCCCACAAAATA TGAAAATTCTACC 211:2715 GTTGTACTGAAAGC CTTGGGACAGAAGT 223:2999 ATGGACACTCTC AACATTCCATTTTA 236:2937 CCCAGGTAACTACC TTAAACAATACCAAT 251:2999 TAACCCCTAGAT TT AACATTCCATTTTA 278:2971 TGTTCCTACATGA ATGAATAATCTCTA 306:2918 AAACTCACTTCACTT AGCAATATTATACTT 435:2952 TAAAGAGCATACA AATTATGAAAATTC 503:2272-2280:3000 TCATTGACTCAT TGATTTA AAACATTCCATTTT 702:540 TATGGTAATGATGTT TCCTCATGAAGTTAC 853:1863-1880:2951-2976:3007 TAATGTGCAGACATTTTCT GCCCAGAAAG TGGTTTAGAGATTATTCATT CCTAGCCAAACA 
912:921 TCTCTGCTGCAGTGT TTCAGGACA 889:2904 AACGTAATAAGAGTTA GGAATAAATGGAATGA 1193:1163 GCATCTGTGTTGCTCTCCCAAC CCTGTCCTTTATT 1234:1117 CTGGCGAGGGCATGGG TGATCAACTCAACCAT 2072:2949 CTGTCACAATTGATT ATGAAAATTCTACCTT 2317:1282 TTTGGTGTCAGATG AAGAGATGCACAAGGT 2950:691 TAGAATTTTCATAATT AAAACCA 2934:164 GGTATTGTTTAAGG TACAGCAATAACT 2957:109 CATAATTTGGTTTA CTACTGGAAGTTGCCT 2939:526 TTGTTTAAGGTAGAA GTTACAAATAGAT 54:2880 TATTGACATGGTTGTTT CATTCCATTTATTCCT 2982:488 GAGATTATTCATTCCTAGCATATAC 102:2898 TTTCTAGGCAACTTCC TAAGTATAATATTG 2964:154 TTGGTTTAGAGATTA ACTGTTGGGGAATC 2963:2942 TTTGGTGTAGAGATT CATAATTTGGTGT 2992:538 TATACATATAAAATG TTTCCTCATGAA 2961:2755 ATTTGGTTTAGAGA ATCCGTGGATTCGA 2991:2667 ATATATATATAAAAT TTTATTACAAA 3003:968 ATGGAATGTTTGGC GTACGAAAG 2988:2600 TAGCATATACATATAA CTGAATTGAATTGGGAG 2978:280 TCATTCCTAGCAT GTTCC TAGCATGTAGGAACAA 2987:678 AGCATATACATATA CAT ATAAAGAGATAACCC 2997:924 ATAAAATGGAATG AATAAT CTCTGTTTCAGGAC 2992:405 CATATACATATAAAATG CAGGAAAGTCTGG 2996:406 CATATAAAATGGAAT ATACATATAA AAATATGCAGGAAA 2988:525 CATATACATATAA GTTACATAT AAGTTACAAATAGA 3017:427 AGGCACCCGGGCT TTAAATGTGTCAA 3020:545 AGGCACCCGGGC TAATTTGATGTTTC 3017:2732-2710:212 AGGCACCCGGGCT ACAAAAATCCCAAAGGCCT TGCTTTCAGTACAA t48 HCT116 LIG3-/-:TP53-/- 37:3017 TTTCCTGGCATGG ACTGGACCACAGT AGCCCGGGTGCC 81:3020 TTTTCCAATAATG AGCCCGGGTGCCT 103:2995 TTCTAGGCAACTTCCA TTTTATATGTATATG 120:3006 GTAGTACAAAGTAC TA CTAGCCAAACATTC 127:122 CAAAGTACAACTTGT ACTTTTGGC 120:2914 GTAGTACAAAGTAC T AATATTATACTTAGG 110:3001 CTTCCAGTAGTAG CAAACATTCCATTTT 120:3010 TAGTAGTACAAAGTAC TACTACAATAACTATAACAAA GTGCCTAGCCAAA 122:2517-2536:3006 TAGTACAAAGTACAA TTTACTGGTAATGGA CTAGCCAAACATTCCA 150:2999 TTAGATTCCCCAACA TTCCATTTTATATGT 164:191 CAGTTATTGCTGTACC GGTATGTTGTACTG 188:2986 TGTCCCACAAAATA TGTATATGCTA 159:3006 CAACAGTTATTGCT AGCCAAACATTCCA 192:3002 TCCCACAAAATACTCC AAACATTCCATTTT 214:3010 GTACTGAAAGCATG CCTAGCC T GTGCCTAGCC 211:3004 TGTTGTACTGAAAGC CAAACATTCCATT 200:3009 CTCCGGTATGTT TGCCTAGCCAAACATT 245:2772 TACCACATAACCC 
AAAGTACAC 230:164 CACTCTCCCAGGTA CAGCAATAACTGTT 263:3004 GATCAGGAAATCAGC CAAACAT 275:2994 CATTGTTCCTACA G TCCATTTTATATGT 2996:2673-2646:260 CATATAAAATGGAAT TTCAAATTTTATTACAAAGGAA CAGCATTGTTCCTACATGA 332:3022 CACTTGGGAGTAT GAGCCCGGGTGCC 338:3010 GGGAGTATAATGTG CCTAGCCAAACATTC 334:2999 ACTTGGGAGTATAA CATTCCATTTTAT 422:2787 TATTTATGACCTTGA CACATTGTTAAG 392:2649 AACTGTTCCCCAGAC AGTAATAAAAACAGT 475:2427 TCAGGTCTTTGGGG TATGTGGTA 435:2998-2989:422-430:3004 TTTAAAGAGCATACA TTCCA TTAAAG AGCCAAACATTCCATTTT 488:2574-2583:1713 TATATACCTAGGAATG TGTCTGGACT ACA TGCCAATC 489:3003 ATATAGCTAGGAATGG CCAAACATTCCATT 505:3017 TCATTGACTCATAT AGCCCGGGTGCCT 515:142 ACTCATATGGTAAATCTA AGGGAAGAAACAAG 523:2564 ATCTATTTGTAAC CTCTATGCTCAA 521:3022 GGTAAATCTATTTGTA TGGGCCCGGGTGCC 578:2994 CCATTTCATATTGTC CATTTTATATGTATAT 588:2992 TGTCACAAGCAGTG A CATTTTATATG 629:583 TCTTTGTCAACACTTGT GACAATATGAAATGG 650:1333 TAGTATAACTATTC CTGACTCATTACC 724:585 CTTTTCATGTGCTTGT GACAATATGAAATG 743:2098 AAGTGTGTATTAAA CA CCAAGATTTGGTC 726:2826 CATGTGCTTGTTG TAAATGTAAAAATGT 753:535 TATTAAAGTCTTTTTTC CTCATGAAGT 781:1342 TTGTCTTTGCTAT CA AGTGTACA 866:1145 TTTCTTCCATTTTATAGG CAGATTGGTTACGT 881:2495-2505:3001 TTGGAACGTAAT GTTGAGGA CAAACATTCCATTTT 857:3010 GAACGTAATAAGAGT GCCTAGCCAAACATTC 981:408 GCTAATGAAATGATT AAATATGCAGGAAAGT 1044:1911 AAGCAACCTTGTCAT ACAGGCTCTTGCCT 1061:2428 GAGAATTTGAAATTT AAAAAATTAACTTACAA 1162:1147 GTAATAAAGGACAGG CAGATTGGTTACGTCT 1174:1193 GGACAGGGTTGGGAGAGCA ACACAGATGCTCTC 1258:477 ATTCCTTCCATATACC CCAAAGACCTGAA 1327:1385 GGTAATGAGTCAGG TTCAGCAATTTGCT 1410:560 GTTGTGGGAGCCTCC GTGTAAAATAATT 1618:2986 TCCTATACTGCA TATGTATATG 1672:1915 AAACCTTGGAACA TACAGGCTCTT 1762:2504 GTGATGGTGCTAGG TCCTCAACAAGTGAT 1874:1594 GCTTTCTGCCCAGA GACAGTAAGGTTT 2110:1667 CAAGATTTGGTCCCTGGAACTTACAAGTAAT 2311:2967 GTGCTTTTGGTGT G GAATAATCTCTAAA 1548:2414 CATTTTGCTGTTC TAAGTTAATTTTT 2508:631 TGAGGACAGGA CATG TAACAAGTGTTGA 2535:3017 TTTACTGGTAATGGAC CT AGCCCGGGTGCCTAG 2573:161 GGTATTGTTTTC A ACAGCAATAACTGT 2581:144 
GTTTTCTGTCTGGA ATCTAAGGGAAG 2860:711 GTAATGTCTTATAA GATGCTTAACATCAT 2858:281 AGTAATGTCTTAT CATGTAGGAACAA 2920:335 TAATATTGCTATT ATACTCCCAAGTG 2948:54 GTAGAATTTTCATAA ACAACCATGTCAATA 155:2951-2938:2951 CCCCAACAGTTAT AAATTATGA AAATTATGAAAATT 2957:891 CATAATTTGGTTTA TTAACTCTTA 2952:2222 GAATTTTCATATTTTG TTCATGTCTTCTGAG 2955:261 TTCATAATTTGGTT GATTTCCTGATCTAG 765:2990 TTTCAATTTTTAGATT TTTTATATGTATAT 2987:2560 CATATACATATA TG CTATGCTC 2964:369 TTTGGTTTAGAGATTA ACAAA...AAAAA ACACATTA 65:2999 TTGTTTTAAAAAGTAAC ATTCCATTTTATAT 2975:573 ATTATTCATTCCTAG TATGAAATGGTGCA 2991:638 TAGCATATACATAAAAAAT AACAAGTGTTGAC 2980:731 TTCATTCCTAGCATAT TCACCAACAAGCA 2987:162 CTAGCATATACATATA CAGCAATAACTGTT 2998:917 ATAAAATGGAATGT G TCAGGACA 2996:517 ATATAAAATGGAAT AGATTTACCAT 2987:478-472:870 TCCTAGCATATACATATA TATATAC A TATAT ACAACCTAT 3002:226 TAAAATGGAATGTT TGGGAGAGTGTCC 3007:2586 GAATGTTTGGCTAGG GAGTCCA 2982:488 TTATTCATTCCTAGCATATAC CCCAAAGACCTGAAAAC 2987:212 TAGCATATACATATA TGT TGCTTTCAGTACAA 3009:700 AATGTTTGGCTAGGCA TCATTACC 3006:97 ATGGAATGTTTGCCTAG AAATATGACATTA 3007:74 GGAATGTTTGGCTAGG AAAATAAGTTACTT 2997:348 CATATAAAATGGAATG AAAAAAAAAACACATTATA 3014:808 GCTAGGCACCCGG AATGTATAAAGA 3010:383 TGTTTGGCTAGGCAC AGTTCCAGGAAAACAA 2982:1323 CATTCCTAGCATATAC TCATTACCAAGGATC 3011:530 TTTGGCTAGGCACC TCATGAAGTTACAA 3015:2901 GGCTAGGCACCCGGG AATAAATGGAATGAAA 2964:3016 ATTTGGTTTAGAGATTA G GCCCGGGTGCCTAGCCA 3009:518 GGAATGTTTGGCTAGGCA AAATAGATTTACCATA 3010:120 TGTTTGGCTAGGCAC TTTGTTATAGTTATTGTAGTA GTACTTTGTACTACTA 3011:819 TTGGCTAGGCACC AAACAT TCTGTTATCCAGAA 3000:1062 TAAAATGGAATGTTT TCTTTTGTCACTGGGCC GAAATTTCAAATTCTCT 3010:520 TGTTTGGCCAGGCAC AAATAGATTTACCA 3016:96 TTTGGCTAGGCACCCGGGC CAAACAT GCCTAGAAATATGACA AAACAACCATGTC 3014:289 GCTAGGCACCCGG ACTGAATTATCA 3007:3024 TGGAATGTTTGGCTAGG TATGAGCCCGGGTG 3015:2964-2931:294 TAGGCACCCGGG TGCC TAATCTC...GTCTACC TTTGTGGACTGAATTAT telomere:74 TTAGGGTTAGG AAAATAAGTT ACATTATTGGAAAATAAGT telomere:747 TTAGGGTTAGGGTTAG ACTTTAATACACACTTCA t120 HCT116 WT 
125:2534 CAAAGTACAACTT TGTCTA
105:2675 ACTTCCAGT TTCAAATTT
188:2858 TCCCACAAAATA AGACATTACTT
271:2995 ATCAGCATTGTTCC ATTTTATATGTA
276:2998 TTGTTCCTACAT TCCATTTTATA
322:3003 TACCTCATGCCAAAC ATTCCATTTTA
26:1031 TGGATTTTGA GCAACCTTGTC
2919:86 TATAATATTGCTAT GACATTATTGGA
2943:129 AGGTAGAATTTT AAACAAGTTGTA
693:2875-2931:695 AATTATGGT CTTGTCG...TATTGT TTACCATAATT
2956:119 AATTTGGTTT CA TAATTTGTACTAC
2975:249 TTCATTCCTAG GGGTTATGTGGT
2996:2986 ATAAAATGGAAT ATGTATAT
208:3001 GTTGTACTGAA CAAACATTCCAT
2982:811 TCCTAGCATATAC CAGAATGTAT
2984:337 TAGCATATACAT TATACTCCCAA
3024:738 GGGCTCATACCT ACACACTTC
3022:1971 CCGGGCTCATAC ACAAGGATGGT
3009:125 TTGGCTAGGCA AAGTTGTACT

t120 HCT116 TP53-/-
46:2998 GGTATTGACAT TCCATTTTATAT
53:2783 TGACATGGTTGTT AAGTTTCACAAA
92:2990 ATGTCATATTTCTA TATGTATATG
114:1172 TAGTAGTACA G GCATCTGCAT
97:3006 TCATATTTCTAGGCAA ACATTCCATTTT
99:3009 TCTAGGCAACT AGAAATATG TT TGCCTAGCCA
173:2995 ACCAGATTTGCA TTTTATATGT
176:3010 GATTTGCAGTG CCTAGCCAAA
164:2627 ATTGCAGTACC TTGCACAGGAA
156:2782 AACAGTTATT AAGTTAAC TGTTAAGTTT
205:3016 TATGTTGTACT AG GCCCGGGTG
195:2637 TACTCCGGT GGT ACAGTGGGAT
201:3016 CGGTATGTTG GCCCGGGTGC
226:3002 GACACTCTCCCA AACAT
189:2770 TGTCCCACAAAATAC ACAAAGCAAT
190:3006 CACAAAATACT AGCCAAACAT
204:2998 GGTATGTTGTAC ATTCCATTTTA
205:3016 TATGTTGTACT CC GGTATGTTGT
189:2926 CCCACAAAATAC CAATAGCAAT ATACCAATAGCAA
215:2973 GTACTGAAAGCATGG AGGAATGAATAAT
264:2927 GGAAATCAGCA ATACCAATAGC
260:2757 CCTAGATCAGGCAATC CGTGGATTCG
285:2992 ATGATAATTCA TTTTATATGTA
277:3009 TGTTCCTACATG TTT TGCCTAGCCA
298:3020 TCCACAAACTCA TGAGCCCGGGTGC
284:3013 TACATGATAATTC GGGTGCCTAGCCA
307:2998 ACTCACTTCACATT CCATTTTATATGT
321:2443 TTACCTCATACCACA TAAAAAATTAACTT
330:3010 CCACACTTGGGAGT GCCTAGCCAAACA
362:3012 CATTAGGATCCA CTCC....TAAG GGGGCCTAGCCAAA
395:2786 CTGTTCCCCAGACATT GTTAAGTTTCC
379:2901 TTGTTTTCCTGGAA TAAATGGAATGA
390:736 ACTGTTCCCCAG CACACTTCACCAA
425:3017 CCTTGACACATTTAA TACACAC CATCACTA GCCCGGGTGC
460:2776 AACAGTTGTTTTCC AAAGTACACA
471:3019 TTCAGGTCTTTGGG TGAGCCCGGGTGC
488:85 TAGCTAGGAATG ACATTATTGGAAAAT
468:2995 TTCAGGTCTTT TTTATT...TGTGCT TTCCATTTTATATG
476:2859 GGTCTTTGGGGTATA AGACATTACTT
492:2926 GCTAGGAATGGAAT ACCAATAGCA
530:754 TGTAACTTCATGA AAAAAGACTTTAAT
521:2531 ATTACCAGTAAATA TTACCAGTAAAT
578:3014 TTTCATATTGTC CGGGTGCCTAGC
657:3010 CTATTCTTGTGTG CTATTCTTGTGTG
701:2984 AATTATGGTAATGATGT ATATGCTAGGAAT
709:2994 ATGATGTTAAGCATC CATTTTATATGTAT
736:390 GTTGGTGAAGTG TG CTGGGGAACA
726:624 TCATGTGCTTGTTG ACAAAGATGTGC
739:2983 TGGTGAAGTGTATAT GCTAGGAATGA
764:3005 ATTTTTAGAT ATAATA TAGCCAAACA
771:3019 TAGATTGGGTTG AGCCCGGGTGCCT
845:3011 TAATGTGCAGA TTAGATTGG CTAGAAT GGTGCCTAGCC
1174:1193 GGACAGGGTTGGGAGAGCA ACACAGATGCTCTCCCAACCC
1790:2054 AATGGAGAAAACATCAGA CCCCACAGGCTAAT
1916:2629 AGCCTGTATGTTCCC ACTGTTTTTA
2151:127 TAACAGAAGTGTT ACCAT TACAAAGT ACAAGTTGTACTTTGT
2462:316 AATTTCATTTACC ATGAGGTAAAAGT
2535:470 TGGTAATGGAC CAAAGAC
2501:431 ATCACTTGTTGAG C TGCTCTTTAAA
2540:60 TAATGGACTTTGT TTTTTAAAACAAC
2637:195 GATCCCACTGT ACC ACCGGAGTATT
2738:2619 TTTTTGTAGGAATAA ACAGAAAGATCAACT
2769:436 CTTTGTGTACTTT ATGCTCTTTAAA
2770:189 GCTTTGCGTATTTTG TGGGACACTGCAAA
2827:127 TTTACATTTACT T ACAAGTTGTACT
2827:119 TTTTACATTTACT TTGTACTACTA
45:2777:3011 CATGGTATTAACAA TGTGGTC...TGTAA AT GGTGCCTAGCCA
2852:451 CCTTAAGTAATG ATCAAACAAAAA
2603:2808 CAATTCAGTTGAT ACATTTTTACA
2861:2848 AATGTCTTATAAT TTCCCTCATCTAAG
2918:110 GTATAATATTGCTA CTACTGGAAGT
2946:214 GGTAGAATTTTCAT GCTTT
2925:434 TGCTATTGGTAT GCTCTTTAAATGT
2954:2483 TCATAATTTGGT ACA GCAACTTGCTAT
2965:809 TAGAGATTAT GGTATTGTTT
2971:2961 AGATTATTCATTC TCTAAA
2444:2895-2918:249 GCCACGATGG CTTA TTATTC...TTG CTAGGGGTTATGTGGT
2962:493 GGTTTAGAGAT TCCATTCCTAGC
2957:315 AATTTGGTTTA TGAGGTAAAAGT
2960:124 TGGTTTAGAG TTGTACTTTGT
2968:482 GTTTAGAGATTATGCA GCCAACAGACACAT TAGCTATATAC
2983:127 CCTAGCATATACA AGTTGTAC
2994:518 TATACATATAAAATAGA TTTACCATAT
2991:1987 TATACATATAAAAT GGAATGT
3001:472 TGGAATGTTTG TATATA
2974:2946 TATTCATTCCTA ATTTGGTTTA
2987:1218 CATATACATATA TATATATAT ATATGAT
2967:379 AGAGATTATTC CAGGAAAACAAA
2997:347 TATAAAATGGAATG AAAAAAAAACACATTAT
3004:374 AATGTTTGGCT GGAAAACAAA
2994:2401 TACATATAAAATGGA CCAAAACAGGT
3003:609 TGGAATGTTTGGC AGAAACTTGAAC
3012:624 GGCTAGGCACCC T TGTTGACA
3004:427 AATGTTTGGCT TTAAATGTGTCAA
3009:2629 TTTGGCTAGGCA AAGTCTGGGGAA
3014:194 TAGGCACCCGG AGTATTTTGT
3016:2544 ACCCGGGC C TTAGACAA
3016:2014 GCACCCGGGC CTAGGAATA
3024:533 CTCATACCT AAACATC
2995:467 TAAAATGGAA AACCTGAAAA
3017:621 CACCCGGGCT GACAAAGATGT
3011:2828 GCTAGGCACC ATTTA CAGTAAATG
430:2949 TTAAAGAGC CAC TTTGGTTTAGAGA
3008:596 TGGCTAGGC TCTTAAACAC
3016:591 ACCCGGGC C AAACACTGCTT
3016:2903 CACCCGGGC C TAGGAATAAAT
3017:557 CACCCGGGCTCT ACGTGTAAAA
3012:359 GGCTAGGCATCC TAATGAAAAA
3020:1976:1561:174 CCCGGGCTCAT TAAATGA GCAAAAT CTGCAAATCTG
3016:2801:79 GGCACCCGGGC CAAATAAG T TTGCCACGAC TTATTTAAAAAT
3016:2934-2902:2860 CACCCGGGC CTTAA..TTATAC TTATAAGACA
3048:838 TTGAGAGGCT ATTAGATACTTC 17p 3081
telomere:601 TTAGGGTTGGGG TTGAACTCTTA
telomere:260 TTAGGGTTAGGGTTAG ATTTCCTGATC
telomere:3024 TTAGGGTTAGGGTTAGG TATGAGCCCG
telomere:3017 TTAGGGTTAGGGTT TG GCCCGGGTGCCTAG

t120 HCT116 PARP1-/-
72:2942 AAGTAACTTATTTT AAAAAGTAACTT AAATTCTACCTTAA
68:3007 AAAAAGTAACCTA GCCAAACATTCCA
279:12 GTTCCTACATGAT GTGTCTGCTTT
86:475 AATAATGTCATA CCCCAAAGACC
116:2731 CCAGTAGTAGTACAAA AATCCCAAAGG
125:2960 CAAAGTACAACTT TAAACCAAATTAT
143:2353 TCTTCCCTTAGCATTC ATGACC
162:2912 CAGTTATTGCTGTA TTATACTTAG
168:2408 TTGCTGTACCAGAT CGAAGACCAAAA
191:2033 CACAAAATACTC ACAGGCTAATGGC
209:2107-2098:3003 TTGTACTGAAA CAAATCTTG GCCAAACATTCCA
231:2650 CTCCCAGGTAA TAAAAACAGT AACAGTAAAA AACAGTAATAAAAAC
281:2769 TTCCTACATGATAA AAGAACACAAA
293:655 ATTCAGTCCACAA GAATAGTTATACT
312:3017 CACTTTTACCT AGCCCGGGTG
332:2985 CACTTGGGAGTAT GTATATGCTAG
315:3015 ACTTTTACCTCAT AGGCACCGGG TGCCTAGCCAAAC
362:1526 TTAGGATCCA TTCATT
334:2943 TGGGAGTATAA AATTCTACCTT
2725:395 CTACAAAA TCCTGCATATT
504:3005 TCATTGACTCATA GCCAAACATT
629:583 TCAACACTTGT GACAATATGA
724:585 TCATGTGCTTGT GACAATATGAAA
1175:1193 TAAAGGACAGGGTTGGGAGAGCAA CACAGATGCTCT
1248:1573 GGGAGGCCTGCATTCCT AAAGAGCTAAAT
1335:2653 GAGTCAGGAACAGTAA TAAAAACAGTGGG
1314:2630 TTAAAAGATCCTTG CACAGGAATA
1521:133 TTTCCTTAAAGAAGAAA CAAGTTGTACTTT
2033:191 CATTAGCCTGTG AGTATTTTGTGG
2083:1619 ACAATTGGATTGCACTGTAGGA AGTGTCTGG
2086:111 GCACTGTAGGACAC TACTACTGGAAGTT
2250:3017 TGAGACGTAAAAG CCCGGGTGCC
2559:477 CAGTTGAGCATA TACCCCAAA
2587:124 TGGACTCCCA GT AGTAGTACTTT
2769:259 TTTGTGTACTTT TATCATGTAGG
2912:162 CCTAAGTATAATA CAGCAATAACT
2904:362 TTTATTCCTAA TGAA TGGATCCTAAT
2917:647 AGTATAATATTGCT TAGTTATACTAAAA
2925:3009 GCTATTGGTAT AT TGCCTAGCCAAA
2932:525 TGGTATTGTTTAAG TTACAAATAGATTT
2930:585 CTATTGGTATTGCTT GTGACAATAT
2958:291 ATAATTTGGTTTAG TGGACTGAA
2956:731 TCATAATTTGGTTT CACCAACAAGC
2956:261 TCATAATTTGGTTT CCTGATCTAGGGG
2986:359 TAGCATATACATAT CCTAATGAAAA
3007:312 ATGTTTGGCTAGG TAAAAGTGAA
2995:310 CATATAAAATGTAA AAGTGAAGTGAGT
2996:826 AAAATGGAAT CAGTGTCTGTTA
3012:2342 TGGCTAGGCACCC TTACATGTAGC
2996:984 TAAAATGGAAT CAATCATTTCA
3022:521 CCCGGGCTCATAC AAATAGATTTA
2995:871 CATATAAAATGGAA GAAA...ATGG AACAACCTATAAA
3010:843 GTTTGGCTAGGCAC ATTAGATACTTC
3014:164 CTAGGCACCCGG TACAGCAATAA
3020:499 CCCGGGCTCAT TAT AGGTATGAGAG GTCAATGATT
telomere:263 GGGTTAGGTTA GCTGATTTCCT
telomere:674 GGGTTAGGGTTAG AGATAACCCT

t120 HCT116 LIG3-/-
48:2423 TATTGACATGG ATC AATTAACTTA
58:3006 GGTTGTTTTAAA GACAT CTAGCCAAACA
53:2874 TGACATGGTTGTT AGATGAGGGAAA
77:2792 TTATTTTCCAAT TAGACCACA
89:1872 TAATGTCATATTT GGGCAGAAAGCT
107:2694 CAACTTCCAGTAG CTGATCCTGATCA
90:2745 ATGTCATATTTC GATGTTATTCCTAC
124:2960 AAAGTACAACT CTAAACCAAAT
153:2992 ATTCCCCAACATTT TATATGTATATGCT
180:3006 TGCAGTGTCCC TAGCCA
180:3010 TTTGCAGTGTCCC TAGCCAAACATT
176:2636 AGATTTCAGTG GGATCCTTGCACA
185:3001 TGTCCCACAAA CATTCCATTTT
285:2167 TGATAATTCA ACACT
285:2927 CATGATAATTCA ATACCAATAG
290:2351 TAATTCAGTCCA TTCAT
290:2536 TGATAATTCAGTCCA TTACCAG
275:2720 TTGTTCCTACA AAGGCCTTGGGA
285:2989 TACATGATAA TTCA TATGTATATGCT
321:2443 TTACCTCATACCACA TAAAAAATTA
290:2978 ATTCAGTCCA TTC ATGCTAGGAAT
337:1781 AGTATAATGT TCTCCATTAGGT
327:245 TGCCACACTTGGG TTATGTGGTA
462:959 CAGTTGTTTTCAG ACACATTACTTTG
572:492 CTGCACCATTTCAT TCCTAGCTAT
744:2917 TGTATTAAAG CAATATTATACTT
833:2911 CACTGATGAAGTAT TGCTATTGGT
900:1771 AGTTAATGGGTGGTCT TCCTAGCACCATCACG
913:824 TCTCTGCTGCAGTGTC TGTTATCCAG
1174:1193 GTTGGGAGAGCA ACACAGATGCTCT
1162:1147 GTAATAAAGGACAGG CAGATTGGTTA
1655:381 CTTGTAAGTTCCAGG AAAACAAAACT
336:1780 GAGTATAATG CTCTCC
2238:1525 TTTTCTTCATGTCTTCT TTAAGGAAATAT
2389:2995 TTTTTAGTTTCCA TTTTATATGTA
2422:2827 CATTGTAAGTAAAT GTAAA
2400:527 CCTGTTTTGGT GAAAATACAAA
317:2419 TACCTCATGC A TAATTTTTTA
2579:151 GTTTTCTGTCTG TTGGGGAATC
2696:314 GCTTTGTGTACTTTG AGGTAAAAGTGAAGT
2700:314 GTGTACTTTG AGGTAAAAGTGA
2796:3010 GGTCTACAAA CAAT GTGCCTAGCCAA
2828:152 TTACATTTACTG TTGGGGAATC
2911:145 CCTAAGTATAAT CTAA GGGAATCTAAGG
2929:763 TTGGTATTGTT CTAAAAATTGA
2982:488 CATTCCTAGCTATATAC CCCAAAGAC
2959:881 TTTGGTTTAGA AA ATTAGGTTCCAACA
2959:879 TTTGGTTTAGA AAAC TAGGTTCCAACAA
2965:540 TAGAGATTAT GATGTTTCCTCA
2995:2386 CATATAAAATGGAA ACTAAAAAC
42:2920-2986:2952 TTCCTGGCATGGTATTG TTTAAGG...TATACAAAT TATGAAAATTC
2977:655 CATTCCTAGCA CAAGAATAGTTAT
2987:112 CATATACATATA CTACTACTGGAA
2987:592 TATACATATA CA TATACACTGCT
2992:395 CATATACATATAAAATG TCTGGGGAACA
2990:3002:367 TACATATAAAA CATT CCAAACATTCCATTT G AAAAATGGATCCT
3023:244 CGGGCTCATACC C GGTTATGTGGTA
telomere:2851 TTAAGGTTAGGGTTA ATTACTTAAGGAA

t120 HCT116 LIG4-/-
50:2939 ATTGACATGGTT CTACCTTAAACAA
81:2624 ATTTTCCAATAATG GAA TGCACAGGAATAA
110:2630 CTTCCAGTAGTAG ATCCTTGCACAG
163:2853 GTTATTGCTGTAC ATTACTTAAGGAA
166:2375 TTGCTGTACCAG TTATTGCTGTATTGTT TCTGGAAGAAA
171:2821 GTACCAGATTTG TGGATTTGTAGA
213:2992 ACTGAAAGCAT TTTATATGTATAT
245:2035 TACCACATAACCC ACTGAGGCTAA
263:3004 ATCAGGAAATCAGC CAAACATTCCA
271:3002 TCAGCATTGTTCC AAACATTCCAT
288:2713 TGATAATTCAGTC TTGGGACAGAAG
416:2945 TTTATGACCTTGA AAATTCTACCT
437:2621 AGAGCATACAGG AATAAACAGAAA
695:2957 TTAATTATGGTAA ACCAAATTATG
724:585 TTTCATGTGCTTGT GACAATATGAAA
866:1145 CATTTTATAGG CAGATTGGTTAC
1147:1162 ACCAATCTGCCTGT CCTTTATTACACAG
1402:150 GAACCCAATAAGGG TGTTGGGGAATC
1718:906 GAGATTGGCAGCAGA GACCACACATTAA
2110:1667 GTACCCAAGATTTGGTC CCTGGAACTTACAAG
2534:364 TTACTGGTACTGGA TCCTAATGA
2568:97 AGCACAGAGGTATTG CCTAGAAATATG
2624:81 TATTCCTGTGCA TTC CATTATTGGAAAAT
2750:127 ACATCGAATCCAC AAGTTGTACTT
2930:588 TTGGTATTGTTT GT CACTGCTTGTGAC
2994:518 ACATATAAAATAGA TTTACCAT
2965:282 GTTTAGAGATTAT CATGTAGGAACA
2977:3014 TTCATTCCTAGCA CCGGGTGCCTAGC
2995:725 TATAAAATGGAA CAAGCACAT
2996:2511 ATAAAATGGAAT CCTGTCCT
2997:81 ATATAAAATGGAATG ACATTATTGGAA
2965:553 TTAGAGATTAT GTGTAAAATAATT
3001:578 TAAAATGGAATGTTTG ACAAAGATGTGCAGA
2992:2907 ATATAAAATG TACTTAGGAAT
2992:2625 TACATATAAAATG TTTA TTGCACAGGAATAAA
523:2947-2993:2590 CTATTTGTAAC AATTT...TATA AAATGGGAGTCCAG
3023:199 CGGGCTACATACC GGAGTATTTTGTG
telomere:523 TTAGGGTTAGG GTTACAAATAG

t120 HCT116 LIG3-/-:NC3
40:2913 GGCATGGTAT ATACTTATATAT ATATATATACTTA
97:2568 TTCTAGGCAA TACCTCTATGC
204:1533 TCCGGTATGTTG TACCATTT
207:2938 GTATGTTGTACTGA TCTACCTTAAACAA
268:2815 AAATCAGCATTGT AT CTAATAC AC AATGTATCTTATTTG
435:2992 TTAAAGAGCATACA TTTTATATGTAT
558:3024 TTACACATAGG GAAT...TCC AT AGGTATGAGCCC
600:2747 TAAGAGTTCA CTTAAACTCTTAAACA TTCGA TGTTTAA GATTCGATGTTA
5:647 CCTCTGCAATGG ATTCTTGTGTGAG
724:585 TCATGTGCTTGT GACAATATGAAATGGTG
837:664 TGATGAAGTATCTAA ACTCACACAAGAAT
1174:1193 GGGTTGGGAGAGCA ACACA
1662:2401 GTTCCAGGGACCAAA ACAGGTGAAAAC
2960:222 TTGGTTTAGAG TGTCCAT
2989:156 TATACATATAAA TAACTGTTGG
2961:101 TTTGGTTTAGAGA AGTTGACTAGAAA
2995:374 CATATAAAATGGAA AACAAAACT
2996:556 ATAAAATGGAAT TATGTGTAAAAT
3012:618 TGGCTAGGCACCC AAAGATGTGCAGA
3007:2684-2667:530 TGTTTGGCTAGG ATCACATCCCAATT TCATGAAGTTACAA
telomere:2819 GGGTTAGGGTTA AAAATGTATCTT
telomere:3024 GGGTTAGGGTTAG AGGTATGAGCCCGGG
telomere:2967 TTAGGGTTAG AATAATCTCTAAA
telomere:1698 TTAGGGTTAGGG GGAGAGATATGGA
telomere:2983 AGGGTTAGGGT GTATATGCTAGG
telomere:3012 TTAGGGTTAGGGT GCCTAGCCAAACA
telomere:2989 ATACATATAAA CCTAACCCTAACCCT

t120 HCT116 LIG3-/-:LIG4-/-
36:2880 TCTGCTTTCCTGGCATG ACTTAGATGGGGG
80:2920 TTATTTTCCAATAAT AGCAATATTATACTT
92:2971 TAATGTCATATTTCTA ACCCTAA...CCCTAAC CCTAGCATATACATATAA
121:2770 GTAGTAGTACAAAGTACA CAAAGCAATCCGTGG
127:692 CAAAGTACAACTTGT AATGATGTT
100:2485 TCATATTTCTAGGCAACTT GCTATCCACATAAAGG
119:2656 GTAGTAGTACAAAGTA ATAAAATTTGAAAT
103:2995 ATATTTCTAGGCAACTTCCA TTTTATATGTATATG
172:2913 TTGCTGTACCAGATTTGC TATTGGTATTGTTTAAGGTAG TTTATACTTACGTTG
197:2904 AATACTCCGGTAT TAGGAATAAATGGA
209:2945 CGGTATGTTGTACTGAAA ATTCTACCTTAAACA
234:3006 CTCTCCCAGGTAACTA GCCAAACATTCCATT
281:2621 GTTCCTACATGATAA CAGGAATAAACAGA
310:794 ACTCACTTCGCTTTTAC AACTCCAT
327:3012 TCATGCCACACCTGGG TGCCTAGCCAA
348:2738 AATGTGTTTTTTTTTT ATTCCTACAAAAATCC
407:2910 ACTTTCCTGCATATTTA TACTTAGGAATAAAT
412:2722 TGCATATTTATGACC CAAAGGCCTTGGGAC
610:2956 GTTCAAGTTTCTGCA AAACCAAATTATGAA
615:2959 CAAGTTTCTGCACATCTCT AAACCAAATTATGA
747:3006 GTGTGTATTAAAGTCT AGCCAAACATTCC
724:585 CATCTTTTCATGTGCTTGT GACAATATGAAATGGTG
735:1608 GTGCTTGTTGGTGAAGTGT CTGGTGGAGAGACAG
1600:827 TGCAGTATAGGAAGTGTCT AATGTGCAGACATTTT
1174:1193 GGACAGGGTTGGGAGAGCA ACACAGATGCTCTCC
1265:1087 CCATATACCTCACCTT CACAGA
1248:488 GGGAGGCCTGCATTCCT AGCTATATACCCCA
1250:1087 GGAGGCCTGCATTCCTTC ACAGAGACGGGACG
1248:1573 GGAGGCCTGCATTCCT AAAGAGCTAAATGCA
1263:1227 CCTTCCATATACCTCACC AGATATGATCCTCC
1623:1631 CTATACTGCAAAGTT GGAAAACTTTGCA
1971:12 CTTGTGTTTGTCA AAATCCATT
2166:103 TGTGTTGAGTGTTG GAAGTTG
2194:2589 TAAGACAGTTGTTTTTC AATTCAG TGTATCTTATTTGTGGATT
2400:3011 TTCACCTGTTTTGGT GCCTAGCCAAACA
2478:1544 TTATGTGGATAGCAA AATGGTAAA
2565:164 AGCATAGAGGTA CAGCAATAAC
2572:2990 AGAGGTATTATTTT TTATATGTATAT
2698:506 GATCAGCTTATC ATATGAGTCAA
2716:101 CTGTCCCAAGGCC AGAAT GAAGTTGCCTAG
2756:232 CGAATCCACGGATTG TTACCTGGGAGA
2754:3006 ATCGAATCCACGGAT GCAATGTGTTGG CTAGGCAAACATTCCA
2813:2599 ACAAATAAGATACA CTGAATTGAA
2813:2599-2589:2194 CAAATAAGATACA CTGAATT GAAAAACAACTGTCTTA
2828:259 TTTTTACATTTACTG GATCTAGGGGTTA
2903:487 CATTCCATTTATTCCTA GCTATATA
2955:208 TTTCATAATTTGGTT CAGTACAACATA
2555:2443 TTCATAATTTGGTT ACCTCATACCACA
2957:426 TTTCATAATTTGGTTTA AATGTGTCAAGGTC
2980:172 CATTCCTAGCATAT CTGTACC TA GCAAATCTGGTACAG
2964:840 TGGTTTAGAGATTA GATAC TTAGAT ACATTAGATACTT
2989:728 ATATACATATAAA CCAACAAGCACAT
2986:553 AGCATATACATAT GTGTAAAATAATTT
2970:2807 GAGATTATTCATT ATTTGTGGATTTG
3001:755 AAATGGAATGTTTG AAAAAAGACTT
2999:523 TATAAAATGGAATGTT ACAAATAGATTTACC
2978:2754 TCATTCCTAGCAT CCGTGGATTCGATG
3010:734 TGTTTGGCTAGGCAC TTCACCAACAA
3017:471 AGGCACCCGGGCT AT CCCAAAGACCTGA
3021:506 GGCACCCGGGCTCATA TGAGTCAATGAT

t120 HCT116 LIG3-/-:TP53-/-
56:2990 TGACATGGTTGTTTTA TATGTATATGCT
66:3000 TTTTAAAAAGTAACT AAACATTCCATTTTA
50:3014 TATTGACATGGTT CCGGGTGCCTAGCC
79:3014 TTATTTTCCAATAA GTG CCGGGTGCCTAGCCA
63:3005 TTGTTTTAAAAAGTA GCCAAACATTCCA
66:3008 TTAAAAAGTAACT A GCCTAGCCAAACATTC
125:2576 GTACAAAGTACAACTT CTGGACTCC
103:2995 TTCTAGGCAACTTCCA TTTTATATGTA
103:3013 TTCTAGGCAACTTCCA ATGTTTGG CGGGTGCCTAGCC
102:3015 TTCTAGGCAACTTCC CGGGTGCCTAGC
127:3006 TACAAAGTACAACTTGT ACAA CTAGCCAAACA
150:3008 TAGATTCCCCAACA CCCGGGCTCA
150:2999 CTTAGATTCCCCAACA TTCCATTTTATA
139:3015 TGTTTCTTCCCTTAG CAC CCCGGGTGCCTAGCC
132:3022 GTACAACTTGTTTCTT AAGTTTAATTTCCTTAA GTATGAGCCCGGGTGC
193:2687 CACAAAATACTCCG ATCAGCTTATCC
176:3010 CCAGATTTGCAGTG CCTAGCCAAACA
192:3057 CCCACAAAATACTCC CTTCCTCAGCCTCTCAA
174:2992 TACCAGATTTGCAG CATTTTATATGTA
278:2999 ATTGTTCCTACATGA ACATTCCATTTTA
289:3007 CATGATAATTCAGTCC TAGCCAAACATTC
295:3001 ATAATTCAGTCCACAAAC ATTCCATTTTAT
311:3017 CTTCACTTTTACC ACAAACTCACTTC TGT AGCCCGGGTGCCTA
331:2827 GCCACACTTGGGAGTA AATGTAAAAATG
337:2957 TGGGAGTATAATGT A TAAACCAAATTATGA
366:2994 TTTTCATTAGGATCCAGTTT ATATGTATATGCTAGGA
366:3009 ATTAGGATCCAGTTT GCCTAGCCAAACA
422:339 ATGACCTTGACACATT ATACTCCCAAGTGTGG
436:3004 ATTTAAAGAGCATACAG CCAAACATTCCAT
472:148 TTCAGGTCTTTGGGG AATCTAA
545:2964 GAGGAAACATCAAATTA AATCTCTACACCAA
580:2996 CATTTCATATTGTCAC TTCCTATACTG GCAACT AG ATTCCATTTTA
602:3008 TTTAAGAGTTCAG CCTAGCCAAAC
613:2998 AGTTCAAGTTTCTGCACAT TCCATTTTATATG
665:2436 TCTTGTGTGAGTTAAGG ATACAATTTCATT
692:3003 GTTTTAATTATGG CCAAACATTCCATT
743:2744 GAAGTGTGTATTAAA TCCACGGATTGCTT
724:585 TTTTCATGTGCTTGT GACAATATGAAATG
754:530 AAAGTCTTTTTTCA TGAAGTTAC
740:2996 TGGTGAAGTGTGTATT CCATTTTATATGTA
743:2998 AGTGTGTATTAAA TCCACGGATTGCTTTGTGT ACATTCCATTTTAT
783:3011 TTGTCTTTGCTATGG TGCCTAGCCAAAC
897:2842 AAGAGTTAATGTGTGG AAATTAAACTTCCAG
1193:1174 CTGTGTTGCTCTCCCAAC CCTGTCCTTTATTACA
1781:1557-1619:2083 ATGCTCTGCTCTTT AGGAAT...ACACT TCCTATACTGCAATC
1714:2485 GGAGATTGGCAG G AGGCAAC CGG
1792:3018 AAAACATCAGAGC CCGGGTGCCTAG
2076:1758 CAATTGGATTGCAC CACCACGGGCAG
2069:1139 TACTGTCACAATTGG TTACGTCTGTGG
213:2997 GTACTGAAAGCAT TCCATTTTATAT
302:2707 ACAAACTCACTTC CCAAGGCCTTTG
2416:2999 TCGATCCATTGTAA TGT AACATTC
2400:3000 CCTGTTTTGGT G AAACATTCCATT
2822:214 TACATTTTTACAT GCTTTCAGTACAA
1730:2888 CAACAGTCAATT GTC TGAAACGACAA
2952:435 ATTTTCATAATTTG TATGCTCTTTAAA
2967:391 TTAGAGATTATTC TGGGGAACAGTTC
284:2999 TCCTACATGAACATTC CATTTTATATGTATA
2983:369 TCCTAGCATATACA AAACTGGATCCTA
2987:833 AGCATATACATATA CTTCATCAGTGTCT
2982:197 ATTCCTAGCATATAC CGGAGTATTTTG
2972:475 GAGATTATTCATTCC CCAAAGACCTGAAAA
2998:2766 AAATGGAATGT GTACACAAAGCA
3001:286 CTGAATTATCATG TCACG CTGAATTATCATG
211:3017 GTTGTACTGAAAGC CCGGGTGCCTA
3009:267 TGTTTGGCTAGGCA ATGCTGATTTCC
3005:487 GAATGTTTGGCTA T ATTCCTAGCTAT
2989:324 CATATACATATAAA GTGTGGCATGAGG
3002:165 ATGGAATGTTTGG TACAGCAATAACT
3016:2301 GGCACCCGGGC CACAGACCAC
2998:435 AAAATGGAATGT ATGCTCT...GGTC ATAA
3005:2539 GAATGTTTGGCTA TTA CAAAGTCCATTA
3017:706 AGGCACCCGGGCT TAACATCAT
3000:545 AAAATGGAATGTTT TAATTTGATGTTT
3005:339 GAATGTTTGGCTA ATGT ACACATTATACTC
3011:2733 TTTGGCTAGGCACC TACAAAAATCC
3009:3041 TGTTTGGCTAGGCA ACCTGCTTGGGTTA
3016:105 AGGCACCCGGGC CA ACTGGAAGTTGCC
2999:540 TAAAATGGAATGTT ATA TGATGTTTCCTC
2951:2938-3006:300 GAATTTTCATAATTT TCATAATT...TGGCTAG TGAGTTTGTGGA
3007:164 ATGTTTGGCTAGG TACAGCAATAACTG
3010:161 GTTTGGCTAGGCAC AGCAATAACTGTT
3003:1970 GGAATGTTTGGC CTA GACAAACACAAGG
3012:1901 GGCTAGGCACCC TTGCCTTAAGGGA
3019:325 GGCACCCGGGCTCA AGTGTGGCATGA
3009:150 GTTTGGCTAGGCA ACCTG...CGGG TGTTGGGGAAT
telomere:294 GGTTAGGGTTAGGGTT TGTGGACTGAATTATC
732:telomere TGCTTGTTGGTGAAG AAACCTAACCTG
telomere:434 GGTTAGGGTTAGGGT ATGCTCTTTAAATGTGT
telomere:123 GGGTTAGGGTAGG GTTGTACTTTGTACT